From jychang48 at gmail.com Mon Jun 1 01:12:25 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 1 Jun 2015 01:12:25 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87a8wkgs5b.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> Message-ID: There are a few papers that discuss this modified/augmented Taylor-Hood elements for Stokes equations in detail (e.g., http://link.springer.com/article/10.1007%2Fs10915-011-9549-4). From what I have seem, it seems people primarily use this to ensure local mass conservation while attaining the desirable qualities of the TH element. Lately I have seen this element used in many FEniCS and Deal.II applications (and it's also very easy to implement, just a few additional lines of code), which is why I had wanted to experiment with this myself (if possible) using DMPlex. On Sun, May 31, 2015 at 7:59 PM, Jed Brown wrote: > Justin Chang writes: > > > I am referring to P2 / (P1 + P0) elements, I think this is the correct > way > > of expressing it. Some call it modified Taylor Hood, others call it > > something else, but it's not Crouzeix-Raviart elements. > > Okay, thanks. This pressure space is not a disjoint union (the constant > exists in both spaces) and thus the obvious "basis" is actually not > linearly independent. I presume that people using this element do some > "pinning" (like set one cell "average" to zero) instead of enforcing a > unique expression via a Lagrange multiplier (which would involve a dense > row and column). That may contribute to ill conditioning and in any > case, would make domain decomposition or multigrid preconditioners more > technical. Do you know of anything explaining why the method is not > very widely used (e.g., in popular software packages, finite element > books, etc.)? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From riseth at maths.ox.ac.uk Mon Jun 1 04:07:04 2015 From: riseth at maths.ox.ac.uk (=?UTF-8?Q?Asbj=C3=B8rn_Nilsen_Riseth?=) Date: Mon, 01 Jun 2015 09:07:04 +0000 Subject: [petsc-users] SNESVINEWTONRSLS: definition of active set? In-Reply-To: References: <5ABC2C8A-1BE6-489C-BACD-E8D780CD205A@mcs.anl.gov> <9B83FB13-0727-4BB9-A7A8-A14EC1DC8FC0@mcs.anl.gov> Message-ID: Hi again Barry, I sorted out the jacobian issue, and made a comparison between the two definitions of the active set. The active set with strict inequality takes the same or fewer Newton steps than the current petsc code. With a larger search space, I expect this to happen. snes_vi_monitor logs comparing the two are shown below. I could submit a pull request with the change, but we should probably consider: 1) Whether this active set definition is consistent with newtonssls 2) The linear systems to solve becomes larger, so for some cases the overall performance might not improve so much. 3) For more flexibility, we could add an option to decide whether to use a strict inequality or not. This would sort out 1) and 2). I don't have much experience with the petsc codebase though, so adding options might take me some time. 
Ozzy __ log using my patch __ 0 SNES VI Function norm 7.796491981333e+02 Active lower constraints 0/18 upper constraints 0/0 Percent of total 0 Percent of bounded 0 1 SNES VI Function norm 2.405400030748e+02 Active lower constraints 0/16 upper constraints 0/0 Percent of total 0 Percent of bounded 0 2 SNES VI Function norm 2.145739795389e+02 Active lower constraints 0/17 upper constraints 0/0 Percent of total 0 Percent of bounded 0 3 SNES VI Function norm 1.942498283668e+02 Active lower constraints 0/13 upper constraints 0/0 Percent of total 0 Percent of bounded 0 4 SNES VI Function norm 1.834306037299e+01 Active lower constraints 0/11 upper constraints 0/0 Percent of total 0 Percent of bounded 0 5 SNES VI Function norm 1.724597091463e+01 Active lower constraints 0/11 upper constraints 0/0 Percent of total 0 Percent of bounded 0 6 SNES VI Function norm 4.210027533399e-02 Active lower constraints 0/10 upper constraints 0/0 Percent of total 0 Percent of bounded 0 7 SNES VI Function norm 3.014124871281e-07 Active lower constraints 0/10 upper constraints 0/0 Percent of total 0 Percent of bounded 0 SNES Object:(firedrake_snes_0_) 1 MPI processes type: vinewtonrsls maximum iterations=20, maximum function evaluations=10000 tolerances: relative=0, absolute=1e-06, solution=0 total number of linear solver iterations=7 total number of function evaluations=22 norm schedule ALWAYS SNESLineSearch Object: (firedrake_snes_0_) 1 MPI processes type: l2 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=1 KSP Object: (firedrake_snes_0_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (firedrake_snes_0_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5, needed 1.54545 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36 package used to perform factorization: petsc total: nonzeros=816, allocated nonzeros=816 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 15 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36 total: nonzeros=528, allocated nonzeros=528 total number of mallocs used during MatSetValues calls =0 not using I-node routines -------------------------------------------------- __ log using the original petsc code __ 0 SNES VI Function norm 7.796491981333e+02 Active lower constraints 12/18 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 1 SNES VI Function norm 2.630718602212e+02 Active lower constraints 12/16 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 2 SNES VI Function norm 2.363417090057e+02 Active lower constraints 12/17 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 3 SNES VI Function norm 1.902271040685e+01 Active lower constraints 12/14 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 4 SNES VI Function norm 1.866193366410e+01 Active lower constraints 12/12 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 5 SNES VI Function norm 1.865568900723e+01 Active lower constraints 12/12 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 6 SNES VI Function norm 
2.182461654877e+02 Active lower constraints 10/12 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 7 SNES VI Function norm 2.575010971279e-01 Active lower constraints 10/11 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 8 SNES VI Function norm 1.056372578821e-05 Active lower constraints 10/10 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 9 SNES VI Function norm 4.368019257866e-11 Active lower constraints 10/10 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 SNES Object:(firedrake_snes_0_) 1 MPI processes type: vinewtonrsls maximum iterations=20, maximum function evaluations=10000 tolerances: relative=0, absolute=1e-06, solution=0 total number of linear solver iterations=9 total number of function evaluations=28 norm schedule ALWAYS SNESLineSearch Object: (firedrake_snes_0_) 1 MPI processes type: l2 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=1 KSP Object: (firedrake_snes_0_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (firedrake_snes_0_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5, needed 1.57895 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=26, cols=26 package used to perform factorization: petsc total: nonzeros=420, allocated nonzeros=420 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 11 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=26, cols=26 total: nonzeros=266, allocated nonzeros=266 total number of mallocs used during MatSetValues calls =0 not using I-node routines On Sat, 30 May 2015 at 01:07 Asbj?rn Nilsen Riseth wrote: > Hey Barry, > > thanks for the offer to have a look at the code. I ran SNESTEST today: > user-defined failed, 1.0 and -1.0 seemed to work fine. My first step will > have to be to find out what's wrong with my jacobian. If I've still got > issues after that, I'll try to set up an easy-to-experiment code > > The code is a DG0 FVM formulation set up in Firedrake (a "fork" of > FEniCS). I was assuming UFL would sort the Jacobian for me. > Lesson learnt: always do a SNESTEST. > > > Ozzy > > On Fri, 29 May 2015 at 19:21 Barry Smith wrote: > >> >> > On May 29, 2015, at 4:19 AM, Asbj?rn Nilsen Riseth < >> riseth at maths.ox.ac.uk> wrote: >> > >> > Barry, >> > >> > I changed the code, but it only makes a difference at the order of >> 1e-10 - so that's not the cause. I've attached (if that's appropriate?) the >> patch in case anyone is interested. >> > >> > Investigating the function values, I see that the first Newton step >> goes towards the expected solution for my function values. Then it shoots >> back close to the initial conditions. >> >> When does it "shoot back close to the initial conditions"? At the >> second Newton step? If so is the residual norm still smaller at the second >> step? >> >> > At the time the solver hits tolerance for the inactive set; the >> function value is 100-1000 at some of the active set indices. >> > Reducing the timestep by an order of magnitude shows the same behavior >> for the first two timesteps. >> > >> > Maybe VI is not the right approach. 
The company I work with seem to >> just be projecting negative values. >> >> The VI solver is essentially just a "more sophisticated" version of >> "projecting negative values" so should not work worse then an ad hoc method >> and should generally work better (sometimes much better). >> >> Is your code something simple you could email me to play with or is it >> a big application that would be hard for me to experiment with? >> >> >> >> Barry >> >> > >> > I'll investigate further. >> > >> > Ozzy >> > >> > >> > On Thu, 28 May 2015 at 20:26 Barry Smith wrote: >> > >> > Ozzy, >> > >> > I cannot say why it is implemented as >= (could be in error). >> Just try changing the PETSc code (just run make gnumake in the PETSc >> directory after you change the source to update the library) and see how it >> affects your code run. >> > >> > Barry >> > >> > > On May 28, 2015, at 3:15 AM, Asbj?rn Nilsen Riseth < >> riseth at maths.ox.ac.uk> wrote: >> > > >> > > Dear PETSc developers, >> > > >> > > Is the active set in NewtonRSLS defined differently from the >> reference* you give in the documentation on purpose? >> > > The reference defines the active set as: >> > > x_i = 0 and F_i > 0, >> > > whilst the PETSc code defines it as x_i = 0 and F_i >= 0 (vi.c: 356) : >> > > !((PetscRealPart(x[i]) > PetscRealPart(xl[i]) + 1.e-8 || >> (PetscRealPart(f[i]) < 0.0) >> > > So PETSc freezes the variables if f[i] == 0. >> > > >> > > I've been using the Newton RSLS method to ensure positivity in a >> subsurface flow problem I'm working on. My solution stays almost constant >> for two timesteps (seemingly independent of the size of the timestep), >> before it goes towards the expected solution. >> > > From my initial conditions, certain variables are frozen because x_i >> = 0 and f[i] = 0, and I was wondering if that could be the cause of my >> issue. >> > > >> > > >> > > *: >> > > - T. S. Munson, and S. Benson. Flexible Complementarity Solvers for >> Large-Scale Applications, Optimization Methods and Software, 21 (2006). >> > > >> > > >> > > Regards, >> > > Ozzy >> > >> > <0001-Define-active-and-inactive-sets-correctly.patch> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jun 1 09:38:07 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 01 Jun 2015 16:38:07 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> Message-ID: <87pp5ffq9c.fsf@jedbrown.org> Justin Chang writes: > There are a few papers that discuss this modified/augmented Taylor-Hood > elements for Stokes equations in detail (e.g., > http://link.springer.com/article/10.1007%2Fs10915-011-9549-4). This analysis does not state a finite element. > From what I have seem, it seems people primarily use this to ensure > local mass conservation while attaining the desirable qualities of the > TH element. Lately I have seen this element used in many FEniCS and > Deal.II applications (and it's also very easy to implement, just a few > additional lines of code), Could you point to a specific example? How are they handling linear dependence of the "basis"? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From knepley at gmail.com Mon Jun 1 10:02:52 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 1 Jun 2015 10:02:52 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87pp5ffq9c.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> Message-ID: On Mon, Jun 1, 2015 at 9:38 AM, Jed Brown wrote: > Justin Chang writes: > > > There are a few papers that discuss this modified/augmented Taylor-Hood > > elements for Stokes equations in detail (e.g., > > http://link.springer.com/article/10.1007%2Fs10915-011-9549-4). > > This analysis does not state a finite element. They certaiinly state the approximation space up front. Then later in the paper they say that they independently test with P1 and P0, and that this has a 1D null space, and then in the solution section they have some way of handling that which I ignored because its easy to handle. Matt > > From what I have seem, it seems people primarily use this to ensure > > local mass conservation while attaining the desirable qualities of the > > TH element. Lately I have seen this element used in many FEniCS and > > Deal.II applications (and it's also very easy to implement, just a few > > additional lines of code), > > Could you point to a specific example? How are they handling > linear dependence of the "basis"? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From erocha.ssa at gmail.com Mon Jun 1 16:06:15 2015 From: erocha.ssa at gmail.com (Eduardo) Date: Mon, 1 Jun 2015 23:06:15 +0200 Subject: [petsc-users] Convergence of iterative linear solver Message-ID: I am solving a linear system for which the preconditioned residual decreases, but the true residual increases (or have an erratic behavior). According to Petsc FAQ, this is due to a preconditioner that is singular or close to singular. What can I do in this case? I used GMRES with ILU preconditioner. Incidentally, I tried to solve a smaller system with a direct solver (superlu_dist) and it ran, so the system apparently is not singular. Thanks in advance, Eduardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 1 16:11:27 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 1 Jun 2015 16:11:27 -0500 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: On Mon, Jun 1, 2015 at 4:06 PM, Eduardo wrote: > I am solving a linear system for which the preconditioned residual > decreases, but the true residual increases (or have an erratic behavior). > According to Petsc FAQ, this is due to a preconditioner that is singular or > close to singular. What can I do in this case? I used GMRES with ILU > preconditioner. > ILU can be spectacularly bad. We recommend switching to a preconditioner tailored to your problem. The best way to find these is to look in the literature. Thanks, Matt > Incidentally, I tried to solve a smaller system with a direct solver > (superlu_dist) and it ran, so the system apparently is not singular. 
> > Thanks in advance, > Eduardo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jun 1 21:11:07 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 1 Jun 2015 21:11:07 -0500 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: > On Jun 1, 2015, at 4:06 PM, Eduardo wrote: > > I am solving a linear system for which the preconditioned residual decreases, but the true residual increases (or have an erratic behavior). According to Petsc FAQ, this is due to a preconditioner that is singular or close to singular. What can I do in this case? I used GMRES with ILU preconditioner. ^^^^^^^^^^^^^^^ > > Incidentally, I tried to solve a smaller system with a direct solver (superlu_dist) and it ran, so the system apparently is not singular. ILU can produce very badly conditioned (one could say singular PRECONDITIONER) from not singular sparse matrices. So it doesn't have anything to do with the system itself being singular. What type of problem are you solving? Different problems need different preconditioners. Barry > > Thanks in advance, > Eduardo From jychang48 at gmail.com Mon Jun 1 21:51:16 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 1 Jun 2015 21:51:16 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> Message-ID: Jed, I am not quite sure what you're asking for. Are you asking for how people actually implement this augmented TH? In other words, how the shape/basis functions for this mixed function space would look? I have only seen in some key note lectures and presentations at conferences briefly mentioning this P2/(P1+P0) element, as if it's the de facto discretization for Stokes flows. That said, even I am not too sure how this would look. Matt, In the 'quad_q2p1_full' example you pointed me to, is that P2/P1_disc or P2/(P1+P0)? I imagine those are two very different discretizations, so when you have the command line option "-pres_petscdualspace_lagrange_continuity 0" it looks to me you're doing the former? Thanks, Justin On Mon, Jun 1, 2015 at 10:02 AM, Matthew Knepley wrote: > On Mon, Jun 1, 2015 at 9:38 AM, Jed Brown wrote: > >> Justin Chang writes: >> >> > There are a few papers that discuss this modified/augmented Taylor-Hood >> > elements for Stokes equations in detail (e.g., >> > http://link.springer.com/article/10.1007%2Fs10915-011-9549-4). >> >> This analysis does not state a finite element. > > > They certaiinly state the approximation space up front. Then later in the > paper > they say that they independently test with P1 and P0, and that this has a > 1D > null space, and then in the solution section they have some way of > handling that > which I ignored because its easy to handle. > > Matt > > >> > From what I have seem, it seems people primarily use this to ensure >> > local mass conservation while attaining the desirable qualities of the >> > TH element. Lately I have seen this element used in many FEniCS and >> > Deal.II applications (and it's also very easy to implement, just a few >> > additional lines of code), >> >> Could you point to a specific example? 
How are they handling >> linear dependence of the "basis"? >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erocha.ssa at gmail.com Mon Jun 1 23:24:22 2015 From: erocha.ssa at gmail.com (Eduardo) Date: Tue, 2 Jun 2015 06:24:22 +0200 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: I am solving a FEM solid mechanics linear elasticity model, for now the only problem is the mesh that has needle-shaped and very flat elements. Have you any suggestion of preconditioner (and references). Thanks, Eduardo On Tue, Jun 2, 2015 at 4:11 AM, Barry Smith wrote: > > > On Jun 1, 2015, at 4:06 PM, Eduardo wrote: > > > > I am solving a linear system for which the preconditioned residual > decreases, but the true residual increases (or have an erratic behavior). > According to Petsc FAQ, this is due to a preconditioner that is singular or > close to singular. What can I do in this case? I used GMRES with ILU > preconditioner. > ^^^^^^^^^^^^^^^ > > > > Incidentally, I tried to solve a smaller system with a direct solver > (superlu_dist) and it ran, so the system apparently is not singular. > > ILU can produce very badly conditioned (one could say singular > PRECONDITIONER) from not singular sparse matrices. So it doesn't have > anything to do with the system itself being singular. > > What type of problem are you solving? Different problems need different > preconditioners. > > Barry > > > > > > > Thanks in advance, > > Eduardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jun 2 00:31:22 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 07:31:22 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> Message-ID: <876176ekwl.fsf@jedbrown.org> Justin Chang writes: > I am not quite sure what you're asking for. Are you asking for how people > actually implement this augmented TH? In other words, how the shape/basis > functions for this mixed function space would look?
Ciarlet's finite element definition requires a local approximation space ("P1+P0" in your case -- though P0 is a subset of P1 so this is actually just P1) and a set of "nodes" (a dual basis, which will define the basis functions used for computation and the continuity between elements). The dual basis needs to be unisolvent, such that the Vandermonde matrix is invertible. Now what might that look like for "P1+P0"? And if you bypass Ciarlet's construction and just try to pick some "basis" functions, say {(1-x)/2, (1+x)/2, 1} for the 1D simplex [-1,1], then you can express the constant f(x)=1 as u=(1,1,0), v=(0,0,1), or t*u + (1-t)*v for any real value t. Clearly it fails to be a basis because it's linearly dependent. Can you still compute with it? The answer is probably yes if you pin some cell displacement, but that causes a bunch of somewhat unattractive side-effects that I asked about in my earlier message. Now you said it was "just a few additional lines of code", so maybe you can explain how people are implementing it. > I have only seen in some key note lectures and presentations at > conferences briefly mentioning this P2/(P1+P0) element, as if it's the > de facto discretization for Stokes flows. It's hardly "de facto". > That said, even I am not too sure how this would look. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From francesco.caimmi at gmail.com Tue Jun 2 04:52:14 2015 From: francesco.caimmi at gmail.com (Francesco Caimmi) Date: Tue, 02 Jun 2015 11:52:14 +0200 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python Message-ID: <34150389.MoinacE02k@wotan> Dear PETSC users, first of all, many thanks to the developers for making PETSC available. I am trying to get familiar with the library using petsc4py (my C/C++ knowledge is rudimentary to say the least), and I am now trying to reproduce the examples in dm/examples/tutorials/ in python to get accustomed to DMs. However, while translating ex2.c, I get an error whose cause I cannot understand. I am doing (I hope it's ok to post this short code snippet): import sys import petsc4py petsc4py.init(sys.argv) from petsc4py import PETSc stype = PETSc.DMDA.StencilType.BOX bx = PETSc.DMDA.BoundaryType.NONE by = PETSc.DMDA.BoundaryType.NONE comm = PETSc.COMM_WORLD rank = comm.rank OptDB = PETSc.Options()#get PETSc option DB M = OptDB.getInt('M', 10) N = OptDB.getInt('N', 8) m = OptDB.getInt('m', PETSc.DECIDE) n = OptDB.getInt('n', PETSc.DECIDE) dm = PETSc.DMDA().create(dim=2, sizes = (M,N), proc_sizes=(m,n), boundary_type=(bx,by), stencil_type=stype, stencil_width = 1, dof = 1, comm = comm ) global_vec = dm.createGlobalVector() start, end = global_vec.getOwnershipRange() with global_vec as v: for i in xrange(start,end): v[i] = 5.0*rank As far as I understand this should be the closest python translation of ex2.c up to line 48, except for the viewer part which is still to be translated. If run on a single processor everything is ok, but when I run with mpiexec -n 2 I get the following error (on the second rank) ############################################# v[i] = 5.0*rank IndexError: index 40 is out of bounds for axis 0 with size 40 ############################################# I am on linux/x64 and I get this behaviour both with petsc3.4.3+petsc4py 3.4.2 (the packages available from opensuse repositories) and with petsc3.5.4+petsc4py3.5.1 (that I built myself). 
I would have expected the code to seamlessly handle the transition from one to multiple processors, so it's me who's doing something wrong or some other kind of problem? Up to now, I have never seen somethinf like that with vectors/matrices created by PETSc.Vec()/PETSc.Mat(). Thank you for your attention, -- Francesco Caimmi From jychang48 at gmail.com Tue Jun 2 06:13:39 2015 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 2 Jun 2015 06:13:39 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <876176ekwl.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> Message-ID: In FEniCS's Stokes example (example 19), one defines the Taylor-Hood function spaces with these three lines: V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, "CG", 1) W = V * Q To implement P2/(P1+P0), all we gotta do is this: V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, "CG", 1) P = FunctionSpace(mesh, "DG", 0) W = V * (Q + P) Everything else (boundary conditions, variational form, etc) remains the same. The sum of element-wise divergence (via assemble(div(u)*dx)) for the former is -0.000454806468878 whereas the latter its 3.91121126886e-14 hence ensuring (better) local mass conservation. I haven't looked into the exact framework on how this is done, but attached are the UFL/FFC generated header files that implement the Taylor-Hood and Modified Taylor-Hood discretization. Not sure how it's done in Deal.II, but IIRC it should be possible. Thanks, Justin On Tue, Jun 2, 2015 at 12:31 AM, Jed Brown wrote: > Justin Chang writes: > > I am not quite sure what you're asking for. Are you asking for how people > > actually implement this augmented TH? In other words, how the shape/basis > > functions for this mixed function space would look? > > Ciarlet's finite element definition requires a local approximation space > ("P1+P0" in your case -- though P0 is a subset of P1 so this is actually > just P1) and a set of "nodes" (a dual basis, which will define the basis > functions used for computation and the continuity between elements). > The dual basis needs to be unisolvent, such that the Vandermonde matrix > is invertible. > > Now what might that look like for "P1+P0"? > > And if you bypass Ciarlet's construction and just try to pick some > "basis" functions, say {(1-x)/2, (1+x)/2, 1} for the 1D simplex [-1,1], > then you can express the constant f(x)=1 as u=(1,1,0), v=(0,0,1), or t*u > + (1-t)*v for any real value t. Clearly it fails to be a basis because > it's linearly dependent. > > Can you still compute with it? The answer is probably yes if you pin > some cell displacement, but that causes a bunch of somewhat unattractive > side-effects that I asked about in my earlier message. > > Now you said it was "just a few additional lines of code", so maybe you > can explain how people are implementing it. > > > I have only seen in some key note lectures and presentations at > > conferences briefly mentioning this P2/(P1+P0) element, as if it's the > > de facto discretization for Stokes flows. > > It's hardly "de facto". > > > That said, even I am not too sure how this would look. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: TH.h Type: text/x-chdr Size: 409646 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MTH.h Type: text/x-chdr Size: 454613 bytes Desc: not available URL: From knepley at gmail.com Tue Jun 2 07:08:16 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 07:08:16 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> Message-ID: On Mon, Jun 1, 2015 at 9:51 PM, Justin Chang wrote: > Jed, > > I am not quite sure what you're asking for. Are you asking for how people > actually implement this augmented TH? In other words, how the shape/basis > functions for this mixed function space would look? I have only seen in > some key note lectures and presentations at conferences briefly mentioning > this P2/(P1+P0) element, as if it's the de facto discretization for Stokes > flows. That said, even I am not too sure how this would look. > > Matt, > > In the 'quad_q2p1_full' example you pointed me to, is that P2/P1_disc or > P2/(P1+P0)? I imagine those are two very different discretizations, so when > you have the command line option "-pres_petscdualspace_lagrange_continuity > 0" it looks to me you're doing the former? > Its Q2-P1_disc. That is stable and captures the pressure jumps well. This is the standard variable-viscosity Stokes element in my experience. Thanks, Matt > Thanks, > Justin > > On Mon, Jun 1, 2015 at 10:02 AM, Matthew Knepley > wrote: > >> On Mon, Jun 1, 2015 at 9:38 AM, Jed Brown wrote: >> >>> Justin Chang writes: >>> >>> > There are a few papers that discuss this modified/augmented Taylor-Hood >>> > elements for Stokes equations in detail (e.g., >>> > http://link.springer.com/article/10.1007%2Fs10915-011-9549-4). >>> >>> This analysis does not state a finite element. >> >> >> They certaiinly state the approximation space up front. Then later in the >> paper >> they say that they independently test with P1 and P0, and that this has a >> 1D >> null space, and then in the solution section they have some way of >> handling that >> which I ignored because its easy to handle. >> >> Matt >> >> >>> > From what I have seem, it seems people primarily use this to ensure >>> > local mass conservation while attaining the desirable qualities of the >>> > TH element. Lately I have seen this element used in many FEniCS and >>> > Deal.II applications (and it's also very easy to implement, just a few >>> > additional lines of code), >>> >>> Could you point to a specific example? How are they handling >>> linear dependence of the "basis"? >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Jun 2 07:14:06 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 07:14:06 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> Message-ID: On Tue, Jun 2, 2015 at 6:13 AM, Justin Chang wrote: > In FEniCS's Stokes example (example 19), one defines the Taylor-Hood > function spaces with these three lines: > > V = VectorFunctionSpace(mesh, "CG", 2) > Q = FunctionSpace(mesh, "CG", 1) > W = V * Q > > To implement P2/(P1+P0), all we gotta do is this: > > V = VectorFunctionSpace(mesh, "CG", 2) > Q = FunctionSpace(mesh, "CG", 1) > P = FunctionSpace(mesh, "DG", 0) > W = V * (Q + P) > So here you would need 4 dual basis vectors, which I am assuming are: ev_(-1, -1), ev_(1, -1), ev_(-1, 1), ev_(-0.5, -0.5) where ev_(x, y) is the point evaluation functional at (x, y). Then you need some basis for the primal space, which naively is 1, x, y, 1 As you can see, this basis in linearly dependent, so the Vandermonde matrix that FIAT constructs will be singular. The construction of a nodal basis will fail. So Jed's question is, what are they actually doing internally? Thanks, Matt > Everything else (boundary conditions, variational form, etc) remains the > same. The sum of element-wise divergence (via assemble(div(u)*dx)) for the > former is -0.000454806468878 whereas the latter its 3.91121126886e-14 hence > ensuring (better) local mass conservation. > > I haven't looked into the exact framework on how this is done, but > attached are the UFL/FFC generated header files that implement the > Taylor-Hood and Modified Taylor-Hood discretization. > > Not sure how it's done in Deal.II, but IIRC it should be possible. > > Thanks, > Justin > > On Tue, Jun 2, 2015 at 12:31 AM, Jed Brown wrote: > >> Justin Chang writes: >> > I am not quite sure what you're asking for. Are you asking for how >> people >> > actually implement this augmented TH? In other words, how the >> shape/basis >> > functions for this mixed function space would look? >> >> Ciarlet's finite element definition requires a local approximation space >> ("P1+P0" in your case -- though P0 is a subset of P1 so this is actually >> just P1) and a set of "nodes" (a dual basis, which will define the basis >> functions used for computation and the continuity between elements). >> The dual basis needs to be unisolvent, such that the Vandermonde matrix >> is invertible. >> >> Now what might that look like for "P1+P0"? >> >> And if you bypass Ciarlet's construction and just try to pick some >> "basis" functions, say {(1-x)/2, (1+x)/2, 1} for the 1D simplex [-1,1], >> then you can express the constant f(x)=1 as u=(1,1,0), v=(0,0,1), or t*u >> + (1-t)*v for any real value t. Clearly it fails to be a basis because >> it's linearly dependent. >> >> Can you still compute with it? The answer is probably yes if you pin >> some cell displacement, but that causes a bunch of somewhat unattractive >> side-effects that I asked about in my earlier message. >> >> Now you said it was "just a few additional lines of code", so maybe you >> can explain how people are implementing it. >> >> > I have only seen in some key note lectures and presentations at >> > conferences briefly mentioning this P2/(P1+P0) element, as if it's the >> > de facto discretization for Stokes flows. >> >> It's hardly "de facto". 
>> >> > That said, even I am not too sure how this would look. >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Tue Jun 2 07:18:32 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 02 Jun 2015 13:18:32 +0100 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> Message-ID: <556D9F18.2070107@imperial.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/06/15 13:14, Matthew Knepley wrote: > On Tue, Jun 2, 2015 at 6:13 AM, Justin Chang > wrote: > > In FEniCS's Stokes example (example 19), one defines the > Taylor-Hood function spaces with these three lines: > > V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, > "CG", 1) W = V * Q > > To implement P2/(P1+P0), all we gotta do is this: > > V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, > "CG", 1) P = FunctionSpace(mesh, "DG", 0) W = V * (Q + P) > > > So here you would need 4 dual basis vectors, which I am assuming > are: > > ev_(-1, -1), ev_(1, -1), ev_(-1, 1), ev_(-0.5, -0.5) > > where ev_(x, y) is the point evaluation functional at (x, y). Then > you need some basis for the primal space, which naively is > > 1, x, y, 1 > > As you can see, this basis in linearly dependent, so the > Vandermonde matrix that FIAT constructs will be singular. The > construction of a nodal basis will fail. > > So Jed's question is, what are they actually doing internally? So-called "enriched" elements in FEniCS are not created with a nodal basis, instead te "basis" for the space Q + P is just the concatenation of the bases for Q and P separately and so tabulation of basis functions at points is just the concatentation of the tabulation of Q and that of P. 
Lawrence -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVbZ8UAAoJECOc1kQ8PEYvfMEH/RjoxiJLL/Jl7/FzZSZCgTVF 2WodH3Xnt+vD6+L6IhbZ5g+R9F4leRHBnin8wRZKdE9GepbFIGpDRxq6ydhzqpUU eyawpNWltsJ2JcxAJo6nUxACQYJyAVr8xrlkfg90OGTPT8CTvliZ8545j+cr2EGC 80vtw2vZOx0WKJ3CFQ0RfbjSYnUf1UibV30WfSr8qm2IbysKxEBKUFC/JbXZ1vft MIzbK8koA5Ix58vss3YUAr7aCOB39xy/2xokW5G+fzvocCPxr3Wkv+lST3f9yzLA mXns+DJGzuAZJaX64ZnpS+n8yVzySECjLjeIecMB5rXTBwkiQUheTDVGhtifNrY= =ZWA6 -----END PGP SIGNATURE----- From knepley at gmail.com Tue Jun 2 07:37:35 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 07:37:35 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <556D9F18.2070107@imperial.ac.uk> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> Message-ID: On Tue, Jun 2, 2015 at 7:18 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 02/06/15 13:14, Matthew Knepley wrote: > > On Tue, Jun 2, 2015 at 6:13 AM, Justin Chang > > wrote: > > > > In FEniCS's Stokes example (example 19), one defines the > > Taylor-Hood function spaces with these three lines: > > > > V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, > > "CG", 1) W = V * Q > > > > To implement P2/(P1+P0), all we gotta do is this: > > > > V = VectorFunctionSpace(mesh, "CG", 2) Q = FunctionSpace(mesh, > > "CG", 1) P = FunctionSpace(mesh, "DG", 0) W = V * (Q + P) > > > > > > So here you would need 4 dual basis vectors, which I am assuming > > are: > > > > ev_(-1, -1), ev_(1, -1), ev_(-1, 1), ev_(-0.5, -0.5) > > > > where ev_(x, y) is the point evaluation functional at (x, y). Then > > you need some basis for the primal space, which naively is > > > > 1, x, y, 1 > > > > As you can see, this basis in linearly dependent, so the > > Vandermonde matrix that FIAT constructs will be singular. The > > construction of a nodal basis will fail. > > > > So Jed's question is, what are they actually doing internally? > > So-called "enriched" elements in FEniCS are not created with a nodal > basis, instead te "basis" for the space Q + P is just the > concatenation of the bases for Q and P separately and so tabulation of > basis functions at points is just the concatentation of the tabulation > of Q and that of P. > This construction appears to throw away unisolvence. In the Boffi paper, they use QR to solve the pressure Laplacian. This would put a damper on me using the method. Justin, how do they solve the system in the FEniCS example? Thanks, Matt > Lawrence > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1 > > iQEcBAEBAgAGBQJVbZ8UAAoJECOc1kQ8PEYvfMEH/RjoxiJLL/Jl7/FzZSZCgTVF > 2WodH3Xnt+vD6+L6IhbZ5g+R9F4leRHBnin8wRZKdE9GepbFIGpDRxq6ydhzqpUU > eyawpNWltsJ2JcxAJo6nUxACQYJyAVr8xrlkfg90OGTPT8CTvliZ8545j+cr2EGC > 80vtw2vZOx0WKJ3CFQ0RfbjSYnUf1UibV30WfSr8qm2IbysKxEBKUFC/JbXZ1vft > MIzbK8koA5Ix58vss3YUAr7aCOB39xy/2xokW5G+fzvocCPxr3Wkv+lST3f9yzLA > mXns+DJGzuAZJaX64ZnpS+n8yVzySECjLjeIecMB5rXTBwkiQUheTDVGhtifNrY= > =ZWA6 > -----END PGP SIGNATURE----- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Jun 2 07:43:21 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 07:43:21 -0500 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: <34150389.MoinacE02k@wotan> References: <34150389.MoinacE02k@wotan> Message-ID: On Tue, Jun 2, 2015 at 4:52 AM, Francesco Caimmi wrote: > Dear PETSC users, > > first of all, many thanks to the developers for making PETSC available. > > I am trying to get familiar with the library using petsc4py (my C/C++ > knowledge is rudimentary to say the least), and I am now trying to > reproduce > teh examples in dm/examples/tutorials/ in python to get accustomed to DMs. > > However, while translating ex2.c, I get an error whose cause I cannot > understand. > I am doing (I hope it's ok to post this short code snippet): > > impost sys > import petsc4py > petsc4py.init(sys.argv) > from petsc4py import PETSc > stype = PETSc.DMDA.StencilType.BOX > bx = PETSc.DMDA.BoundaryType.NONE > by = PETSc.DMDA.BoundaryType.NONE > comm = PETSc.COMM_WORLD > rank = comm.rank > OptDB = PETSc.Options()#get PETSc option DB > M = OptDB.getInt('M', 10) > N = OptDB.getInt('N', 8) > m = OptDB.getInt('m', PETSc.DECIDE) > n = OptDB.getInt('n', PETSc.DECIDE) > dm = PETSc.DMDA().create(dim=2, sizes = (M,N), proc_sizes=(m,n), > boundary_type=(bx,by), stencil_type=stype, > stencil_width = 1, dof = 1, comm = comm > ) > global_vec = dm.createGlobalVector() > start, end = global_vec.getOwnershipRange() > with global_vec as v: > for i in xrange(start,end): > v[i] = 5.0*rank > The 'with' construction just uses VecGetArray() which return the raw C pointer. This is indexed from 0 always, so you want v[i-start] = 5.0*rank If you want to index using [start, end), you need something like DMDAVecGetArray() Thanks, Matt > As far as I understand this should be the closest python translation of > ex2.c > up to line 48, except for the viewer part which is still to be translated. > > If run on a single processor everything is ok, but when I run with > mpiexec -n 2 I get the following error (on the second rank) > > ############################################# > v[i] = 5.0*rank > IndexError: index 40 is out of bounds for axis 0 with size 40 > ############################################# > > I am on linux/x64 and I get this behaviour both with petsc3.4.3+petsc4py > 3.4.2 > (the packages available from opensuse repositories) and with > petsc3.5.4+petsc4py3.5.1 (that I built myself). > > I would have expected the code to seamlessly handle the transition from > one > to multiple processors, so it's me who's doing something wrong or some > other > kind of problem? Up to now, I have never seen somethinf like that with > vectors/matrices created by PETSc.Vec()/PETSc.Mat(). > > Thank you for your attention, > -- > Francesco Caimmi > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lawrence.mitchell at imperial.ac.uk Tue Jun 2 08:21:09 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 02 Jun 2015 14:21:09 +0100 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> Message-ID: <556DADC5.2030103@imperial.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/06/15 13:37, Matthew Knepley wrote: > >> This construction appears to throw away unisolvence. Yes, it's normally used for building spaces where the resulting enriched space is unisolvent (e.g. MINI). >> In the Boffi paper, they use QR to solve the pressure Laplacian. >> This would put a damper on me using the method. I guess you could use some kind of interior penalty scheme (as normal for DG laplacians). Lawrence -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVba3CAAoJECOc1kQ8PEYvriEH/ilvHjMbmipvSGozYM9rEu7y HYhkN1V3vY9V1BFHlXlqmfApZKiWFTL/U3Jr/MPXGcXvlgn3A4tRV0NE7QDqiEd4 fpy5tVzDHWjb1bkBoU1xnB0P7RRZahPgHI0vO/lXMufSIvVyYHo0KCr6f3Tu1mfd ghO53lInD+yAOqnG29sq+o0NLhsdaZab7NheDCCrWAP4PujaT4u2YXLRQ+sjcfY1 isGadi8wHFU7z5Dwa/R3sjmH0PgTcSZoTV+l5scZEjNbO7lHuqmboTbTsnglNslm 50k/Xn7w3jcN9/GabCEW3bsXU41ok0xly6QDvfZQSvNfYz2XSBf0meREdCpvWCg= =abNX -----END PGP SIGNATURE----- From knepley at gmail.com Tue Jun 2 08:42:30 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 08:42:30 -0500 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: On Mon, Jun 1, 2015 at 11:24 PM, Eduardo wrote: > I am solving a FEM solid mechanics linear elasticity model, for now the > only problem is the mesh that has needle-shaped and very flat elements. > Have you any suggestion of preconditioner (and references). > The problem here is your discretization. WIth quasi-regular elements, -pc_type gamg works fine. However, with flat elements, your FEM basis becomes very ill-conditioned since the normal basis functions are almost linearly dependent. I think the best use of time here is to investigate better discretization strategies for this problem, since no solver is really going to help you. Thanks, Matt > Thanks, > Eduardo > > On Tue, Jun 2, 2015 at 4:11 AM, Barry Smith wrote: > >> >> > On Jun 1, 2015, at 4:06 PM, Eduardo wrote: >> > >> > I am solving a linear system for which the preconditioned residual >> decreases, but the true residual increases (or have an erratic behavior). >> According to Petsc FAQ, this is due to a preconditioner that is singular or >> close to singular. What can I do in this case? I used GMRES with ILU >> preconditioner. >> ^^^^^^^^^^^^^^^ >> > >> > Incidentally, I tried to solve a smaller system with a direct solver >> (superlu_dist) and it ran, so the system apparently is not singular. >> >> ILU can produce very badly conditioned (one could say singular >> PRECONDITIONER) from not singular sparse matrices. So it doesn't have >> anything to do with the system itself being singular. >> >> What type of problem are you solving? Different problems need >> different preconditioners. >> >> Barry >> >> >> >> > >> > Thanks in advance, >> > Eduardo >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Jun 2 09:27:15 2015 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 2 Jun 2015 10:27:15 -0400 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: Eduardo, as Matt said, your problem is ill conditioned, but you might find that if you add more elements and make the mesh less anisotropic, your solves are faster, and you obviously get a better solution. I'm not sure what options there are for better discretization but you can probably do a lot better by using better solver (parameters). I would start with: -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_threshold 0.02 # [0 - 0.1] #-mg_levels_ksp_type richardson -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor #-mg_levels_pc_type jacobi -mg_levels_ksp_max_it 2 # [1-8] Experiment with the two # parameters. If you scan you should find a minimum for each. And you might try the commented out smoother parameters. Mark On Tue, Jun 2, 2015 at 9:42 AM, Matthew Knepley wrote: > On Mon, Jun 1, 2015 at 11:24 PM, Eduardo wrote: > >> I am solving a FEM solid mechanics linear elasticity model, for now the >> only problem is the mesh that has needle-shaped and very flat elements. >> Have you any suggestion of preconditioner (and references). >> > > The problem here is your discretization. WIth quasi-regular elements, > -pc_type gamg works fine. However, with flat elements, your FEM > basis becomes very ill-conditioned since the normal basis functions are > almost linearly dependent. I think the best use of time here is > to investigate better discretization strategies for this problem, since no > solver is really going to help you. > > Thanks, > > Matt > > >> Thanks, >> Eduardo >> >> On Tue, Jun 2, 2015 at 4:11 AM, Barry Smith wrote: >> >>> >>> > On Jun 1, 2015, at 4:06 PM, Eduardo wrote: >>> > >>> > I am solving a linear system for which the preconditioned residual >>> decreases, but the true residual increases (or have an erratic behavior). >>> According to Petsc FAQ, this is due to a preconditioner that is singular or >>> close to singular. What can I do in this case? I used GMRES with ILU >>> preconditioner. >>> ^^^^^^^^^^^^^^^ >>> > >>> > Incidentally, I tried to solve a smaller system with a direct solver >>> (superlu_dist) and it ran, so the system apparently is not singular. >>> >>> ILU can produce very badly conditioned (one could say singular >>> PRECONDITIONER) from not singular sparse matrices. So it doesn't have >>> anything to do with the system itself being singular. >>> >>> What type of problem are you solving? Different problems need >>> different preconditioners. >>> >>> Barry >>> >>> >>> >>> > >>> > Thanks in advance, >>> > Eduardo >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From francesco.caimmi at polimi.it Tue Jun 2 09:30:35 2015 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Tue, 2 Jun 2015 16:30:35 +0200 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: References: <34150389.MoinacE02k@wotan> Message-ID: <5048964.moFYollcMY@wotan> On Tuesday 02 June 2015 07:43:21 Matthew Knepley wrote: > On Tue, Jun 2, 2015 at 4:52 AM, Francesco Caimmi > wrote: > > with global_vec as v: > > for i in xrange(start,end): > > v[i] = 5.0*rank > > The 'with' construction just uses > > VecGetArray() > > which return the raw C pointer. This is indexed from 0 always, so you want > > v[i-start] = 5.0*rank > > If you want to index using [start, end), you need something like > > DMDAVecGetArray() > > Thanks, > > Matt Matt, thank you very much for your answer. Unfortunately I wasn't able to find documentation for the 'with' construction. As far as I understand using DMDAVecGetArray would give me an object with shape (M,N) in this case, not a vector with shape M*N as global_vec, so I would have to resort to something different than VecGetOwnershipRange to get the indexes. I would prefer to avoid it because I don't feel it would be very clear. Ithink I will go with the "v[i-start]" thing . Anyway, would it be ok to do as follows? for i in xrange(start,end): global_vec[i] = 5.0*rank global_vec.assemblyBegin() global_vec.assemblyEnd() I ask because, while the program works, I cannot see assembly operation in the C code for ex2, so I am trying to figure out why. Is there some (python related ) reason that would make such an approach unwise? Thanks, -- Francesco Caimmi From knepley at gmail.com Tue Jun 2 09:35:21 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 09:35:21 -0500 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: <5048964.moFYollcMY@wotan> References: <34150389.MoinacE02k@wotan> <5048964.moFYollcMY@wotan> Message-ID: On Tue, Jun 2, 2015 at 9:30 AM, Francesco Caimmi wrote: > On Tuesday 02 June 2015 07:43:21 Matthew Knepley wrote: > > On Tue, Jun 2, 2015 at 4:52 AM, Francesco Caimmi < > francesco.caimmi at gmail.com > > > wrote: > > > with global_vec as v: > > > for i in xrange(start,end): > > > v[i] = 5.0*rank > > > > The 'with' construction just uses > > > > VecGetArray() > > > > which return the raw C pointer. This is indexed from 0 always, so you > want > > > > v[i-start] = 5.0*rank > > > > If you want to index using [start, end), you need something like > > > > DMDAVecGetArray() > > > > Thanks, > > > > Matt > Matt, > thank you very much for your answer. Unfortunately I wasn't able to find > documentation for the 'with' construction. > > As far as I understand using DMDAVecGetArray would give me an object with > shape (M,N) in this case, not a vector with shape M*N as global_vec, so I > would have to resort to something different than VecGetOwnershipRange to > get > the indexes. > I would prefer to avoid it because I don't feel it would be very clear. > Ithink I will go with the "v[i-start]" thing . > > Anyway, would it be ok to do as follows? > > for i in xrange(start,end): > global_vec[i] = 5.0*rank > No, you would need xrange(end-start): > global_vec.assemblyBegin() > global_vec.assemblyEnd() > > I ask because, while the program works, I cannot see assembly operation in > the > C code for ex2, so I am trying to figure out why. > Is there some (python related ) reason that would make such an approach > unwise? > Direct access to arrays can only happen for local values. 
Thus you do not need the assembly calls. Thanks, Matt > Thanks, > -- > Francesco Caimmi > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jun 2 10:40:01 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 17:40:01 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <556D9F18.2070107@imperial.ac.uk> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> Message-ID: <87lhg2ce5q.fsf@jedbrown.org> Lawrence Mitchell writes: > So-called "enriched" elements in FEniCS are not created with a nodal > basis, instead te "basis" for the space Q + P is just the > concatenation of the bases for Q and P separately and so tabulation of > basis functions at points is just the concatentation of the tabulation > of Q and that of P. So the mass matrix for CG1+DG0 is singular? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From lawrence.mitchell at imperial.ac.uk Tue Jun 2 11:06:26 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 02 Jun 2015 17:06:26 +0100 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87lhg2ce5q.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> Message-ID: <556DD482.2050803@imperial.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/06/15 16:40, Jed Brown wrote: > Lawrence Mitchell writes: >> So-called "enriched" elements in FEniCS are not created with a >> nodal basis, instead te "basis" for the space Q + P is just the >> concatenation of the bases for Q and P separately and so >> tabulation of basis functions at points is just the >> concatentation of the tabulation of Q and that of P. > > So the mass matrix for CG1+DG0 is singular? I believe so, yes. Lawrence -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVbdR/AAoJECOc1kQ8PEYv+kkH/28Ct1XUVRQ4Y1/OzHJbOfPG 70JD/IVtbs5cuMa7ici6btQOj6nsGQ17TzbZcV/L2oANAawexut9a9AujnzsrNaE Ch0gei1KT+fjVvMk1ZnLRuKNOCZ/0NeXazBkxZRGksMSzUF36+7HXJ3szhOLcaiz KHzy0RO209IsJBFi/F5pyIsrGz69Eknf5NnhmTgjKa73JUlgZeN7h5r1x38g3egK POHW97yUf/YoIJmuSlNKraKIconjiaE+i392bGxXfYWQC7cxKD3rMsr/JTXQNSJ/ NPhAYsUsUJVJHAsIYuWSe2O1CY2C9pqsvJa6MZ5CTEn2ER3X0NWx2NwLT/XQylI= =c4ts -----END PGP SIGNATURE----- From jed at jedbrown.org Tue Jun 2 11:20:49 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 18:20:49 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <556DD482.2050803@imperial.ac.uk> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> Message-ID: <87fv6acc9q.fsf@jedbrown.org> Lawrence Mitchell writes: >> So the mass matrix for CG1+DG0 is singular? > > I believe so, yes. Fabulous. 
Now let's take a one element domain. What is the norm of the vector u=((1,1,1),(-1)) in the "basis" {CG1, DG0}? Note that this represents the continuous function u(x,y)=0. I assume FEniCS is not using QR to solve the under-determined system, so are they silently pinning (e.g., remove the DG0 part from one element in the domain)? If they are just "solving" the under-determined systems by hoping that the solver doesn't notice that the system is singular, rather than rigorously finding a minimum norm solution (expensive), then the discrete solution can vary an arbitrary amount by the choice of algebraic solver. Why does it appear that none of the papers talk about these issues? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jed at jedbrown.org Tue Jun 2 11:49:29 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 18:49:29 +0200 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: References: <34150389.MoinacE02k@wotan> <5048964.moFYollcMY@wotan> Message-ID: <87a8wicaxy.fsf@jedbrown.org> Matthew Knepley writes: >> for i in xrange(start,end): >> global_vec[i] = 5.0*rank > > No, you would need xrange(end-start): Specifically, Python made a decision about indexing semantics that, while not irrational, really sucks for distributed array computing. I'm not aware of a clean way to give you nice syntax. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From lawrence.mitchell at imperial.ac.uk Tue Jun 2 11:51:10 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 02 Jun 2015 17:51:10 +0100 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87fv6acc9q.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> Message-ID: <556DDEFE.1030301@imperial.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/06/15 17:20, Jed Brown wrote: > Lawrence Mitchell writes: >>> So the mass matrix for CG1+DG0 is singular? >> >> I believe so, yes. > > Fabulous. Now let's take a one element domain. What is the norm > of the vector > > u=((1,1,1),(-1)) > > in the "basis" {CG1, DG0}? Note that this represents the > continuous function u(x,y)=0. > > I assume FEniCS is not using QR to solve the under-determined > system, so are they silently pinning (e.g., remove the DG0 part > from one element in the domain)? If they are just "solving" the > under-determined systems by hoping that the solver doesn't notice > that the system is singular, rather than rigorously finding a > minimum norm solution (expensive), then the discrete solution can > vary an arbitrary amount by the choice of algebraic solver. Maybe Justin can chime in here, I don't know, I just happened to know how the fenics implementation produces the "basis", so proffered that. > Why does it appear that none of the papers talk about these > issues? Pass. 
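(A direct check of Jed's one-element example, as an illustrative standalone numpy sketch with the mass-matrix entries on the unit reference triangle written out by hand:

    import numpy as np

    # Mass matrix on the unit reference triangle for the concatenated "basis"
    # {phi1, phi2, phi3, 1}: P1 hat functions plus the DG0 constant.
    # Entries (area = 1/2): int phi_i phi_j = 1/12 (i == j), 1/24 (i != j),
    # int phi_i * 1 = 1/6, int 1 * 1 = 1/2.
    M = np.array([[1/12., 1/24., 1/24., 1/6.],
                  [1/24., 1/12., 1/24., 1/6.],
                  [1/24., 1/24., 1/12., 1/6.],
                  [1/6.,  1/6.,  1/6.,  1/2.]])

    u = np.array([1.0, 1.0, 1.0, -1.0])  # Jed's u = ((1,1,1),(-1)), i.e. u(x,y) = 0

    print(np.linalg.matrix_rank(M))  # 3, not 4: the "basis" is linearly dependent
    print(M.dot(u))                  # zero vector: u lies in the null space of M
    print(u.dot(M).dot(u))           # 0.0: the "norm" vanishes for nonzero coefficients

The rank deficit is exactly the dependency phi1 + phi2 + phi3 - 1 = 0, so the mass matrix of the concatenated "basis" is indeed singular.)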
Lawrence -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVbd76AAoJECOc1kQ8PEYvaUUH/AoKpZCNISMHrpGWNbZ67aCd 7mBJyPk7z0S7BLDeSm8i1iF+LitnzaxFNZEaGmSDWjlr1Z9tG2vjRk1R0Aj2cte+ Pb1Ki8pwdU+N1bJgV5GCW+lyaXI3QQQs2K3vsUA530ohwpPUNRqHaW4BB+IIR0hs d/vzIPEn4K9tF5JpFGMKZhJ8zuayI0NdGQ6dVqRTCxGGhTCNuOKgB9xonecHUxFg ZosuoQAVOzgbIJunJutekydvrKKpSf0I6z6cTjp713lR5cVnrGyN6J5evI9Jt7x1 Sdtczorz+EcbQ5lycBMxEGmWKIHCOM5kBG+IxjN+pQg6R/+TDbsIQL9Lyj0BU6I= =irr/ -----END PGP SIGNATURE----- From tuan.vawr at gmail.com Tue Jun 2 12:17:18 2015 From: tuan.vawr at gmail.com (Tuan Nguyen) Date: Tue, 2 Jun 2015 13:17:18 -0400 Subject: [petsc-users] Double-colon target Message-ID: Dear all, I installed PETSC 3.1-p8 library. I am trying compile the fortran code link to Petsc but there is message saying about target file as follow: *?**?* *petsc-3.1-p8-openmpi/conf/rules:120: *** target file `clean' has both : and :: entries. Stop.* ?In my "makefile" I added double-colon to target clean but it will bring me to another error message: *?* */home/petsc-3.1-p8-openmpi/conf/rules:299: warning: overriding commands for target `.c.o'makefile:39: warning: ignoring old commands for target `.c.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: warning: overriding commands for target `.F.o'makefile:42: warning: ignoring old commands for target `.F.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: warning: overriding commands for target `.F90.o'makefile:47: warning: ignoring old commands for target `.F90.o'* /bin/rm -f *.o *.mod *.f90 ?Could any one help me? Thanks,? Tuan -------------- next part -------------- An HTML attachment was scrubbed... URL: From italo at tasso.com.br Tue Jun 2 12:26:23 2015 From: italo at tasso.com.br (Italo Tasso) Date: Tue, 2 Jun 2015 14:26:23 -0300 Subject: [petsc-users] KSP "randomly" not converging Message-ID: I made a code to solve the Navier-Stokes equations, incompressible, non-linear, all coupled, finite differences, staggered grid. I am running the code with: -ts_monitor -snes_monitor -ksp_monitor_true_residual -snes_converged_reason -ksp_converged_reason -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point It works very well most of the time. But in some cases, the solver halts for a long time then KSP does not converge. See output1.txt. It seems that the residual is already very small, close to machine zero, but KSP doesn't stop. So I added -ksp_atol 1e-10. See output2.txt. Now it fails on a different time step. I also tried -ksp_norm_type unpreconditioned. It works for this case (grid size), but fail for other cases. I also tried building the Jacobian and including null space. It fixes some cases but causes others that worked before to fail. Seems really random. It feels like this is related to the PC, because the code halts for a long time at the first KSP step, then diverges. Any suggestions? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- 0 TS dt 1 time 0 0 SNES Function norm 4.253022803593e+02 0 KSP preconditioned resid norm 1.598377068791e+01 true resid norm 4.253022803593e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.527395460671e-04 true resid norm 1.824978619341e-03 ||r(i)||/||b|| 4.291015363941e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 3.724738658496e+00 0 KSP preconditioned resid norm 4.541655062749e-01 true resid norm 3.724738658496e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.742053553611e-06 true resid norm 9.163350378091e-06 ||r(i)||/||b|| 2.460132432967e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 2 SNES Function norm 7.570359357297e-05 0 KSP preconditioned resid norm 9.031735278010e-06 true resid norm 7.570359357297e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.561905648780e-11 true resid norm 1.167305855774e-10 ||r(i)||/||b|| 1.541942463602e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 3 SNES Function norm 1.167090042251e-10 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 1 TS dt 1 time 1 0 SNES Function norm 2.653082379074e+00 0 KSP preconditioned resid norm 2.077610485197e-01 true resid norm 2.653082379074e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 9.769974416602e-07 true resid norm 2.473966590397e-05 ||r(i)||/||b|| 9.324876641260e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 7.247455375157e-04 0 KSP preconditioned resid norm 1.044465137677e-04 true resid norm 7.247455375157e-04 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.451030701351e-09 true resid norm 1.426395371133e-09 ||r(i)||/||b|| 1.968132671809e-06 2 KSP preconditioned resid norm 5.832060027950e-15 true resid norm 1.573719406040e-12 ||r(i)||/||b|| 2.171409583886e-09 Linear solve converged due to CONVERGED_RTOL iterations 2 2 SNES Function norm 7.209185280340e-12 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 2 TS dt 1 time 2 0 SNES Function norm 3.893566734673e-02 0 KSP preconditioned resid norm 2.075210645861e-03 true resid norm 3.893566734673e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.354604121708e-08 true resid norm 3.210120674764e-07 ||r(i)||/||b|| 8.244678705972e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 3.928262978140e-07 0 KSP preconditioned resid norm 4.012446092441e-08 true resid norm 3.928262978140e-07 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.291755030881e-13 true resid norm 1.142603847352e-12 ||r(i)||/||b|| 2.908674530474e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 2 SNES Function norm 1.140361438245e-12 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 3 TS dt 1 time 3 0 SNES Function norm 7.072003797845e-04 0 KSP preconditioned resid norm 2.985985826471e-05 true resid norm 7.072003797845e-04 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.360683404935e-09 true resid norm 7.766564413762e-09 ||r(i)||/||b|| 1.098212704033e-05 2 KSP preconditioned resid norm 1.355536488579e-14 true resid norm 2.741029939237e-13 ||r(i)||/||b|| 3.875888669732e-10 Linear solve converged due to CONVERGED_RTOL iterations 2 1 SNES Function norm 8.799393538189e-11 0 KSP preconditioned resid norm 1.268240860687e-11 true resid norm 8.799393538189e-11 ||r(i)||/||b|| 1.000000000000e+00 1 
KSP preconditioned resid norm 1.598556138248e-15 true resid norm 1.300296229532e-16 ||r(i)||/||b|| 1.477711189856e-06 2 KSP preconditioned resid norm 1.594636691828e-15 true resid norm 1.176359160863e-16 ||r(i)||/||b|| 1.336863905175e-06 3 KSP preconditioned resid norm 1.594636691828e-15 true resid norm 1.176359136362e-16 ||r(i)||/||b|| 1.336863877331e-06 4 KSP preconditioned resid norm 1.594636466157e-15 true resid norm 1.680057681033e-15 ||r(i)||/||b|| 1.909288036433e-05 5 KSP preconditioned resid norm 1.126933726205e-15 true resid norm 3.013339326200e-09 ||r(i)||/||b|| 3.424485236536e+01 6 KSP preconditioned resid norm 9.387864793887e-16 true resid norm 3.933424137599e-09 ||r(i)||/||b|| 4.470108218855e+01 7 KSP preconditioned resid norm 9.070661278230e-16 true resid norm 4.072033380018e-09 ||r(i)||/||b|| 4.627629577364e+01 8 KSP preconditioned resid norm 8.249402532426e-16 true resid norm 4.408768188541e-09 ||r(i)||/||b|| 5.010309141655e+01 9 KSP preconditioned resid norm 7.534530605607e-16 true resid norm 4.675885194343e-09 ||r(i)||/||b|| 5.313872114084e+01 10 KSP preconditioned resid norm 6.973682751383e-16 true resid norm 4.868512559313e-09 ||r(i)||/||b|| 5.532781933419e+01 11 KSP preconditioned resid norm 6.521713140954e-16 true resid norm 5.012908023992e-09 ||r(i)||/||b|| 5.696878997668e+01 12 KSP preconditioned resid norm 6.147571940872e-16 true resid norm 5.125121763781e-09 ||r(i)||/||b|| 5.824403399551e+01 13 KSP preconditioned resid norm 5.831220039769e-16 true resid norm 5.214832474798e-09 ||r(i)||/||b|| 5.926354415410e+01 14 KSP preconditioned resid norm 5.559164887865e-16 true resid norm 5.288191141747e-09 ||r(i)||/||b|| 6.009722282333e+01 15 KSP preconditioned resid norm 5.321948598603e-16 true resid norm 5.349296664528e-09 ||r(i)||/||b|| 6.079165162136e+01 16 KSP preconditioned resid norm 5.112718616972e-16 true resid norm 5.400980813168e-09 ||r(i)||/||b|| 6.137901197086e+01 17 KSP preconditioned resid norm 4.926372126708e-16 true resid norm 5.445267774664e-09 ||r(i)||/||b|| 6.188230758213e+01 18 KSP preconditioned resid norm 4.759020574565e-16 true resid norm 5.483639339047e-09 ||r(i)||/||b|| 6.231837813877e+01 19 KSP preconditioned resid norm 4.607641760886e-16 true resid norm 5.517205444830e-09 ||r(i)||/||b|| 6.269983744773e+01 20 KSP preconditioned resid norm 4.469846488451e-16 true resid norm 5.546816971573e-09 ||r(i)||/||b|| 6.303635526130e+01 21 KSP preconditioned resid norm 4.343717702574e-16 true resid norm 5.573133423670e-09 ||r(i)||/||b|| 6.333542646414e+01 22 KSP preconditioned resid norm 4.227696928215e-16 true resid norm 5.596675265998e-09 ||r(i)||/||b|| 6.360296583746e+01 23 KSP preconditioned resid norm 4.120502401467e-16 true resid norm 5.617859471536e-09 ||r(i)||/||b|| 6.384371203714e+01 24 KSP preconditioned resid norm 4.021068945392e-16 true resid norm 5.637024566096e-09 ||r(i)||/||b|| 6.406151221254e+01 25 KSP preconditioned resid norm 3.928503078688e-16 true resid norm 5.654444680967e-09 ||r(i)||/||b|| 6.425948170663e+01 26 KSP preconditioned resid norm 3.842048996651e-16 true resid norm 5.670348976522e-09 ||r(i)||/||b|| 6.444022479406e+01 27 KSP preconditioned resid norm 3.761062443545e-16 true resid norm 5.684925120671e-09 ||r(i)||/||b|| 6.460587421166e+01 28 KSP preconditioned resid norm 3.684990400366e-16 true resid norm 5.698334924251e-09 ||r(i)||/||b|| 6.475826884569e+01 29 KSP preconditioned resid norm 3.613355117717e-16 true resid norm 5.710711489252e-09 ||r(i)||/||b|| 6.489892132302e+01 30 KSP preconditioned resid norm 8.698819833915e-09 true 
resid norm 5.722170463550e-09 ||r(i)||/||b|| 6.502914591462e+01 31 KSP preconditioned resid norm 7.536793804038e-14 true resid norm 2.349427321624e-13 ||r(i)||/||b|| 2.669987779757e-03 32 KSP preconditioned resid norm 1.928503282462e-15 true resid norm 6.473127949349e-15 ||r(i)||/||b|| 7.356334185142e-05 33 KSP preconditioned resid norm 1.928503253731e-15 true resid norm 6.690222616855e-15 ||r(i)||/||b|| 7.603049673616e-05 34 KSP preconditioned resid norm 1.928503253731e-15 true resid norm 6.238864042479e-15 ||r(i)||/||b|| 7.090106852709e-05 35 KSP preconditioned resid norm 1.928498870918e-15 true resid norm 3.214140551449e-12 ||r(i)||/||b|| 3.652684173630e-02 36 KSP preconditioned resid norm 1.625907985123e-15 true resid norm 2.044786756291e-07 ||r(i)||/||b|| 2.323781459957e+03 37 KSP preconditioned resid norm 1.431982563341e-15 true resid norm 3.172178429613e-07 ||r(i)||/||b|| 3.604996657834e+03 38 KSP preconditioned resid norm 1.294174652820e-15 true resid norm 3.886416984150e-07 ||r(i)||/||b|| 4.416687317464e+03 39 KSP preconditioned resid norm 1.189776100504e-15 true resid norm 4.379426967629e-07 ||r(i)||/||b|| 4.976964547184e+03 40 KSP preconditioned resid norm 1.107154614931e-15 true resid norm 4.740220827425e-07 ||r(i)||/||b|| 5.386985826754e+03 41 KSP preconditioned resid norm 1.039654336201e-15 true resid norm 5.015718018458e-07 ||r(i)||/||b|| 5.700072393274e+03 42 KSP preconditioned resid norm 9.831596390000e-16 true resid norm 5.232978881303e-07 ||r(i)||/||b|| 5.946976753105e+03 43 KSP preconditioned resid norm 9.349720637672e-16 true resid norm 5.408703295482e-07 ||r(i)||/||b|| 6.146677349988e+03 44 KSP preconditioned resid norm 8.932379684960e-16 true resid norm 5.553759198673e-07 ||r(i)||/||b|| 6.311524964272e+03 45 KSP preconditioned resid norm 8.566353323118e-16 true resid norm 5.675528117436e-07 ||r(i)||/||b|| 6.449908272434e+03 46 KSP preconditioned resid norm 8.241923453887e-16 true resid norm 5.779200537651e-07 ||r(i)||/||b|| 6.567725960397e+03 47 KSP preconditioned resid norm 7.951765820935e-16 true resid norm 5.868530591039e-07 ||r(i)||/||b|| 6.669244380956e+03 48 KSP preconditioned resid norm 7.690242557867e-16 true resid norm 5.946302226359e-07 ||r(i)||/||b|| 6.757627330284e+03 49 KSP preconditioned resid norm 7.452934070804e-16 true resid norm 6.014622202040e-07 ||r(i)||/||b|| 6.835269017049e+03 50 KSP preconditioned resid norm 7.236320331378e-16 true resid norm 6.075115156835e-07 ||r(i)||/||b|| 6.904015748891e+03 51 KSP preconditioned resid norm 7.037558397152e-16 true resid norm 6.129052979292e-07 ||r(i)||/||b|| 6.965312953322e+03 52 KSP preconditioned resid norm 6.854323672734e-16 true resid norm 6.177446416151e-07 ||r(i)||/||b|| 7.020309285340e+03 53 KSP preconditioned resid norm 6.684694403275e-16 true resid norm 6.221108590337e-07 ||r(i)||/||b|| 7.069928811955e+03 54 KSP preconditioned resid norm 6.527066145219e-16 true resid norm 6.260701223797e-07 ||r(i)||/||b|| 7.114923541749e+03 55 KSP preconditioned resid norm 6.380087433170e-16 true resid norm 6.296767723820e-07 ||r(i)||/||b|| 7.155911025564e+03 56 KSP preconditioned resid norm 6.242610702369e-16 true resid norm 6.329758684080e-07 ||r(i)||/||b|| 7.193403336956e+03 57 KSP preconditioned resid norm 6.113654367785e-16 true resid norm 6.360052307587e-07 ||r(i)||/||b|| 7.227830281695e+03 58 KSP preconditioned resid norm 5.992373181288e-16 true resid norm 6.387965762121e-07 ||r(i)||/||b|| 7.259552302551e+03 59 KSP preconditioned resid norm 5.878034812657e-16 true resid norm 6.413769143451e-07 ||r(i)||/||b|| 
7.288876347689e+03 60 KSP preconditioned resid norm 1.312342799778e-07 true resid norm 6.437692835159e-07 ||r(i)||/||b|| 7.316064234678e+03 Linear solve did not converge due to DIVERGED_DTOL iterations 60 Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 [0]PETSC ERROR: ./planb on a arch-linux2-c-opt named localhost.localdomain by root Tue Jun 2 13:59:08 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hypre --with-debugging=0 COPTFLAGS="-Ofast -march=native -mtune=native" CXXOPTFLAGS="-Ofast -march=native -mtune=native" FOPTFLAGS="-Ofast -march=native -mtune=native" [0]PETSC ERROR: #1 TSStep() line 2638 in /root/petsc/petsc-3.5.4/src/ts/interface/ts.c [0]PETSC ERROR: #2 TSSolve() line 2748 in /root/petsc/petsc-3.5.4/src/ts/interface/ts.c DIVERGED_NONLINEAR_SOLVE at time 0 after 3 steps -------------- next part -------------- 0 TS dt 1 time 0 0 SNES Function norm 4.253022803593e+02 0 KSP preconditioned resid norm 1.598377068791e+01 true resid norm 4.253022803593e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.527395460671e-04 true resid norm 1.824978619341e-03 ||r(i)||/||b|| 4.291015363941e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 3.724738658496e+00 0 KSP preconditioned resid norm 4.541655062749e-01 true resid norm 3.724738658496e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.742053553611e-06 true resid norm 9.163350378091e-06 ||r(i)||/||b|| 2.460132432967e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 2 SNES Function norm 7.570359357297e-05 0 KSP preconditioned resid norm 9.031735278010e-06 true resid norm 7.570359357297e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.561905648780e-11 true resid norm 1.167305855774e-10 ||r(i)||/||b|| 1.541942463602e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 3 SNES Function norm 1.167090042251e-10 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 1 TS dt 1 time 1 0 SNES Function norm 2.653082379074e+00 0 KSP preconditioned resid norm 2.077610485197e-01 true resid norm 2.653082379074e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 9.769974416602e-07 true resid norm 2.473966590397e-05 ||r(i)||/||b|| 9.324876641260e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 7.247455375157e-04 0 KSP preconditioned resid norm 1.044465137677e-04 true resid norm 7.247455375157e-04 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.451030701351e-09 true resid norm 1.426395371133e-09 ||r(i)||/||b|| 1.968132671809e-06 2 KSP preconditioned resid norm 5.832060027950e-15 true resid norm 1.573719406040e-12 ||r(i)||/||b|| 2.171409583886e-09 Linear solve converged due to CONVERGED_ATOL iterations 2 2 SNES Function norm 7.209185280340e-12 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 2 TS dt 1 time 2 0 SNES Function norm 3.893566734673e-02 0 KSP preconditioned resid norm 2.075210645861e-03 true 
resid norm 3.893566734673e-02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.354604121708e-08 true resid norm 3.210120674764e-07 ||r(i)||/||b|| 8.244678705972e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 3.928262978140e-07 0 KSP preconditioned resid norm 4.012446092441e-08 true resid norm 3.928262978140e-07 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.291755030881e-13 true resid norm 1.142603847352e-12 ||r(i)||/||b|| 2.908674530474e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 2 SNES Function norm 1.140361438245e-12 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 3 TS dt 1 time 3 0 SNES Function norm 7.072003797845e-04 0 KSP preconditioned resid norm 2.985985826471e-05 true resid norm 7.072003797845e-04 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.360683404935e-09 true resid norm 7.766564413762e-09 ||r(i)||/||b|| 1.098212704033e-05 2 KSP preconditioned resid norm 1.355536488579e-14 true resid norm 2.741029939237e-13 ||r(i)||/||b|| 3.875888669732e-10 Linear solve converged due to CONVERGED_ATOL iterations 2 1 SNES Function norm 8.799393538189e-11 0 KSP preconditioned resid norm 1.268240860687e-11 true resid norm 8.799393538189e-11 ||r(i)||/||b|| 1.000000000000e+00 Linear solve converged due to CONVERGED_ATOL iterations 0 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 4 TS dt 1 time 4 0 SNES Function norm 1.344414362743e-05 0 KSP preconditioned resid norm 5.172248777434e-07 true resid norm 1.344414362743e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.222241652113e-11 true resid norm 1.334838274983e-10 ||r(i)||/||b|| 9.928771307229e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 1 SNES Function norm 1.334850288577e-10 0 KSP preconditioned resid norm 1.222161239175e-11 true resid norm 1.334850288577e-10 ||r(i)||/||b|| 1.000000000000e+00 Linear solve converged due to CONVERGED_ATOL iterations 0 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 5 TS dt 1 time 5 0 SNES Function norm 2.591466423628e-07 0 KSP preconditioned resid norm 9.637807739615e-09 true resid norm 2.591466423628e-07 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.678675619343e-13 true resid norm 2.331564850773e-12 ||r(i)||/||b|| 8.997086859835e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 1 SNES Function norm 2.332803353252e-12 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 6 TS dt 1 time 6 0 SNES Function norm 5.017542391741e-09 0 KSP preconditioned resid norm 1.844050048424e-10 true resid norm 5.017542391741e-09 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.090981020654e-15 true resid norm 4.433674429577e-14 ||r(i)||/||b|| 8.836346727981e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 1 SNES Function norm 8.680378695082e-14 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 7 TS dt 1 time 7 0 SNES Function norm 9.729564145362e-11 0 KSP preconditioned resid norm 1.044238592402e-03 true resid norm 9.729564145362e-11 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.044238592401e-03 true resid norm 6.820544672134e-10 ||r(i)||/||b|| 7.010123547399e+00 2 KSP preconditioned resid norm 1.044238592401e-03 true resid norm 8.445969119028e-10 ||r(i)||/||b|| 8.680727104362e+00 3 KSP preconditioned resid norm 1.044136977723e-03 true resid norm 2.449201165913e-03 ||r(i)||/||b|| 
2.517277371649e+07 4 KSP preconditioned resid norm 7.445935495674e-07 true resid norm 1.072817730961e+01 ||r(i)||/||b|| 1.102636988597e+11 5 KSP preconditioned resid norm 3.916227586096e-07 true resid norm 9.555493716569e+00 ||r(i)||/||b|| 9.821091236779e+10 6 KSP preconditioned resid norm 2.937255290875e-07 true resid norm 9.985006049138e+00 ||r(i)||/||b|| 1.026254198026e+11 7 KSP preconditioned resid norm 2.411571542429e-07 true resid norm 1.083641444070e+01 ||r(i)||/||b|| 1.113761549726e+11 8 KSP preconditioned resid norm 2.096009641527e-07 true resid norm 1.034314038751e+01 ||r(i)||/||b|| 1.063063075897e+11 9 KSP preconditioned resid norm 1.878687988652e-07 true resid norm 1.075084562626e+01 ||r(i)||/||b|| 1.104966827459e+11 10 KSP preconditioned resid norm 1.717428515416e-07 true resid norm 9.934773875293e+00 ||r(i)||/||b|| 1.021091358962e+11 11 KSP preconditioned resid norm 1.591641622471e-07 true resid norm 1.033195422785e+01 ||r(i)||/||b|| 1.061913367699e+11 12 KSP preconditioned resid norm 1.489978197813e-07 true resid norm 1.066054982748e+01 ||r(i)||/||b|| 1.095686267977e+11 13 KSP preconditioned resid norm 1.405598009774e-07 true resid norm 1.028836260604e+01 ||r(i)||/||b|| 1.057433041432e+11 14 KSP preconditioned resid norm 1.334100642297e-07 true resid norm 9.549639372745e+00 ||r(i)||/||b|| 9.815074169892e+10 15 KSP preconditioned resid norm 1.272509723678e-07 true resid norm 1.121963347709e+01 ||r(i)||/||b|| 1.153148620993e+11 16 KSP preconditioned resid norm 1.218730402854e-07 true resid norm 1.048201830088e+01 ||r(i)||/||b|| 1.077336882133e+11 17 KSP preconditioned resid norm 1.171239783614e-07 true resid norm 1.057886746243e+01 ||r(i)||/||b|| 1.087290993140e+11 18 KSP preconditioned resid norm 1.128900635955e-07 true resid norm 1.026457473633e+01 ||r(i)||/||b|| 1.054988135437e+11 19 KSP preconditioned resid norm 1.090844273663e-07 true resid norm 9.830369274163e+00 ||r(i)||/||b|| 1.010360703449e+11 20 KSP preconditioned resid norm 1.056394163151e-07 true resid norm 1.054966888344e+01 ||r(i)||/||b|| 1.084289977005e+11 21 KSP preconditioned resid norm 1.025014516390e-07 true resid norm 1.001811941256e+01 ||r(i)||/||b|| 1.029657573853e+11 22 KSP preconditioned resid norm 9.962747519408e-08 true resid norm 1.030677687005e+01 ||r(i)||/||b|| 1.059325650776e+11 23 KSP preconditioned resid norm 9.698243424036e-08 true resid norm 1.011509529250e+01 ||r(i)||/||b|| 1.039624708915e+11 24 KSP preconditioned resid norm 9.453746415448e-08 true resid norm 1.040323617238e+01 ||r(i)||/||b|| 1.069239692236e+11 25 KSP preconditioned resid norm 9.226855118760e-08 true resid norm 9.753982831337e+00 ||r(i)||/||b|| 1.002509740993e+11 26 KSP preconditioned resid norm 9.015553226141e-08 true resid norm 9.689553808953e+00 ||r(i)||/||b|| 9.958877565520e+10 27 KSP preconditioned resid norm 8.818133580182e-08 true resid norm 9.938190304321e+00 ||r(i)||/||b|| 1.021442497921e+11 28 KSP preconditioned resid norm 8.633139783847e-08 true resid norm 9.763032236643e+00 ||r(i)||/||b|| 1.003439834589e+11 29 KSP preconditioned resid norm 8.459320747362e-08 true resid norm 9.632547602791e+00 ||r(i)||/||b|| 9.900286856510e+10 30 KSP preconditioned resid norm 1.224166934780e+10 true resid norm 1.048973657614e+01 ||r(i)||/||b|| 1.078130162814e+11 Linear solve did not converge due to DIVERGED_DTOL iterations 30 Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: 
[0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 [0]PETSC ERROR: ./planb on a arch-linux2-c-opt named localhost.localdomain by root Tue Jun 2 14:14:06 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hypre --with-debugging=0 COPTFLAGS="-Ofast -march=native -mtune=native" CXXOPTFLAGS="-Ofast -march=native -mtune=native" FOPTFLAGS="-Ofast -march=native -mtune=native" [0]PETSC ERROR: #1 TSStep() line 2638 in /root/petsc/petsc-3.5.4/src/ts/interface/ts.c [0]PETSC ERROR: #2 TSSolve() line 2748 in /root/petsc/petsc-3.5.4/src/ts/interface/ts.c DIVERGED_NONLINEAR_SOLVE at time 0 after 7 steps From knepley at gmail.com Tue Jun 2 12:30:56 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 12:30:56 -0500 Subject: [petsc-users] Double-colon target In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 12:17 PM, Tuan Nguyen wrote: > Dear all, > > I installed PETSC 3.1-p8 library. I am trying compile the fortran code > link to Petsc but there is message saying about target file as follow: > > *?**?* > *petsc-3.1-p8-openmpi/conf/rules:120: *** target file `clean' has both : > and :: entries. Stop.* > > ?In my "makefile" I added double-colon to target clean but it will bring > me to another error message: > *?* > > > > > > */home/petsc-3.1-p8-openmpi/conf/rules:299: warning: overriding commands > for target `.c.o'makefile:39: warning: ignoring old commands for target > `.c.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: > warning: overriding commands for target `.F.o'makefile:42: warning: > ignoring old commands for target > `.F.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: > warning: overriding commands for target `.F90.o'makefile:47: warning: > ignoring old commands for target `.F90.o'* > /bin/rm -f *.o *.mod *.f90 > > ?Could any one help me? > In your makefile, you have rules for .c.o, etc. You either use the PETSc makefiles, which define these rules, or use your own rules, in which case just use the conf/variables file which defines our make vars. Matt > Thanks,? > > Tuan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 2 12:36:45 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 12:36:45 -0500 Subject: [petsc-users] KSP "randomly" not converging In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 12:26 PM, Italo Tasso wrote: > I made a code to solve the Navier-Stokes equations, incompressible, > non-linear, all coupled, finite differences, staggered grid. > > I am running the code with: > > -ts_monitor -snes_monitor -ksp_monitor_true_residual > -snes_converged_reason -ksp_converged_reason -pc_type fieldsplit > -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point > > It works very well most of the time. But in some cases, the solver halts > for a long time then KSP does not converge. > > See output1.txt. It seems that the residual is already very small, close > to machine zero, but KSP doesn't stop. > > So I added -ksp_atol 1e-10. 
See output2.txt. Now it fails on a different > time step. > > I also tried -ksp_norm_type unpreconditioned. It works for this case > (grid size), but fail for other cases. > > I also tried building the Jacobian and including null space. It fixes some > cases but causes others that worked before to fail. Seems really random. > > It feels like this is related to the PC, because the code halts for a long > time at the first KSP step, then diverges. > > Any suggestions? > Yes, this is related to your preconditioner. If you have a null space, you have to project it out. However, 0 KSP preconditioned resid norm 1.044238592402e-03 true resid norm 9.729564145362e-11 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.044238592401e-03 true resid norm 6.820544672134e-10 ||r(i)||/||b|| 7.010123547399e+00 2 KSP preconditioned resid norm 1.044238592401e-03 true resid norm 8.445969119028e-10 ||r(i)||/||b|| 8.680727104362e+00 this is something strange. Your preconditioner has changed the solution to your problem. It appears ILU (which I assume you are using. you should ways send -ksp_view) has broken down completely. It is unreliable in the extreme. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 2 12:40:14 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 2 Jun 2015 12:40:14 -0500 Subject: [petsc-users] KSP "randomly" not converging In-Reply-To: References: Message-ID: Run for one time-step with -ts_view so we can see exactly what solver options you are using. Then run with the additional options -ksp_pc_side right -ts_max_snes_failures -1 > On Jun 2, 2015, at 12:26 PM, Italo Tasso wrote: > > I made a code to solve the Navier-Stokes equations, incompressible, non-linear, all coupled, finite differences, staggered grid. > > I am running the code with: > > -ts_monitor -snes_monitor -ksp_monitor_true_residual -snes_converged_reason -ksp_converged_reason -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point > > It works very well most of the time. But in some cases, the solver halts for a long time then KSP does not converge. > > See output1.txt. It seems that the residual is already very small, close to machine zero, but KSP doesn't stop. > > So I added -ksp_atol 1e-10. See output2.txt. Now it fails on a different time step. > > I also tried -ksp_norm_type unpreconditioned. It works for this case (grid size), but fail for other cases. > > I also tried building the Jacobian and including null space. It fixes some cases but causes others that worked before to fail. Seems really random. > > It feels like this is related to the PC, because the code halts for a long time at the first KSP step, then diverges. > > Any suggestions? > From italo at tasso.com.br Tue Jun 2 12:51:04 2015 From: italo at tasso.com.br (Italo Tasso) Date: Tue, 2 Jun 2015 14:51:04 -0300 Subject: [petsc-users] KSP "randomly" not converging In-Reply-To: References: Message-ID: Here is ts_view. On Tue, Jun 2, 2015 at 2:40 PM, Barry Smith wrote: > > Run for one time-step with -ts_view so we can see exactly what solver > options you are using. 
> > Then run with the additional options -ksp_pc_side right > -ts_max_snes_failures -1 > > > > On Jun 2, 2015, at 12:26 PM, Italo Tasso wrote: > > > > I made a code to solve the Navier-Stokes equations, incompressible, > non-linear, all coupled, finite differences, staggered grid. > > > > I am running the code with: > > > > -ts_monitor -snes_monitor -ksp_monitor_true_residual > -snes_converged_reason -ksp_converged_reason -pc_type fieldsplit > -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point > > > > It works very well most of the time. But in some cases, the solver halts > for a long time then KSP does not converge. > > > > See output1.txt. It seems that the residual is already very small, close > to machine zero, but KSP doesn't stop. > > > > So I added -ksp_atol 1e-10. See output2.txt. Now it fails on a different > time step. > > > > I also tried -ksp_norm_type unpreconditioned. It works for this case > (grid size), but fail for other cases. > > > > I also tried building the Jacobian and including null space. It fixes > some cases but causes others that worked before to fail. Seems really > random. > > > > It feels like this is related to the PC, because the code halts for a > long time at the first KSP step, then diverges. > > > > Any suggestions? > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 0 TS dt 1 time 0 0 SNES Function norm 4.253022803593e+02 0 KSP preconditioned resid norm 1.598377068791e+01 true resid norm 4.253022803593e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.527395460671e-04 true resid norm 1.824978619341e-03 ||r(i)||/||b|| 4.291015363941e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 1 SNES Function norm 3.724738658496e+00 0 KSP preconditioned resid norm 4.541655062749e-01 true resid norm 3.724738658496e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.742053553611e-06 true resid norm 9.163350378091e-06 ||r(i)||/||b|| 2.460132432967e-06 Linear solve converged due to CONVERGED_RTOL iterations 1 2 SNES Function norm 7.570359357297e-05 0 KSP preconditioned resid norm 9.031735278010e-06 true resid norm 7.570359357297e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.561905648780e-11 true resid norm 1.167305855774e-10 ||r(i)||/||b|| 1.541942463602e-06 Linear solve converged due to CONVERGED_ATOL iterations 1 3 SNES Function norm 1.167090042251e-10 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3 1 TS dt 1 time 1 TS Object: 1 MPI processes type: beuler maximum steps=1000000000 maximum time=1 total number of nonlinear solver iterations=3 total number of nonlinear solve failures=0 total number of linear solver iterations=3 total number of rejected steps=0 SNES Object: 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=3 total number of function evaluations=4 SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, 
absolute=1e-10, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, blocksize = 3, factorization FULL Preconditioner for the Schur complement formed from S itself Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=263, cols=263 package used to perform factorization: petsc total: nonzeros=4387, allocated nonzeros=4387 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 121 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=263, cols=263 total: nonzeros=4387, allocated nonzeros=4387 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 121 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: schurcomplement rows=100, cols=100 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_1_) 1 MPI processes type: seqaij rows=100, cols=100 total: nonzeros=784, allocated nonzeros=784 total number of mallocs used during MatSetValues calls =0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=100, cols=263 total: nonzeros=1739, allocated nonzeros=1739 total number of mallocs used during MatSetValues calls =0 not using I-node routines KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: natural factor fill ratio given 1, needed 1 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=263, cols=263 package used to perform factorization: petsc total: nonzeros=4387, allocated nonzeros=4387 total number of mallocs used during MatSetValues calls =0 using I-node 
routines: found 121 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=263, cols=263 total: nonzeros=4387, allocated nonzeros=4387 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 121 nodes, limit used is 5 A01 Mat Object: 1 MPI processes type: seqaij rows=263, cols=100 total: nonzeros=1739, allocated nonzeros=1739 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 121 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=363, cols=363, bs=3 total: nonzeros=8649, allocated nonzeros=8649 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 121 nodes, limit used is 5 CONVERGED_TIME at time 1 after 1 steps From bsmith at mcs.anl.gov Tue Jun 2 13:14:55 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 2 Jun 2015 13:14:55 -0500 Subject: [petsc-users] KSP "randomly" not converging In-Reply-To: References: Message-ID: Ok, now use a real integrator with adaptive timestep selection. I suggest the additional options -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor but Emil and Jed will know much better than me. Barry > On Jun 2, 2015, at 12:51 PM, Italo Tasso wrote: > > From jed at jedbrown.org Tue Jun 2 13:19:38 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 20:19:38 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <556DDEFE.1030301@imperial.ac.uk> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> Message-ID: <871thuc6rp.fsf@jedbrown.org> Lawrence Mitchell writes: > Maybe Justin can chime in here, I don't know, I just happened to know > how the fenics implementation produces the "basis", so proffered that. Thanks, Lawrence. Unfortunately, my original questions remain unanswered and now I'm doubly curious why FEniCS appears not to fail due to the singular linear system. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From xzhao99 at gmail.com Tue Jun 2 14:05:52 2015 From: xzhao99 at gmail.com (Xujun Zhao) Date: Tue, 2 Jun 2015 14:05:52 -0500 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc Message-ID: Dear all, I need to evaluate the max and min eigenvalues of a matrix when I make the Chebyshev polynomial approximation. Are there efficient ways to do this? Thank you very much. Xujun -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jychang48 at gmail.com Tue Jun 2 14:21:46 2015 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 2 Jun 2015 14:21:46 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <871thuc6rp.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> Message-ID: I originally solve that example problem using LU. But when I solve this one: http://fenicsproject.org/documentation/dolfin/1.5.0/python/demo/documented/stokes-iterative/python/documentation.html By simply running their code as is for TH and adding the one like I mentioned for MTH, I get the following outputs when I pass in -ksp_monitor -ksp_view and -log_summary The latter obviously takes a greater amount of time and iterations to converge, and it was using the solver and precondition options that was originally designed for P2/P1. I haven't experimented around with this fully yet. Thanks, Justin On Tue, Jun 2, 2015 at 1:19 PM, Jed Brown wrote: > Lawrence Mitchell writes: > > Maybe Justin can chime in here, I don't know, I just happened to know > > how the fenics implementation produces the "basis", so proffered that. > > Thanks, Lawrence. Unfortunately, my original questions remain > unanswered and now I'm doubly curious why FEniCS appears not to fail due > to the singular linear system. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output_TH.py Type: text/x-python Size: 11482 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output_MTH.py Type: text/x-python Size: 21480 bytes Desc: not available URL: From tuan.vawr at gmail.com Tue Jun 2 14:38:36 2015 From: tuan.vawr at gmail.com (Tuan Nguyen) Date: Tue, 2 Jun 2015 15:38:36 -0400 Subject: [petsc-users] Double-colon target In-Reply-To: References: Message-ID: Thanks Matt! It works now but there is issue of memory occur: PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[3]PETSCERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors [3]PETSC ERROR: likely location of problem given in stack below [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [3]PETSC ERROR: INSTEAD the line number of the start of the function [3]PETSC ERROR: is given. [3]PETSC ERROR: --------------------- Error Message ------------------------------------ [3]PETSC ERROR: Signal received! ?I am planning to find where in the code the error occur by using valgrind. Could you please let me know how to do this? Here is the command I use to run the code: /opt/openmpi/mpirun -np 10 ./fvom --casename=lake>info.dat ? ?Thanks,? On Tue, Jun 2, 2015 at 1:30 PM, Matthew Knepley wrote: > On Tue, Jun 2, 2015 at 12:17 PM, Tuan Nguyen wrote: > >> Dear all, >> >> I installed PETSC 3.1-p8 library. 
I am trying compile the fortran code >> link to Petsc but there is message saying about target file as follow: >> >> *?**?* >> *petsc-3.1-p8-openmpi/conf/rules:120: *** target file `clean' has both : >> and :: entries. Stop.* >> >> ?In my "makefile" I added double-colon to target clean but it will bring >> me to another error message: >> *?* >> >> >> >> >> >> */home/petsc-3.1-p8-openmpi/conf/rules:299: warning: overriding commands >> for target `.c.o'makefile:39: warning: ignoring old commands for target >> `.c.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: >> warning: overriding commands for target `.F.o'makefile:42: warning: >> ignoring old commands for target >> `.F.o'/home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: >> warning: overriding commands for target `.F90.o'makefile:47: warning: >> ignoring old commands for target `.F90.o'* >> /bin/rm -f *.o *.mod *.f90 >> >> ?Could any one help me? >> > > In your makefile, you have rules for .c.o, etc. You either use the PETSc > makefiles, which define these rules, or > use your own rules, in which case just use the conf/variables file which > defines our make vars. > > Matt > > >> Thanks,? >> >> Tuan >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- Tuan Nguyen Michigan State University East Lansing, MI. 48823 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 2 14:43:02 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 2 Jun 2015 14:43:02 -0500 Subject: [petsc-users] Double-colon target In-Reply-To: References: Message-ID: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > On Jun 2, 2015, at 2:38 PM, Tuan Nguyen wrote: > > Thanks Matt! > > It works now but there is issue of memory occur: > > PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[3]PETSCERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: --------------------- Error Message ------------------------------------ > [3]PETSC ERROR: Signal received! > > ?I am planning to find where in the code the error occur by using valgrind. Could you please let me know how to do this? > > Here is the command I use to run the code: > > /opt/openmpi/mpirun -np 10 ./fvom --casename=lake>info.dat > ? > ?Thanks,? > > > > > On Tue, Jun 2, 2015 at 1:30 PM, Matthew Knepley wrote: > On Tue, Jun 2, 2015 at 12:17 PM, Tuan Nguyen wrote: > Dear all, > > I installed PETSC 3.1-p8 library. I am trying compile the fortran code link to Petsc but there is message saying about target file as follow: > > ??petsc-3.1-p8-openmpi/conf/rules:120: *** target file `clean' has both : and :: entries. Stop. > > ?In my "makefile" I added double-colon to target clean but it will bring me to another error message: > ? 
> /home/petsc-3.1-p8-openmpi/conf/rules:299: warning: overriding commands for target `.c.o' > makefile:39: warning: ignoring old commands for target `.c.o' > /home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: warning: overriding commands for target `.F.o' > makefile:42: warning: ignoring old commands for target `.F.o' > /home/petsc-3.1-p8-openmpi/linux-gnu-c-debug/conf/petscrules:31: warning: overriding commands for target `.F90.o' > makefile:47: warning: ignoring old commands for target `.F90.o' > /bin/rm -f *.o *.mod *.f90 > > ?Could any one help me? > > In your makefile, you have rules for .c.o, etc. You either use the PETSc makefiles, which define these rules, or > use your own rules, in which case just use the conf/variables file which defines our make vars. > > Matt > > Thanks,? > > Tuan > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > -- > > Tuan Nguyen > Michigan State University > East Lansing, MI. 48823 > From jed at jedbrown.org Tue Jun 2 15:00:11 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 22:00:11 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> Message-ID: <87y4k1c244.fsf@jedbrown.org> Justin Chang writes: > I originally solve that example problem using LU. I'd like to learn why LU didn't notice that the system is singular. (The checks are not reliable, but this case should be pretty obviously bad.) > But when I solve this one: > > http://fenicsproject.org/documentation/dolfin/1.5.0/python/demo/documented/stokes-iterative/python/documentation.html > > By simply running their code as is for TH and adding the one like I > mentioned for MTH, I get the following outputs when I pass in -ksp_monitor > -ksp_view and -log_summary Thanks, Justin. Could you please run with -ksp_type gmres -ksp_gmres_restart 1000 -ksp_monitor_true_residual -ksp_monitor_singular_value Also, can you run a tiny problem size (like less than 1000 dofs) with -pc_type svd -pc_svd_monitor Thanks. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jed at jedbrown.org Tue Jun 2 15:26:12 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 22:26:12 +0200 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: References: Message-ID: <87vbf5c0wr.fsf@jedbrown.org> Xujun Zhao writes: > I need to evaluate the max and min eigenvalues of a matrix You need the min and max eigenvalues of the preconditioner operator. But note that Chebyshev is not usually used as a stand-alone solver, but rather as a smoother for multigrid or occasionally as a polynomial preconditioner to control storage requirements or orthogonalization issues. One exception is if your problem is fairly well-conditioned with an evenly spaced spectrum. 
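If you do want explicit numbers from SLEPc, a minimal slepc4py sketch would look roughly like the following (illustrative only; A stands for your already assembled symmetric PETSc Mat, and the loose tolerance reflects that Chebyshev only needs estimates):

    import sys
    import slepc4py
    slepc4py.init(sys.argv)
    from slepc4py import SLEPc

    # A is assumed to be an assembled, symmetric PETSc Mat.
    eps = SLEPc.EPS().create()
    eps.setOperators(A)
    eps.setProblemType(SLEPc.EPS.ProblemType.HEP)  # Hermitian eigenproblem
    eps.setTolerances(tol=1e-3, max_it=200)        # rough estimates are enough

    # Largest eigenvalue: usually cheap.
    eps.setWhichEigenpairs(SLEPc.EPS.Which.LARGEST_REAL)
    eps.solve()
    lmax = eps.getEigenvalue(0).real if eps.getConverged() > 0 else None

    # Smallest eigenvalue (for SPD, the one nearest zero): in practice this
    # needs shift-and-invert, i.e. solving systems with A.
    eps.setWhichEigenpairs(SLEPc.EPS.Which.TARGET_MAGNITUDE)
    eps.setTarget(0.0)
    eps.getST().setType(SLEPc.ST.Type.SINVERT)
    eps.solve()
    lmin = eps.getEigenvalue(0).real if eps.getConverged() > 0 else None
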
In general, estimating the largest eigenvalue is relatively inexpensive, but computing the smallest to even moderate accuracy is as expensive as solving the linear system. https://scicomp.stackexchange.com/questions/34/how-can-i-estimate-the-condition-number-of-a-large-sparse-matrix-using-petsc > when I make the Chebyshev polynomial approximation. Are there > efficient ways to do this? Thank you very much. > > Xujun -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hsahasra at purdue.edu Tue Jun 2 15:55:06 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Tue, 2 Jun 2015 16:55:06 -0400 Subject: [petsc-users] Automatic Differentiation Message-ID: Hi, Does PETSc have automatic differentiation capability? I'm solving an elliptic non-linear equation in which calculating the correct Jacobian is extremely demanding. The system doesn't converge with an approximate Jacobian. Is there any SNES solver I should try which doesn't use a Jacobian? Thanks, Harshad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 2 15:59:23 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 15:59:23 -0500 Subject: [petsc-users] Automatic Differentiation In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 3:55 PM, Harshad Sahasrabudhe wrote: > Hi, > > Does PETSc have automatic differentiation capability? I'm solving an > elliptic non-linear equation in which calculating the correct Jacobian is > extremely demanding. The system doesn't converge with an approximate > Jacobian. Is there any SNES solver I should try which doesn't use a > Jacobian? > Do you mean it does not converge with the default FD Jacobian? Matt > Thanks, > Harshad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Tue Jun 2 16:00:41 2015 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 2 Jun 2015 16:00:41 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87y4k1c244.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> <87y4k1c244.fsf@jedbrown.org> Message-ID: MTH did not converge with the default -ksp_rtol 1e-6 so I had to raise the tolerance to 1e-5 in order to get a solution. Attached are the outputs for TH and MTH Last one with the svd did not work with the way the AMG PC was hard-coded into FEniCS. 
From jychang48 at gmail.com Tue Jun 2 16:00:41 2015
From: jychang48 at gmail.com (Justin Chang)
Date: Tue, 2 Jun 2015 16:00:41 -0500
Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation
In-Reply-To: <87y4k1c244.fsf@jedbrown.org>
References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org>
	<87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org>
	<556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org>
	<556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org>
	<556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org>
	<87y4k1c244.fsf@jedbrown.org>
Message-ID: 

MTH did not converge with the default -ksp_rtol 1e-6, so I had to loosen the
tolerance to 1e-5 in order to get a solution. Attached are the outputs for TH
and MTH.

The last run, with -pc_type svd, did not work because of the way the AMG PC
was hard-coded into FEniCS.

Here's the list of preconditioners my installation of FEniCS supports:

Preconditioner    |  Description
--------------------------------------------------------------
default           |  default preconditioner
ilu               |  Incomplete LU factorization
icc               |  Incomplete Cholesky factorization
sor               |  Successive over-relaxation
petsc_amg         |  PETSc algebraic multigrid
jacobi            |  Jacobi iteration
bjacobi           |  Block Jacobi iteration
additive_schwarz  |  Additive Schwarz
amg               |  Algebraic multigrid
hypre_amg         |  Hypre algebraic multigrid (BoomerAMG)
hypre_euclid      |  Hypre parallel incomplete LU factorization
hypre_parasails   |  Hypre parallel sparse approximate inverse
none              |  No preconditioner

I would need to change a few things here and there if we really insist on
seeing whether svd works.

On Tue, Jun 2, 2015 at 3:00 PM, Jed Brown wrote:

> Justin Chang writes:
>
> > I originally solved that example problem using LU.
>
> I'd like to learn why LU didn't notice that the system is singular.
> (The checks are not reliable, but this case should be pretty obviously
> bad.)
>
> > But when I solve this one:
> >
> > http://fenicsproject.org/documentation/dolfin/1.5.0/python/demo/documented/stokes-iterative/python/documentation.html
> >
> > By simply running their code as is for TH and adding the one line I
> > mentioned for MTH, I get the following outputs when I pass in -ksp_monitor
> > -ksp_view and -log_summary
>
> Thanks, Justin.  Could you please run with
>
>   -ksp_type gmres -ksp_gmres_restart 1000 -ksp_monitor_true_residual
>   -ksp_monitor_singular_value
>
> Also, can you run a tiny problem size (like less than 1000 dofs) with
>
>   -pc_type svd -pc_svd_monitor
>
> Thanks.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
Passing options to PETSc: -ksp_view -log_summary -ksp_type gmres -ksp_gmres_restart 1000 -ksp_monitor_true_residual -ksp_monitor_singular_value -ksp_rtol 1e-5
Solving linear system of size 112724 x 112724 (PETSc Krylov solver).
0 KSP preconditioned resid norm 5.017503570069e+02 true resid norm 1.438923047556e+02 ||r(i)||/||b|| 1.000000000000e+00 0 KSP Residual norm 5.017503570069e+02 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 1 KSP preconditioned resid norm 7.591227630371e+01 true resid norm 7.427193761206e+00 ||r(i)||/||b|| 5.161633746726e-02 1 KSP Residual norm 7.591227630371e+01 % max 9.403189726191e-01 min 9.403189726191e-01 max/min 1.000000000000e+00 2 KSP preconditioned resid norm 6.427205593966e+01 true resid norm 7.023263320949e+01 ||r(i)||/||b|| 4.880916552750e-01 2 KSP Residual norm 6.427205593966e+01 % max 6.747758529473e+00 min 1.359589903873e-01 max/min 4.963083728594e+01 3 KSP preconditioned resid norm 3.241898279412e+01 true resid norm 1.878401197162e+01 ||r(i)||/||b|| 1.305421579251e-01 3 KSP Residual norm 3.241898279412e+01 % max 7.471631147563e+00 min 1.038325889219e-01 max/min 7.195844026564e+01 4 KSP preconditioned resid norm 3.139681416085e+01 true resid norm 4.074206515646e+01 ||r(i)||/||b|| 2.831427658739e-01 4 KSP Residual norm 3.139681416085e+01 % max 8.974811447045e+00 min 8.653195741514e-02 max/min 1.037167274974e+02 5 KSP preconditioned resid norm 1.739306265625e+01 true resid norm 4.884953541019e+00 ||r(i)||/||b|| 3.394867814034e-02 5 KSP Residual norm 1.739306265625e+01 % max 1.071697436698e+01 min 8.634525870439e-02 max/min 1.241176936381e+02 6 KSP preconditioned resid norm 1.707694901980e+01 true resid norm 1.424666892138e+01 ||r(i)||/||b|| 9.900924824005e-02 6 KSP Residual norm 1.707694901980e+01 % max 1.074994667555e+01 min 8.525014244853e-02 max/min 1.260988705332e+02 7 KSP preconditioned resid norm 7.950399757905e+00 true resid norm 3.517784684167e+00 ||r(i)||/||b|| 2.444734407544e-02 7 KSP Residual norm 7.950399757905e+00 % max 1.095728016052e+01 min 8.483422448658e-02 max/min 1.291610812362e+02 8 KSP preconditioned resid norm 7.874381372884e+00 true resid norm 1.171852447583e+00 ||r(i)||/||b|| 8.143954950011e-03 8 KSP Residual norm 7.874381372884e+00 % max 1.121106359583e+01 min 8.208782597679e-02 max/min 1.365740103654e+02 9 KSP preconditioned resid norm 4.107489268098e+00 true resid norm 3.264930884633e+00 ||r(i)||/||b|| 2.269010069843e-02 9 KSP Residual norm 4.107489268098e+00 % max 1.152921320407e+01 min 8.053265011085e-02 max/min 1.431619745309e+02 10 KSP preconditioned resid norm 4.061177925849e+00 true resid norm 2.616167211079e+00 ||r(i)||/||b|| 1.818142544539e-02 10 KSP Residual norm 4.061177925849e+00 % max 1.167998175586e+01 min 7.384111196038e-02 max/min 1.581772192452e+02 11 KSP preconditioned resid norm 2.523921262752e+00 true resid norm 2.570520377043e+00 ||r(i)||/||b|| 1.786419629187e-02 11 KSP Residual norm 2.523921262752e+00 % max 1.275510655235e+01 min 7.372487579383e-02 max/min 1.730095359946e+02 12 KSP preconditioned resid norm 2.522786385151e+00 true resid norm 2.561222438539e+00 ||r(i)||/||b|| 1.779957894822e-02 12 KSP Residual norm 2.522786385151e+00 % max 1.276571885197e+01 min 7.281793271540e-02 max/min 1.753100970589e+02 13 KSP preconditioned resid norm 1.761799021615e+00 true resid norm 2.602521600466e+00 ||r(i)||/||b|| 1.808659333719e-02 13 KSP Residual norm 1.761799021615e+00 % max 1.284861800915e+01 min 7.248322221961e-02 max/min 1.772633392348e+02 14 KSP preconditioned resid norm 1.691971503345e+00 true resid norm 1.732652664732e+00 ||r(i)||/||b|| 1.204131567477e-02 14 KSP Residual norm 1.691971503345e+00 % max 1.295322782113e+01 min 7.196406381698e-02 max/min 1.799957803116e+02 15 KSP preconditioned resid norm 
1.301081818132e+00 true resid norm 1.876510383618e+00 ||r(i)||/||b|| 1.304107531535e-02 15 KSP Residual norm 1.301081818132e+00 % max 1.312790775668e+01 min 7.169232568829e-02 max/min 1.831145472077e+02 16 KSP preconditioned resid norm 1.197984621371e+00 true resid norm 9.074366379619e-01 ||r(i)||/||b|| 6.306359742470e-03 16 KSP Residual norm 1.197984621371e+00 % max 1.313382580953e+01 min 7.149010577640e-02 max/min 1.837152941220e+02 17 KSP preconditioned resid norm 8.964510840301e-01 true resid norm 6.964575951497e-01 ||r(i)||/||b|| 4.840130932177e-03 17 KSP Residual norm 8.964510840301e-01 % max 1.315654379064e+01 min 7.090550743665e-02 max/min 1.855503791775e+02 18 KSP preconditioned resid norm 8.225518302734e-01 true resid norm 3.243310925156e-01 ||r(i)||/||b|| 2.253984972070e-03 18 KSP Residual norm 8.225518302734e-01 % max 1.325103083529e+01 min 7.042590884757e-02 max/min 1.881556241464e+02 19 KSP preconditioned resid norm 5.362354290050e-01 true resid norm 5.047117409956e-02 ||r(i)||/||b|| 3.507565896959e-04 19 KSP Residual norm 5.362354290050e-01 % max 1.339812068819e+01 min 6.994224201038e-02 max/min 1.915597828020e+02 20 KSP preconditioned resid norm 4.879788824637e-01 true resid norm 1.337642351024e-02 ||r(i)||/||b|| 9.296135420832e-05 20 KSP Residual norm 4.879788824637e-01 % max 1.344376429628e+01 min 6.993970445428e-02 max/min 1.922193466669e+02 21 KSP preconditioned resid norm 3.382919502049e-01 true resid norm 8.955332313210e-02 ||r(i)||/||b|| 6.223635328117e-04 21 KSP Residual norm 3.382919502049e-01 % max 1.344410382581e+01 min 6.993725959792e-02 max/min 1.922309210155e+02 22 KSP preconditioned resid norm 2.932082142605e-01 true resid norm 4.457206495651e-02 ||r(i)||/||b|| 3.097598932216e-04 22 KSP Residual norm 2.932082142605e-01 % max 1.344547709826e+01 min 6.963674731515e-02 max/min 1.930801999900e+02 23 KSP preconditioned resid norm 2.354873961507e-01 true resid norm 1.009021654589e-01 ||r(i)||/||b|| 7.012339237340e-04 23 KSP Residual norm 2.354873961507e-01 % max 1.350110123202e+01 min 6.780948821130e-02 max/min 1.991034232547e+02 24 KSP preconditioned resid norm 1.578982598937e-01 true resid norm 2.149204669477e-02 ||r(i)||/||b|| 1.493620296879e-04 24 KSP Residual norm 1.578982598937e-01 % max 1.393796405377e+01 min 6.778909020282e-02 max/min 2.056077757065e+02 25 KSP preconditioned resid norm 1.439132384300e-01 true resid norm 4.564585559954e-02 ||r(i)||/||b|| 3.172223537393e-04 25 KSP Residual norm 1.439132384300e-01 % max 1.401534831073e+01 min 6.775955702329e-02 max/min 2.068394323462e+02 26 KSP preconditioned resid norm 9.784723458020e-02 true resid norm 7.158308842766e-03 ||r(i)||/||b|| 4.974768355351e-05 26 KSP Residual norm 9.784723458020e-02 % max 1.402292817398e+01 min 6.759901445918e-02 max/min 2.074427901970e+02 27 KSP preconditioned resid norm 8.727790174243e-02 true resid norm 1.195804045346e-02 ||r(i)||/||b|| 8.310409979028e-05 27 KSP Residual norm 8.727790174243e-02 % max 1.405135637013e+01 min 6.725460215889e-02 max/min 2.089278044784e+02 28 KSP preconditioned resid norm 5.309479090041e-02 true resid norm 3.287140314870e-03 ||r(i)||/||b|| 2.284444828688e-05 28 KSP Residual norm 5.309479090041e-02 % max 1.405158291679e+01 min 6.517623965333e-02 max/min 2.155936425841e+02 29 KSP preconditioned resid norm 5.014747837635e-02 true resid norm 5.834891583834e-03 ||r(i)||/||b|| 4.055040742968e-05 29 KSP Residual norm 5.014747837635e-02 % max 1.431486239570e+01 min 6.440934819935e-02 max/min 2.222482107938e+02 30 KSP preconditioned resid norm 
2.999073294099e-02 true resid norm 7.278697198972e-03 ||r(i)||/||b|| 5.058433952626e-05 30 KSP Residual norm 2.999073294099e-02 % max 1.436079512227e+01 min 5.986686536951e-02 max/min 2.398788550834e+02 31 KSP preconditioned resid norm 2.993620377984e-02 true resid norm 7.860741293369e-03 ||r(i)||/||b|| 5.462933759190e-05 31 KSP Residual norm 2.993620377984e-02 % max 1.446116755600e+01 min 5.975423408035e-02 max/min 2.420107592134e+02 32 KSP preconditioned resid norm 1.691446958699e-02 true resid norm 4.474713026899e-03 ||r(i)||/||b|| 3.109765344644e-05 32 KSP Residual norm 1.691446958699e-02 % max 1.446150692933e+01 min 5.505096962773e-02 max/min 2.626930465916e+02 33 KSP preconditioned resid norm 1.566058026851e-02 true resid norm 2.493664854637e-03 ||r(i)||/||b|| 1.733007792788e-05 33 KSP Residual norm 1.566058026851e-02 % max 1.449299837284e+01 min 5.420191030017e-02 max/min 2.673890697317e+02 34 KSP preconditioned resid norm 8.422901071325e-03 true resid norm 8.641226071459e-04 ||r(i)||/||b|| 6.005342736109e-06 34 KSP Residual norm 8.422901071325e-03 % max 1.464522771660e+01 min 5.178442848214e-02 max/min 2.828114192986e+02 35 KSP preconditioned resid norm 7.681102835129e-03 true resid norm 3.694560993766e-04 ||r(i)||/||b|| 2.567587613557e-06 35 KSP Residual norm 7.681102835129e-03 % max 1.464708984491e+01 min 4.568418646613e-02 max/min 3.206161908075e+02 36 KSP preconditioned resid norm 4.912467893099e-03 true resid norm 1.012672552156e-04 ||r(i)||/||b|| 7.037711668290e-07 36 KSP Residual norm 4.912467893099e-03 % max 1.467836674228e+01 min 4.478303063548e-02 max/min 3.277662662394e+02 KSP Object: 1 MPI processes type: gmres GMRES: restart=1000, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-15, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix followed by preconditioner matrix: Matrix Object: 1 MPI processes type: seqaij rows=112724, cols=112724 total: nonzeros=10553536, allocated nonzeros=10553536 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 77769 nodes, limit used is 5 Matrix Object: 1 MPI processes type: seqaij rows=112724, cols=112724 total: nonzeros=10553536, allocated nonzeros=10553536 total number of mallocs used during MatSetValues calls =0 using I-node 
routines: found 77769 nodes, limit used is 5 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Unknown Name on a linux-gnu-c-opt named pacotaco-xps with 1 processor, by justin Tue Jun 2 15:38:36 2015 Using Petsc Release Version 3.4.2, Jul, 02, 2013 Max Max/Min Avg Total Time (sec): 1.793e+01 1.00000 1.793e+01 Objects: 2.700e+02 1.00000 2.700e+02 Flops: 2.025e+09 1.00000 2.025e+09 2.025e+09 Flops/sec: 1.129e+08 1.00000 1.129e+08 1.129e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 2.810e+02 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.7930e+01 100.0% 2.0247e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.800e+02 99.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %f - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Viewer 1 0 0 0 Index Set 6 6 4584 0 IS L to G Mapping 10 10 3162152 0 Vector 246 246 112196064 0 Vector Scatter 3 3 1932 0 Matrix 2 2 256897368 0 Preconditioner 1 1 1072 0 Krylov Solver 1 1 24237768 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 #PETSc Option Table entries: -ksp_gmres_restart 1000 -ksp_monitor_singular_value -ksp_monitor_true_residual -ksp_rtol 1e-5 -ksp_type gmres -ksp_view -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure run at: Tue Dec 17 23:10:14 2013 Configure options: --with-shared-libraries --with-debugging=0 --useThreads 0 --with-clanguage=C++ --with-c-support --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-include=/usr/include --with-blacs-lib="[/usr/lib/libblacsCinit-openmpi.so,/usr/lib/libblacs-openmpi.so]" --with-scalapack=1 --with-scalapack-include=/usr/include --with-scalapack-lib=/usr/lib/libscalapack-openmpi.so --with-mumps=1 --with-mumps-include=/usr/include --with-mumps-lib="[/usr/lib/libdmumps.so,/usr/lib/libzmumps.so,/usr/lib/libsmumps.so,/usr/lib/libcmumps.so,/usr/lib/libmumps_common.so,/usr/lib/libpord.so]" --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" --with-cholmod=1 --with-cholmod-include=/usr/include/suitesparse --with-cholmod-lib=/usr/lib/libcholmod.so --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 --with-hypre-dir=/usr --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="[/usr/lib/libptesmumps.so,/usr/lib/libptscotch.so,/usr/lib/libptscotcherr.so]" --with-fftw=1 --with-fftw-include=/usr/include --with-fftw-lib="[/usr/lib/x86_64-linux-gnu/libfftw3.so,/usr/lib/x86_64-linux-gnu/libfftw3_mpi.so]" --CXX_LINKER_FLAGS=-Wl,--no-as-needed ----------------------------------------- Libraries compiled on Tue Dec 17 23:10:14 2013 on lamiak Machine characteristics: Linux-3.2.0-37-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /build/buildd/petsc-3.4.2.dfsg1 Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/include -I/build/buildd/petsc-3.4.2.dfsg1/include -I/build/buildd/petsc-3.4.2.dfsg1/include -I/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/include -I/usr/include -I/usr/include/suitesparse -I/usr/include/scotch -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi ----------------------------------------- Using C linker: mpicxx Using Fortran linker: mpif90 Using libraries: -L/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/lib -L/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/lib -lpetsc -L/usr/lib -ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord -lscalapack-openmpi -lHYPRE_utilities -lHYPRE_struct_mv -lHYPRE_struct_ls 
-lHYPRE_sstruct_mv -lHYPRE_sstruct_ls -lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lcholmod -lumfpack -lamd -llapack -lblas -lX11 -lpthread -lptesmumps -lptscotch -lptscotcherr -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_mpi -lm -L/usr/lib/openmpi/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl ----------------------------------------- -------------- next part -------------- Passing options to PETSc: -ksp_view -log_summary -ksp_type gmres -ksp_gmres_restart 1000 -ksp_monitor_true_residual -ksp_monitor_singular_value -ksp_rtol 1e-5 Solving linear system of size 137300 x 137300 (PETSc Krylov solver). 0 KSP preconditioned resid norm 6.984814267994e+02 true resid norm 1.438923066390e+02 ||r(i)||/||b|| 1.000000000000e+00 0 KSP Residual norm 6.984814267994e+02 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 1 KSP preconditioned resid norm 1.163986004310e+02 true resid norm 8.933451693693e+00 ||r(i)||/||b|| 6.208428999687e-02 1 KSP Residual norm 1.163986004310e+02 % max 9.287729945853e-01 min 9.287729945853e-01 max/min 1.000000000000e+00 2 KSP preconditioned resid norm 1.059479342938e+02 true resid norm 6.038790946940e+01 ||r(i)||/||b|| 4.196743445145e-01 2 KSP Residual norm 1.059479342938e+02 % max 6.126631016906e+00 min 1.285422639890e-01 max/min 4.766238610384e+01 3 KSP preconditioned resid norm 5.223783429483e+01 true resid norm 9.994585037421e-01 ||r(i)||/||b|| 6.945878671955e-03 3 KSP Residual norm 5.223783429483e+01 % max 7.465419988555e+00 min 1.087399792034e-01 max/min 6.865386625276e+01 4 KSP preconditioned resid norm 5.026240307604e+01 true resid norm 3.301854430057e+01 ||r(i)||/||b|| 2.294670581896e-01 4 KSP Residual norm 5.026240307604e+01 % max 8.165875239877e+00 min 9.898171436843e-02 max/min 8.249882609107e+01 5 KSP preconditioned resid norm 2.624220793040e+01 true resid norm 9.305239379087e+00 ||r(i)||/||b|| 6.466808126465e-02 5 KSP Residual norm 2.624220793040e+01 % max 9.487455998759e+00 min 9.830746058473e-02 max/min 9.650799585634e+01 6 KSP preconditioned resid norm 2.579673821273e+01 true resid norm 4.130468370235e+00 ||r(i)||/||b|| 2.870527595751e-02 6 KSP Residual norm 2.579673821273e+01 % max 9.498549062834e+00 min 9.824745073782e-02 max/min 9.667985267304e+01 7 KSP preconditioned resid norm 1.229586352739e+01 true resid norm 1.167819462052e+01 ||r(i)||/||b|| 8.115927038283e-02 7 KSP Residual norm 1.229586352739e+01 % max 9.515516850351e+00 min 9.797858247120e-02 max/min 9.711833556224e+01 8 KSP preconditioned resid norm 1.167672289446e+01 true resid norm 2.549025174662e+00 ||r(i)||/||b|| 1.771481209942e-02 8 KSP Residual norm 1.167672289446e+01 % max 9.692597770176e+00 min 9.281651306170e-02 max/min 1.044275145710e+02 9 KSP preconditioned resid norm 6.302455269425e+00 true resid norm 5.262700136815e+00 ||r(i)||/||b|| 3.657388125704e-02 9 KSP Residual norm 6.302455269425e+00 % max 1.021954586970e+01 min 9.267134980311e-02 max/min 1.102772959649e+02 10 KSP preconditioned resid norm 6.073640036558e+00 true resid norm 2.573333151284e+00 ||r(i)||/||b|| 1.788374383170e-02 10 KSP Residual norm 6.073640036558e+00 % max 1.022055340290e+01 min 8.910309349226e-02 max/min 1.147048099266e+02 11 KSP preconditioned resid norm 3.651903697539e+00 true resid norm 3.412601613272e+00 ||r(i)||/||b|| 2.371635908120e-02 11 KSP Residual norm 3.651903697539e+00 % max 1.063746875617e+01 min 8.784139785472e-02 max/min 1.210985823992e+02 12 KSP 
preconditioned resid norm 3.615882175100e+00 true resid norm 2.985337016258e+00 ||r(i)||/||b|| 2.074702314522e-02 12 KSP Residual norm 3.615882175100e+00 % max 1.065052062013e+01 min 8.403061114075e-02 max/min 1.267457236779e+02 13 KSP preconditioned resid norm 2.435682789121e+00 true resid norm 3.479769878744e+00 ||r(i)||/||b|| 2.418315447173e-02 13 KSP Residual norm 2.435682789121e+00 % max 1.098132202634e+01 min 8.388597954917e-02 max/min 1.309077164666e+02 14 KSP preconditioned resid norm 2.411170965897e+00 true resid norm 3.207679396198e+00 ||r(i)||/||b|| 2.229222305989e-02 14 KSP Residual norm 2.411170965897e+00 % max 1.104312864839e+01 min 8.377799106713e-02 max/min 1.318141973533e+02 15 KSP preconditioned resid norm 1.802221103228e+00 true resid norm 3.661737765783e+00 ||r(i)||/||b|| 2.544776611977e-02 15 KSP Residual norm 1.802221103228e+00 % max 1.117539151513e+01 min 8.312850956824e-02 max/min 1.344351242814e+02 16 KSP preconditioned resid norm 1.724050337077e+00 true resid norm 2.614493520188e+00 ||r(i)||/||b|| 1.816979365511e-02 16 KSP Residual norm 1.724050337077e+00 % max 1.121848051810e+01 min 8.295746169681e-02 max/min 1.352317234476e+02 17 KSP preconditioned resid norm 1.446543004216e+00 true resid norm 2.593901633636e+00 ||r(i)||/||b|| 1.802668741800e-02 17 KSP Residual norm 1.446543004216e+00 % max 1.121901106493e+01 min 8.035014922135e-02 max/min 1.396265118815e+02 18 KSP preconditioned resid norm 1.270289424799e+00 true resid norm 1.230509884415e+00 ||r(i)||/||b|| 8.551603022822e-03 18 KSP Residual norm 1.270289424799e+00 % max 1.124027603502e+01 min 7.973337070066e-02 max/min 1.409732955756e+02 19 KSP preconditioned resid norm 1.080253063006e+00 true resid norm 1.082987561438e+00 ||r(i)||/||b|| 7.526375709267e-03 19 KSP Residual norm 1.080253063006e+00 % max 1.133699520009e+01 min 7.601792058171e-02 max/min 1.491358236760e+02 20 KSP preconditioned resid norm 9.970541525815e-01 true resid norm 6.633295087021e-01 ||r(i)||/||b|| 4.609902531943e-03 20 KSP Residual norm 9.970541525815e-01 % max 1.138059682811e+01 min 7.557071730428e-02 max/min 1.505953262597e+02 21 KSP preconditioned resid norm 8.966498019650e-01 true resid norm 5.486447217741e-01 ||r(i)||/||b|| 3.812884334050e-03 21 KSP Residual norm 8.966498019650e-01 % max 1.139404354003e+01 min 7.311184878176e-02 max/min 1.558440079123e+02 22 KSP preconditioned resid norm 8.337476676724e-01 true resid norm 3.389792250906e-01 ||r(i)||/||b|| 2.355784218131e-03 22 KSP Residual norm 8.337476676724e-01 % max 1.140218849999e+01 min 7.194664865864e-02 max/min 1.584811622580e+02 23 KSP preconditioned resid norm 7.456280014082e-01 true resid norm 1.718772952356e-01 ||r(i)||/||b|| 1.194485648679e-03 23 KSP Residual norm 7.456280014082e-01 % max 1.152770388731e+01 min 6.666423164915e-02 max/min 1.729218743265e+02 24 KSP preconditioned resid norm 7.015114847169e-01 true resid norm 1.443046009947e-01 ||r(i)||/||b|| 1.002865298120e-03 24 KSP Residual norm 7.015114847169e-01 % max 1.155974051319e+01 min 6.656731948674e-02 max/min 1.736548895513e+02 25 KSP preconditioned resid norm 6.739554018025e-01 true resid norm 1.858419872640e-02 ||r(i)||/||b|| 1.291535257200e-04 25 KSP Residual norm 6.739554018025e-01 % max 1.156385727969e+01 min 6.215469200284e-02 max/min 1.860496272616e+02 26 KSP preconditioned resid norm 6.200637861680e-01 true resid norm 1.132932938906e-01 ||r(i)||/||b|| 7.873478196085e-04 26 KSP Residual norm 6.200637861680e-01 % max 1.170223472658e+01 min 6.190712793501e-02 max/min 1.890288746534e+02 27 KSP preconditioned 
resid norm 6.105065907983e-01 true resid norm 3.994545617989e-02 ||r(i)||/||b|| 2.776066150647e-04 27 KSP Residual norm 6.105065907983e-01 % max 1.171365389653e+01 min 6.027234281800e-02 max/min 1.943454219442e+02 28 KSP preconditioned resid norm 5.610386072732e-01 true resid norm 1.217257320241e-01 ||r(i)||/||b|| 8.459502447860e-04 28 KSP Residual norm 5.610386072732e-01 % max 1.172151686361e+01 min 5.728046389443e-02 max/min 2.046337628343e+02 29 KSP preconditioned resid norm 5.540259455874e-01 true resid norm 8.563080828184e-02 ||r(i)||/||b|| 5.951034511990e-04 29 KSP Residual norm 5.540259455874e-01 % max 1.172195784500e+01 min 5.671439847376e-02 max/min 2.066839843223e+02 30 KSP preconditioned resid norm 5.202167379615e-01 true resid norm 1.284220147458e-01 ||r(i)||/||b|| 8.924870116097e-04 30 KSP Residual norm 5.202167379615e-01 % max 1.182277060575e+01 min 5.020883270604e-02 max/min 2.354719273195e+02 31 KSP preconditioned resid norm 5.063245395222e-01 true resid norm 1.075653393151e-01 ||r(i)||/||b|| 7.475405866207e-04 31 KSP Residual norm 5.063245395222e-01 % max 1.184536475727e+01 min 4.908259681136e-02 max/min 2.413353311928e+02 32 KSP preconditioned resid norm 4.724480857262e-01 true resid norm 1.234120256066e-01 ||r(i)||/||b|| 8.576693812840e-04 32 KSP Residual norm 4.724480857262e-01 % max 1.186011169962e+01 min 3.884608431681e-02 max/min 3.053103525928e+02 33 KSP preconditioned resid norm 4.597874122923e-01 true resid norm 1.246405383864e-01 ||r(i)||/||b|| 8.662071051449e-04 33 KSP Residual norm 4.597874122923e-01 % max 1.186602976776e+01 min 3.825401433368e-02 max/min 3.101904460079e+02 34 KSP preconditioned resid norm 4.284942444766e-01 true resid norm 1.130606423234e-01 ||r(i)||/||b|| 7.857309745340e-04 34 KSP Residual norm 4.284942444766e-01 % max 1.189958948203e+01 min 3.091916244303e-02 max/min 3.848613138843e+02 35 KSP preconditioned resid norm 4.244269206123e-01 true resid norm 1.250348185214e-01 ||r(i)||/||b|| 8.689472108828e-04 35 KSP Residual norm 4.244269206123e-01 % max 1.191766131238e+01 min 3.024693929003e-02 max/min 3.940121411329e+02 36 KSP preconditioned resid norm 3.939025131320e-01 true resid norm 9.144589066824e-02 ||r(i)||/||b|| 6.355161912699e-04 36 KSP Residual norm 3.939025131320e-01 % max 1.202518185820e+01 min 2.593268190026e-02 max/min 4.637076066585e+02 37 KSP preconditioned resid norm 3.935795758769e-01 true resid norm 9.649570618534e-02 ||r(i)||/||b|| 6.706106006587e-04 37 KSP Residual norm 3.935795758769e-01 % max 1.203570530708e+01 min 2.496967807790e-02 max/min 4.820128345079e+02 38 KSP preconditioned resid norm 3.727310812870e-01 true resid norm 6.811969852976e-02 ||r(i)||/||b|| 4.734075095527e-04 38 KSP Residual norm 3.727310812870e-01 % max 1.208413247918e+01 min 2.165231259890e-02 max/min 5.580989293400e+02 39 KSP preconditioned resid norm 3.709983760517e-01 true resid norm 7.787632201538e-02 ||r(i)||/||b|| 5.412125487068e-04 39 KSP Residual norm 3.709983760517e-01 % max 1.210834407119e+01 min 2.098191538922e-02 max/min 5.770847821365e+02 40 KSP preconditioned resid norm 3.467238503118e-01 true resid norm 5.154845790525e-02 ||r(i)||/||b|| 3.582433217544e-04 40 KSP Residual norm 3.467238503118e-01 % max 1.224001159771e+01 min 1.814604032371e-02 max/min 6.745279619885e+02 41 KSP preconditioned resid norm 3.439959783975e-01 true resid norm 5.943616733119e-02 ||r(i)||/||b|| 4.130600774947e-04 41 KSP Residual norm 3.439959783975e-01 % max 1.224561432077e+01 min 1.767836586229e-02 max/min 6.926892686890e+02 42 KSP preconditioned resid norm 
3.219614031089e-01 true resid norm 3.736094738334e-02 ||r(i)||/||b|| 2.596452045006e-04 42 KSP Residual norm 3.219614031089e-01 % max 1.227596175886e+01 min 1.612104438917e-02 max/min 7.614867537433e+02 43 KSP preconditioned resid norm 3.199947639961e-01 true resid norm 3.822136396506e-02 ||r(i)||/||b|| 2.656247916086e-04 43 KSP Residual norm 3.199947639961e-01 % max 1.227743106469e+01 min 1.572993605863e-02 max/min 7.805137299308e+02 44 KSP preconditioned resid norm 3.019847533357e-01 true resid norm 2.165216070075e-02 ||r(i)||/||b|| 1.504747627340e-04 44 KSP Residual norm 3.019847533357e-01 % max 1.240993286547e+01 min 1.461321415914e-02 max/min 8.492267840826e+02 45 KSP preconditioned resid norm 3.019442005034e-01 true resid norm 2.135563477224e-02 ||r(i)||/||b|| 1.484140137236e-04 45 KSP Residual norm 3.019442005034e-01 % max 1.243517069391e+01 min 1.440804616025e-02 max/min 8.630712697338e+02 46 KSP preconditioned resid norm 2.850889693160e-01 true resid norm 7.207841532279e-03 ||r(i)||/||b|| 5.009191735569e-05 46 KSP Residual norm 2.850889693160e-01 % max 1.243534970093e+01 min 1.323540651787e-02 max/min 9.395517760748e+02 47 KSP preconditioned resid norm 2.842908210742e-01 true resid norm 9.849131826029e-03 ||r(i)||/||b|| 6.844793899051e-05 47 KSP Residual norm 2.842908210742e-01 % max 1.249881969067e+01 min 1.320941100729e-02 max/min 9.462056774351e+02 48 KSP preconditioned resid norm 2.689207737926e-01 true resid norm 9.583983712934e-04 ||r(i)||/||b|| 6.660525456014e-06 48 KSP Residual norm 2.689207737926e-01 % max 1.261650861132e+01 min 1.223081388475e-02 max/min 1.031534673833e+03 49 KSP preconditioned resid norm 2.668280776593e-01 true resid norm 7.070848430604e-03 ||r(i)||/||b|| 4.913986435942e-05 49 KSP Residual norm 2.668280776593e-01 % max 1.262190931915e+01 min 1.216049456642e-02 max/min 1.037943748933e+03 50 KSP preconditioned resid norm 2.538922892674e-01 true resid norm 1.279955193991e-03 ||r(i)||/||b|| 8.895230216876e-06 50 KSP Residual norm 2.538922892674e-01 % max 1.267018660817e+01 min 1.132031787357e-02 max/min 1.119243006219e+03 51 KSP preconditioned resid norm 2.523880023918e-01 true resid norm 5.780247796299e-03 ||r(i)||/||b|| 4.017065214474e-05 51 KSP Residual norm 2.523880023918e-01 % max 1.268242637678e+01 min 1.111061199465e-02 max/min 1.141469649277e+03 52 KSP preconditioned resid norm 2.417991847669e-01 true resid norm 4.293987037567e-03 ||r(i)||/||b|| 2.984167213568e-05 52 KSP Residual norm 2.417991847669e-01 % max 1.270116330233e+01 min 1.034857425034e-02 max/min 1.227334606206e+03 53 KSP preconditioned resid norm 2.398246159240e-01 true resid norm 6.681384762633e-03 ||r(i)||/||b|| 4.643323134291e-05 53 KSP Residual norm 2.398246159240e-01 % max 1.272464486407e+01 min 1.021008048567e-02 max/min 1.246282522643e+03 54 KSP preconditioned resid norm 2.304239973256e-01 true resid norm 8.786460783172e-03 ||r(i)||/||b|| 6.106275580957e-05 54 KSP Residual norm 2.304239973256e-01 % max 1.282145153236e+01 min 9.617654891509e-03 max/min 1.333116199010e+03 55 KSP preconditioned resid norm 2.274098647246e-01 true resid norm 8.787197459688e-03 ||r(i)||/||b|| 6.106787544754e-05 55 KSP Residual norm 2.274098647246e-01 % max 1.284495561158e+01 min 9.439808674336e-03 max/min 1.360722028880e+03 56 KSP preconditioned resid norm 2.180398826060e-01 true resid norm 1.434954347375e-02 ||r(i)||/||b|| 9.972418824135e-05 56 KSP Residual norm 2.180398826060e-01 % max 1.286926627221e+01 min 9.015633770371e-03 max/min 1.427438891152e+03 57 KSP preconditioned resid norm 
2.131920589612e-01 true resid norm 1.084420646671e-02 ||r(i)||/||b|| 7.536335138416e-05 57 KSP Residual norm 2.131920589612e-01 % max 1.290100040400e+01 min 8.814169489220e-03 max/min 1.463666023189e+03 58 KSP preconditioned resid norm 2.065643280101e-01 true resid norm 1.656339704046e-02 ||r(i)||/||b|| 1.151096776982e-04 58 KSP Residual norm 2.065643280101e-01 % max 1.292804597402e+01 min 8.384549705094e-03 max/min 1.541889120911e+03 59 KSP preconditioned resid norm 2.003555740237e-01 true resid norm 1.038231288935e-02 ||r(i)||/||b|| 7.215335643622e-05 59 KSP Residual norm 2.003555740237e-01 % max 1.293249562818e+01 min 8.155142206128e-03 max/min 1.585808720596e+03 60 KSP preconditioned resid norm 1.950953867657e-01 true resid norm 1.413406089093e-02 ||r(i)||/||b|| 9.822666146001e-05 60 KSP Residual norm 1.950953867657e-01 % max 1.293554486453e+01 min 7.731095227769e-03 max/min 1.673184003486e+03 61 KSP preconditioned resid norm 1.899764851607e-01 true resid norm 8.827240071818e-03 ||r(i)||/||b|| 6.134615726166e-05 61 KSP Residual norm 1.899764851607e-01 % max 1.298065726625e+01 min 7.508160945402e-03 max/min 1.728873070335e+03 62 KSP preconditioned resid norm 1.858933940701e-01 true resid norm 1.056927489106e-02 ||r(i)||/||b|| 7.345267539268e-05 62 KSP Residual norm 1.858933940701e-01 % max 1.299440622646e+01 min 7.213860170192e-03 max/min 1.801311076163e+03 63 KSP preconditioned resid norm 1.810870303013e-01 true resid norm 6.742586426457e-03 ||r(i)||/||b|| 4.685856098877e-05 63 KSP Residual norm 1.810870303013e-01 % max 1.299595367805e+01 min 7.004034416700e-03 max/min 1.855495405200e+03 64 KSP preconditioned resid norm 1.778248798506e-01 true resid norm 6.677930576889e-03 ||r(i)||/||b|| 4.640922598902e-05 64 KSP Residual norm 1.778248798506e-01 % max 1.299595829248e+01 min 6.814647504450e-03 max/min 1.907062439252e+03 65 KSP preconditioned resid norm 1.727924115825e-01 true resid norm 5.197256457236e-03 ||r(i)||/||b|| 3.611907112085e-05 65 KSP Residual norm 1.727924115825e-01 % max 1.301320633714e+01 min 6.624687576804e-03 max/min 1.964350195579e+03 66 KSP preconditioned resid norm 1.699236934303e-01 true resid norm 2.760195052716e-03 ||r(i)||/||b|| 1.918236712712e-05 66 KSP Residual norm 1.699236934303e-01 % max 1.302167249004e+01 min 6.519638399051e-03 max/min 1.997299802997e+03 67 KSP preconditioned resid norm 1.649310998218e-01 true resid norm 3.293409382150e-03 ||r(i)||/||b|| 2.288801576038e-05 67 KSP Residual norm 1.649310998218e-01 % max 1.302371148724e+01 min 6.266781646072e-03 max/min 2.078213702468e+03 68 KSP preconditioned resid norm 1.636393596642e-01 true resid norm 9.884906209348e-04 ||r(i)||/||b|| 6.869655814295e-06 68 KSP Residual norm 1.636393596642e-01 % max 1.302437495991e+01 min 6.215671699475e-03 max/min 2.095409086843e+03 69 KSP preconditioned resid norm 1.586669709342e-01 true resid norm 2.881441378483e-03 ||r(i)||/||b|| 2.002498567010e-05 69 KSP Residual norm 1.586669709342e-01 % max 1.304427120268e+01 min 5.967380256283e-03 max/min 2.185929275907e+03 70 KSP preconditioned resid norm 1.573023989071e-01 true resid norm 1.000381550810e-03 ||r(i)||/||b|| 6.952293518515e-06 70 KSP Residual norm 1.573023989071e-01 % max 1.306574692601e+01 min 5.881181855636e-03 max/min 2.221619267477e+03 71 KSP preconditioned resid norm 1.516399060506e-01 true resid norm 3.560033719346e-03 ||r(i)||/||b|| 2.474095941959e-05 71 KSP Residual norm 1.516399060506e-01 % max 1.306952059296e+01 min 5.636288570587e-03 max/min 2.318816793937e+03 72 KSP preconditioned resid norm 
1.503518195084e-01 true resid norm 2.345474287001e-03 ||r(i)||/||b|| 1.630020632643e-05 72 KSP Residual norm 1.503518195084e-01 % max 1.307213706006e+01 min 5.548190071151e-03 max/min 2.356108369111e+03 73 KSP preconditioned resid norm 1.439927992762e-01 true resid norm 4.703459082651e-03 ||r(i)||/||b|| 3.268735620767e-05 73 KSP Residual norm 1.439927992762e-01 % max 1.307917952886e+01 min 5.275363624723e-03 max/min 2.479294406847e+03 74 KSP preconditioned resid norm 1.435960011870e-01 true resid norm 4.483492384225e-03 ||r(i)||/||b|| 3.115866642873e-05 74 KSP Residual norm 1.435960011870e-01 % max 1.308872240647e+01 min 5.247381315279e-03 max/min 2.494334148799e+03 75 KSP preconditioned resid norm 1.384154018521e-01 true resid norm 6.030043122791e-03 ||r(i)||/||b|| 4.190664020641e-05 75 KSP Residual norm 1.384154018521e-01 % max 1.309605706761e+01 min 5.057534139172e-03 max/min 2.589415455681e+03 76 KSP preconditioned resid norm 1.383519341196e-01 true resid norm 6.181946178464e-03 ||r(i)||/||b|| 4.296231204337e-05 76 KSP Residual norm 1.383519341196e-01 % max 1.309642044538e+01 min 5.047463753370e-03 max/min 2.594653688525e+03 77 KSP preconditioned resid norm 1.329173312782e-01 true resid norm 6.954398536003e-03 ||r(i)||/||b|| 4.833057929533e-05 77 KSP Residual norm 1.329173312782e-01 % max 1.309825571272e+01 min 4.803507372123e-03 max/min 2.726810785955e+03 78 KSP preconditioned resid norm 1.329173043215e-01 true resid norm 6.946120254536e-03 ||r(i)||/||b|| 4.827304820378e-05 78 KSP Residual norm 1.329173043215e-01 % max 1.311072078455e+01 min 4.802189162826e-03 max/min 2.730155006397e+03 79 KSP preconditioned resid norm 1.275598927624e-01 true resid norm 7.138958457992e-03 ||r(i)||/||b|| 4.961320465800e-05 79 KSP Residual norm 1.275598927624e-01 % max 1.312650704646e+01 min 4.590610278138e-03 max/min 2.859425272708e+03 80 KSP preconditioned resid norm 1.275569484234e-01 true resid norm 7.025529210780e-03 ||r(i)||/||b|| 4.882491201150e-05 80 KSP Residual norm 1.275569484234e-01 % max 1.313280315207e+01 min 4.588845114264e-03 max/min 2.861897236680e+03 81 KSP preconditioned resid norm 1.227187849301e-01 true resid norm 6.636830240688e-03 ||r(i)||/||b|| 4.612359337139e-05 81 KSP Residual norm 1.227187849301e-01 % max 1.313338768313e+01 min 4.423884807674e-03 max/min 2.968745402310e+03 82 KSP preconditioned resid norm 1.226831891668e-01 true resid norm 6.348147610828e-03 ||r(i)||/||b|| 4.411735247775e-05 82 KSP Residual norm 1.226831891668e-01 % max 1.313744093170e+01 min 4.423633515412e-03 max/min 2.969830318432e+03 83 KSP preconditioned resid norm 1.188228300298e-01 true resid norm 5.241447831669e-03 ||r(i)||/||b|| 3.642618534721e-05 83 KSP Residual norm 1.188228300298e-01 % max 1.315934322753e+01 min 4.272492365165e-03 max/min 3.080015621519e+03 84 KSP preconditioned resid norm 1.186380416906e-01 true resid norm 4.901826119931e-03 ||r(i)||/||b|| 3.406593607697e-05 84 KSP Residual norm 1.186380416906e-01 % max 1.317531594396e+01 min 4.266446309469e-03 max/min 3.088124164300e+03 85 KSP preconditioned resid norm 1.139134638963e-01 true resid norm 3.199432120475e-03 ||r(i)||/||b|| 2.223490744715e-05 85 KSP Residual norm 1.139134638963e-01 % max 1.318322334426e+01 min 4.086290876519e-03 max/min 3.226207762159e+03 86 KSP preconditioned resid norm 1.137291112899e-01 true resid norm 3.143817299611e-03 ||r(i)||/||b|| 2.184840435910e-05 86 KSP Residual norm 1.137291112899e-01 % max 1.319134391703e+01 min 4.076597334424e-03 max/min 3.235871202103e+03 87 KSP preconditioned resid norm 
1.097004766710e-01 true resid norm 1.510053474298e-03 ||r(i)||/||b|| 1.049433086153e-05 87 KSP Residual norm 1.097004766710e-01 % max 1.319857384286e+01 min 3.927846091124e-03 max/min 3.360257387040e+03 88 KSP preconditioned resid norm 1.093354755492e-01 true resid norm 1.894396889453e-03 ||r(i)||/||b|| 1.316537995465e-05 88 KSP Residual norm 1.093354755492e-01 % max 1.320364831022e+01 min 3.909315868034e-03 max/min 3.377483108537e+03 89 KSP preconditioned resid norm 1.058535985624e-01 true resid norm 5.684208172485e-04 ||r(i)||/||b|| 3.950321115323e-06 89 KSP Residual norm 1.058535985624e-01 % max 1.321346591367e+01 min 3.762621022208e-03 max/min 3.511771670778e+03 90 KSP preconditioned resid norm 1.050510394459e-01 true resid norm 1.393768720082e-03 ||r(i)||/||b|| 9.686193463967e-06 90 KSP Residual norm 1.050510394459e-01 % max 1.321404376514e+01 min 3.726850313945e-03 max/min 3.545633082095e+03 91 KSP preconditioned resid norm 1.022704996805e-01 true resid norm 4.841068462494e-04 ||r(i)||/||b|| 3.364369211650e-06 91 KSP Residual norm 1.022704996805e-01 % max 1.321405643370e+01 min 3.615650805644e-03 max/min 3.654682695872e+03 92 KSP preconditioned resid norm 1.010278070060e-01 true resid norm 1.598147668229e-03 ||r(i)||/||b|| 1.110655396079e-05 92 KSP Residual norm 1.010278070060e-01 % max 1.322392599062e+01 min 3.567311549166e-03 max/min 3.706972550159e+03 93 KSP preconditioned resid norm 9.910271109254e-02 true resid norm 9.571008610822e-04 ||r(i)||/||b|| 6.651508224714e-06 93 KSP Residual norm 9.910271109254e-02 % max 1.324107680397e+01 min 3.483178966884e-03 max/min 3.801434531461e+03 94 KSP preconditioned resid norm 9.746667337540e-02 true resid norm 2.021028671667e-03 ||r(i)||/||b|| 1.404542549129e-05 94 KSP Residual norm 9.746667337540e-02 % max 1.324111623698e+01 min 3.434913976134e-03 max/min 3.854861090841e+03 95 KSP preconditioned resid norm 9.617362913527e-02 true resid norm 1.898737757434e-03 ||r(i)||/||b|| 1.319554743255e-05 95 KSP Residual norm 9.617362913527e-02 % max 1.324744395627e+01 min 3.367741117081e-03 max/min 3.933628950597e+03 96 KSP preconditioned resid norm 9.429461101496e-02 true resid norm 2.580754399862e-03 ||r(i)||/||b|| 1.793531885160e-05 96 KSP Residual norm 9.429461101496e-02 % max 1.324928841468e+01 min 3.316661615028e-03 max/min 3.994766410491e+03 97 KSP preconditioned resid norm 9.322069914953e-02 true resid norm 3.045169681452e-03 ||r(i)||/||b|| 2.116283874087e-05 97 KSP Residual norm 9.322069914953e-02 % max 1.325306385734e+01 min 3.259670836798e-03 max/min 4.065767533253e+03 98 KSP preconditioned resid norm 9.113307259015e-02 true resid norm 2.812404084890e-03 ||r(i)||/||b|| 1.954520120347e-05 98 KSP Residual norm 9.113307259015e-02 % max 1.325427325937e+01 min 3.191256486501e-03 max/min 4.153308678083e+03 99 KSP preconditioned resid norm 9.010748437649e-02 true resid norm 3.703813480914e-03 ||r(i)||/||b|| 2.574017727165e-05 99 KSP Residual norm 9.010748437649e-02 % max 1.327531231621e+01 min 3.137828551200e-03 max/min 4.230732208466e+03 100 KSP preconditioned resid norm 8.819342232903e-02 true resid norm 2.974323179808e-03 ||r(i)||/||b|| 2.067048092620e-05 100 KSP Residual norm 8.819342232903e-02 % max 1.327604652283e+01 min 3.078535574862e-03 max/min 4.312455126793e+03 101 KSP preconditioned resid norm 8.732667531731e-02 true resid norm 3.918253547658e-03 ||r(i)||/||b|| 2.723045893961e-05 101 KSP Residual norm 8.732667531731e-02 % max 1.327611936692e+01 min 3.036562153884e-03 max/min 4.372088794538e+03 102 KSP preconditioned resid norm 
8.488685144925e-02 true resid norm 2.832462665539e-03 ||r(i)||/||b|| 1.968460115554e-05 102 KSP Residual norm 8.488685144925e-02 % max 1.327639034385e+01 min 2.953423223567e-03 max/min 4.495254942775e+03 103 KSP preconditioned resid norm 8.412146811290e-02 true resid norm 3.491332959370e-03 ||r(i)||/||b|| 2.426351374108e-05 103 KSP Residual norm 8.412146811290e-02 % max 1.327896568665e+01 min 2.921834859000e-03 max/min 4.544735184383e+03 104 KSP preconditioned resid norm 8.205234384475e-02 true resid norm 2.481514483287e-03 ||r(i)||/||b|| 1.724563697150e-05 104 KSP Residual norm 8.205234384475e-02 % max 1.330258303015e+01 min 2.854889165799e-03 max/min 4.659579499448e+03 105 KSP preconditioned resid norm 8.121078609603e-02 true resid norm 2.791072710808e-03 ||r(i)||/||b|| 1.939695579286e-05 105 KSP Residual norm 8.121078609603e-02 % max 1.337719366592e+01 min 2.825089980479e-03 max/min 4.735138971983e+03 106 KSP preconditioned resid norm 7.928895760050e-02 true resid norm 2.154083347577e-03 ||r(i)||/||b|| 1.497010783892e-05 106 KSP Residual norm 7.928895760050e-02 % max 1.401972017728e+01 min 2.763217156283e-03 max/min 5.073694677019e+03 107 KSP preconditioned resid norm 7.871632096076e-02 true resid norm 2.043176991235e-03 ||r(i)||/||b|| 1.419934838046e-05 107 KSP Residual norm 7.871632096076e-02 % max 1.406645202154e+01 min 2.745551846299e-03 max/min 5.123360551542e+03 108 KSP preconditioned resid norm 7.670730506069e-02 true resid norm 1.852220621451e-03 ||r(i)||/||b|| 1.287226999632e-05 108 KSP Residual norm 7.670730506069e-02 % max 1.484828705143e+01 min 2.668128116971e-03 max/min 5.565057748533e+03 109 KSP preconditioned resid norm 7.647386643631e-02 true resid norm 1.539066082434e-03 ||r(i)||/||b|| 1.069595809799e-05 109 KSP Residual norm 7.647386643631e-02 % max 1.490463305077e+01 min 2.663402756843e-03 max/min 5.596086815062e+03 110 KSP preconditioned resid norm 7.374274892578e-02 true resid norm 1.448708624507e-03 ||r(i)||/||b|| 1.006800612448e-05 110 KSP Residual norm 7.374274892578e-02 % max 1.582724695778e+01 min 2.556578577454e-03 max/min 6.190792294577e+03 111 KSP preconditioned resid norm 7.359342205137e-02 true resid norm 1.123777932966e-03 ||r(i)||/||b|| 7.809854183419e-06 111 KSP Residual norm 7.359342205137e-02 % max 1.595402265911e+01 min 2.553487687942e-03 max/min 6.247934045049e+03 112 KSP preconditioned resid norm 7.095448465081e-02 true resid norm 1.329261318887e-03 ||r(i)||/||b|| 9.237890127245e-06 112 KSP Residual norm 7.095448465081e-02 % max 1.645781196433e+01 min 2.454668540298e-03 max/min 6.704698289871e+03 113 KSP preconditioned resid norm 7.091227992601e-02 true resid norm 1.165934728348e-03 ||r(i)||/||b|| 8.102828814007e-06 113 KSP Residual norm 7.091227992601e-02 % max 1.659166130904e+01 min 2.452963977443e-03 max/min 6.763923751681e+03 114 KSP preconditioned resid norm 6.839860034076e-02 true resid norm 1.581836269981e-03 ||r(i)||/||b|| 1.099319558446e-05 114 KSP Residual norm 6.839860034076e-02 % max 1.677005110257e+01 min 2.360577707051e-03 max/min 7.104214808300e+03 115 KSP preconditioned resid norm 6.839664267602e-02 true resid norm 1.561170986505e-03 ||r(i)||/||b|| 1.084957926501e-05 115 KSP Residual norm 6.839664267602e-02 % max 1.695990613121e+01 min 2.360174810368e-03 max/min 7.185868630033e+03 116 KSP preconditioned resid norm 6.623018576874e-02 true resid norm 1.996395554550e-03 ||r(i)||/||b|| 1.387423414901e-05 116 KSP Residual norm 6.623018576874e-02 % max 1.698227378849e+01 min 2.285042453228e-03 max/min 7.431929224993e+03 117 KSP 
preconditioned resid norm 6.622585703291e-02 true resid norm 1.997354283265e-03 ||r(i)||/||b|| 1.388089697023e-05 117 KSP Residual norm 6.622585703291e-02 % max 1.737933517688e+01 min 2.284882397872e-03 max/min 7.606227433439e+03 118 KSP preconditioned resid norm 6.450813240620e-02 true resid norm 2.379123427294e-03 ||r(i)||/||b|| 1.653405580093e-05 118 KSP Residual norm 6.450813240620e-02 % max 1.737935525882e+01 min 2.220832028101e-03 max/min 7.825605466292e+03 119 KSP preconditioned resid norm 6.444002578015e-02 true resid norm 2.284010159286e-03 ||r(i)||/||b|| 1.587305265052e-05 119 KSP Residual norm 6.444002578015e-02 % max 1.777510718397e+01 min 2.220589880917e-03 max/min 8.004678097798e+03 120 KSP preconditioned resid norm 6.243494509431e-02 true resid norm 2.568109654162e-03 ||r(i)||/||b|| 1.784744239736e-05 120 KSP Residual norm 6.243494509431e-02 % max 1.778840866865e+01 min 2.148285547508e-03 max/min 8.280281310499e+03 121 KSP preconditioned resid norm 6.229003579918e-02 true resid norm 2.324134806590e-03 ||r(i)||/||b|| 1.615190457973e-05 121 KSP Residual norm 6.229003579918e-02 % max 1.797396785459e+01 min 2.147409030827e-03 max/min 8.370071838463e+03 122 KSP preconditioned resid norm 6.029478848700e-02 true resid norm 2.460084383984e-03 ||r(i)||/||b|| 1.709670545595e-05 122 KSP Residual norm 6.029478848700e-02 % max 1.798128465339e+01 min 2.079543527272e-03 max/min 8.646745988999e+03 123 KSP preconditioned resid norm 6.007031497142e-02 true resid norm 2.106108170100e-03 ||r(i)||/||b|| 1.463669753647e-05 123 KSP Residual norm 6.007031497142e-02 % max 1.800454939518e+01 min 2.076901118012e-03 max/min 8.668948771339e+03 124 KSP preconditioned resid norm 5.832926212876e-02 true resid norm 2.103161324551e-03 ||r(i)||/||b|| 1.461621801524e-05 124 KSP Residual norm 5.832926212876e-02 % max 1.800676896299e+01 min 2.019401552711e-03 max/min 8.916883786099e+03 125 KSP preconditioned resid norm 5.792543787644e-02 true resid norm 1.715151715863e-03 ||r(i)||/||b|| 1.191969019002e-05 125 KSP Residual norm 5.792543787644e-02 % max 1.800677793544e+01 min 2.011427159239e-03 max/min 8.952239633799e+03 126 KSP preconditioned resid norm 5.644152956544e-02 true resid norm 1.499161446982e-03 ||r(i)||/||b|| 1.041863517237e-05 126 KSP Residual norm 5.644152956544e-02 % max 1.800704443929e+01 min 1.967537092265e-03 max/min 9.152073681396e+03 127 KSP preconditioned resid norm 5.586973660567e-02 true resid norm 1.267102798702e-03 ||r(i)||/||b|| 8.805910672355e-06 127 KSP Residual norm 5.586973660567e-02 % max 1.805107398084e+01 min 1.952915856602e-03 max/min 9.243139646707e+03 128 KSP preconditioned resid norm 5.448891069115e-02 true resid norm 8.636346720002e-04 ||r(i)||/||b|| 6.001951682984e-06 128 KSP Residual norm 5.448891069115e-02 % max 1.805292736098e+01 min 1.913983497951e-03 max/min 9.432122784922e+03 129 KSP preconditioned resid norm 5.380539802981e-02 true resid norm 8.535034045037e-04 ||r(i)||/||b|| 5.931543002122e-06 129 KSP Residual norm 5.380539802981e-02 % max 1.811398444083e+01 min 1.893818302409e-03 max/min 9.564795322653e+03 130 KSP preconditioned resid norm 5.262703692841e-02 true resid norm 4.611697276534e-04 ||r(i)||/||b|| 3.204964451716e-06 130 KSP Residual norm 5.262703692841e-02 % max 1.812411377959e+01 min 1.864425110017e-03 max/min 9.721019998186e+03 131 KSP preconditioned resid norm 5.198010431990e-02 true resid norm 5.966815064337e-04 ||r(i)||/||b|| 4.146722784358e-06 131 KSP Residual norm 5.198010431990e-02 % max 1.823860571865e+01 min 1.839903165476e-03 max/min 
9.912807402519e+03 132 KSP preconditioned resid norm 5.085529391473e-02 true resid norm 3.358781460833e-04 ||r(i)||/||b|| 2.334232829598e-06 132 KSP Residual norm 5.085529391473e-02 % max 1.826518404725e+01 min 1.810005386907e-03 max/min 1.009123187112e+04 133 KSP preconditioned resid norm 5.017514432361e-02 true resid norm 4.625101396212e-04 ||r(i)||/||b|| 3.214279834860e-06 133 KSP Residual norm 5.017514432361e-02 % max 1.828202177562e+01 min 1.785517877663e-03 max/min 1.023905837311e+04 134 KSP preconditioned resid norm 4.897836349852e-02 true resid norm 3.140587545507e-04 ||r(i)||/||b|| 2.182595872472e-06 134 KSP Residual norm 4.897836349852e-02 % max 1.828630516866e+01 min 1.753922973542e-03 max/min 1.042594540611e+04 135 KSP preconditioned resid norm 4.769605892209e-02 true resid norm 4.604200622165e-04 ||r(i)||/||b|| 3.199754545402e-06 135 KSP Residual norm 4.769605892209e-02 % max 1.828834683359e+01 min 1.716950298410e-03 max/min 1.065164603222e+04 136 KSP preconditioned resid norm 4.694516363902e-02 true resid norm 3.158829423650e-04 ||r(i)||/||b|| 2.195273324497e-06 136 KSP Residual norm 4.694516363902e-02 % max 1.829267335642e+01 min 1.692763795326e-03 max/min 1.080639449339e+04 137 KSP preconditioned resid norm 4.600477726265e-02 true resid norm 4.459130802726e-04 ||r(i)||/||b|| 3.098936216176e-06 137 KSP Residual norm 4.600477726265e-02 % max 1.831613436675e+01 min 1.665361379688e-03 max/min 1.099829417815e+04 138 KSP preconditioned resid norm 4.537939803542e-02 true resid norm 4.828484084324e-04 ||r(i)||/||b|| 3.355623519497e-06 138 KSP Residual norm 4.537939803542e-02 % max 1.831837575747e+01 min 1.644411398981e-03 max/min 1.113977668169e+04 139 KSP preconditioned resid norm 4.335672641269e-02 true resid norm 5.407678166899e-04 ||r(i)||/||b|| 3.758142664616e-06 139 KSP Residual norm 4.335672641269e-02 % max 1.835769547367e+01 min 1.600134099372e-03 max/min 1.147259812842e+04 140 KSP preconditioned resid norm 4.265504838638e-02 true resid norm 7.866395801145e-04 ||r(i)||/||b|| 5.466863368090e-06 140 KSP Residual norm 4.265504838638e-02 % max 1.835951473748e+01 min 1.577862587640e-03 max/min 1.163568670764e+04 141 KSP preconditioned resid norm 4.158346550243e-02 true resid norm 6.048841061750e-04 ||r(i)||/||b|| 4.203727915021e-06 141 KSP Residual norm 4.158346550243e-02 % max 1.847341913997e+01 min 1.557608383332e-03 max/min 1.186011794598e+04 142 KSP preconditioned resid norm 4.101409775033e-02 true resid norm 9.513463716205e-04 ||r(i)||/||b|| 6.611516583770e-06 142 KSP Residual norm 4.101409775033e-02 % max 1.851917901130e+01 min 1.542216069792e-03 max/min 1.200816109626e+04 143 KSP preconditioned resid norm 3.849619047557e-02 true resid norm 4.785060954918e-04 ||r(i)||/||b|| 3.325445999642e-06 143 KSP Residual norm 3.849619047557e-02 % max 1.860764097096e+01 min 1.499689912992e-03 max/min 1.240765894987e+04 144 KSP preconditioned resid norm 3.751577318689e-02 true resid norm 8.583533930410e-04 ||r(i)||/||b|| 5.965248685564e-06 144 KSP Residual norm 3.751577318689e-02 % max 1.863149145223e+01 min 1.481689591305e-03 max/min 1.257449033966e+04 145 KSP preconditioned resid norm 3.673387855308e-02 true resid norm 5.341137389016e-04 ||r(i)||/||b|| 3.711899206965e-06 145 KSP Residual norm 3.673387855308e-02 % max 1.867047493836e+01 min 1.468644236269e-03 max/min 1.271272815926e+04 146 KSP preconditioned resid norm 3.672694107649e-02 true resid norm 5.564983768274e-04 ||r(i)||/||b|| 3.867464424096e-06 146 KSP Residual norm 3.672694107649e-02 % max 1.867843142104e+01 min 
1.468356392371e-03 max/min 1.272063888446e+04 147 KSP preconditioned resid norm 3.476552175333e-02 true resid norm 3.041819741103e-04 ||r(i)||/||b|| 2.113955785513e-06 147 KSP Residual norm 3.476552175333e-02 % max 1.874371026285e+01 min 1.443869321819e-03 max/min 1.298158356827e+04 148 KSP preconditioned resid norm 3.441661614207e-02 true resid norm 3.232787917785e-04 ||r(i)||/||b|| 2.246671829298e-06 148 KSP Residual norm 3.441661614207e-02 % max 1.875011791543e+01 min 1.440369950492e-03 max/min 1.301757087409e+04 149 KSP preconditioned resid norm 3.416583700832e-02 true resid norm 3.504294946165e-04 ||r(i)||/||b|| 2.435359490732e-06 149 KSP Residual norm 3.416583700832e-02 % max 1.875025564682e+01 min 1.436520145975e-03 max/min 1.305255321296e+04 150 KSP preconditioned resid norm 3.347212141620e-02 true resid norm 3.631417495901e-04 ||r(i)||/||b|| 2.523705110247e-06 150 KSP Residual norm 3.347212141620e-02 % max 1.875837031230e+01 min 1.426742386720e-03 max/min 1.314769259462e+04 151 KSP preconditioned resid norm 3.345420139042e-02 true resid norm 3.694013060342e-04 ||r(i)||/||b|| 2.567206785843e-06 151 KSP Residual norm 3.345420139042e-02 % max 1.882712051051e+01 min 1.426311630585e-03 max/min 1.319986467669e+04 152 KSP preconditioned resid norm 3.340755394354e-02 true resid norm 3.589735646211e-04 ||r(i)||/||b|| 2.494737717436e-06 152 KSP Residual norm 3.340755394354e-02 % max 1.883670991915e+01 min 1.424634232216e-03 max/min 1.322213764992e+04 153 KSP preconditioned resid norm 3.332106769315e-02 true resid norm 3.563428109286e-04 ||r(i)||/||b|| 2.476454921406e-06 153 KSP Residual norm 3.332106769315e-02 % max 1.887164917076e+01 min 1.423474821806e-03 max/min 1.325745203334e+04 154 KSP preconditioned resid norm 3.117258568907e-02 true resid norm 5.059814012607e-04 ||r(i)||/||b|| 3.516389535197e-06 154 KSP Residual norm 3.117258568907e-02 % max 1.889695812179e+01 min 1.394886394906e-03 max/min 1.354730979584e+04 155 KSP preconditioned resid norm 2.951301076745e-02 true resid norm 5.381856032655e-04 ||r(i)||/||b|| 3.740197206066e-06 155 KSP Residual norm 2.951301076745e-02 % max 1.992328134051e+01 min 1.377048843662e-03 max/min 1.446810070116e+04 156 KSP preconditioned resid norm 2.828691807444e-02 true resid norm 5.783569996283e-04 ||r(i)||/||b|| 4.019374024488e-06 156 KSP Residual norm 2.828691807444e-02 % max 2.057048552195e+01 min 1.363961483973e-03 max/min 1.508142697844e+04 157 KSP preconditioned resid norm 2.686596481189e-02 true resid norm 6.142573094409e-04 ||r(i)||/||b|| 4.268868320959e-06 157 KSP Residual norm 2.686596481189e-02 % max 2.169548158726e+01 min 1.350696937715e-03 max/min 1.606243486711e+04 158 KSP preconditioned resid norm 2.570785084243e-02 true resid norm 7.027270166661e-04 ||r(i)||/||b|| 4.883701103140e-06 158 KSP Residual norm 2.570785084243e-02 % max 2.265067929617e+01 min 1.340996265478e-03 max/min 1.689093391180e+04 159 KSP preconditioned resid norm 2.461384343166e-02 true resid norm 7.143033815156e-04 ||r(i)||/||b|| 4.964152693082e-06 159 KSP Residual norm 2.461384343166e-02 % max 2.426711626246e+01 min 1.331963318795e-03 max/min 1.821905747706e+04 160 KSP preconditioned resid norm 2.365655554984e-02 true resid norm 7.716314609398e-04 ||r(i)||/||b|| 5.362562314578e-06 160 KSP Residual norm 2.365655554984e-02 % max 2.528508397832e+01 min 1.324598508089e-03 max/min 1.908886641794e+04 161 KSP preconditioned resid norm 2.271313995990e-02 true resid norm 8.080063247403e-04 ||r(i)||/||b|| 5.615354591317e-06 161 KSP Residual norm 2.271313995990e-02 % max 
2.710047477105e+01 min 1.318001272460e-03 max/min 2.056179712215e+04 162 KSP preconditioned resid norm 2.195987859654e-02 true resid norm 8.351249765833e-04 ||r(i)||/||b|| 5.803819509813e-06 162 KSP Residual norm 2.195987859654e-02 % max 2.819404882400e+01 min 1.312444049260e-03 max/min 2.148209581954e+04 163 KSP preconditioned resid norm 2.116085117451e-02 true resid norm 8.708485797190e-04 ||r(i)||/||b|| 6.052085758163e-06 163 KSP Residual norm 2.116085117451e-02 % max 3.000654129611e+01 min 1.307527119994e-03 max/min 2.294907756579e+04 164 KSP preconditioned resid norm 2.058340051781e-02 true resid norm 8.940937884719e-04 ||r(i)||/||b|| 6.213631634352e-06 164 KSP Residual norm 2.058340051781e-02 % max 3.109002584249e+01 min 1.303546908895e-03 max/min 2.385033145362e+04 165 KSP preconditioned resid norm 1.989398367284e-02 true resid norm 9.152102157778e-04 ||r(i)||/||b|| 6.360383241851e-06 165 KSP Residual norm 1.989398367284e-02 % max 3.280265234906e+01 min 1.299588058166e-03 max/min 2.524080776440e+04 166 KSP preconditioned resid norm 1.942554001343e-02 true resid norm 9.447512947604e-04 ||r(i)||/||b|| 6.565683161441e-06 166 KSP Residual norm 1.942554001343e-02 % max 3.388010820064e+01 min 1.296668249847e-03 max/min 2.612858624759e+04 167 KSP preconditioned resid norm 1.883038279646e-02 true resid norm 9.557977336677e-04 ||r(i)||/||b|| 6.642451955863e-06 167 KSP Residual norm 1.883038279646e-02 % max 3.547736279262e+01 min 1.293421019733e-03 max/min 2.742909095443e+04 168 KSP preconditioned resid norm 1.843122179957e-02 true resid norm 9.852184505393e-04 ||r(i)||/||b|| 6.846915401886e-06 168 KSP Residual norm 1.843122179957e-02 % max 3.654814903453e+01 min 1.291140051594e-03 max/min 2.830688196018e+04 169 KSP preconditioned resid norm 1.791806980431e-02 true resid norm 9.933253867828e-04 ||r(i)||/||b|| 6.903255705498e-06 169 KSP Residual norm 1.791806980431e-02 % max 3.801089399080e+01 min 1.288507285123e-03 max/min 2.949994495930e+04 170 KSP preconditioned resid norm 1.756513299053e-02 true resid norm 1.016284945532e-03 ||r(i)||/||b|| 7.062816416461e-06 170 KSP Residual norm 1.756513299053e-02 % max 3.906308178664e+01 min 1.286580812911e-03 max/min 3.036193404614e+04 171 KSP preconditioned resid norm 1.712025674221e-02 true resid norm 1.025179454020e-03 ||r(i)||/||b|| 7.124630065120e-06 171 KSP Residual norm 1.712025674221e-02 % max 4.040754299400e+01 min 1.284454485174e-03 max/min 3.145891385051e+04 172 KSP preconditioned resid norm 1.680355215203e-02 true resid norm 1.040867875233e-03 ||r(i)||/||b|| 7.233658974167e-06 172 KSP Residual norm 1.680355215203e-02 % max 4.143651193717e+01 min 1.282771441588e-03 max/min 3.230233430041e+04 173 KSP preconditioned resid norm 1.641453139877e-02 true resid norm 1.050486119211e-03 ||r(i)||/||b|| 7.300502325302e-06 173 KSP Residual norm 1.641453139877e-02 % max 4.268339661464e+01 min 1.281026597983e-03 max/min 3.331968023290e+04 174 KSP preconditioned resid norm 1.613141341887e-02 true resid norm 1.061609283280e-03 ||r(i)||/||b|| 7.377804332120e-06 174 KSP Residual norm 1.613141341887e-02 % max 4.368932696223e+01 min 1.279579757862e-03 max/min 3.414349648296e+04 175 KSP preconditioned resid norm 1.578732659562e-02 true resid norm 1.070526175546e-03 ||r(i)||/||b|| 7.439773540026e-06 175 KSP Residual norm 1.578732659562e-02 % max 4.484821904310e+01 min 1.278100728236e-03 max/min 3.508973749276e+04 176 KSP preconditioned resid norm 1.553403085934e-02 true resid norm 1.080028301277e-03 ||r(i)||/||b|| 7.505809910926e-06 176 KSP Residual norm 
1.553403085934e-02 % max 4.583537344650e+01 min 1.276894454857e-03 max/min 3.589597657987e+04 177 KSP preconditioned resid norm 1.522618995395e-02 true resid norm 1.087445876720e-03 ||r(i)||/||b|| 7.557359403853e-06 177 KSP Residual norm 1.522618995395e-02 % max 4.691219773705e+01 min 1.275601612061e-03 max/min 3.677652747809e+04 178 KSP preconditioned resid norm 1.499668120626e-02 true resid norm 1.096769621916e-03 ||r(i)||/||b|| 7.622156093920e-06 178 KSP Residual norm 1.499668120626e-02 % max 4.788429158490e+01 min 1.274611942854e-03 max/min 3.756774118849e+04 179 KSP preconditioned resid norm 1.471884070273e-02 true resid norm 1.102777184181e-03 ||r(i)||/||b|| 7.663906500214e-06 179 KSP Residual norm 1.471884070273e-02 % max 4.888735534287e+01 min 1.273464298715e-03 max/min 3.838926257470e+04 180 KSP preconditioned resid norm 1.450863899336e-02 true resid norm 1.111729635978e-03 ||r(i)||/||b|| 7.726122834122e-06 180 KSP Residual norm 1.450863899336e-02 % max 4.984315787189e+01 min 1.272639990685e-03 max/min 3.916516708316e+04 181 KSP preconditioned resid norm 1.425612625648e-02 true resid norm 1.116798520172e-03 ||r(i)||/||b|| 7.761349763989e-06 181 KSP Residual norm 1.425612625648e-02 % max 5.078440462749e+01 min 1.271619534833e-03 max/min 3.993679181261e+04 182 KSP preconditioned resid norm 1.406351275203e-02 true resid norm 1.124594576327e-03 ||r(i)||/||b|| 7.815529562319e-06 182 KSP Residual norm 1.406351275203e-02 % max 5.171928847725e+01 min 1.270904809420e-03 max/min 4.069485621103e+04 183 KSP preconditioned resid norm 1.383249156419e-02 true resid norm 1.129174115919e-03 ||r(i)||/||b|| 7.847355722442e-06 183 KSP Residual norm 1.383249156419e-02 % max 5.261368552263e+01 min 1.270001930509e-03 max/min 4.142803586255e+04 184 KSP preconditioned resid norm 1.365658097167e-02 true resid norm 1.135459325074e-03 ||r(i)||/||b|| 7.891035675191e-06 184 KSP Residual norm 1.365658097167e-02 % max 5.352288966369e+01 min 1.269355492916e-03 max/min 4.216540595791e+04 185 KSP preconditioned resid norm 1.344383471329e-02 true resid norm 1.139764798163e-03 ||r(i)||/||b|| 7.920957171272e-06 185 KSP Residual norm 1.344383471329e-02 % max 5.438537408976e+01 min 1.268560753224e-03 max/min 4.287171422539e+04 186 KSP preconditioned resid norm 1.328404728176e-02 true resid norm 1.144824520075e-03 ||r(i)||/||b|| 7.956120426557e-06 186 KSP Residual norm 1.328404728176e-02 % max 5.526578320716e+01 min 1.267962194465e-03 max/min 4.358630205885e+04 187 KSP preconditioned resid norm 1.308695214231e-02 true resid norm 1.148801543475e-03 ||r(i)||/||b|| 7.983759314927e-06 187 KSP Residual norm 1.308695214231e-02 % max 5.610579081176e+01 min 1.267262760542e-03 max/min 4.427321038594e+04 188 KSP preconditioned resid norm 1.294292922909e-02 true resid norm 1.153263845064e-03 ||r(i)||/||b|| 8.014770712915e-06 188 KSP Residual norm 1.294292922909e-02 % max 5.695681206694e+01 min 1.266707691636e-03 max/min 4.496444794882e+04 189 KSP preconditioned resid norm 1.275944133029e-02 true resid norm 1.156731324409e-03 ||r(i)||/||b|| 8.038868452583e-06 189 KSP Residual norm 1.275944133029e-02 % max 5.777644964581e+01 min 1.266087810417e-03 max/min 4.563384085247e+04 190 KSP preconditioned resid norm 1.263008719773e-02 true resid norm 1.161189570863e-03 ||r(i)||/||b|| 8.069851668833e-06 190 KSP Residual norm 1.263008719773e-02 % max 5.860045207835e+01 min 1.265577565211e-03 max/min 4.630332718373e+04 191 KSP preconditioned resid norm 1.245871543701e-02 true resid norm 1.164044879892e-03 ||r(i)||/||b|| 8.089695044037e-06 191 
KSP Residual norm 1.245871543701e-02 % max 5.939738460512e+01 min 1.265021908749e-03 max/min 4.695364103523e+04 192 KSP preconditioned resid norm 1.234100490949e-02 true resid norm 1.168766607171e-03 ||r(i)||/||b|| 8.122509357664e-06 192 KSP Residual norm 1.234100490949e-02 % max 6.019904586841e+01 min 1.264554265652e-03 max/min 4.760495259359e+04 193 KSP preconditioned resid norm 1.218072285661e-02 true resid norm 1.171083749038e-03 ||r(i)||/||b|| 8.138612663815e-06 193 KSP Residual norm 1.218072285661e-02 % max 6.097127016777e+01 min 1.264051451217e-03 max/min 4.823480097196e+04 194 KSP preconditioned resid norm 1.207011696255e-02 true resid norm 1.175896411555e-03 ||r(i)||/||b|| 8.172058944788e-06 194 KSP Residual norm 1.207011696255e-02 % max 6.175555091566e+01 min 1.263617221571e-03 max/min 4.887203961882e+04 195 KSP preconditioned resid norm 1.192016374675e-02 true resid norm 1.177866002139e-03 ||r(i)||/||b|| 8.185746893989e-06 195 KSP Residual norm 1.192016374675e-02 % max 6.250451858129e+01 min 1.263160844574e-03 max/min 4.948262832068e+04 196 KSP preconditioned resid norm 1.181306904179e-02 true resid norm 1.182356800390e-03 ||r(i)||/||b|| 8.216956333577e-06 196 KSP Residual norm 1.181306904179e-02 % max 6.327473124885e+01 min 1.262748439633e-03 max/min 5.010873841762e+04 197 KSP preconditioned resid norm 1.167272061230e-02 true resid norm 1.184167372256e-03 ||r(i)||/||b|| 8.229539159639e-06 197 KSP Residual norm 1.167272061230e-02 % max 6.400445840975e+01 min 1.262335339772e-03 max/min 5.070321363364e+04 198 KSP preconditioned resid norm 1.156784314270e-02 true resid norm 1.188045534515e-03 ||r(i)||/||b|| 8.256490998482e-06 198 KSP Residual norm 1.156784314270e-02 % max 6.476182365047e+01 min 1.261936651150e-03 max/min 5.131939356182e+04 199 KSP preconditioned resid norm 1.143629792622e-02 true resid norm 1.189803854088e-03 ||r(i)||/||b|| 8.268710689813e-06 199 KSP Residual norm 1.143629792622e-02 % max 6.547718982508e+01 min 1.261564441883e-03 max/min 5.190158160082e+04 200 KSP preconditioned resid norm 1.133396245242e-02 true resid norm 1.193056037076e-03 ||r(i)||/||b|| 8.291312196903e-06 200 KSP Residual norm 1.133396245242e-02 % max 6.622145306794e+01 min 1.261176641638e-03 max/min 5.250767488204e+04 201 KSP preconditioned resid norm 1.121036364948e-02 true resid norm 1.194773225376e-03 ||r(i)||/||b|| 8.303246040625e-06 201 KSP Residual norm 1.121036364948e-02 % max 6.692665793915e+01 min 1.260842209250e-03 max/min 5.308091484261e+04 202 KSP preconditioned resid norm 1.111144524574e-02 true resid norm 1.197568418096e-03 ||r(i)||/||b|| 8.322671629004e-06 202 KSP Residual norm 1.111144524574e-02 % max 6.765693853382e+01 min 1.260466204873e-03 max/min 5.367612259040e+04 203 KSP preconditioned resid norm 1.099496034654e-02 true resid norm 1.199195911181e-03 ||r(i)||/||b|| 8.333982123104e-06 203 KSP Residual norm 1.099496034654e-02 % max 6.835433085118e+01 min 1.260165428292e-03 max/min 5.424234732721e+04 204 KSP preconditioned resid norm 1.090018822777e-02 true resid norm 1.201735903578e-03 ||r(i)||/||b|| 8.351634160628e-06 204 KSP Residual norm 1.090018822777e-02 % max 6.906984853447e+01 min 1.259803498026e-03 max/min 5.482589042076e+04 205 KSP preconditioned resid norm 1.079007807571e-02 true resid norm 1.203215514642e-03 ||r(i)||/||b|| 8.361916927640e-06 205 KSP Residual norm 1.079007807571e-02 % max 6.976019515598e+01 min 1.259531675297e-03 max/min 5.538582039991e+04 206 KSP preconditioned resid norm 1.069977373762e-02 true resid norm 1.205648143042e-03 ||r(i)||/||b|| 
8.378822823846e-06 206 KSP Residual norm 1.069977373762e-02 % max 7.046071724359e+01 min 1.259186125142e-03 max/min 5.595734882774e+04 207 KSP preconditioned resid norm 1.059542772778e-02 true resid norm 1.206946449390e-03 ||r(i)||/||b|| 8.387845588004e-06 207 KSP Residual norm 1.059542772778e-02 % max 7.114401986834e+01 min 1.258938566637e-03 max/min 5.651111321372e+04 208 KSP preconditioned resid norm 1.050951453560e-02 true resid norm 1.209342905636e-03 ||r(i)||/||b|| 8.404500100691e-06 208 KSP Residual norm 1.050951453560e-02 % max 7.182996384882e+01 min 1.258611378992e-03 max/min 5.707080441808e+04 209 KSP preconditioned resid norm 1.041045312396e-02 true resid norm 1.210460780603e-03 ||r(i)||/||b|| 8.412268931376e-06 209 KSP Residual norm 1.041045312396e-02 % max 7.250596813564e+01 min 1.258383866070e-03 max/min 5.761832306551e+04 210 KSP preconditioned resid norm 1.032857901784e-02 true resid norm 1.212827225727e-03 ||r(i)||/||b|| 8.428714877511e-06 210 KSP Residual norm 1.032857901784e-02 % max 7.317823108250e+01 min 1.258076643690e-03 max/min 5.816675116697e+04 211 KSP preconditioned resid norm 1.023443253666e-02 true resid norm 1.213791633097e-03 ||r(i)||/||b|| 8.435417163356e-06 211 KSP Residual norm 1.023443253666e-02 % max 7.384658038097e+01 min 1.257865660071e-03 max/min 5.870784355207e+04 212 KSP preconditioned resid norm 1.015611913131e-02 true resid norm 1.216094261775e-03 ||r(i)||/||b|| 8.451419608044e-06 212 KSP Residual norm 1.015611913131e-02 % max 7.450626341060e+01 min 1.257579498668e-03 max/min 5.924576815184e+04 213 KSP preconditioned resid norm 1.006659504393e-02 true resid norm 1.216944088642e-03 ||r(i)||/||b|| 8.457325600424e-06 213 KSP Residual norm 1.006659504393e-02 % max 7.516651634457e+01 min 1.257382261319e-03 max/min 5.978016284856e+04 214 KSP preconditioned resid norm 9.991364006986e-03 true resid norm 1.219135370129e-03 ||r(i)||/||b|| 8.472554222011e-06 214 KSP Residual norm 9.991364006986e-03 % max 7.581469524439e+01 min 1.257117570177e-03 max/min 6.030835702482e+04 215 KSP preconditioned resid norm 9.906214402368e-03 true resid norm 1.219909557908e-03 ||r(i)||/||b|| 8.477934549821e-06 215 KSP Residual norm 9.906214402368e-03 % max 7.646637193429e+01 min 1.256931938609e-03 max/min 6.083572991146e+04 216 KSP preconditioned resid norm 9.833665508084e-03 true resid norm 1.221947926851e-03 ||r(i)||/||b|| 8.492100483991e-06 216 KSP Residual norm 9.833665508084e-03 % max 7.710397628801e+01 min 1.256688353264e-03 max/min 6.135489048478e+04 217 KSP preconditioned resid norm 9.752662180238e-03 true resid norm 1.222678717243e-03 ||r(i)||/||b|| 8.497179215500e-06 217 KSP Residual norm 9.752662180238e-03 % max 7.774665933505e+01 min 1.256512698406e-03 max/min 6.187494916179e+04 218 KSP preconditioned resid norm 9.682502942875e-03 true resid norm 1.224538668290e-03 ||r(i)||/||b|| 8.510105209184e-06 218 KSP Residual norm 9.682502942875e-03 % max 7.837444505820e+01 min 1.256289152827e-03 max/min 6.238567361809e+04 219 KSP preconditioned resid norm 9.605422031975e-03 true resid norm 1.225249271117e-03 ||r(i)||/||b|| 8.515043644347e-06 219 KSP Residual norm 9.605422031975e-03 % max 7.900787035656e+01 min 1.256122237332e-03 max/min 6.289823395242e+04 220 KSP preconditioned resid norm 9.537463745511e-03 true resid norm 1.226923114063e-03 ||r(i)||/||b|| 8.526676253384e-06 220 KSP Residual norm 9.537463745511e-03 % max 7.962644233311e+01 min 1.255917156908e-03 max/min 6.340103078865e+04 221 KSP preconditioned resid norm 9.464078633704e-03 true resid norm 1.227628040936e-03 
||r(i)||/||b|| 8.531575242699e-06 221 KSP Residual norm 9.464078633704e-03 % max 8.025052711361e+01 min 1.255758044205e-03 max/min 6.390604263612e+04 222 KSP preconditioned resid norm 9.398214465467e-03 true resid norm 1.229122603089e-03 ||r(i)||/||b|| 8.541961914426e-06 222 KSP Residual norm 9.398214465467e-03 % max 8.086037613768e+01 min 1.255569580395e-03 max/min 6.440135011253e+04 223 KSP preconditioned resid norm 9.328294566913e-03 true resid norm 1.229829199388e-03 ||r(i)||/||b|| 8.546872505655e-06 223 KSP Residual norm 9.328294566913e-03 % max 8.147518995464e+01 min 1.255417568442e-03 max/min 6.489887667875e+04 224 KSP preconditioned resid norm 9.264471294408e-03 true resid norm 1.231160655142e-03 ||r(i)||/||b|| 8.556125646318e-06 224 KSP Residual norm 9.264471294408e-03 % max 8.207672556941e+01 min 1.255243809529e-03 max/min 6.538707854709e+04 225 KSP preconditioned resid norm 9.197784443425e-03 true resid norm 1.231870954253e-03 ||r(i)||/||b|| 8.561061970766e-06 225 KSP Residual norm 9.197784443425e-03 % max 8.268244064189e+01 min 1.255098379129e-03 max/min 6.587725872079e+04 226 KSP preconditioned resid norm 9.135975882630e-03 true resid norm 1.233059902649e-03 ||r(i)||/||b|| 8.569324736329e-06 226 KSP Residual norm 9.135975882630e-03 % max 8.327601457142e+01 min 1.254937502791e-03 max/min 6.635869466503e+04 227 KSP preconditioned resid norm 9.072292463378e-03 true resid norm 1.233772378013e-03 ||r(i)||/||b|| 8.574276184953e-06 227 KSP Residual norm 9.072292463378e-03 % max 8.387286404078e+01 min 1.254798272801e-03 max/min 6.684171141993e+04 228 KSP preconditioned resid norm 9.012479317697e-03 true resid norm 1.234840177419e-03 ||r(i)||/||b|| 8.581697008427e-06 228 KSP Residual norm 9.012479317697e-03 % max 8.445878521092e+01 min 1.254648634702e-03 max/min 6.731668363151e+04 229 KSP preconditioned resid norm 8.951576818904e-03 true resid norm 1.235551219746e-03 ||r(i)||/||b|| 8.586638497956e-06 229 KSP Residual norm 8.951576818904e-03 % max 8.504703781778e+01 min 1.254515321041e-03 max/min 6.779274544625e+04 230 KSP preconditioned resid norm 8.893734322611e-03 true resid norm 1.236517746333e-03 ||r(i)||/||b|| 8.593355511599e-06 230 KSP Residual norm 8.893734322611e-03 % max 8.562558164861e+01 min 1.254375491199e-03 max/min 6.826152316380e+04 231 KSP preconditioned resid norm 8.835401496383e-03 true resid norm 1.237222821605e-03 ||r(i)||/||b|| 8.598255532238e-06 231 KSP Residual norm 8.835401496383e-03 % max 8.620552870800e+01 min 1.254247869990e-03 max/min 6.873085517674e+04 232 KSP preconditioned resid norm 8.779493767166e-03 true resid norm 1.238105385899e-03 ||r(i)||/||b|| 8.604389038012e-06 232 KSP Residual norm 8.779493767166e-03 % max 8.677694303654e+01 min 1.254116634528e-03 max/min 6.919367836087e+04 233 KSP preconditioned resid norm 8.723533927295e-03 true resid norm 1.238799880506e-03 ||r(i)||/||b|| 8.609215526821e-06 233 KSP Residual norm 8.723533927295e-03 % max 8.734889094592e+01 min 1.253994511258e-03 max/min 6.965651776124e+04 234 KSP preconditioned resid norm 8.669512917961e-03 true resid norm 1.239612916539e-03 ||r(i)||/||b|| 8.614865836078e-06 234 KSP Residual norm 8.669512917961e-03 % max 8.791340022062e+01 min 1.253870855121e-03 max/min 7.011360050486e+04 235 KSP preconditioned resid norm 8.615746131402e-03 true resid norm 1.240292698620e-03 ||r(i)||/||b|| 8.619590078097e-06 235 KSP Residual norm 8.615746131402e-03 % max 8.847766403405e+01 min 1.253754041676e-03 max/min 7.057019247231e+04 236 KSP preconditioned resid norm 8.563553150576e-03 true resid norm 
1.241047888334e-03 ||r(i)||/||b|| 8.624838376157e-06 236 KSP Residual norm 8.563553150576e-03 % max 8.903547287434e+01 min 1.253637123098e-03 max/min 7.102172648999e+04 237 KSP preconditioned resid norm 8.511817228346e-03 true resid norm 1.241709626795e-03 ||r(i)||/||b|| 8.629437221483e-06 237 KSP Residual norm 8.511817228346e-03 % max 8.959236938351e+01 min 1.253525423463e-03 max/min 7.147231935353e+04 238 KSP preconditioned resid norm 8.461385555673e-03 true resid norm 1.242416231549e-03 ||r(i)||/||b|| 8.634347871470e-06 238 KSP Residual norm 8.461385555673e-03 % max 9.014366616979e+01 min 1.253414546462e-03 max/min 7.191847774886e+04 239 KSP preconditioned resid norm 8.411535899328e-03 true resid norm 1.243057517624e-03 ||r(i)||/||b|| 8.638804580029e-06 239 KSP Residual norm 8.411535899328e-03 % max 9.069350675704e+01 min 1.253307750487e-03 max/min 7.236331756652e+04 240 KSP preconditioned resid norm 8.362793605094e-03 true resid norm 1.243722777012e-03 ||r(i)||/||b|| 8.643427894531e-06 240 KSP Residual norm 8.362793605094e-03 % max 9.123846746428e+01 min 1.253202338627e-03 max/min 7.280425885911e+04 241 KSP preconditioned resid norm 8.314702114455e-03 true resid norm 1.244342102625e-03 ||r(i)||/||b|| 8.647731985751e-06 241 KSP Residual norm 8.314702114455e-03 % max 9.178155165219e+01 min 1.253100222141e-03 max/min 7.324358421659e+04 242 KSP preconditioned resid norm 8.267574626045e-03 true resid norm 1.244971630454e-03 ||r(i)||/||b|| 8.652106978708e-06 242 KSP Residual norm 8.267574626045e-03 % max 9.232034370013e+01 min 1.252999795056e-03 max/min 7.367945634503e+04 243 KSP preconditioned resid norm 8.221127980925e-03 true resid norm 1.245568275842e-03 ||r(i)||/||b|| 8.656253450483e-06 243 KSP Residual norm 8.221127980925e-03 % max 9.285695423569e+01 min 1.252902123995e-03 max/min 7.411349414878e+04 244 KSP preconditioned resid norm 8.175540204420e-03 true resid norm 1.246166417360e-03 ||r(i)||/||b|| 8.660410319829e-06 244 KSP Residual norm 8.175540204420e-03 % max 9.338973986884e+01 min 1.252806277276e-03 max/min 7.454443800516e+04 245 KSP preconditioned resid norm 8.130637860727e-03 true resid norm 1.246740291434e-03 ||r(i)||/||b|| 8.664398539126e-06 245 KSP Residual norm 8.130637860727e-03 % max 9.392013976787e+01 min 1.252712813580e-03 max/min 7.497340072657e+04 246 KSP preconditioned resid norm 8.086515811149e-03 true resid norm 1.247310425546e-03 ||r(i)||/||b|| 8.668360767025e-06 246 KSP Residual norm 8.086515811149e-03 % max 9.444707850342e+01 min 1.252621202198e-03 max/min 7.539955282387e+04 247 KSP preconditioned resid norm 8.043068000819e-03 true resid norm 1.247861900948e-03 ||r(i)||/||b|| 8.672193323571e-06 247 KSP Residual norm 8.043068000819e-03 % max 9.497151003065e+01 min 1.252531709730e-03 max/min 7.582363727233e+04 248 KSP preconditioned resid norm 8.000339967882e-03 true resid norm 1.248406688002e-03 ||r(i)||/||b|| 8.675979398498e-06 248 KSP Residual norm 8.000339967882e-03 % max 9.549275995929e+01 min 1.252444034840e-03 max/min 7.624513136151e+04 249 KSP preconditioned resid norm 7.958265895850e-03 true resid norm 1.248936448921e-03 ||r(i)||/||b|| 8.679661047165e-06 249 KSP Residual norm 7.958265895850e-03 % max 9.601144518790e+01 min 1.252358284283e-03 max/min 7.666451876659e+04 250 KSP preconditioned resid norm 7.916863207090e-03 true resid norm 1.249458024480e-03 ||r(i)||/||b|| 8.683285810513e-06 250 KSP Residual norm 7.916863207090e-03 % max 9.652716321581e+01 min 1.252274283075e-03 max/min 7.708148647660e+04 251 KSP preconditioned resid norm 7.876089537614e-03 
true resid norm 1.249966943560e-03 ||r(i)||/||b|| 8.686822615859e-06 251 KSP Residual norm 7.876089537614e-03 % max 9.704030565036e+01 min 1.252192055403e-03 max/min 7.749634349751e+04 252 KSP preconditioned resid norm 7.835946995106e-03 true resid norm 1.250467061890e-03 ||r(i)||/||b|| 8.690298259150e-06 252 KSP Residual norm 7.835946995106e-03 % max 9.755064698379e+01 min 1.252111493531e-03 max/min 7.790891425227e+04 253 KSP preconditioned resid norm 7.796406641405e-03 true resid norm 1.250956108308e-03 ||r(i)||/||b|| 8.693696956619e-06 253 KSP Residual norm 7.796406641405e-03 % max 9.805843372473e+01 min 1.252032582050e-03 max/min 7.831939450347e+04 254 KSP preconditioned resid norm 7.757462710271e-03 true resid norm 1.251436249391e-03 ||r(i)||/||b|| 8.697033765198e-06 254 KSP Residual norm 7.757462710271e-03 % max 9.856355096702e+01 min 1.251955248110e-03 max/min 7.872769503205e+04 255 KSP preconditioned resid norm 7.719093894688e-03 true resid norm 1.251906424231e-03 ||r(i)||/||b|| 8.700301311949e-06 255 KSP Residual norm 7.719093894688e-03 % max 9.906615499272e+01 min 1.251879459351e-03 max/min 7.913394077420e+04 256 KSP preconditioned resid norm 7.681290715363e-03 true resid norm 1.252367867770e-03 ||r(i)||/||b|| 8.703508179297e-06 256 KSP Residual norm 7.681290715363e-03 % max 9.956619717268e+01 min 1.251805160900e-03 max/min 7.953809449156e+04 257 KSP preconditioned resid norm 7.644036247839e-03 true resid norm 1.252820161896e-03 ||r(i)||/||b|| 8.706651461496e-06 257 KSP Residual norm 7.644036247839e-03 % max 1.000637794559e+02 min 1.251732314673e-03 max/min 7.994023824654e+04 258 KSP preconditioned resid norm 7.607319532807e-03 true resid norm 1.253264039419e-03 ||r(i)||/||b|| 8.709736251319e-06 258 KSP Residual norm 7.607319532807e-03 % max 1.005588911978e+02 min 1.251660875349e-03 max/min 8.034036469326e+04 259 KSP preconditioned resid norm 7.571126254178e-03 true resid norm 1.253699408473e-03 ||r(i)||/||b|| 8.712761910329e-06 259 KSP Residual norm 7.571126254178e-03 % max 1.010516025111e+02 min 1.251590804287e-03 max/min 8.073853064838e+04 260 KSP preconditioned resid norm 7.535445116392e-03 true resid norm 1.254126736740e-03 ||r(i)||/||b|| 8.715731688745e-06 260 KSP Residual norm 7.535445116392e-03 % max 1.015419234445e+02 min 1.251522061687e-03 max/min 8.113474508600e+04 261 KSP preconditioned resid norm 7.500263460886e-03 true resid norm 1.254546089275e-03 ||r(i)||/||b|| 8.718646038683e-06 261 KSP Residual norm 7.500263460886e-03 % max 1.020299058152e+02 min 1.251454610485e-03 max/min 8.152905024310e+04 262 KSP preconditioned resid norm 7.465570207901e-03 true resid norm 1.254957793838e-03 ||r(i)||/||b|| 8.721507237959e-06 262 KSP Residual norm 7.465570207901e-03 % max 1.025155702368e+02 min 1.251388414547e-03 max/min 8.192146342819e+04 263 KSP preconditioned resid norm 7.431353849974e-03 true resid norm 1.255361985088e-03 ||r(i)||/||b|| 8.724316222393e-06 263 KSP Residual norm 7.431353849974e-03 % max 1.029989580786e+02 min 1.251323439098e-03 max/min 8.231201850802e+04 264 KSP preconditioned resid norm 7.397603766441e-03 true resid norm 1.255758915970e-03 ||r(i)||/||b|| 8.727074749871e-06 264 KSP Residual norm 7.397603766441e-03 % max 1.034800948320e+02 min 1.251259650807e-03 max/min 8.270073662590e+04 265 KSP preconditioned resid norm 7.364309327176e-03 true resid norm 1.256148747275e-03 ||r(i)||/||b|| 8.729783937837e-06 265 KSP Residual norm 7.364309327176e-03 % max 1.039590158106e+02 min 1.251197017305e-03 max/min 8.308764676767e+04 266 KSP preconditioned resid norm 
7.331460459593e-03 true resid norm 1.256531689740e-03 ||r(i)||/||b|| 8.732445250825e-06 266 KSP Residual norm 7.331460459593e-03 % max 1.044357483275e+02 min 1.251135507633e-03 max/min 8.347277148663e+04 267 KSP preconditioned resid norm 7.299047255930e-03 true resid norm 1.256907909730e-03 ||r(i)||/||b|| 8.735059845022e-06 267 KSP Residual norm 7.299047255930e-03 % max 1.049103240257e+02 min 1.251075091712e-03 max/min 8.385613679041e+04 268 KSP preconditioned resid norm 7.267060207335e-03 true resid norm 1.257277592154e-03 ||r(i)||/||b|| 8.737629005481e-06 268 KSP Residual norm 7.267060207335e-03 % max 1.053827704742e+02 min 1.251015740708e-03 max/min 8.423776539743e+04 269 KSP preconditioned resid norm 7.235490033156e-03 true resid norm 1.257640899938e-03 ||r(i)||/||b|| 8.740153864466e-06 269 KSP Residual norm 7.235490033156e-03 % max 1.058531169122e+02 min 1.250957426646e-03 max/min 8.461768135152e+04 270 KSP preconditioned resid norm 7.204327771402e-03 true resid norm 1.257998000326e-03 ||r(i)||/||b|| 8.742635584281e-06 270 KSP Residual norm 7.204327771402e-03 % max 1.063213904088e+02 min 1.250900122634e-03 max/min 8.499590693533e+04 271 KSP preconditioned resid norm 7.173564703362e-03 true resid norm 1.258349048571e-03 ||r(i)||/||b|| 8.745075243864e-06 271 KSP Residual norm 7.173564703362e-03 % max 1.067876184650e+02 min 1.250843802621e-03 max/min 8.537246476441e+04 272 KSP preconditioned resid norm 7.143192384191e-03 true resid norm 1.258694198769e-03 ||r(i)||/||b|| 8.747473914136e-06 272 KSP Residual norm 7.143192384191e-03 % max 1.072518273159e+02 min 1.250788441502e-03 max/min 8.574737642053e+04 273 KSP preconditioned resid norm 7.113202607575e-03 true resid norm 1.259033597312e-03 ||r(i)||/||b|| 8.749832612459e-06 273 KSP Residual norm 7.113202607575e-03 % max 1.077140430794e+02 min 1.250734014978e-03 max/min 8.612066337805e+04 274 KSP preconditioned resid norm 7.083587412353e-03 true resid norm 1.259367387496e-03 ||r(i)||/||b|| 8.752152334705e-06 274 KSP Residual norm 7.083587412353e-03 % max 1.081742910321e+02 min 1.250680499591e-03 max/min 8.649234642059e+04 275 KSP preconditioned resid norm 7.054339063782e-03 true resid norm 1.259695706280e-03 ||r(i)||/||b|| 8.754434032675e-06 275 KSP Residual norm 7.054339063782e-03 % max 1.086325961071e+02 min 1.250627872643e-03 max/min 8.686244604279e+04 276 KSP preconditioned resid norm 7.025450051101e-03 true resid norm 1.260018688108e-03 ||r(i)||/||b|| 8.756678640709e-06 276 KSP Residual norm 7.025450051101e-03 % max 1.090889825986e+02 min 1.250576112199e-03 max/min 8.723098221245e+04 277 KSP preconditioned resid norm 6.996913075745e-03 true resid norm 1.260336461071e-03 ||r(i)||/||b|| 8.758887049000e-06 277 KSP Residual norm 6.996913075745e-03 % max 1.095434743757e+02 min 1.250525197029e-03 max/min 8.759797454376e+04 278 KSP preconditioned resid norm 6.968721045823e-03 true resid norm 1.260649150836e-03 ||r(i)||/||b|| 8.761060130883e-06 278 KSP Residual norm 6.968721045823e-03 % max 1.099960947706e+02 min 1.250475106597e-03 max/min 8.796344220711e+04 KSP Object: 1 MPI processes type: gmres GMRES: restart=1000, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-15, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: 
Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type Falgout HYPRE BoomerAMG: Interpolation type classical linear system matrix followed by preconditioner matrix: Matrix Object: 1 MPI processes type: seqaij rows=137300, cols=137300 total: nonzeros=12249280, allocated nonzeros=12249280 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 101325 nodes, limit used is 5 Matrix Object: 1 MPI processes type: seqaij rows=137300, cols=137300 total: nonzeros=12249280, allocated nonzeros=12249280 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 101325 nodes, limit used is 5 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Unknown Name on a linux-gnu-c-opt named pacotaco-xps with 1 processor, by justin Tue Jun 2 15:36:44 2015 Using Petsc Release Version 3.4.2, Jul, 02, 2013 Max Max/Min Avg Total Time (sec): 1.068e+02 1.00000 1.068e+02 Objects: 1.718e+03 1.00000 1.718e+03 Flops: 4.590e+10 1.00000 4.590e+10 4.590e+10 Flops/sec: 4.298e+08 1.00000 4.298e+08 4.298e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 1.731e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0680e+02 100.0% 4.5900e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 1.730e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %f - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0 Index Set 6 6 4584 0 IS L to G Mapping 10 10 3850280 0 Vector 1694 1694 934018016 0 Vector Scatter 3 3 1932 0 Matrix 2 2 298381656 0 Preconditioner 1 1 1072 0 Krylov Solver 1 1 24237768 0 ======================================================================================================================== Average time to get PetscTime(): 1.19209e-07 #PETSc Option Table entries: -ksp_gmres_restart 1000 -ksp_monitor_singular_value -ksp_monitor_true_residual -ksp_rtol 1e-5 -ksp_type gmres -ksp_view -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure run at: Tue Dec 17 23:10:14 2013 Configure options: --with-shared-libraries --with-debugging=0 --useThreads 0 --with-clanguage=C++ --with-c-support --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-include=/usr/include --with-blacs-lib="[/usr/lib/libblacsCinit-openmpi.so,/usr/lib/libblacs-openmpi.so]" --with-scalapack=1 --with-scalapack-include=/usr/include --with-scalapack-lib=/usr/lib/libscalapack-openmpi.so --with-mumps=1 --with-mumps-include=/usr/include --with-mumps-lib="[/usr/lib/libdmumps.so,/usr/lib/libzmumps.so,/usr/lib/libsmumps.so,/usr/lib/libcmumps.so,/usr/lib/libmumps_common.so,/usr/lib/libpord.so]" --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse --with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]" --with-cholmod=1 --with-cholmod-include=/usr/include/suitesparse --with-cholmod-lib=/usr/lib/libcholmod.so --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1 --with-hypre-dir=/usr --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="[/usr/lib/libptesmumps.so,/usr/lib/libptscotch.so,/usr/lib/libptscotcherr.so]" --with-fftw=1 --with-fftw-include=/usr/include --with-fftw-lib="[/usr/lib/x86_64-linux-gnu/libfftw3.so,/usr/lib/x86_64-linux-gnu/libfftw3_mpi.so]" --CXX_LINKER_FLAGS=-Wl,--no-as-needed ----------------------------------------- Libraries compiled on Tue Dec 17 23:10:14 2013 on lamiak Machine characteristics: Linux-3.2.0-37-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /build/buildd/petsc-3.4.2.dfsg1 Using PETSc arch: linux-gnu-c-opt ----------------------------------------- Using C compiler: mpicxx -Wall -Wwrite-strings 
-Wno-strict-aliasing -Wno-unknown-pragmas -O -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/include -I/build/buildd/petsc-3.4.2.dfsg1/include -I/build/buildd/petsc-3.4.2.dfsg1/include -I/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/include -I/usr/include -I/usr/include/suitesparse -I/usr/include/scotch -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi ----------------------------------------- Using C linker: mpicxx Using Fortran linker: mpif90 Using libraries: -L/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/lib -L/build/buildd/petsc-3.4.2.dfsg1/linux-gnu-c-opt/lib -lpetsc -L/usr/lib -ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord -lscalapack-openmpi -lHYPRE_utilities -lHYPRE_struct_mv -lHYPRE_struct_ls -lHYPRE_sstruct_mv -lHYPRE_sstruct_ls -lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lcholmod -lumfpack -lamd -llapack -lblas -lX11 -lpthread -lptesmumps -lptscotch -lptscotcherr -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_mpi -lm -L/usr/lib/openmpi/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl ----------------------------------------- From hsahasra at purdue.edu Tue Jun 2 16:05:20 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Tue, 2 Jun 2015 17:05:20 -0400 Subject: [petsc-users] Automatic Differentiation In-Reply-To: References: Message-ID: I haven't tried the FD Jacobian. How do I enable that? I have been providing the Jacobian. On Tue, Jun 2, 2015 at 4:59 PM, Matthew Knepley wrote: > On Tue, Jun 2, 2015 at 3:55 PM, Harshad Sahasrabudhe > wrote: > >> Hi, >> >> Does PETSc have automatic differentiation capability? I'm solving an >> elliptic non-linear equation in which calculating the correct Jacobian is >> extremely demanding. The system doesn't converge with an approximate >> Jacobian. Is there any SNES solver I should try which doesn't use a >> Jacobian? >> > > Do you mean it does not converge with the default FD Jacobian? > > Matt > > >> Thanks, >> Harshad >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 2 16:12:05 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 16:12:05 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> <87y4k1c244.fsf@jedbrown.org> Message-ID: On Tue, Jun 2, 2015 at 4:00 PM, Justin Chang wrote: > MTH did not converge with the default -ksp_rtol 1e-6 so I had to raise the > tolerance to 1e-5 in order to get a solution. Attached are the outputs for > TH and MTH > This really looks like it was never solving this system. Matt > Last one with the svd did not work with the way the AMG PC was hard-coded > into FEniCS. 
Here's the list of preconditioners my installation of FEniCS > supports: > > Preconditioner | Description > --------------------------------------------------------------- > default | default preconditioner > ilu | Incomplete LU factorization > icc | Incomplete Cholesky factorization > sor | Successive over-relaxation > petsc_amg | PETSc algebraic multigrid > jacobi | Jacobi iteration > bjacobi | Block Jacobi iteration > additive_schwarz | Additive Schwarz > amg | Algebraic multigrid > hypre_amg | Hypre algebraic multigrid (BoomerAMG) > hypre_euclid | Hypre parallel incomplete LU factorization > hypre_parasails | Hypre parallel sparse approximate inverse > none | No preconditioner > > I would need to change a few things here and there if we really insisted > on seeing if svd works. > > On Tue, Jun 2, 2015 at 3:00 PM, Jed Brown wrote: > >> Justin Chang writes: >> >> > I originally solve that example problem using LU. >> >> I'd like to learn why LU didn't notice that the system is singular. >> (The checks are not reliable, but this case should be pretty obviously >> bad.) >> >> > But when I solve this one: >> > >> > >> http://fenicsproject.org/documentation/dolfin/1.5.0/python/demo/documented/stokes-iterative/python/documentation.html >> > >> > By simply running their code as is for TH and adding the one like I >> > mentioned for MTH, I get the following outputs when I pass in >> -ksp_monitor >> > -ksp_view and -log_summary >> >> Thanks, Justin. Could you please run with >> >> -ksp_type gmres -ksp_gmres_restart 1000 -ksp_monitor_true_residual >> -ksp_monitor_singular_value >> >> Also, can you run a tiny problem size (like less than 1000 dofs) with >> >> -pc_type svd -pc_svd_monitor >> >> Thanks. >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 2 16:15:08 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Jun 2015 16:15:08 -0500 Subject: [petsc-users] Automatic Differentiation In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 4:05 PM, Harshad Sahasrabudhe wrote: > I haven't tried the FD Jacobian. How do I enable that? I have been > providing the Jacobian. > Just do not provide the Jacobian. It will turn on. Thanks, Matt > On Tue, Jun 2, 2015 at 4:59 PM, Matthew Knepley wrote: > >> On Tue, Jun 2, 2015 at 3:55 PM, Harshad Sahasrabudhe > > wrote: >> >>> Hi, >>> >>> Does PETSc have automatic differentiation capability? I'm solving an >>> elliptic non-linear equation in which calculating the correct Jacobian is >>> extremely demanding. The system doesn't converge with an approximate >>> Jacobian. Is there any SNES solver I should try which doesn't use a >>> Jacobian? >>> >> >> Do you mean it does not converge with the default FD Jacobian? >> >> Matt >> >> >>> Thanks, >>> Harshad >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
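A minimal sketch of the workflow Matt describes, relying on the default finite-difference Jacobian by simply not registering a Jacobian routine. FormFunction, ctx and SolveWithDefaultJacobian are placeholder names, not code from this thread:

  #include <petscsnes.h>

  /* Residual routine supplied by the application (placeholder name). */
  extern PetscErrorCode FormFunction(SNES,Vec,Vec,void*);

  PetscErrorCode SolveWithDefaultJacobian(MPI_Comm comm,Vec x,void *ctx)
  {
    SNES           snes;
    Vec            r;
    PetscErrorCode ierr;

    ierr = SNESCreate(comm,&snes);CHKERRQ(ierr);
    ierr = VecDuplicate(x,&r);CHKERRQ(ierr);
    ierr = SNESSetFunction(snes,r,FormFunction,ctx);CHKERRQ(ierr);
    /* No SNESSetJacobian() call: per Matt's reply, SNES then falls back to a
       finite-difference approximation of the Jacobian.  It can also be
       requested explicitly on the command line with -snes_fd (dense FD
       Jacobian, small problems only) or -snes_mf (matrix-free differencing). */
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    return 0;
  }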
URL: From jed at jedbrown.org Tue Jun 2 16:45:54 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 02 Jun 2015 23:45:54 +0200 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> <87y4k1c244.fsf@jedbrown.org> Message-ID: <87k2vlbx7x.fsf@jedbrown.org> Justin Chang writes: > Last one with the svd did not work with the way the AMG PC was hard-coded > into FEniCS. Here's the list of preconditioners my installation of FEniCS > supports: > > Preconditioner | Description > --------------------------------------------------------------- > default | default preconditioner > ilu | Incomplete LU factorization > icc | Incomplete Cholesky factorization > sor | Successive over-relaxation > petsc_amg | PETSc algebraic multigrid > jacobi | Jacobi iteration > bjacobi | Block Jacobi iteration > additive_schwarz | Additive Schwarz > amg | Algebraic multigrid > hypre_amg | Hypre algebraic multigrid (BoomerAMG) > hypre_euclid | Hypre parallel incomplete LU factorization > hypre_parasails | Hypre parallel sparse approximate inverse > none | No preconditioner Wrapping an extensible dynamic system with a static maintenance burden is pretty much the canonical example of bad design. I used to keep a patched version of Dolfin so I could bypass this stupid table. The table can and should be constructed dynamically using const char **pcs; int npcs; PetscFunctionListGet(PCList,&pcs,&npcs); Anyway, are you sure this wasn't fixed so that you can pass PETSc options directly? If you can get commands to PETSc, but just can't change the preconditioner, then try -ksp_view_mat binary (for the tiny problem) and open the resulting matrix "binaryoutput" in octave/matlab/scipy. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jed at jedbrown.org Tue Jun 2 17:00:30 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 03 Jun 2015 00:00:30 +0200 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: References: Message-ID: <87h9qpbwjl.fsf@jedbrown.org> Eduardo writes: > I am solving a FEM solid mechanics linear elasticity model, for now the > only problem is the mesh that has needle-shaped and very flat elements. Why does it have such elements? Is the material highly anisotropic (e.g., fiber)? Is the geometry anisotropic (shell structures discretized using volumes)? Is it just a low-quality mesh? As Mark says, thresholding is important for AMG to solve anisotropic problems, but many of the discretizations used in solid mechanics will obscure the anisotropy from the usual strength of connection measures. -------------- next part -------------- A non-text attachment was scrubbed... 
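For reference, the strength-of-connection thresholds that Jed and Mark mention are exposed as run-time options; the values below are illustrative starting points only, not recommendations:

  # PETSc's smoothed-aggregation AMG: drop tolerance for weak graph edges
  -pc_type gamg -pc_gamg_threshold 0.04

  # hypre BoomerAMG: strong-coupling threshold (0.25 is the default, as seen
  # in the -ksp_view output earlier in this thread)
  -pc_type hypre -pc_hypre_boomeramg_strong_threshold 0.5

With a larger strong threshold fewer connections are classified as strong, which changes how the hierarchy coarsens; as noted in the thread, the right value is problem dependent and worth experimenting with.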
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jychang48 at gmail.com Tue Jun 2 17:12:05 2015 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 2 Jun 2015 17:12:05 -0500 Subject: [petsc-users] Modified Taylor-Hood elements with piece-wise constant pressure for Stokes equation In-Reply-To: <87k2vlbx7x.fsf@jedbrown.org> References: <877frpili3.fsf@jedbrown.org> <87a8wkgs5b.fsf@jedbrown.org> <87pp5ffq9c.fsf@jedbrown.org> <876176ekwl.fsf@jedbrown.org> <556D9F18.2070107@imperial.ac.uk> <87lhg2ce5q.fsf@jedbrown.org> <556DD482.2050803@imperial.ac.uk> <87fv6acc9q.fsf@jedbrown.org> <556DDEFE.1030301@imperial.ac.uk> <871thuc6rp.fsf@jedbrown.org> <87y4k1c244.fsf@jedbrown.org> <87k2vlbx7x.fsf@jedbrown.org> Message-ID: Actually, something is wrong. The velocity solutions are correct, but the pressure solution has gone awry (see attached, left uses MTH whereas right is TH). Don't know why I didn't catch this earlier, I may need to consult to the FEniCS guys for this On Tue, Jun 2, 2015 at 4:45 PM, Jed Brown wrote: > Justin Chang writes: > > > Last one with the svd did not work with the way the AMG PC was hard-coded > > into FEniCS. Here's the list of preconditioners my installation of FEniCS > > supports: > > > > Preconditioner | Description > > --------------------------------------------------------------- > > default | default preconditioner > > ilu | Incomplete LU factorization > > icc | Incomplete Cholesky factorization > > sor | Successive over-relaxation > > petsc_amg | PETSc algebraic multigrid > > jacobi | Jacobi iteration > > bjacobi | Block Jacobi iteration > > additive_schwarz | Additive Schwarz > > amg | Algebraic multigrid > > hypre_amg | Hypre algebraic multigrid (BoomerAMG) > > hypre_euclid | Hypre parallel incomplete LU factorization > > hypre_parasails | Hypre parallel sparse approximate inverse > > none | No preconditioner > > Wrapping an extensible dynamic system with a static maintenance burden > is pretty much the canonical example of bad design. > > I used to keep a patched version of Dolfin so I could bypass this stupid > table. The table can and should be constructed dynamically using > > const char **pcs; > int npcs; > PetscFunctionListGet(PCList,&pcs,&npcs); > > Anyway, are you sure this wasn't fixed so that you can pass PETSc > options directly? > > If you can get commands to PETSc, but just can't change the > preconditioner, then try > > -ksp_view_mat binary > > (for the tiny problem) and open the resulting matrix "binaryoutput" in > octave/matlab/scipy. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: stokes_iter_pressure.png Type: image/png Size: 108344 bytes Desc: not available URL: From xzhao99 at gmail.com Tue Jun 2 18:00:57 2015 From: xzhao99 at gmail.com (Xujun Zhao) Date: Tue, 2 Jun 2015 18:00:57 -0500 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: <87vbf5c0wr.fsf@jedbrown.org> References: <87vbf5c0wr.fsf@jedbrown.org> Message-ID: Hi Jed, Here is my problem: I want to evaluate a vector u = B*dw where B and dw are a matrix and a stochastic vector. However, B = D^(-1/2) in which D is not explicitly assembled, so it is expensive to directly evaluate B. One solution is to make a Chebyshev approximation on B w.r.t. D, which is B = sum(c_k*D_k) Then, the problem becomes u = B*dw = sum(c_k*y_k) where y_k = D_k*dw can be obtained from my solver. 
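As an illustration of how the coefficients depend on the eigenvalue bounds, here is the classical Chebyshev interpolation formula for f(x) = x^(-1/2) written out in C; this is a sketch of one common construction, not necessarily the exact scheme used above (lmin and lmax stand for the approximate extreme eigenvalues of D, N for the number of terms):

  #include <math.h>

  /* Chebyshev interpolation coefficients of f(x) = x^(-1/2) on [lmin, lmax].
     Illustrative only; when the series is summed, the k = 0 coefficient is
     conventionally halved. */
  static void cheb_coeffs(double lmin,double lmax,int N,double *c)
  {
    const double PI = 3.14159265358979323846;
    int          j,k;
    for (k = 0; k < N; k++) {
      double sum = 0.0;
      for (j = 0; j < N; j++) {
        double theta = PI*(j + 0.5)/N;                                   /* Chebyshev node angle */
        double x     = 0.5*(lmax + lmin) + 0.5*(lmax - lmin)*cos(theta); /* node mapped to [lmin,lmax] */
        sum += pow(x,-0.5)*cos(k*theta);
      }
      c[k] = (2.0/N)*sum;
    }
  }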
Note c_k is the coefficient that is a function of approximate(not exact) max and min eigenvalues of D matrix. So I need an approximate range [ lambda_min, lambda_max ] to calculate c_k. If this range is accurate, then Chebyshev approximation can converge faster, otherwise may be slow or even never. Xujun On Tue, Jun 2, 2015 at 3:26 PM, Jed Brown wrote: > Xujun Zhao writes: > > > I need to evaluate the max and min eigenvalues of a matrix > > You need the min and max eigenvalues of the preconditioner operator. > But note that Chebyshev is not usually used as a stand-alone solver, but > rather as a smoother for multigrid or occasionally as a polynomial > preconditioner to control storage requirements or orthogonalization > issues. One exception is if your problem is fairly well-conditioned > with an evenly spaced spectrum. > > In general, estimating the largest eigenvalue is relatively inexpensive, > but computing the smallest to even moderate accuracy is as expensive as > solving the linear system. > > > https://scicomp.stackexchange.com/questions/34/how-can-i-estimate-the-condition-number-of-a-large-sparse-matrix-using-petsc > > > when I make the Chebyshev polynomial approximation. Are there > > efficient ways to do this? Thank you very much. > > > > Xujun > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 2 22:21:52 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 2 Jun 2015 22:21:52 -0500 Subject: [petsc-users] SNESVINEWTONRSLS: definition of active set? In-Reply-To: References: <5ABC2C8A-1BE6-489C-BACD-E8D780CD205A@mcs.anl.gov> <9B83FB13-0727-4BB9-A7A8-A14EC1DC8FC0@mcs.anl.gov> Message-ID: <6D4F2C44-5078-4D00-981F-EEA0961EE9E5@mcs.anl.gov> Ozzy, Thanks for the update. I have changed the checking of the f() as suggested by the cited reference in master. Barry > On Jun 1, 2015, at 4:07 AM, Asbj?rn Nilsen Riseth wrote: > > Hi again Barry, > > I sorted out the jacobian issue, and made a comparison between the two definitions of the active set. > The active set with strict inequality takes the same or fewer Newton steps than the current petsc code. With a larger search space, I expect this to happen. snes_vi_monitor logs comparing the two are shown below. > > I could submit a pull request with the change, but we should probably consider: > 1) Whether this active set definition is consistent with newtonssls > 2) The linear systems to solve becomes larger, so for some cases the overall performance might not improve so much. > 3) For more flexibility, we could add an option to decide whether to use a strict inequality or not. This would sort out 1) and 2). > > I don't have much experience with the petsc codebase though, so adding options might take me some time. 
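For readers following the active-set discussion, a schematic of the two tests being compared (this is not the literal vi.c source; xl is the lower bound and f the residual at x):

  #include <petsc.h>

  /* Original test: index i is active (frozen) when x_i sits at the bound and
     f_i >= 0, i.e. it stays in the Newton (inactive) set only if the variable
     is off the bound or f_i < 0. */
  static PetscBool IsInactiveOriginal(PetscReal x,PetscReal xl,PetscReal f)
  {
    return (x > xl + 1.e-8 || f < 0.0) ? PETSC_TRUE : PETSC_FALSE;
  }

  /* Strict-inequality variant: i is active only when x_i is at the bound and
     f_i > 0, so f_i == 0 no longer freezes the variable. */
  static PetscBool IsInactiveStrict(PetscReal x,PetscReal xl,PetscReal f)
  {
    return (x > xl + 1.e-8 || f <= 0.0) ? PETSC_TRUE : PETSC_FALSE;
  }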
> > > Ozzy > > > __ log using my patch __ > 0 SNES VI Function norm 7.796491981333e+02 Active lower constraints 0/18 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 1 SNES VI Function norm 2.405400030748e+02 Active lower constraints 0/16 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 2 SNES VI Function norm 2.145739795389e+02 Active lower constraints 0/17 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 3 SNES VI Function norm 1.942498283668e+02 Active lower constraints 0/13 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 4 SNES VI Function norm 1.834306037299e+01 Active lower constraints 0/11 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 5 SNES VI Function norm 1.724597091463e+01 Active lower constraints 0/11 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 6 SNES VI Function norm 4.210027533399e-02 Active lower constraints 0/10 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > 7 SNES VI Function norm 3.014124871281e-07 Active lower constraints 0/10 upper constraints 0/0 Percent of total 0 Percent of bounded 0 > SNES Object:(firedrake_snes_0_) 1 MPI processes > type: vinewtonrsls > maximum iterations=20, maximum function evaluations=10000 > tolerances: relative=0, absolute=1e-06, solution=0 > total number of linear solver iterations=7 > total number of function evaluations=22 > norm schedule ALWAYS > SNESLineSearch Object: (firedrake_snes_0_) 1 MPI processes > type: l2 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=1 > KSP Object: (firedrake_snes_0_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (firedrake_snes_0_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5, needed 1.54545 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=36, cols=36 > package used to perform factorization: petsc > total: nonzeros=816, allocated nonzeros=816 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 15 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=36, cols=36 > total: nonzeros=528, allocated nonzeros=528 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > -------------------------------------------------- > > __ log using the original petsc code __ > 0 SNES VI Function norm 7.796491981333e+02 Active lower constraints 12/18 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 1 SNES VI Function norm 2.630718602212e+02 Active lower constraints 12/16 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 2 SNES VI Function norm 2.363417090057e+02 Active lower constraints 12/17 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 3 SNES VI Function norm 1.902271040685e+01 Active lower constraints 12/14 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 4 SNES VI Function norm 1.866193366410e+01 Active lower constraints 12/12 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 5 SNES VI Function norm 1.865568900723e+01 Active lower 
constraints 12/12 upper constraints 0/0 Percent of total 0.333333 Percent of bounded 0.333333 > 6 SNES VI Function norm 2.182461654877e+02 Active lower constraints 10/12 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 > 7 SNES VI Function norm 2.575010971279e-01 Active lower constraints 10/11 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 > 8 SNES VI Function norm 1.056372578821e-05 Active lower constraints 10/10 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 > 9 SNES VI Function norm 4.368019257866e-11 Active lower constraints 10/10 upper constraints 0/0 Percent of total 0.277778 Percent of bounded 0.277778 > SNES Object:(firedrake_snes_0_) 1 MPI processes > type: vinewtonrsls > maximum iterations=20, maximum function evaluations=10000 > tolerances: relative=0, absolute=1e-06, solution=0 > total number of linear solver iterations=9 > total number of function evaluations=28 > norm schedule ALWAYS > SNESLineSearch Object: (firedrake_snes_0_) 1 MPI processes > type: l2 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=1 > KSP Object: (firedrake_snes_0_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (firedrake_snes_0_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5, needed 1.57895 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=26, cols=26 > package used to perform factorization: petsc > total: nonzeros=420, allocated nonzeros=420 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 11 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=26, cols=26 > total: nonzeros=266, allocated nonzeros=266 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > > On Sat, 30 May 2015 at 01:07 Asbj?rn Nilsen Riseth wrote: > Hey Barry, > > thanks for the offer to have a look at the code. I ran SNESTEST today: user-defined failed, 1.0 and -1.0 seemed to work fine. My first step will have to be to find out what's wrong with my jacobian. If I've still got issues after that, I'll try to set up an easy-to-experiment code > > The code is a DG0 FVM formulation set up in Firedrake (a "fork" of FEniCS). I was assuming UFL would sort the Jacobian for me. > Lesson learnt: always do a SNESTEST. > > > Ozzy > > On Fri, 29 May 2015 at 19:21 Barry Smith wrote: > > > On May 29, 2015, at 4:19 AM, Asbj?rn Nilsen Riseth wrote: > > > > Barry, > > > > I changed the code, but it only makes a difference at the order of 1e-10 - so that's not the cause. I've attached (if that's appropriate?) the patch in case anyone is interested. > > > > Investigating the function values, I see that the first Newton step goes towards the expected solution for my function values. Then it shoots back close to the initial conditions. > > When does it "shoot back close to the initial conditions"? At the second Newton step? If so is the residual norm still smaller at the second step? > > > At the time the solver hits tolerance for the inactive set; the function value is 100-1000 at some of the active set indices. 
> > Reducing the timestep by an order of magnitude shows the same behavior for the first two timesteps. > > > > Maybe VI is not the right approach. The company I work with seem to just be projecting negative values. > > The VI solver is essentially just a "more sophisticated" version of "projecting negative values" so should not work worse then an ad hoc method and should generally work better (sometimes much better). > > Is your code something simple you could email me to play with or is it a big application that would be hard for me to experiment with? > > > > Barry > > > > > I'll investigate further. > > > > Ozzy > > > > > > On Thu, 28 May 2015 at 20:26 Barry Smith wrote: > > > > Ozzy, > > > > I cannot say why it is implemented as >= (could be in error). Just try changing the PETSc code (just run make gnumake in the PETSc directory after you change the source to update the library) and see how it affects your code run. > > > > Barry > > > > > On May 28, 2015, at 3:15 AM, Asbj?rn Nilsen Riseth wrote: > > > > > > Dear PETSc developers, > > > > > > Is the active set in NewtonRSLS defined differently from the reference* you give in the documentation on purpose? > > > The reference defines the active set as: > > > x_i = 0 and F_i > 0, > > > whilst the PETSc code defines it as x_i = 0 and F_i >= 0 (vi.c: 356) : > > > !((PetscRealPart(x[i]) > PetscRealPart(xl[i]) + 1.e-8 || (PetscRealPart(f[i]) < 0.0) > > > So PETSc freezes the variables if f[i] == 0. > > > > > > I've been using the Newton RSLS method to ensure positivity in a subsurface flow problem I'm working on. My solution stays almost constant for two timesteps (seemingly independent of the size of the timestep), before it goes towards the expected solution. > > > From my initial conditions, certain variables are frozen because x_i = 0 and f[i] = 0, and I was wondering if that could be the cause of my issue. > > > > > > > > > *: > > > - T. S. Munson, and S. Benson. Flexible Complementarity Solvers for Large-Scale Applications, Optimization Methods and Software, 21 (2006). > > > > > > > > > Regards, > > > Ozzy > > > > <0001-Define-active-and-inactive-sets-correctly.patch> > From jroman at dsic.upv.es Wed Jun 3 02:24:23 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 3 Jun 2015 09:24:23 +0200 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: References: <87vbf5c0wr.fsf@jedbrown.org> Message-ID: <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> El 03/06/2015, a las 01:00, Xujun Zhao escribi?: > Hi Jed, > > Here is my problem: > > I want to evaluate a vector u = B*dw where B and dw are a matrix and a stochastic vector. However, B = D^(-1/2) in which D is not explicitly assembled, so it is expensive to directly evaluate B. One solution is to make a Chebyshev approximation on B w.r.t. D, which is > B = sum(c_k*D_k) > > Then, the problem becomes u = B*dw = sum(c_k*y_k) where y_k = D_k*dw can be obtained from my solver. > > Note c_k is the coefficient that is a function of approximate(not exact) max and min eigenvalues of D matrix. So I need an approximate range [ lambda_min, lambda_max ] to calculate c_k. If this range is accurate, then Chebyshev approximation can converge faster, otherwise may be slow or even never. > > Xujun > SLEPc does not support computing eigenvalues from both ends of the spectrum with the same call. You have to call EPSSolve() twice, one with EPS_LARGEST_REAL and the other with EPS_SMALLEST_REAL. 
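A minimal C sketch of the two-solve approach Jose describes, assuming the symmetric operator D is already available as a (possibly shell) Mat and that rough estimates are enough for the Chebyshev coefficients; error handling and the EPSGetConverged() check are omitted for brevity.

#include <slepceps.h>

/* Bracket the spectrum of a symmetric operator D with two solves. */
PetscErrorCode EstimateSpectrum(Mat D, PetscReal *lmin, PetscReal *lmax)
{
  EPS         eps;
  PetscScalar kr, ki;

  EPSCreate(PetscObjectComm((PetscObject)D), &eps);
  EPSSetOperators(eps, D, NULL);
  EPSSetProblemType(eps, EPS_HEP);            /* D is symmetric */
  EPSSetTolerances(eps, 1e-3, PETSC_DEFAULT); /* rough estimates suffice here */

  EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL);
  EPSSolve(eps);
  EPSGetEigenvalue(eps, 0, &kr, &ki);         /* check EPSGetConverged() first in real code */
  *lmax = PetscRealPart(kr);

  EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL);
  EPSSolve(eps);
  EPSGetEigenvalue(eps, 0, &kr, &ki);
  *lmin = PetscRealPart(kr);

  EPSDestroy(&eps);
  return 0;
}
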
Jose From francesco.caimmi at polimi.it Wed Jun 3 09:37:04 2015 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Wed, 3 Jun 2015 16:37:04 +0200 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: <87a8wicaxy.fsf@jedbrown.org> References: <34150389.MoinacE02k@wotan> <87a8wicaxy.fsf@jedbrown.org> Message-ID: <1460799.1zrT7xRHbX@pc-fcaimmi> Thank you all for your suggestions! It looks like I will have to re-read the initial parts of the manual to better understand the assembly operations. However I am still a bit puzzled by the fact that doing #### start, end = global_vec.getOwnershipRange() with global_vec as v: for i in xrange(end-start): v[i] = 5.0*rank viewer = PETSc.Viewer.DRAW(global_vec.comm) global_vec.view(viewer) #### or doing #### start, end = global_vec.getOwnershipRange() for i in xrange(start,end): global_vec[i] = 5.0*rank viewer = PETSc.Viewer.DRAW(global_vec.comm) global_vec.view(viewer) #### actually I get the _same_ output, which, btw, is the intended output, as far as I can tell from the output of ex2.c (after compiling it). This is true regardless of the number of processors (up to three). Based on what you said I expected the second piece of code to fail. Have I misunderstood something? -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica ?Giulio Natta? P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) GPG Public Key : http://goo.gl/64dDo On Tuesday June 2 2015 at 18:49 Jed Brown wrote: > Matthew Knepley writes: > >> for i in xrange(start,end): > >> global_vec[i] = 5.0*rank > > > > No, you would need xrange(end-start): > Specifically, Python made a decision about indexing semantics that, > while not irrational, really sucks for distributed array computing. I'm > not aware of a clean way to give you nice syntax. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jun 3 10:08:33 2015 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 3 Jun 2015 11:08:33 -0400 Subject: [petsc-users] Convergence of iterative linear solver In-Reply-To: <87h9qpbwjl.fsf@jedbrown.org> References: <87h9qpbwjl.fsf@jedbrown.org> Message-ID: On Tue, Jun 2, 2015 at 6:00 PM, Jed Brown wrote: > Eduardo writes: > > > I am solving a FEM solid mechanics linear elasticity model, for now the > > only problem is the mesh that has needle-shaped and very flat elements. > > Why does it have such elements? Is the material highly anisotropic > (e.g., fiber)? Is the geometry anisotropic (shell structures > discretized using volumes)? Is it just a low-quality mesh? > > As Mark says, thresholding is important for AMG to solve anisotropic > problems, but many of the discretizations used in solid mechanics will > obscure the anisotropy from the usual strength of connection measures. > Yes, but the main point of thresholding is that it governs the rate of coarsening. If you coarsen "too" fast convergence rates deteriorate, if you coarsen "too" slow the cost if each iterations gets (very) high. This is problem dependant and the defaults can be very bad for some problems. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xzhao99 at gmail.com Wed Jun 3 10:47:37 2015 From: xzhao99 at gmail.com (Xujun Zhao) Date: Wed, 3 Jun 2015 10:47:37 -0500 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> References: <87vbf5c0wr.fsf@jedbrown.org> <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> Message-ID: Hi Jose, Thank you for your reply. How about the computational cost compared to one EPSSolve() with all eigenvalues? what methods does SLEPc use for each solve? Because it may be cheaper for largest eigenvalue if the power method is used, but I don't if it is still so for smallest eigenvalue? Xujun On Wed, Jun 3, 2015 at 2:24 AM, Jose E. Roman wrote: > > El 03/06/2015, a las 01:00, Xujun Zhao escribi?: > > > Hi Jed, > > > > Here is my problem: > > > > I want to evaluate a vector u = B*dw where B and dw are a matrix and a > stochastic vector. However, B = D^(-1/2) in which D is not explicitly > assembled, so it is expensive to directly evaluate B. One solution is to > make a Chebyshev approximation on B w.r.t. D, which is > > B = sum(c_k*D_k) > > > > Then, the problem becomes u = B*dw = sum(c_k*y_k) where y_k = D_k*dw > can be obtained from my solver. > > > > Note c_k is the coefficient that is a function of approximate(not exact) > max and min eigenvalues of D matrix. So I need an approximate range [ > lambda_min, lambda_max ] to calculate c_k. If this range is accurate, then > Chebyshev approximation can converge faster, otherwise may be slow or even > never. > > > > Xujun > > > > SLEPc does not support computing eigenvalues from both ends of the > spectrum with the same call. You have to call EPSSolve() twice, one with > EPS_LARGEST_REAL and the other with EPS_SMALLEST_REAL. > > Jose > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Jun 3 10:59:06 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 3 Jun 2015 17:59:06 +0200 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: References: <87vbf5c0wr.fsf@jedbrown.org> <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> Message-ID: <0F4D5291-AE02-41C6-877A-3F171316D6DC@dsic.upv.es> El 03/06/2015, a las 17:47, Xujun Zhao escribi?: > Hi Jose, > > Thank you for your reply. How about the computational cost compared to one EPSSolve() with all eigenvalues? what methods does SLEPc use for each solve? Because it may be cheaper for largest eigenvalue if the power method is used, but I don't if it is still so for smallest eigenvalue? > > Xujun > Don't compute all eigenvalues. For the largest eigenvalue, don't use the power iteration. The default solver (Krylov-Schur) will be very fast for that. For the smallest eigenvalue, convergence may be slow if eigenvalues are small and poorly separated - it may be necessary to do shift-and-invert, in which case the cost may blow up. Jose From xzhao99 at gmail.com Wed Jun 3 11:04:33 2015 From: xzhao99 at gmail.com (Xujun Zhao) Date: Wed, 3 Jun 2015 11:04:33 -0500 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: <0F4D5291-AE02-41C6-877A-3F171316D6DC@dsic.upv.es> References: <87vbf5c0wr.fsf@jedbrown.org> <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> <0F4D5291-AE02-41C6-877A-3F171316D6DC@dsic.upv.es> Message-ID: One problem is that I don't have the explicit form of matrix D, but only u = D*v which can be obtained from my PETSc solver. How should I set up my EPSSolver in this case? Are there examples in SLEPc? 
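One possible setup, as a hedged sketch rather than actual application code: wrap the solver-applied product u = D*v in a MATSHELL and hand that to the EPS. The AppCtx contents and the body of the multiply callback below are placeholders; the tutorials linked in the reply further down (ex3.c, ex6f.F) show the full pattern.

#include <slepceps.h>

typedef struct { void *solver; } AppCtx;  /* whatever is needed to apply u = D*v */

static PetscErrorCode MatMult_D(Mat A, Vec v, Vec u)
{
  AppCtx *ctx;
  MatShellGetContext(A, &ctx);
  /* ... call the existing PETSc solver here to compute u = D*v ... */
  return 0;
}

/* Wrap the matrix-free operator and hand it to SLEPc; n/N are local/global sizes. */
PetscErrorCode SolveWithShell(MPI_Comm comm, PetscInt n, PetscInt N, AppCtx *ctx)
{
  Mat D;
  EPS eps;

  MatCreateShell(comm, n, n, N, N, ctx, &D);
  MatShellSetOperation(D, MATOP_MULT, (void (*)(void))MatMult_D);

  EPSCreate(comm, &eps);
  EPSSetOperators(eps, D, NULL);
  EPSSetProblemType(eps, EPS_HEP);
  EPSSetFromOptions(eps);   /* e.g. -eps_largest_real or -eps_smallest_real */
  EPSSolve(eps);

  EPSDestroy(&eps);
  MatDestroy(&D);
  return 0;
}
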
Thanks. Xujun On Wed, Jun 3, 2015 at 10:59 AM, Jose E. Roman wrote: > > El 03/06/2015, a las 17:47, Xujun Zhao escribi?: > > > Hi Jose, > > > > Thank you for your reply. How about the computational cost compared to > one EPSSolve() with all eigenvalues? what methods does SLEPc use for each > solve? Because it may be cheaper for largest eigenvalue if the power method > is used, but I don't if it is still so for smallest eigenvalue? > > > > Xujun > > > > Don't compute all eigenvalues. > > For the largest eigenvalue, don't use the power iteration. The default > solver (Krylov-Schur) will be very fast for that. For the smallest > eigenvalue, convergence may be slow if eigenvalues are small and poorly > separated - it may be necessary to do shift-and-invert, in which case the > cost may blow up. > > Jose > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xzhao99 at gmail.com Wed Jun 3 11:08:17 2015 From: xzhao99 at gmail.com (Xujun Zhao) Date: Wed, 3 Jun 2015 11:08:17 -0500 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: References: <87vbf5c0wr.fsf@jedbrown.org> <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> <0F4D5291-AE02-41C6-877A-3F171316D6DC@dsic.upv.es> Message-ID: Does shell matrix work? On Wed, Jun 3, 2015 at 11:04 AM, Xujun Zhao wrote: > One problem is that I don't have the explicit form of matrix D, but only u > = D*v which can be obtained from my PETSc solver. How should I set up my > EPSSolver in this case? Are there examples in SLEPc? Thanks. > > Xujun > > On Wed, Jun 3, 2015 at 10:59 AM, Jose E. Roman wrote: > >> >> El 03/06/2015, a las 17:47, Xujun Zhao escribi?: >> >> > Hi Jose, >> > >> > Thank you for your reply. How about the computational cost compared to >> one EPSSolve() with all eigenvalues? what methods does SLEPc use for each >> solve? Because it may be cheaper for largest eigenvalue if the power method >> is used, but I don't if it is still so for smallest eigenvalue? >> > >> > Xujun >> > >> >> Don't compute all eigenvalues. >> >> For the largest eigenvalue, don't use the power iteration. The default >> solver (Krylov-Schur) will be very fast for that. For the smallest >> eigenvalue, convergence may be slow if eigenvalues are small and poorly >> separated - it may be necessary to do shift-and-invert, in which case the >> cost may blow up. >> >> Jose >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Jun 3 11:09:31 2015 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 3 Jun 2015 18:09:31 +0200 Subject: [petsc-users] estimation of max and min eigenvalues in SLEPc In-Reply-To: References: <87vbf5c0wr.fsf@jedbrown.org> <2430BD38-CC10-4F02-858C-7E7632697B2D@dsic.upv.es> <0F4D5291-AE02-41C6-877A-3F171316D6DC@dsic.upv.es> Message-ID: <9DE42052-605D-4D76-A510-E8EFB6616D2C@dsic.upv.es> El 03/06/2015, a las 18:04, Xujun Zhao escribi?: > One problem is that I don't have the explicit form of matrix D, but only u = D*v which can be obtained from my PETSc solver. How should I set up my EPSSolver in this case? Are there examples in SLEPc? Thanks. > > Xujun http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex3.c.html http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex6f.F.html > > On Wed, Jun 3, 2015 at 10:59 AM, Jose E. Roman wrote: > > El 03/06/2015, a las 17:47, Xujun Zhao escribi?: > > > Hi Jose, > > > > Thank you for your reply. 
How about the computational cost compared to one EPSSolve() with all eigenvalues? what methods does SLEPc use for each solve? Because it may be cheaper for largest eigenvalue if the power method is used, but I don't if it is still so for smallest eigenvalue? > > > > Xujun > > > > Don't compute all eigenvalues. > > For the largest eigenvalue, don't use the power iteration. The default solver (Krylov-Schur) will be very fast for that. For the smallest eigenvalue, convergence may be slow if eigenvalues are small and poorly separated - it may be necessary to do shift-and-invert, in which case the cost may blow up. > > Jose > > From mrosso at uci.edu Wed Jun 3 13:30:55 2015 From: mrosso at uci.edu (Michele Rosso) Date: Wed, 03 Jun 2015 11:30:55 -0700 Subject: [petsc-users] Petsc without debugging enabled Message-ID: <1433356255.17390.3.camel@kolmog5> Hi, I am performing some timing runs with PETSc. I think I correctly compiled it with no debug mode, yet -log_summary gives me a warning: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## Here are my configure options ( from -log_summary ): Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx Libraries compiled on Wed Jun 3 12:14:19 2015 on h2ologin2 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 Using PETSc arch: gnu-opt-32idx ----------------------------------------- Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include 
-I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl What am I doing wrong? Thanks, Michele -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 3 13:50:13 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 Jun 2015 13:50:13 -0500 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: <1460799.1zrT7xRHbX@pc-fcaimmi> References: <34150389.MoinacE02k@wotan> <87a8wicaxy.fsf@jedbrown.org> <1460799.1zrT7xRHbX@pc-fcaimmi> Message-ID: On Wed, Jun 3, 2015 at 9:37 AM, Francesco Caimmi wrote: > Thank you all for your suggestions! > > > > It looks like I will have to re-read the initial parts of the manual to > better understand the assembly operations. > > > > However I am still a bit puzzled by the fact that doing > > > > #### > > start, end = global_vec.getOwnershipRange() > > > > with global_vec as v: > > for i in xrange(end-start): > > v[i] = 5.0*rank > > > > viewer = PETSc.Viewer.DRAW(global_vec.comm) > > global_vec.view(viewer) > > #### > > > > or doing > > > > #### > > start, end = global_vec.getOwnershipRange() > > > > for i in xrange(start,end): > > global_vec[i] = 5.0*rank > It looks like Lisandro has overloaded the [] operator to translate from local to global indices. I don't think I would recommend this since it would lock the vector for every entry. The code above is a direct numpy array access. Thanks, Matt > > > viewer = PETSc.Viewer.DRAW(global_vec.comm) > > global_vec.view(viewer) > > #### > > > > actually I get the _same_ output, which, btw, is the intended output, as > far as I can tell from the output of ex2.c (after compiling it). This is > true regardless of the number of processors (up to three). > > Based on what you said I expected the second piece of code to fail. Have I > misunderstood something? > > > > > > -- > > Francesco Caimmi > > Laboratorio di Ingegneria dei Polimeri > > http://www.chem.polimi.it/polyenglab/ > > > > Politecnico di Milano - Dipartimento di Chimica, > > Materiali e Ingegneria Chimica ?Giulio Natta? > > > > P.zza Leonardo da Vinci, 32 > > I-20133 Milano > > Tel. +39.02.2399.4711 > > Fax +39.02.7063.8173 > > > > francesco.caimmi at polimi.it > > Skype: fmglcaimmi (please arrange meetings by e-mail) > > GPG Public Key : http://goo.gl/64dDo > > On Tuesday June 2 2015 at 18:49 Jed Brown wrote: > > > > > Matthew Knepley writes: > > > >> for i in xrange(start,end): > > > >> global_vec[i] = 5.0*rank > > > > > > > > No, you would need xrange(end-start): > > > Specifically, Python made a decision about indexing semantics that, > > > while not irrational, really sucks for distributed array computing. I'm > > > not aware of a clean way to give you nice syntax. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Wed Jun 3 13:50:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 3 Jun 2015 13:50:50 -0500 Subject: [petsc-users] Petsc without debugging enabled In-Reply-To: <1433356255.17390.3.camel@kolmog5> References: <1433356255.17390.3.camel@kolmog5> Message-ID: <7C497DE5-0028-4B0A-8D2D-AB90F5351BB7@mcs.anl.gov> Though you turned on various compiler optimizations you did not turn off the "extra" PETSc error checking that is enabled by default. For optimized runs you should also use the argument --with-debugging=0 Barry > On Jun 3, 2015, at 1:30 PM, Michele Rosso wrote: > > Hi, > > I am performing some timing runs with PETSc. I think I correctly compiled it with no debug mode, yet -log_summary gives me a warning: > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > Here are my configure options ( from -log_summary ): > > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx > > Libraries compiled on Wed Jun 3 12:14:19 2015 on h2ologin2 > Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 > Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 > Using PETSc arch: gnu-opt-32idx > ----------------------------------------- > > Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include > ----------------------------------------- > > Using C linker: cc > Using Fortran linker: ftn > Using libraries: 
-Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl > > What am I doing wrong? > > Thanks, > Michele > From mrosso at uci.edu Wed Jun 3 14:05:04 2015 From: mrosso at uci.edu (Michele Rosso) Date: Wed, 03 Jun 2015 12:05:04 -0700 Subject: [petsc-users] Petsc without debugging enabled In-Reply-To: <7C497DE5-0028-4B0A-8D2D-AB90F5351BB7@mcs.anl.gov> References: <1433356255.17390.3.camel@kolmog5> <7C497DE5-0028-4B0A-8D2D-AB90F5351BB7@mcs.anl.gov> Message-ID: <1433358304.17390.17.camel@kolmog5> Hi Barry, I think I did (see below in bold red): Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx On Wed, 2015-06-03 at 13:50 -0500, Barry Smith wrote: > Though you turned on various compiler optimizations you did not turn off the "extra" PETSc error checking that is enabled by default. For optimized runs you should also use the argument --with-debugging=0 > > Barry > > > > On Jun 3, 2015, at 1:30 PM, Michele Rosso wrote: > > > > Hi, > > > > I am performing some timing runs with PETSc. I think I correctly compiled it with no debug mode, yet -log_summary gives me a warning: > > > > ########################################################## > > # # > > # WARNING!!! # > > # # > > # This code was compiled with a debugging option, # > > # To get timing results run ./configure # > > # using --with-debugging=no, the performance will # > > # be generally two or three times faster. 
# > > # # > > ########################################################## > > > > Here are my configure options ( from -log_summary ): > > > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx > > > > Libraries compiled on Wed Jun 3 12:14:19 2015 on h2ologin2 > > Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 > > Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 > > Using PETSc arch: gnu-opt-32idx > > ----------------------------------------- > > > > Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} > > Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} > > ----------------------------------------- > > > > Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include > > ----------------------------------------- > > > > Using C linker: cc > > Using Fortran linker: ftn > > Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl > > > > What am I doing wrong? > > > > Thanks, > > Michele > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Wed Jun 3 14:23:57 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 3 Jun 2015 14:23:57 -0500 Subject: [petsc-users] Petsc without debugging enabled In-Reply-To: <1433358304.17390.17.camel@kolmog5> References: <1433356255.17390.3.camel@kolmog5> <7C497DE5-0028-4B0A-8D2D-AB90F5351BB7@mcs.anl.gov> <1433358304.17390.17.camel@kolmog5> Message-ID: <72EB7DD8-FDEE-4508-80CB-0322DB3931F9@mcs.anl.gov> Hmm, look in $PETSC_ARCH/include/petscconf.h and look for #ifndef PETSC_USE_DEBUG #define PETSC_USE_DEBUG 1 #endif is it there? This is what triggers the printing of that message with -log_summary and PETSC_USE_DEBUG is defined in config/PETSc/options/LibraryOptions.py based on self.debugging.debugging: which is set in config/BuildSystem/config/compilerFlags.py so I am totally lost how it could be printing that message with that choice of configure options. Did you run make all after you ran the ./configure ? Barry > On Jun 3, 2015, at 2:05 PM, Michele Rosso wrote: > > Hi Barry, > > I think I did (see below in bold red): > > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx > > > > On Wed, 2015-06-03 at 13:50 -0500, Barry Smith wrote: >> Though you turned on various compiler optimizations you did not turn off the "extra" PETSc error checking that is enabled by default. For optimized runs you should also use the argument --with-debugging=0 >> >> Barry >> >> >> >> > On Jun 3, 2015, at 1:30 PM, Michele Rosso wrote: >> > >> > Hi, >> > >> > I am performing some timing runs with PETSc. I think I correctly compiled it with no debug mode, yet -log_summary gives me a warning: >> > >> > ########################################################## >> > # # >> > # WARNING!!! # >> > # # >> > # This code was compiled with a debugging option, # >> > # To get timing results run ./configure # >> > # using --with-debugging=no, the performance will # >> > # be generally two or three times faster. 
# >> > # # >> > ########################################################## >> > >> > Here are my configure options ( from -log_summary ): >> > >> > Compiled without FORTRAN kernels >> > Compiled with full precision matrices (default) >> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx >> > >> > Libraries compiled on Wed Jun 3 12:14:19 2015 on h2ologin2 >> > Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 >> > Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 >> > Using PETSc arch: gnu-opt-32idx >> > ----------------------------------------- >> > >> > Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} >> > Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} >> > ----------------------------------------- >> > >> > Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include >> > ----------------------------------------- >> > >> > Using C linker: cc >> > Using Fortran linker: ftn >> > Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl >> > >> > What am I doing wrong? >> > >> > Thanks, >> > Michele >> > >> >> >> > From mrosso at uci.edu Wed Jun 3 14:31:16 2015 From: mrosso at uci.edu (Michele Rosso) Date: Wed, 03 Jun 2015 12:31:16 -0700 Subject: [petsc-users] Petsc without debugging enabled In-Reply-To: <72EB7DD8-FDEE-4508-80CB-0322DB3931F9@mcs.anl.gov> References: <1433356255.17390.3.camel@kolmog5> <7C497DE5-0028-4B0A-8D2D-AB90F5351BB7@mcs.anl.gov> <1433358304.17390.17.camel@kolmog5> <72EB7DD8-FDEE-4508-80CB-0322DB3931F9@mcs.anl.gov> Message-ID: <1433359876.17390.36.camel@kolmog5> Barry, nevermind. 
I am on a cray machine so I had to edit petscconf.h before running make: it turns out I must have accidentally modified PETSC_USE_DEBUG. Sorry for the trouble :-) Thanks, Michele On Wed, 2015-06-03 at 14:23 -0500, Barry Smith wrote: > Hmm, look in $PETSC_ARCH/include/petscconf.h and look for > > #ifndef PETSC_USE_DEBUG > #define PETSC_USE_DEBUG 1 > #endif > > is it there? This is what triggers the printing of that message with -log_summary and PETSC_USE_DEBUG is defined in config/PETSc/options/LibraryOptions.py based on > self.debugging.debugging: which is set in config/BuildSystem/config/compilerFlags.py so I am totally lost how it could be printing that message with that choice of configure options. Did you run make all after you ran the ./configure ? > > Barry > > > > On Jun 3, 2015, at 2:05 PM, Michele Rosso wrote: > > > > Hi Barry, > > > > I think I did (see below in bold red): > > > > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx > > > > > > > > On Wed, 2015-06-03 at 13:50 -0500, Barry Smith wrote: > >> Though you turned on various compiler optimizations you did not turn off the "extra" PETSc error checking that is enabled by default. For optimized runs you should also use the argument --with-debugging=0 > >> > >> Barry > >> > >> > >> > >> > On Jun 3, 2015, at 1:30 PM, Michele Rosso wrote: > >> > > >> > Hi, > >> > > >> > I am performing some timing runs with PETSc. I think I correctly compiled it with no debug mode, yet -log_summary gives me a warning: > >> > > >> > ########################################################## > >> > # # > >> > # WARNING!!! # > >> > # # > >> > # This code was compiled with a debugging option, # > >> > # To get timing results run ./configure # > >> > # using --with-debugging=no, the performance will # > >> > # be generally two or three times faster. 
# > >> > # # > >> > ########################################################## > >> > > >> > Here are my configure options ( from -log_summary ): > >> > > >> > Compiled without FORTRAN kernels > >> > Compiled with full precision matrices (default) > >> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >> > Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx > >> > > >> > Libraries compiled on Wed Jun 3 12:14:19 2015 on h2ologin2 > >> > Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 > >> > Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 > >> > Using PETSc arch: gnu-opt-32idx > >> > ----------------------------------------- > >> > > >> > Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} > >> > Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} > >> > ----------------------------------------- > >> > > >> > Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include > >> > ----------------------------------------- > >> > > >> > Using C linker: cc > >> > Using Fortran linker: ftn > >> > Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl > >> > > >> > What am I doing wrong? > >> > > >> > Thanks, > >> > Michele > >> > > >> > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
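As a small aside on the build question above, the following self-contained program only checks the PETSC_USE_DEBUG macro from petscconf.h that Barry mentions; it is a quick sketch of how to confirm which configuration an application is actually compiled against.

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscInitialize(&argc, &argv, NULL, NULL);
#if defined(PETSC_USE_DEBUG)
  PetscPrintf(PETSC_COMM_WORLD, "This build has PETSc debugging checks enabled (PETSC_USE_DEBUG is set).\n");
#else
  PetscPrintf(PETSC_COMM_WORLD, "Optimized build: PETSC_USE_DEBUG is not set.\n");
#endif
  PetscFinalize();
  return 0;
}
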
URL: From jychang48 at gmail.com Thu Jun 4 08:12:18 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 4 Jun 2015 08:12:18 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning Message-ID: Hello everyone, Apologies if this sounds like a newbie question, but I am attempting to play around with the gamg preconditioner for my anisotropic diffusion problem, but I have no idea how to "fine tune" the parameters such that I get better performance. I understand that this depends on both the material properties and the type of mesh used, but I guess for starters, here is what I am doing and some stuff I have noticed: - I am solving a unit cube with a small hole in the middle. The outside BC conditions are 0 and the inside is unity. I have a tensorial dispersion diffusivity (with constant velocity). I have 6 different sized grids to solve this problem on. The problem sizes range from 36K dofs to 1M dofs. I was able to solve all of them using the CG and Jacobi solver and preconditioner combination. - When I try to solve them using CG and GAMG (I did not set any other command line options) I seem to get slower wall clock times but with much fewer KSP iterations. I also notice that the FLOPS/s metric is much smaller. - For certain meshes, my CG/GAMG solver fails to converge after 2 iterations due to DIVERGED_INDEFINITE_PC. This does not happen when I solve this on one processor or with the CG/Jacobi solver. >From what I have read online and through these petsc-mailing lists, it sounds to me the gamg preconditioner will give better performance for nice elliptic problems like the one I am solving. When I saw the SNES ex12 test case 39 from builder.py, it only had -pc_type gamg. I am guessing that I need to set additional command line options? If so, where should I start? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 4 09:04:28 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 4 Jun 2015 09:04:28 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Thu, Jun 4, 2015 at 8:12 AM, Justin Chang wrote: > Hello everyone, > > Apologies if this sounds like a newbie question, but I am attempting to > play around with the gamg preconditioner for my anisotropic diffusion > problem, but I have no idea how to "fine tune" the parameters such that I > get better performance. I understand that this depends on both the material > properties and the type of mesh used, but I guess for starters, here is > what I am doing and some stuff I have noticed: > > - I am solving a unit cube with a small hole in the middle. The outside BC > conditions are 0 and the inside is unity. I have a tensorial dispersion > diffusivity (with constant velocity). I have 6 different sized grids to > solve this problem on. The problem sizes range from 36K dofs to 1M dofs. I > was able to solve all of them using the CG and Jacobi solver and > preconditioner combination. > > - When I try to solve them using CG and GAMG (I did not set any other > command line options) I seem to get slower wall clock times but with much > fewer KSP iterations. I also notice that the FLOPS/s metric is much smaller. > > - For certain meshes, my CG/GAMG solver fails to converge after 2 > iterations due to DIVERGED_INDEFINITE_PC. This does not happen when I solve > this on one processor or with the CG/Jacobi solver. 
> > From what I have read online and through these petsc-mailing lists, it > sounds to me the gamg preconditioner will give better performance for nice > elliptic problems like the one I am solving. When I saw the SNES ex12 test > case 39 from builder.py, it only had -pc_type gamg. I am guessing that I > need to set additional command line options? If so, where should I start? > Mark recommends starting with -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_threshold 0.02 # [0 - 0.1] #-mg_levels_ksp_type richardson -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor #-mg_levels_pc_type jacobi -mg_levels_ksp_max_it 2 # [1-8] and messing with the # stuff. 1) GAMG should definitely be more scalable than CG/ILU. You should be able to see this by plotting the time/iterates as a function of problem size. GAMG should have a constant number of iterates, whereas CG should grow. GAMG will have higher overheads, so for smaller problem sizes CG can be better. 2) The DIVERGED_INDEFINITE_PC can happen if the GAMG is non-symmetric due to different number of iterates or the subsolver choice. Try out richardson instead of chebyshev. 3) If your coefficient is highly anisotropic, GAMG may need help making good coarse basis vectors. Mark has some experimental stuff for this. Adjusting the thresholding parameter determines how fast you coarsen (you can see the coarse problem sizes in -ksp_view). If its too fast, convergence deteriorates, too slow and the coarse problems are expensive. Mark, how do you turn on heavy-edge matching? Thanks, Matt > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jun 4 10:31:44 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 4 Jun 2015 10:31:44 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: Yeah I saw his recommendation and am trying it out. But I am not sure what most of those parameters mean. For instance: 1) What does -pc_gamg_agg_nsmooths refer to? 2) Does increase in the threshold of -pc_gamg_threshold translate to making the coarsening faster? 3) What does -mg_levels_ksp_max_it refer to? I am not too worried about "bad" solutions from anisotropy (because it is driven by the velocity field, not the actual material property or mesh orientations) and we have a work-around for it (via optimization). I am more concerned about the time to solution especially for bigger problems. Thanks, Justin On Thu, Jun 4, 2015 at 9:04 AM, Matthew Knepley wrote: > On Thu, Jun 4, 2015 at 8:12 AM, Justin Chang wrote: > >> Hello everyone, >> >> Apologies if this sounds like a newbie question, but I am attempting to >> play around with the gamg preconditioner for my anisotropic diffusion >> problem, but I have no idea how to "fine tune" the parameters such that I >> get better performance. I understand that this depends on both the material >> properties and the type of mesh used, but I guess for starters, here is >> what I am doing and some stuff I have noticed: >> >> - I am solving a unit cube with a small hole in the middle. The outside >> BC conditions are 0 and the inside is unity. I have a tensorial dispersion >> diffusivity (with constant velocity). I have 6 different sized grids to >> solve this problem on. The problem sizes range from 36K dofs to 1M dofs. 
I >> was able to solve all of them using the CG and Jacobi solver and >> preconditioner combination. >> >> - When I try to solve them using CG and GAMG (I did not set any other >> command line options) I seem to get slower wall clock times but with much >> fewer KSP iterations. I also notice that the FLOPS/s metric is much smaller. >> >> - For certain meshes, my CG/GAMG solver fails to converge after 2 >> iterations due to DIVERGED_INDEFINITE_PC. This does not happen when I solve >> this on one processor or with the CG/Jacobi solver. >> >> From what I have read online and through these petsc-mailing lists, it >> sounds to me the gamg preconditioner will give better performance for nice >> elliptic problems like the one I am solving. When I saw the SNES ex12 test >> case 39 from builder.py, it only had -pc_type gamg. I am guessing that I >> need to set additional command line options? If so, where should I start? >> > > Mark recommends starting with > > -ksp_type cg > -pc_type gamg > -pc_gamg_agg_nsmooths 1 > -pc_gamg_threshold 0.02 # [0 - 0.1] > #-mg_levels_ksp_type richardson > -mg_levels_ksp_type chebyshev > -mg_levels_pc_type sor > #-mg_levels_pc_type jacobi > -mg_levels_ksp_max_it 2 # [1-8] > > and messing with the # stuff. > > 1) GAMG should definitely be more scalable than CG/ILU. You should be able > to see this by plotting the time/iterates as a function of problem size. > GAMG > should have a constant number of iterates, whereas CG should grow. > GAMG will have higher overheads, so for smaller problem sizes CG can be > better. > > 2) The DIVERGED_INDEFINITE_PC can happen if the GAMG is non-symmetric > due to different number of iterates or the subsolver choice. Try out > richardson instead of chebyshev. > > 3) If your coefficient is highly anisotropic, GAMG may need help making > good coarse basis vectors. Mark has some experimental stuff for this. > Adjusting the > thresholding parameter determines how fast you coarsen (you can see > the coarse problem sizes in -ksp_view). If its too fast, convergence > deteriorates, > too slow and the coarse problems are expensive. > > Mark, how do you turn on heavy-edge matching? > > Thanks, > > Matt > > >> Thanks, >> Justin >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesco.caimmi at polimi.it Thu Jun 4 10:42:41 2015 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Thu, 4 Jun 2015 17:42:41 +0200 Subject: [petsc-users] [petsc4py] dm/examples/tutorials/ex2 in python In-Reply-To: References: <34150389.MoinacE02k@wotan> <1460799.1zrT7xRHbX@pc-fcaimmi> Message-ID: <2133367.gixjQnghja@pc-fcaimmi> On Wednesday June 3 2015 at 13:50 Matthew Knepley wrote: > > #### > > > > start, end = global_vec.getOwnershipRange() > > > > > > > > for i in xrange(start,end): > > > > global_vec[i] = 5.0*rank > > It looks like Lisandro has overloaded the [] operator to translate from > local to global indices. I don't > think I would recommend this since it would lock the vector for every > entry. The code above is > a direct numpy array access. > > Thanks, > > Matt > Ok, I got it. Thank you for time and sorry for the basic question, but unfortunately there is very little petsc4py documentation available. 
Thank you again, -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica ?Giulio Natta? P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) GPG Public Key : http://goo.gl/64dDo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 4 11:29:35 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 4 Jun 2015 11:29:35 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Thu, Jun 4, 2015 at 10:31 AM, Justin Chang wrote: > Yeah I saw his recommendation and am trying it out. But I am not sure what > most of those parameters mean. For instance: > > 1) What does -pc_gamg_agg_nsmooths refer to? > This is always 1 (its the definition of smoothed aggregation). Mark allows 0 to support unsmoothed aggregation, which may be better for easy problems on extremely large machines. > 2) Does increase in the threshold of -pc_gamg_threshold translate to > making the coarsening faster? > Yes, I believe so (easy to check). > 3) What does -mg_levels_ksp_max_it refer to? > This sets the maximum number of iterations for the smoother on each level. Thanks, Matt I am not too worried about "bad" solutions from anisotropy (because it is > driven by the velocity field, not the actual material property or mesh > orientations) and we have a work-around for it (via optimization). I am > more concerned about the time to solution especially for bigger problems. > > Thanks, > Justin > > On Thu, Jun 4, 2015 at 9:04 AM, Matthew Knepley wrote: > >> On Thu, Jun 4, 2015 at 8:12 AM, Justin Chang wrote: >> >>> Hello everyone, >>> >>> Apologies if this sounds like a newbie question, but I am attempting to >>> play around with the gamg preconditioner for my anisotropic diffusion >>> problem, but I have no idea how to "fine tune" the parameters such that I >>> get better performance. I understand that this depends on both the material >>> properties and the type of mesh used, but I guess for starters, here is >>> what I am doing and some stuff I have noticed: >>> >>> - I am solving a unit cube with a small hole in the middle. The outside >>> BC conditions are 0 and the inside is unity. I have a tensorial dispersion >>> diffusivity (with constant velocity). I have 6 different sized grids to >>> solve this problem on. The problem sizes range from 36K dofs to 1M dofs. I >>> was able to solve all of them using the CG and Jacobi solver and >>> preconditioner combination. >>> >>> - When I try to solve them using CG and GAMG (I did not set any other >>> command line options) I seem to get slower wall clock times but with much >>> fewer KSP iterations. I also notice that the FLOPS/s metric is much smaller. >>> >>> - For certain meshes, my CG/GAMG solver fails to converge after 2 >>> iterations due to DIVERGED_INDEFINITE_PC. This does not happen when I solve >>> this on one processor or with the CG/Jacobi solver. >>> >>> From what I have read online and through these petsc-mailing lists, it >>> sounds to me the gamg preconditioner will give better performance for nice >>> elliptic problems like the one I am solving. When I saw the SNES ex12 test >>> case 39 from builder.py, it only had -pc_type gamg. I am guessing that I >>> need to set additional command line options? If so, where should I start? 
>>> >> >> Mark recommends starting with >> >> -ksp_type cg >> -pc_type gamg >> -pc_gamg_agg_nsmooths 1 >> -pc_gamg_threshold 0.02 # [0 - 0.1] >> #-mg_levels_ksp_type richardson >> -mg_levels_ksp_type chebyshev >> -mg_levels_pc_type sor >> #-mg_levels_pc_type jacobi >> -mg_levels_ksp_max_it 2 # [1-8] >> >> and messing with the # stuff. >> >> 1) GAMG should definitely be more scalable than CG/ILU. You should be >> able to see this by plotting the time/iterates as a function of problem >> size. GAMG >> should have a constant number of iterates, whereas CG should grow. >> GAMG will have higher overheads, so for smaller problem sizes CG can be >> better. >> >> 2) The DIVERGED_INDEFINITE_PC can happen if the GAMG is non-symmetric >> due to different number of iterates or the subsolver choice. Try out >> richardson instead of chebyshev. >> >> 3) If your coefficient is highly anisotropic, GAMG may need help making >> good coarse basis vectors. Mark has some experimental stuff for this. >> Adjusting the >> thresholding parameter determines how fast you coarsen (you can see >> the coarse problem sizes in -ksp_view). If its too fast, convergence >> deteriorates, >> too slow and the coarse problems are expensive. >> >> Mark, how do you turn on heavy-edge matching? >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Justin >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Jun 4 11:31:52 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 4 Jun 2015 12:31:52 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: > > > Mark, how do you turn on heavy-edge matching? > > -mat_coarsen_type hem Don't try this yet. Try the parameters as Matt suggested. > Thanks, > > Matt > > >> Thanks, >> Justin >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Jun 4 11:33:25 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 4 Jun 2015 12:33:25 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Thu, Jun 4, 2015 at 12:29 PM, Matthew Knepley wrote: > On Thu, Jun 4, 2015 at 10:31 AM, Justin Chang wrote: > >> Yeah I saw his recommendation and am trying it out. But I am not sure >> what most of those parameters mean. For instance: >> >> 1) What does -pc_gamg_agg_nsmooths refer to? >> > > This is always 1 (its the definition of smoothed aggregation). Mark allows > 0 to support unsmoothed aggregation, which may be > better for easy problems on extremely large machines. > > >> 2) Does increase in the threshold of -pc_gamg_threshold translate to >> making the coarsening faster? >> > > Yes, I believe so (easy to check). > Other way around. -------------- next part -------------- An HTML attachment was scrubbed... 
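For anyone who would rather hard-wire Matt's suggested starting point into their code than keep passing it on the command line, the options above map onto the API roughly as follows. This is only a sketch: "ksp" stands for an already-created KSP whose operators are set, and PetscOptionsSetValue is shown with the two-argument signature of the 3.4/3.5-era releases used in this thread (later releases take a leading PetscOptions argument).

    /* Sketch: CG with GAMG, smoothed aggregation, Chebyshev/SOR level smoothers, as recommended above. */
    PC             pc;
    PetscErrorCode ierr;

    ierr = PetscOptionsSetValue("-pc_gamg_agg_nsmooths","1");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue("-pc_gamg_threshold","0.02");CHKERRQ(ierr);     /* try values in [0, 0.1] */
    ierr = PetscOptionsSetValue("-mg_levels_ksp_type","chebyshev");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue("-mg_levels_pc_type","sor");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue("-mg_levels_ksp_max_it","2");CHKERRQ(ierr);     /* try 1-8 */
    ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCGAMG);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* -pc_gamg_* is read here, -mg_levels_* when the smoothers are set up */

Running with -ksp_view afterwards prints the size of each coarse level, which is the quickest way to see what changing -pc_gamg_threshold actually does.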
URL: From Carol.Brickley at awe.co.uk Thu Jun 4 12:32:22 2015 From: Carol.Brickley at awe.co.uk (Carol.Brickley at awe.co.uk) Date: Thu, 4 Jun 2015 17:32:22 +0000 Subject: [petsc-users] KSPSetComputeEigenvalues memory corruption Message-ID: <201506041732.t54HWVFW019827@msw2.awe.co.uk> All, I am using KSPSetComputeEigenvalues (in petsc 3.4.3) and getting a memory corruption problem. In the manual it says: "Currently this option is not valid for all iterative methods" Which methods and could this be why there is a memory corruption error message? Carol Dr Carol Brickley BSc,PhD,ARCS,DIC,MBCS Senior Software Engineer Applied Computer Science DS+T, AWE Aldermaston Reading Berkshire RG7 4PR Direct: 0118 9855035 ___________________________________________________ ____________________________ The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 4 12:34:53 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 4 Jun 2015 12:34:53 -0500 Subject: [petsc-users] KSPSetComputeEigenvalues memory corruption In-Reply-To: <201506041732.t54HWVFW019827@msw2.awe.co.uk> References: <201506041732.t54HWVFW019827@msw2.awe.co.uk> Message-ID: On Thu, Jun 4, 2015 at 12:32 PM, wrote: > All, > > > > I am using KSPSetComputeEigenvalues (in petsc 3.4.3) and getting a memory > corruption problem. In the manual it says: > > > > ?Currently this option is not valid for all iterative methods? > > > > Which methods and could this be why there is a memory corruption error > message? > Its only valid for Krylov methods that make a projected operator in the Krylov space that we can solve the eigenproblem for. The memory corruption likely has to do with your code, not the flag. Please run with valgrind to locate the problem. Thanks, Matt > Carol > > *Dr Carol Brickley * > > *BSc,PhD,ARCS,DIC,MBCS* > > > > *Senior Software Engineer* > > *Applied Computer Science* > > *DS+T,* > > *AWE* > > *Aldermaston* > > *Reading* > > *Berkshire* > > *RG7 4PR* > > > > *Direct: 0118 9855035* > > > > ___________________________________________________ > ____________________________ The information in this email and in any > attachment(s) is commercial in confidence. If you are not the named > addressee(s) or if you receive this email in error then any distribution, > copying or use of this communication or the information in it is strictly > prohibited. Please notify us immediately by email at admin.internet(at) > awe.co.uk, and then delete this message from your computer. While > attachments are virus checked, AWE plc does not accept any liability in > respect of any virus which is not detected. AWE Plc Registered in England > and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hus003 at ucsd.edu Thu Jun 4 13:24:09 2015 From: hus003 at ucsd.edu (Sun, Hui) Date: Thu, 4 Jun 2015 18:24:09 +0000 Subject: [petsc-users] on the performance of MPI PETSc Message-ID: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> Hello, I'm testing ex34.c under the examples of KSP. It's a multigrid 3D poisson solver. For 64^3 mesh, the time cost is 1s for 1 node with 12 cores; for 128^3 mesh, the time cost is 13s for 1 node with 12 cores, and the same for 2 nodes with 6 cores. For 256^3 mesh, I use 2 nodes with 12 cores, and time cost goes up to 726s. This doesn't seem right for I'm expecting O(N log(N)). I think it could be the memory bandwidth is not sufficient, and I need to do the bind-to-socket stuff. But I'm wondering what is the typical time cost for a 256^3 mesh, and then a 512^3 mesh? Please give me a rough idea. Thank you. Best, Hui -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 4 13:56:36 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 4 Jun 2015 13:56:36 -0500 Subject: [petsc-users] on the performance of MPI PETSc In-Reply-To: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> References: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> Message-ID: > On Jun 4, 2015, at 1:24 PM, Sun, Hui wrote: > > Hello, > > I'm testing ex34.c under the examples of KSP. It's a multigrid 3D poisson solver. > > For 64^3 mesh, the time cost is 1s for 1 node with 12 cores; for 128^3 mesh, the time cost is 13s for 1 node with 12 cores, and the same for 2 nodes with 6 cores. For 256^3 mesh, I use 2 nodes with 12 cores, and time cost goes up to 726s. This doesn't seem right for I'm expecting O(N log(N)). I think it could be the memory bandwidth is not sufficient, and I need to do the bind-to-socket stuff. > > But I'm wondering what is the typical time cost for a 256^3 mesh, and then a 512^3 mesh? Please give me a rough idea. Thank you. There is no way we can answer that. What we can say is that given the numbers you have for 64 and 128 meshes, on an appropriate machine you should get much better numbers than 726 seconds for a 256 mesh. You need to first run streams on your machine, http://www.mcs.anl.gov/petsc/documentation/faq.html#computers check up on the binding business and if you still get poor results prepare a report on what you did and send it to us. Barry > > Best, > Hui From mail2amneet at gmail.com Thu Jun 4 15:47:33 2015 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Thu, 4 Jun 2015 13:47:33 -0700 Subject: [petsc-users] PCASM subdomains Message-ID: Hi Folks, I have a basic question regarding matrix size as created by subdomains in PCASM. In the PCASMSetLocalSubdomains(PC pc,PetscInt n,IS is[],IS is_local[]) routine, I am defining is[] to be the local+overlapping DOFs and is_local [] to be the nonoverlapping local DOFs only. Now if I grab the subksp's from PCASM and print out the matrices, would the matrix size correspond to size of IS in is[] or in is_local[] dofs? Put in another words, do the subdomain matrices are defined for just local DOFs or with local+overlapping DOFs? Thanks, -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... 
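Returning for a moment to Carol's KSPSetComputeEigenvalues question earlier in this digest: the calling sequence being discussed looks roughly like the sketch below. Everything here is illustrative rather than taken from Carol's code -- "ksp", "b", "x" and the array length "nmax" are placeholders -- and, as Matt notes, it only makes sense for Krylov methods such as GMRES or CG that build a projected operator.

    /* Sketch: request eigenvalue estimates before the solve, retrieve them after. */
    PetscInt       nmax = 100, neig, i;
    PetscReal      re[100], im[100];               /* caller-allocated, at least nmax entries each */
    PetscErrorCode ierr;

    ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
    ierr = KSPSetComputeEigenvalues(ksp,PETSC_TRUE);CHKERRQ(ierr);  /* must be set before KSPSetUp()/KSPSolve() */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    ierr = KSPComputeEigenvalues(ksp,nmax,re,im,&neig);CHKERRQ(ierr);
    for (i = 0; i < neig; i++) {
      ierr = PetscPrintf(PETSC_COMM_WORLD,"eigenvalue %D: %g + %g i\n",i,(double)re[i],(double)im[i]);CHKERRQ(ierr);
    }

The re/im arrays belong to the caller and must hold at least nmax entries; KSPComputeEigenvalues fills in at most that many estimates (no more than the number of iterations actually taken), so undersized arrays are one easy way to corrupt memory around this call.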
URL: From knepley at gmail.com Thu Jun 4 16:11:33 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 4 Jun 2015 16:11:33 -0500 Subject: [petsc-users] PCASM subdomains In-Reply-To: References: Message-ID: On Thu, Jun 4, 2015 at 3:47 PM, Amneet Bhalla wrote: > > Hi Folks, > > I have a basic question regarding matrix size as created by subdomains in > PCASM. > > In the PCASMSetLocalSubdomains(PC pc,PetscInt n,IS is[],IS is_local[]) > routine, I am defining is[] to be the local+overlapping DOFs and is_local > [] to be the nonoverlapping local DOFs only. Now > if I grab the subksp's from PCASM and print out the matrices, would the > matrix size correspond to size of IS in is[] or in is_local[] dofs? Put in > another words, do the subdomain matrices are defined for just local DOFs or > with local+overlapping DOFs? > The matrices should definitely be for the overlapping division. If they are not, something is wrong. If they were the non-overlapping distribution, we would have block Jacobi. Thanks, Matt > Thanks, > -- > --Amneet > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jun 4 16:47:06 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 4 Jun 2015 16:47:06 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: Thank you Matt and Mark for the clarification. Matt, if you recall our discussion about calculating the arithmetic intensity from the earlier threads, it seems GAMG now has a myriad of all these additional vector and matrix operations that were not present in the CG/Jacobi case. Running with the command line options you and Mark suggested, I now have these additional operations to deal with: VecMDot VecAXPBYCZ VecMAXPY VecSetRandom VecNormalize MatMultAdd MatMultTranspose MatSolve MatConvert MatScale MatResidual MatCoarsen MatAXPY MatMatMult MatMatMultSym MatMatMultNum MatPtAP MatPtAPSymbolic PatPtAPNumeric MatTrnMatMult MatTrnMatMultSym MatTrnMatMultNum MatGetSymTrans KSPGMRESOrthog PCGAMGGraph_AGG PCGAMGCoarse_AGG PCGAMGProl_AGG PCGAMGPOpt_AGG GAMG: createProl and all of its associated events. GAMG: partLevel PCSetUpOnBlocks Attached is the output from -log_summary showing the exact counts for the case I am running. I have the following questions: 1) For the Vec operations VecMDot and VecMAXPY, it seems the estimation of total bytes transferred (TBT) relies on knowing how many vectors there are. Is there a way to figure this out? Or at least with gamg what would it be, three vectors? 2) It seems there are a lot of matrix manipulations and multiplications. Is it safe to say that the size and number of non zeroes is the same? Or will it change? 3) If I follow the TBT tabulation as in that paper you pointed me to, would MatMultTranspose follow the same formula if the Jacobian is symmetric? 4) How do I calculate anything that requires the multiplication of at least two matrices? 5) More importantly, are any of the above calculations necessary? Because log_summary seems to indicate that MatMult() has the greatest amount of workload and number of calls. My only hesitation is how much traffic MatMatMults may take (assuming I go off of the same assumptions as in that paper). 6) And/or, are there any other functions that I missed that might be important to calculate as well? 
Thanks, Justin On Thu, Jun 4, 2015 at 11:33 AM, Mark Adams wrote: > > > On Thu, Jun 4, 2015 at 12:29 PM, Matthew Knepley > wrote: > >> On Thu, Jun 4, 2015 at 10:31 AM, Justin Chang >> wrote: >> >>> Yeah I saw his recommendation and am trying it out. But I am not sure >>> what most of those parameters mean. For instance: >>> >>> 1) What does -pc_gamg_agg_nsmooths refer to? >>> >> >> This is always 1 (its the definition of smoothed aggregation). Mark >> allows 0 to support unsmoothed aggregation, which may be >> better for easy problems on extremely large machines. >> >> >>> 2) Does increase in the threshold of -pc_gamg_threshold translate to >>> making the coarsening faster? >>> >> >> Yes, I believe so (easy to check). >> > > Other way around. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ========================================== 1 processors: ========================================== TSTEP ANALYSIS TIME ITER FLOPS/s Linear solve converged due to CONVERGED_RTOL iterations 31 1 2.313901e+00 31 3.629168e+08 ========================================== Time summary: ========================================== Creating DMPlex: 0.212745s Distributing DMPlex: 0.000274897s Refining DMPlex: 1.1645s Setting up problem: 0.960611s Overall analysis time: 2.39205s Overall FLOPS/s: 2.60206e+08 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main on a arch-linux2-c-opt named compute-0-18.local with 1 processor, by jchang23 Thu Jun 4 16:26:23 2015 Using Petsc Development GIT revision: v3.5.4-3996-gc7ab56a GIT Date: 2015-06-04 06:26:21 -0500 Max Max/Min Avg Total Time (sec): 4.735e+00 1.00000 4.735e+00 Objects: 5.320e+02 1.00000 5.320e+02 Flops: 8.524e+08 1.00000 8.524e+08 8.524e+08 Flops/sec: 1.800e+08 1.00000 1.800e+08 1.800e+08 MPI Messages: 5.500e+00 1.00000 5.500e+00 5.500e+00 MPI Message Lengths: 2.218e+06 1.00000 4.032e+05 2.218e+06 MPI Reductions: 1.000e+00 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.7345e+00 100.0% 8.5241e+08 100.0% 5.500e+00 100.0% 4.032e+05 100.0% 1.000e+00 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage CreateMesh 1 1.0 1.3775e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 29 0 0 0 0 29 0 0 0 0 0 BuildTwoSided 5 1.0 2.0537e-03 1.0 0.00e+00 0.0 5.0e-01 4.0e+00 0.0e+00 0 0 9 0 0 0 0 9 0 0 0 VecView 1 1.0 1.3811e-02 1.0 3.62e+05 1.0 1.0e+00 4.9e+05 0.0e+00 0 0 18 22 0 0 0 18 22 0 26 VecMDot 112 1.0 2.6209e-03 1.0 1.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 5464 VecTDot 62 1.0 3.2454e-03 1.0 7.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2354 VecNorm 184 1.0 2.8832e-03 1.0 6.81e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2362 VecScale 152 1.0 5.1141e-04 1.0 1.43e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2804 VecCopy 171 1.0 1.4133e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 658 1.0 4.2362e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 106 1.0 2.6708e-03 1.0 8.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3007 VecAYPX 1054 1.0 1.1199e-02 1.0 2.45e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 2190 VecAXPBYCZ 512 1.0 6.9575e-03 1.0 4.17e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 5987 VecWAXPY 1 1.0 1.0610e-04 1.0 6.16e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 581 VecMAXPY 152 1.0 2.9120e-03 1.0 1.69e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 5813 VecAssemblyBegin 4 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 4 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 856 1.0 1.0922e-02 1.0 1.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1275 VecSetRandom 4 1.0 6.6662e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 152 1.0 1.7796e-03 1.0 4.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2417 MatMult 915 1.0 3.2228e-01 1.0 5.27e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 62 0 0 0 7 62 0 0 0 1637 MatMultAdd 128 1.0 2.0048e-02 1.0 2.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1028 MatMultTranspose 128 1.0 2.2771e-02 1.0 2.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 906 MatSolve 64 1.0 9.5367e-05 1.0 1.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1188 MatLUFactorSym 1 1.0 3.5048e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 2.0981e-05 1.0 1.76e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 837 MatConvert 4 1.0 1.0548e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 12 1.0 2.8198e-03 1.0 2.94e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1043 MatResidual 128 1.0 4.4002e-02 1.0 7.35e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 9 0 0 0 1 9 0 0 0 1670 MatAssemblyBegin 33 1.0 5.7220e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 33 1.0 1.8984e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 260352 1.0 
1.4122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 4.1962e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 4 1.0 3.9771e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 1.3030e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 1.1571e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMult 4 1.0 2.1907e-02 1.0 2.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 120 MatMatMultSym 4 1.0 1.5245e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultNum 4 1.0 6.6392e-03 1.0 2.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 394 MatPtAP 4 1.0 2.0476e-01 1.0 4.64e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 5 0 0 0 4 5 0 0 0 227 MatPtAPSymbolic 4 1.0 7.4399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.3035e-01 1.0 4.64e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 356 MatTrnMatMult 1 1.0 3.2565e-01 1.0 2.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 2 0 0 0 7 2 0 0 0 63 MatTrnMatMultSym 1 1.0 1.7446e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatTrnMatMultNum 1 1.0 1.5119e-01 1.0 2.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 2 0 0 0 3 2 0 0 0 136 MatGetSymTrans 5 1.0 5.9624e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexInterp 3 1.0 1.8401e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 DMPlexStratify 11 1.0 3.1623e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 DMPlexPrealloc 1 1.0 5.1204e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0 DMPlexResidualFE 1 1.0 3.5016e-01 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 2 0 0 0 7 2 0 0 0 60 DMPlexJacobianFE 1 1.0 8.6178e-01 1.0 4.22e+07 1.0 0.0e+00 0.0e+00 0.0e+00 18 5 0 0 0 18 5 0 0 0 49 SFSetGraph 6 1.0 1.0118e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastBegin 9 1.0 3.4070e-03 1.0 0.00e+00 0.0 4.5e+00 3.8e+05 0.0e+00 0 0 82 78 0 0 0 82 78 0 0 SFBcastEnd 9 1.0 4.9067e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFReduceBegin 1 1.0 2.2292e-04 1.0 0.00e+00 0.0 1.0e+00 4.9e+05 0.0e+00 0 0 18 22 0 0 0 18 22 0 0 SFReduceEnd 1 1.0 1.5783e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SNESFunctionEval 1 1.0 3.5272e-01 1.0 2.09e+07 1.0 2.0e+00 4.9e+05 0.0e+00 7 2 36 44 0 7 2 36 44 0 59 SNESJacobianEval 1 1.0 8.6338e-01 1.0 4.22e+07 1.0 2.5e+00 3.0e+05 0.0e+00 18 5 45 33 0 18 5 45 33 0 49 KSPGMRESOrthog 112 1.0 5.1789e-03 1.0 2.86e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 5531 KSPSetUp 15 1.0 3.0289e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 1.0976e+00 1.0 7.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 91 0 0 0 23 91 0 0 0 708 PCGAMGGraph_AGG 4 1.0 7.4403e-02 1.0 2.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 31 PCGAMGCoarse_AGG 4 1.0 3.3554e-01 1.0 2.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 2 0 0 0 7 2 0 0 0 61 PCGAMGProl_AGG 4 1.0 7.1132e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCGAMGPOpt_AGG 4 1.0 6.7413e-02 1.0 4.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 656 GAMG: createProl 4 1.0 4.8507e-01 1.0 6.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 138 Graph 8 1.0 7.4155e-02 1.0 2.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 31 MIS/Agg 4 1.0 4.0429e-03 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: col data 4 1.0 1.5974e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: frmProl0 4 1.0 6.3415e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: smooth 4 1.0 6.7411e-02 1.0 4.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 656 GAMG: partLevel 4 1.0 2.0478e-01 1.0 4.64e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 5 0 0 0 4 5 0 0 0 227 PCSetUp 2 1.0 6.9165e-01 1.0 1.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 15 13 0 0 0 15 13 0 0 0 164 PCSetUpOnBlocks 32 1.0 1.4734e-04 1.0 1.76e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 119 PCApply 32 1.0 3.6208e-01 1.0 5.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 69 0 0 0 8 69 0 0 0 1625 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 4 3 2264 0 Object 7 7 4032 0 Container 7 7 3976 0 Vector 207 207 129392344 0 Matrix 24 24 43743284 0 Matrix Coarsen 4 4 2512 0 Distributed Mesh 28 28 129704 0 GraphPartitioner 11 11 6644 0 Star Forest Bipartite Graph 60 60 48392 0 Discrete System 28 28 23744 0 Index Set 47 47 9592920 0 IS L to G Mapping 1 1 302332 0 Section 61 61 40504 0 SNES 1 1 1332 0 SNESLineSearch 1 1 864 0 DMSNES 1 1 664 0 Krylov Solver 15 15 267352 0 Preconditioner 15 15 14740 0 Linear Space 2 2 1280 0 Dual Space 2 2 1312 0 FE Space 2 2 1496 0 PetscRandom 4 4 2496 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 #PETSc Option Table entries: -al 1 -am 0 -at 0.001 -bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 -bcnum 7 -bcval 0,0,0,0,0,0,1 -dim 3 -dm_refine 1 -dt 0.001 -edges 3,3 -floc 0.25,0.75,0.25,0.75,0.25,0.75 -fnum 0 -ftime 0,99 -fval 1 -ksp_atol 1e-8 -ksp_converged_reason -ksp_max_it 50000 -ksp_rtol 1e-8 -ksp_type cg -log_summary -lower 0,0 -mat_petscspace_order 0 -mesh cube_with_hole3_mesh.dat -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -mu 1 -nonneg 0 -numsteps 0 -options_left 0 -pc_gamg_agg_nsmooths 1 -pc_gamg_threshold 0.02 -pc_type gamg -petscpartitioner_type parmetis -progress 0 -simplex 1 -solution_petscspace_order 1 -tao_fatol 1e-8 -tao_frtol 1e-8 -tao_max_it 50000 -tao_type blmvm -trans cube_with_hole3_trans.dat -upper 1,1 -vtuname figure_cube_with_hole_3 -vtuprint 1 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-chaco --download-ctetgen --download-fblaslapack --download-hdf5 --download-metis --download-parmetis --download-triangle --with-cc=mpicc --with-cmake=cmake --with-cxx=mpicxx --with-debugging=0 --with-fc=mpif90 --with-mpiexec=mpiexec --with-valgrind=1 CFLAGS= COPTFLAGS=-O3 CXXFLAGS= CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-linux2-c-opt ----------------------------------------- Libraries compiled on Thu Jun 4 06:27:39 2015 on compute-2-42.local Machine characteristics: Linux-2.6.32-504.1.3.el6.x86_64-x86_64-with-redhat-6.6-Santiago Using PETSc directory: /home/jchang23/petsc Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -O3 ${COPTFLAGS} 
${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/jchang23/petsc/arch-linux2-c-opt/include -I/home/jchang23/petsc/include -I/home/jchang23/petsc/include -I/home/jchang23/petsc/arch-linux2-c-opt/include -I/share/apps/openmpi-1.8.3/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/home/jchang23/petsc/arch-linux2-c-opt/lib -L/home/jchang23/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/jchang23/petsc/arch-linux2-c-opt/lib -L/home/jchang23/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lparmetis -ltriangle -lmetis -lctetgen -lX11 -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -Wl,-rpath,/share/apps/openmpi-1.8.3/lib -L/share/apps/openmpi-1.8.3/lib -Wl,-rpath,/share/apps/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/share/apps/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/share/apps/gcc-4.9.2/lib64 -L/share/apps/gcc-4.9.2/lib64 -Wl,-rpath,/share/apps/gcc-4.9.2/lib -L/share/apps/gcc-4.9.2/lib -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/share/apps/openmpi-1.8.3/lib -L/share/apps/openmpi-1.8.3/lib -Wl,-rpath,/share/apps/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/share/apps/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/share/apps/gcc-4.9.2/lib64 -L/share/apps/gcc-4.9.2/lib64 -Wl,-rpath,/share/apps/gcc-4.9.2/lib -L/share/apps/gcc-4.9.2/lib -ldl -Wl,-rpath,/share/apps/openmpi-1.8.3/lib -lmpi -lgcc_s -lpthread -ldl ----------------------------------------- From mail2amneet at gmail.com Thu Jun 4 19:03:21 2015 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Thu, 4 Jun 2015 17:03:21 -0700 Subject: [petsc-users] PCASM subdomains In-Reply-To: References: Message-ID: Thanks! They indeed are for the overlapping division. I just wanted to confirm it to debug my code, which I now think is giving me the right size of the subdomain matrices. On Thu, Jun 4, 2015 at 2:11 PM, Matthew Knepley wrote: > On Thu, Jun 4, 2015 at 3:47 PM, Amneet Bhalla > wrote: > >> >> Hi Folks, >> >> I have a basic question regarding matrix size as created by subdomains in >> PCASM. >> >> In the PCASMSetLocalSubdomains(PC pc,PetscInt n,IS is[],IS is_local[]) >> routine, I am defining is[] to be the local+overlapping DOFs and is_local >> [] to be the nonoverlapping local DOFs only. Now >> if I grab the subksp's from PCASM and print out the matrices, would the >> matrix size correspond to size of IS in is[] or in is_local[] dofs? Put in >> another words, do the subdomain matrices are defined for just local DOFs or >> with local+overlapping DOFs? >> > > The matrices should definitely be for the overlapping division. If they > are not, something is wrong. If they were > the non-overlapping distribution, we would have block Jacobi. > > Thanks, > > Matt > > >> Thanks, >> -- >> --Amneet >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... 
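For anyone doing the same check, here is a minimal sketch of the debugging step Amneet describes: set the subdomains, force the setup, then look at each subdomain solver. "ksp" is the outer solver and "n", "is" (overlapping) and "is_local" (nonoverlapping) stand for the user's own index sets; none of this is Amneet's actual code.

    PC             pc;
    KSP            *subksp;
    PetscInt       nlocal, first, i;
    PetscErrorCode ierr;

    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);
    ierr = PCASMSetLocalSubdomains(pc,n,is,is_local);CHKERRQ(ierr);
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);                        /* the subdomain KSPs only exist after setup */
    ierr = PCASMGetSubKSP(pc,&nlocal,&first,&subksp);CHKERRQ(ierr);
    for (i = 0; i < nlocal; i++) {
      /* the matrix sizes reported here correspond to the overlapping is[], not is_local[] */
      ierr = KSPView(subksp[i],PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);
    }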
URL: From mfadams at lbl.gov Fri Jun 5 08:23:19 2015 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 5 Jun 2015 09:23:19 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: AMG setup is kind of expensive, say the order of a solve. What you have looks OK here. If you have a super hard problem you will want to coarsen slower (high threshold), which will increase setup costs. The setup costs are symbolic (graph work) and numeric (like a factorization). As you noticed: MatPtAPNumeric 4 1.0 1.3035e-01 1.0 4.64e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 356 This is the numeric part. The symbolic part will get amortized if the grid does not change and the numerical part will get amortized if the operator does not change (linear). Mark On Thu, Jun 4, 2015 at 5:47 PM, Justin Chang wrote: > Thank you Matt and Mark for the clarification. Matt, if you recall our > discussion about calculating the arithmetic intensity from the earlier > threads, it seems GAMG now has a myriad of all these additional vector and > matrix operations that were not present in the CG/Jacobi case. Running with > the command line options you and Mark suggested, I now have these > additional operations to deal with: > > VecMDot > VecAXPBYCZ > VecMAXPY > VecSetRandom > VecNormalize > MatMultAdd > MatMultTranspose > MatSolve > MatConvert > MatScale > MatResidual > MatCoarsen > MatAXPY > MatMatMult > MatMatMultSym > MatMatMultNum > MatPtAP > MatPtAPSymbolic > PatPtAPNumeric > MatTrnMatMult > MatTrnMatMultSym > MatTrnMatMultNum > MatGetSymTrans > KSPGMRESOrthog > PCGAMGGraph_AGG > PCGAMGCoarse_AGG > PCGAMGProl_AGG > PCGAMGPOpt_AGG > GAMG: createProl and all of its associated events. > GAMG: partLevel > PCSetUpOnBlocks > > Attached is the output from -log_summary showing the exact counts for the > case I am running. > > I have the following questions: > > 1) For the Vec operations VecMDot and VecMAXPY, it seems the estimation of > total bytes transferred (TBT) relies on knowing how many vectors there are. > Is there a way to figure this out? Or at least with gamg what would it be, > three vectors? > > 2) It seems there are a lot of matrix manipulations and multiplications. > Is it safe to say that the size and number of non zeroes is the same? Or > will it change? > > 3) If I follow the TBT tabulation as in that paper you pointed me to, > would MatMultTranspose follow the same formula if the Jacobian is symmetric? > > 4) How do I calculate anything that requires the multiplication of at > least two matrices? > > 5) More importantly, are any of the above calculations necessary? Because > log_summary seems to indicate that MatMult() has the greatest amount of > workload and number of calls. My only hesitation is how much traffic > MatMatMults may take (assuming I go off of the same assumptions as in that > paper). > > 6) And/or, are there any other functions that I missed that might be > important to calculate as well? > > Thanks, > Justin > > On Thu, Jun 4, 2015 at 11:33 AM, Mark Adams wrote: > >> >> >> On Thu, Jun 4, 2015 at 12:29 PM, Matthew Knepley >> wrote: >> >>> On Thu, Jun 4, 2015 at 10:31 AM, Justin Chang >>> wrote: >>> >>>> Yeah I saw his recommendation and am trying it out. But I am not sure >>>> what most of those parameters mean. For instance: >>>> >>>> 1) What does -pc_gamg_agg_nsmooths refer to? >>>> >>> >>> This is always 1 (its the definition of smoothed aggregation). 
Mark >>> allows 0 to support unsmoothed aggregation, which may be >>> better for easy problems on extremely large machines. >>> >>> >>>> 2) Does increase in the threshold of -pc_gamg_threshold translate to >>>> making the coarsening faster? >>>> >>> >>> Yes, I believe so (easy to check). >>> >> >> Other way around. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hus003 at ucsd.edu Fri Jun 5 11:37:38 2015 From: hus003 at ucsd.edu (Sun, Hui) Date: Fri, 5 Jun 2015 16:37:38 +0000 Subject: [petsc-users] on the performance of MPI PETSc In-Reply-To: References: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU>, Message-ID: <7501CC2B7BBCC44A92ECEEC316170ECB01FF47ED@XMAIL-MBX-BH1.AD.UCSD.EDU> Thank you Barry, I will check on that. However, I tried something else, and now have more questions on the multigrid solver in PETSc. If I run ex34 using the command: ./ex34 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero I get the following output: 0 KSP Residual norm 0.0289149 1 KSP Residual norm 0.0186085 2 KSP Residual norm 0.000732811 3 KSP Residual norm 1.07394e-05 4 KSP Residual norm 1.01043e-06 5 KSP Residual norm 5.62122e-08 Residual norm 5.62122e-08 Error norm 0.000200841 Error norm 5.18298e-05 Error norm 4.90288e-08 Time cost: 0.068264 14.4999 0.0826752 If I run ex34 using the command: ./ex34 -pc_type none -ksp_type gmres -ksp_monitor_short I get the following output: 0 KSP Residual norm 0.0289149 1 KSP Residual norm < 1.e-11 Residual norm 9.14804e-15 Error norm 0.00020064 Error norm 5.18301e-05 Error norm 4.90288e-08 Time cost: 1.60657 4.67008 0.0784049 >From the output, it seems that solving Poisson without a preconditioner is actually faster than using multigrid as a preconditioner. I think multigrid should do better than that. How should I look more into this issue? What are the options you would recommend me to try in order to raise up the efficiency of multigrid? Another question is: Petsc uses multigrid as a preconditioner, how do I specify the option so that it becomes a solver? Is it by doing: ./ex34 -pc_type mg -pc_mg_type full -ksp_type richardson -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero Best, Hui ________________________________________ From: Barry Smith [bsmith at mcs.anl.gov] Sent: Thursday, June 04, 2015 11:56 AM To: Sun, Hui Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] on the performance of MPI PETSc > On Jun 4, 2015, at 1:24 PM, Sun, Hui wrote: > > Hello, > > I'm testing ex34.c under the examples of KSP. It's a multigrid 3D poisson solver. > > For 64^3 mesh, the time cost is 1s for 1 node with 12 cores; for 128^3 mesh, the time cost is 13s for 1 node with 12 cores, and the same for 2 nodes with 6 cores. For 256^3 mesh, I use 2 nodes with 12 cores, and time cost goes up to 726s. This doesn't seem right for I'm expecting O(N log(N)). I think it could be the memory bandwidth is not sufficient, and I need to do the bind-to-socket stuff. > > But I'm wondering what is the typical time cost for a 256^3 mesh, and then a 512^3 mesh? Please give me a rough idea. Thank you. There is no way we can answer that. What we can say is that given the numbers you have for 64 and 128 meshes, on an appropriate machine you should get much better numbers than 726 seconds for a 256 mesh. 
You need to first run streams on your machine, http://www.mcs.anl.gov/petsc/documentation/faq.html#computers check up on the binding business and if you still get poor results prepare a report on what you did and send it to us. Barry > > Best, > Hui From jed at jedbrown.org Fri Jun 5 11:45:11 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 05 Jun 2015 10:45:11 -0600 Subject: [petsc-users] on the performance of MPI PETSc In-Reply-To: <7501CC2B7BBCC44A92ECEEC316170ECB01FF47ED@XMAIL-MBX-BH1.AD.UCSD.EDU> References: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> <7501CC2B7BBCC44A92ECEEC316170ECB01FF47ED@XMAIL-MBX-BH1.AD.UCSD.EDU> Message-ID: <87wpziyui0.fsf@jedbrown.org> "Sun, Hui" writes: > If I run ex34 using the command: > ./ex34 -pc_type none -ksp_type gmres -ksp_monitor_short > > I get the following output: > 0 KSP Residual norm 0.0289149 > 1 KSP Residual norm < 1.e-11 > Residual norm 9.14804e-15 > Error norm 0.00020064 > Error norm 5.18301e-05 > Error norm 4.90288e-08 > Time cost: 1.60657 4.67008 0.0784049 > > From the output, it seems that solving Poisson without a > preconditioner is actually faster than using multigrid as a > preconditioner. I think multigrid should do better than that. Can't beat one iteration with on preconditioner. The right hand side (thus solution) is an eigenvector, so GMRES without preconditioning converges in one iteration always. The example could not be worse for testing solvers. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hus003 at ucsd.edu Fri Jun 5 11:51:14 2015 From: hus003 at ucsd.edu (Sun, Hui) Date: Fri, 5 Jun 2015 16:51:14 +0000 Subject: [petsc-users] on the performance of MPI PETSc In-Reply-To: <87wpziyui0.fsf@jedbrown.org> References: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> <7501CC2B7BBCC44A92ECEEC316170ECB01FF47ED@XMAIL-MBX-BH1.AD.UCSD.EDU>, <87wpziyui0.fsf@jedbrown.org> Message-ID: <7501CC2B7BBCC44A92ECEEC316170ECB01FF4809@XMAIL-MBX-BH1.AD.UCSD.EDU> Thank you Jed. I see. I have another question: Petsc uses multigrid as a preconditioner. How do I specify the option so that it becomes a solver? Is it by doing: ./ex34 -pc_type mg -pc_mg_type full -ksp_type richardson -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero MG as preconditioner or as solver, which one gives better performance and why? Best, Hui ________________________________________ From: Jed Brown [jed at jedbrown.org] Sent: Friday, June 05, 2015 9:45 AM To: Sun, Hui; Barry Smith Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] on the performance of MPI PETSc "Sun, Hui" writes: > If I run ex34 using the command: > ./ex34 -pc_type none -ksp_type gmres -ksp_monitor_short > > I get the following output: > 0 KSP Residual norm 0.0289149 > 1 KSP Residual norm < 1.e-11 > Residual norm 9.14804e-15 > Error norm 0.00020064 > Error norm 5.18301e-05 > Error norm 4.90288e-08 > Time cost: 1.60657 4.67008 0.0784049 > > From the output, it seems that solving Poisson without a > preconditioner is actually faster than using multigrid as a > preconditioner. I think multigrid should do better than that. Can't beat one iteration with on preconditioner. The right hand side (thus solution) is an eigenvector, so GMRES without preconditioning converges in one iteration always. The example could not be worse for testing solvers. 
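For reference, the options Hui lists map onto the API roughly as below: Richardson as the outer "solver" with full multigrid doing all the work. This is only a sketch, not ex34's actual code -- "ksp", "b" and "x" are placeholders, the KSP is assumed to carry a DM (as ex34's does) so that PCMG can build the grid hierarchy itself, and the coarse-grid shift is easiest left on the command line.

    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPSetType(ksp,KSPRICHARDSON);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);
    ierr = PCMGSetLevels(pc,3,NULL);CHKERRQ(ierr);     /* -pc_mg_levels 3 */
    ierr = PCMGSetType(pc,PC_MG_FULL);CHKERRQ(ierr);   /* -pc_mg_type full */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);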
From jed at jedbrown.org Fri Jun 5 12:08:23 2015 From: jed at jedbrown.org (Jed Brown) Date: Fri, 05 Jun 2015 11:08:23 -0600 Subject: [petsc-users] on the performance of MPI PETSc In-Reply-To: <7501CC2B7BBCC44A92ECEEC316170ECB01FF4809@XMAIL-MBX-BH1.AD.UCSD.EDU> References: <7501CC2B7BBCC44A92ECEEC316170ECB01FF472A@XMAIL-MBX-BH1.AD.UCSD.EDU> <7501CC2B7BBCC44A92ECEEC316170ECB01FF47ED@XMAIL-MBX-BH1.AD.UCSD.EDU> <87wpziyui0.fsf@jedbrown.org> <7501CC2B7BBCC44A92ECEEC316170ECB01FF4809@XMAIL-MBX-BH1.AD.UCSD.EDU> Message-ID: <87lhfyytfc.fsf@jedbrown.org> "Sun, Hui" writes: > Thank you Jed. I see. > > I have another question: Petsc uses multigrid as a preconditioner. How do I specify the option so that it becomes a solver? Is it by doing: > ./ex34 -pc_type mg -pc_mg_type full -ksp_type richardson -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero Yes. > MG as preconditioner or as solver, which one gives better performance and why? FMG done right is a direct solver (one iteration). Otherwise, Krylov usually provides enough benefit to justify its modest cost, but it all depends on what is slowing convergence. Hammer or wrench, which is more useful and why? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hsahasra at purdue.edu Fri Jun 5 12:32:23 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Fri, 5 Jun 2015 13:32:23 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian Message-ID: Hi, I'm solving a non-linear equation using NEWTONLS. The SNES is called from a wrapper in the LibMesh library. I'm trying to use the default FD Jacobian by not setting any Mat or callback function for the Jacobian. When doing this I get the following error. I'm not able to figure out why I get this error. Can I get some pointers to what I might be doing wrong? [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Object is in wrong state! [0]PETSC ERROR: Not for unassembled matrix! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by hsahasra Fri Jun 5 13:25:27 2015 [0]PETSC ERROR: Libraries linked from /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-metis=1 --download-parmetis=1 --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 --with-fortran-kernels=0 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --download-superlu_dist=1 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-pic=1 --with-debugging=1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatGetColoring() line 481 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c [0]PETSC ERROR: SNESComputeJacobian() line 2152 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c [0]PETSC ERROR: SNESSolve() line 3636 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c [0]PETSC ERROR: solve() line 538 in "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C application called MPI_Abort(comm=0x84000000, 73) - process 0 Thanks, Harshad -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 5 12:59:53 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 5 Jun 2015 12:59:53 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: Message-ID: On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe wrote: > Hi, > > I'm solving a non-linear equation using NEWTONLS. The SNES is called from > a wrapper in the LibMesh library. I'm trying to use the default FD Jacobian > by not setting any Mat or callback function for the Jacobian. > > When doing this I get the following error. I'm not able to figure out why > I get this error. Can I get some pointers to what I might be doing wrong? > Ah, this is going to be somewhat harder. Unless PETSc know the connectivity of your Jacobian, which means the influence between unknowns, it can only do one vector at a time: -snes_fd which is really slow. It is trying to use get coloring for the Jacobian so that it can do many vectors at once. Do you have a simplified Jacobian matrix you could use for preconditioner construction? Then it could use that. Thanks, Matt > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Not for unassembled matrix! 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by > hsahasra Fri Jun 5 13:25:27 2015 > [0]PETSC ERROR: Libraries linked from > /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib > [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 > [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 > --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 > --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort > --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 > --download-metis=1 --download-parmetis=1 > --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 > --with-fortran-kernels=0 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl > --download-superlu_dist=1 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl > --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so > --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i > --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-pic=1 --with-debugging=1 > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatGetColoring() line 481 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c > [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c > [0]PETSC ERROR: SNESComputeJacobian() line 2152 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c > [0]PETSC ERROR: SNESSolve() line 3636 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: solve() line 538 in > "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C > application called MPI_Abort(comm=0x84000000, 73) - process 0 > > Thanks, > Harshad > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 5 13:02:10 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 5 Jun 2015 13:02:10 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Thu, Jun 4, 2015 at 4:47 PM, Justin Chang wrote: > Thank you Matt and Mark for the clarification. Matt, if you recall our > discussion about calculating the arithmetic intensity from the earlier > threads, it seems GAMG now has a myriad of all these additional vector and > matrix operations that were not present in the CG/Jacobi case. 
Running with > the command line options you and Mark suggested, I now have these > additional operations to deal with: > > VecMDot > VecAXPBYCZ > VecMAXPY > VecSetRandom > VecNormalize > MatMultAdd > MatMultTranspose > MatSolve > MatConvert > MatScale > MatResidual > MatCoarsen > MatAXPY > MatMatMult > MatMatMultSym > MatMatMultNum > MatPtAP > MatPtAPSymbolic > PatPtAPNumeric > MatTrnMatMult > MatTrnMatMultSym > MatTrnMatMultNum > MatGetSymTrans > KSPGMRESOrthog > PCGAMGGraph_AGG > PCGAMGCoarse_AGG > PCGAMGProl_AGG > PCGAMGPOpt_AGG > GAMG: createProl and all of its associated events. > GAMG: partLevel > PCSetUpOnBlocks > > Attached is the output from -log_summary showing the exact counts for the > case I am running. > > I have the following questions: > I will do my best > 1) For the Vec operations VecMDot and VecMAXPY, it seems the estimation of > total bytes transferred (TBT) relies on knowing how many vectors there are. > Is there a way to figure this out? Or at least with gamg what would it be, > three vectors? > AMG is very complex. I think that the simple performance model a) might be too simple, and b) is really hard to model this way all at once. It really needs to be broken into parts. Thus I am not sure how much you learn from this model for AMG. > 2) It seems there are a lot of matrix manipulations and multiplications. > Is it safe to say that the size and number of non zeroes is the same? Or > will it change? > Nope, since they happen on different levels. > 3) If I follow the TBT tabulation as in that paper you pointed me to, > would MatMultTranspose follow the same formula if the Jacobian is symmetric? > Yes. > 4) How do I calculate anything that requires the multiplication of at > least two matrices? > This is a hard problem. > 5) More importantly, are any of the above calculations necessary? Because > log_summary seems to indicate that MatMult() has the greatest amount of > workload and number of calls. My only hesitation is how much traffic > MatMatMults may take (assuming I go off of the same assumptions as in that > paper). > The overwhleming cost of AMG is the Galerkin triple-product RAP. Thanks, Matt > 6) And/or, are there any other functions that I missed that might be > important to calculate as well? > > Thanks, > Justin > > On Thu, Jun 4, 2015 at 11:33 AM, Mark Adams wrote: > >> >> >> On Thu, Jun 4, 2015 at 12:29 PM, Matthew Knepley >> wrote: >> >>> On Thu, Jun 4, 2015 at 10:31 AM, Justin Chang >>> wrote: >>> >>>> Yeah I saw his recommendation and am trying it out. But I am not sure >>>> what most of those parameters mean. For instance: >>>> >>>> 1) What does -pc_gamg_agg_nsmooths refer to? >>>> >>> >>> This is always 1 (its the definition of smoothed aggregation). Mark >>> allows 0 to support unsmoothed aggregation, which may be >>> better for easy problems on extremely large machines. >>> >>> >>>> 2) Does increase in the threshold of -pc_gamg_threshold translate to >>>> making the coarsening faster? >>>> >>> >>> Yes, I believe so (easy to check). >>> >> >> Other way around. >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
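Back on the FD-Jacobian thread: once a Jacobian matrix with the correct nonzero structure has been preallocated and assembled (assembling it with dummy values is enough -- the "Not for unassembled matrix" error above comes from handing MatGetColoring() a matrix that was never assembled), the coloring path Matt describes looks roughly like the following sketch. "snes", "J", "FormFunction" and "user" are placeholders, and the calls follow the 3.4-series interface that appears in the error trace; newer releases reorganize this around MatColoringCreate().

    ISColoring     iscoloring;
    MatFDColoring  fdcoloring;
    PetscErrorCode ierr;

    /* J must already be assembled so that its nonzero structure can be colored */
    ierr = MatGetColoring(J,MATCOLORINGSL,&iscoloring);CHKERRQ(ierr);
    ierr = MatFDColoringCreate(J,iscoloring,&fdcoloring);CHKERRQ(ierr);
    ierr = MatFDColoringSetFunction(fdcoloring,(PetscErrorCode (*)(void))FormFunction,&user);CHKERRQ(ierr);
    ierr = MatFDColoringSetFromOptions(fdcoloring);CHKERRQ(ierr);
    ierr = ISColoringDestroy(&iscoloring);CHKERRQ(ierr);
    ierr = SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,fdcoloring);CHKERRQ(ierr);

Here FormFunction and user are the same residual callback and context handed to SNESSetFunction; the coloring lets the finite-difference Jacobian be built with a handful of function evaluations per color instead of one per column.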
URL: From hsahasra at purdue.edu Fri Jun 5 13:30:53 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Fri, 5 Jun 2015 14:30:53 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: Message-ID: Hi Matt, Thanks for helping me out with this. > Ah, this is going to be somewhat harder. Unless PETSc know the > connectivity of your Jacobian, which means the influence between > unknowns, it can only do one vector at a time Yes, I'm discretizing the equation on a FEM mesh, so I know the connectivity between different DOFs. Do you have a simplified Jacobian matrix you could use for preconditioner > construction? I have an approximate diagonal Jacobian matrix, which doesn't give good results. It is trying to use get coloring for the Jacobian > Is there some documentation on coloring which I can read up so that I can generate the coloring for the Jacobian myself? Thanks, Harshad On Fri, Jun 5, 2015 at 1:59 PM, Matthew Knepley wrote: > On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe > wrote: > >> Hi, >> >> I'm solving a non-linear equation using NEWTONLS. The SNES is called from >> a wrapper in the LibMesh library. I'm trying to use the default FD Jacobian >> by not setting any Mat or callback function for the Jacobian. >> >> When doing this I get the following error. I'm not able to figure out why >> I get this error. Can I get some pointers to what I might be doing wrong? >> > > Ah, this is going to be somewhat harder. Unless PETSc know the > connectivity of your Jacobian, which means the influence between > unknowns, it can only do one vector at a time: > > -snes_fd > > which is really slow. It is trying to use get coloring for the Jacobian so > that it can do many vectors > at once. Do you have a simplified Jacobian matrix you could use for > preconditioner construction? > Then it could use that. > > Thanks, > > Matt > > >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Object is in wrong state! >> [0]PETSC ERROR: Not for unassembled matrix! >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 >> [0]PETSC ERROR: See docs/changes/index.html for recent updates. >> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. >> [0]PETSC ERROR: See docs/index.html for manual pages. 
>> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by >> hsahasra Fri Jun 5 13:25:27 2015 >> [0]PETSC ERROR: Libraries linked from >> /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib >> [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 >> [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 >> --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 >> --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort >> --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 >> --download-metis=1 --download-parmetis=1 >> --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 >> --with-fortran-kernels=0 >> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl >> --download-superlu_dist=1 >> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl >> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so >> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i >> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --with-pic=1 --with-debugging=1 >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: MatGetColoring() line 481 in >> /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c >> [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in >> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c >> [0]PETSC ERROR: SNESComputeJacobian() line 2152 in >> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c >> [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in >> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c >> [0]PETSC ERROR: SNESSolve() line 3636 in >> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c >> [0]PETSC ERROR: solve() line 538 in >> "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C >> application called MPI_Abort(comm=0x84000000, 73) - process 0 >> >> Thanks, >> Harshad >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 5 13:33:40 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 5 Jun 2015 13:33:40 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: Message-ID: On Fri, Jun 5, 2015 at 1:30 PM, Harshad Sahasrabudhe wrote: > Hi Matt, > > Thanks for helping me out with this. > > >> Ah, this is going to be somewhat harder. Unless PETSc know the >> connectivity of your Jacobian, which means the influence between >> unknowns, it can only do one vector at a time > > > Yes, I'm discretizing the equation on a FEM mesh, so I know the > connectivity between different DOFs. > > Do you have a simplified Jacobian matrix you could use for preconditioner >> construction? > > > I have an approximate diagonal Jacobian matrix, which doesn't give good > results. 
> > It is trying to use get coloring for the Jacobian >> > > Is there some documentation on coloring which I can read up so that I can > generate the coloring for the Jacobian myself? > Yes, section 5.6 of the manual. Thanks, Matt > Thanks, > Harshad > > On Fri, Jun 5, 2015 at 1:59 PM, Matthew Knepley wrote: > >> On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe < >> hsahasra at purdue.edu> wrote: >> >>> Hi, >>> >>> I'm solving a non-linear equation using NEWTONLS. The SNES is called >>> from a wrapper in the LibMesh library. I'm trying to use the default FD >>> Jacobian by not setting any Mat or callback function for the Jacobian. >>> >>> When doing this I get the following error. I'm not able to figure out >>> why I get this error. Can I get some pointers to what I might be doing >>> wrong? >>> >> >> Ah, this is going to be somewhat harder. Unless PETSc know the >> connectivity of your Jacobian, which means the influence between >> unknowns, it can only do one vector at a time: >> >> -snes_fd >> >> which is really slow. It is trying to use get coloring for the Jacobian >> so that it can do many vectors >> at once. Do you have a simplified Jacobian matrix you could use for >> preconditioner construction? >> Then it could use that. >> >> Thanks, >> >> Matt >> >> >>> [0]PETSC ERROR: --------------------- Error Message >>> ------------------------------------ >>> [0]PETSC ERROR: Object is in wrong state! >>> [0]PETSC ERROR: Not for unassembled matrix! >>> [0]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 >>> [0]PETSC ERROR: See docs/changes/index.html for recent updates. >>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. >>> [0]PETSC ERROR: See docs/index.html for manual pages. 
>>> [0]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by >>> hsahasra Fri Jun 5 13:25:27 2015 >>> [0]PETSC ERROR: Libraries linked from >>> /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib >>> [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 >>> [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 >>> --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 >>> --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort >>> --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 >>> --download-metis=1 --download-parmetis=1 >>> --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 >>> --with-fortran-kernels=0 >>> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl >>> --download-superlu_dist=1 >>> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl >>> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so >>> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >>> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >>> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i >>> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >>> --with-pic=1 --with-debugging=1 >>> [0]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [0]PETSC ERROR: MatGetColoring() line 481 in >>> /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c >>> [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in >>> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c >>> [0]PETSC ERROR: SNESComputeJacobian() line 2152 in >>> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c >>> [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in >>> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c >>> [0]PETSC ERROR: SNESSolve() line 3636 in >>> /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c >>> [0]PETSC ERROR: solve() line 538 in >>> "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C >>> application called MPI_Abort(comm=0x84000000, 73) - process 0 >>> >>> Thanks, >>> Harshad >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 5 13:35:46 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 5 Jun 2015 13:35:46 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: Message-ID: If you can provide the non-zero structure of the Jacobian that is all that is needed. Since it is LibMesh I assume you are doing some finite element method. Thus just initially input "a wrong" Jacobian, so long as it has the correct nonzero structure then using the coloring code to compute the Jacobian will work fine. Barry > On Jun 5, 2015, at 12:59 PM, Matthew Knepley wrote: > > On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe wrote: > Hi, > > I'm solving a non-linear equation using NEWTONLS. 
The SNES is called from a wrapper in the LibMesh library. I'm trying to use the default FD Jacobian by not setting any Mat or callback function for the Jacobian. > > When doing this I get the following error. I'm not able to figure out why I get this error. Can I get some pointers to what I might be doing wrong? > > Ah, this is going to be somewhat harder. Unless PETSc know the connectivity of your Jacobian, which means the influence between > unknowns, it can only do one vector at a time: > > -snes_fd > > which is really slow. It is trying to use get coloring for the Jacobian so that it can do many vectors > at once. Do you have a simplified Jacobian matrix you could use for preconditioner construction? > Then it could use that. > > Thanks, > > Matt > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Not for unassembled matrix! > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by hsahasra Fri Jun 5 13:25:27 2015 > [0]PETSC ERROR: Libraries linked from /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib > [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 > [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-metis=1 --download-parmetis=1 --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 --with-fortran-kernels=0 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --download-superlu_dist=1 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-pic=1 --with-debugging=1 > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: MatGetColoring() line 481 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c > [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c > [0]PETSC ERROR: SNESComputeJacobian() line 2152 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c > [0]PETSC ERROR: SNESSolve() line 3636 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: solve() line 538 in "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C > application called MPI_Abort(comm=0x84000000, 73) - process 0 > > Thanks, > Harshad > > > > -- > What most experimenters take for granted 
before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From bsmith at mcs.anl.gov Fri Jun 5 13:36:38 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 5 Jun 2015 13:36:38 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: Message-ID: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> > On Jun 5, 2015, at 1:30 PM, Harshad Sahasrabudhe wrote: > > Hi Matt, > > Thanks for helping me out with this. > > Ah, this is going to be somewhat harder. Unless PETSc know the connectivity of your Jacobian, which means the influence between > unknowns, it can only do one vector at a time > > Yes, I'm discretizing the equation on a FEM mesh, so I know the connectivity between different DOFs. > > Do you have a simplified Jacobian matrix you could use for preconditioner construction? > > I have an approximate diagonal Jacobian matrix, which doesn't give good results. > > It is trying to use get coloring for the Jacobian > > Is there some documentation on coloring which I can read up so that I can generate the coloring for the Jacobian myself? It is not the coloring you need to generate, just the nonzero structure. (From that PETSc computes the coloring) so do as I suggest in my other email. Barry > > Thanks, > Harshad > > On Fri, Jun 5, 2015 at 1:59 PM, Matthew Knepley wrote: > On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe wrote: > Hi, > > I'm solving a non-linear equation using NEWTONLS. The SNES is called from a wrapper in the LibMesh library. I'm trying to use the default FD Jacobian by not setting any Mat or callback function for the Jacobian. > > When doing this I get the following error. I'm not able to figure out why I get this error. Can I get some pointers to what I might be doing wrong? > > Ah, this is going to be somewhat harder. Unless PETSc know the connectivity of your Jacobian, which means the influence between > unknowns, it can only do one vector at a time: > > -snes_fd > > which is really slow. It is trying to use get coloring for the Jacobian so that it can do many vectors > at once. Do you have a simplified Jacobian matrix you could use for preconditioner construction? > Then it could use that. > > Thanks, > > Matt > > [0]PETSC ERROR: --------------------- Error Message ------------------------------------ > [0]PETSC ERROR: Object is in wrong state! > [0]PETSC ERROR: Not for unassembled matrix! > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by hsahasra Fri Jun 5 13:25:27 2015 > [0]PETSC ERROR: Libraries linked from /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib > [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 > [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-metis=1 --download-parmetis=1 --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 --with-fortran-kernels=0 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --download-superlu_dist=1 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-pic=1 --with-debugging=1 > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: MatGetColoring() line 481 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c > [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c > [0]PETSC ERROR: SNESComputeJacobian() line 2152 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c > [0]PETSC ERROR: SNESSolve() line 3636 in /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > [0]PETSC ERROR: solve() line 538 in "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C > application called MPI_Abort(comm=0x84000000, 73) - process 0 > > Thanks, > Harshad > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From hsahasra at purdue.edu Fri Jun 5 13:42:37 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Fri, 5 Jun 2015 14:42:37 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> Message-ID: > > It is not the coloring you need to generate, just the nonzero structure. > (From that PETSc computes the coloring) so do as I suggest in my other > email. Thanks, I'll try that out. On Fri, Jun 5, 2015 at 2:36 PM, Barry Smith wrote: > > > On Jun 5, 2015, at 1:30 PM, Harshad Sahasrabudhe > wrote: > > > > Hi Matt, > > > > Thanks for helping me out with this. > > > > Ah, this is going to be somewhat harder. Unless PETSc know the > connectivity of your Jacobian, which means the influence between > > unknowns, it can only do one vector at a time > > > > Yes, I'm discretizing the equation on a FEM mesh, so I know the > connectivity between different DOFs. > > > > Do you have a simplified Jacobian matrix you could use for > preconditioner construction? 
> > > > I have an approximate diagonal Jacobian matrix, which doesn't give good > results. > > > > It is trying to use get coloring for the Jacobian > > > > Is there some documentation on coloring which I can read up so that I > can generate the coloring for the Jacobian myself? > > It is not the coloring you need to generate, just the nonzero structure. > (From that PETSc computes the coloring) so do as I suggest in my other > email. > > Barry > > > > > > Thanks, > > Harshad > > > > On Fri, Jun 5, 2015 at 1:59 PM, Matthew Knepley > wrote: > > On Fri, Jun 5, 2015 at 12:32 PM, Harshad Sahasrabudhe < > hsahasra at purdue.edu> wrote: > > Hi, > > > > I'm solving a non-linear equation using NEWTONLS. The SNES is called > from a wrapper in the LibMesh library. I'm trying to use the default FD > Jacobian by not setting any Mat or callback function for the Jacobian. > > > > When doing this I get the following error. I'm not able to figure out > why I get this error. Can I get some pointers to what I might be doing > wrong? > > > > Ah, this is going to be somewhat harder. Unless PETSc know the > connectivity of your Jacobian, which means the influence between > > unknowns, it can only do one vector at a time: > > > > -snes_fd > > > > which is really slow. It is trying to use get coloring for the Jacobian > so that it can do many vectors > > at once. Do you have a simplified Jacobian matrix you could use for > preconditioner construction? > > Then it could use that. > > > > Thanks, > > > > Matt > > > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > > [0]PETSC ERROR: Object is in wrong state! > > [0]PETSC ERROR: Not for unassembled matrix! > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. 
> > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ./nemo on a linux named conte-fe02.rcac.purdue.edu by > hsahasra Fri Jun 5 13:25:27 2015 > > [0]PETSC ERROR: Libraries linked from > /home/hsahasra/NEMO5/libs/petsc/build-real/linux/lib > > [0]PETSC ERROR: Configure run at Fri Mar 20 15:18:25 2015 > > [0]PETSC ERROR: Configure options --with-x=0 --download-hdf5=1 > --with-scalar-type=real --with-single-library=0 --with-shared-libraries=0 > --with-clanguage=C++ --with-fortran=1 --with-cc=mpiicc --with-fc=mpiifort > --with-cxx=mpiicpc COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 > --download-metis=1 --download-parmetis=1 > --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 > --with-fortran-kernels=0 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl > --download-superlu_dist=1 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl > --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so > --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" i > --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-pic=1 --with-debugging=1 > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > > [0]PETSC ERROR: MatGetColoring() line 481 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/mat/color/color.c > > [0]PETSC ERROR: SNESComputeJacobianDefaultColor() line 64 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snesj2.c > > [0]PETSC ERROR: SNESComputeJacobian() line 2152 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > > [0]PETSC ERROR: SNESSolve_NEWTONLS() line 218 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: SNESSolve() line 3636 in > /home/hsahasra/NEMO5/libs/petsc/build-real/src/snes/interface/snes.c > > [0]PETSC ERROR: solve() line 538 in > "unknowndirectory/"src/solvers/petsc_nonlinear_solver.C > > application called MPI_Abort(comm=0x84000000, 73) - process 0 > > > > Thanks, > > Harshad > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptbauman at gmail.com Fri Jun 5 13:49:39 2015 From: ptbauman at gmail.com (Paul T. Bauman) Date: Fri, 5 Jun 2015 14:49:39 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> Message-ID: On Fri, Jun 5, 2015 at 2:42 PM, Harshad Sahasrabudhe wrote: > It is not the coloring you need to generate, just the nonzero structure. >> (From that PETSc computes the coloring) so do as I suggest in my other >> email. > > > Thanks, I'll try that out. > FYI, this information is contained in the libMesh::CouplingMatrix which you can get from the libMesh::DofMap. -------------- next part -------------- An HTML attachment was scrubbed... 
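A minimal sketch of the approach Barry describes, assuming the application (here via libMesh) can supply the couplings between degrees of freedom. Everything named below (SetupColoredFDJacobian, n_local, n_global, d_nnz, o_nnz) is a placeholder, error checking is omitted, and the exact SNESSetJacobian callback signature differs slightly between PETSc versions:

  #include <petscsnes.h>

  /* Build a Jacobian matrix that carries only the correct nonzero structure,
     then let SNES color it and fill in the entries by finite differences. */
  static PetscErrorCode SetupColoredFDJacobian(SNES snes,PetscInt n_local,PetscInt n_global,
                                               const PetscInt *d_nnz,const PetscInt *o_nnz)
  {
    Mat J;

    MatCreate(PETSC_COMM_WORLD,&J);
    MatSetSizes(J,n_local,n_local,n_global,n_global);
    MatSetType(J,MATAIJ);
    MatSeqAIJSetPreallocation(J,0,d_nnz);
    MatMPIAIJSetPreallocation(J,0,d_nnz,0,o_nnz);

    /* Insert a value (zero is fine) at every (row,col) pair that can ever be
       nonzero, e.g. loop over the elements and MatSetValues() a zero block for
       the dofs coupled by each element, so the assembled matrix exposes the
       full sparsity pattern. */

    MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);

    /* The "Not for unassembled matrix" error earlier in this thread is what
       one gets when the assembly step above is skipped. */
    SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,NULL);
    return 0;
  }

With such a matrix in place, PETSc builds the coloring from the sparsity pattern itself (that is the MatGetColoring call visible in the error trace), so no MatFDColoring needs to be created by hand.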
URL: From hsahasra at purdue.edu Fri Jun 5 13:51:20 2015 From: hsahasra at purdue.edu (Harshad Sahasrabudhe) Date: Fri, 5 Jun 2015 14:51:20 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> Message-ID: > > FYI, this information is contained in the libMesh::CouplingMatrix which > you can get from the libMesh::DofMap. Awesome! Thanks! On Fri, Jun 5, 2015 at 2:49 PM, Paul T. Bauman wrote: > > > On Fri, Jun 5, 2015 at 2:42 PM, Harshad Sahasrabudhe > wrote: > >> It is not the coloring you need to generate, just the nonzero >>> structure. (From that PETSc computes the coloring) so do as I suggest in my >>> other email. >> >> >> Thanks, I'll try that out. >> > > FYI, this information is contained in the libMesh::CouplingMatrix which > you can get from the libMesh::DofMap. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Jun 5 14:02:04 2015 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 5 Jun 2015 15:02:04 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: > > >> > The overwhleming cost of AMG is the Galerkin triple-product RAP. > > That is overstating it a bit. It can be if you have a hard 3D operator and coarsening slowly is best. Rule of thumb is you spend 50% time is the solver and 50% in the setup, which is often mostly RAP (in 3D, 2D is much faster). That way you are within 2x of optimal and it often works out that way anyway. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 5 14:37:46 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 5 Jun 2015 14:37:46 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> Message-ID: <8B331034-A93F-4C35-9210-F33AA5EB4B90@mcs.anl.gov> > On Jun 5, 2015, at 1:49 PM, Paul T. Bauman wrote: > > > > On Fri, Jun 5, 2015 at 2:42 PM, Harshad Sahasrabudhe wrote: > It is not the coloring you need to generate, just the nonzero structure. (From that PETSc computes the coloring) so do as I suggest in my other email. > > Thanks, I'll try that out. > > FYI, this information is contained in the libMesh::CouplingMatrix which you can get from the libMesh::DofMap. Thanks. Maybe using this information could be "automated" within PETSc/libMesh so that one could "trivially" use the finite differencing to compute the Jacobian with coloring within PETSc/libMesh when the Jacobian cannot be provided directly by the user? For example within PETSc if one provides the option -snes_fd_color then SNES does ierr = PetscOptionsBool("-snes_fd_color","Use finite differences with coloring to compute Jacobian","SNESComputeJacobianDefaultColor",flg,&flg,NULL);CHKERRQ(ierr); if (flg) { DM dm; DMSNES sdm; ierr = SNESGetDM(snes,&dm);CHKERRQ(ierr); ierr = DMGetDMSNES(dm,&sdm);CHKERRQ(ierr); sdm->jacobianctx = NULL; ierr = SNESSetJacobian(snes,snes->jacobian,snes->jacobian_pre,SNESComputeJacobianDefaultColor,0);CHKERRQ(ierr); ierr = PetscInfo(snes,"Setting default finite difference coloring Jacobian matrix\n");CHKERRQ(ierr); } with PETSc/libMesh it would have to stick the correct matrix nonzero structure (from? "libMesh::CouplingMatrix") into the ,snes->jacobian_pre "Jacobian" "matrix" Barry From ptbauman at gmail.com Fri Jun 5 15:02:51 2015 From: ptbauman at gmail.com (Paul T. 
Bauman) Date: Fri, 5 Jun 2015 16:02:51 -0400 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: <8B331034-A93F-4C35-9210-F33AA5EB4B90@mcs.anl.gov> References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> <8B331034-A93F-4C35-9210-F33AA5EB4B90@mcs.anl.gov> Message-ID: On Fri, Jun 5, 2015 at 3:37 PM, Barry Smith wrote: > > > On Jun 5, 2015, at 1:49 PM, Paul T. Bauman wrote: > > > > > > > > On Fri, Jun 5, 2015 at 2:42 PM, Harshad Sahasrabudhe < > hsahasra at purdue.edu> wrote: > > It is not the coloring you need to generate, just the nonzero > structure. (From that PETSc computes the coloring) so do as I suggest in my > other email. > > > > Thanks, I'll try that out. > > > > FYI, this information is contained in the libMesh::CouplingMatrix which > you can get from the libMesh::DofMap. > > Thanks. Maybe using this information could be "automated" within > PETSc/libMesh so that one could "trivially" use the finite differencing to > compute the Jacobian with coloring within PETSc/libMesh when the Jacobian > cannot be provided directly by the user? Thanks Barry, this is a good suggestion. To be honest, however, since MOOSE does its own finite differencing (I think...) and other libMesh-based infrastructure does as well, this may not be a developer priority immediately. (Might be something good to hack on in a couple weeks at PETSc 20.) That said... > For example within PETSc if one provides the option -snes_fd_color then > SNES does > > ierr = PetscOptionsBool("-snes_fd_color","Use finite differences with > coloring to compute > Jacobian","SNESComputeJacobianDefaultColor",flg,&flg,NULL);CHKERRQ(ierr); > if (flg) { > DM dm; > DMSNES sdm; > ierr = SNESGetDM(snes,&dm);CHKERRQ(ierr); > ierr = DMGetDMSNES(dm,&sdm);CHKERRQ(ierr); > sdm->jacobianctx = NULL; > ierr = > SNESSetJacobian(snes,snes->jacobian,snes->jacobian_pre,SNESComputeJacobianDefaultColor,0);CHKERRQ(ierr); > ierr = PetscInfo(snes,"Setting default finite difference coloring > Jacobian matrix\n");CHKERRQ(ierr); > } > > with PETSc/libMesh it would have to stick the correct matrix nonzero > structure (from? "libMesh::CouplingMatrix") into the ,snes->jacobian_pre > "Jacobian" "matrix" > Looks easy enough. Looking at the SNESSetJacobian documentation, the ctx (in your example, 0) would need to be a pointer a MatFDColoring? Best, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 5 15:06:47 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 5 Jun 2015 15:06:47 -0500 Subject: [petsc-users] Crash when trying to use FD Jacobian In-Reply-To: References: <0370B286-9965-4968-9117-7C0757F94E13@mcs.anl.gov> <8B331034-A93F-4C35-9210-F33AA5EB4B90@mcs.anl.gov> Message-ID: > On Jun 5, 2015, at 3:02 PM, Paul T. Bauman wrote: > > > > On Fri, Jun 5, 2015 at 3:37 PM, Barry Smith wrote: > > > On Jun 5, 2015, at 1:49 PM, Paul T. Bauman wrote: > > > > > > > > On Fri, Jun 5, 2015 at 2:42 PM, Harshad Sahasrabudhe wrote: > > It is not the coloring you need to generate, just the nonzero structure. (From that PETSc computes the coloring) so do as I suggest in my other email. > > > > Thanks, I'll try that out. > > > > FYI, this information is contained in the libMesh::CouplingMatrix which you can get from the libMesh::DofMap. > > Thanks. 
Maybe using this information could be "automated" within PETSc/libMesh so that one could "trivially" use the finite differencing to compute the Jacobian with coloring within PETSc/libMesh when the Jacobian cannot be provided directly by the user? > > Thanks Barry, this is a good suggestion. To be honest, however, since MOOSE does its own finite differencing (I think...) and other libMesh-based infrastructure does as well, We should fix that :-) Have one common abstraction/API for "differencing" for Jacobians with multiple implementations instead of multiple APIs and multiple implementations :-) Barry > this may not be a developer priority immediately. (Might be something good to hack on in a couple weeks at PETSc 20.) That said... > > For example within PETSc if one provides the option -snes_fd_color then SNES does > > ierr = PetscOptionsBool("-snes_fd_color","Use finite differences with coloring to compute Jacobian","SNESComputeJacobianDefaultColor",flg,&flg,NULL);CHKERRQ(ierr); > if (flg) { > DM dm; > DMSNES sdm; > ierr = SNESGetDM(snes,&dm);CHKERRQ(ierr); > ierr = DMGetDMSNES(dm,&sdm);CHKERRQ(ierr); > sdm->jacobianctx = NULL; > ierr = SNESSetJacobian(snes,snes->jacobian,snes->jacobian_pre,SNESComputeJacobianDefaultColor,0);CHKERRQ(ierr); > ierr = PetscInfo(snes,"Setting default finite difference coloring Jacobian matrix\n");CHKERRQ(ierr); > } > > with PETSc/libMesh it would have to stick the correct matrix nonzero structure (from? "libMesh::CouplingMatrix") into the ,snes->jacobian_pre "Jacobian" "matrix" > > Looks easy enough. Looking at the SNESSetJacobian documentation, the ctx (in your example, 0) would need to be a pointer a MatFDColoring? > > Best, > > Paul From evanum at gmail.com Fri Jun 5 18:26:36 2015 From: evanum at gmail.com (Evan Um) Date: Fri, 5 Jun 2015 16:26:36 -0700 Subject: [petsc-users] Calling single-precision MUMPS from PETSC In-Reply-To: References: Message-ID: Dear Barry and PETSC users, I am revisiting a problem about how to call a single precision MUMPS from double precision real/complex PETSC. After taking a look at mumps.c, I feel that the following changes (attached) can make it possible to always call a single precision MUMPS from PETSC. The change is basically to replace double precision MUMPS and their associated types with corresponding single ones. If someone already has some experience about this work, could you comment on the change? In advance, I appreciate your help. Best, Evan On Wed, Oct 22, 2014 at 1:36 PM, Barry Smith wrote: > > There is no support for this. You can only call single precision MUMPS > from single precision PETSc and double precision MUMPS from double > precision PETSc. > > You could try to hack the interface that calls MUMPS from PETSc to use > the single precision version but we don?t support that. > src/mat/impls/aij/mpi/mumps/mumps.c > > Barry > > > On Oct 22, 2014, at 3:29 PM, Evan Um wrote: > > > > Dear PETSC users, > > > > When MUMPS is used inside PETSC, The default MUMPS driver seems to be > double-precision MUMPS (i.e. DMUMPS). To reduce memory costs, I want to > test a single-precision MUMPS (SMUMPS) from PETSC. Does anyone know how to > switch from double to single-precision MUMPS inside PETSC? In advance, > thanks for your comments. > > > > Regards, > > Evan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mumps_modified.c Type: text/x-csrc Size: 87301 bytes Desc: not available URL: From mrosso at uci.edu Fri Jun 5 21:22:59 2015 From: mrosso at uci.edu (Michele Rosso) Date: Fri, 05 Jun 2015 19:22:59 -0700 Subject: [petsc-users] Weird behavior for log_summary Message-ID: <1433557379.2754.6.camel@enterprise-A> Hi, I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. I configured petsc with the following options: if __name__ == '__main__': import sys import os sys.path.insert(0, os.path.abspath('config')) import configure configure_options = [ '--with-batch=1 ', '--known-mpi-shared=0 ', '--known-mpi-shared-libraries=0', '--known-memcmp-ok ', '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', '--with-x=0 ', '--with-debugging=0', '--with-clib-autodetect=0 ', '--with-cxxlib-autodetect=0 ', '--with-fortranlib-autodetect=0 ', '--with-shared-libraries=0 ', '--with-mpi-compilers=1 ', '--with-cc=cc ', '--with-cxx=CC ', '--with-fc=ftn ', # '--with-64-bit-indices', '--download-hypre=1', '--download-blacs=1 ', '--download-scalapack=1 ', '--download-superlu_dist=1 ', '--download-metis=1 ', '--download-parmetis=1 ', ] configure.petsc_configure(configure_options) Any idea about this issue? Thanks, Michele -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_summary Type: application/octet-stream Size: 4782 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Jun 5 21:34:30 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 5 Jun 2015 21:34:30 -0500 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433557379.2754.6.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> Message-ID: [NID 04001] 2015-06-04 19:07:24 Apid 25022256: initiated application termination Application 25022256 exit signals: Killed Application 25022256 resources: utime ~271s, stime ~15107s, Rss ~188536, inblocks ~5078831, outblocks ~12517984 Usually this kind of message indicates that either the OS or the batch system killed the process for some reason: often because it ran out of time or maybe memory. Can you run in batch with a request for more time? Do smaller jobs run through ok? If utime means user time and stime means system time then this is very bad, the system time is HUGE relative to the user time. Barry > On Jun 5, 2015, at 9:22 PM, Michele Rosso wrote: > > Hi, > > I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. 
> I configured petsc with the following options: > > if __name__ == '__main__': > import sys > import os > sys.path.insert(0, os.path.abspath('config')) > import configure > configure_options = [ > '--with-batch=1 ', > '--known-mpi-shared=0 ', > '--known-mpi-shared-libraries=0', > '--known-memcmp-ok ', > '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', > '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > '--with-x=0 ', > '--with-debugging=0', > '--with-clib-autodetect=0 ', > '--with-cxxlib-autodetect=0 ', > '--with-fortranlib-autodetect=0 ', > '--with-shared-libraries=0 ', > '--with-mpi-compilers=1 ', > '--with-cc=cc ', > '--with-cxx=CC ', > '--with-fc=ftn ', > # '--with-64-bit-indices', > '--download-hypre=1', > '--download-blacs=1 ', > '--download-scalapack=1 ', > '--download-superlu_dist=1 ', > '--download-metis=1 ', > '--download-parmetis=1 ', > ] > configure.petsc_configure(configure_options) > > Any idea about this issue? > Thanks, > > Michele > > > > > From jychang48 at gmail.com Sat Jun 6 04:29:31 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sat, 6 Jun 2015 04:29:31 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: Matt and Mark thank you guys for your responses. The reason I brought up GAMG was because it seems to me that this is the preconditioner to use for elliptic problems. However, I am using CG/Jacobi for my larger problems and the solver converges (with -ksp_atol and -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, but significantly fewer solver iterations. As I also kind of mentioned in another mail, the ultimate purpose is to compare how this "correction" methodology using the TAO solver (with bounded constraints) performs compared to the original methodology using the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi and they are roughly 0.3 and 0.2 respectively (do these sound about right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s over the AI*STREAMS BW is smaller, though I am not sure what conclusions to make of that. This was also partly why I wanted to see what kind of metrics another KSP solver/preconditioner produces. Point being, if I were to draw such comparisons between TAO and KSP, would I get crucified if people find out I am using CG/Jacobi and not GAMG? Thanks, Justin On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: > >>> >> The overwhleming cost of AMG is the Galerkin triple-product RAP. >> >> > That is overstating it a bit. It can be if you have a hard 3D operator > and coarsening slowly is best. > > Rule of thumb is you spend 50% time is the solver and 50% in the setup, > which is often mostly RAP (in 3D, 2D is much faster). That way you are > within 2x of optimal and it often works out that way anyway. > > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 6 06:13:24 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 6 Jun 2015 06:13:24 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang wrote: > Matt and Mark thank you guys for your responses. > > The reason I brought up GAMG was because it seems to me that this is the > preconditioner to use for elliptic problems. 
However, I am using CG/Jacobi > for my larger problems and the solver converges (with -ksp_atol and > -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, > but significantly fewer solver iterations. > > As I also kind of mentioned in another mail, the ultimate purpose is to > compare how this "correction" methodology using the TAO solver (with > bounded constraints) performs compared to the original methodology using > the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi > and they are roughly 0.3 and 0.2 respectively (do these sound about > right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s > over the AI*STREAMS BW is smaller, though I am not sure what conclusions to > make of that. This was also partly why I wanted to see what kind of metrics > another KSP solver/preconditioner produces. > > Point being, if I were to draw such comparisons between TAO and KSP, would > I get crucified if people find out I am using CG/Jacobi and not GAMG? > Here is what someone like me reviewing your paper would say first. I can believe that a well-conditioned problem would converge using CG/Jacobi. However, if the highest order derivative looks like the Laplacian, then the condition number of the equations will be O(h^2), and even with CG it will be O(h), so the number of iterations should increase as the square root of the problem size (in 2D), where GAMG should be constant. Thus at some size GAMG will be more efficient. I would want to see where the crossover is for your problem. If you do not get the O(h) dependence, I would think that there is a problem in the formulation. Thanks, Matt > Thanks, > Justin > > On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: > >> >>>> >>> The overwhleming cost of AMG is the Galerkin triple-product RAP. >>> >>> >> That is overstating it a bit. It can be if you have a hard 3D operator >> and coarsening slowly is best. >> >> Rule of thumb is you spend 50% time is the solver and 50% in the setup, >> which is often mostly RAP (in 3D, 2D is much faster). That way you are >> within 2x of optimal and it often works out that way anyway. >> >> Mark >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From may at bu.edu Sat Jun 6 15:00:54 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Sat, 6 Jun 2015 20:00:54 +0000 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: , Message-ID: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> Forgive me for being like a child who wanders into the middle of a movie... I've been attempting to follow this conversation from a beginner's level because I am trying to solve an elliptic PDE with variable coefficients. Both the operator and the RHS change at each time step and the operator has off-diagonal terms that become dominant as the instability of interest grows. I read somewhere that a direct method is the best for this but I'm intrigued by Justin's comment that GAMG seems to be "the preconditioner to use for elliptic problems". I don't want to highjack this conversation but it seems like a good chance to ask for your collective advice on resources for understanding my problem. Any thoughts? --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. 
of Astronomy -------------------------------------------------------------- ________________________________ From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Justin Chang [jychang48 at gmail.com] Sent: Saturday, June 06, 2015 5:29 AM To: Mark Adams Cc: petsc-users Subject: Re: [petsc-users] Guidance on GAMG preconditioning Matt and Mark thank you guys for your responses. The reason I brought up GAMG was because it seems to me that this is the preconditioner to use for elliptic problems. However, I am using CG/Jacobi for my larger problems and the solver converges (with -ksp_atol and -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, but significantly fewer solver iterations. As I also kind of mentioned in another mail, the ultimate purpose is to compare how this "correction" methodology using the TAO solver (with bounded constraints) performs compared to the original methodology using the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi and they are roughly 0.3 and 0.2 respectively (do these sound about right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s over the AI*STREAMS BW is smaller, though I am not sure what conclusions to make of that. This was also partly why I wanted to see what kind of metrics another KSP solver/preconditioner produces. Point being, if I were to draw such comparisons between TAO and KSP, would I get crucified if people find out I am using CG/Jacobi and not GAMG? Thanks, Justin On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams > wrote: The overwhleming cost of AMG is the Galerkin triple-product RAP. That is overstating it a bit. It can be if you have a hard 3D operator and coarsening slowly is best. Rule of thumb is you spend 50% time is the solver and 50% in the setup, which is often mostly RAP (in 3D, 2D is much faster). That way you are within 2x of optimal and it often works out that way anyway. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 6 17:12:38 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 6 Jun 2015 17:12:38 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> Message-ID: On Sat, Jun 6, 2015 at 3:00 PM, Young, Matthew, Adam wrote: > Forgive me for being like a child who wanders into the middle of a > movie... > > I've been attempting to follow this conversation from a beginner's level > because I am trying to solve an elliptic PDE with variable coefficients. > Both the operator and the RHS change at each time step and the operator has > off-diagonal terms that become dominant as the instability of interest > grows. I read somewhere that a direct method is the best for this but I'm > intrigued by Justin's comment that GAMG seems to be "the preconditioner to > use for elliptic problems". I don't want to highjack this conversation but > it seems like a good chance to ask for your collective advice on resources > for understanding my problem. Any thoughts? > The problem here is that fast methods do not depend on the operator being elliptic so much as they depend on the operator falling off away from the diagonal (satisfying a Calderon-Zygmund bound, there are lots of ways of expressing this). When this ceases to be true, these methods stop being fast. 
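For reference, the scaling behind the reviewer-style argument a few messages back can be written out explicitly; these are the standard estimates for a second-order elliptic operator discretized with mesh size h on N unknowns, not anything specific to this application:

  kappa(A) = O(h^-2),   CG iterations = O(sqrt(kappa(A))) = O(h^-1) = O(sqrt(N)) in 2D (O(N^(1/3)) in 3D),
  multigrid (GAMG) iterations = O(1),

so the CG/Jacobi iteration count is expected to grow roughly like the square root of the 2D problem size while GAMG stays flat, which is what the suggested crossover study measures.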
So the answer is, when you have complicated coefficient structure, there are no general methods and you need to know more about exactly what is going on. Where is your problem from? Matt > --Matt > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ------------------------------ > *From:* petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] > on behalf of Justin Chang [jychang48 at gmail.com] > *Sent:* Saturday, June 06, 2015 5:29 AM > *To:* Mark Adams > *Cc:* petsc-users > *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning > > Matt and Mark thank you guys for your responses. > > The reason I brought up GAMG was because it seems to me that this is the > preconditioner to use for elliptic problems. However, I am using CG/Jacobi > for my larger problems and the solver converges (with -ksp_atol and > -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, > but significantly fewer solver iterations. > > As I also kind of mentioned in another mail, the ultimate purpose is to > compare how this "correction" methodology using the TAO solver (with > bounded constraints) performs compared to the original methodology using > the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi > and they are roughly 0.3 and 0.2 respectively (do these sound about > right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s > over the AI*STREAMS BW is smaller, though I am not sure what conclusions to > make of that. This was also partly why I wanted to see what kind of metrics > another KSP solver/preconditioner produces. > > Point being, if I were to draw such comparisons between TAO and KSP, > would I get crucified if people find out I am using CG/Jacobi and not GAMG? > > Thanks, > Justin > > On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: > >> >>>> >>> The overwhleming cost of AMG is the Galerkin triple-product RAP. >>> >>> >> That is overstating it a bit. It can be if you have a hard 3D operator >> and coarsening slowly is best. >> >> Rule of thumb is you spend 50% time is the solver and 50% in the setup, >> which is often mostly RAP (in 3D, 2D is much faster). That way you are >> within 2x of optimal and it often works out that way anyway. >> >> Mark >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From may at bu.edu Sat Jun 6 18:02:14 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Sat, 6 Jun 2015 23:02:14 +0000 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu>, Message-ID: <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> This is a problem from ionospheric plasma physics. The simulation treats ions via a particle-in-cell method and electrons as an inertialess fluid, the justification being that ionospheric ions are 10^4 times more massive than electrons. We further assume that the plasma is effectively neutral on the length scale of interest (i.e. quasi-neutral) and those assumptions allows us to write an elliptic equation for the electrostatic potential, phi: Div[n(x) T Grad(phi)]. 
n(x) is the quasi-neutral plasma density, which is updated via an ion gather at each time step, and T is a tensor of constant coefficients that looks like {{1, kappa, 0},{-kappa, 1, 0},{0, 0, 1+kappa^2}}, where kappa is the ratio of gyrofrequency to collision frequency for electrons (~100 for our problem)*. The RHS is a function of density, ion current (or flux, both of which are related to density), and constant electron fluid parameters. Eq 1 of the attached paper shows this equation for the 2-D problem in the plane perpendicular to the ambient magnetic field. --Matt *It was a little unfair of me to say earlier that the off-diagonal terms grow as the simulation progresses. It's the density that gradients grow and they are multiplied by kappa. -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________ From: Matthew Knepley [knepley at gmail.com] Sent: Saturday, June 06, 2015 6:12 PM To: Young, Matthew, Adam Cc: Justin Chang; Mark Adams; petsc-users Subject: Re: [petsc-users] Guidance on GAMG preconditioning On Sat, Jun 6, 2015 at 3:00 PM, Young, Matthew, Adam > wrote: Forgive me for being like a child who wanders into the middle of a movie... I've been attempting to follow this conversation from a beginner's level because I am trying to solve an elliptic PDE with variable coefficients. Both the operator and the RHS change at each time step and the operator has off-diagonal terms that become dominant as the instability of interest grows. I read somewhere that a direct method is the best for this but I'm intrigued by Justin's comment that GAMG seems to be "the preconditioner to use for elliptic problems". I don't want to highjack this conversation but it seems like a good chance to ask for your collective advice on resources for understanding my problem. Any thoughts? The problem here is that fast methods do not depend on the operator being elliptic so much as they depend on the operator falling off away from the diagonal (satisfying a Calderon-Zygmund bound, there are lots of ways of expressing this). When this ceases to be true, these methods stop being fast. So the answer is, when you have complicated coefficient structure, there are no general methods and you need to know more about exactly what is going on. Where is your problem from? Matt --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________ From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Justin Chang [jychang48 at gmail.com] Sent: Saturday, June 06, 2015 5:29 AM To: Mark Adams Cc: petsc-users Subject: Re: [petsc-users] Guidance on GAMG preconditioning Matt and Mark thank you guys for your responses. The reason I brought up GAMG was because it seems to me that this is the preconditioner to use for elliptic problems. However, I am using CG/Jacobi for my larger problems and the solver converges (with -ksp_atol and -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, but significantly fewer solver iterations. 
As I also kind of mentioned in another mail, the ultimate purpose is to compare how this "correction" methodology using the TAO solver (with bounded constraints) performs compared to the original methodology using the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi and they are roughly 0.3 and 0.2 respectively (do these sound about right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s over the AI*STREAMS BW is smaller, though I am not sure what conclusions to make of that. This was also partly why I wanted to see what kind of metrics another KSP solver/preconditioner produces. Point being, if I were to draw such comparisons between TAO and KSP, would I get crucified if people find out I am using CG/Jacobi and not GAMG? Thanks, Justin On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams > wrote: The overwhleming cost of AMG is the Galerkin triple-product RAP. That is overstating it a bit. It can be if you have a hard 3D operator and coarsening slowly is best. Rule of thumb is you spend 50% time is the solver and 50% in the setup, which is often mostly RAP (in 3D, 2D is much faster). That way you are within 2x of optimal and it often works out that way anyway. Mark -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Oppenheim_etal-1995.pdf Type: application/pdf Size: 538607 bytes Desc: Oppenheim_etal-1995.pdf URL: From knepley at gmail.com Sat Jun 6 19:21:03 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 6 Jun 2015 19:21:03 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> Message-ID: On Sat, Jun 6, 2015 at 6:02 PM, Young, Matthew, Adam wrote: > This is a problem from ionospheric plasma physics. The simulation treats > ions via a particle-in-cell method and electrons as an inertialess fluid, > the justification being that ionospheric ions are 10^4 times more massive > than electrons. We further assume that the plasma is effectively neutral on > the length scale of interest (i.e. quasi-neutral) and those assumptions > allows us to write an elliptic equation for the electrostatic potential, > phi: Div[n(x) T Grad(phi)]. n(x) is the quasi-neutral plasma density, > which is updated via an ion gather at each time step, and T is a tensor of > constant coefficients that looks like {{1, kappa, 0},{-kappa, 1, 0},{0, 0, > 1+kappa^2}}, where kappa is the ratio of gyrofrequency to collision > frequency for electrons (~100 for our problem)*. The RHS is a function of > density, ion current (or flux, both of which are related to density), and > constant electron fluid parameters. Eq 1 of the attached paper shows this > equation for the 2-D problem in the plane perpendicular to the ambient > magnetic field. > Its hard to say anything without knowing what the n(x) functions look like. I would say now that it is easy to try GAMG, so that is what I would do first. Thanks, Matt > --Matt > > *It was a little unfair of me to say earlier that the off-diagonal terms > grow as the simulation progresses. 
It's the density that gradients grow and > they are multiplied by kappa. > > > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ------------------------------ > *From:* Matthew Knepley [knepley at gmail.com] > *Sent:* Saturday, June 06, 2015 6:12 PM > *To:* Young, Matthew, Adam > *Cc:* Justin Chang; Mark Adams; petsc-users > > *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning > > On Sat, Jun 6, 2015 at 3:00 PM, Young, Matthew, Adam > wrote: > >> Forgive me for being like a child who wanders into the middle of a >> movie... >> >> I've been attempting to follow this conversation from a beginner's >> level because I am trying to solve an elliptic PDE with variable >> coefficients. Both the operator and the RHS change at each time step and >> the operator has off-diagonal terms that become dominant as the instability >> of interest grows. I read somewhere that a direct method is the best for >> this but I'm intrigued by Justin's comment that GAMG seems to be "the >> preconditioner to use for elliptic problems". I don't want to highjack this >> conversation but it seems like a good chance to ask for your collective >> advice on resources for understanding my problem. Any thoughts? >> > > The problem here is that fast methods do not depend on the operator > being elliptic so much as they depend on the operator > falling off away from the diagonal (satisfying a Calderon-Zygmund bound, > there are lots of ways of expressing this). When > this ceases to be true, these methods stop being fast. > > So the answer is, when you have complicated coefficient structure, there > are no general methods and you need to know more > about exactly what is going on. Where is your problem from? > > Matt > > >> --Matt >> >> -------------------------------------------------------------- >> Matthew Young >> Graduate Student >> Boston University Dept. of Astronomy >> -------------------------------------------------------------- >> >> ------------------------------ >> *From:* petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] >> on behalf of Justin Chang [jychang48 at gmail.com] >> *Sent:* Saturday, June 06, 2015 5:29 AM >> *To:* Mark Adams >> *Cc:* petsc-users >> *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning >> >> Matt and Mark thank you guys for your responses. >> >> The reason I brought up GAMG was because it seems to me that this is the >> preconditioner to use for elliptic problems. However, I am using CG/Jacobi >> for my larger problems and the solver converges (with -ksp_atol and >> -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, >> but significantly fewer solver iterations. >> >> As I also kind of mentioned in another mail, the ultimate purpose is to >> compare how this "correction" methodology using the TAO solver (with >> bounded constraints) performs compared to the original methodology using >> the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi >> and they are roughly 0.3 and 0.2 respectively (do these sound about >> right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s >> over the AI*STREAMS BW is smaller, though I am not sure what conclusions to >> make of that. This was also partly why I wanted to see what kind of metrics >> another KSP solver/preconditioner produces. 
>> >> Point being, if I were to draw such comparisons between TAO and KSP, >> would I get crucified if people find out I am using CG/Jacobi and not GAMG? >> >> Thanks, >> Justin >> >> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: >> >>> >>>>> >>>> The overwhleming cost of AMG is the Galerkin triple-product RAP. >>>> >>>> >>> That is overstating it a bit. It can be if you have a hard 3D >>> operator and coarsening slowly is best. >>> >>> Rule of thumb is you spend 50% time is the solver and 50% in the >>> setup, which is often mostly RAP (in 3D, 2D is much faster). That way you >>> are within 2x of optimal and it often works out that way anyway. >>> >>> Mark >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Jun 7 11:11:01 2015 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 7 Jun 2015 12:11:01 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> Message-ID: On Sat, Jun 6, 2015 at 4:00 PM, Young, Matthew, Adam wrote: > Forgive me for being like a child who wanders into the middle of a > movie... > > I've been attempting to follow this conversation from a beginner's level > because I am trying to solve an elliptic PDE with variable coefficients. > Both the operator and the RHS change at each time step and the operator has > off-diagonal terms that become dominant > Yikes. > as the instability of interest grows. > As Matt says, out-of-the-box multigrid will not solve all elliptic problems fast. Is the problem even elliptic if the off diagonals are dominant? Anyway, another way of looking at it is: if the Green's function decays quickly you can exploit that with a local process plus a coarse grid correction. If you have a funny Green's function you need a funny method to deal with it. > I read somewhere that a direct method is the best for this but I'm > intrigued by Justin's comment that GAMG seems to be "the preconditioner to > use for elliptic problems". I don't want to highjack this conversation but > it seems like a good chance to ask for your collective advice on resources > for understanding my problem. Any thoughts? > > --Matt > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ------------------------------ > *From:* petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] > on behalf of Justin Chang [jychang48 at gmail.com] > *Sent:* Saturday, June 06, 2015 5:29 AM > *To:* Mark Adams > *Cc:* petsc-users > *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning > > Matt and Mark thank you guys for your responses. > > The reason I brought up GAMG was because it seems to me that this is the > preconditioner to use for elliptic problems. However, I am using CG/Jacobi > for my larger problems and the solver converges (with -ksp_atol and > -ksp_rtol set to 1e-8). 
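For concreteness, the CG/Jacobi configuration being described corresponds to runtime options along these lines (a sketch only; ./app is a placeholder for the user's executable, and the tolerances are the values quoted above):

./app -ksp_type cg -pc_type jacobi -ksp_rtol 1e-8 -ksp_atol 1e-8 -ksp_monitor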
Using GAMG I get roughly the same wall-clock time, > but significantly fewer solver iterations. > > As I also kind of mentioned in another mail, the ultimate purpose is to > compare how this "correction" methodology using the TAO solver (with > bounded constraints) performs compared to the original methodology using > the KSP solver (without constraints). I have the AI for BLMVM and CG/Jacobi > and they are roughly 0.3 and 0.2 respectively (do these sound about > right?). Although the AI is higher for TAO, the ratio of actual FLOPS/s > over the AI*STREAMS BW is smaller, though I am not sure what conclusions to > make of that. This was also partly why I wanted to see what kind of metrics > another KSP solver/preconditioner produces. > > Point being, if I were to draw such comparisons between TAO and KSP, > would I get crucified if people find out I am using CG/Jacobi and not GAMG? > > Thanks, > Justin > > On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: > >> >>>> >>> The overwhelming cost of AMG is the Galerkin triple-product RAP. >>> >>> >> That is overstating it a bit. It can be if you have a hard 3D operator >> and coarsening slowly is best. >> >> Rule of thumb is you spend 50% of the time in the solver and 50% in the setup, >> which is often mostly RAP (in 3D, 2D is much faster). That way you are >> within 2x of optimal and it often works out that way anyway. >> >> Mark >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Jun 7 11:26:38 2015 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 7 Jun 2015 12:26:38 -0400 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> Message-ID: On Sat, Jun 6, 2015 at 8:21 PM, Matthew Knepley wrote: > On Sat, Jun 6, 2015 at 6:02 PM, Young, Matthew, Adam wrote: > >> This is a problem from ionospheric plasma physics. The simulation >> treats ions via a particle-in-cell method and electrons as an inertialess >> fluid, the justification being that ionospheric ions are 10^4 times more >> massive than electrons. We further assume that the plasma is effectively >> neutral on the length scale of interest (i.e. quasi-neutral) and those >> assumptions allow us to write an elliptic equation for the electrostatic >> potential, phi: Div[n(x) T Grad(phi)]. n(x) is the quasi-neutral plasma >> density, which is updated via an ion gather at each time step, and T is a >> tensor of constant coefficients that looks like {{1, kappa, 0},{-kappa, 1, >> 0},{0, 0, 1+kappa^2}}, where kappa is the ratio of gyrofrequency to >> collision frequency for electrons (~100 for our problem)*. The RHS is a >> function of density, ion current (or flux, both of which are related to >> density), and constant electron fluid parameters. Eq 1 of the attached >> paper shows this equation for the 2-D problem in the plane perpendicular to >> the ambient magnetic field. >> > > It's hard to say anything without knowing what the n(x) functions look > like. > I would think it's pretty smooth unless it is a perturbation, this does not sound like it has shocks or anything ... but T scares the hell out of me! Very anisotropic in the vertical direction with a sort of skew thing going on in the horizontal plane? If you are seeing the convergence rate tank as kappa is increased then you will probably find that ASM smoothers help. Making good subdomains for ASM is a problem.
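A minimal sketch of the two experiments suggested here, expressed as runtime options (option names as I recall them for the PETSc 3.5/3.6 era; I have not checked them against this exact problem). First, plain one-subdomain-per-process ASM as the whole preconditioner, just to see whether local solves help the convergence rate:

-ksp_type gmres -pc_type asm -sub_pc_type lu

and second, GAMG with the level smoothers switched from the defaults to processor-block smoothing:

-pc_type gamg -mg_levels_ksp_type richardson -mg_levels_pc_type bjacobi -mg_levels_sub_pc_type ilu

(-mg_levels_pc_type asm instead of bjacobi would give overlapping blocks.)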
If you have small subdomains per process you can use these (just use bjacobi as the smoother). GAMG can let you use the GAMG aggregates as blocks (this has not been tested for ages and we don't have a regression test it). Try just using processor ASM and use as many processors as you can. If this helps your convergence rate (a lot) then this might be the way to go and we can look at GAMG blocks. Mark > I would say now that it is easy to try GAMG, so > that is what I would do first. > > Thanks, > > Matt > > >> --Matt >> >> *It was a little unfair of me to say earlier that the off-diagonal >> terms grow as the simulation progresses. It's the density that gradients >> grow and they are multiplied by kappa. >> >> >> >> -------------------------------------------------------------- >> Matthew Young >> Graduate Student >> Boston University Dept. of Astronomy >> -------------------------------------------------------------- >> >> ------------------------------ >> *From:* Matthew Knepley [knepley at gmail.com] >> *Sent:* Saturday, June 06, 2015 6:12 PM >> *To:* Young, Matthew, Adam >> *Cc:* Justin Chang; Mark Adams; petsc-users >> >> *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning >> >> On Sat, Jun 6, 2015 at 3:00 PM, Young, Matthew, Adam >> wrote: >> >>> Forgive me for being like a child who wanders into the middle of a >>> movie... >>> >>> I've been attempting to follow this conversation from a beginner's >>> level because I am trying to solve an elliptic PDE with variable >>> coefficients. Both the operator and the RHS change at each time step and >>> the operator has off-diagonal terms that become dominant as the instability >>> of interest grows. I read somewhere that a direct method is the best for >>> this but I'm intrigued by Justin's comment that GAMG seems to be "the >>> preconditioner to use for elliptic problems". I don't want to highjack this >>> conversation but it seems like a good chance to ask for your collective >>> advice on resources for understanding my problem. Any thoughts? >>> >> >> The problem here is that fast methods do not depend on the operator >> being elliptic so much as they depend on the operator >> falling off away from the diagonal (satisfying a Calderon-Zygmund bound, >> there are lots of ways of expressing this). When >> this ceases to be true, these methods stop being fast. >> >> So the answer is, when you have complicated coefficient structure, >> there are no general methods and you need to know more >> about exactly what is going on. Where is your problem from? >> >> Matt >> >> >>> --Matt >>> >>> -------------------------------------------------------------- >>> Matthew Young >>> Graduate Student >>> Boston University Dept. of Astronomy >>> -------------------------------------------------------------- >>> >>> ------------------------------ >>> *From:* petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] >>> on behalf of Justin Chang [jychang48 at gmail.com] >>> *Sent:* Saturday, June 06, 2015 5:29 AM >>> *To:* Mark Adams >>> *Cc:* petsc-users >>> *Subject:* Re: [petsc-users] Guidance on GAMG preconditioning >>> >>> Matt and Mark thank you guys for your responses. >>> >>> The reason I brought up GAMG was because it seems to me that this is the >>> preconditioner to use for elliptic problems. However, I am using CG/Jacobi >>> for my larger problems and the solver converges (with -ksp_atol and >>> -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, >>> but significantly fewer solver iterations. 
>>> >>> As I also kind of mentioned in another mail, the ultimate purpose is to >>> compare how this "correction" methodology using the TAO solver (with >>> bounded constraints) performs compared to the original methodology using >>> the KSP solver (without constraints). I have the AI for BLMVM and CG/Jacobi >>> and they are roughly 0.3 and 0.2 respectively (do these sound about >>> right?). Although the AI is higher for TAO, the ratio of actual FLOPS/s >>> over the AI*STREAMS BW is smaller, though I am not sure what conclusions to >>> make of that. This was also partly why I wanted to see what kind of metrics >>> another KSP solver/preconditioner produces. >>> >>> Point being, if I were to draw such comparisons between TAO and KSP, >>> would I get crucified if people find out I am using CG/Jacobi and not GAMG? >>> >>> Thanks, >>> Justin >>> >>> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: >>> >>>> >>>>>> >>>>> The overwhelming cost of AMG is the Galerkin triple-product RAP. >>>>> >>>>> >>>> That is overstating it a bit. It can be if you have a hard 3D >>>> operator and coarsening slowly is best. >>>> >>>> Rule of thumb is you spend 50% of the time in the solver and 50% in the >>>> setup, which is often mostly RAP (in 3D, 2D is much faster). That way you >>>> are within 2x of optimal and it often works out that way anyway. >>>> >>>> Mark >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sun Jun 7 11:55:36 2015 From: hzhang at mcs.anl.gov (Hong) Date: Sun, 7 Jun 2015 11:55:36 -0500 Subject: [petsc-users] Calling single-precision MUMPS from PETSC In-Reply-To: References: Message-ID: Evan, The data types are defined as
s: single real
d: double real
c: single complex
z: double complex
see https://books.google.com/books?id=4E9PY7NT8a0C&pg=PA22&lpg=PA22&dq=SDCZ+data+type&source=bl&ots=rd44BVkxjc&sig=KGxae_a5N54KD-AbO9CBnOE53CA&hl=en&sa=X&ei=IXJ0VeG7CNX-yQSujoOQCg&ved=0CDEQ6AEwAw#v=onepage&q=SDCZ%20data%20type&f=false_ Our mumps interface uses these definitions. I do not understand why you suggest replacing z with c for double complex:

#if defined(PETSC_USE_COMPLEX)
#if defined(PETSC_USE_REAL_SINGLE)
#include <cmumps_c.h>
#else
//#include <zmumps_c.h>  // old
#include <cmumps_c.h>    // new
#endif

and replacing d with s for double real:

#if defined(PETSC_USE_REAL_SINGLE)
#include <smumps_c.h>
#else
//#include <dmumps_c.h>  // old
#include <smumps_c.h>    // new
#endif

Hong On Fri, Jun 5, 2015 at 6:26 PM, Evan Um wrote: > Dear Barry and PETSC users, > > I am revisiting a problem about how to call a single precision MUMPS from > double precision real/complex PETSC. After taking a look at mumps.c, I feel > that the following changes (attached) can make it possible to always call a > single precision MUMPS from PETSC. The change is basically to replace > double precision MUMPS and their associated types with corresponding single > ones. If someone already has some experience about this work, could you > comment on the change? In advance, I appreciate your help. > > Best, > Evan > > On Wed, Oct 22, 2014 at 1:36 PM, Barry Smith wrote: > >> >> There is no support for this.
You can only call single precision MUMPS >> from single precision PETSc and double precision MUMPS from double >> precision PETSc. >> >> You could try to hack the interface that calls MUMPS from PETSc to >> use the single precision version but we don?t support that. >> src/mat/impls/aij/mpi/mumps/mumps.c >> >> Barry >> >> > On Oct 22, 2014, at 3:29 PM, Evan Um wrote: >> > >> > Dear PETSC users, >> > >> > When MUMPS is used inside PETSC, The default MUMPS driver seems to be >> double-precision MUMPS (i.e. DMUMPS). To reduce memory costs, I want to >> test a single-precision MUMPS (SMUMPS) from PETSC. Does anyone know how to >> switch from double to single-precision MUMPS inside PETSC? In advance, >> thanks for your comments. >> > >> > Regards, >> > Evan >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From evanum at gmail.com Sun Jun 7 12:23:39 2015 From: evanum at gmail.com (Evan Um) Date: Sun, 7 Jun 2015 10:23:39 -0700 Subject: [petsc-users] Calling single-precision MUMPS from PETSC In-Reply-To: References: Message-ID: Dear Hong, Thanks for your reply. In the modified mumps.c, I replaced zmumps_c.h with cmumps_c.h because I want to use a single precision complex mumps in my double-precision PETSC application. In my PETSC application (double precision real and complex versions), I want to always call a single-precision MUMPS. My understanding is that mumps.c is designed so that a single precision PETSC application should use a single-precision MUMPS; a double-precision PETSC should use a double-precision MUMPS. I am looking for a way for my double-precision PETSC application to use a single precision MUMPS. Is it possible for a user to slightly modify mumps.c and use a single-precision MUMPS in double-precision PETSC? Do I have to make extra changes beyond the definition parts in mumps.c? In advance, thanks for your kind help. Best, Evan On Sun, Jun 7, 2015 at 9:55 AM, Hong wrote: > Evan, > The data types are defined as > s: single real > d: double real > c: single complex > z: double complex > > see > https://books.google.com/books?id=4E9PY7NT8a0C&pg=PA22&lpg=PA22&dq=SDCZ+data+type&source=bl&ots=rd44BVkxjc&sig=KGxae_a5N54KD-AbO9CBnOE53CA&hl=en&sa=X&ei=IXJ0VeG7CNX-yQSujoOQCg&ved=0CDEQ6AEwAw#v=onepage&q=SDCZ%20data%20type&f=false_ > > Our mumps interface uses these definitions. > > I do not understand why you suggest replacing z to c for double complex: > > #if defined(PETSC_USE_COMPLEX) > #if defined(PETSC_USE_REAL_SINGLE) > #include > #else > //#include // old > #include // new > #endif > > and replacing d to s for double real: > #if defined(PETSC_USE_REAL_SINGLE) > #include > #else > //#include // old > #include // new > #endif > > Hong > > > > > On Fri, Jun 5, 2015 at 6:26 PM, Evan Um wrote: > >> Dear Barry and PETSC users, >> >> I am revisiting a problem about how to call a single precision MUMPS from >> double precision real/complex PETSC. After taking a look at mumps.c, I feel >> that the following changes (attached) can make it possible to always call a >> single precision MUMPS from PETSC. The change is basically to replace >> double precision MUMPS and their associated types with corresponding single >> ones. If someone already has some experience about this work, could you >> comment on the change? In advance, I appreciate your help. >> >> Best, >> Evan >> >> On Wed, Oct 22, 2014 at 1:36 PM, Barry Smith wrote: >> >>> >>> There is no support for this. 
You can only call single precision >>> MUMPS from single precision PETSc and double precision MUMPS from double >>> precision PETSc. >>> >>> You could try to hack the interface that calls MUMPS from PETSc to >>> use the single precision version but we don?t support that. >>> src/mat/impls/aij/mpi/mumps/mumps.c >>> >>> Barry >>> >>> > On Oct 22, 2014, at 3:29 PM, Evan Um wrote: >>> > >>> > Dear PETSC users, >>> > >>> > When MUMPS is used inside PETSC, The default MUMPS driver seems to be >>> double-precision MUMPS (i.e. DMUMPS). To reduce memory costs, I want to >>> test a single-precision MUMPS (SMUMPS) from PETSC. Does anyone know how to >>> switch from double to single-precision MUMPS inside PETSC? In advance, >>> thanks for your comments. >>> > >>> > Regards, >>> > Evan >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Jun 7 14:49:23 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 7 Jun 2015 14:49:23 -0500 Subject: [petsc-users] Calling single-precision MUMPS from PETSC In-Reply-To: References: Message-ID: <53E34DE8-F1BE-41AF-AFC8-1ED46A138B83@mcs.anl.gov> > On Jun 7, 2015, at 12:23 PM, Evan Um wrote: > > Dear Hong, > > Thanks for your reply. In the modified mumps.c, I replaced zmumps_c.h with cmumps_c.h because I want to use a single precision complex mumps in my double-precision PETSC application. > In my PETSC application (double precision real and complex versions), I want to always call a single-precision MUMPS. My understanding is that mumps.c is designed so that a single precision PETSC application should use a single-precision MUMPS; a double-precision PETSC should use a double-precision MUMPS. > I am looking for a way for my double-precision PETSC application to use a single precision MUMPS. Is it possible for a user to slightly modify mumps.c and use a single-precision MUMPS in double-precision PETSC? Do I have to make extra changes beyond the definition parts in mumps.c? > In advance, thanks for your kind help. Well you've tried to make the changes, why don't you see if it works? Fix any problems that come up ... If you get it working we could provide a flag to add this feature to PETSc in the future using your code. Barry > > Best, > Evan > > > > On Sun, Jun 7, 2015 at 9:55 AM, Hong wrote: > Evan, > The data types are defined as > s: single real > d: double real > c: single complex > z: double complex > > see https://books.google.com/books?id=4E9PY7NT8a0C&pg=PA22&lpg=PA22&dq=SDCZ+data+type&source=bl&ots=rd44BVkxjc&sig=KGxae_a5N54KD-AbO9CBnOE53CA&hl=en&sa=X&ei=IXJ0VeG7CNX-yQSujoOQCg&ved=0CDEQ6AEwAw#v=onepage&q=SDCZ%20data%20type&f=false_ > > Our mumps interface uses these definitions. > > I do not understand why you suggest replacing z to c for double complex: > > #if defined(PETSC_USE_COMPLEX) > #if defined(PETSC_USE_REAL_SINGLE) > #include > #else > //#include // old > #include // new > #endif > > and replacing d to s for double real: > #if defined(PETSC_USE_REAL_SINGLE) > #include > #else > //#include // old > #include // new > #endif > > Hong > > > > > On Fri, Jun 5, 2015 at 6:26 PM, Evan Um wrote: > Dear Barry and PETSC users, > > I am revisiting a problem about how to call a single precision MUMPS from double precision real/complex PETSC. After taking a look at mumps.c, I feel that the following changes (attached) can make it possible to always call a single precision MUMPS from PETSC. 
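One practical point about such a change (a sketch under my own assumptions, not something stated in the thread): with a double-precision PETSc build the values handed to the interface are doubles, while SMUMPS/CMUMPS expect float arrays, so besides swapping the headers and struct types the hacked mumps.c would also have to down-cast values on the way in and cast the solution back up on the way out. A trivial illustration of that kind of copy in C (not taken from mumps.c; the function name is made up):

#include <stdlib.h>

/* Copy a double-precision array into a freshly allocated single-precision
   one; the cast keeps only about 7 significant digits, which is both the
   memory savings and the accuracy risk of running MUMPS in single precision. */
static float *downcast_to_float(const double *a, size_t n)
{
  float *s = (float *)malloc(n * sizeof(*s));
  if (!s) return NULL;
  for (size_t i = 0; i < n; i++) s[i] = (float)a[i];
  return s;
}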
The change is basically to replace double precision MUMPS and their associated types with corresponding single ones. If someone already has some experience about this work, could you comment on the change? In advance, I appreciate your help. > > Best, > Evan > > On Wed, Oct 22, 2014 at 1:36 PM, Barry Smith wrote: > > There is no support for this. You can only call single precision MUMPS from single precision PETSc and double precision MUMPS from double precision PETSc. > > You could try to hack the interface that calls MUMPS from PETSc to use the single precision version but we don?t support that. src/mat/impls/aij/mpi/mumps/mumps.c > > Barry > > > On Oct 22, 2014, at 3:29 PM, Evan Um wrote: > > > > Dear PETSC users, > > > > When MUMPS is used inside PETSC, The default MUMPS driver seems to be double-precision MUMPS (i.e. DMUMPS). To reduce memory costs, I want to test a single-precision MUMPS (SMUMPS) from PETSC. Does anyone know how to switch from double to single-precision MUMPS inside PETSC? In advance, thanks for your comments. > > > > Regards, > > Evan > > > > From mkhodak at princeton.edu Sun Jun 7 18:50:55 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Sun, 7 Jun 2015 16:50:55 -0700 Subject: [petsc-users] petsc4py Build Problem Message-ID: Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit Windows 7. I have built PETSc 3.5.4 with shared and dynamic libraries using mpich2-1.2.1 and successfully ran the installation tests. I am using Python 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt to install petsc4py (both with pip and distutils) I get a mpicc compiler error due to undefined references/symbols. I have attached the output of running pip install petsc petsc4py --allow-external petsc --allow-external petsc4py Thank you in advance for any help, Mikhail Khodak -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- Requirement already satisfied (use --upgrade to upgrade): petsc in /usr/lib/python2.7/site-packages Collecting petsc4py Downloading https://bitbucket.org/petsc/petsc4py/downloads/petsc4py-3.5.1.tar.gz (1.5MB) Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib/python2.7/site-packages (from petsc4py) Building wheels for collected packages: petsc4py Running setup.py bdist_wheel for petsc4py Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build-6RsUSP/petsc4py/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmptNVF2lpip-wheel-: running bdist_wheel running build running build_src running build_py creating build creating build/lib.cygwin-2.0.3-x86_64-2.7 creating build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py copying src/help.py -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py copying src/PETSc.py -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py copying src/__init__.py -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py copying src/__main__.py -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py creating build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib copying src/lib/__init__.py -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib creating build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include creating build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/numpy.h -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.h -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc.h -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc_api.h -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.i -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/PETSc.pxd -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pxd -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pyx -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/include/petsc4py copying src/lib/petsc.cfg -> build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib running build_ext PETSC_DIR: /cygdrive/c/cygwin64/petsc-3.5.4 PETSC_ARCH: cygwin-2.0.3-x86_64-python version: 3.5.4 release scalar-type: real precision: double language: CONLY compiler: /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc linker: /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc building 'PETSc' extension creating build/temp.cygwin-2.0.3-x86_64-2.7 creating build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python creating build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -DPETSC_DIR=/cygdrive/c/cygwin64/petsc-3.5.4 -I/cygdrive/c/cygwin64/mpich2-1.2.1/src/include -I/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/include -I/cygdrive/c/cygwin64/petsc-3.5.4/include -Isrc/include -I/usr/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c 
src/PETSc.c -o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o In file included from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1804:0, from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17, from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from src/include/petsc4py/numpy.h:11, from src/petsc4py.PETSc.c:353, from src/PETSc.c:3: /usr/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -DPETSC_DIR=/cygdrive/c/cygwin64/petsc-3.5.4 -I/cygdrive/c/cygwin64/mpich2-1.2.1/src/include -I/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/include -I/cygdrive/c/cygwin64/petsc-3.5.4/include -Isrc/include -I/usr/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c src/libpetsc4py.c -o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/libpetsc4py.o creating build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib/cygwin-2.0.3-x86_64-python /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -shared -Wl,--enable-auto-image-base -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -L. 
build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/libpetsc4py.o -L/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/lib -L/usr/lib/python2.7/config -L/usr/lib -Wl,-R/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/lib -lpetsc -lpython2.7 -o build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib/cygwin-2.0.3-x86_64-python/PETSc.dll build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_152setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165922: undefined reference to `__imp_TSPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165922:(.text+0x533d9): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_TSPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_46setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127910: undefined reference to `__imp_PCPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127910:(.text+0x53669): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_PCPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_120setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141013: undefined reference to `__imp_KSPPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141013:(.text+0x538f9): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_KSPPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_150setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153495: undefined reference to `__imp_SNESPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153495:(.text+0x53b89): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_SNESPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_78setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104355: undefined reference to `__imp_MatPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104355:(.text+0x53e19): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_f_8petsc4py_5PETSc_register': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212385: undefined reference to `__imp_import_libpetsc4py' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212385:(.text+0x6311d): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_import_libpetsc4py' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212394: undefined reference to `__imp_PetscPythonRegisterAll' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212394:(.text+0x6312c): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_PetscPythonRegisterAll' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_76createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104246: undefined reference to `__imp_MatPythonSetContext' 
/tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104246:(.text+0xa4d22): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_80getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104429: undefined reference to `__imp_MatPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104429:(.text+0xa6c07): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_122getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141087: undefined reference to `__imp_KSPPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141087:(.text+0xa6de7): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_KSPPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_48getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127984: undefined reference to `__imp_PCPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127984:(.text+0xa6fc7): additional relocation overflows omitted from the output build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_152getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153569: undefined reference to `__imp_SNESPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_154getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165996: undefined reference to `__imp_TSPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_148createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153386: undefined reference to `__imp_SNESPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_118createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:140904: undefined reference to `__imp_KSPPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_44createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127801: undefined reference to `__imp_PCPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_150createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165813: undefined reference to `__imp_TSPythonSetContext' collect2: error: ld returned 1 exit status error: command '/cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc' failed with exit status 1 ---------------------------------------- Failed to build petsc4py Installing collected packages: petsc4py Running setup.py install for petsc4py Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-6RsUSP/petsc4py/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-paiwXa-record/install-record.txt --single-version-externally-managed --compile: running install running build running build_src 
running build_py running build_ext PETSC_DIR: /cygdrive/c/cygwin64/petsc-3.5.4 PETSC_ARCH: cygwin-2.0.3-x86_64-python version: 3.5.4 release scalar-type: real precision: double language: CONLY compiler: /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc linker: /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc building 'PETSc' extension /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -DPETSC_DIR=/cygdrive/c/cygwin64/petsc-3.5.4 -I/cygdrive/c/cygwin64/mpich2-1.2.1/src/include -I/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/include -I/cygdrive/c/cygwin64/petsc-3.5.4/include -Isrc/include -I/usr/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c src/PETSc.c -o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o In file included from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1804:0, from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17, from /usr/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from src/include/petsc4py/numpy.h:11, from src/petsc4py.PETSc.c:353, from src/PETSc.c:3: /usr/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -DPETSC_DIR=/cygdrive/c/cygwin64/petsc-3.5.4 -I/cygdrive/c/cygwin64/mpich2-1.2.1/src/include -I/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/include -I/cygdrive/c/cygwin64/petsc-3.5.4/include -Isrc/include -I/usr/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c src/libpetsc4py.c -o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/libpetsc4py.o /cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -shared -Wl,--enable-auto-image-base -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -L. 
build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/libpetsc4py.o -L/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/lib -L/usr/lib/python2.7/config -L/usr/lib -Wl,-R/cygdrive/c/cygwin64/petsc-3.5.4/cygwin-2.0.3-x86_64-python/lib -lpetsc -lpython2.7 -o build/lib.cygwin-2.0.3-x86_64-2.7/petsc4py/lib/cygwin-2.0.3-x86_64-python/PETSc.dll build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_152setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165922: undefined reference to `__imp_TSPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165922:(.text+0x533d9): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_TSPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_46setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127910: undefined reference to `__imp_PCPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127910:(.text+0x53669): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_PCPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_120setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141013: undefined reference to `__imp_KSPPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141013:(.text+0x538f9): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_KSPPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_150setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153495: undefined reference to `__imp_SNESPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153495:(.text+0x53b89): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_SNESPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_78setPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104355: undefined reference to `__imp_MatPythonSetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104355:(.text+0x53e19): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_f_8petsc4py_5PETSc_register': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212385: undefined reference to `__imp_import_libpetsc4py' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212385:(.text+0x6311d): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_import_libpetsc4py' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212394: undefined reference to `__imp_PetscPythonRegisterAll' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:212394:(.text+0x6312c): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_PetscPythonRegisterAll' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_76createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104246: undefined reference to `__imp_MatPythonSetContext' 
/tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104246:(.text+0xa4d22): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3Mat_80getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104429: undefined reference to `__imp_MatPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:104429:(.text+0xa6c07): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_MatPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_122getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141087: undefined reference to `__imp_KSPPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:141087:(.text+0xa6de7): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp_KSPPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_48getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127984: undefined reference to `__imp_PCPythonGetContext' /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127984:(.text+0xa6fc7): additional relocation overflows omitted from the output build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_152getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153569: undefined reference to `__imp_SNESPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_154getPythonContext': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165996: undefined reference to `__imp_TSPythonGetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_4SNES_148createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:153386: undefined reference to `__imp_SNESPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_3KSP_118createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:140904: undefined reference to `__imp_KSPPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2PC_44createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:127801: undefined reference to `__imp_PCPythonSetContext' build/temp.cygwin-2.0.3-x86_64-2.7/cygwin-2.0.3-x86_64-python/src/PETSc.o: In function `__pyx_pf_8petsc4py_5PETSc_2TS_150createPython': /tmp/pip-build-6RsUSP/petsc4py/src/petsc4py.PETSc.c:165813: undefined reference to `__imp_TSPythonSetContext' collect2: error: ld returned 1 exit status error: command '/cygdrive/c/cygwin64/mpich2-1.2.1/bin/mpicc' failed with exit status 1 ---------------------------------------- From jychang48 at gmail.com Sun Jun 7 21:08:28 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sun, 7 Jun 2015 21:08:28 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: Matt (Knepley), I see what you're saying and it makes perfect sense. The point of my work isn't necessarily to compare CG/Jacobi with GAMG. 
Rather I am trying to compare both the numerical solution and the computational performance of my "correction" methodology (through optimization) with just solving the FEM problem normally. Of course this methodology is going to be more expensive but I think it would be nice to have some "benchmark" to compare against. I have examples that show where the parallel efficiency of TAO overtakes CG/Jacobi, and I also have the AI that shows how TAO is higher than CG/Jacobi and that both are invariant with respect to problem size. I ran some (smaller) experiments with GAMG and have noticed problems in which GAMG wall-clock time is less than CG/Jacobi (though not by much). However the problem is that it seems I cannot compute the arithmetic intensity for GAMG. The way I see it I have these three options: 1) Stick with what I have and acknowledge that GAMG can be better for larger problems. Since I have compared TAO with CG/Jacobi, somebody else can compare GAMG with CG/Jacobi. 2) Do strong scaling studies with GAMG and TAO and forget about the AI stuff. If I do this, then IMHO the paper will lose much of its flavor. 3) Use a different performance model that can be used to measure GAMG. I can only imagine that the complexity in applying any other model would proliferate for GAMG 4) Simply report FLOPS/s and the associated wall-clock times with respect to each solver. Yes this is easily gamed but I would think that this can at least tell you something (I.e., if this metric drops for a given problem size, it can be an indicator that the program is losing some efficiency) Thoughts? Justin On Saturday, June 6, 2015, Matthew Knepley wrote: > On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang > wrote: > >> Matt and Mark thank you guys for your responses. >> >> The reason I brought up GAMG was because it seems to me that this is the >> preconditioner to use for elliptic problems. However, I am using CG/Jacobi >> for my larger problems and the solver converges (with -ksp_atol and >> -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, >> but significantly fewer solver iterations. >> >> As I also kind of mentioned in another mail, the ultimate purpose is to >> compare how this "correction" methodology using the TAO solver (with >> bounded constraints) performs compared to the original methodology using >> the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi >> and they are roughly 0.3 and 0.2 respectively (do these sound about >> right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s >> over the AI*STREAMS BW is smaller, though I am not sure what conclusions to >> make of that. This was also partly why I wanted to see what kind of metrics >> another KSP solver/preconditioner produces. >> >> Point being, if I were to draw such comparisons between TAO and KSP, >> would I get crucified if people find out I am using CG/Jacobi and not GAMG? >> > > Here is what someone like me reviewing your paper would say first. I can > believe that a well-conditioned problem would > converge using CG/Jacobi. However, if the highest order derivative looks > like the Laplacian, then the condition number of > the equations will be O(h^2), and even with CG it will be O(h), so the > number of iterations should increase as the square root > of the problem size (in 2D), where GAMG should be constant. Thus at some > size GAMG will be more efficient. I would want > to see where the crossover is for your problem. 
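To spell out the scaling argument in the quoted review comment above (standard estimates for a 2-D problem whose leading term is Laplacian-like; the quoted "O(h^2)" and "O(h)" are shorthand for growth as h shrinks):

\kappa(A) = \mathcal{O}(h^{-2}), \qquad
\text{CG iterations} \sim \sqrt{\kappa(A)} = \mathcal{O}(h^{-1}) = \mathcal{O}(N^{1/2}) \ \text{in 2D}, \qquad
\text{multigrid iterations} = \mathcal{O}(1),

which is why a crossover point in favor of GAMG is expected as the problem is refined.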
If you do not get the O(h) > dependence, I would think that there is a problem > in the formulation. > > Thanks, > > Matt > > >> Thanks, >> Justin >> >> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams > > wrote: >> >>> >>>>> >>>> The overwhleming cost of AMG is the Galerkin triple-product RAP. >>>> >>>> >>> That is overstating it a bit. It can be if you have a hard 3D operator >>> and coarsening slowly is best. >>> >>> Rule of thumb is you spend 50% time is the solver and 50% in the setup, >>> which is often mostly RAP (in 3D, 2D is much faster). That way you are >>> within 2x of optimal and it often works out that way anyway. >>> >>> Mark >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sun Jun 7 21:18:57 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sun, 7 Jun 2015 21:18:57 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: Sorry forgot one more comment. So basically I am comparing a "good" TAO solver (though this can be debated) with a "so-so" KSP solver in CG/Jacobi. If this "good" TAO solver cannot beat the performance of the "so-so" KSP, is there really any need to include the performance of the "good"KSP if my objective is focused on TAO and my methodology? On Sunday, June 7, 2015, Justin Chang wrote: > Matt (Knepley), > > I see what you're saying and it makes perfect sense. The point of my work > isn't necessarily to compare CG/Jacobi with GAMG. Rather I am trying to > compare both the numerical solution and the computational performance of my > "correction" methodology (through optimization) with just solving the FEM > problem normally. Of course this methodology is going to be more expensive > but I think it would be nice to have some "benchmark" to compare against. I > have examples that show where the parallel efficiency of TAO > overtakes CG/Jacobi, and I also have the AI that shows how TAO is higher > than CG/Jacobi and that both are invariant with respect to problem size. > > I ran some (smaller) experiments with GAMG and have noticed problems in > which GAMG wall-clock time is less than CG/Jacobi (though not by much). > However the problem is that it seems I cannot compute the arithmetic > intensity for GAMG. > > The way I see it I have these three options: > > 1) Stick with what I have and acknowledge that GAMG can be better for > larger problems. Since I have compared TAO with CG/Jacobi, somebody else > can compare GAMG with CG/Jacobi. > > 2) Do strong scaling studies with GAMG and TAO and forget about the AI > stuff. If I do this, then IMHO the paper will lose much of its flavor. > > 3) Use a different performance model that can be used to measure GAMG. I > can only imagine that the complexity in applying any other model would > proliferate for GAMG > > 4) Simply report FLOPS/s and the associated wall-clock times with respect > to each solver. Yes this is easily gamed but I would think that this can at > least tell you something (I.e., if this metric drops for a given problem > size, it can be an indicator that the program is losing some efficiency) > > Thoughts? > > Justin > > On Saturday, June 6, 2015, Matthew Knepley > wrote: > >> On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang wrote: >> >>> Matt and Mark thank you guys for your responses. 
>>> >>> The reason I brought up GAMG was because it seems to me that this is the >>> preconditioner to use for elliptic problems. However, I am using CG/Jacobi >>> for my larger problems and the solver converges (with -ksp_atol and >>> -ksp_rtol set to 1e-8). Using GAMG I get roughly the same wall-clock time, >>> but significantly fewer solver iterations. >>> >>> As I also kind of mentioned in another mail, the ultimate purpose is to >>> compare how this "correction" methodology using the TAO solver (with >>> bounded constraints) performs compared to the original methodology using >>> the KSP solver (without constraints). I have the AI for BLMVM and CG/Jacobi >>> and they are roughly 0.3 and 0.2 respectively (do these sound about >>> right?). Although the AI is higher for TAO, the ratio of actual FLOPS/s >>> over the AI*STREAMS BW is smaller, though I am not sure what conclusions to >>> make of that. This was also partly why I wanted to see what kind of metrics >>> another KSP solver/preconditioner produces. >>> >>> Point being, if I were to draw such comparisons between TAO and KSP, >>> would I get crucified if people find out I am using CG/Jacobi and not GAMG? >>> >> >> Here is what someone like me reviewing your paper would say first. I can >> believe that a well-conditioned problem would >> converge using CG/Jacobi. However, if the highest order derivative looks >> like the Laplacian, then the condition number of >> the equations will be O(h^2), and even with CG it will be O(h), so the >> number of iterations should increase as the square root >> of the problem size (in 2D), where GAMG should be constant. Thus at some >> size GAMG will be more efficient. I would want >> to see where the crossover is for your problem. If you do not get the >> O(h) dependence, I would think that there is a problem >> in the formulation. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Justin >>> >>> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: >>> >>>> >>>>>> >>>>> The overwhelming cost of AMG is the Galerkin triple-product RAP. >>>>> >>>>> >>>> That is overstating it a bit. It can be if you have a hard 3D operator >>>> and coarsening slowly is best. >>>> >>>> Rule of thumb is you spend 50% of the time in the solver and 50% in the setup, >>>> which is often mostly RAP (in 3D, 2D is much faster). That way you are >>>> within 2x of optimal and it often works out that way anyway. >>>> >>>> Mark >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From har3.8077 at gmail.com Sun Jun 7 21:36:25 2015 From: har3.8077 at gmail.com (har3.8077 at gmail.com) Date: Mon, 08 Jun 2015 02:36:25 +0000 Subject: [petsc-users] =?gb2312?b?zeLDs9b3tq/KvdOqz/q12tK7xrfFxqOssO/W+sT6?= =?gb2312?b?v6q3osirx/LEv7Hqv827p9OutcPK0LOhIQ==?= Message-ID: <001a11c3315217df3d0517f8821d@google.com>
https://docs.google.com/forms/d/1lqrIACylQJq8rmNH_Shb47p24Xa0gcmxgDzwoT-Ouw4/viewform?c=0&w=1&usp=mail_form_link -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Mon Jun 8 04:31:05 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Mon, 08 Jun 2015 10:31:05 +0100 Subject: [petsc-users] DMs not transferred into PCCOMPOSITE? Message-ID: <557560D9.6050408@imperial.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, I have a multi-field system where the splits are defined by a DM. I can use a fieldsplit PC on its own with no trouble, but cannot do so inside a composite PC. My matrices do not have a blocksize set (because the data layout doesn't match) and hence the automagic case (when the pc doesn't have a DM) fails. It looks like this is just a case of the outer DM not being transferred into the sub PCs. Is there a good reason for this, or is it just an oversight? If I apply the following patch then things appear to work Cheers, Lawrence diff --git a/src/ksp/pc/impls/composite/composite.c b/src/ksp/pc/impls/composite/composite.c index 016d195..bf158cc 100644 - --- a/src/ksp/pc/impls/composite/composite.c +++ b/src/ksp/pc/impls/composite/composite.c @@ -170,2 +170,3 @@ static PetscErrorCode PCSetUp_Composite(PC pc) PC_CompositeLink next = jac->head; + DM dm; @@ -175,4 +176,6 @@ static PetscErrorCode PCSetUp_Composite(PC pc) } + ierr = PCGetDM(pc,&dm);CHKERRQ(ierr); while (next) { ierr = PCSetOperators(next->pc,pc->mat,pc->pmat);CHKERRQ(ierr); + ierr = PCSetDM(next->pc,dm);CHKERRQ(ierr); next = next->next; -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVdWDUAAoJECOc1kQ8PEYvqdEIAOk3UpOyJaXszcLXEfEc4Cf0 /JAbnCahZ9l4IGe8d9Iinvf+8yKu2DAIbdeBPg20CexYGoqpbMa1ljGk+rz5oc/h yBGzNjd6Vp+0uAqn2dhLAm7R17A3yKOMk+WdS5sVSWmGBUUHK7y/pkssEjM0PN4b pKIGcYIHAADY7494AIaKGAjJHzBcbGFVaBpeeiGU64FhrV4dlq78jP63GRDrLFvK 22PzzT2NF1JyYumLTaPEHecpbm9ykhCnU+TSfTRC66WZCIjUfuuSarH4Z/HOuVbv GjpelCT4JdoDV6w8bwo3t40BnlEi7RsAyk8q+sIyw0D6l1qnGIWZuyxsPWRTzxM= =jK7M -----END PGP SIGNATURE----- From jychang48 at gmail.com Mon Jun 8 05:16:53 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 8 Jun 2015 05:16:53 -0500 Subject: [petsc-users] Tao iterations In-Reply-To: References: <3112142f40d64968ae157f4a5e819353@LUCKMAN.anl.gov> Message-ID: Hi Jason, One more question about BLMVM... if it only uses gradient information and does not require the definitition of a Hessian Matrix, can this method be applied to solve problems that are nonsymmetric by nature? (e.g., advection-diffusion equations). If I had wanted to solve the same equations using TRON I would need to rewrite the problem as "normal equations" (i.e., min 1/2*||K*u - f||^2 or min 1/2*u^T*K^T*K*u - u^T*K^T*f) so that I get a symmetric Hessian (but this method results in extremely high condition numbers and renders the numerical solution less reliable and accurate). Thanks, Justin On Wed, Apr 29, 2015 at 5:53 PM, Justin Chang wrote: > Okay that's what I figured, thanks you very much > > > > On Wed, Apr 29, 2015 at 10:39 AM, Jason Sarich > wrote: > >> Hi Justin, >> >> This expected behavior due to the accumulation of numerical round-offs. >> If this is a problem or if you just want to confirm that this is the cause, >> you can try configuring PETSc for quad precision >> (--with-precision=__float128, works with GNU compilers) and the results >> should match better. 
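Coming back to the normal-equations reformulation mentioned earlier in this message, written out with K the nonsymmetric operator (standard identities, added here for clarity):

\min_u \; J(u) = \tfrac{1}{2}\,\|K u - f\|_2^{2}
 = \tfrac{1}{2}\,u^{T}K^{T}K\,u - u^{T}K^{T}f + \tfrac{1}{2}\,f^{T}f, \qquad
\nabla J(u) = K^{T}(K u - f), \qquad \nabla^{2} J = K^{T}K,

so the Hessian K^T K is symmetric positive semi-definite, but its 2-norm condition number is \kappa_2(K)^{2}, which is where the extremely high condition numbers mentioned above come from.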
>> >> Jason >> >> >> On Tue, Apr 28, 2015 at 10:19 PM, Justin Chang wrote: >> >>> Jason (or anyone), >>> >>> I am noticing that the iteration numbers reported by >>> TaoGetSolutionStatus() for blmvm differ whenever I change the number of >>> processes. The solution seems to remain the same though. Is there a reason >>> why this could be happening? >>> >>> Thanks, >>> >>> On Tue, Apr 21, 2015 at 10:40 AM, Jason Sarich >>> wrote: >>> >>>> Justin, >>>> >>>> 1) The big difference between TRON and BLMVM is that TRON requires >>>> hessian information, BLMVM only uses gradient information. Thus TRON will >>>> usually converge faster, but requires more information, memory, and a KSP >>>> solver. GPCG (gradient projected conjugate gradient) is another >>>> gradient-only option, but usually performs worse than BLMVM. >>>> >>>> 2) TaoGetLinearSolveIterations() will get the total number of KSP >>>> iterations per solve >>>> >>>> Jason >>>> >>>> >>>> On Tue, Apr 21, 2015 at 10:33 AM, Justin Chang >>>> wrote: >>>> >>>>> Jason, >>>>> >>>>> Tightening the tolerances did the trick. Thanks. Though I do have a >>>>> couple more related questions: >>>>> >>>>> 1) Is there a general guideline for choosing tron over blmvm or vice >>>>> versa? Also is there another tao type that is also suitable given only >>>>> bounded constraints? >>>>> >>>>> 2) Is it possible to obtain the total number of KSP and/or PG >>>>> iterations from tron? >>>>> >>>>> Thanks, >>>>> Justin >>>>> >>>>> On Tue, Apr 21, 2015 at 9:52 AM, Jason Sarich >>>>> wrote: >>>>> >>>>>> Hi Justin, >>>>>> >>>>>> blmvm believes that it is already sufficiently close to a minimum, >>>>>> so it doesn't do anything. You may need to tighten some of the tolerance to >>>>>> force an iteration. >>>>>> >>>>>> Jason >>>>>> >>>>>> >>>>>> On Tue, Apr 21, 2015 at 9:48 AM, Justin Chang >>>>>> wrote: >>>>>> >>>>>>> Time step 1: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0663148 >>>>>>> Objective value=-55.5945 >>>>>>> total number of iterations=35, (max: 2000) >>>>>>> total number of function/gradient evaluations=37, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 2: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0682307 >>>>>>> Objective value=-66.9675 >>>>>>> total number of iterations=23, (max: 2000) >>>>>>> total number of function/gradient evaluations=25, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 3: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0680522 >>>>>>> Objective value=-71.8211 >>>>>>> total number of 
iterations=19, (max: 2000) >>>>>>> total number of function/gradient evaluations=22, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 4: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0551556 >>>>>>> Objective value=-75.1252 >>>>>>> total number of iterations=18, (max: 2000) >>>>>>> total number of function/gradient evaluations=20, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 5: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0675667 >>>>>>> Objective value=-77.4414 >>>>>>> total number of iterations=6, (max: 2000) >>>>>>> total number of function/gradient evaluations=8, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 6: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.059143 >>>>>>> Objective value=-79.5007 >>>>>>> total number of iterations=3, (max: 2000) >>>>>>> total number of function/gradient evaluations=5, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 7: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0433683 >>>>>>> Objective value=-81.3546 >>>>>>> total number of iterations=5, (max: 2000) >>>>>>> total number of function/gradient evaluations=8, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 8: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0840676 >>>>>>> Objective value=-82.9382 >>>>>>> total number of iterations=0, (max: 2000) >>>>>>> total number of function/gradient evaluations=1, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 9: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 
>>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0840676 >>>>>>> Objective value=-82.9382 >>>>>>> total number of iterations=0, (max: 2000) >>>>>>> total number of function/gradient evaluations=1, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> Time step 10: >>>>>>> >>>>>>> Tao Object: 1 MPI processes >>>>>>> type: blmvm >>>>>>> Gradient steps: 0 >>>>>>> TaoLineSearch Object: 1 MPI processes >>>>>>> type: more-thuente >>>>>>> Active Set subset type: subvec >>>>>>> convergence tolerances: fatol=0.0001, frtol=0.0001 >>>>>>> convergence tolerances: gatol=0, steptol=0, gttol=0 >>>>>>> Residual in Function/Gradient:=0.0840676 >>>>>>> Objective value=-82.9382 >>>>>>> total number of iterations=0, (max: 2000) >>>>>>> total number of function/gradient evaluations=1, (max: 4000) >>>>>>> Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Apr 21, 2015 at 9:28 AM, Jason Sarich < >>>>>>> jason.sarich at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Justin, >>>>>>>> >>>>>>>> what reason is blmvm giving for stopping the solve? (you can use >>>>>>>> -tao_view or -tao_converged_reason to get this) >>>>>>>> >>>>>>>> Jason >>>>>>>> >>>>>>>> On Mon, Apr 20, 2015 at 6:32 PM, Justin Chang >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Jason, >>>>>>>>> >>>>>>>>> I am using TaoGetSolutionStatus(tao,&its, ...) and it gives me >>>>>>>>> exactly what I want. However, I seem to be having an issue with blmvm >>>>>>>>> >>>>>>>>> I wrote my own backward euler code for a transient linear >>>>>>>>> diffusion problem with lower bounds >= 0 and upper bounds <= 1. For the >>>>>>>>> first several time steps I am getting its > 0, and it decreases over time >>>>>>>>> due to the nature of the discrete maximum principles. However, at some >>>>>>>>> point my its become 0 and the solution does not "update", which seems to me >>>>>>>>> that TaoSolve is not doing anything after that. This doesn't happen if I >>>>>>>>> were to use tron (my KSP and PC are cg and jacobi respectively). >>>>>>>>> >>>>>>>>> Do you know why this behavior may occur? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> On Tue, Apr 14, 2015 at 9:35 AM, Jason Sarich < >>>>>>>>> jason.sarich at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Justin, >>>>>>>>>> >>>>>>>>>> I have pushed these changes to the "next" branch, your code >>>>>>>>>> snippet should work fine there. >>>>>>>>>> >>>>>>>>>> Note that there is also available (since version 3.5.0) the >>>>>>>>>> routine TaoGetSolutionStatus(tao,&its,NULL,NULL,NULL,NULL,NULL) which will >>>>>>>>>> provide the >>>>>>>>>> same information >>>>>>>>>> >>>>>>>>>> Jason >>>>>>>>>> >>>>>>>>>> On Fri, Apr 10, 2015 at 6:28 PM, Justin Chang < >>>>>>>>>> jychang48 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Whatever is convenient and/or follow the "PETSc" standards. >>>>>>>>>>> Something similar to SNESGetIterationNumber() or KSPGetIterationNumber() >>>>>>>>>>> would be nice. Ideally I want my code to look like this: >>>>>>>>>>> >>>>>>>>>>> ierr = TaoGetIterationNumber(tao,&its);CHKERRQ(ierr); >>>>>>>>>>> ierr = PetscPrintf(PETSC_COMM_WORLD, "Number of Tao iterations >>>>>>>>>>> = %D\n", its); >>>>>>>>>>> >>>>>>>>>>> Thanks :) >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 10, 2015 at 5:53 PM, Jason Sarich < >>>>>>>>>>> jason.sarich at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Justin, I'll get this in. 
I assume that displaying the >>>>>>>>>>>> number of iterations with tao_converged_reason is what you are asking for >>>>>>>>>>>> in particular? Or did you have something else in mind? >>>>>>>>>>>> >>>>>>>>>>>> Jason >>>>>>>>>>>> On Apr 10, 2015 16:42, "Smith, Barry F." >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Justin, >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry TAO simply doesn't even collect this information >>>>>>>>>>>>> currently. But yes we should definitely make it available! >>>>>>>>>>>>> >>>>>>>>>>>>> Jason, >>>>>>>>>>>>> >>>>>>>>>>>>> Could you please add this; almost all the TaoSolve_xxx() >>>>>>>>>>>>> have a local variable iter; change that to tao->niter (I'm guess this is >>>>>>>>>>>>> suppose to capture this information) and add a TaoGetIterationNumber() and >>>>>>>>>>>>> the uses can access this. Also modify at the end of TaoSolve() >>>>>>>>>>>>> -tao_converged_reason to also print the iteration count. At the same time >>>>>>>>>>>>> since you add this you can add a tao->totalits which would accumulate all >>>>>>>>>>>>> iterations over all the solves for that Tao object and the routine >>>>>>>>>>>>> TaoGetTotalIterations() to access this. Note that TaoSolve() would >>>>>>>>>>>>> initialize tao->niter = 0 at the top. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> > On Apr 10, 2015, at 4:16 PM, Justin Chang >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > Hi all, >>>>>>>>>>>>> > >>>>>>>>>>>>> > Is there a way to generically obtain the number of Tao >>>>>>>>>>>>> iterations? I am looking through the -help options for Tao and I don't see >>>>>>>>>>>>> any metric where you can output this quantity in the manner that you could >>>>>>>>>>>>> for SNES or KSP solves. I am currently using blmvm and tron, and the only >>>>>>>>>>>>> way I can see getting this metric is by outputting -tao_view and/or >>>>>>>>>>>>> -tao_monitor and manually finding this number. I find this cumbersome >>>>>>>>>>>>> especially for transient problems where I would like to simply have this >>>>>>>>>>>>> number printed for each step instead of ending up with unnecessary info. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Thanks, >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > -- >>>>>>>>>>>>> > Justin Chang >>>>>>>>>>>>> > PhD Candidate, Civil Engineering - Computational Sciences >>>>>>>>>>>>> > University of Houston, Department of Civil and Environmental >>>>>>>>>>>>> Engineering >>>>>>>>>>>>> > Houston, TX 77004 >>>>>>>>>>>>> > (512) 963-3262 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >>> -- >>> Justin Chang >>> PhD Candidate, Civil Engineering - Computational Sciences >>> University of Houston, Department of Civil and Environmental Engineering >>> Houston, TX 77004 >>> (512) 963-3262 >>> >> >> > > > -- > Justin Chang > PhD Candidate, Civil Engineering - Computational Sciences > University of Houston, Department of Civil and Environmental Engineering > Houston, TX 77004 > (512) 963-3262 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 8 05:52:16 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Jun 2015 05:52:16 -0500 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: References: Message-ID: On Sun, Jun 7, 2015 at 9:08 PM, Justin Chang wrote: > Matt (Knepley), > > I see what you're saying and it makes perfect sense. 
The point of my work > isn't necessarily to compare CG/Jacobi with GAMG. Rather I am trying to > compare both the numerical solution and the computational performance of my > "correction" methodology (through optimization) with just solving the FEM > problem normally. Of course this methodology is going to be more expensive > but I think it would be nice to have some "benchmark" to compare against. I > have examples that show where the parallel efficiency of TAO > overtakes CG/Jacobi, and I also have the AI that shows how TAO is higher > than CG/Jacobi and that both are invariant with respect to problem size. > > I ran some (smaller) experiments with GAMG and have noticed problems in > which GAMG wall-clock time is less than CG/Jacobi (though not by much). > However the problem is that it seems I cannot compute the arithmetic > intensity for GAMG. > > The way I see it I have these three options: > > 1) Stick with what I have and acknowledge that GAMG can be better for > larger problems. Since I have compared TAO with CG/Jacobi, somebody else > can compare GAMG with CG/Jacobi. > Do this. Include the performance data you do have with GAMG. It is perfectly acceptable to say "I care about these problems sizes for my problems, and I have done careful perofrmance analysis", and then acknowledge that at some point GAMG will likely win. Thanks, Matt > 2) Do strong scaling studies with GAMG and TAO and forget about the AI > stuff. If I do this, then IMHO the paper will lose much of its flavor. > > 3) Use a different performance model that can be used to measure GAMG. I > can only imagine that the complexity in applying any other model would > proliferate for GAMG > > 4) Simply report FLOPS/s and the associated wall-clock times with respect > to each solver. Yes this is easily gamed but I would think that this can at > least tell you something (I.e., if this metric drops for a given problem > size, it can be an indicator that the program is losing some efficiency) > > Thoughts? > > Justin > > > On Saturday, June 6, 2015, Matthew Knepley wrote: > >> On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang wrote: >> >>> Matt and Mark thank you guys for your responses. >>> >>> The reason I brought up GAMG was because it seems to me that this is the >>> preconditioner to use for elliptic problems. However, I am using CG/Jacobi >>> for my larger problems and the solver converges (with -ksp_atol and >>> -ksp_rtol set to 1e-8). Using GAMG I get rough the same wall-clock time, >>> but significantly fewer solver iterations. >>> >>> As I also kind of mentioned in another mail, the ultimate purpose is to >>> compare how this "correction" methodology using the TAO solver (with >>> bounded constraints) performs compared to the original methodology using >>> the KSP solver (without constraints). I have the A for BLMVM and CG/Jacobi >>> and they are roughly 0.3 and 0.2 respectively (do these sound about >>> right?). Although the AI is higher for TAO , the ratio of actual FLOPS/s >>> over the AI*STREAMS BW is smaller, though I am not sure what conclusions to >>> make of that. This was also partly why I wanted to see what kind of metrics >>> another KSP solver/preconditioner produces. >>> >>> Point being, if I were to draw such comparisons between TAO and KSP, >>> would I get crucified if people find out I am using CG/Jacobi and not GAMG? >>> >> >> Here is what someone like me reviewing your paper would say first. I can >> believe that a well-conditioned problem would >> converge using CG/Jacobi. 
However, if the highest order derivative looks >> like the Laplacian, then the condition number of >> the equations will be O(h^2), and even with CG it will be O(h), so the >> number of iterations should increase as the square root >> of the problem size (in 2D), where GAMG should be constant. Thus at some >> size GAMG will be more efficient. I would want >> to see where the crossover is for your problem. If you do not get the >> O(h) dependence, I would think that there is a problem >> in the formulation. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Justin >>> >>> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams wrote: >>> >>>> >>>>>> >>>>> The overwhleming cost of AMG is the Galerkin triple-product RAP. >>>>> >>>>> >>>> That is overstating it a bit. It can be if you have a hard 3D operator >>>> and coarsening slowly is best. >>>> >>>> Rule of thumb is you spend 50% time is the solver and 50% in the setup, >>>> which is often mostly RAP (in 3D, 2D is much faster). That way you are >>>> within 2x of optimal and it often works out that way anyway. >>>> >>>> Mark >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 8 06:06:54 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Jun 2015 06:06:54 -0500 Subject: [petsc-users] DMs not transferred into PCCOMPOSITE? In-Reply-To: <557560D9.6050408@imperial.ac.uk> References: <557560D9.6050408@imperial.ac.uk> Message-ID: On Mon, Jun 8, 2015 at 4:31 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > I have a multi-field system where the splits are defined by a DM. I > can use a fieldsplit PC on its own with no trouble, but cannot do so > inside a composite PC. My matrices do not have a blocksize set > (because the data layout doesn't match) and hence the automagic case > (when the pc doesn't have a DM) fails. > > It looks like this is just a case of the outer DM not being > transferred into the sub PCs. Is there a good reason for this, or is > it just an oversight? If I apply the following patch then things > appear to work > This looks like an oversight to me. I can integrate it. 
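Until a fix along those lines is merged, a rough user-side workaround is to push the DM down to the sub-PCs by hand before the composite is set up, mirroring what the quoted patch does inside PCSetUp_Composite(). The helper below is an untested sketch, and it assumes PCCompositeGetNumberPC()/PCCompositeGetPC() are available in the installed PETSc version:

#include <petscksp.h>

/* Sketch of a user-side workaround: copy the DM attached to a PCCOMPOSITE
   onto each of its sub-PCs so that DM-defined splits are visible there.
   This mirrors the PCSetUp_Composite() patch quoted in this thread. */
static PetscErrorCode PushDMToCompositeSubPCs(PC pc)
{
  DM             dm;
  PetscInt       i,nsub;
  PC             subpc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PCGetDM(pc,&dm);CHKERRQ(ierr);
  ierr = PCCompositeGetNumberPC(pc,&nsub);CHKERRQ(ierr);
  for (i=0; i<nsub; i++) {
    ierr = PCCompositeGetPC(pc,i,&subpc);CHKERRQ(ierr);
    ierr = PCSetDM(subpc,dm);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

Calling this after the sub-PCs have been added (and after the DM has been set on the outer PC) but before KSPSetUp() should let a PCFIELDSPLIT inside the composite find its splits from the DM rather than from a block size.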
Matt > Cheers, > > Lawrence > > > diff --git a/src/ksp/pc/impls/composite/composite.c > b/src/ksp/pc/impls/composite/composite.c > index 016d195..bf158cc 100644 > - --- a/src/ksp/pc/impls/composite/composite.c > +++ b/src/ksp/pc/impls/composite/composite.c > @@ -170,2 +170,3 @@ static PetscErrorCode PCSetUp_Composite(PC pc) > PC_CompositeLink next = jac->head; > + DM dm; > > @@ -175,4 +176,6 @@ static PetscErrorCode PCSetUp_Composite(PC pc) > } > + ierr = PCGetDM(pc,&dm);CHKERRQ(ierr); > while (next) { > ierr = PCSetOperators(next->pc,pc->mat,pc->pmat);CHKERRQ(ierr); > + ierr = PCSetDM(next->pc,dm);CHKERRQ(ierr); > next = next->next; > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1 > > iQEcBAEBAgAGBQJVdWDUAAoJECOc1kQ8PEYvqdEIAOk3UpOyJaXszcLXEfEc4Cf0 > /JAbnCahZ9l4IGe8d9Iinvf+8yKu2DAIbdeBPg20CexYGoqpbMa1ljGk+rz5oc/h > yBGzNjd6Vp+0uAqn2dhLAm7R17A3yKOMk+WdS5sVSWmGBUUHK7y/pkssEjM0PN4b > pKIGcYIHAADY7494AIaKGAjJHzBcbGFVaBpeeiGU64FhrV4dlq78jP63GRDrLFvK > 22PzzT2NF1JyYumLTaPEHecpbm9ykhCnU+TSfTRC66WZCIjUfuuSarH4Z/HOuVbv > GjpelCT4JdoDV6w8bwo3t40BnlEi7RsAyk8q+sIyw0D6l1qnGIWZuyxsPWRTzxM= > =jK7M > -----END PGP SIGNATURE----- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Mon Jun 8 07:11:47 2015 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 8 Jun 2015 15:11:47 +0300 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: On 8 June 2015 at 02:50, Mikhail Khodak wrote: > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit Windows 7. > I have built PETSc 3.5.4 with shared and dynamic libraries using > mpich2-1.2.1 and successfully ran the installation tests. I am using Python > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt to > install petsc4py (both with pip and distutils) I get a mpicc compiler error > due to undefined references/symbols. I have attached the output of running > > pip install petsc petsc4py --allow-external petsc --allow-external petsc4py > I've never ever built or test petsc4py under Cygwin. The errors you see are expected. Perhaps you can manually workaround the issues following the following steps: 1) Download the petsc4py tarball and unpack it. 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ 3) Use pip again: pip install petsc pip install . The last line assumes your current working directory is the one having petsc4py's setup.py Finally, I do not guarantee this will work. I'm just guessing, petsc4py never explicitly supported Windows and/or Cygwin. 
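Concretely, step 2 above is a mechanical edit of the declarations in that header. Using a purely hypothetical declaration for illustration (this is not the actual contents of src/libpetsc4py/libpetsc4py.h), the change would look like:

/* before: */
DL_IMPORT(PetscErrorCode) SomeLibPetsc4pyFunction(void);   /* hypothetical name */
/* after dropping the DL_IMPORT() wrapper: */
PetscErrorCode SomeLibPetsc4pyFunction(void);

The same substitution would be applied to every occurrence of DL_IMPORT in that file.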
-- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Numerical Porous Media Center (NumPor) King Abdullah University of Science and Technology (KAUST) http://numpor.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 4332 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From jed at jedbrown.org Mon Jun 8 09:39:39 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 08:39:39 -0600 Subject: [petsc-users] Tao iterations In-Reply-To: References: <3112142f40d64968ae157f4a5e819353@LUCKMAN.anl.gov> Message-ID: <87a8waxo0k.fsf@jedbrown.org> Justin Chang writes: > Hi Jason, > > One more question about BLMVM... if it only uses gradient information and > does not require the definitition of a Hessian Matrix, can this method be > applied to solve problems that are nonsymmetric by nature? (e.g., > advection-diffusion equations). Such equations do not have an associated objective functional, thus you don't have the "gradient" of something, you just have a system of nonlinear equations. (There are ways to formulate such systems as optimization, but they have issues like you mention.) Reformulating nonlinear equations as optimization can also turn a problem with a unique solution into one with local minima, for which it may be impossible to guarantee that you have reached a global minimum. Use SNES for solving nonlinear equations. You can try the quasi-Newton methods, which are related to BLMVM. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jychang48 at gmail.com Mon Jun 8 10:26:39 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 8 Jun 2015 10:26:39 -0500 Subject: [petsc-users] Tao iterations In-Reply-To: <87a8waxo0k.fsf@jedbrown.org> References: <3112142f40d64968ae157f4a5e819353@LUCKMAN.anl.gov> <87a8waxo0k.fsf@jedbrown.org> Message-ID: Jed, Thank you for your response. I agree completely with all that you said. I just wonder what would happen if I attempted to use the TAO routines after forming the Jacobian J and residual r arising from the advection diffusion equation. In my current (linear) diffusion framework, i have the following objective function and gradient vector: f = \frac{1}{2} x \cdot J*x + x\cdot[r - J*x^(0)] g = J*[x - x^(0)] + r where x is the solution at a given tao iterate and x^(0) is the initial guess used to compute J and r. These are built upon the assumption that the solver would converge in exactly one newton step if I used SNESSolve (i.e., no unexpected nonlinearities). This methodology works well if I am only looking at anisotropic diffusion. With TRON, I would set the Hessian as J, and I imagine the solver would immediately fail due to lack of symmetry. I am not exactly sure what's going on within the framework of BLMVM, but given the above objective and gradient routines, how would BLMVM know whether my equations have an associated objective functional? Would the solver simply blow up or will it commit a variational crime by giving me some solution that may not actually be a global minimum? Normally I would experiment with this myself but solving something like advection-diffusion using DMPlex isn't trivial when dealing with high advection to diffusion ratios. 
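A callback implementing that f and g might look like the sketch below (illustrative only, not the actual code used in this work). It assumes b = r - J*x^(0) has been precomputed and stored in a user context, so that f(x) = 1/2 x.Jx + x.b and g(x) = Jx + b; as the reply that follows points out, this construction is only meaningful when J is symmetric.

#include <petsctao.h>

typedef struct {
  Mat J;   /* Jacobian assembled at the initial guess x0 */
  Vec b;   /* b = r - J*x0, precomputed once */
  Vec Jx;  /* work vector */
} QuadCtx;

/* Objective and gradient for the quadratic f(x) = 1/2 x.Jx + x.b */
static PetscErrorCode FormFunctionGradient(Tao tao,Vec X,PetscReal *f,Vec G,void *ptr)
{
  QuadCtx        *ctx = (QuadCtx*)ptr;
  PetscScalar    xJx,xb;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(ctx->J,X,ctx->Jx);CHKERRQ(ierr);        /* Jx      */
  ierr = VecDot(X,ctx->Jx,&xJx);CHKERRQ(ierr);           /* x . Jx  */
  ierr = VecDot(X,ctx->b,&xb);CHKERRQ(ierr);             /* x . b   */
  *f   = 0.5*PetscRealPart(xJx) + PetscRealPart(xb);
  ierr = VecWAXPY(G,1.0,ctx->Jx,ctx->b);CHKERRQ(ierr);   /* G = Jx + b */
  PetscFunctionReturn(0);
}

It would be registered with TaoSetObjectiveAndGradientRoutine(tao,FormFunctionGradient,&ctx); for TRON one would additionally supply a Hessian routine returning J itself.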
Thanks, Justin On Mon, Jun 8, 2015 at 9:39 AM, Jed Brown wrote: > Justin Chang writes: > > > Hi Jason, > > > > One more question about BLMVM... if it only uses gradient information and > > does not require the definitition of a Hessian Matrix, can this method be > > applied to solve problems that are nonsymmetric by nature? (e.g., > > advection-diffusion equations). > > Such equations do not have an associated objective functional, thus you > don't have the "gradient" of something, you just have a system of > nonlinear equations. (There are ways to formulate such systems as > optimization, but they have issues like you mention.) Reformulating > nonlinear equations as optimization can also turn a problem with a > unique solution into one with local minima, for which it may be > impossible to guarantee that you have reached a global minimum. > > Use SNES for solving nonlinear equations. You can try the quasi-Newton > methods, which are related to BLMVM. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jun 8 10:55:40 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 09:55:40 -0600 Subject: [petsc-users] Tao iterations In-Reply-To: References: <3112142f40d64968ae157f4a5e819353@LUCKMAN.anl.gov> <87a8waxo0k.fs f@jedbro wn.org> Message-ID: <87616yxkhv.fsf@jedbrown.org> Justin Chang writes: > Jed, > > Thank you for your response. I agree completely with all that you said. I > just wonder what would happen if I attempted to use the TAO routines after > forming the Jacobian J and residual r arising from the advection diffusion > equation. > > In my current (linear) diffusion framework, i have the following objective > function and gradient vector: > > f = \frac{1}{2} x \cdot J*x + x\cdot[r - J*x^(0)] If J is non-symmetric, then you're no longer looking for a minimum of this functional. Let's take a prototypical example in 2 variables J = [0 1; -1 0] Now the J-"inner product" conj(x) \cdot J x = 0 for all real-valued x. (I'll assume you're working over the reals.) Similarly, the J-"inner product" with J = [1 1; -1 1] is identical to that with J = eye(2), but obviously you want the anti-symmetric part to affect your solution. In short, none of this makes mathematical sense in the way you intend if J is nonsymmetric. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From jychang48 at gmail.com Mon Jun 8 11:13:45 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 8 Jun 2015 11:13:45 -0500 Subject: [petsc-users] Tao iterations In-Reply-To: <87616yxkhv.fsf@jedbrown.org> References: <3112142f40d64968ae157f4a5e819353@LUCKMAN.anl.gov> <87616yxkhv.fsf@jedbrown.org> Message-ID: Ah I see that makes sense. Thank you very much On Monday, June 8, 2015, Jed Brown wrote: > Justin Chang > writes: > > > Jed, > > > > Thank you for your response. I agree completely with all that you said. I > > just wonder what would happen if I attempted to use the TAO routines > after > > forming the Jacobian J and residual r arising from the advection > diffusion > > equation. > > > > In my current (linear) diffusion framework, i have the following > objective > > function and gradient vector: > > > > f = \frac{1}{2} x \cdot J*x + x\cdot[r - J*x^(0)] > > If J is non-symmetric, then you're no longer looking for a minimum of > this functional. 
Let's take a prototypical example in 2 variables > > J = [0 1; -1 0] > > Now the J-"inner product" > > conj(x) \cdot J x = 0 > > for all real-valued x. (I'll assume you're working over the reals.) > Similarly, the J-"inner product" with > > J = [1 1; -1 1] > > is identical to that with J = eye(2), but obviously you want the > anti-symmetric part to affect your solution. > > In short, none of this makes mathematical sense in the way you intend if > J is nonsymmetric. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jun 8 12:10:13 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 11:10:13 -0600 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu> Message-ID: <87zj4aw2h6.fsf@jedbrown.org> "Young, Matthew, Adam" writes: > This is a problem from ionospheric plasma physics. The simulation > treats ions via a particle-in-cell method and electrons as an > inertialess fluid, the justification being that ionospheric ions are > 10^4 times more massive than electrons. We further assume that the > plasma is effectively neutral on the length scale of interest > (i.e. quasi-neutral) and those assumptions allows us to write an > elliptic equation for the electrostatic potential, phi: Div[n(x) T > Grad(phi)]. n(x) is the quasi-neutral plasma density, which is updated > via an ion gather at each time step, and T is a tensor of constant > coefficients that looks like {{1, kappa, 0},{-kappa, 1, 0},{0, 0, > 1+kappa^2}}, where kappa is the ratio of gyrofrequency to collision > frequency for electrons (~100 for our problem)*. It seems to me that the nonsymmetric part of T is not an elliptic contribution so with your large value of kappa, you should think of this problem as a singular perturbation. Consequently, there is no reason to believe that methods designed for elliptic problems will work for your problem (especially not "out of the box"). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From mrosso at uci.edu Mon Jun 8 12:16:59 2015 From: mrosso at uci.edu (Michele Rosso) Date: Mon, 08 Jun 2015 10:16:59 -0700 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: References: <1433557379.2754.6.camel@enterprise-A> Message-ID: <1433783819.2664.3.camel@enterprise-A> Hi Barry, I run a small test case like you suggested: this results in no error, but the problem with log_summary still persists. Please find attached the output of log_summary. Thanks, Michele On Fri, 2015-06-05 at 21:34 -0500, Barry Smith wrote: > [NID 04001] 2015-06-04 19:07:24 Apid 25022256: initiated application termination > Application 25022256 exit signals: Killed > Application 25022256 resources: utime ~271s, stime ~15107s, Rss ~188536, inblocks ~5078831, outblocks ~12517984 > > Usually this kind of message indicates that either the OS or the batch system killed the process for some reason: often because it ran out of time or maybe memory. > > Can you run in batch with a request for more time? Do smaller jobs run through ok? > > If utime means user time and stime means system time then this is very bad, the system time is HUGE relative to the user time. 
> > Barry > > > > > > On Jun 5, 2015, at 9:22 PM, Michele Rosso wrote: > > > > Hi, > > > > I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. > > I configured petsc with the following options: > > > > if __name__ == '__main__': > > import sys > > import os > > sys.path.insert(0, os.path.abspath('config')) > > import configure > > configure_options = [ > > '--with-batch=1 ', > > '--known-mpi-shared=0 ', > > '--known-mpi-shared-libraries=0', > > '--known-memcmp-ok ', > > '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', > > '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > > '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > > '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > > '--with-x=0 ', > > '--with-debugging=0', > > '--with-clib-autodetect=0 ', > > '--with-cxxlib-autodetect=0 ', > > '--with-fortranlib-autodetect=0 ', > > '--with-shared-libraries=0 ', > > '--with-mpi-compilers=1 ', > > '--with-cc=cc ', > > '--with-cxx=CC ', > > '--with-fc=ftn ', > > # '--with-64-bit-indices', > > '--download-hypre=1', > > '--download-blacs=1 ', > > '--download-scalapack=1 ', > > '--download-superlu_dist=1 ', > > '--download-metis=1 ', > > '--download-parmetis=1 ', > > ] > > configure.petsc_configure(configure_options) > > > > Any idea about this issue? > > Thanks, > > > > Michele > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./test3.exe on a gnu-opt-32idx named . with 16 processors, by mrosso Mon Jun 8 11:54:16 2015 Using Petsc Release Version 3.5.4, May, 23, 2015 Max Max/Min Avg Total Time (sec): 4.387e-01 1.00635 4.373e-01 Objects: 5.410e+02 1.00000 5.410e+02 Flops: 2.648e+07 1.00131 2.647e+07 4.234e+08 Flops/sec: 6.072e+07 1.00727 6.052e+07 9.683e+08 MPI Messages: 1.668e+03 1.06992 1.614e+03 2.582e+04 MPI Message Lengths: 2.914e+06 1.05005 1.763e+03 4.552e+07 MPI Reductions: 1.106e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.3732e-01 100.0% 4.2344e+08 100.0% 2.582e+04 100.0% 1.763e+03 100.0% 1.105e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. 
Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 240 134 1196608 0 Vector Scatter 30 0 0 0 Matrix 88 0 0 0 Matrix Null Space 4 0 0 0 Distributed Mesh 10 0 0 0 Star Forest Bipartite Graph 20 0 0 0 Discrete System 10 0 0 0 Index Set 76 68 140480 0 IS L to G Mapping 10 0 0 0 Krylov Solver 20 4 4640 0 DMKSP interface 8 0 0 0 Preconditioner 20 4 4000 0 Viewer 5 4 2976 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 6.62804e-06 Average time for zero size MPI_Send(): 2.25008e-06 #PETSc Option Table entries: -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-9 -ksp_type cg -log_summary -mg_coarse_ksp_type preonly -mg_coarse_pc_factor_mat_solver_package superlu_dist -mg_coarse_pc_type lu -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -options_left -pc_mg_galerkin -pc_mg_levels 3 -pc_type mg #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx ----------------------------------------- Libraries compiled on 
Wed Jun 3 14:35:20 2015 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 Using PETSc arch: gnu-opt-32idx ----------------------------------------- Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl ----------------------------------------- #PETSc Option Table entries: -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-9 -ksp_type cg -log_summary -mg_coarse_ksp_type preonly -mg_coarse_pc_factor_mat_solver_package superlu_dist -mg_coarse_pc_type lu -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -options_left -pc_mg_galerkin -pc_mg_levels 3 -pc_type mg #End of PETSc Option Table entries There are no unused options. [NID 15170] 2015-06-08 11:54:16 Apid 25082027: initiated application termination Application 25082027 resources: utime ~1s, stime ~9s, Rss ~27320, inblocks ~19758, outblocks ~49423 From may at bu.edu Mon Jun 8 12:25:34 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Mon, 8 Jun 2015 17:25:34 +0000 Subject: [petsc-users] Guidance on GAMG preconditioning In-Reply-To: <87zj4aw2h6.fsf@jedbrown.org> References: <17A35C213185A84BB8ED54C88FBFD712C3D1232A@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D123ED@IST-EX10MBX-4.ad.bu.edu>, <87zj4aw2h6.fsf@jedbrown.org> Message-ID: <17A35C213185A84BB8ED54C88FBFD712C3D12E4E@IST-EX10MBX-4.ad.bu.edu> Thanks for your helpful comments, Jed, Matt, and Mark. They've given me some things to think about, so I'll work on those then open a new email thread when I have further questions. --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________________ From: Jed Brown [jed at jedbrown.org] Sent: Monday, June 08, 2015 1:10 PM To: Young, Matthew, Adam; Matthew Knepley Cc: petsc-users Subject: Re: [petsc-users] Guidance on GAMG preconditioning "Young, Matthew, Adam" writes: > This is a problem from ionospheric plasma physics. The simulation > treats ions via a particle-in-cell method and electrons as an > inertialess fluid, the justification being that ionospheric ions are > 10^4 times more massive than electrons. We further assume that the > plasma is effectively neutral on the length scale of interest > (i.e. quasi-neutral) and those assumptions allows us to write an > elliptic equation for the electrostatic potential, phi: Div[n(x) T > Grad(phi)]. 
n(x) is the quasi-neutral plasma density, which is updated > via an ion gather at each time step, and T is a tensor of constant > coefficients that looks like {{1, kappa, 0},{-kappa, 1, 0},{0, 0, > 1+kappa^2}}, where kappa is the ratio of gyrofrequency to collision > frequency for electrons (~100 for our problem)*. It seems to me that the nonsymmetric part of T is not an elliptic contribution so with your large value of kappa, you should think of this problem as a singular perturbation. Consequently, there is no reason to believe that methods designed for elliptic problems will work for your problem (especially not "out of the box"). From bsmith at mcs.anl.gov Mon Jun 8 12:48:47 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 8 Jun 2015 12:48:47 -0500 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433783819.2664.3.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> Message-ID: <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> > On Jun 8, 2015, at 12:16 PM, Michele Rosso wrote: > > Hi Barry, > > I run a small test case like you suggested: this results in no error, but the problem with log_summary still persists. > Please find attached the output of log_summary. I cannot explain why nothing gets listed in the log summary. It appears to have used the command line options and hence a Krylov solver. I am dumbfounded why no events are listed in the log_summary. Barry > > Thanks, > Michele > > On Fri, 2015-06-05 at 21:34 -0500, Barry Smith wrote: >> [NID 04001] 2015-06-04 19:07:24 Apid 25022256: initiated application termination >> Application 25022256 exit signals: Killed >> Application 25022256 resources: utime ~271s, stime ~15107s, Rss ~188536, inblocks ~5078831, outblocks ~12517984 >> >> Usually this kind of message indicates that either the OS or the batch system killed the process for some reason: often because it ran out of time or maybe memory. >> >> Can you run in batch with a request for more time? Do smaller jobs run through ok? >> >> If utime means user time and stime means system time then this is very bad, the system time is HUGE relative to the user time. >> >> Barry >> >> >> >> >> >> > On Jun 5, 2015, at 9:22 PM, Michele Rosso wrote: >> > >> > Hi, >> > >> > I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. 
>> > I configured petsc with the following options: >> > >> > if __name__ == '__main__': >> > import sys >> > import os >> > sys.path.insert(0, os.path.abspath('config')) >> > import configure >> > configure_options = [ >> > '--with-batch=1 ', >> > '--known-mpi-shared=0 ', >> > '--known-mpi-shared-libraries=0', >> > '--known-memcmp-ok ', >> > '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', >> > '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> > '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> > '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> > '--with-x=0 ', >> > '--with-debugging=0', >> > '--with-clib-autodetect=0 ', >> > '--with-cxxlib-autodetect=0 ', >> > '--with-fortranlib-autodetect=0 ', >> > '--with-shared-libraries=0 ', >> > '--with-mpi-compilers=1 ', >> > '--with-cc=cc ', >> > '--with-cxx=CC ', >> > '--with-fc=ftn ', >> > # '--with-64-bit-indices', >> > '--download-hypre=1', >> > '--download-blacs=1 ', >> > '--download-scalapack=1 ', >> > '--download-superlu_dist=1 ', >> > '--download-metis=1 ', >> > '--download-parmetis=1 ', >> > ] >> > configure.petsc_configure(configure_options) >> > >> > Any idea about this issue? >> > Thanks, >> > >> > Michele >> > >> > >> > >> > >> > >> >> >> > > From mrosso at uci.edu Mon Jun 8 12:53:03 2015 From: mrosso at uci.edu (Michele Rosso) Date: Mon, 08 Jun 2015 10:53:03 -0700 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> Message-ID: <1433785983.2664.5.camel@enterprise-A> Is there any external software I should link at compile time to allow PETSc to provide log infos? Also, is there any define in petscconf.h that controls profiling? Michele On Mon, 2015-06-08 at 12:48 -0500, Barry Smith wrote: > > On Jun 8, 2015, at 12:16 PM, Michele Rosso wrote: > > > > Hi Barry, > > > > I run a small test case like you suggested: this results in no error, but the problem with log_summary still persists. > > Please find attached the output of log_summary. > > I cannot explain why nothing gets listed in the log summary. It appears to have used the command line options and hence a Krylov solver. I am dumbfounded why no events are listed in the log_summary. > > Barry > > > > > Thanks, > > Michele > > > > On Fri, 2015-06-05 at 21:34 -0500, Barry Smith wrote: > >> [NID 04001] 2015-06-04 19:07:24 Apid 25022256: initiated application termination > >> Application 25022256 exit signals: Killed > >> Application 25022256 resources: utime ~271s, stime ~15107s, Rss ~188536, inblocks ~5078831, outblocks ~12517984 > >> > >> Usually this kind of message indicates that either the OS or the batch system killed the process for some reason: often because it ran out of time or maybe memory. > >> > >> Can you run in batch with a request for more time? Do smaller jobs run through ok? > >> > >> If utime means user time and stime means system time then this is very bad, the system time is HUGE relative to the user time. > >> > >> Barry > >> > >> > >> > >> > >> > >> > On Jun 5, 2015, at 9:22 PM, Michele Rosso wrote: > >> > > >> > Hi, > >> > > >> > I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. 
> >> > I configured petsc with the following options: > >> > > >> > if __name__ == '__main__': > >> > import sys > >> > import os > >> > sys.path.insert(0, os.path.abspath('config')) > >> > import configure > >> > configure_options = [ > >> > '--with-batch=1 ', > >> > '--known-mpi-shared=0 ', > >> > '--known-mpi-shared-libraries=0', > >> > '--known-memcmp-ok ', > >> > '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', > >> > '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > >> > '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > >> > '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', > >> > '--with-x=0 ', > >> > '--with-debugging=0', > >> > '--with-clib-autodetect=0 ', > >> > '--with-cxxlib-autodetect=0 ', > >> > '--with-fortranlib-autodetect=0 ', > >> > '--with-shared-libraries=0 ', > >> > '--with-mpi-compilers=1 ', > >> > '--with-cc=cc ', > >> > '--with-cxx=CC ', > >> > '--with-fc=ftn ', > >> > # '--with-64-bit-indices', > >> > '--download-hypre=1', > >> > '--download-blacs=1 ', > >> > '--download-scalapack=1 ', > >> > '--download-superlu_dist=1 ', > >> > '--download-metis=1 ', > >> > '--download-parmetis=1 ', > >> > ] > >> > configure.petsc_configure(configure_options) > >> > > >> > Any idea about this issue? > >> > Thanks, > >> > > >> > Michele > >> > > >> > > >> > > >> > > >> > > >> > >> > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jun 8 13:02:10 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 12:02:10 -0600 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433785983.2664.5.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> Message-ID: <87wpzew02l.fsf@jedbrown.org> Michele Rosso writes: > Is there any external software I should link at compile time to allow > PETSc to provide log infos? > Also, is there any define in petscconf.h that controls profiling? There is PETSC_USE_LOG, but that should always be on, in which case everything is run-time rather than compile-time. When you run a PETSc example, does the log_summary correctly contain events? Is it possible that your code is turning off events somewhere? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon Jun 8 13:05:01 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 8 Jun 2015 13:05:01 -0500 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433785983.2664.5.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> Message-ID: > On Jun 8, 2015, at 12:53 PM, Michele Rosso wrote: > > Is there any external software I should link at compile time to allow PETSc to provide log infos? No > Also, is there any define in petscconf.h that controls profiling? Yes, but it would stop anything from being printed, not just the events. Barry > > Michele > > On Mon, 2015-06-08 at 12:48 -0500, Barry Smith wrote: >> > On Jun 8, 2015, at 12:16 PM, Michele Rosso wrote: >> > >> > Hi Barry, >> > >> > I run a small test case like you suggested: this results in no error, but the problem with log_summary still persists. 
>> > Please find attached the output of log_summary. >> >> >> I cannot explain why nothing gets listed in the log summary. It appears to have used the command line options and hence a Krylov solver. I am dumbfounded why no events are listed in the log_summary. >> >> Barry >> >> >> > >> > Thanks, >> > Michele >> > >> > On Fri, 2015-06-05 at 21:34 -0500, Barry Smith wrote: >> >> [NID 04001] 2015-06-04 19:07:24 Apid 25022256: initiated application termination >> >> Application 25022256 exit signals: Killed >> >> Application 25022256 resources: utime ~271s, stime ~15107s, Rss ~188536, inblocks ~5078831, outblocks ~12517984 >> >> >> >> Usually this kind of message indicates that either the OS or the batch system killed the process for some reason: often because it ran out of time or maybe memory. >> >> >> >> Can you run in batch with a request for more time? Do smaller jobs run through ok? >> >> >> >> If utime means user time and stime means system time then this is very bad, the system time is HUGE relative to the user time. >> >> >> >> Barry >> >> >> >> >> >> >> >> >> >> >> >> > On Jun 5, 2015, at 9:22 PM, Michele Rosso wrote: >> >> > >> >> > Hi, >> >> > >> >> > I am checking the performances of my code via -log_summary, but the output is incomplete (please see attached) file. >> >> > I configured petsc with the following options: >> >> > >> >> > if __name__ == '__main__': >> >> > import sys >> >> > import os >> >> > sys.path.insert(0, os.path.abspath('config')) >> >> > import configure >> >> > configure_options = [ >> >> > '--with-batch=1 ', >> >> > '--known-mpi-shared=0 ', >> >> > '--known-mpi-shared-libraries=0', >> >> > '--known-memcmp-ok ', >> >> > '--with-blas-lapack-lib=-L/opt/acml/5.3.1/gfortran64/lib -lacml', >> >> > '--COPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> >> > '--FOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> >> > '--CXXOPTFLAGS=-march=bdver1 -fopenmp -O3 -ffast-math -fPIC ', >> >> > '--with-x=0 ', >> >> > '--with-debugging=0', >> >> > '--with-clib-autodetect=0 ', >> >> > '--with-cxxlib-autodetect=0 ', >> >> > '--with-fortranlib-autodetect=0 ', >> >> > '--with-shared-libraries=0 ', >> >> > '--with-mpi-compilers=1 ', >> >> > '--with-cc=cc ', >> >> > '--with-cxx=CC ', >> >> > '--with-fc=ftn ', >> >> > # '--with-64-bit-indices', >> >> > '--download-hypre=1', >> >> > '--download-blacs=1 ', >> >> > '--download-scalapack=1 ', >> >> > '--download-superlu_dist=1 ', >> >> > '--download-metis=1 ', >> >> > '--download-parmetis=1 ', >> >> > ] >> >> > configure.petsc_configure(configure_options) >> >> > >> >> > Any idea about this issue? >> >> > Thanks, >> >> > >> >> > Michele >> >> > >> >> > >> >> > >> >> > >> >> > >> >> >> >> >> >> >> > >> > >> >> >> > From mrosso at uci.edu Mon Jun 8 13:46:47 2015 From: mrosso at uci.edu (Michele Rosso) Date: Mon, 08 Jun 2015 11:46:47 -0700 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <87wpzew02l.fsf@jedbrown.org> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> <87wpzew02l.fsf@jedbrown.org> Message-ID: <1433789207.4103.0.camel@enterprise-A> Jed, In the petscconf.h I have #ifndef PETSC_USE_LOG #define PETSC_USE_LOG 1 #endif so I guess that is not the problem. I run ex50: I attached the output. It does prints the summary. The I guess there is something wrong with my code. I call mpi_init before petsc_initialize and then I finalize everything with petsc_finalize. 
I do not believe I added any other command that could turn off the log_summary output. Thanks, Michele On Mon, 2015-06-08 at 12:02 -0600, Jed Brown wrote: > Michele Rosso writes: > > > Is there any external software I should link at compile time to allow > > PETSc to provide log infos? > > Also, is there any define in petscconf.h that controls profiling? > > There is PETSC_USE_LOG, but that should always be on, in which case > everything is run-time rather than compile-time. When you run a PETSc > example, does the log_summary correctly contain events? Is it possible > that your code is turning off events somewhere? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 0 KSP Residual norm 4.419256232155e+01 1 KSP Residual norm 5.451780353427e-01 2 KSP Residual norm 2.277974394337e-02 3 KSP Residual norm 2.972141446447e-04 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./ex50 on a gnu-opt-32idx named ????? with 16 processors, by mrosso Mon Jun 8 13:25:36 2015 Using Petsc Release Version 3.5.4, May, 23, 2015 Max Max/Min Avg Total Time (sec): 3.684e+00 1.00250 3.680e+00 Objects: 5.830e+02 1.00000 5.830e+02 Flops: 4.062e+08 1.00571 4.045e+08 6.472e+09 Flops/sec: 1.103e+08 1.00588 1.099e+08 1.759e+09 MPI Messages: 1.823e+03 2.00550 1.372e+03 2.196e+04 MPI Message Lengths: 1.306e+06 1.99861 7.158e+02 1.572e+07 MPI Reductions: 1.307e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.6802e+00 100.0% 6.4725e+09 100.0% 2.196e+04 100.0% 7.158e+02 100.0% 1.306e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage KSPGMRESOrthog 93 1.0 2.7896e-01 1.2 8.37e+07 1.0 0.0e+00 0.0e+00 9.3e+01 7 21 0 0 7 7 21 0 0 7 4778 KSPSetUp 21 1.0 5.7312e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+02 1 0 0 0 12 1 0 0 0 12 0 KSPSolve 1 1.0 3.6051e+00 1.0 4.06e+08 1.0 2.2e+04 7.1e+02 1.3e+03 98100100 99 99 98100100 99 99 1795 VecMDot 93 1.0 1.3218e-01 1.1 4.18e+07 1.0 0.0e+00 0.0e+00 9.3e+01 3 10 0 0 7 3 10 0 0 7 5042 VecNorm 103 1.0 5.8708e-02 4.9 9.84e+06 1.0 0.0e+00 0.0e+00 1.0e+02 1 2 0 0 8 1 2 0 0 8 2671 VecScale 103 1.0 1.5907e-02 1.2 4.92e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 4929 VecCopy 46 1.0 2.1967e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecSet 188 1.0 2.1128e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 19 1.0 1.0012e-02 1.3 1.93e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3076 VecAYPX 288 1.0 1.1642e-01 1.1 1.41e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1924 VecAXPBYCZ 144 1.0 9.7574e-02 1.1 2.81e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 7 0 0 0 3 7 0 0 0 4592 VecMAXPY 103 1.0 1.8963e-01 1.5 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 4238 VecAssemblyBegin 1 1.0 7.0770e-0348.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 1.2207e-04 8.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 407 1.0 7.0384e-03 1.8 0.00e+00 0.0 1.9e+04 8.1e+02 0.0e+00 0 0 86 97 0 0 0 86 97 0 0 VecScatterEnd 407 1.0 5.3825e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 103 1.0 7.4179e-02 2.9 1.48e+07 1.0 0.0e+00 0.0e+00 1.0e+02 1 4 0 0 8 1 4 0 0 8 3171 MatMult 318 1.0 6.5145e-01 1.1 1.18e+08 1.0 1.5e+04 9.4e+02 0.0e+00 17 29 70 91 0 17 29 70 91 0 2880 MatMultAdd 36 1.0 3.7364e-02 1.0 6.32e+06 1.0 1.2e+03 3.3e+02 0.0e+00 1 2 5 3 0 1 2 5 3 0 2697 MatMultTranspose 45 1.0 4.8568e-02 1.1 7.90e+06 1.0 1.5e+03 3.3e+02 0.0e+00 1 2 7 3 0 1 2 7 3 0 2593 MatSolve 4 1.0 2.3127e-05 1.7 1.57e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1088 MatSOR 315 1.0 1.3988e+00 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 38 30 0 0 0 38 30 0 0 0 1402 MatLUFactorSym 1 1.0 3.0994e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 3.2902e-05 3.3 8.93e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 434 MatConvert 1 1.0 4.2915e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatResidual 36 1.0 8.2138e-02 1.1 1.40e+07 1.0 1.7e+03 9.1e+02 0.0e+00 2 3 8 10 0 2 3 8 10 0 2726 MatAssemblyBegin 30 1.0 5.0912e-02 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 5.8e+01 1 0 0 0 4 1 0 0 0 4 0 MatAssemblyEnd 30 1.0 1.1132e-01 1.1 0.00e+00 0.0 1.6e+03 1.6e+02 1.5e+02 3 0 7 2 12 3 0 7 2 12 0 MatGetRowIJ 1 1.0 2.3127e-05 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 1 1.0 2.5797e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 
0 0 0 0 0 MatGetOrdering 1 1.0 5.7936e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRedundant 1 1.0 3.2496e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 1.0 5.2929e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 1 1.0 3.0851e-01 1.0 1.58e+06 1.0 3.2e+03 1.1e+02 5.5e+02 8 0 14 2 42 8 0 14 2 42 82 PCApply 4 1.0 2.7321e+00 1.0 3.86e+08 1.0 1.8e+04 7.9e+02 6.8e+02 74 95 84 93 52 74 95 84 93 52 2251 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Krylov Solver 21 21 303808 0 DMKSP interface 6 6 3936 0 Vector 315 315 98901528 0 Vector Scatter 31 31 2845572 0 Matrix 60 60 61654640 0 Matrix Null Space 11 11 6556 0 Distributed Mesh 10 10 48960 0 Star Forest Bipartite Graph 20 20 16000 0 Discrete System 10 10 7920 0 Index Set 67 67 1476092 0 IS L to G Mapping 10 10 1420508 0 Preconditioner 21 21 19192 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 1.2207e-05 Average time for zero size MPI_Send(): 1.49012e-06 #PETSc Option Table entries: -da_grid_x 2049 -da_grid_y 2049 -ksp_monitor -log_summary -pc_mg_levels 10 -pc_type mg #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib="-L/opt/acml/5.3.1/gfortran64/lib -lacml" --COPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -fopenmp -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx ----------------------------------------- Libraries compiled on Wed Jun 3 14:35:20 2015 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4 Using PETSc arch: gnu-opt-32idx ----------------------------------------- Using C compiler: cc -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -fopenmp -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} 
----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.5.4/gnu-opt-32idx/lib -lsuperlu_dist_3.3 -lHYPRE -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lpthread -lssl -lcrypto -ldl ----------------------------------------- Application 25083726 resources: utime ~57s, stime ~13s, Rss ~165420, inblocks ~19777, outblocks ~47757 From jed at jedbrown.org Mon Jun 8 13:54:16 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 12:54:16 -0600 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433789207.4103.0.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> <87wpzew02l.fsf@jedbrown.org> <1433789207.4103.0.camel@enterprise-A> Message-ID: <87fv62vxnr.fsf@jedbrown.org> Michele Rosso writes: > Jed, > > In the petscconf.h I have > > #ifndef PETSC_USE_LOG > #define PETSC_USE_LOG 1 > #endif > > so I guess that is not the problem. > I run ex50: I attached the output. It does prints the summary. > The I guess there is something wrong with my code. > I call mpi_init before petsc_initialize and then I finalize everything > with petsc_finalize. Uh, there is no petsc_finalize (it's spelled PetscFinalize), but remember that it doesn't finalize MPI unless PetscInitialize called MPI_Init. > I do not believe I added any other command that could turn off the > log_summary output. Then incrementally figure out what is different between your code and the example. Or use a debugger to track down the difference. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From mrosso at uci.edu Mon Jun 8 13:58:49 2015 From: mrosso at uci.edu (Michele Rosso) Date: Mon, 08 Jun 2015 11:58:49 -0700 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <87fv62vxnr.fsf@jedbrown.org> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> <87wpzew02l.fsf@jedbrown.org> <1433789207.4103.0.camel@enterprise-A> <87fv62vxnr.fsf@jedbrown.org> Message-ID: <1433789929.4103.2.camel@enterprise-A> Ok, I will go step by step and see what is wrong. Just to make sure: if I initialize with mpi_init and then call petscinitialize and finally petscfinalize without calling mpi_finalize, petsc should still print the output of log_summary, correct? Michele On Mon, 2015-06-08 at 12:54 -0600, Jed Brown wrote: > Michele Rosso writes: > > > Jed, > > > > In the petscconf.h I have > > > > #ifndef PETSC_USE_LOG > > #define PETSC_USE_LOG 1 > > #endif > > > > so I guess that is not the problem. > > I run ex50: I attached the output. It does prints the summary. > > The I guess there is something wrong with my code. 
> > I call mpi_init before petsc_initialize and then I finalize everything > > with petsc_finalize. > > Uh, there is no petsc_finalize (it's spelled PetscFinalize), but > remember that it doesn't finalize MPI unless PetscInitialize called > MPI_Init. > > > I do not believe I added any other command that could turn off the > > log_summary output. > > Then incrementally figure out what is different between your code and > the example. Or use a debugger to track down the difference. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jun 8 14:04:20 2015 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 Jun 2015 13:04:20 -0600 Subject: [petsc-users] Weird behavior for log_summary In-Reply-To: <1433789929.4103.2.camel@enterprise-A> References: <1433557379.2754.6.camel@enterprise-A> <1433783819.2664.3.camel@enterprise-A> <1586ABCF-E3F8-497B-824D-2586F45CFE23@mcs.anl.gov> <1433785983.2664.5.camel@enterprise-A> <87wpzew02l.fsf@jedbrown.org> <1433789207.4103.0.camel@enterprise-A> <87fv62vxnr.fsf@jedbrown.org> <1433789929.4103.2.camel@enterprise-A> Message-ID: <87616yvx6z.fsf@jedbrown.org> Michele Rosso writes: > Ok, I will go step by step and see what is wrong. > Just to make sure: if I initialize with mpi_init and then call > petscinitialize and finally petscfinalize without calling mpi_finalize, > petsc should > still print the output of log_summary, correct? The log_summary printing comes first, but you still need to call MPI_Finalize to clean up MPI correctly. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From gomer at stanford.edu Tue Jun 9 19:25:28 2015 From: gomer at stanford.edu (Paul Urbanczyk) Date: Tue, 09 Jun 2015 17:25:28 -0700 Subject: [petsc-users] Problem Setup / Multiple DAs Message-ID: <557783F8.5060504@stanford.edu> Hello, I'm relatively new to PETSc, so please forgive me if this is an obvious question. For context, I am working on a single-block, structured, parallel CFD solver using sixth-order compact Pade schemes. As I read through the manual and materials, it seems that using a "distributed array" (DA) would be extremely helpful in this case. When I set up the DA for solution of the Navier-Stokes equations, it makes sense to set it up with five degrees of freedom (DoFs) to handle the five conservation equations. However, there are a number of other calculations that need to be made (calculation of non-uniform mesh metrics, for example) that do not involve five DoFs. Is there a way to handle varying numbers of DoFs with the same distributed array, or would it be better to use multiple DAs (one with five DoFs, one with a single DoF, etc) for different aspects of the problem? Thanks for any/all advice on this matter! -Paul From bsmith at mcs.anl.gov Tue Jun 9 19:51:07 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 9 Jun 2015 19:51:07 -0500 Subject: [petsc-users] Problem Setup / Multiple DAs In-Reply-To: <557783F8.5060504@stanford.edu> References: <557783F8.5060504@stanford.edu> Message-ID: <8E3D3F8A-FA46-46F6-97F7-7C529E7BE48D@mcs.anl.gov> > On Jun 9, 2015, at 7:25 PM, Paul Urbanczyk wrote: > > Hello, > > I'm relatively new to PETSc, so please forgive me if this is an obvious question. > > For context, I am working on a single-block, structured, parallel CFD solver using sixth-order compact Pade schemes. 
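A minimal sketch of the MPI_Init / PetscInitialize ordering discussed in the log_summary thread above (a hypothetical C main, untested; error checking omitted):

#include <petscsys.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);                    /* the application starts MPI itself             */
  PetscInitialize(&argc, &argv, NULL, NULL); /* PETSc attaches to the already-running MPI     */

  /* ... application code using PETSc ... */

  PetscFinalize();                           /* the -log_summary output is printed here       */
  MPI_Finalize();                            /* still needed, since PETSc did not call MPI_Init */
  return 0;
}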
> > As I read through the manual and materials, it seems that using a "distributed array" (DA) would be extremely helpful in this case. > > When I set up the DA for solution of the Navier-Stokes equations, it makes sense to set it up with five degrees of freedom (DoFs) to handle the five conservation equations. > > However, there are a number of other calculations that need to be made (calculation of non-uniform mesh metrics, for example) that do not involve five DoFs. > > Is there a way to handle varying numbers of DoFs with the same distributed array, or would it be better to use multiple DAs (one with five DoFs, one with a single DoF, etc) for different aspects of the problem? It is better just to use multiple DAs. Note that if the only difference between them is the dof (and stencil width, stencil type can be different also) then they will all have the same parallel layout so thus be compatible with each other. Each individual DA object now doesn't take much memory so it is ok to have several. Barry > > Thanks for any/all advice on this matter! > > -Paul > From venkateshgk.j at gmail.com Wed Jun 10 06:52:44 2015 From: venkateshgk.j at gmail.com (venkatesh g) Date: Wed, 10 Jun 2015 17:22:44 +0530 Subject: [petsc-users] Petsc Binary Write - Memory Message-ID: Hi I am trying to write very large sparse matrices A and B for solving generalized Eigenvalue problem so that I can use SLEPC ex7.c code. I want to read matrices from file according to that code. And I generate these matrices from Matlab using PetscBinaryWrite.m However, it exceeds my 256 GB RAM in one of the machines. So I am unable to generate these binary matrices. Kindly let me know how to write them efficiently. cheers, Venkatesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 10 08:03:00 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 08:03:00 -0500 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 6:52 AM, venkatesh g wrote: > Hi > > I am trying to write very large sparse matrices A and B for solving > generalized Eigenvalue problem > > so that I can use SLEPC ex7.c code. > > I want to read matrices from file according to that code. And I generate > these matrices from Matlab using PetscBinaryWrite.m > > However, it exceeds my 256 GB RAM in one of the machines. So I am unable > to generate these binary matrices. > > Kindly let me know how to write them efficiently. > Consider writing C code to generate the matrix? I am not sure what else to propose. It sounds like you need to generate the matrix progressively and write portions, or you need to generate the entries in parallel. Matt > cheers, > > Venkatesh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jun 10 08:05:41 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 10 Jun 2015 08:05:41 -0500 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: Whats the size of the matrix? How many non-zeros? Satish On Wed, 10 Jun 2015, venkatesh g wrote: > Hi > > I am trying to write very large sparse matrices A and B for solving > generalized Eigenvalue problem > > so that I can use SLEPC ex7.c code. 
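Coming back to the structured-grid question above, a minimal sketch of the multiple-DMDA approach Barry describes: two DAs that differ only in the number of DoFs and therefore share the same parallel layout. The 3D grid sizes, box stencil, and stencil width below are assumptions for illustration; untested, PETSc 3.5-era calling sequence (newer releases also require DMSetUp):

DM       da_flow, da_scalar;
PetscInt nx = 64, ny = 64, nz = 64;   /* placeholder grid sizes */

DMDACreate3d(PETSC_COMM_WORLD,
             DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
             DMDA_STENCIL_BOX, nx, ny, nz,
             PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
             5 /* dof: five conservation variables */, 1 /* stencil width */,
             NULL, NULL, NULL, &da_flow);
DMDACreate3d(PETSC_COMM_WORLD,
             DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
             DMDA_STENCIL_BOX, nx, ny, nz,
             PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
             1 /* dof: a single scalar field, e.g. a mesh metric */, 1,
             NULL, NULL, NULL, &da_scalar);

/* Same sizes and the same PETSC_DECIDE decomposition, so vectors from the two
   DAs refer to the same grid points on each process.                          */
Vec U, metric;
DMCreateGlobalVector(da_flow, &U);
DMCreateGlobalVector(da_scalar, &metric);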
> > I want to read matrices from file according to that code. And I generate > these matrices from Matlab using PetscBinaryWrite.m > > However, it exceeds my 256 GB RAM in one of the machines. So I am unable to > generate these binary matrices. > > Kindly let me know how to write them efficiently. > > cheers, > > Venkatesh > From venkateshgk.j at gmail.com Wed Jun 10 08:31:20 2015 From: venkateshgk.j at gmail.com (venkatesh g) Date: Wed, 10 Jun 2015 19:01:20 +0530 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: Hi, Writing portions in binary u mean ? Is it possible to do ? Actually Matlab creates these elements and stores them in the workspace and they take up RAM. cheers, Venkatesh On Wed, Jun 10, 2015 at 6:33 PM, Matthew Knepley wrote: > On Wed, Jun 10, 2015 at 6:52 AM, venkatesh g > wrote: > >> Hi >> >> I am trying to write very large sparse matrices A and B for solving >> generalized Eigenvalue problem >> >> so that I can use SLEPC ex7.c code. >> >> I want to read matrices from file according to that code. And I generate >> these matrices from Matlab using PetscBinaryWrite.m >> >> However, it exceeds my 256 GB RAM in one of the machines. So I am unable >> to generate these binary matrices. >> >> Kindly let me know how to write them efficiently. >> > > Consider writing C code to generate the matrix? I am not sure what else to > propose. It sounds like > you need to generate the matrix progressively and write portions, or you > need to generate the entries > in parallel. > > Matt > > >> cheers, >> >> Venkatesh >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkateshgk.j at gmail.com Wed Jun 10 08:32:34 2015 From: venkateshgk.j at gmail.com (venkatesh g) Date: Wed, 10 Jun 2015 19:02:34 +0530 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: The size of the matrix is 84500 x 84500 the no. of non-zero elements is 2.7338e+09 cheers, Venkatesh On Wed, Jun 10, 2015 at 6:35 PM, Satish Balay wrote: > Whats the size of the matrix? How many non-zeros? > > Satish > > On Wed, 10 Jun 2015, venkatesh g wrote: > > > Hi > > > > I am trying to write very large sparse matrices A and B for solving > > generalized Eigenvalue problem > > > > so that I can use SLEPC ex7.c code. > > > > I want to read matrices from file according to that code. And I generate > > these matrices from Matlab using PetscBinaryWrite.m > > > > However, it exceeds my 256 GB RAM in one of the machines. So I am unable > to > > generate these binary matrices. > > > > Kindly let me know how to write them efficiently. > > > > cheers, > > > > Venkatesh > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 10 08:37:22 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 08:37:22 -0500 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 8:32 AM, venkatesh g wrote: > The size of the matrix is 84500 x 84500 > > the no. of non-zero elements is 2.7338e+09 > This matrix is not sparse, it has 40% fill. You should treat it as dense. For dense matrices of this size, you should consider using Elemental. We have an interface to Elemental in PETSc. 
I recommend writing the code to create these entries on the fly since it will probably be faster than loading them from disk. Thanks, Matt > cheers, > Venkatesh > > On Wed, Jun 10, 2015 at 6:35 PM, Satish Balay wrote: > >> Whats the size of the matrix? How many non-zeros? >> >> Satish >> >> On Wed, 10 Jun 2015, venkatesh g wrote: >> >> > Hi >> > >> > I am trying to write very large sparse matrices A and B for solving >> > generalized Eigenvalue problem >> > >> > so that I can use SLEPC ex7.c code. >> > >> > I want to read matrices from file according to that code. And I generate >> > these matrices from Matlab using PetscBinaryWrite.m >> > >> > However, it exceeds my 256 GB RAM in one of the machines. So I am >> unable to >> > generate these binary matrices. >> > >> > Kindly let me know how to write them efficiently. >> > >> > cheers, >> > >> > Venkatesh >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkateshgk.j at gmail.com Wed Jun 10 09:04:14 2015 From: venkateshgk.j at gmail.com (venkatesh g) Date: Wed, 10 Jun 2015 19:34:14 +0530 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: Hi, Yes I will try Elemental but there is lots of complex routines and functions with long codes in Matlab which goes in the creation of the A and B matrices, so re-writing in C will take a long time. Even if Matlab takes 128 GB out of 256 GB, and the matrix A to be written is around 25 GB, then PetscBinaryWrite.m is using more than 100 GB just to run.. This is the problem. so is there any other way ? cheers, Venkatesh On Wed, Jun 10, 2015 at 7:07 PM, Matthew Knepley wrote: > On Wed, Jun 10, 2015 at 8:32 AM, venkatesh g > wrote: > >> The size of the matrix is 84500 x 84500 >> >> the no. of non-zero elements is 2.7338e+09 >> > > This matrix is not sparse, it has 40% fill. You should treat it as dense. > For dense matrices of this size, > you should consider using Elemental. We have an interface to Elemental in > PETSc. > > I recommend writing the code to create these entries on the fly since it > will probably be faster than loading > them from disk. > > Thanks, > > Matt > > >> cheers, >> Venkatesh >> >> On Wed, Jun 10, 2015 at 6:35 PM, Satish Balay wrote: >> >>> Whats the size of the matrix? How many non-zeros? >>> >>> Satish >>> >>> On Wed, 10 Jun 2015, venkatesh g wrote: >>> >>> > Hi >>> > >>> > I am trying to write very large sparse matrices A and B for solving >>> > generalized Eigenvalue problem >>> > >>> > so that I can use SLEPC ex7.c code. >>> > >>> > I want to read matrices from file according to that code. And I >>> generate >>> > these matrices from Matlab using PetscBinaryWrite.m >>> > >>> > However, it exceeds my 256 GB RAM in one of the machines. So I am >>> unable to >>> > generate these binary matrices. >>> > >>> > Kindly let me know how to write them efficiently. >>> > >>> > cheers, >>> > >>> > Venkatesh >>> > >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Jun 10 09:04:51 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 10 Jun 2015 09:04:51 -0500 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: Just to add - the sparse matrix would be of size: 4*84500+(4+8)*2.7338e+09 bytes = 33GB [app]. So matlab - with the intermediate represantations used in PetscBinaryWrite.m are perhaps consuming 256GB - so a sequential assmebly of this matrix in C code [with proper preallocation] should be more memory efficient.. Satish On Wed, 10 Jun 2015, Matthew Knepley wrote: > On Wed, Jun 10, 2015 at 8:32 AM, venkatesh g > wrote: > > > The size of the matrix is 84500 x 84500 > > > > the no. of non-zero elements is 2.7338e+09 > > > > This matrix is not sparse, it has 40% fill. You should treat it as dense. > For dense matrices of this size, > you should consider using Elemental. We have an interface to Elemental in > PETSc. > > I recommend writing the code to create these entries on the fly since it > will probably be faster than loading > them from disk. > > Thanks, > > Matt > > > > cheers, > > Venkatesh > > > > On Wed, Jun 10, 2015 at 6:35 PM, Satish Balay wrote: > > > >> Whats the size of the matrix? How many non-zeros? > >> > >> Satish > >> > >> On Wed, 10 Jun 2015, venkatesh g wrote: > >> > >> > Hi > >> > > >> > I am trying to write very large sparse matrices A and B for solving > >> > generalized Eigenvalue problem > >> > > >> > so that I can use SLEPC ex7.c code. > >> > > >> > I want to read matrices from file according to that code. And I generate > >> > these matrices from Matlab using PetscBinaryWrite.m > >> > > >> > However, it exceeds my 256 GB RAM in one of the machines. So I am > >> unable to > >> > generate these binary matrices. > >> > > >> > Kindly let me know how to write them efficiently. > >> > > >> > cheers, > >> > > >> > Venkatesh > >> > > >> > >> > > > > > From knepley at gmail.com Wed Jun 10 09:12:57 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 09:12:57 -0500 Subject: [petsc-users] Petsc Binary Write - Memory In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 9:04 AM, venkatesh g wrote: > Hi, > > Yes I will try Elemental but there is lots of complex routines and > functions with long codes in Matlab which goes in the creation of the A and > B matrices, so re-writing in C will take a long time. > > Even if Matlab takes 128 GB out of 256 GB, and the matrix A to be written > is around 25 GB, then PetscBinaryWrite.m is using more than 100 GB just to > run.. This is the problem. > > so is there any other way ? > Don't write it as a sparse matrix. Write it as a dense matrix. This is just a 4 integer header MAT_FILE_CLASSID, M, N, nz and then the data. https://bitbucket.org/petsc/petsc/src/7fe65498eb3a5a672e8885998851e0d5742e4c1e/src/mat/impls/dense/mpi/mpidense.c?at=master#cl-1571 Matt > cheers, > Venkatesh > > On Wed, Jun 10, 2015 at 7:07 PM, Matthew Knepley > wrote: > >> On Wed, Jun 10, 2015 at 8:32 AM, venkatesh g >> wrote: >> >>> The size of the matrix is 84500 x 84500 >>> >>> the no. of non-zero elements is 2.7338e+09 >>> >> >> This matrix is not sparse, it has 40% fill. You should treat it as dense. >> For dense matrices of this size, >> you should consider using Elemental. We have an interface to Elemental in >> PETSc. >> >> I recommend writing the code to create these entries on the fly since it >> will probably be faster than loading >> them from disk. 
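A minimal sketch of that suggestion, assembling the matrix entry by entry in C and letting PETSc write the binary file (so the header and byte ordering are handled automatically); the result can then be read by SLEPc's ex7 through MatLoad. The matrix type, the my_entry() routine, and the file name are placeholders; untested:

Mat         A;
PetscViewer viewer;
PetscInt    i, j, rstart, rend, N = 84500;

MatCreate(PETSC_COMM_WORLD, &A);
MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
MatSetType(A, MATDENSE);                 /* ~40% fill, so treated as dense as suggested above */
MatSetUp(A);

MatGetOwnershipRange(A, &rstart, &rend);
for (i = rstart; i < rend; i++) {        /* fill only the locally owned rows */
  for (j = 0; j < N; j++) {
    MatSetValue(A, i, j, my_entry(i, j), INSERT_VALUES);  /* my_entry(): the application's formula */
  }
}
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.petsc", FILE_MODE_WRITE, &viewer);
MatView(A, viewer);                      /* writes the header described above plus the values */
PetscViewerDestroy(&viewer);
MatDestroy(&A);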
>> >> Thanks, >> >> Matt >> >> >>> cheers, >>> Venkatesh >>> >>> On Wed, Jun 10, 2015 at 6:35 PM, Satish Balay wrote: >>> >>>> Whats the size of the matrix? How many non-zeros? >>>> >>>> Satish >>>> >>>> On Wed, 10 Jun 2015, venkatesh g wrote: >>>> >>>> > Hi >>>> > >>>> > I am trying to write very large sparse matrices A and B for solving >>>> > generalized Eigenvalue problem >>>> > >>>> > so that I can use SLEPC ex7.c code. >>>> > >>>> > I want to read matrices from file according to that code. And I >>>> generate >>>> > these matrices from Matlab using PetscBinaryWrite.m >>>> > >>>> > However, it exceeds my 256 GB RAM in one of the machines. So I am >>>> unable to >>>> > generate these binary matrices. >>>> > >>>> > Kindly let me know how to write them efficiently. >>>> > >>>> > cheers, >>>> > >>>> > Venkatesh >>>> > >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandanovian at gmail.com Wed Jun 10 09:42:00 2015 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 10 Jun 2015 16:42:00 +0200 Subject: [petsc-users] Setting values of a matrix in a blockwise manner Message-ID: Hi all I want to use Petsc to solve some linear systems via the built-in Krylov subspace methods as well as by means of UMFPACK. The considered matrix is block sparse with blocks of size 6x6. Here is what I came up with after taking a look at some of the examples MPI_Comm comm; Mat A; PetscInt n = 10000; /* dimension of matrix */ comm = PETSC_COMM_SELF; MatCreate(comm,&A); MatSetSizes(Amat,n,n,n,n); MatSetBlockSize(A,6); MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = PETSC_COMM_SELF */ Questions: 1. I work on a single node with 2-8 cores. Hence, comm = PETSC_COMM_SELF; I guess. Is it correct in this contect to set MatSetSizes(Amat,n,n,n,n); with 4-times n? 2. After the above sequence of commands do I have to use something like MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of nonzeros per row */ or is it possible to use MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of block nonzeros per block row */ In any case, is something like MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); to fill values of one block into the matrix A ok? Regards Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 10 09:45:16 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 09:45:16 -0500 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff wrote: > Hi all > > I want to use Petsc to solve some linear systems via the built-in Krylov > subspace methods as well as by means of UMFPACK. > > The considered matrix is block sparse with blocks of size 6x6. 
> > Here is what I came up with after taking a look at some of the examples > > MPI_Comm comm; > Mat A; > PetscInt n = 10000; /* dimension of matrix */ > comm = PETSC_COMM_SELF; > MatCreate(comm,&A); > MatSetSizes(Amat,n,n,n,n); > MatSetBlockSize(A,6); > MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = > PETSC_COMM_SELF */ > > Questions: > 1. > I work on a single node with 2-8 cores. Hence, comm = PETSC_COMM_SELF; I > guess. Is it correct in this contect to set MatSetSizes(Amat,n,n,n,n); with > 4-times n? > Yes. > 2. > After the above sequence of commands do I have to use something like > MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of nonzeros > per row */ > or is it possible to use > MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of block > nonzeros per block row */ > You should use this if using MATBAIJ. > In any case, is something like > MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); > to fill values of one block into the matrix A ok? > Yes, however in order to get improved performance, you need type MATBAIJ. Thanks, Matt > Regards > Tim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandanovian at gmail.com Wed Jun 10 09:55:24 2015 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 10 Jun 2015 16:55:24 +0200 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: Thanks for the quick reply! "Yes, however in order to get improved performance, you need type MATBAIJ." I considered MatSetType(A,MATAIJ); i.e. the non-block type since UMFPACK seems to require the seqaij type according to the summary page. So do I have to refrain from using the more amiable block-type if I want to make use of UMFPACK? 2015-06-10 16:45 GMT+02:00 Matthew Knepley : > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff > wrote: >> >> Hi all >> >> I want to use Petsc to solve some linear systems via the built-in Krylov >> subspace methods as well as by means of UMFPACK. >> >> The considered matrix is block sparse with blocks of size 6x6. >> >> Here is what I came up with after taking a look at some of the examples >> >> MPI_Comm comm; >> Mat A; >> PetscInt n = 10000; /* dimension of matrix */ >> comm = PETSC_COMM_SELF; >> MatCreate(comm,&A); >> MatSetSizes(Amat,n,n,n,n); >> MatSetBlockSize(A,6); >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = >> PETSC_COMM_SELF */ >> >> Questions: >> 1. >> I work on a single node with 2-8 cores. Hence, comm = PETSC_COMM_SELF; I >> guess. Is it correct in this contect to set MatSetSizes(Amat,n,n,n,n); with >> 4-times n? > > > Yes. > >> >> 2. >> After the above sequence of commands do I have to use something like >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of nonzeros >> per row */ >> or is it possible to use >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of block >> nonzeros per block row */ > > > You should use this if using MATBAIJ. > >> >> In any case, is something like >> MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); >> to fill values of one block into the matrix A ok? > > > Yes, however in order to get improved performance, you need type MATBAIJ. 
> > Thanks, > > Matt > >> >> Regards >> Tim > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Wed Jun 10 10:00:53 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 10:00:53 -0500 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 9:55 AM, Tim Steinhoff wrote: > Thanks for the quick reply! > > "Yes, however in order to get improved performance, you need type MATBAIJ." > > I considered MatSetType(A,MATAIJ); i.e. the non-block type since > UMFPACK seems to require the seqaij type according to the summary > page. So do I have to refrain from using the more amiable block-type > if I want to make use of UMFPACK? > Yes, I think so. What do you need in UMFPACK? Thanks, Matt > 2015-06-10 16:45 GMT+02:00 Matthew Knepley : > > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff > > wrote: > >> > >> Hi all > >> > >> I want to use Petsc to solve some linear systems via the built-in Krylov > >> subspace methods as well as by means of UMFPACK. > >> > >> The considered matrix is block sparse with blocks of size 6x6. > >> > >> Here is what I came up with after taking a look at some of the examples > >> > >> MPI_Comm comm; > >> Mat A; > >> PetscInt n = 10000; /* dimension of matrix */ > >> comm = PETSC_COMM_SELF; > >> MatCreate(comm,&A); > >> MatSetSizes(Amat,n,n,n,n); > >> MatSetBlockSize(A,6); > >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = > >> PETSC_COMM_SELF */ > >> > >> Questions: > >> 1. > >> I work on a single node with 2-8 cores. Hence, comm = PETSC_COMM_SELF; I > >> guess. Is it correct in this contect to set MatSetSizes(Amat,n,n,n,n); > with > >> 4-times n? > > > > > > Yes. > > > >> > >> 2. > >> After the above sequence of commands do I have to use something like > >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of nonzeros > >> per row */ > >> or is it possible to use > >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of > block > >> nonzeros per block row */ > > > > > > You should use this if using MATBAIJ. > > > >> > >> In any case, is something like > >> MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); > >> to fill values of one block into the matrix A ok? > > > > > > Yes, however in order to get improved performance, you need type MATBAIJ. > > > > Thanks, > > > > Matt > > > >> > >> Regards > >> Tim > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
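To summarize the assembly question in this thread, a minimal sketch of the AIJ variant (block size set, scalar preallocation, blocked insertion), which stays in the seqaij format that the UMFPACK interface expects; the preallocation count, index arrays, and block values are placeholders and the fragment is untested:

Mat         A;
PetscInt    n  = 10000;                /* matrix dimension, as in the question above   */
PetscInt    bs = 6;                    /* block size                                   */
PetscInt    idx_r[1], idx_c[1];        /* block row / block column indices             */
PetscScalar myblockvals[36];           /* one 6x6 block, row-major                     */

MatCreate(PETSC_COMM_SELF, &A);
MatSetSizes(A, n, n, n, n);
MatSetType(A, MATAIJ);                 /* seqaij on PETSC_COMM_SELF, usable by UMFPACK */
MatSetBlockSize(A, bs);
MatSeqAIJSetPreallocation(A, 6*bs, NULL);   /* placeholder: roughly 6 nonzero blocks per row */

/* for each nonzero 6x6 block: set idx_r[0], idx_c[0] (block indices) and myblockvals */
MatSetValuesBlocked(A, 1, idx_r, 1, idx_c, myblockvals, INSERT_VALUES);

MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

/* UMFPACK can then be selected at run time with the PETSc 3.5-era options
      -pc_type lu -pc_factor_mat_solver_package umfpack
   while -mat_type seqbaij plus MatSeqBAIJSetPreallocation(A,6,0,db_nnz) is the
   faster layout for the built-in Krylov solvers.                               */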
URL: From kandanovian at gmail.com Wed Jun 10 10:03:32 2015 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 10 Jun 2015 17:03:32 +0200 Subject: [petsc-users] Reading values from a matrix into a different format Message-ID: Hi all assume that a sparse matrix is constructed via MPI_Comm comm; Mat A; PetscInt n = 10000; /* dimension of matrix */ comm = PETSC_COMM_SELF; MatCreate(comm,&A); MatSetSizes(Amat,n,n,n,n); MatSetBlockSize(A,6); MatSetType(A,MATAIJ); followd by some suitable preallocation and setting of values. What can be done to efficiently read the values from the matrix A and store them in a format like the coordinate format COO? Regards Tim From knepley at gmail.com Wed Jun 10 10:05:08 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 10:05:08 -0500 Subject: [petsc-users] Reading values from a matrix into a different format In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 10:03 AM, Tim Steinhoff wrote: > Hi all > > assume that a sparse matrix is constructed via > > MPI_Comm comm; > Mat A; > PetscInt n = 10000; /* dimension of matrix */ > comm = PETSC_COMM_SELF; > MatCreate(comm,&A); > MatSetSizes(Amat,n,n,n,n); > MatSetBlockSize(A,6); > MatSetType(A,MATAIJ); > > followd by some suitable preallocation and setting of values. > > What can be done to efficiently read the values from the matrix A and > store them in a format like the coordinate format COO? > Using MatGetRow() should be fine for this. Thanks, Matt > Regards > Tim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandanovian at gmail.com Wed Jun 10 10:20:48 2015 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 10 Jun 2015 17:20:48 +0200 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: I wanted to give the multi frontal LU factorization approach a shot as an example for a direct sparse solver. Regards Tim 2015-06-10 17:00 GMT+02:00 Matthew Knepley : > On Wed, Jun 10, 2015 at 9:55 AM, Tim Steinhoff > wrote: >> >> Thanks for the quick reply! >> >> "Yes, however in order to get improved performance, you need type >> MATBAIJ." >> >> I considered MatSetType(A,MATAIJ); i.e. the non-block type since >> UMFPACK seems to require the seqaij type according to the summary >> page. So do I have to refrain from using the more amiable block-type >> if I want to make use of UMFPACK? > > > Yes, I think so. What do you need in UMFPACK? > > Thanks, > > Matt > >> >> 2015-06-10 16:45 GMT+02:00 Matthew Knepley : >> > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff >> > wrote: >> >> >> >> Hi all >> >> >> >> I want to use Petsc to solve some linear systems via the built-in >> >> Krylov >> >> subspace methods as well as by means of UMFPACK. >> >> >> >> The considered matrix is block sparse with blocks of size 6x6. >> >> >> >> Here is what I came up with after taking a look at some of the examples >> >> >> >> MPI_Comm comm; >> >> Mat A; >> >> PetscInt n = 10000; /* dimension of matrix */ >> >> comm = PETSC_COMM_SELF; >> >> MatCreate(comm,&A); >> >> MatSetSizes(Amat,n,n,n,n); >> >> MatSetBlockSize(A,6); >> >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = >> >> PETSC_COMM_SELF */ >> >> >> >> Questions: >> >> 1. >> >> I work on a single node with 2-8 cores. Hence, comm = PETSC_COMM_SELF; >> >> I >> >> guess. 
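A minimal sketch of the MatGetRow() loop suggested above for pulling a sequential AIJ matrix into COO (triplet) arrays; allocation of the output arrays (for instance to the nz_used value reported by MatGetInfo()) is elided and the fragment is untested:

PetscInt           i, k, m, n, ncols, nz = 0;
const PetscInt    *cols;
const PetscScalar *vals;
PetscInt          *coo_i, *coo_j;      /* preallocated output arrays (placeholders) */
PetscScalar       *coo_v;

MatGetSize(A, &m, &n);
for (i = 0; i < m; i++) {              /* sequential matrix: all rows are local */
  MatGetRow(A, i, &ncols, &cols, &vals);
  for (k = 0; k < ncols; k++) {
    coo_i[nz] = i;
    coo_j[nz] = cols[k];
    coo_v[nz] = vals[k];
    nz++;
  }
  MatRestoreRow(A, i, &ncols, &cols, &vals);   /* must be paired with MatGetRow */
}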
Is it correct in this contect to set MatSetSizes(Amat,n,n,n,n); >> >> with >> >> 4-times n? >> > >> > >> > Yes. >> > >> >> >> >> 2. >> >> After the above sequence of commands do I have to use something like >> >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of nonzeros >> >> per row */ >> >> or is it possible to use >> >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of >> >> block >> >> nonzeros per block row */ >> > >> > >> > You should use this if using MATBAIJ. >> > >> >> >> >> In any case, is something like >> >> MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); >> >> to fill values of one block into the matrix A ok? >> > >> > >> > Yes, however in order to get improved performance, you need type >> > MATBAIJ. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Regards >> >> Tim >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Wed Jun 10 10:24:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 10:24:17 -0500 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 10:20 AM, Tim Steinhoff wrote: > I wanted to give the multi frontal LU factorization approach a shot as > an example for a direct sparse solver. > You can just switch types when you use something other than UMFPACK. Thanks, Matt > Regards > Tim > > 2015-06-10 17:00 GMT+02:00 Matthew Knepley : > > On Wed, Jun 10, 2015 at 9:55 AM, Tim Steinhoff > > wrote: > >> > >> Thanks for the quick reply! > >> > >> "Yes, however in order to get improved performance, you need type > >> MATBAIJ." > >> > >> I considered MatSetType(A,MATAIJ); i.e. the non-block type since > >> UMFPACK seems to require the seqaij type according to the summary > >> page. So do I have to refrain from using the more amiable block-type > >> if I want to make use of UMFPACK? > > > > > > Yes, I think so. What do you need in UMFPACK? > > > > Thanks, > > > > Matt > > > >> > >> 2015-06-10 16:45 GMT+02:00 Matthew Knepley : > >> > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff > > >> > wrote: > >> >> > >> >> Hi all > >> >> > >> >> I want to use Petsc to solve some linear systems via the built-in > >> >> Krylov > >> >> subspace methods as well as by means of UMFPACK. > >> >> > >> >> The considered matrix is block sparse with blocks of size 6x6. > >> >> > >> >> Here is what I came up with after taking a look at some of the > examples > >> >> > >> >> MPI_Comm comm; > >> >> Mat A; > >> >> PetscInt n = 10000; /* dimension of matrix */ > >> >> comm = PETSC_COMM_SELF; > >> >> MatCreate(comm,&A); > >> >> MatSetSizes(Amat,n,n,n,n); > >> >> MatSetBlockSize(A,6); > >> >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = > >> >> PETSC_COMM_SELF */ > >> >> > >> >> Questions: > >> >> 1. > >> >> I work on a single node with 2-8 cores. Hence, comm = > PETSC_COMM_SELF; > >> >> I > >> >> guess. Is it correct in this contect to set > MatSetSizes(Amat,n,n,n,n); > >> >> with > >> >> 4-times n? > >> > > >> > > >> > Yes. > >> > > >> >> > >> >> 2. 
> >> >> After the above sequence of commands do I have to use something like > >> >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of > nonzeros > >> >> per row */ > >> >> or is it possible to use > >> >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of > >> >> block > >> >> nonzeros per block row */ > >> > > >> > > >> > You should use this if using MATBAIJ. > >> > > >> >> > >> >> In any case, is something like > >> >> MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); > >> >> to fill values of one block into the matrix A ok? > >> > > >> > > >> > Yes, however in order to get improved performance, you need type > >> > MATBAIJ. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Regards > >> >> Tim > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandanovian at gmail.com Wed Jun 10 10:31:16 2015 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 10 Jun 2015 17:31:16 +0200 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: One can switch on-the-fly? Say, we have a block sparse matrix A initilized via MatSetType(A,MATBAIJ) and subsequent calls of MatSeqBAIJSetPreallocation and MatSetValuesBlocked for use with e.g. the built-in Krylov subspace methods and afterwards changing type via MatSetType(A,MATAIJ)? Regards Tim 2015-06-10 17:24 GMT+02:00 Matthew Knepley : > On Wed, Jun 10, 2015 at 10:20 AM, Tim Steinhoff > wrote: >> >> I wanted to give the multi frontal LU factorization approach a shot as >> an example for a direct sparse solver. > > > You can just switch types when you use something other than UMFPACK. > > Thanks, > > Matt > >> >> Regards >> Tim >> >> 2015-06-10 17:00 GMT+02:00 Matthew Knepley : >> > On Wed, Jun 10, 2015 at 9:55 AM, Tim Steinhoff >> > wrote: >> >> >> >> Thanks for the quick reply! >> >> >> >> "Yes, however in order to get improved performance, you need type >> >> MATBAIJ." >> >> >> >> I considered MatSetType(A,MATAIJ); i.e. the non-block type since >> >> UMFPACK seems to require the seqaij type according to the summary >> >> page. So do I have to refrain from using the more amiable block-type >> >> if I want to make use of UMFPACK? >> > >> > >> > Yes, I think so. What do you need in UMFPACK? >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> 2015-06-10 16:45 GMT+02:00 Matthew Knepley : >> >> > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff >> >> > >> >> > wrote: >> >> >> >> >> >> Hi all >> >> >> >> >> >> I want to use Petsc to solve some linear systems via the built-in >> >> >> Krylov >> >> >> subspace methods as well as by means of UMFPACK. >> >> >> >> >> >> The considered matrix is block sparse with blocks of size 6x6. 
>> >> >> >> >> >> Here is what I came up with after taking a look at some of the >> >> >> examples >> >> >> >> >> >> MPI_Comm comm; >> >> >> Mat A; >> >> >> PetscInt n = 10000; /* dimension of matrix */ >> >> >> comm = PETSC_COMM_SELF; >> >> >> MatCreate(comm,&A); >> >> >> MatSetSizes(Amat,n,n,n,n); >> >> >> MatSetBlockSize(A,6); >> >> >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = >> >> >> PETSC_COMM_SELF */ >> >> >> >> >> >> Questions: >> >> >> 1. >> >> >> I work on a single node with 2-8 cores. Hence, comm = >> >> >> PETSC_COMM_SELF; >> >> >> I >> >> >> guess. Is it correct in this contect to set >> >> >> MatSetSizes(Amat,n,n,n,n); >> >> >> with >> >> >> 4-times n? >> >> > >> >> > >> >> > Yes. >> >> > >> >> >> >> >> >> 2. >> >> >> After the above sequence of commands do I have to use something like >> >> >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of >> >> >> nonzeros >> >> >> per row */ >> >> >> or is it possible to use >> >> >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number of >> >> >> block >> >> >> nonzeros per block row */ >> >> > >> >> > >> >> > You should use this if using MATBAIJ. >> >> > >> >> >> >> >> >> In any case, is something like >> >> >> MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); >> >> >> to fill values of one block into the matrix A ok? >> >> > >> >> > >> >> > Yes, however in order to get improved performance, you need type >> >> > MATBAIJ. >> >> > >> >> > Thanks, >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Regards >> >> >> Tim >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Wed Jun 10 10:37:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 10:37:50 -0500 Subject: [petsc-users] Setting values of a matrix in a blockwise manner In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 10:31 AM, Tim Steinhoff wrote: > One can switch on-the-fly? > Yes, either by changing the string in MatSetType() or calling MatSetFromOptions() and passing -mat_type baij > Say, > > we have a block sparse matrix A initilized via MatSetType(A,MATBAIJ) > and subsequent calls of MatSeqBAIJSetPreallocation and > MatSetValuesBlocked > The type determines the storage format, so you do not really want to change after construction. YOu can do this however using MatConvert(). Thanks, Matt > for use with e.g. the built-in Krylov subspace methods and afterwards > changing type via MatSetType(A,MATAIJ)? > > Regards > Tim > > 2015-06-10 17:24 GMT+02:00 Matthew Knepley : > > On Wed, Jun 10, 2015 at 10:20 AM, Tim Steinhoff > > wrote: > >> > >> I wanted to give the multi frontal LU factorization approach a shot as > >> an example for a direct sparse solver. > > > > > > You can just switch types when you use something other than UMFPACK. 
> > > > Thanks, > > > > Matt > > > >> > >> Regards > >> Tim > >> > >> 2015-06-10 17:00 GMT+02:00 Matthew Knepley : > >> > On Wed, Jun 10, 2015 at 9:55 AM, Tim Steinhoff > > >> > wrote: > >> >> > >> >> Thanks for the quick reply! > >> >> > >> >> "Yes, however in order to get improved performance, you need type > >> >> MATBAIJ." > >> >> > >> >> I considered MatSetType(A,MATAIJ); i.e. the non-block type since > >> >> UMFPACK seems to require the seqaij type according to the summary > >> >> page. So do I have to refrain from using the more amiable block-type > >> >> if I want to make use of UMFPACK? > >> > > >> > > >> > Yes, I think so. What do you need in UMFPACK? > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> 2015-06-10 16:45 GMT+02:00 Matthew Knepley : > >> >> > On Wed, Jun 10, 2015 at 9:42 AM, Tim Steinhoff > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Hi all > >> >> >> > >> >> >> I want to use Petsc to solve some linear systems via the built-in > >> >> >> Krylov > >> >> >> subspace methods as well as by means of UMFPACK. > >> >> >> > >> >> >> The considered matrix is block sparse with blocks of size 6x6. > >> >> >> > >> >> >> Here is what I came up with after taking a look at some of the > >> >> >> examples > >> >> >> > >> >> >> MPI_Comm comm; > >> >> >> Mat A; > >> >> >> PetscInt n = 10000; /* dimension of matrix */ > >> >> >> comm = PETSC_COMM_SELF; > >> >> >> MatCreate(comm,&A); > >> >> >> MatSetSizes(Amat,n,n,n,n); > >> >> >> MatSetBlockSize(A,6); > >> >> >> MatSetType(A,MATAIJ); /* UMFPACK compatible format due to comm = > >> >> >> PETSC_COMM_SELF */ > >> >> >> > >> >> >> Questions: > >> >> >> 1. > >> >> >> I work on a single node with 2-8 cores. Hence, comm = > >> >> >> PETSC_COMM_SELF; > >> >> >> I > >> >> >> guess. Is it correct in this contect to set > >> >> >> MatSetSizes(Amat,n,n,n,n); > >> >> >> with > >> >> >> 4-times n? > >> >> > > >> >> > > >> >> > Yes. > >> >> > > >> >> >> > >> >> >> 2. > >> >> >> After the above sequence of commands do I have to use something > like > >> >> >> MatSeqAIJSetPreallocation(A,0,d_nnz); /* d_nnz <-> number of > >> >> >> nonzeros > >> >> >> per row */ > >> >> >> or is it possible to use > >> >> >> MatSeqBAIJSetPreallocation(A,6,0,db_nnz); /* db_nnz <-> number > of > >> >> >> block > >> >> >> nonzeros per block row */ > >> >> > > >> >> > > >> >> > You should use this if using MATBAIJ. > >> >> > > >> >> >> > >> >> >> In any case, is something like > >> >> >> > MatSetValuesBlocked(A,1,idx_r,1,idx_c,myblockvals,INSERT_VALUES); > >> >> >> to fill values of one block into the matrix A ok? > >> >> > > >> >> > > >> >> > Yes, however in order to get improved performance, you need type > >> >> > MATBAIJ. > >> >> > > >> >> > Thanks, > >> >> > > >> >> > Matt > >> >> > > >> >> >> > >> >> >> Regards > >> >> >> Tim > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. 
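A minimal sketch of the MatConvert() route mentioned above: assemble once as BAIJ for the PETSc solvers, then make an AIJ copy when the UMFPACK factorization is wanted (untested fragment):

Mat A_baij, A_aij;

/* ... A_baij assembled as MATSEQBAIJ with MatSeqBAIJSetPreallocation and
       MatSetValuesBlocked, as discussed earlier in this thread ...        */

MatConvert(A_baij, MATSEQAIJ, MAT_INITIAL_MATRIX, &A_aij);
/* A_aij holds the same entries in seqaij storage and can be handed to the
   UMFPACK LU, while A_baij is kept for the Krylov solves.                  */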
> >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jun 10 11:28:14 2015 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 10 Jun 2015 12:28:14 -0400 Subject: [petsc-users] GAMG In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D1B269@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1A091@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D1B269@IST-EX10MBX-4.ad.bu.edu> Message-ID: Yes, lets get this back on the list. On Wed, Jun 10, 2015 at 12:01 PM, Young, Matthew, Adam wrote: > Ah, oops - I was looking at the v 3.5 manual. I am certainly interested > in algorithmic details if there are relevant papers. My main interest right > now is determining if this method is appropriate for my problem. > Jed mentioned that this will not work well out of the box, as I recall. It looks like very high anisotropy. There are heuristics in GAMG to deal with anisotropy, but I would not trust them on unstructured grids. You would want to play with '-pc_gamg_threshold x', x in [0 - 0.1]. My advisor used SuperLU on a sequential version of this problem years ago, > but SuperLU_dist appears to not scale well. > > yep, > Before any of that, though: Would you rather I redirect this to the > petsc-users list? I emailed you directly because I worried that the > discussion of my problem physics was detracting from PETSc-specific > conversation but maybe this is a conversation better had with the entire > PETSc group. > > You might look in the appendix of the "Multigrid" book by Trottenberg, et al. They have a listing of lots of physics problems and way to deal with them, with multigrid. Maybe they have something like your problem. Mark > --Matt > > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > ------------------------------ > *From:* Mark Adams [mfadams at lbl.gov] > *Sent:* Wednesday, June 10, 2015 10:33 AM > *To:* Young, Matthew, Adam > *Subject:* Re: GAMG > > The manual has material on this. Let me know if you want more detail, > algorithmic, etc. > Mark > > On Wed, Jun 10, 2015 at 10:15 AM, Young, Matthew, Adam wrote: > >> Hi Mark, >> >> We recently had a brief discussion (along with Matt Knepley and Jed >> Brown) on the petsc-users list about a problem I'm working on for my >> thesis research. Since I'd like to explore what GAMG has to offer, I was >> wondering if you can direct me to an introduction of sorts if such a >> document exists. >> >> Best, >> Matt >> >> -------------------------------------------------------------- >> Matthew Young >> Graduate Student >> Boston University Dept. of Astronomy >> -------------------------------------------------------------- >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Wed Jun 10 11:42:49 2015 From: jed at jedbrown.org (Jed Brown) Date: Wed, 10 Jun 2015 10:42:49 -0600 Subject: [petsc-users] GAMG In-Reply-To: References: <17A35C213185A84BB8ED54C88FBFD712C3D1A091@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D1B269@IST-EX10MBX-4.ad.bu.edu> Message-ID: <87d213seeu.fsf@jedbrown.org> Mark Adams writes: > Yes, lets get this back on the list. > > On Wed, Jun 10, 2015 at 12:01 PM, Young, Matthew, Adam wrote: > >> Ah, oops - I was looking at the v 3.5 manual. I am certainly interested >> in algorithmic details if there are relevant papers. My main interest right >> now is determining if this method is appropriate for my problem. >> > > Jed mentioned that this will not work well out of the box, as I recall. It > looks like very high anisotropy. It looks like a hyperbolic term. If you only look at the symmetric part of the tensor, then you get anisotropy (1 versus 1 + \kappa^2 ? 10000), but we also have a big nonsymmetric contribution. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From may at bu.edu Wed Jun 10 16:02:14 2015 From: may at bu.edu (Young, Matthew, Adam) Date: Wed, 10 Jun 2015 21:02:14 +0000 Subject: [petsc-users] GAMG In-Reply-To: <87d213seeu.fsf@jedbrown.org> References: <17A35C213185A84BB8ED54C88FBFD712C3D1A091@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D1B269@IST-EX10MBX-4.ad.bu.edu> , <87d213seeu.fsf@jedbrown.org> Message-ID: <17A35C213185A84BB8ED54C88FBFD712C3D1C448@IST-EX10MBX-4.ad.bu.edu> Jed, When expanding the LHS, the anti-symmetric kappa terms cause mixed second-order derivatives to cancel, leaving n[\partial_{xx} + \partial_{yy} + (1+\kappa^2)\partial_{zz}]\phi + lower-order terms. Since n (density) and kappa are non-negative, I thought this would mean the operator is still elliptic. You're right that there is unavoidable anisotropy in the direction of the magnetic field. Mark, I'll look for that Trottenberg, et al. book. Thanks for the reference. Regarding the manual, the last sentence of the first paragraph in "Trouble shooting algebraic multigrid methods" says "-pc_gamg_threshold 0.0 is the most robust option ... and is recommended if poor convergence rates are observed, ..." but the previous sentence says that setting x=0.0 in -pc_gamg_threshold x "will result in ... generally worse convergence rates." This seems to be a contradiction. Can you clarify? --Matt -------------------------------------------------------------- Matthew Young Graduate Student Boston University Dept. of Astronomy -------------------------------------------------------------- ________________________________________ From: Jed Brown [jed at jedbrown.org] Sent: Wednesday, June 10, 2015 12:42 PM To: Mark Adams; Young, Matthew, Adam; PETSc users list Subject: Re: [petsc-users] GAMG Mark Adams writes: > Yes, lets get this back on the list. > > On Wed, Jun 10, 2015 at 12:01 PM, Young, Matthew, Adam wrote: > >> Ah, oops - I was looking at the v 3.5 manual. I am certainly interested >> in algorithmic details if there are relevant papers. My main interest right >> now is determining if this method is appropriate for my problem. >> > > Jed mentioned that this will not work well out of the box, as I recall. It > looks like very high anisotropy. It looks like a hyperbolic term. If you only look at the symmetric part of the tensor, then you get anisotropy (1 versus 1 + \kappa^2 ? 
10000), but we also have a big nonsymmetric contribution. From riseth at maths.ox.ac.uk Wed Jun 10 18:30:03 2015 From: riseth at maths.ox.ac.uk (=?UTF-8?Q?Asbj=C3=B8rn_Nilsen_Riseth?=) Date: Wed, 10 Jun 2015 23:30:03 +0000 Subject: [petsc-users] Apply operator to linearised system before solving? Message-ID: Dear PETSc community, I'm trying to implement a preconditioner used in reservoir modelling, called Constrained Pressure Residual. They apply a transformation to the linearised system we get from each Newton step, before solving it. Then a 2-stage multiplicative preconditioner on the transformed system. The main problem is implementing step 1 below. Let A be the Jacobian, b the residual and K my transformation. A = [A00 A01; A10 A11]. The process is roughly like this: 1) Set Atilde = KA, btilde = Kb. - We want to solve Atilde x = btilde 2) Create a 2-stage multiplicative preconditioner using Atilde, btilde pc0: This is only applied to the "fieldsplit 1" block of my system. B_1 = [0 0; 0 S^-1] Where S is a selfp Schur approximation from Atilde S = Atilde11 - Atilde10 * inv(diag(Atilde00))* Atilde01 pc1: This is a standard ILU on the whole system Atilde Currently I'm doing something like this Step 1: -ksp_type richardson -ksp_max_it 1 -pc_type python Then I create Atilde from KA in PCSetup, and a FGMRES ksp to take care of step 2 with PCApply Step 2: pc_type composite pc_composite_type multiplicative pc_composite_pcs python,ilu, Are there better ways of dealing with this transformation? To me it looks similar to a 2-step right preconditioner on top of a left preconditioner. Regards, Ozzy -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 10 18:54:10 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Jun 2015 18:54:10 -0500 Subject: [petsc-users] Apply operator to linearised system before solving? In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 6:30 PM, Asbj?rn Nilsen Riseth < riseth at maths.ox.ac.uk> wrote: > Dear PETSc community, > > I'm trying to implement a preconditioner used in reservoir modelling, > called Constrained Pressure Residual. > They apply a transformation to the linearised system we get from each > Newton step, before solving it. Then a 2-stage multiplicative > preconditioner on the transformed system. > 1) What is K? 2) Do you care about the 2-stage thing, or would any Stokes solver do? Thanks, Matt > The main problem is implementing step 1 below. > > Let A be the Jacobian, b the residual and K my transformation. > A = [A00 A01; A10 A11]. > The process is roughly like this: > 1) Set Atilde = KA, btilde = Kb. > - We want to solve Atilde x = btilde > > 2) Create a 2-stage multiplicative preconditioner using Atilde, btilde > pc0: This is only applied to the "fieldsplit 1" block of my system. > B_1 = [0 0; 0 S^-1] > Where S is a selfp Schur approximation from Atilde > S = Atilde11 - Atilde10 * inv(diag(Atilde00))* Atilde01 > pc1: This is a standard ILU on the whole system Atilde > > Currently I'm doing something like this > Step 1: > -ksp_type richardson -ksp_max_it 1 > -pc_type python > Then I create Atilde from KA in PCSetup, and a FGMRES ksp to take care of > step 2 with PCApply > > Step 2: > pc_type composite > pc_composite_type multiplicative > pc_composite_pcs python,ilu, > > Are there better ways of dealing with this transformation? > To me it looks similar to a 2-step right preconditioner on top of a left > preconditioner. 
> > Regards, > Ozzy > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From riseth at maths.ox.ac.uk Wed Jun 10 19:23:08 2015 From: riseth at maths.ox.ac.uk (=?UTF-8?Q?Asbj=C3=B8rn_Nilsen_Riseth?=) Date: Thu, 11 Jun 2015 00:23:08 +0000 Subject: [petsc-users] Apply operator to linearised system before solving? In-Reply-To: References: Message-ID: Hi Matt, 1) K = [ I 0 ; -A10*inv(diag(A00)) I ] There is a typo in my first email: S = Atilde11 = A11 - A10*inv(diag(A00))*A01 2) I'd like to have the linear solver close to what these people currently use. My ultimate goal is to look at nonlinear solver strategies, not improve their linear solver. I already have a PC that seems to work fine for my purposes, but was hoping I could implement this version of CPR to get even closer to their current systems. Ozzy On Thu, 11 Jun 2015 at 00:54 Matthew Knepley wrote: > On Wed, Jun 10, 2015 at 6:30 PM, Asbj?rn Nilsen Riseth < > riseth at maths.ox.ac.uk> wrote: > >> Dear PETSc community, >> >> I'm trying to implement a preconditioner used in reservoir modelling, >> called Constrained Pressure Residual. >> They apply a transformation to the linearised system we get from each >> Newton step, before solving it. Then a 2-stage multiplicative >> preconditioner on the transformed system. >> > > 1) What is K? > > 2) Do you care about the 2-stage thing, or would any Stokes solver do? > > Thanks, > > Matt > > >> The main problem is implementing step 1 below. >> >> Let A be the Jacobian, b the residual and K my transformation. >> A = [A00 A01; A10 A11]. >> The process is roughly like this: >> 1) Set Atilde = KA, btilde = Kb. >> - We want to solve Atilde x = btilde >> >> 2) Create a 2-stage multiplicative preconditioner using Atilde, btilde >> pc0: This is only applied to the "fieldsplit 1" block of my system. >> B_1 = [0 0; 0 S^-1] >> Where S is a selfp Schur approximation from Atilde >> S = Atilde11 - Atilde10 * inv(diag(Atilde00))* Atilde01 >> pc1: This is a standard ILU on the whole system Atilde >> >> Currently I'm doing something like this >> Step 1: >> -ksp_type richardson -ksp_max_it 1 >> -pc_type python >> Then I create Atilde from KA in PCSetup, and a FGMRES ksp to take care of >> step 2 with PCApply >> >> Step 2: >> pc_type composite >> pc_composite_type multiplicative >> pc_composite_pcs python,ilu, >> >> Are there better ways of dealing with this transformation? >> To me it looks similar to a 2-step right preconditioner on top of a left >> preconditioner. >> >> Regards, >> Ozzy >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkhodak at princeton.edu Thu Jun 11 05:07:05 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Thu, 11 Jun 2015 03:07:05 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Thank you for your help - the install seems to work, apart from routines requiring MPI, which fail due to the "Attempting to use an MPI routine before initializing MPI" error. This seems to be an error in the PETSc build itself. 
Thanks again, Mikhail Khodak On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin wrote: > On 8 June 2015 at 02:50, Mikhail Khodak wrote: > > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit > Windows 7. > > I have built PETSc 3.5.4 with shared and dynamic libraries using > > mpich2-1.2.1 and successfully ran the installation tests. I am using > Python > > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt to > > install petsc4py (both with pip and distutils) I get a mpicc compiler > error > > due to undefined references/symbols. I have attached the output of > running > > > > pip install petsc petsc4py --allow-external petsc --allow-external > petsc4py > > > > I've never ever built or test petsc4py under Cygwin. The errors you > see are expected. > Perhaps you can manually workaround the issues following the following > steps: > > 1) Download the petsc4py tarball and unpack it. > 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all > occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ > 3) Use pip again: > > pip install petsc > pip install . > > The last line assumes your current working directory is the one having > petsc4py's setup.py > > Finally, I do not guarantee this will work. I'm just guessing, > petsc4py never explicitly supported Windows and/or Cygwin. > > > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Numerical Porous Media Center (NumPor) > King Abdullah University of Science and Technology (KAUST) > http://numpor.kaust.edu.sa/ > > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 4332 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > > Office Phone: +966 12 808-0459 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 11 06:21:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 06:21:50 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak wrote: > Thank you for your help - the install seems to work, apart from routines > requiring MPI, which fail due to the "Attempting to use an MPI routine > before initializing MPI" error. This seems to be an error in the PETSc > build itself. > The MPI routines will not work until after import petsc4py, sys petsc4py.init(sys.argv) from petsc4py import PETSc If they fail after this, it is usually a mismatch between the mpiexec being used and the MPI libraries being linked. Thanks, Matt Thanks again, > Mikhail Khodak > > On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin wrote: > >> On 8 June 2015 at 02:50, Mikhail Khodak wrote: >> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit >> Windows 7. >> > I have built PETSc 3.5.4 with shared and dynamic libraries using >> > mpich2-1.2.1 and successfully ran the installation tests. I am using >> Python >> > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt >> to >> > install petsc4py (both with pip and distutils) I get a mpicc compiler >> error >> > due to undefined references/symbols. I have attached the output of >> running >> > >> > pip install petsc petsc4py --allow-external petsc --allow-external >> petsc4py >> > >> >> I've never ever built or test petsc4py under Cygwin. The errors you >> see are expected. 
>> Perhaps you can manually workaround the issues following the following >> steps: >> >> 1) Download the petsc4py tarball and unpack it. >> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all >> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ >> 3) Use pip again: >> >> pip install petsc >> pip install . >> >> The last line assumes your current working directory is the one having >> petsc4py's setup.py >> >> Finally, I do not guarantee this will work. I'm just guessing, >> petsc4py never explicitly supported Windows and/or Cygwin. >> >> >> -- >> Lisandro Dalcin >> ============ >> Research Scientist >> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) >> Numerical Porous Media Center (NumPor) >> King Abdullah University of Science and Technology (KAUST) >> http://numpor.kaust.edu.sa/ >> >> 4700 King Abdullah University of Science and Technology >> al-Khawarizmi Bldg (Bldg 1), Office # 4332 >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> http://www.kaust.edu.sa >> >> Office Phone: +966 12 808-0459 >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at xgm.de Thu Jun 11 06:23:28 2015 From: mailinglists at xgm.de (Florian Lindner) Date: Thu, 11 Jun 2015 13:23:28 +0200 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI Message-ID: <1505176.dZzVkMf1SI@asaru> Hello, I try to setup petsc on my Arch Linux box. Download it using git -b maint. % python2 configure works fine: [...] Compilers: C Compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 C++ Compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g -O0 -fPIC Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -g -O0 Linkers: Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 make: [...] xxx=========================================================================xxx Configure stage complete. Now build PETSc libraries with (gnumake build): make PETSC_DIR=/home/florian/software/petsc PETSC_ARCH=arch-linux2-c-debug all xxx=========================================================================xxx Now building: % make PETSC_DIR=/home/florian/software/petsc PETSC_ARCH=arch-linux2-c-debug all yields this error: "PETSc was configured with one OpenMPI mpi.h version but now appears to be compiling using a different OpenMPI mpi.h version" I would prefer to use my distributions openmpi 1.8.5, there is no other MPI version installed. Using configure with --download-mpich and this compiler /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to work. Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some time ago, but I'm not sure how my system changed in the last weeks when I haven't used petsc on this machine (Arch is a rolling release). 
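One quick way to see which OpenMPI mpi.h a wrapper compiler actually resolves is to compile a tiny program with the same mpicc that PETSc was configured with and print OpenMPI's own version macros; the file name below is only illustrative:

/* check_ompi_version.c: build with  mpicc check_ompi_version.c -o check_ompi_version */
#include <stdio.h>
#include <mpi.h>

int main(void)
{
#if defined(OMPI_MAJOR_VERSION)
  printf("mpi.h says OpenMPI %d.%d.%d\n",
         OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#else
  printf("this mpi.h does not define the OpenMPI version macros\n");
#endif
  return 0;
}

If the printed version differs from the one recorded when configure ran, the wrapper on the PATH has picked up a different mpi.h since then, which is exactly the situation the build-time check complains about.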
Thx, Florian From knepley at gmail.com Thu Jun 11 06:26:27 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 06:26:27 -0500 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: <1505176.dZzVkMf1SI@asaru> References: <1505176.dZzVkMf1SI@asaru> Message-ID: Cannot tell without configure.log Thanks, Matt On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner wrote: > Hello, > > I try to setup petsc on my Arch Linux box. Download it using git -b maint. > > % python2 configure works fine: > [...] > Compilers: > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > C++ Compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -g -O0 -fPIC > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > -ffree-line-length-0 -g -O0 > Linkers: > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > make: > [...] > > xxx=========================================================================xxx > Configure stage complete. Now build PETSc libraries with (gnumake build): > make PETSC_DIR=/home/florian/software/petsc > PETSC_ARCH=arch-linux2-c-debug all > > xxx=========================================================================xxx > > Now building: > > % make PETSC_DIR=/home/florian/software/petsc > PETSC_ARCH=arch-linux2-c-debug all > > yields this error: > > "PETSc was configured with one OpenMPI mpi.h version but now appears to be > compiling using a different OpenMPI mpi.h version" > > I would prefer to use my distributions openmpi 1.8.5, there is no other > MPI version installed. > > Using configure with --download-mpich and this compiler > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to work. > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some time > ago, but I'm not sure how my system changed in the last weeks when I > haven't used petsc on this machine (Arch is a rolling release). > > Thx, > Florian > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.scott at ed.ac.uk Thu Jun 11 06:48:00 2015 From: d.scott at ed.ac.uk (David Scott) Date: Thu, 11 Jun 2015 12:48:00 +0100 Subject: [petsc-users] -pc_mg_monitor Message-ID: <55797570.6060008@ed.ac.uk> Hello, I am using MINRES with GAMG and have supplied various options #PETSc Option Table entries: -ksp_max_it 500 -ksp_monitor_true_residual -log_summary -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -options_left -pc_gamg_agg_nsmooths 1 -pc_gamg_threshold 0.03 -pc_gamg_type agg -pc_gamg_verbose 7 -pc_mg_monitor #End of PETSc Option Table entries There is one unused database option. It is: Option left: name:-pc_mg_monitor (no value) Does this mean that I should have supplied an integer value with -pc_mg_monitor or is it not applicable in this case? If I should have supplied a value what is the allowed range? Thanks in advance, David -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
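As an aside, the option table above corresponds roughly to the following API calls. This is only a sketch: the matrix A and vectors b, x are assumed to be created elsewhere, and the single-value PCGAMGSetThreshold signature of the 3.5/3.6 series is assumed.

#include <petscksp.h>

PetscErrorCode SolveWithMinresGamg(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPMINRES);CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 500);CHKERRQ(ierr); /* -ksp_max_it 500 */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
  ierr = PCGAMGSetType(pc, PCGAMGAGG);CHKERRQ(ierr);   /* -pc_gamg_type agg */
  ierr = PCGAMGSetNSmooths(pc, 1);CHKERRQ(ierr);       /* -pc_gamg_agg_nsmooths 1 */
  ierr = PCGAMGSetThreshold(pc, 0.03);CHKERRQ(ierr);   /* -pc_gamg_threshold 0.03 */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* -mg_levels_* smoother options still come from the options database */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}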
From mailinglists at xgm.de Thu Jun 11 07:04:21 2015 From: mailinglists at xgm.de (Florian Lindner) Date: Thu, 11 Jun 2015 14:04:21 +0200 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: References: <1505176.dZzVkMf1SI@asaru> Message-ID: <1808084.eXZK6gUgd8@asaru> configure.log is attached. Thx! Florian Am Donnerstag, 11. Juni 2015, 06:26:27 schrieb Matthew Knepley: > Cannot tell without configure.log > > Thanks, > > Matt > > On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner > wrote: > > > Hello, > > > > I try to setup petsc on my Arch Linux box. Download it using git -b maint. > > > > % python2 configure works fine: > > [...] > > Compilers: > > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > C++ Compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing > > -Wno-unknown-pragmas -g -O0 -fPIC > > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > > -ffree-line-length-0 -g -O0 > > Linkers: > > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > make: > > [...] > > > > xxx=========================================================================xxx > > Configure stage complete. Now build PETSc libraries with (gnumake build): > > make PETSC_DIR=/home/florian/software/petsc > > PETSC_ARCH=arch-linux2-c-debug all > > > > xxx=========================================================================xxx > > > > Now building: > > > > % make PETSC_DIR=/home/florian/software/petsc > > PETSC_ARCH=arch-linux2-c-debug all > > > > yields this error: > > > > "PETSc was configured with one OpenMPI mpi.h version but now appears to be > > compiling using a different OpenMPI mpi.h version" > > > > I would prefer to use my distributions openmpi 1.8.5, there is no other > > MPI version installed. > > > > Using configure with --download-mpich and this compiler > > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to work. > > > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some time > > ago, but I'm not sure how my system changed in the last weeks when I > > haven't used petsc on this machine (Arch is a rolling release). > > > > Thx, > > Florian > > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 2278888 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 11 07:16:50 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 07:16:50 -0500 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: <1808084.eXZK6gUgd8@asaru> References: <1505176.dZzVkMf1SI@asaru> <1808084.eXZK6gUgd8@asaru> Message-ID: On Thu, Jun 11, 2015 at 7:04 AM, Florian Lindner wrote: > configure.log is attached. > Ah, you have the buggy Apple preprocessor, so you get Unable to parse OpenMPI version from header. Probably a buggy preprocessor Defined "HAVE_OMPI_MAJOR_VERSION" to "unknown" Defined "HAVE_OMPI_MINOR_VERSION" to "unknown" Defined "HAVE_OMPI_RELEASE_VERSION" to "unknown" In the new release, we catch this. However, I think then the make check fails. Satish, how is the make check for this version number done? Thanks, Matt > Thx! > Florian > > Am Donnerstag, 11. 
Juni 2015, 06:26:27 schrieb Matthew Knepley: > > Cannot tell without configure.log > > > > Thanks, > > > > Matt > > > > On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner > > wrote: > > > > > Hello, > > > > > > I try to setup petsc on my Arch Linux box. Download it using git -b > maint. > > > > > > % python2 configure works fine: > > > [...] > > > Compilers: > > > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > C++ Compiler: mpicxx -Wall -Wwrite-strings > -Wno-strict-aliasing > > > -Wno-unknown-pragmas -g -O0 -fPIC > > > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > > > -ffree-line-length-0 -g -O0 > > > Linkers: > > > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > make: > > > [...] > > > > > > > xxx=========================================================================xxx > > > Configure stage complete. Now build PETSc libraries with (gnumake > build): > > > make PETSC_DIR=/home/florian/software/petsc > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > xxx=========================================================================xxx > > > > > > Now building: > > > > > > % make PETSC_DIR=/home/florian/software/petsc > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > yields this error: > > > > > > "PETSc was configured with one OpenMPI mpi.h version but now appears > to be > > > compiling using a different OpenMPI mpi.h version" > > > > > > I would prefer to use my distributions openmpi 1.8.5, there is no other > > > MPI version installed. > > > > > > Using configure with --download-mpich and this compiler > > > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to > work. > > > > > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some > time > > > ago, but I'm not sure how my system changed in the last weeks when I > > > haven't used petsc on this machine (Arch is a rolling release). > > > > > > Thx, > > > Florian > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ohenrich at epcc.ed.ac.uk Thu Jun 11 08:15:11 2015 From: ohenrich at epcc.ed.ac.uk (Oliver Henrich) Date: Thu, 11 Jun 2015 14:15:11 +0100 Subject: [petsc-users] Accessing 'halo' matrix entries? Message-ID: Dear PETSc-Team, I am trying to solve a Poisson equation with a mixed periodic-Dirichlet boundary condition. What I have in mind is e.g. a compressible flow with a total pressure difference imposed between the two sides of the system, but otherwise periodic, and periodic boundary conditions along the remaining two dimensions. Another example would be an electrostatic system with dielectric contrast in an external electric field / potential difference. For clarity, if x = 0 (N+1) is the left (right) halo site at the boundary and x = 1 (N) is the leftmost (rightmost) site in the physical domain: psi(x = 0) = psi(x = N) - dpsi psi(x = N+1) = psi(1) + dpsi I know it is possible to solve this with a double Poisson solve, which I try to avoid for performance reasons. It is also possible to solve this by modifying the matrix with a master-slave approach that imposes the constraint. 
This requires defining a transformation matrix that acts on the matrix, the solution vector and the righthand side of the problem. The core of the problem I have is that the pressure or potential difference should not be between the leftmost and rightmost site in the physical domain (a standard Dirichlet BC), but between the left- or rightmost site in the physical domain and the corresponding halo site at the opposite side of the system. It should be possible to do this if the entries of the transformation matrix that act on the halo sites can be accessed and modified. Is anything like this possible in PETSc? Best regards and many thanks, Oliver -- Dr Oliver Henrich Edinburgh Parallel Computing Centre School of Physics and Astronomy University of Edinburgh King's Buildings, JCMB Edinburgh EH9 3FD United Kingdom Tel: +44 (0)131 650 5818 Fax: +44 (0)131 650 6555 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 From mfadams at lbl.gov Thu Jun 11 08:43:08 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 11 Jun 2015 09:43:08 -0400 Subject: [petsc-users] GAMG In-Reply-To: <17A35C213185A84BB8ED54C88FBFD712C3D1C448@IST-EX10MBX-4.ad.bu.edu> References: <17A35C213185A84BB8ED54C88FBFD712C3D1A091@IST-EX10MBX-4.ad.bu.edu> <17A35C213185A84BB8ED54C88FBFD712C3D1B269@IST-EX10MBX-4.ad.bu.edu> <87d213seeu.fsf@jedbrown.org> <17A35C213185A84BB8ED54C88FBFD712C3D1C448@IST-EX10MBX-4.ad.bu.edu> Message-ID: On Wed, Jun 10, 2015 at 5:02 PM, Young, Matthew, Adam wrote: > Jed, > When expanding the LHS, the anti-symmetric kappa terms cause mixed > second-order derivatives to cancel, leaving n[\partial_{xx} + \partial_{yy} > + (1+\kappa^2)\partial_{zz}]\phi + lower-order terms. Since n (density) and > kappa are non-negative, I thought this would mean the operator is still > elliptic. You're right that there is unavoidable anisotropy in the > direction of the magnetic field. > > Mark, > I'll look for that Trottenberg, et al. book. Thanks for the reference. > Regarding the manual, the last sentence of the first paragraph in "Trouble > shooting algebraic multigrid methods" says "-pc_gamg_threshold 0.0 is the > most robust option ... and is recommended if poor convergence rates are > observed, ..." Yea, this is confusing. What I meant was if you have catastrophic convergence rate then it can come from thresholding. I should replace "poor" with "catastrophic" > but the previous sentence says that setting x=0.0 in -pc_gamg_threshold x > "will result in ... generally worse convergence rates." smaller x will generally degrade convergence rates, once you are working "correctly" (not easy to define), but each iteration will be faster. So there should be a minima in terms of solve times. > This seems to be a contradiction. Can you clarify? > > --Matt > -------------------------------------------------------------- > Matthew Young > Graduate Student > Boston University Dept. of Astronomy > -------------------------------------------------------------- > > > ________________________________________ > From: Jed Brown [jed at jedbrown.org] > Sent: Wednesday, June 10, 2015 12:42 PM > To: Mark Adams; Young, Matthew, Adam; PETSc users list > Subject: Re: [petsc-users] GAMG > > Mark Adams writes: > > > Yes, lets get this back on the list. > > > > On Wed, Jun 10, 2015 at 12:01 PM, Young, Matthew, Adam > wrote: > > > >> Ah, oops - I was looking at the v 3.5 manual. I am certainly interested > >> in algorithmic details if there are relevant papers. 
My main interest > right > >> now is determining if this method is appropriate for my problem. > >> > > > > Jed mentioned that this will not work well out of the box, as I recall. > It > > looks like very high anisotropy. > > It looks like a hyperbolic term. If you only look at the symmetric part > of the tensor, then you get anisotropy (1 versus 1 + \kappa^2 ? 10000), > but we also have a big nonsymmetric contribution. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 11 08:44:30 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 08:44:30 -0500 Subject: [petsc-users] Accessing 'halo' matrix entries? In-Reply-To: References: Message-ID: On Thu, Jun 11, 2015 at 8:15 AM, Oliver Henrich wrote: > Dear PETSc-Team, > > I am trying to solve a Poisson equation with a mixed periodic-Dirichlet > boundary condition. What I have in mind is e.g. a compressible flow with a > total pressure difference imposed between the two sides of the system, but > otherwise periodic, and periodic boundary conditions along the remaining > two dimensions. Another example would be an electrostatic system with > dielectric contrast in an external electric field / potential difference. > > For clarity, if x = 0 (N+1) is the left (right) halo site at the boundary > and x = 1 (N) is the leftmost (rightmost) site in the physical domain: > > psi(x = 0) = psi(x = N) - dpsi > psi(x = N+1) = psi(1) + dpsi > > I know it is possible to solve this with a double Poisson solve, which I > try to avoid for performance reasons. > > It is also possible to solve this by modifying the matrix with a > master-slave approach that imposes the constraint. This requires defining a > transformation matrix that acts on the matrix, the solution vector and the > righthand side of the problem. > > The core of the problem I have is that the pressure or potential > difference should not be between the leftmost and rightmost site in the > physical domain (a standard Dirichlet BC), but between the left- or > rightmost site in the physical domain and the corresponding halo site at > the opposite side of the system. It should be possible to do this if the > entries of the transformation matrix that act on the halo sites can be > accessed and modified. > > Is anything like this possible in PETSc? > The way that periodicity works in PETSc right now for DMDA is that values are copied into the halo region from the other part of the mesh. Thus, you can just choose to add your delta at the right boundary and not at the left. The mechanics would be the same as now. Does that make sense? Thanks, Matt > Best regards and many thanks, > Oliver > > -- > Dr Oliver Henrich > Edinburgh Parallel Computing Centre > School of Physics and Astronomy > University of Edinburgh > King's Buildings, JCMB > Edinburgh EH9 3FD > United Kingdom > > Tel: +44 (0)131 650 5818 > Fax: +44 (0)131 650 6555 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336 > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ohenrich at epcc.ed.ac.uk Thu Jun 11 09:40:53 2015 From: ohenrich at epcc.ed.ac.uk (Oliver Henrich) Date: Thu, 11 Jun 2015 15:40:53 +0100 Subject: [petsc-users] Accessing 'halo' matrix entries? 
In-Reply-To: References: Message-ID: <004D7A5F-7DC5-455F-80E3-78EB700F2E7F@epcc.ed.ac.uk> Dear Matt, Thanks for your quick response. The way I understand it is that I can only modify the halo of a vector, which is simply copied from the other side of the mesh, but matrices don?t actually have halo entries or the like. I am not sure the solution to this problem is simply adding the difference dpsi at one boundary only and not subtracting it at the other side. This would mean that at one side you have periodicity plus the added difference, whereas at the other side you just have periodicity. Sounds like it leads to a kind of discontinuity. It is also not possible to just add it at the end. It had to be added at every step during the iteration process. Would it be possible to model the halo region explicitly by adding two additional points at the leftmost and rightmost boundary, but only on the processes with minimal and maximal Cartesian rank along this dimension where the jump occurs? Then I could modify the matrix and rhs and impose the constraint directly. Best wishes and many thanks for your help, Oliver On 11 Jun 2015, at 14:44, Matthew Knepley wrote: > On Thu, Jun 11, 2015 at 8:15 AM, Oliver Henrich wrote: > Dear PETSc-Team, > > I am trying to solve a Poisson equation with a mixed periodic-Dirichlet boundary condition. What I have in mind is e.g. a compressible flow with a total pressure difference imposed between the two sides of the system, but otherwise periodic, and periodic boundary conditions along the remaining two dimensions. Another example would be an electrostatic system with dielectric contrast in an external electric field / potential difference. > > For clarity, if x = 0 (N+1) is the left (right) halo site at the boundary and x = 1 (N) is the leftmost (rightmost) site in the physical domain: > > psi(x = 0) = psi(x = N) - dpsi > psi(x = N+1) = psi(1) + dpsi > > I know it is possible to solve this with a double Poisson solve, which I try to avoid for performance reasons. > > It is also possible to solve this by modifying the matrix with a master-slave approach that imposes the constraint. This requires defining a transformation matrix that acts on the matrix, the solution vector and the righthand side of the problem. > > The core of the problem I have is that the pressure or potential difference should not be between the leftmost and rightmost site in the physical domain (a standard Dirichlet BC), but between the left- or rightmost site in the physical domain and the corresponding halo site at the opposite side of the system. It should be possible to do this if the entries of the transformation matrix that act on the halo sites can be accessed and modified. > > Is anything like this possible in PETSc? > > The way that periodicity works in PETSc right now for DMDA is that values are copied into the halo > region from the other part of the mesh. Thus, you can just choose to add your delta at the right boundary > and not at the left. The mechanics would be the same as now. Does that make sense? 
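A sketch of what the two ghost relations above could look like against a DMDA local vector, assuming a 3D DMDA with one degree of freedom, periodic in x, and stencil width at least 1; da, xlocal and dpsi stand in for the application's own objects:

#include <petscdmda.h>

PetscErrorCode ShiftPeriodicGhostsX(DM da, Vec xlocal, PetscScalar dpsi)
{
  PetscScalar    ***psi;
  PetscInt       j, k, xs, ys, zs, xm, ym, zm, Mx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMDAGetInfo(da, NULL, &Mx, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da, xlocal, &psi);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++) {
    for (j = ys; j < ys + ym; j++) {
      if (xs == 0)       psi[k][j][-1] -= dpsi;  /* left ghost holds psi(N) - dpsi  */
      if (xs + xm == Mx) psi[k][j][Mx] += dpsi;  /* right ghost holds psi(1) + dpsi */
    }
  }
  ierr = DMDAVecRestoreArray(da, xlocal, &psi);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The usual DMGlobalToLocal scatter performs the periodic copy first; the shift is then applied only on the processes that own the physical x-ends, before the ghost values are used in the local residual or matrix assembly.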
> > Thanks, > > Matt > > Best regards and many thanks, > Oliver > > -- > Dr Oliver Henrich > Edinburgh Parallel Computing Centre > School of Physics and Astronomy > University of Edinburgh > King's Buildings, JCMB > Edinburgh EH9 3FD > United Kingdom > > Tel: +44 (0)131 650 5818 > Fax: +44 (0)131 650 6555 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336 > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener -- Dr Oliver Henrich Edinburgh Parallel Computing Centre School of Physics and Astronomy University of Edinburgh King's Buildings, JCMB Edinburgh EH9 3FD United Kingdom Tel: +44 (0)131 650 5818 Fax: +44 (0)131 650 6555 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jun 11 09:44:04 2015 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 11 Jun 2015 09:44:04 -0500 Subject: [petsc-users] -pc_mg_monitor In-Reply-To: <55797570.6060008@ed.ac.uk> References: <55797570.6060008@ed.ac.uk> Message-ID: David: PETSc library does not have the option '-pc_mg_monitor'. Hong On Thu, Jun 11, 2015 at 6:48 AM, David Scott wrote: > Hello, > > I am using MINRES with GAMG and have supplied various options > > #PETSc Option Table entries: > -ksp_max_it 500 > -ksp_monitor_true_residual > -log_summary > -mg_levels_ksp_max_it 2 > -mg_levels_ksp_type chebyshev > -mg_levels_pc_type sor > -options_left > -pc_gamg_agg_nsmooths 1 > -pc_gamg_threshold 0.03 > -pc_gamg_type agg > -pc_gamg_verbose 7 > -pc_mg_monitor > #End of PETSc Option Table entries > There is one unused database option. It is: > Option left: name:-pc_mg_monitor (no value) > > Does this mean that I should have supplied an integer value with > -pc_mg_monitor or is it not applicable in this case? If I should have > supplied a value what is the allowed range? > > Thanks in advance, > > David > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Jun 11 09:43:32 2015 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 11 Jun 2015 10:43:32 -0400 Subject: [petsc-users] -pc_mg_monitor In-Reply-To: <55797570.6060008@ed.ac.uk> References: <55797570.6060008@ed.ac.uk> Message-ID: GAMG does not know about this explicitly and apparently it is not getting picked up my PCMG correctly (probably my fault). I don't understand what this is supposed to do. I would not worry about it, but should GAMG care about this? Mark On Thu, Jun 11, 2015 at 7:48 AM, David Scott wrote: > Hello, > > I am using MINRES with GAMG and have supplied various options > > #PETSc Option Table entries: > -ksp_max_it 500 > -ksp_monitor_true_residual > -log_summary > -mg_levels_ksp_max_it 2 > -mg_levels_ksp_type chebyshev > -mg_levels_pc_type sor > -options_left > -pc_gamg_agg_nsmooths 1 > -pc_gamg_threshold 0.03 > -pc_gamg_type agg > -pc_gamg_verbose 7 > -pc_mg_monitor > #End of PETSc Option Table entries > There is one unused database option. It is: > Option left: name:-pc_mg_monitor (no value) > > Does this mean that I should have supplied an integer value with > -pc_mg_monitor or is it not applicable in this case? 
If I should have > supplied a value what is the allowed range? > > Thanks in advance, > > David > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas at ices.utexas.edu Thu Jun 11 09:53:16 2015 From: andreas at ices.utexas.edu (Andreas Mang) Date: Thu, 11 Jun 2015 09:53:16 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled Message-ID: <59D0AE21-A0E0-4B8B-8266-404C75F40E13@ices.utexas.edu> Hi guys: I have a problem with the TAO Armijo line search (petsc-3.5.4). My algorithm works if I use the More & Thuente line search (default). I have numerically checked the gradient of my objective. It?s correct. I am happy to write a small snippet of code and do an easy test if you guys disagree, but from what I?ve seen in the line search code it seems obvious to me that there is a bug. Am I missing something or are you not setting ls->reason to TAOLINESEARCH_SUCCESS if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in armijo.c; line 118 - 302)?! It seems to me that ls->reason is and will remain to be set to TAOLINESEARCH_CONTINUE_ITERATING if everything works (i.e. I don?t hit one of the exceptions). Does this make sense? If not I?ll invest the time and put together a simple test case and, if that works, continue to check my code. /Andreas From d.scott at ed.ac.uk Thu Jun 11 09:53:38 2015 From: d.scott at ed.ac.uk (David Scott) Date: Thu, 11 Jun 2015 15:53:38 +0100 Subject: [petsc-users] -pc_mg_monitor In-Reply-To: References: <55797570.6060008@ed.ac.uk> Message-ID: <5579A0F2.90901@ed.ac.uk> Thanks for the replies. It seems that the reference to -pc_mg_monitor in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMG.html should be removed. David On 11/06/2015 15:44, Hong wrote: > David: > PETSc library does not have the option '-pc_mg_monitor'. > Hong > > On Thu, Jun 11, 2015 at 6:48 AM, David Scott > wrote: > > Hello, > > I am using MINRES with GAMG and have supplied various options > > #PETSc Option Table entries: > -ksp_max_it 500 > -ksp_monitor_true_residual > -log_summary > -mg_levels_ksp_max_it 2 > -mg_levels_ksp_type chebyshev > -mg_levels_pc_type sor > -options_left > -pc_gamg_agg_nsmooths 1 > -pc_gamg_threshold 0.03 > -pc_gamg_type agg > -pc_gamg_verbose 7 > -pc_mg_monitor > #End of PETSc Option Table entries > There is one unused database option. It is: > Option left: name:-pc_mg_monitor (no value) > > Does this mean that I should have supplied an integer value with > -pc_mg_monitor or is it not applicable in this case? If I should > have supplied a value what is the allowed range? > > Thanks in advance, > > David > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mkhodak at princeton.edu Thu Jun 11 11:20:32 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Thu, 11 Jun 2015 09:20:32 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: I added these lines to the petsc4py test files (specifically test_comm.py) but the error remains the same. However, I have done the standard 'which mpiexec', 'which mpicc', 'which mpirun' and they are all in the same folder. 
In fact it is the only MPI installed. The reason I thought it might be a PETSc build problem is because one of the PETSc 'make test' tests (ex5f) fails with the same error, even though the 'make streams' test works fine with MPI processes. Thanks, Mikhail On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley wrote: > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak > wrote: > >> Thank you for your help - the install seems to work, apart from routines >> requiring MPI, which fail due to the "Attempting to use an MPI routine >> before initializing MPI" error. This seems to be an error in the PETSc >> build itself. >> > > The MPI routines will not work until after > > import petsc4py, sys > petsc4py.init(sys.argv) > from petsc4py import PETSc > > If they fail after this, it is usually a mismatch between the mpiexec > being used and the MPI > libraries being linked. > > Thanks, > > Matt > > Thanks again, >> Mikhail Khodak >> >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin >> wrote: >> >>> On 8 June 2015 at 02:50, Mikhail Khodak wrote: >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit >>> Windows 7. >>> > I have built PETSc 3.5.4 with shared and dynamic libraries using >>> > mpich2-1.2.1 and successfully ran the installation tests. I am using >>> Python >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt >>> to >>> > install petsc4py (both with pip and distutils) I get a mpicc compiler >>> error >>> > due to undefined references/symbols. I have attached the output of >>> running >>> > >>> > pip install petsc petsc4py --allow-external petsc --allow-external >>> petsc4py >>> > >>> >>> I've never ever built or test petsc4py under Cygwin. The errors you >>> see are expected. >>> Perhaps you can manually workaround the issues following the following >>> steps: >>> >>> 1) Download the petsc4py tarball and unpack it. >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ >>> 3) Use pip again: >>> >>> pip install petsc >>> pip install . >>> >>> The last line assumes your current working directory is the one having >>> petsc4py's setup.py >>> >>> Finally, I do not guarantee this will work. I'm just guessing, >>> petsc4py never explicitly supported Windows and/or Cygwin. >>> >>> >>> -- >>> Lisandro Dalcin >>> ============ >>> Research Scientist >>> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) >>> Numerical Porous Media Center (NumPor) >>> King Abdullah University of Science and Technology (KAUST) >>> http://numpor.kaust.edu.sa/ >>> >>> 4700 King Abdullah University of Science and Technology >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >>> http://www.kaust.edu.sa >>> >>> Office Phone: +966 12 808-0459 >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jun 11 12:27:42 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 11 Jun 2015 12:27:42 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: If you get errors when running basic petsc examples - send us the relavant petsc logs [configure.log, make.log, test.log etc..] 
Also - note that mpich is not supported on cygwin/windows [it generally works for us - when we try the --download-mpich build]. Unless you really need a cywin build/use of petsc4py - it might be easier to install a linux VM - and use PETSc/petsc4py on it. Satish On Thu, 11 Jun 2015, Mikhail Khodak wrote: > I added these lines to the petsc4py test files (specifically test_comm.py) > but the error remains the same. However, I have done the standard 'which > mpiexec', 'which mpicc', 'which mpirun' and they are all in the same > folder. In fact it is the only MPI installed. > The reason I thought it might be a PETSc build problem is because one of > the PETSc 'make test' tests (ex5f) fails with the same error, even though > the 'make streams' test works fine with MPI processes. > Thanks, > Mikhail > > On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley wrote: > > > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak > > wrote: > > > >> Thank you for your help - the install seems to work, apart from routines > >> requiring MPI, which fail due to the "Attempting to use an MPI routine > >> before initializing MPI" error. This seems to be an error in the PETSc > >> build itself. > >> > > > > The MPI routines will not work until after > > > > import petsc4py, sys > > petsc4py.init(sys.argv) > > from petsc4py import PETSc > > > > If they fail after this, it is usually a mismatch between the mpiexec > > being used and the MPI > > libraries being linked. > > > > Thanks, > > > > Matt > > > > Thanks again, > >> Mikhail Khodak > >> > >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin > >> wrote: > >> > >>> On 8 June 2015 at 02:50, Mikhail Khodak wrote: > >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit > >>> Windows 7. > >>> > I have built PETSc 3.5.4 with shared and dynamic libraries using > >>> > mpich2-1.2.1 and successfully ran the installation tests. I am using > >>> Python > >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I attempt > >>> to > >>> > install petsc4py (both with pip and distutils) I get a mpicc compiler > >>> error > >>> > due to undefined references/symbols. I have attached the output of > >>> running > >>> > > >>> > pip install petsc petsc4py --allow-external petsc --allow-external > >>> petsc4py > >>> > > >>> > >>> I've never ever built or test petsc4py under Cygwin. The errors you > >>> see are expected. > >>> Perhaps you can manually workaround the issues following the following > >>> steps: > >>> > >>> 1) Download the petsc4py tarball and unpack it. > >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all > >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ > >>> 3) Use pip again: > >>> > >>> pip install petsc > >>> pip install . > >>> > >>> The last line assumes your current working directory is the one having > >>> petsc4py's setup.py > >>> > >>> Finally, I do not guarantee this will work. I'm just guessing, > >>> petsc4py never explicitly supported Windows and/or Cygwin. 
> >>> > >>> > >>> -- > >>> Lisandro Dalcin > >>> ============ > >>> Research Scientist > >>> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > >>> Numerical Porous Media Center (NumPor) > >>> King Abdullah University of Science and Technology (KAUST) > >>> http://numpor.kaust.edu.sa/ > >>> > >>> 4700 King Abdullah University of Science and Technology > >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >>> http://www.kaust.edu.sa > >>> > >>> Office Phone: +966 12 808-0459 > >>> > >> > >> > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From balay at mcs.anl.gov Thu Jun 11 12:34:51 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 11 Jun 2015 12:34:51 -0500 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: References: <1505176.dZzVkMf1SI@asaru> <1808084.eXZK6gUgd8@asaru> Message-ID: The code is in petscsys.h. The flags HAVE_OMPI_MAJOR_VERSION etc should not be set [if the version cannot be determined] So perhaps the attached patch is the fix. patch -Np1 < mpi-version-check.patch Satish On Thu, 11 Jun 2015, Matthew Knepley wrote: > On Thu, Jun 11, 2015 at 7:04 AM, Florian Lindner > wrote: > > > configure.log is attached. > > > > Ah, you have the buggy Apple preprocessor, so you get > > Unable to parse OpenMPI version from header. Probably a > buggy preprocessor > Defined "HAVE_OMPI_MAJOR_VERSION" to "unknown" > Defined "HAVE_OMPI_MINOR_VERSION" to "unknown" > Defined "HAVE_OMPI_RELEASE_VERSION" to "unknown" > > In the new release, we catch this. However, I think then the make check > fails. > > Satish, how is the make check for this version number done? > > Thanks, > > Matt > > > > Thx! > > Florian > > > > Am Donnerstag, 11. Juni 2015, 06:26:27 schrieb Matthew Knepley: > > > Cannot tell without configure.log > > > > > > Thanks, > > > > > > Matt > > > > > > On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner > > > wrote: > > > > > > > Hello, > > > > > > > > I try to setup petsc on my Arch Linux box. Download it using git -b > > maint. > > > > > > > > % python2 configure works fine: > > > > [...] > > > > Compilers: > > > > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > C++ Compiler: mpicxx -Wall -Wwrite-strings > > -Wno-strict-aliasing > > > > -Wno-unknown-pragmas -g -O0 -fPIC > > > > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > > > > -ffree-line-length-0 -g -O0 > > > > Linkers: > > > > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > make: > > > > [...] > > > > > > > > > > xxx=========================================================================xxx > > > > Configure stage complete. 
Now build PETSc libraries with (gnumake > > build): > > > > make PETSC_DIR=/home/florian/software/petsc > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > > > xxx=========================================================================xxx > > > > > > > > Now building: > > > > > > > > % make PETSC_DIR=/home/florian/software/petsc > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > yields this error: > > > > > > > > "PETSc was configured with one OpenMPI mpi.h version but now appears > > to be > > > > compiling using a different OpenMPI mpi.h version" > > > > > > > > I would prefer to use my distributions openmpi 1.8.5, there is no other > > > > MPI version installed. > > > > > > > > Using configure with --download-mpich and this compiler > > > > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to > > work. > > > > > > > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some > > time > > > > ago, but I'm not sure how my system changed in the last weeks when I > > > > haven't used petsc on this machine (Arch is a rolling release). > > > > > > > > Thx, > > > > Florian > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- diff --git a/config/BuildSystem/config/packages/MPI.py b/config/BuildSystem/config/packages/MPI.py index 042aeeb..3b574f2 100644 --- a/config/BuildSystem/config/packages/MPI.py +++ b/config/BuildSystem/config/packages/MPI.py @@ -434,9 +434,9 @@ class Configure(config.package.Package): buf = self.outputPreprocess(mpich_test) try: mpich_numversion = re.compile('\nint mpich_ver = *([0-9]*) *;').search(buf).group(1) + self.addDefine('HAVE_MPICH_NUMVERSION',mpich_numversion) except: self.logPrint('Unable to parse MPICH version from header. Probably a buggy preprocessor') - self.addDefine('HAVE_MPICH_NUMVERSION',mpich_numversion) elif self.checkCompile(openmpi_test): buf = self.outputPreprocess(openmpi_test) ompi_major_version = ompi_minor_version = ompi_release_version = 'unknown' @@ -444,11 +444,11 @@ class Configure(config.package.Package): ompi_major_version = re.compile('\nint ompi_major = *([0-9]*) *;').search(buf).group(1) ompi_minor_version = re.compile('\nint ompi_minor = *([0-9]*) *;').search(buf).group(1) ompi_release_version = re.compile('\nint ompi_release = *([0-9]*) *;').search(buf).group(1) + self.addDefine('HAVE_OMPI_MAJOR_VERSION',ompi_major_version) + self.addDefine('HAVE_OMPI_MINOR_VERSION',ompi_minor_version) + self.addDefine('HAVE_OMPI_RELEASE_VERSION',ompi_release_version) except: self.logPrint('Unable to parse OpenMPI version from header. Probably a buggy preprocessor') - self.addDefine('HAVE_OMPI_MAJOR_VERSION',ompi_major_version) - self.addDefine('HAVE_OMPI_MINOR_VERSION',ompi_minor_version) - self.addDefine('HAVE_OMPI_RELEASE_VERSION',ompi_release_version) def findMPIInc(self): '''Find MPI include paths from "mpicc -show"''' import re From jason.sarich at gmail.com Thu Jun 11 12:44:33 2015 From: jason.sarich at gmail.com (Jason Sarich) Date: Thu, 11 Jun 2015 12:44:33 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled In-Reply-To: References: Message-ID: Hi Andreas, Yes it looks like a bug that the reason is never set, but the line should still terminate. Is the problem you are having with the line search itself, or is it failing because you are checking this ls->reason directly? Jason Sarich On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang wrote: > Hi guys: > > I have a problem with the TAO Armijo line search (petsc-3.5.4). 
My > algorithm works if I use the More & Thuente line search (default). I have > numerically checked the gradient of my objective. It?s correct. I am happy > to write a small snippet of code and do an easy test if you guys disagree, > but from what I?ve seen in the line search code it seems obvious to me that > there is a bug. Am I missing something or are you not setting > > ls->reason > > to > > TAOLINESEARCH_SUCCESS > > if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in > armijo.c; line 118 - 302)?! > > It seems to me that ls->reason is and will remain to be set to > > TAOLINESEARCH_CONTINUE_ITERATING > > if everything works (i.e. I don?t hit one of the exceptions). Does this > make sense? If not I?ll invest the time and put together a simple test case > and, if that works, continue to check my code. > > /Andreas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jun 11 12:48:04 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 11 Jun 2015 12:48:04 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Also - should mention: - cygwin has OpenMPI pacakged - so you could also try that. - most of my builds are static [so never tried dll builds] Satish On Thu, 11 Jun 2015, Satish Balay wrote: > If you get errors when running basic petsc examples - send us the > relavant petsc logs [configure.log, make.log, test.log etc..] > > Also - note that mpich is not supported on cygwin/windows [it > generally works for us - when we try the --download-mpich build]. > > Unless you really need a cywin build/use of petsc4py - it might be > easier to install a linux VM - and use PETSc/petsc4py on it. > > Satish > > On Thu, 11 Jun 2015, Mikhail Khodak wrote: > > > I added these lines to the petsc4py test files (specifically test_comm.py) > > but the error remains the same. However, I have done the standard 'which > > mpiexec', 'which mpicc', 'which mpirun' and they are all in the same > > folder. In fact it is the only MPI installed. > > The reason I thought it might be a PETSc build problem is because one of > > the PETSc 'make test' tests (ex5f) fails with the same error, even though > > the 'make streams' test works fine with MPI processes. > > Thanks, > > Mikhail > > > > On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley wrote: > > > > > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak > > > wrote: > > > > > >> Thank you for your help - the install seems to work, apart from routines > > >> requiring MPI, which fail due to the "Attempting to use an MPI routine > > >> before initializing MPI" error. This seems to be an error in the PETSc > > >> build itself. > > >> > > > > > > The MPI routines will not work until after > > > > > > import petsc4py, sys > > > petsc4py.init(sys.argv) > > > from petsc4py import PETSc > > > > > > If they fail after this, it is usually a mismatch between the mpiexec > > > being used and the MPI > > > libraries being linked. > > > > > > Thanks, > > > > > > Matt > > > > > > Thanks again, > > >> Mikhail Khodak > > >> > > >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin > > >> wrote: > > >> > > >>> On 8 June 2015 at 02:50, Mikhail Khodak wrote: > > >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit > > >>> Windows 7. > > >>> > I have built PETSc 3.5.4 with shared and dynamic libraries using > > >>> > mpich2-1.2.1 and successfully ran the installation tests. I am using > > >>> Python > > >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. 
However, when I attempt > > >>> to > > >>> > install petsc4py (both with pip and distutils) I get a mpicc compiler > > >>> error > > >>> > due to undefined references/symbols. I have attached the output of > > >>> running > > >>> > > > >>> > pip install petsc petsc4py --allow-external petsc --allow-external > > >>> petsc4py > > >>> > > > >>> > > >>> I've never ever built or test petsc4py under Cygwin. The errors you > > >>> see are expected. > > >>> Perhaps you can manually workaround the issues following the following > > >>> steps: > > >>> > > >>> 1) Download the petsc4py tarball and unpack it. > > >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all > > >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ > > >>> 3) Use pip again: > > >>> > > >>> pip install petsc > > >>> pip install . > > >>> > > >>> The last line assumes your current working directory is the one having > > >>> petsc4py's setup.py > > >>> > > >>> Finally, I do not guarantee this will work. I'm just guessing, > > >>> petsc4py never explicitly supported Windows and/or Cygwin. > > >>> > > >>> > > >>> -- > > >>> Lisandro Dalcin > > >>> ============ > > >>> Research Scientist > > >>> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > > >>> Numerical Porous Media Center (NumPor) > > >>> King Abdullah University of Science and Technology (KAUST) > > >>> http://numpor.kaust.edu.sa/ > > >>> > > >>> 4700 King Abdullah University of Science and Technology > > >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 > > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia > > >>> http://www.kaust.edu.sa > > >>> > > >>> Office Phone: +966 12 808-0459 > > >>> > > >> > > >> > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which their > > > experiments lead. > > > -- Norbert Wiener > > > > > > > From andreas at ices.utexas.edu Thu Jun 11 13:12:00 2015 From: andreas at ices.utexas.edu (Andreas Mang) Date: Thu, 11 Jun 2015 13:12:00 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled In-Reply-To: References: Message-ID: Hey Jason: The line search fails. If I use Armijo I get TaoLineSearch Object: type: armijo maxf=30, ftol=1e-10, gtol=0.0001 Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=1 total number of gradient evaluations=1 total number of function/gradient evaluations=0 Termination reason: 0 The parameters seem to be the default ones also suggested by Nocedal and Wright. So I did not change anything. The termination reason is equivalent to TAOLINESEARCH_CONTINUE_ITERATING. I am not checking the reason directly. I guess it starts reducing the step size after that. I can see that my objective function get?s evaluated (as expected); however, the objective values increase (from what I see when monitoring the evaluations of my objective). This leads to a failure in the line search and made (still makes) me believe there is a bug on my side (which I have not found yet). However, if I use a unit step it converges (relative change of the gradient e.g. to 1E-9; see bottom of this email). If I use More & Thuente, same thing. No reduction in step size necessary. If you suggest that I should do some further testing on simpler problems, I?m happy to do so. 
After looking at the code, I just felt like there obviously is something wrong in the line-search implementation. Thanks for your help. /Andreas Here?s the output after the first iteration (where the Armijo line search fails): TaoLineSearch Object: type: armijo maxf=30, ftol=1e-10, gtol=0.0001 Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=1 total number of gradient evaluations=1 total number of function/gradient evaluations=0 Termination reason: 0 TaoLineSearch Object: type: armijo maxf=30, ftol=1e-10, gtol=0.0001 Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 maximum function evaluations=30 tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 total number of function evaluations=30 total number of gradient evaluations=0 total number of function/gradient evaluations=0 Termination reason: 4 With final output (end of optimization): Tao Object: 1 MPI processes type: nls Newton steps: 1 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 1 nls ksp atol: 0 nls ksp rtol: 1 nls ksp ctol: 0 nls ksp negc: 0 nls ksp dtol: 0 nls ksp iter: 0 nls ksp othr: 0 TaoLineSearch Object: 1 MPI processes type: armijo KSP Object: 1 MPI processes type: cg total KSP iterations: 21 convergence tolerances: fatol=0, frtol=0 convergence tolerances: gatol=0, steptol=0, gttol=0.0001 Residual in Function/Gradient:=0.038741 Objective value=0.639121 total number of iterations=0, (max: 50) total number of function evaluations=31, max: 10000 total number of gradient evaluations=1, max: 10000 total number of function/gradient evaluations=1, (max: 10000) total number of Hessian evaluations=1 Solver terminated: -6 Line Search Failure This is without line-search (unit step size): Tao Object: 1 MPI processes type: nls Newton steps: 3 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 0 nls ksp atol: 0 nls ksp rtol: 3 nls ksp ctol: 0 nls ksp negc: 0 nls ksp dtol: 0 nls ksp iter: 0 nls ksp othr: 0 TaoLineSearch Object: 1 MPI processes type: unit KSP Object: 1 MPI processes type: cg total KSP iterations: 71 convergence tolerances: fatol=0, frtol=0 convergence tolerances: gatol=0, steptol=0, gttol=0.0001 Residual in Function/Gradient:=1.91135e-11 Objective value=0.160914 total number of iterations=3, (max: 50) total number of function/gradient evaluations=4, (max: 10000) total number of Hessian evaluations=3 Solution converged: ||g(X)||/||g(X0)|| <= gttol > On Jun 11, 2015, at 12:44 PM, Jason Sarich > wrote: > > Hi Andreas, > > Yes it looks like a bug that the reason is never set, but the line should still terminate. Is the problem you are having with the line search itself, or is it failing because you are checking this ls->reason directly? > > Jason Sarich > > > On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang > wrote: > Hi guys: > > I have a problem with the TAO Armijo line search (petsc-3.5.4). My algorithm works if I use the More & Thuente line search (default). I have numerically checked the gradient of my objective. It?s correct. I am happy to write a small snippet of code and do an easy test if you guys disagree, but from what I?ve seen in the line search code it seems obvious to me that there is a bug. Am I missing something or are you not setting > > ls->reason > > to > > TAOLINESEARCH_SUCCESS > > if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in armijo.c; line 118 - 302)?! 
> > It seems to me that ls->reason is and will remain to be set to > > TAOLINESEARCH_CONTINUE_ITERATING > > if everything works (i.e. I don?t hit one of the exceptions). Does this make sense? If not I?ll invest the time and put together a simple test case and, if that works, continue to check my code. > > /Andreas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 11 13:41:25 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 13:41:25 -0500 Subject: [petsc-users] Accessing 'halo' matrix entries? In-Reply-To: <004D7A5F-7DC5-455F-80E3-78EB700F2E7F@epcc.ed.ac.uk> References: <004D7A5F-7DC5-455F-80E3-78EB700F2E7F@epcc.ed.ac.uk> Message-ID: On Thu, Jun 11, 2015 at 9:40 AM, Oliver Henrich wrote: > Dear Matt, > > Thanks for your quick response. > > The way I understand it is that I can only modify the halo of a vector, > which is simply copied from the other side of the mesh, but matrices don?t > actually have halo entries or the like. > > I am not sure the solution to this problem is simply adding the difference > dpsi at one boundary only and not subtracting it at the other side. This > would mean that at one side you have periodicity plus the added difference, > whereas at the other side you just have periodicity. Sounds like it leads > to a kind of discontinuity. > > It is also not possible to just add it at the end. It had to be added at > every step during the iteration process. > > Would it be possible to model the halo region explicitly by adding two > additional points at the leftmost and rightmost boundary, but only on the > processes with minimal and maximal Cartesian rank along this dimension > where the jump occurs? Then I could modify the matrix and rhs and impose > the constraint directly. > I think you are missing my point. Perhaps we need an explicit example. Suppose we have a 1D DMDA, 0 -- 1 -- 2 -- 3 and it is periodic. DMDA by definition has a collocated discretization, so all unknowns live on the vertices. If the stencil is size 1, then at vertices 1 we have 0 -- 1 -- 2 which is normal, and at vertex 0 we have 3 -- 0 -- 1 Now this halo 3 should be dp away from 3 itself, since it the periodic image, so you use p(3) - dp. For the stencil of 3 we have 2 -- 3 -- 0 Now I have no idea how you make sense of 0 here. You could use p(0) + dp, or just some Neumann condition here. I am not convinced this is sensible in higher dimensions since what you really want to fix is the average pressure drop which is a much different thing. Thats an integral condition which you should handle by projection (I think). Matt > Best wishes and many thanks for your help, > Oliver > > On 11 Jun 2015, at 14:44, Matthew Knepley wrote: > > On Thu, Jun 11, 2015 at 8:15 AM, Oliver Henrich > wrote: > >> Dear PETSc-Team, >> >> I am trying to solve a Poisson equation with a mixed periodic-Dirichlet >> boundary condition. What I have in mind is e.g. a compressible flow with a >> total pressure difference imposed between the two sides of the system, but >> otherwise periodic, and periodic boundary conditions along the remaining >> two dimensions. Another example would be an electrostatic system with >> dielectric contrast in an external electric field / potential difference. 
>> >> For clarity, if x = 0 (N+1) is the left (right) halo site at the >> boundary and x = 1 (N) is the leftmost (rightmost) site in the physical >> domain: >> >> psi(x = 0) = psi(x = N) - dpsi >> psi(x = N+1) = psi(1) + dpsi >> >> I know it is possible to solve this with a double Poisson solve, which I >> try to avoid for performance reasons. >> >> It is also possible to solve this by modifying the matrix with a >> master-slave approach that imposes the constraint. This requires defining a >> transformation matrix that acts on the matrix, the solution vector and the >> righthand side of the problem. >> >> The core of the problem I have is that the pressure or potential >> difference should not be between the leftmost and rightmost site in the >> physical domain (a standard Dirichlet BC), but between the left- or >> rightmost site in the physical domain and the corresponding halo site at >> the opposite side of the system. It should be possible to do this if the >> entries of the transformation matrix that act on the halo sites can be >> accessed and modified. >> >> Is anything like this possible in PETSc? >> > > The way that periodicity works in PETSc right now for DMDA is that values > are copied into the halo > region from the other part of the mesh. Thus, you can just choose to add > your delta at the right boundary > and not at the left. The mechanics would be the same as now. Does that > make sense? > > Thanks, > > Matt > > >> Best regards and many thanks, >> Oliver >> >> -- >> Dr Oliver Henrich >> Edinburgh Parallel Computing Centre >> School of Physics and Astronomy >> University of Edinburgh >> King's Buildings, JCMB >> Edinburgh EH9 3FD >> United Kingdom >> >> Tel: +44 (0)131 650 5818 >> Fax: +44 (0)131 650 6555 >> >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336 >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- > Dr Oliver Henrich > Edinburgh Parallel Computing Centre > School of Physics and Astronomy > University of Edinburgh > King's Buildings, JCMB > Edinburgh EH9 3FD > United Kingdom > > Tel: +44 (0)131 650 5818 > Fax: +44 (0)131 650 6555 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336 > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkhodak at princeton.edu Thu Jun 11 14:13:15 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Thu, 11 Jun 2015 12:13:15 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: I have attached the log files. I tried using Open MPI before but received the same result. I will try a static build later and otherwise will get a VM. Thanks for looking at this, Mikhail On Thu, Jun 11, 2015 at 10:48 AM, Satish Balay wrote: > Also - should mention: > > - cygwin has OpenMPI pacakged - so you could also try that. > - most of my builds are static [so never tried dll builds] > > Satish > > On Thu, 11 Jun 2015, Satish Balay wrote: > > > If you get errors when running basic petsc examples - send us the > > relavant petsc logs [configure.log, make.log, test.log etc..] 
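(A note on the "Attempting to use an MPI routine before initializing MPI" error discussed in this thread: a minimal petsc4py sanity check can help separate build problems from initialization-order problems. This is only a sketch; the file name is arbitrary and it assumes petsc4py and mpi4py import cleanly.)

    # save as, e.g., check_petsc4py.py and run:  mpiexec -n 2 python check_petsc4py.py
    import sys
    import petsc4py
    petsc4py.init(sys.argv)        # must run before anything touches PETSc/MPI
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD
    PETSc.Sys.Print("world size = %d" % comm.getSize())   # printed once, from rank 0
    print("hello from rank %d of %d" % (comm.getRank(), comm.getSize()))

If this prints the expected ranks but the real code still aborts, the mpiexec on the PATH and the MPI the libraries were linked against are probably different, as suggested above.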
> > > > Also - note that mpich is not supported on cygwin/windows [it > > generally works for us - when we try the --download-mpich build]. > > > > Unless you really need a cywin build/use of petsc4py - it might be > > easier to install a linux VM - and use PETSc/petsc4py on it. > > > > Satish > > > > On Thu, 11 Jun 2015, Mikhail Khodak wrote: > > > > > I added these lines to the petsc4py test files (specifically > test_comm.py) > > > but the error remains the same. However, I have done the standard > 'which > > > mpiexec', 'which mpicc', 'which mpirun' and they are all in the same > > > folder. In fact it is the only MPI installed. > > > The reason I thought it might be a PETSc build problem is because one > of > > > the PETSc 'make test' tests (ex5f) fails with the same error, even > though > > > the 'make streams' test works fine with MPI processes. > > > Thanks, > > > Mikhail > > > > > > On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley > wrote: > > > > > > > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak < > mkhodak at princeton.edu> > > > > wrote: > > > > > > > >> Thank you for your help - the install seems to work, apart from > routines > > > >> requiring MPI, which fail due to the "Attempting to use an MPI > routine > > > >> before initializing MPI" error. This seems to be an error in the > PETSc > > > >> build itself. > > > >> > > > > > > > > The MPI routines will not work until after > > > > > > > > import petsc4py, sys > > > > petsc4py.init(sys.argv) > > > > from petsc4py import PETSc > > > > > > > > If they fail after this, it is usually a mismatch between the mpiexec > > > > being used and the MPI > > > > libraries being linked. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Thanks again, > > > >> Mikhail Khodak > > > >> > > > >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin > > > >> wrote: > > > >> > > > >>> On 8 June 2015 at 02:50, Mikhail Khodak > wrote: > > > >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on 64-bit > > > >>> Windows 7. > > > >>> > I have built PETSc 3.5.4 with shared and dynamic libraries using > > > >>> > mpich2-1.2.1 and successfully ran the installation tests. I am > using > > > >>> Python > > > >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I > attempt > > > >>> to > > > >>> > install petsc4py (both with pip and distutils) I get a mpicc > compiler > > > >>> error > > > >>> > due to undefined references/symbols. I have attached the output > of > > > >>> running > > > >>> > > > > >>> > pip install petsc petsc4py --allow-external petsc > --allow-external > > > >>> petsc4py > > > >>> > > > > >>> > > > >>> I've never ever built or test petsc4py under Cygwin. The errors you > > > >>> see are expected. > > > >>> Perhaps you can manually workaround the issues following the > following > > > >>> steps: > > > >>> > > > >>> 1) Download the petsc4py tarball and unpack it. > > > >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all > > > >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ > > > >>> 3) Use pip again: > > > >>> > > > >>> pip install petsc > > > >>> pip install . > > > >>> > > > >>> The last line assumes your current working directory is the one > having > > > >>> petsc4py's setup.py > > > >>> > > > >>> Finally, I do not guarantee this will work. I'm just guessing, > > > >>> petsc4py never explicitly supported Windows and/or Cygwin. 
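(For anyone trying step 2 of the instructions above by hand: a throwaway sketch that strips the DL_IMPORT wrappers from the header, run from the top of the unpacked petsc4py source tree. The path comes from the instructions above; as noted there, this is a guess and the resulting Cygwin build is not guaranteed to work.)

    import re
    import shutil

    header = "src/libpetsc4py/libpetsc4py.h"     # file named in step 2 above
    shutil.copy(header, header + ".bak")         # keep a backup, just in case

    with open(header) as f:
        src = f.read()

    # DL_IMPORT(XYZ) -> XYZ
    patched = re.sub(r'DL_IMPORT\(([^)]*)\)', r'\1', src)

    with open(header, "w") as f:
        f.write(patched)

    print("removed %d DL_IMPORT wrappers" % len(re.findall(r'DL_IMPORT\(', src)))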
> > > >>> > > > >>> > > > >>> -- > > > >>> Lisandro Dalcin > > > >>> ============ > > > >>> Research Scientist > > > >>> Computer, Electrical and Mathematical Sciences & Engineering > (CEMSE) > > > >>> Numerical Porous Media Center (NumPor) > > > >>> King Abdullah University of Science and Technology (KAUST) > > > >>> http://numpor.kaust.edu.sa/ > > > >>> > > > >>> 4700 King Abdullah University of Science and Technology > > > >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 > > > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia > > > >>> http://www.kaust.edu.sa > > > >>> > > > >>> Office Phone: +966 12 808-0459 > > > >>> > > > >> > > > >> > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which > their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 6995253 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 100169 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.log Type: application/octet-stream Size: 886 bytes Desc: not available URL: From jason.sarich at gmail.com Thu Jun 11 15:03:54 2015 From: jason.sarich at gmail.com (Jason Sarich) Date: Thu, 11 Jun 2015 15:03:54 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled In-Reply-To: <80e773c4d04e4dfea37ec2dcdef5cee6@LUCKMAN.anl.gov> References: <80e773c4d04e4dfea37ec2dcdef5cee6@LUCKMAN.anl.gov> Message-ID: Hi Andreas, I don't see anything obviously wrong. If the function is very flat, you can try setting -tao_ls_armijo_sigma to a smaller number. If you continue to have problems, please let me know. It would definitely help if you have an example you could send me that reproduces this behavior. Jason On Thu, Jun 11, 2015 at 1:12 PM, Andreas Mang wrote: > Hey Jason: > > The line search fails. If I use Armijo I get > > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 > memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=1 > total number of gradient evaluations=1 > total number of function/gradient evaluations=0 > Termination reason: 0 > > The parameters seem to be the default ones also suggested by Nocedal and > Wright. So I did not change anything. The termination reason is equivalent > to TAOLINESEARCH_CONTINUE_ITERATING. I am not checking the reason directly. > I guess it starts reducing the step size after that. I can see that my > objective function get?s evaluated (as expected); however, the objective > values increase (from what I see when monitoring the evaluations of my > objective). This leads to a failure in the line search and made (still > makes) me believe there is a bug on my side (which I have not found yet). > However, if I use a unit step it converges (relative change of the gradient > e.g. to 1E-9; see bottom of this email). If I use More & Thuente, same > thing. No reduction in step size necessary. > > If you suggest that I should do some further testing on simpler > problems, I?m happy to do so. 
After looking at the code, I just felt like > there obviously is something wrong in the line-search implementation. > > Thanks for your help. > /Andreas > > *Here?s the output after the first iteration (where the Armijo line > search fails):* > > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 > memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=1 > total number of gradient evaluations=1 > total number of function/gradient evaluations=0 > Termination reason: 0 > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 > memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=30 > total number of gradient evaluations=0 > total number of function/gradient evaluations=0 > Termination reason: 4 > > *With final output (end of optimization):* > > Tao Object: 1 MPI processes > type: nls > Newton steps: 1 > BFGS steps: 0 > Scaled gradient steps: 0 > Gradient steps: 1 > nls ksp atol: 0 > nls ksp rtol: 1 > nls ksp ctol: 0 > nls ksp negc: 0 > nls ksp dtol: 0 > nls ksp iter: 0 > nls ksp othr: 0 > TaoLineSearch Object: 1 MPI processes > type: armijo > KSP Object: 1 MPI processes > type: cg > total KSP iterations: 21 > convergence tolerances: fatol=0, frtol=0 > convergence tolerances: gatol=0, steptol=0, gttol=0.0001 > Residual in Function/Gradient:=0.038741 > Objective value=0.639121 > total number of iterations=0, (max: 50) > total number of function evaluations=31, max: 10000 > total number of gradient evaluations=1, max: 10000 > total number of function/gradient evaluations=1, (max: 10000) > total number of Hessian evaluations=1 > Solver terminated: -6 Line Search Failure > > *This is without line-search (unit step size):* > > Tao Object: 1 MPI processes > type: nls > Newton steps: 3 > BFGS steps: 0 > Scaled gradient steps: 0 > Gradient steps: 0 > nls ksp atol: 0 > nls ksp rtol: 3 > nls ksp ctol: 0 > nls ksp negc: 0 > nls ksp dtol: 0 > nls ksp iter: 0 > nls ksp othr: 0 > TaoLineSearch Object: 1 MPI processes > type: unit > KSP Object: 1 MPI processes > type: cg > total KSP iterations: 71 > convergence tolerances: fatol=0, frtol=0 > convergence tolerances: gatol=0, steptol=0, gttol=0.0001 > Residual in Function/Gradient:=1.91135e-11 > Objective value=0.160914 > total number of iterations=3, (max: 50) > total number of function/gradient evaluations=4, (max: 10000) > total number of Hessian evaluations=3 > Solution converged: ||g(X)||/||g(X0)|| <= gttol > > > > On Jun 11, 2015, at 12:44 PM, Jason Sarich > wrote: > > Hi Andreas, > > Yes it looks like a bug that the reason is never set, but the line > should still terminate. Is the problem you are having with the line search > itself, or is it failing because you are checking this ls->reason directly? > > Jason Sarich > > > On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang > wrote: > >> Hi guys: >> >> I have a problem with the TAO Armijo line search (petsc-3.5.4). My >> algorithm works if I use the More & Thuente line search (default). I have >> numerically checked the gradient of my objective. It?s correct. I am happy >> to write a small snippet of code and do an easy test if you guys disagree, >> but from what I?ve seen in the line search code it seems obvious to me that >> there is a bug. 
Am I missing something or are you not setting >> >> ls->reason >> >> to >> >> TAOLINESEARCH_SUCCESS >> >> if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in >> armijo.c; line 118 - 302)?! >> >> It seems to me that ls->reason is and will remain to be set to >> >> TAOLINESEARCH_CONTINUE_ITERATING >> >> if everything works (i.e. I don?t hit one of the exceptions). Does this >> make sense? If not I?ll invest the time and put together a simple test case >> and, if that works, continue to check my code. >> >> /Andreas >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas at ices.utexas.edu Thu Jun 11 15:26:12 2015 From: andreas at ices.utexas.edu (Andreas Mang) Date: Thu, 11 Jun 2015 15:26:12 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled In-Reply-To: References: <80e773c4d04e4dfea37ec2dcdef5cee6@LUCKMAN.anl.gov> Message-ID: <2F899A81-AB00-4B04-BC66-AB330AF3174F@ices.utexas.edu> Hey Jason: Thanks for looking into this. In the meantime I have checked against your elliptic tao example using an Armijo linesearch. It works (i.e. converges) as you suggested in an earlier email even though it returns the ?wrong" flag. For my problem it does even choke if I set the regularization parameter to 1E6 (essentially solving a quadratic problem). A final question before I continue the struggle by myself: It says ?gradient steps: 1? instead of ?gradient steps: 0? in the outputs (Armijo vs. More-Thuente / Unit). Does it start doing gradient evaluations? Maybe this helps me to further poke my code. I?ll continue to look into this. I?ll come back to you if I discover that the problem is on the PETSc side of things and I can reproduce the problem with a toy example. Thanks for your time! /Andreas > On Jun 11, 2015, at 3:03 PM, Jason Sarich wrote: > > Hi Andreas, > > I don't see anything obviously wrong. If the function is very flat, you can try setting -tao_ls_armijo_sigma to a smaller number. If you continue to have problems, please let me know. It would definitely help if you have an example you could send me that reproduces this behavior. > > Jason > > > > > On Thu, Jun 11, 2015 at 1:12 PM, Andreas Mang > wrote: > Hey Jason: > > The line search fails. If I use Armijo I get > > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=1 > total number of gradient evaluations=1 > total number of function/gradient evaluations=0 > Termination reason: 0 > > The parameters seem to be the default ones also suggested by Nocedal and Wright. So I did not change anything. The termination reason is equivalent to TAOLINESEARCH_CONTINUE_ITERATING. I am not checking the reason directly. I guess it starts reducing the step size after that. I can see that my objective function get?s evaluated (as expected); however, the objective values increase (from what I see when monitoring the evaluations of my objective). This leads to a failure in the line search and made (still makes) me believe there is a bug on my side (which I have not found yet). However, if I use a unit step it converges (relative change of the gradient e.g. to 1E-9; see bottom of this email). If I use More & Thuente, same thing. No reduction in step size necessary. > > If you suggest that I should do some further testing on simpler problems, I?m happy to do so. 
After looking at the code, I just felt like there obviously is something wrong in the line-search implementation. > > Thanks for your help. > /Andreas > > Here?s the output after the first iteration (where the Armijo line search fails): > > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=1 > total number of gradient evaluations=1 > total number of function/gradient evaluations=0 > Termination reason: 0 > TaoLineSearch Object: > type: armijo > maxf=30, ftol=1e-10, gtol=0.0001 > Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 > maximum function evaluations=30 > tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 > total number of function evaluations=30 > total number of gradient evaluations=0 > total number of function/gradient evaluations=0 > Termination reason: 4 > > With final output (end of optimization): > > Tao Object: 1 MPI processes > type: nls > Newton steps: 1 > BFGS steps: 0 > Scaled gradient steps: 0 > Gradient steps: 1 > nls ksp atol: 0 > nls ksp rtol: 1 > nls ksp ctol: 0 > nls ksp negc: 0 > nls ksp dtol: 0 > nls ksp iter: 0 > nls ksp othr: 0 > TaoLineSearch Object: 1 MPI processes > type: armijo > KSP Object: 1 MPI processes > type: cg > total KSP iterations: 21 > convergence tolerances: fatol=0, frtol=0 > convergence tolerances: gatol=0, steptol=0, gttol=0.0001 > Residual in Function/Gradient:=0.038741 > Objective value=0.639121 > total number of iterations=0, (max: 50) > total number of function evaluations=31, max: 10000 > total number of gradient evaluations=1, max: 10000 > total number of function/gradient evaluations=1, (max: 10000) > total number of Hessian evaluations=1 > Solver terminated: -6 Line Search Failure > > This is without line-search (unit step size): > > Tao Object: 1 MPI processes > type: nls > Newton steps: 3 > BFGS steps: 0 > Scaled gradient steps: 0 > Gradient steps: 0 > nls ksp atol: 0 > nls ksp rtol: 3 > nls ksp ctol: 0 > nls ksp negc: 0 > nls ksp dtol: 0 > nls ksp iter: 0 > nls ksp othr: 0 > TaoLineSearch Object: 1 MPI processes > type: unit > KSP Object: 1 MPI processes > type: cg > total KSP iterations: 71 > convergence tolerances: fatol=0, frtol=0 > convergence tolerances: gatol=0, steptol=0, gttol=0.0001 > Residual in Function/Gradient:=1.91135e-11 > Objective value=0.160914 > total number of iterations=3, (max: 50) > total number of function/gradient evaluations=4, (max: 10000) > total number of Hessian evaluations=3 > Solution converged: ||g(X)||/||g(X0)|| <= gttol > > > >> On Jun 11, 2015, at 12:44 PM, Jason Sarich > wrote: >> >> Hi Andreas, >> >> Yes it looks like a bug that the reason is never set, but the line should still terminate. Is the problem you are having with the line search itself, or is it failing because you are checking this ls->reason directly? >> >> Jason Sarich >> >> >> On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang > wrote: >> Hi guys: >> >> I have a problem with the TAO Armijo line search (petsc-3.5.4). My algorithm works if I use the More & Thuente line search (default). I have numerically checked the gradient of my objective. It?s correct. I am happy to write a small snippet of code and do an easy test if you guys disagree, but from what I?ve seen in the line search code it seems obvious to me that there is a bug. 
Am I missing something or are you not setting >> >> ls->reason >> >> to >> >> TAOLINESEARCH_SUCCESS >> >> if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in armijo.c; line 118 - 302)?! >> >> It seems to me that ls->reason is and will remain to be set to >> >> TAOLINESEARCH_CONTINUE_ITERATING >> >> if everything works (i.e. I don?t hit one of the exceptions). Does this make sense? If not I?ll invest the time and put together a simple test case and, if that works, continue to check my code. >> >> /Andreas >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 11 13:53:58 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 11 Jun 2015 13:53:58 -0500 Subject: [petsc-users] -pc_mg_monitor In-Reply-To: <5579A0F2.90901@ed.ac.uk> References: <55797570.6060008@ed.ac.uk> <5579A0F2.90901@ed.ac.uk> Message-ID: <54975CE1-51A7-47FB-A15D-4FE63D66174B@mcs.anl.gov> Use -mg_levels_ksp_monitor since mg_levels is the prefix for the levels this will turn monitor for each level and -mg_coarse_ksp_monitor (but note that if the coarse solver is a direct solver this won't have any affect). Barry > On Jun 11, 2015, at 9:53 AM, David Scott wrote: > > Thanks for the replies. It seems that the reference to -pc_mg_monitor in > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCMG.html > should be removed. > > David > > On 11/06/2015 15:44, Hong wrote: >> David: >> PETSc library does not have the option '-pc_mg_monitor'. >> Hong >> >> On Thu, Jun 11, 2015 at 6:48 AM, David Scott wrote: >> Hello, >> >> I am using MINRES with GAMG and have supplied various options >> >> #PETSc Option Table entries: >> -ksp_max_it 500 >> -ksp_monitor_true_residual >> -log_summary >> -mg_levels_ksp_max_it 2 >> -mg_levels_ksp_type chebyshev >> -mg_levels_pc_type sor >> -options_left >> -pc_gamg_agg_nsmooths 1 >> -pc_gamg_threshold 0.03 >> -pc_gamg_type agg >> -pc_gamg_verbose 7 >> -pc_mg_monitor >> #End of PETSc Option Table entries >> There is one unused database option. It is: >> Option left: name:-pc_mg_monitor (no value) >> >> Does this mean that I should have supplied an integer value with -pc_mg_monitor or is it not applicable in this case? If I should have supplied a value what is the allowed range? >> >> Thanks in advance, >> >> David >> >> >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. From bsmith at mcs.anl.gov Thu Jun 11 14:07:39 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 11 Jun 2015 14:07:39 -0500 Subject: [petsc-users] Accessing 'halo' matrix entries? In-Reply-To: References: Message-ID: <8F44F9E3-BD99-4842-80C9-BACA627C6CDA@mcs.anl.gov> > On Jun 11, 2015, at 8:15 AM, Oliver Henrich wrote: > > Dear PETSc-Team, > > I am trying to solve a Poisson equation with a mixed periodic-Dirichlet boundary condition. What I have in mind is e.g. a compressible flow with a total pressure difference imposed between the two sides of the system, but otherwise periodic, and periodic boundary conditions along the remaining two dimensions. Another example would be an electrostatic system with dielectric contrast in an external electric field / potential difference. 
>
> For clarity, if x = 0 (N+1) is the left (right) halo site at the boundary
> and x = 1 (N) is the leftmost (rightmost) site in the physical domain:
>
> psi(x = 0)   = psi(x = N) - dpsi
> psi(x = N+1) = psi(1) + dpsi

If I understand correctly this doesn't affect the MATRIX at all, since the
dpsi is a constant. So aren't you just solving with a "regular periodic"
matrix but a modified right hand side?

Note in PETSc indexing, which starts at 0 (not one) and ends with N-1, what
you wrote above should be

   psi(x = -1) = psi(x = N-1) - dpsi
   psi(x = N)  = psi(0) + dpsi

Now x = -1 and x = N don't exist in the matrix (only in ghosted vectors), so

   b(0)   = b(0)   + od*dpsi   and
   b(N-1) = b(N-1) - od*dpsi

where od is the "off diagonal" entry of the Poisson matrix and b() is the
"normal" right hand side.

> I know it is possible to solve this with a double Poisson solve, which I
> try to avoid for performance reasons.
>
> It is also possible to solve this by modifying the matrix with a
> master-slave approach that imposes the constraint. This requires defining a
> transformation matrix that acts on the matrix, the solution vector and the
> right-hand side of the problem.
>
> The core of the problem I have is that the pressure or potential difference
> should not be between the leftmost and rightmost site in the physical
> domain (a standard Dirichlet BC), but between the left- or rightmost site
> in the physical domain and the corresponding halo site at the opposite side
> of the system. It should be possible to do this if the entries of the
> transformation matrix that act on the halo sites can be accessed and
> modified.
>
> Is anything like this possible in PETSc?
>
> Best regards and many thanks,
> Oliver
>
> --
> Dr Oliver Henrich
> Edinburgh Parallel Computing Centre
> School of Physics and Astronomy
> University of Edinburgh
> King's Buildings, JCMB
> Edinburgh EH9 3FD
> United Kingdom
>
> Tel: +44 (0)131 650 5818
> Fax: +44 (0)131 650 6555
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336
>

From jason.sarich at gmail.com  Thu Jun 11 16:13:30 2015
From: jason.sarich at gmail.com (Jason Sarich)
Date: Thu, 11 Jun 2015 16:13:30 -0500
Subject: [petsc-users] TAO: armijo condition not fulfilled
In-Reply-To: <4281757f44c5454d8127ea054f7c111e@NAGURSKI.anl.gov>
References: <80e773c4d04e4dfea37ec2dcdef5cee6@LUCKMAN.anl.gov>
	<4281757f44c5454d8127ea054f7c111e@NAGURSKI.anl.gov>
Message-ID:

Hi Andreas,

I'm pretty sure this is a bug on my side, directly related to the ls->reason.

Can you try fixing src/tao/linesearch/impls/armijo/armijo.c so it sets this
correctly?

before (line 275):

    /* Successful termination, update memory */
    armP->lastReference = ref;

after:

    /* Successful termination, update memory */
    ls->reason = TAOLINESEARCH_SUCCESS;
    armP->lastReference = ref;

On Thu, Jun 11, 2015 at 3:26 PM, Andreas Mang wrote:

> Hey Jason:
>
> Thanks for looking into this. In the meantime I have checked against
> your elliptic tao example using an Armijo linesearch. It works (i.e.
> converges) as you suggested in an earlier email even though it returns the
> "wrong" flag. For my problem it does even choke if I set the regularization
> parameter to 1E6 (essentially solving a quadratic problem).
>
> A final question before I continue the struggle by myself: It says
> "gradient steps: 1" instead of "gradient steps: 0" in the outputs (Armijo
> vs. More-Thuente / Unit). Does it start doing gradient evaluations? Maybe
> this helps me to further poke my code.
>
> I'll continue to look into this.
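(Returning to the periodic-Poisson question above: a small 1D numpy sketch of the right-hand-side modification described there, b(0) = b(0) + od*dpsi and b(N-1) = b(N-1) - od*dpsi. It checks that the plain periodic operator plus the shifted right-hand side reproduces the jump-aware stencil; N, dpsi and psi are made up, and the operator is only applied, never inverted, so its constant null space does not matter here.)

    import numpy as np

    N, dpsi = 8, 0.7          # made-up grid size and jump
    od = -1.0                 # off-diagonal entry of the 1D [-1, 2, -1] stencil

    # plain periodic Poisson matrix, no jump anywhere
    Ap = 2.0 * np.eye(N) + od * (np.eye(N, k=1) + np.eye(N, k=-1))
    Ap[0, N - 1] += od
    Ap[N - 1, 0] += od

    def apply_with_jump(psi):
        # same stencil, but with ghosts psi(-1) = psi(N-1) - dpsi, psi(N) = psi(0) + dpsi
        r = np.empty(N)
        for i in range(N):
            left = psi[i - 1] if i > 0 else psi[N - 1] - dpsi
            right = psi[i + 1] if i < N - 1 else psi[0] + dpsi
            r[i] = 2.0 * psi[i] + od * (left + right)
        return r

    psi = np.random.rand(N)
    shift = np.zeros(N)
    shift[0], shift[N - 1] = od * dpsi, -od * dpsi

    # A_periodic*psi - shift equals the jump-aware operator applied to psi,
    # so solving the jump problem means solving  Ap*psi = b + shift
    print(np.allclose(apply_with_jump(psi), Ap @ psi - shift))   # True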
I?ll come back to you if I discover > that the problem is on the PETSc side of things and I can reproduce the > problem with a toy example. > > Thanks for your time! /Andreas > > > On Jun 11, 2015, at 3:03 PM, Jason Sarich wrote: > > Hi Andreas, > > I don't see anything obviously wrong. If the function is very flat, you > can try setting -tao_ls_armijo_sigma to a smaller number. If you continue > to have problems, please let me know. It would definitely help if you have > an example you could send me that reproduces this behavior. > > Jason > > > > > On Thu, Jun 11, 2015 at 1:12 PM, Andreas Mang > wrote: > >> Hey Jason: >> >> The line search fails. If I use Armijo I get >> >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 >> memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=1 >> total number of gradient evaluations=1 >> total number of function/gradient evaluations=0 >> Termination reason: 0 >> >> The parameters seem to be the default ones also suggested by Nocedal >> and Wright. So I did not change anything. The termination reason is >> equivalent to TAOLINESEARCH_CONTINUE_ITERATING. I am not checking the >> reason directly. I guess it starts reducing the step size after that. I can >> see that my objective function get?s evaluated (as expected); however, the >> objective values increase (from what I see when monitoring the evaluations >> of my objective). This leads to a failure in the line search and made >> (still makes) me believe there is a bug on my side (which I have not found >> yet). However, if I use a unit step it converges (relative change of the >> gradient e.g. to 1E-9; see bottom of this email). If I use More & Thuente, >> same thing. No reduction in step size necessary. >> >> If you suggest that I should do some further testing on simpler >> problems, I?m happy to do so. After looking at the code, I just felt like >> there obviously is something wrong in the line-search implementation. >> >> Thanks for your help. 
>> /Andreas >> >> *Here?s the output after the first iteration (where the Armijo line >> search fails):* >> >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 >> memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=1 >> total number of gradient evaluations=1 >> total number of function/gradient evaluations=0 >> Termination reason: 0 >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 >> memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=30 >> total number of gradient evaluations=0 >> total number of function/gradient evaluations=0 >> Termination reason: 4 >> >> *With final output (end of optimization):* >> >> Tao Object: 1 MPI processes >> type: nls >> Newton steps: 1 >> BFGS steps: 0 >> Scaled gradient steps: 0 >> Gradient steps: 1 >> nls ksp atol: 0 >> nls ksp rtol: 1 >> nls ksp ctol: 0 >> nls ksp negc: 0 >> nls ksp dtol: 0 >> nls ksp iter: 0 >> nls ksp othr: 0 >> TaoLineSearch Object: 1 MPI processes >> type: armijo >> KSP Object: 1 MPI processes >> type: cg >> total KSP iterations: 21 >> convergence tolerances: fatol=0, frtol=0 >> convergence tolerances: gatol=0, steptol=0, gttol=0.0001 >> Residual in Function/Gradient:=0.038741 >> Objective value=0.639121 >> total number of iterations=0, (max: 50) >> total number of function evaluations=31, max: 10000 >> total number of gradient evaluations=1, max: 10000 >> total number of function/gradient evaluations=1, (max: 10000) >> total number of Hessian evaluations=1 >> Solver terminated: -6 Line Search Failure >> >> *This is without line-search (unit step size):* >> >> Tao Object: 1 MPI processes >> type: nls >> Newton steps: 3 >> BFGS steps: 0 >> Scaled gradient steps: 0 >> Gradient steps: 0 >> nls ksp atol: 0 >> nls ksp rtol: 3 >> nls ksp ctol: 0 >> nls ksp negc: 0 >> nls ksp dtol: 0 >> nls ksp iter: 0 >> nls ksp othr: 0 >> TaoLineSearch Object: 1 MPI processes >> type: unit >> KSP Object: 1 MPI processes >> type: cg >> total KSP iterations: 71 >> convergence tolerances: fatol=0, frtol=0 >> convergence tolerances: gatol=0, steptol=0, gttol=0.0001 >> Residual in Function/Gradient:=1.91135e-11 >> Objective value=0.160914 >> total number of iterations=3, (max: 50) >> total number of function/gradient evaluations=4, (max: 10000) >> total number of Hessian evaluations=3 >> Solution converged: ||g(X)||/||g(X0)|| <= gttol >> >> >> >> On Jun 11, 2015, at 12:44 PM, Jason Sarich >> wrote: >> >> Hi Andreas, >> >> Yes it looks like a bug that the reason is never set, but the line >> should still terminate. Is the problem you are having with the line search >> itself, or is it failing because you are checking this ls->reason directly? >> >> Jason Sarich >> >> >> On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang >> wrote: >> >>> Hi guys: >>> >>> I have a problem with the TAO Armijo line search (petsc-3.5.4). My >>> algorithm works if I use the More & Thuente line search (default). I have >>> numerically checked the gradient of my objective. It?s correct. I am happy >>> to write a small snippet of code and do an easy test if you guys disagree, >>> but from what I?ve seen in the line search code it seems obvious to me that >>> there is a bug. 
Am I missing something or are you not setting >>> >>> ls->reason >>> >>> to >>> >>> TAOLINESEARCH_SUCCESS >>> >>> if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in >>> armijo.c; line 118 - 302)?! >>> >>> It seems to me that ls->reason is and will remain to be set to >>> >>> TAOLINESEARCH_CONTINUE_ITERATING >>> >>> if everything works (i.e. I don?t hit one of the exceptions). Does this >>> make sense? If not I?ll invest the time and put together a simple test case >>> and, if that works, continue to check my code. >>> >>> /Andreas >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas at ices.utexas.edu Thu Jun 11 16:54:15 2015 From: andreas at ices.utexas.edu (Andreas Mang) Date: Thu, 11 Jun 2015 16:54:15 -0500 Subject: [petsc-users] TAO: armijo condition not fulfilled In-Reply-To: References: <80e773c4d04e4dfea37ec2dcdef5cee6@LUCKMAN.anl.gov> <4281757f44c5454d8127ea054f7c111e@NAGURSKI.anl.gov> Message-ID: Hey Jason: I?ve added the flag and it works like a charm (see output below). Yaaayyyy! :) Thanks! /A P.S: I still wonder why the successive evaluations of the objective kicked me out, though (i.e. why I did get monotonically *increasing* objective values; I also tried to switch the search direction using ?tao_ls_armijo_nondescending? which did not help). Anyway... Output: Tao Object: 1 MPI processes type: nls Newton steps: 2 BFGS steps: 0 Scaled gradient steps: 0 Gradient steps: 0 nls ksp atol: 0 nls ksp rtol: 2 nls ksp ctol: 0 nls ksp negc: 0 nls ksp dtol: 0 nls ksp iter: 0 nls ksp othr: 0 TaoLineSearch Object: 1 MPI processes type: armijo KSP Object: 1 MPI processes type: cg total KSP iterations: 31 convergence tolerances: fatol=0, frtol=0 convergence tolerances: gatol=0, steptol=0, gttol=0.0001 Residual in Function/Gradient:=2.69882e-06 Objective value=0.370287 total number of iterations=2, (max: 50) total number of function evaluations=2, max: 10000 total number of gradient evaluations=2, max: 10000 total number of function/gradient evaluations=1, (max: 10000) total number of Hessian evaluations=2 Solution converged: ||g(X)||/||g(X0)|| <= gttol > On Jun 11, 2015, at 4:13 PM, Jason Sarich wrote: > > Hi Andreas, > > I'm pretty sure this is a bug on my side, directly related to the ls->reason > > Can you try fixing src/tao/linesearch/impls/armijo/armijo.c so it sets this correctly? > > before (line 275): > /* Successful termination, update memory */ > armP->lastReference = ref; > > /* Successful termination, update memory */ > ls->reason = TAOLINESEARCH_SUCCESS; > armP->lastReference = ref; > > On Thu, Jun 11, 2015 at 3:26 PM, Andreas Mang > wrote: > Hey Jason: > > Thanks for looking into this. In the meantime I have checked against your elliptic tao example using an Armijo linesearch. It works (i.e. converges) as you suggested in an earlier email even though it returns the ?wrong" flag. For my problem it does even choke if I set the regularization parameter to 1E6 (essentially solving a quadratic problem). > > A final question before I continue the struggle by myself: It says ?gradient steps: 1? instead of ?gradient steps: 0? in the outputs (Armijo vs. More-Thuente / Unit). Does it start doing gradient evaluations? Maybe this helps me to further poke my code. > > I?ll continue to look into this. I?ll come back to you if I discover that the problem is on the PETSc side of things and I can reproduce the problem with a toy example. > > Thanks for your time! 
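(For readers following the line-search discussion: a tiny, library-independent backtracking sketch that shows where a success flag belongs, which is all the one-line armijo.c change quoted above adds. The alpha/beta/sigma names mirror the parameters printed in the logs, but this is not TAO code, just an illustration of the Armijo sufficient-decrease test.)

    import numpy as np

    def backtracking_armijo(f, g, x, p, alpha=1.0, beta=0.5, sigma=1e-4, maxf=30):
        # shrink alpha until f(x + alpha*p) <= f(x) + sigma*alpha*g(x).p holds,
        # and report explicitly whether that ever happened
        fx, slope = f(x), g(x) @ p      # slope must be negative for a descent direction
        for _ in range(maxf):
            if f(x + alpha * p) <= fx + sigma * alpha * slope:
                return alpha, "SUCCESS"     # analogue of ls->reason = TAOLINESEARCH_SUCCESS
            alpha *= beta
        return alpha, "HALTED_MAX_FUNCS"

    f = lambda x: float(x @ x)          # simple quadratic test problem
    g = lambda x: 2.0 * x
    x0 = np.array([3.0, -4.0])
    print(backtracking_armijo(f, g, x0, -g(x0)))    # -> (0.5, 'SUCCESS') here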
/Andreas > > >> On Jun 11, 2015, at 3:03 PM, Jason Sarich > wrote: >> >> Hi Andreas, >> >> I don't see anything obviously wrong. If the function is very flat, you can try setting -tao_ls_armijo_sigma to a smaller number. If you continue to have problems, please let me know. It would definitely help if you have an example you could send me that reproduces this behavior. >> >> Jason >> >> >> >> >> On Thu, Jun 11, 2015 at 1:12 PM, Andreas Mang > wrote: >> Hey Jason: >> >> The line search fails. If I use Armijo I get >> >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=1 >> total number of gradient evaluations=1 >> total number of function/gradient evaluations=0 >> Termination reason: 0 >> >> The parameters seem to be the default ones also suggested by Nocedal and Wright. So I did not change anything. The termination reason is equivalent to TAOLINESEARCH_CONTINUE_ITERATING. I am not checking the reason directly. I guess it starts reducing the step size after that. I can see that my objective function get?s evaluated (as expected); however, the objective values increase (from what I see when monitoring the evaluations of my objective). This leads to a failure in the line search and made (still makes) me believe there is a bug on my side (which I have not found yet). However, if I use a unit step it converges (relative change of the gradient e.g. to 1E-9; see bottom of this email). If I use More & Thuente, same thing. No reduction in step size necessary. >> >> If you suggest that I should do some further testing on simpler problems, I?m happy to do so. After looking at the code, I just felt like there obviously is something wrong in the line-search implementation. >> >> Thanks for your help. 
>> /Andreas >> >> Here?s the output after the first iteration (where the Armijo line search fails): >> >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=1 >> total number of gradient evaluations=1 >> total number of function/gradient evaluations=0 >> Termination reason: 0 >> TaoLineSearch Object: >> type: armijo >> maxf=30, ftol=1e-10, gtol=0.0001 >> Armijo linesearch : alpha=1 beta=0.5 sigma=0.0001 memsize=1 >> maximum function evaluations=30 >> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9 >> total number of function evaluations=30 >> total number of gradient evaluations=0 >> total number of function/gradient evaluations=0 >> Termination reason: 4 >> >> With final output (end of optimization): >> >> Tao Object: 1 MPI processes >> type: nls >> Newton steps: 1 >> BFGS steps: 0 >> Scaled gradient steps: 0 >> Gradient steps: 1 >> nls ksp atol: 0 >> nls ksp rtol: 1 >> nls ksp ctol: 0 >> nls ksp negc: 0 >> nls ksp dtol: 0 >> nls ksp iter: 0 >> nls ksp othr: 0 >> TaoLineSearch Object: 1 MPI processes >> type: armijo >> KSP Object: 1 MPI processes >> type: cg >> total KSP iterations: 21 >> convergence tolerances: fatol=0, frtol=0 >> convergence tolerances: gatol=0, steptol=0, gttol=0.0001 >> Residual in Function/Gradient:=0.038741 >> Objective value=0.639121 >> total number of iterations=0, (max: 50) >> total number of function evaluations=31, max: 10000 >> total number of gradient evaluations=1, max: 10000 >> total number of function/gradient evaluations=1, (max: 10000) >> total number of Hessian evaluations=1 >> Solver terminated: -6 Line Search Failure >> >> This is without line-search (unit step size): >> >> Tao Object: 1 MPI processes >> type: nls >> Newton steps: 3 >> BFGS steps: 0 >> Scaled gradient steps: 0 >> Gradient steps: 0 >> nls ksp atol: 0 >> nls ksp rtol: 3 >> nls ksp ctol: 0 >> nls ksp negc: 0 >> nls ksp dtol: 0 >> nls ksp iter: 0 >> nls ksp othr: 0 >> TaoLineSearch Object: 1 MPI processes >> type: unit >> KSP Object: 1 MPI processes >> type: cg >> total KSP iterations: 71 >> convergence tolerances: fatol=0, frtol=0 >> convergence tolerances: gatol=0, steptol=0, gttol=0.0001 >> Residual in Function/Gradient:=1.91135e-11 >> Objective value=0.160914 >> total number of iterations=3, (max: 50) >> total number of function/gradient evaluations=4, (max: 10000) >> total number of Hessian evaluations=3 >> Solution converged: ||g(X)||/||g(X0)|| <= gttol >> >> >> >>> On Jun 11, 2015, at 12:44 PM, Jason Sarich > wrote: >>> >>> Hi Andreas, >>> >>> Yes it looks like a bug that the reason is never set, but the line should still terminate. Is the problem you are having with the line search itself, or is it failing because you are checking this ls->reason directly? >>> >>> Jason Sarich >>> >>> >>> On Thu, Jun 11, 2015 at 9:53 AM, Andreas Mang > wrote: >>> Hi guys: >>> >>> I have a problem with the TAO Armijo line search (petsc-3.5.4). My algorithm works if I use the More & Thuente line search (default). I have numerically checked the gradient of my objective. It?s correct. I am happy to write a small snippet of code and do an easy test if you guys disagree, but from what I?ve seen in the line search code it seems obvious to me that there is a bug. 
Am I missing something or are you not setting >>> >>> ls->reason >>> >>> to >>> >>> TAOLINESEARCH_SUCCESS >>> >>> if the Armijo condition is fulfilled (TaoLineSearchApply_Armijo in armijo.c; line 118 - 302)?! >>> >>> It seems to me that ls->reason is and will remain to be set to >>> >>> TAOLINESEARCH_CONTINUE_ITERATING >>> >>> if everything works (i.e. I don?t hit one of the exceptions). Does this make sense? If not I?ll invest the time and put together a simple test case and, if that works, continue to check my code. >>> >>> /Andreas >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 11 18:33:04 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 11 Jun 2015 18:33:04 -0500 Subject: [petsc-users] Release of PETSc version 3.6 Message-ID: <4E0960F0-093F-46D6-BE49-AC7009F6DC0A@mcs.anl.gov> We are pleased to announce the release of PETSc version 3.6 at http://www.mcs.anl.gov/petsc The major changes and updates can be found at http://www.mcs.anl.gov/petsc/documentation/changes/36.html We recommend upgrading to PETSc 3.6 soon. As always please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov This release includes contributions from Andrew Spott Barry Smith Ce Qin Dave May Dave Nystrom David A Ham Debojyoti Ghosh Dmitry Karpeev Dominic Meiser Ed Bueler Elliott Sales de Andrade Emil Constantinescu Hong Zhang H?kon Strandenes Ian Williamson Jason Sarich Jed Brown Jonathan Guyer Jose Roman Justin Chang Karl Rupp Lawrence Mitchell Lisandro Dalcin Lois Curfman McInnes Mark Adams Matthew Knepley Michael Lange Mr. Hong Zhang Pascal Tremblay Patrick Farrell Patrick Lacasse Patrick Sanan Pierre Jolivet R?mi Lacroix Satish Balay Semih ?zmen Shane Stafford Shri Abhyankar Stefano Zampini Surtai Han Tobin Isaac Vasiliy Kozyrev Vijay Mahadevan and bug reports/patches/proposed improvements received from Alejandro Lamas Davina Alirezaa Jalaali Andre Brand Andrew Cramer Andrew McRae Asbj?rn Nilsen Riseth ?smund Ervik Azariah Cornish Ben Liu Brian Yang Christiaan Klaij Constantine Khroulev Dave Nystrom David McWilliams David Moxey Du Yongle Ed D'Azevedo Ehsan Sadrfaridpour Eric Bavier Evan Um Fabian Jakub Francis Poulin Gabel Fabian Garth Wells Gautam Bisht Glenn Hammond Hakon Strandenes Ilmari Karoner Italo Tasso John Blechta Jonas Mairhofer Jorgen Kvalsvik Jose Roman Justin Chang Kai Song Lawrence Mitchell Lisandro Dalcin Mark Lohry Mark Samonds Martin Diehl Michael Souza Michael Welland Miguel Angel Salazar de Troya Mostafa Molaali Patrick Farrell Patrick Lacasse Patrick Sanan Peter Brune Peter Lichtner Pierre Jolivet Pratheek Shanthraj Randall Mackie Robert Nourgaliev Roy Stogner Sascha Schnepp Sean Farley Sebastien Gilles Shao-Ching Huang Stephan Kramer Sven Heinrich Thele German Guiller Tianyi Li Timoth?e Nicolas Torquil Macdonald Sorensen Victor Eijkhout Xavier Lacoste Yongle Du As always, thanks for your support, Barry From knepley at gmail.com Thu Jun 11 22:54:03 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Jun 2015 22:54:03 -0500 Subject: [petsc-users] Apply operator to linearised system before solving? In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 7:23 PM, Asbj?rn Nilsen Riseth < riseth at maths.ox.ac.uk> wrote: > Hi Matt, > > 1) K = > [ I 0 ; > -A10*inv(diag(A00)) I ] > Okay, it looks like this is some sort of weird approximate Schur complement. 
The right thing to do I think is to get everything written out in linear algebra language, since I think you can just use FieldSplit+PCComposite to do this from the command line. This one looks similar to SIMPLE. The paper by Shuttleworth, Elman, Shadid, etc. does a nice job of this kind of classification. It might help in the process. > There is a typo in my first email: S = Atilde11 = A11 - > A10*inv(diag(A00))*A01 > > 2) I'd like to have the linear solver close to what these people currently > use. My ultimate goal is to look at nonlinear solver strategies, not > improve their linear solver. I already have a PC that seems to work fine > for my purposes, but was hoping I could implement this version of CPR to > get even closer to their current systems. > Okay. Matt > Ozzy > > On Thu, 11 Jun 2015 at 00:54 Matthew Knepley wrote: > >> On Wed, Jun 10, 2015 at 6:30 PM, Asbj?rn Nilsen Riseth < >> riseth at maths.ox.ac.uk> wrote: >> >>> Dear PETSc community, >>> >>> I'm trying to implement a preconditioner used in reservoir modelling, >>> called Constrained Pressure Residual. >>> They apply a transformation to the linearised system we get from each >>> Newton step, before solving it. Then a 2-stage multiplicative >>> preconditioner on the transformed system. >>> >> >> 1) What is K? >> >> 2) Do you care about the 2-stage thing, or would any Stokes solver do? >> >> Thanks, >> >> Matt >> >> >>> The main problem is implementing step 1 below. >>> >>> Let A be the Jacobian, b the residual and K my transformation. >>> A = [A00 A01; A10 A11]. >>> The process is roughly like this: >>> 1) Set Atilde = KA, btilde = Kb. >>> - We want to solve Atilde x = btilde >>> >>> 2) Create a 2-stage multiplicative preconditioner using Atilde, btilde >>> pc0: This is only applied to the "fieldsplit 1" block of my system. >>> B_1 = [0 0; 0 S^-1] >>> Where S is a selfp Schur approximation from Atilde >>> S = Atilde11 - Atilde10 * inv(diag(Atilde00))* Atilde01 >>> pc1: This is a standard ILU on the whole system Atilde >>> >>> Currently I'm doing something like this >>> Step 1: >>> -ksp_type richardson -ksp_max_it 1 >>> -pc_type python >>> Then I create Atilde from KA in PCSetup, and a FGMRES ksp to take care >>> of step 2 with PCApply >>> >>> Step 2: >>> pc_type composite >>> pc_composite_type multiplicative >>> pc_composite_pcs python,ilu, >>> >>> Are there better ways of dealing with this transformation? >>> To me it looks similar to a 2-step right preconditioner on top of a left >>> preconditioner. >>> >>> Regards, >>> Ozzy >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From riseth at maths.ox.ac.uk Fri Jun 12 06:22:34 2015 From: riseth at maths.ox.ac.uk (=?UTF-8?Q?Asbj=C3=B8rn_Nilsen_Riseth?=) Date: Fri, 12 Jun 2015 11:22:34 +0000 Subject: [petsc-users] Apply operator to linearised system before solving? In-Reply-To: References: Message-ID: Thank you Matt, I'll read through the paper and see what I can map onto my problem. The CPR method is also about isolating the pressure equation, so SIMPLE looks relevant. 
Ozzy On Fri, 12 Jun 2015 at 04:54 Matthew Knepley wrote: > On Wed, Jun 10, 2015 at 7:23 PM, Asbj?rn Nilsen Riseth < > riseth at maths.ox.ac.uk> wrote: > >> Hi Matt, >> >> 1) K = >> [ I 0 ; >> -A10*inv(diag(A00)) I ] >> > > Okay, it looks like this is some sort of weird approximate Schur > complement. The right thing to > do I think is to get everything written out in linear algebra language, > since I think you can just > use FieldSplit+PCComposite to do this from the command line. This one > looks similar to SIMPLE. > The paper by Shuttleworth, Elman, Shadid, etc. does a nice job of this > kind of classification. It > might help in the process. > > >> There is a typo in my first email: S = Atilde11 = A11 - >> A10*inv(diag(A00))*A01 >> >> 2) I'd like to have the linear solver close to what these people >> currently use. My ultimate goal is to look at nonlinear solver >> strategies, not improve their linear solver. I already have a PC that seems >> to work fine for my purposes, but was hoping I could implement this version >> of CPR to get even closer to their current systems. >> > > Okay. > > Matt > > >> Ozzy >> >> On Thu, 11 Jun 2015 at 00:54 Matthew Knepley wrote: >> >>> On Wed, Jun 10, 2015 at 6:30 PM, Asbj?rn Nilsen Riseth < >>> riseth at maths.ox.ac.uk> wrote: >>> >>>> Dear PETSc community, >>>> >>>> I'm trying to implement a preconditioner used in reservoir modelling, >>>> called Constrained Pressure Residual. >>>> They apply a transformation to the linearised system we get from each >>>> Newton step, before solving it. Then a 2-stage multiplicative >>>> preconditioner on the transformed system. >>>> >>> >>> 1) What is K? >>> >>> 2) Do you care about the 2-stage thing, or would any Stokes solver do? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> The main problem is implementing step 1 below. >>>> >>>> Let A be the Jacobian, b the residual and K my transformation. >>>> A = [A00 A01; A10 A11]. >>>> The process is roughly like this: >>>> 1) Set Atilde = KA, btilde = Kb. >>>> - We want to solve Atilde x = btilde >>>> >>>> 2) Create a 2-stage multiplicative preconditioner using Atilde, btilde >>>> pc0: This is only applied to the "fieldsplit 1" block of my system. >>>> B_1 = [0 0; 0 S^-1] >>>> Where S is a selfp Schur approximation from Atilde >>>> S = Atilde11 - Atilde10 * inv(diag(Atilde00))* Atilde01 >>>> pc1: This is a standard ILU on the whole system Atilde >>>> >>>> Currently I'm doing something like this >>>> Step 1: >>>> -ksp_type richardson -ksp_max_it 1 >>>> -pc_type python >>>> Then I create Atilde from KA in PCSetup, and a FGMRES ksp to take care >>>> of step 2 with PCApply >>>> >>>> Step 2: >>>> pc_type composite >>>> pc_composite_type multiplicative >>>> pc_composite_pcs python,ilu, >>>> >>>> Are there better ways of dealing with this transformation? >>>> To me it looks similar to a 2-step right preconditioner on top of a >>>> left preconditioner. >>>> >>>> Regards, >>>> Ozzy >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
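For reference, a minimal sketch of how the diag(A00)-based Schur approximation discussed in this thread, S = A11 - A10*inv(diag(A00))*A01, could be assembled explicitly with basic PETSc matrix operations. The block names follow the thread; the routine itself is illustrative and is not taken from any of the codes mentioned here.

#include <petscmat.h>

/* Illustrative only: form S = A11 - A10 * inv(diag(A00)) * A01 from the four
   sub-blocks.  This matches the S = Atilde11 expression quoted in the thread,
   and is essentially the operator that -pc_fieldsplit_schur_precondition selfp
   builds internally for a Schur-complement fieldsplit. */
PetscErrorCode FormDiagSchur(Mat A00, Mat A01, Mat A10, Mat A11, Mat *S)
{
  Vec            d;
  Mat            T, L;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatCreateVecs(A00, &d, NULL);CHKERRQ(ierr);
  ierr = MatGetDiagonal(A00, d);CHKERRQ(ierr);
  ierr = VecReciprocal(d);CHKERRQ(ierr);                                /* d = 1./diag(A00) */
  ierr = MatDuplicate(A01, MAT_COPY_VALUES, &T);CHKERRQ(ierr);
  ierr = MatDiagonalScale(T, d, NULL);CHKERRQ(ierr);                    /* T = inv(diag(A00)) A01 */
  ierr = MatMatMult(A10, T, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &L);CHKERRQ(ierr);
  ierr = MatDuplicate(A11, MAT_COPY_VALUES, S);CHKERRQ(ierr);
  ierr = MatAXPY(*S, -1.0, L, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* S = A11 - A10 T */
  ierr = MatDestroy(&T);CHKERRQ(ierr);
  ierr = MatDestroy(&L);CHKERRQ(ierr);
  ierr = VecDestroy(&d);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The same S also appears as the (1,1) block of the transformed matrix K*A in the thread's notation, which is why the approach is compared with SIMPLE-type preconditioners.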
URL: From mailinglists at xgm.de Fri Jun 12 07:42:28 2015 From: mailinglists at xgm.de (Florian Lindner) Date: Fri, 12 Jun 2015 14:42:28 +0200 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: References: <1505176.dZzVkMf1SI@asaru> Message-ID: <1545207.qDHAUg8Pgs@asaru> Am Donnerstag, 11. Juni 2015, 12:34:51 schrieb Satish Balay: > The code is in petscsys.h. The flags HAVE_OMPI_MAJOR_VERSION etc should > not be set [if the version cannot be determined] > > So perhaps the attached patch is the fix. > > patch -Np1 < mpi-version-check.patch I can confirm it fixes compilation on maint with a plain "python2 configure". Thanks a lot! Florian > Satish > > On Thu, 11 Jun 2015, Matthew Knepley wrote: > > > On Thu, Jun 11, 2015 at 7:04 AM, Florian Lindner > > wrote: > > > > > configure.log is attached. > > > > > > > Ah, you have the buggy Apple preprocessor, so you get > > > > Unable to parse OpenMPI version from header. Probably a > > buggy preprocessor > > Defined "HAVE_OMPI_MAJOR_VERSION" to "unknown" > > Defined "HAVE_OMPI_MINOR_VERSION" to "unknown" > > Defined "HAVE_OMPI_RELEASE_VERSION" to "unknown" > > > > In the new release, we catch this. However, I think then the make check > > fails. > > > > Satish, how is the make check for this version number done? > > > > Thanks, > > > > Matt > > > > > > > Thx! > > > Florian > > > > > > Am Donnerstag, 11. Juni 2015, 06:26:27 schrieb Matthew Knepley: > > > > Cannot tell without configure.log > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner > > > > wrote: > > > > > > > > > Hello, > > > > > > > > > > I try to setup petsc on my Arch Linux box. Download it using git -b > > > maint. > > > > > > > > > > % python2 configure works fine: > > > > > [...] > > > > > Compilers: > > > > > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > C++ Compiler: mpicxx -Wall -Wwrite-strings > > > -Wno-strict-aliasing > > > > > -Wno-unknown-pragmas -g -O0 -fPIC > > > > > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > > > > > -ffree-line-length-0 -g -O0 > > > > > Linkers: > > > > > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > make: > > > > > [...] > > > > > > > > > > > > > xxx=========================================================================xxx > > > > > Configure stage complete. Now build PETSc libraries with (gnumake > > > build): > > > > > make PETSC_DIR=/home/florian/software/petsc > > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > > > > > > xxx=========================================================================xxx > > > > > > > > > > Now building: > > > > > > > > > > % make PETSC_DIR=/home/florian/software/petsc > > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > > > yields this error: > > > > > > > > > > "PETSc was configured with one OpenMPI mpi.h version but now appears > > > to be > > > > > compiling using a different OpenMPI mpi.h version" > > > > > > > > > > I would prefer to use my distributions openmpi 1.8.5, there is no other > > > > > MPI version installed. 
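For context, the check that produces the error quoted above lives in petscsys.h and is a compile-time guard of roughly this shape (paraphrased, not the verbatim source; the PETSC_HAVE_OMPI_* macro names are assumed from the configure-log entries shown earlier in the thread). Satish's patch simply avoids defining those values when configure cannot parse the version from mpi.h, so the guard is skipped:

#if defined(PETSC_HAVE_OMPI_MAJOR_VERSION)
#  if (OMPI_MAJOR_VERSION != PETSC_HAVE_OMPI_MAJOR_VERSION) || \
      (OMPI_MINOR_VERSION != PETSC_HAVE_OMPI_MINOR_VERSION) || \
      (OMPI_RELEASE_VERSION < PETSC_HAVE_OMPI_RELEASE_VERSION)
#    error "PETSc was configured with one OpenMPI mpi.h version but now appears to be compiling using a different OpenMPI mpi.h version"
#  endif
#endif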
> > > > > > > > > > Using configure with --download-mpich and this compiler > > > > > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to > > > work. > > > > > > > > > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some > > > time > > > > > ago, but I'm not sure how my system changed in the last weeks when I > > > > > haven't used petsc on this machine (Arch is a rolling release). > > > > > > > > > > Thx, > > > > > Florian > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Fri Jun 12 09:44:39 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 12 Jun 2015 09:44:39 -0500 Subject: [petsc-users] PETSc was configured with one OpenMPI version but now appears to be compiling using a different OpenMPI In-Reply-To: <1545207.qDHAUg8Pgs@asaru> References: <1505176.dZzVkMf1SI@asaru> <1545207.qDHAUg8Pgs@asaru> Message-ID: Thanks. Pushed the fix to maint. https://bitbucket.org/petsc/petsc/commits/9da1e55a07b3975897109e4ae93063680bcfc056 Satish On Fri, 12 Jun 2015, Florian Lindner wrote: > Am Donnerstag, 11. Juni 2015, 12:34:51 schrieb Satish Balay: > > The code is in petscsys.h. The flags HAVE_OMPI_MAJOR_VERSION etc should > > not be set [if the version cannot be determined] > > > > So perhaps the attached patch is the fix. > > > > patch -Np1 < mpi-version-check.patch > > I can confirm it fixes compilation on maint with a plain "python2 configure". > > Thanks a lot! > Florian > > > Satish > > > > On Thu, 11 Jun 2015, Matthew Knepley wrote: > > > > > On Thu, Jun 11, 2015 at 7:04 AM, Florian Lindner > > > wrote: > > > > > > > configure.log is attached. > > > > > > > > > > Ah, you have the buggy Apple preprocessor, so you get > > > > > > Unable to parse OpenMPI version from header. Probably a > > > buggy preprocessor > > > Defined "HAVE_OMPI_MAJOR_VERSION" to "unknown" > > > Defined "HAVE_OMPI_MINOR_VERSION" to "unknown" > > > Defined "HAVE_OMPI_RELEASE_VERSION" to "unknown" > > > > > > In the new release, we catch this. However, I think then the make check > > > fails. > > > > > > Satish, how is the make check for this version number done? > > > > > > Thanks, > > > > > > Matt > > > > > > > > > > Thx! > > > > Florian > > > > > > > > Am Donnerstag, 11. Juni 2015, 06:26:27 schrieb Matthew Knepley: > > > > > Cannot tell without configure.log > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > On Thu, Jun 11, 2015 at 6:23 AM, Florian Lindner > > > > > wrote: > > > > > > > > > > > Hello, > > > > > > > > > > > > I try to setup petsc on my Arch Linux box. Download it using git -b > > > > maint. > > > > > > > > > > > > % python2 configure works fine: > > > > > > [...] > > > > > > Compilers: > > > > > > C Compiler: mpicc -fPIC -Wall -Wwrite-strings > > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > > C++ Compiler: mpicxx -Wall -Wwrite-strings > > > > -Wno-strict-aliasing > > > > > > -Wno-unknown-pragmas -g -O0 -fPIC > > > > > > Fortran Compiler: mpif90 -fPIC -Wall -Wno-unused-variable > > > > > > -ffree-line-length-0 -g -O0 > > > > > > Linkers: > > > > > > Shared linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > > Dynamic linker: mpicc -shared -fPIC -Wall -Wwrite-strings > > > > > > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 > > > > > > make: > > > > > > [...] > > > > > > > > > > > > > > > > xxx=========================================================================xxx > > > > > > Configure stage complete. 
Now build PETSc libraries with (gnumake > > > > build): > > > > > > make PETSC_DIR=/home/florian/software/petsc > > > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > > > > > > > > > xxx=========================================================================xxx > > > > > > > > > > > > Now building: > > > > > > > > > > > > % make PETSC_DIR=/home/florian/software/petsc > > > > > > PETSC_ARCH=arch-linux2-c-debug all > > > > > > > > > > > > yields this error: > > > > > > > > > > > > "PETSc was configured with one OpenMPI mpi.h version but now appears > > > > to be > > > > > > compiling using a different OpenMPI mpi.h version" > > > > > > > > > > > > I would prefer to use my distributions openmpi 1.8.5, there is no other > > > > > > MPI version installed. > > > > > > > > > > > > Using configure with --download-mpich and this compiler > > > > > > /home/florian/software/petsc/arch-linux2-c-debug/bin/mpicc seems to > > > > work. > > > > > > > > > > > > Is openmpi 1.8.5 incompatible with petsc? Is used to work fine some > > > > time > > > > > > ago, but I'm not sure how my system changed in the last weeks when I > > > > > > haven't used petsc on this machine (Arch is a rolling release). > > > > > > > > > > > > Thx, > > > > > > Florian > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From danyang.su at gmail.com Sun Jun 14 23:15:58 2015 From: danyang.su at gmail.com (Danyang Su) Date: Sun, 14 Jun 2015 21:15:58 -0700 Subject: [petsc-users] Error after updating to 3.6.0: finclude/petscsys.h: No such file or directory Message-ID: <557E517E.1040709@gmail.com> Hi PETSc User, I get problem in compiling my codes after updating PETSc to 3.6.0. The codes work fine using PETSc 3.5.3 and PETSc-dev. I have made modified include lines in makefile from #PETSc variables for V3.5.3 and previous version #include ${PETSC_DIR}/conf/variables #include ${PETSC_DIR}/conf/rules to #PETSc variables for development version, version V3.6.0 and later include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules but I got the error fatal error: finclude/petscsys.h: No such file or directory #include ^ compilation terminated. The configure log is attached. Thanks and regards, Danyang -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 5547590 bytes Desc: not available URL: From danyang.su at gmail.com Sun Jun 14 23:29:40 2015 From: danyang.su at gmail.com (Danyang Su) Date: Sun, 14 Jun 2015 21:29:40 -0700 Subject: [petsc-users] Error after updating to 3.6.0: finclude/petscsys.h: No such file or directory In-Reply-To: <557E517E.1040709@gmail.com> References: <557E517E.1040709@gmail.com> Message-ID: <557E54B4.4040901@gmail.com> Sorry, I forgot this changes. * Fortran include files are now in include/petsc/finclude instead of include/finclude. Thus replace uses of #include "finclude/xxx.h" with #include "petsc/finclude/xxx.h". Reason for change: to namespace the finclude directory with PETSc for --prefix installs of PETSc and for packaging systems On 15-06-14 09:15 PM, Danyang Su wrote: > Hi PETSc User, > > I get problem in compiling my codes after updating PETSc to 3.6.0. > > The codes work fine using PETSc 3.5.3 and PETSc-dev. 
> > I have made modified include lines in makefile from > > #PETSc variables for V3.5.3 and previous version > #include ${PETSC_DIR}/conf/variables > #include ${PETSC_DIR}/conf/rules > > to > > #PETSc variables for development version, version V3.6.0 and later > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > but I got the error > > fatal error: finclude/petscsys.h: No such file or directory > #include > ^ > compilation terminated. > > The configure log is attached. > > Thanks and regards, > > Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From ohenrich at epcc.ed.ac.uk Mon Jun 15 05:20:16 2015 From: ohenrich at epcc.ed.ac.uk (Oliver Henrich) Date: Mon, 15 Jun 2015 11:20:16 +0100 Subject: [petsc-users] Accessing 'halo' matrix entries? In-Reply-To: <8F44F9E3-BD99-4842-80C9-BACA627C6CDA@mcs.anl.gov> References: <8F44F9E3-BD99-4842-80C9-BACA627C6CDA@mcs.anl.gov> Message-ID: Dear Barry and Matthew, Many thanks for your input, which is much appreciated. Just to avoid confusing you with pressure: What I want to solve is an electrostatic Poisson equation for a charge distribution with variable permittivity. The electric field consists of two parts, an externally imposed and constant one along one coordinate direction with magnitude E_ex=dpsi / N and a part E_int due to the local charge distribution which varies and obeys the Poisson equation. Hence the charges rho(x=0) should ?see? a potential psi(x=-1) = psi(x=N-1) - dpsi on their left and the charges rho(x=N-1) should ?see? a potential psi(N) = psi(x=0) + dpsi on their right. You suggest to modify the right hand side at b(x=0) and b(x=N-1). But what I don?t understand is how this could lead to the desired offset between psi(x=-1) and psi(x=N-1) and between psi(x=N) and psi(x=0), so between sites outside and inside the physical domain. Please correct me if I?m wrong, but wouldn?t modifying the right hand side in the way you suggest only allow me to have the offset between psi(x=0) and psi(x=N-1)? Kind regards and thanks again for your help. Oliver On 11 Jun 2015, at 20:07, Barry Smith wrote: > >> On Jun 11, 2015, at 8:15 AM, Oliver Henrich wrote: >> >> Dear PETSc-Team, >> >> I am trying to solve a Poisson equation with a mixed periodic-Dirichlet boundary condition. What I have in mind is e.g. a compressible flow with a total pressure difference imposed between the two sides of the system, but otherwise periodic, and periodic boundary conditions along the remaining two dimensions. Another example would be an electrostatic system with dielectric contrast in an external electric field / potential difference. >> > > If I understand correctly this does't affect the MATRIX at all, since the dpsi is a constant. So aren't you just solving with a "regular periodic" matrix but a modified right hand side? 
> > Note in PETSc indexing which starts at 0 (not one) and ends with N-1 what you wrote above should be > > psi(x=-1) = psi(x=N-1) - dpsi > psi(x=N) = psi(0) + dpsi > > Now x=-1 and x=N don't exist in the matrix (only in ghosted vectors) so b(0) = b(0) + od*dpsi and b(N-1) = b(N-1) - od*dpsi where od is the "off diagonal" entry of the Poisson matrix and b() is the "normal" right hand side > -- Dr Oliver Henrich Edinburgh Parallel Computing Centre School of Physics and Astronomy University of Edinburgh King's Buildings, JCMB Edinburgh EH9 3FD United Kingdom Tel: +44 (0)131 650 5818 Fax: +44 (0)131 650 6555 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 From ohenrich at epcc.ed.ac.uk Mon Jun 15 12:51:09 2015 From: ohenrich at epcc.ed.ac.uk (Oliver Henrich) Date: Mon, 15 Jun 2015 18:51:09 +0100 Subject: [petsc-users] Accessing 'halo' matrix entries? In-Reply-To: References: <8F44F9E3-BD99-4842-80C9-BACA627C6CDA@mcs.anl.gov> Message-ID: <16002B4C-D55C-4545-A22E-C1D4F21FFF13@epcc.ed.ac.uk> Dear Barry and Matthew, Please ignore my last email from earlier today. I think I understand now what you mean. Many thanks. Oliver On 15 Jun 2015, at 11:20, Oliver Henrich wrote: > Dear Barry and Matthew, > > Many thanks for your input, which is much appreciated. > > Just to avoid confusing you with pressure: What I want to solve is an electrostatic Poisson equation for a charge distribution with variable permittivity. > > The electric field consists of two parts, an externally imposed and constant one along one coordinate direction with magnitude E_ex=dpsi / N and a part E_int due to the local charge distribution which varies and obeys the Poisson equation. Hence the charges rho(x=0) should ?see? a potential psi(x=-1) = psi(x=N-1) - dpsi on their left and the charges rho(x=N-1) should ?see? a potential psi(N) = psi(x=0) + dpsi on their right. > > You suggest to modify the right hand side at b(x=0) and b(x=N-1). But what I don?t understand is how this could lead to the desired offset between psi(x=-1) and psi(x=N-1) and between psi(x=N) and psi(x=0), so between sites outside and inside the physical domain. Please correct me if I?m wrong, but wouldn?t modifying the right hand side in the way you suggest only allow me to have the offset between psi(x=0) and psi(x=N-1)? > > Kind regards and thanks again for your help. > Oliver > > > > On 11 Jun 2015, at 20:07, Barry Smith wrote: > >> >>> On Jun 11, 2015, at 8:15 AM, Oliver Henrich wrote: >>> >>> Dear PETSc-Team, >>> >>> I am trying to solve a Poisson equation with a mixed periodic-Dirichlet boundary condition. What I have in mind is e.g. a compressible flow with a total pressure difference imposed between the two sides of the system, but otherwise periodic, and periodic boundary conditions along the remaining two dimensions. Another example would be an electrostatic system with dielectric contrast in an external electric field / potential difference. >>> >> >> If I understand correctly this does't affect the MATRIX at all, since the dpsi is a constant. So aren't you just solving with a "regular periodic" matrix but a modified right hand side? 
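To make Barry's prescription concrete in one dimension with a constant coefficient: the interior stencil is (-psi(i-1) + 2 psi(i) - psi(i+1))/h^2 = f(i), so substituting the ghost relations psi(-1) = psi(N-1) - dpsi and psi(N) = psi(0) + dpsi keeps the usual periodic matrix and moves the known jump onto the right hand side, i.e. b(0) = b(0) + od*dpsi and b(N-1) = b(N-1) - od*dpsi with od = -1/h^2. A minimal serial sketch (the names and the stencil sign convention are assumptions for illustration, not taken from the actual code):

#include <petscvec.h>

/* Fold the imposed potential jump dpsi into the RHS of a periodic 1-D Poisson
   matrix A = (1/h^2) tridiag(-1,2,-1); constant-coefficient, serial sketch. */
PetscErrorCode AddPotentialJumpToRHS(Vec b, PetscInt N, PetscReal h, PetscReal dpsi)
{
  PetscScalar    od = -1.0/(h*h);                 /* off-diagonal entry of the matrix */
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecSetValue(b, 0,   od*dpsi,  ADD_VALUES);CHKERRQ(ierr);  /* from psi(-1) = psi(N-1) - dpsi */
  ierr = VecSetValue(b, N-1, -od*dpsi, ADD_VALUES);CHKERRQ(ierr);  /* from psi(N)  = psi(0)   + dpsi */
  ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With variable permittivity the same idea applies, except that od is the actual off-diagonal coefficient coupling the wrap-around neighbours in each of the two boundary rows.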
>> >> Note in PETSc indexing which starts at 0 (not one) and ends with N-1 what you wrote above should be >> >> psi(x=-1) = psi(x=N-1) - dpsi >> psi(x=N) = psi(0) + dpsi >> >> Now x=-1 and x=N don't exist in the matrix (only in ghosted vectors) so b(0) = b(0) + od*dpsi and b(N-1) = b(N-1) - od*dpsi where od is the "off diagonal" entry of the Poisson matrix and b() is the "normal" right hand side >> > > -- > Dr Oliver Henrich > Edinburgh Parallel Computing Centre > School of Physics and Astronomy > University of Edinburgh > King's Buildings, JCMB > Edinburgh EH9 3FD > United Kingdom > > Tel: +44 (0)131 650 5818 > Fax: +44 (0)131 650 6555 > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336 > > > -- Dr Oliver Henrich Edinburgh Parallel Computing Centre School of Physics and Astronomy University of Edinburgh King's Buildings, JCMB Edinburgh EH9 3FD United Kingdom Tel: +44 (0)131 650 5818 Fax: +44 (0)131 650 6555 -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 From it.sadr at gmail.com Mon Jun 15 16:12:57 2015 From: it.sadr at gmail.com (ehsan sadrfaridpour) Date: Mon, 15 Jun 2015 17:12:57 -0400 Subject: [petsc-users] How to pass the Matrices between functions in c++ Message-ID: Hi, Thanks for your great support. I am developing my code in c++ and I have created 4 matrices in a function and they are work fine. In the end of the function I need to delete them to release the memory space they occupied. - My question: However, I need to pass these matrices to other functions (or another class) and use their contents before I destroy them in the current function. (I need it because now I have a long function and I want to make it smaller. Also I need to make a recursive function that gets matrices as input.) I was thinking of using a pointer to the beginning of each of them, but I am not very good at this. Also I am worried about how I destroy them. Would you please let me know what is the best approach for this? Best, Ehsan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 15 16:36:04 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 15 Jun 2015 16:36:04 -0500 Subject: [petsc-users] How to pass the Matrices between functions in c++ In-Reply-To: References: Message-ID: On Mon, Jun 15, 2015 at 4:12 PM, ehsan sadrfaridpour wrote: > Hi, > Thanks for your great support. > I am developing my code in c++ and I have created 4 matrices in a function > and they are work fine. > In the end of the function I need to delete them to release the memory > space they occupied. > > - My question: > > However, I need to pass these matrices to other functions (or another > class) and use their contents before I destroy them in the current function. > > (I need it because now I have a long function and I want to make it > smaller. Also I need to make a recursive function that gets matrices as > input.) > > I was thinking of using a pointer to the beginning of each of them, but I > am not very good at this. Also I am worried about how I destroy them. > > Would you please let me know what is the best approach for this? > You can just use "Mat A" since the PETSc type is a pointer. Thanks, Matt > Best, > Ehsan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From it.sadr at gmail.com Mon Jun 15 16:40:24 2015 From: it.sadr at gmail.com (ehsan sadrfaridpour) Date: Mon, 15 Jun 2015 17:40:24 -0400 Subject: [petsc-users] How to pass the Matrices between functions in c++ In-Reply-To: References: Message-ID: Thanks a lot. So, I guess I shouldn't destroy it before I use it completely in other functions. And in the end I just call a destroy functions. Best, Ehsan On Mon, Jun 15, 2015 at 5:36 PM, Matthew Knepley wrote: > On Mon, Jun 15, 2015 at 4:12 PM, ehsan sadrfaridpour > wrote: > >> Hi, >> Thanks for your great support. >> I am developing my code in c++ and I have created 4 matrices in a >> function and they are work fine. >> In the end of the function I need to delete them to release the memory >> space they occupied. >> >> - My question: >> >> However, I need to pass these matrices to other functions (or another >> class) and use their contents before I destroy them in the current function. >> >> (I need it because now I have a long function and I want to make it >> smaller. Also I need to make a recursive function that gets matrices as >> input.) >> >> I was thinking of using a pointer to the beginning of each of them, but I >> am not very good at this. Also I am worried about how I destroy them. >> >> Would you please let me know what is the best approach for this? >> > > You can just use "Mat A" since the PETSc type is a pointer. > > Thanks, > > Matt > > >> Best, >> Ehsan >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 15 16:41:56 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 15 Jun 2015 16:41:56 -0500 Subject: [petsc-users] How to pass the Matrices between functions in c++ In-Reply-To: References: Message-ID: On Mon, Jun 15, 2015 at 4:40 PM, ehsan sadrfaridpour wrote: > Thanks a lot. So, I guess I shouldn't destroy it before I use it > completely in other functions. And in the end I just call a destroy > functions. > Yes, after you are done using it call MatDestroy(&A) Thanks, Matt > Best, > Ehsan > > On Mon, Jun 15, 2015 at 5:36 PM, Matthew Knepley > wrote: > >> On Mon, Jun 15, 2015 at 4:12 PM, ehsan sadrfaridpour >> wrote: >> >>> Hi, >>> Thanks for your great support. >>> I am developing my code in c++ and I have created 4 matrices in a >>> function and they are work fine. >>> In the end of the function I need to delete them to release the memory >>> space they occupied. >>> >>> - My question: >>> >>> However, I need to pass these matrices to other functions (or another >>> class) and use their contents before I destroy them in the current function. >>> >>> (I need it because now I have a long function and I want to make it >>> smaller. Also I need to make a recursive function that gets matrices as >>> input.) >>> >>> I was thinking of using a pointer to the beginning of each of them, but >>> I am not very good at this. Also I am worried about how I destroy them. >>> >>> Would you please let me know what is the best approach for this? >>> >> >> You can just use "Mat A" since the PETSc type is a pointer. 
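A minimal sketch of the point being made here: Mat is itself a handle (a pointer to PETSc's internal structure), so it can be passed to helper functions by value and destroyed once, after its last use. The function names below are made up for illustration only.

#include <petscmat.h>

static PetscErrorCode UseMatrix(Mat A, Vec x, Vec y)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(A, x, y);CHKERRQ(ierr);              /* operates on the same matrix object */
  PetscFunctionReturn(0);
}

static PetscErrorCode BuildAndUse(void)
{
  Mat            A;
  Vec            x, y;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 10, 10);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  /* ... MatSetValues(...) calls would go here ... */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = UseMatrix(A, x, y);CHKERRQ(ierr);            /* pass the handle around freely */

  ierr = MatDestroy(&A);CHKERRQ(ierr);                /* destroy only after the last use */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}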
>> >> Thanks, >> >> Matt >> >> >>> Best, >>> Ehsan >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gnw20 at cam.ac.uk Wed Jun 17 06:24:41 2015 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 17 Jun 2015 12:24:41 +0100 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev Message-ID: It used to be possible to pick the Krylov solver used to estimate the eigenvalues for Chebyshev, via the parameter system ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). I know there was some clean-up, with the parameters becoming "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov solver appears to be missing. Or have I overlooked it? Garth From Manishkumar.K at LntTechservices.com Wed Jun 17 06:34:01 2015 From: Manishkumar.K at LntTechservices.com (Manish Kumar K) Date: Wed, 17 Jun 2015 11:34:01 +0000 Subject: [petsc-users] Reg : PETSC Installation Message-ID: Dear PETSc Team, I am configuring and trying to install PETSc on Cygwin shell. While invoking following commands in shell Step 1: ./configure -with-cc=gcc -with-cxx=g++ --with-fc=gfortran --with-mpi=0 --download-fblaslapack=/cygdrive/yourselectedloaction/fblaslapack-3.4.2.tar.gz -with-batch Step 2: ./conftest-arch-mswin-c-debug Step 3: ./reconfigure-arch-mswin-c-debug.py Step 4: make test all When i do the above steps I get libpetsc.a file generated in lib folder of PETSc "petsc-3.5.3\arch-mswin-c-debug\lib" . I need an libpetsc.lib file with .lib extension , can you please help me how to generate libpetsc.lib file Let me know if you needed anything else Regards Manish K [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] L&T Technology Services Ltd www.LntTechservices.com This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Manishkumar.K at LntTechservices.com Wed Jun 17 08:43:14 2015 From: Manishkumar.K at LntTechservices.com (Manish Kumar K) Date: Wed, 17 Jun 2015 13:43:14 +0000 Subject: [petsc-users] Reg : Building project using makefile in cygwin using petsc Message-ID: Dear PETSc Team, I am trying to build couple .c files eg : solidification.c , solidification_solver.c , solidification_find_ijk.c and so on with one header global_header.c using makefile in Cygwin, For that I'm invoking following command make PETSC_DIR=/cygdrive/c/Scratch/petsc PETSC_ARCH=arch-mswin-c-opt cdat and I have placed my c files in petsc location " =/cygdrive/c/Scratch/petsc/arch-mswin-c-opt/cdat" I get following error $ make PETSC_DIR=/cygdrive/c/Scratch/petsc PETSC_ARCH=arch-mswin-c-opt cdat make: *** No rule to make target 'cdat'. Stop. I want and .exe file to be generated for this project of mine eg : example.exe as my end result. I have attached makefile with this mail ,kindly help on this would be appreciated. Please help me in this . 
Regards Manish K [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] L&T Technology Services Ltd www.LntTechservices.com This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile.zip Type: application/x-zip-compressed Size: 637 bytes Desc: makefile.zip URL: From balay at mcs.anl.gov Wed Jun 17 08:47:41 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 17 Jun 2015 08:47:41 -0500 Subject: [petsc-users] Reg : PETSC Installation In-Reply-To: References: Message-ID: On Wed, 17 Jun 2015, Manish Kumar K wrote: > Dear PETSc Team, > > > > I am configuring and trying to install PETSc on Cygwin shell. While invoking following commands in shell > > > > Step 1: > > ./configure -with-cc=gcc -with-cxx=g++ --with-fc=gfortran --with-mpi=0 > > --download-fblaslapack=/cygdrive/yourselectedloaction/fblaslapack-3.4.2.tar.gz -with-batch You should not need -with-batch option. Also you can install lapack from cygwin - and use that. > > > > Step 2: > > ./conftest-arch-mswin-c-debug > > > > Step 3: > > ./reconfigure-arch-mswin-c-debug.py > > > > Step 4: > > make test all > > > > When i do the above steps I get libpetsc.a file generated in lib folder of PETSc "petsc-3.5.3\arch-mswin-c-debug\lib" . > > I need an libpetsc.lib file with .lib extension , can you please help me how to generate libpetsc.lib file Presumably you need petsc.lib to use with MS compilers? If so - you should install PETSc with MS compilers - and not cgywin/gnu compilers. Please check instructions at: http://www.mcs.anl.gov/petsc/documentation/installation.html#windows Satish > > > > Let me know if you needed anything else > > > > Regards > > Manish K > > [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] > > L&T Technology Services Ltd > > www.LntTechservices.com > > This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. > From balay at mcs.anl.gov Wed Jun 17 08:59:03 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 17 Jun 2015 08:59:03 -0500 Subject: [petsc-users] Reg : Building project using makefile in cygwin using petsc In-Reply-To: References: Message-ID: On Wed, 17 Jun 2015, Manish Kumar K wrote: > Dear PETSc Team, > > > > I am trying to build couple .c files > > eg : solidification.c , solidification_solver.c , solidification_find_ijk.c and so on with one header global_header.c Normal notation for a header file is to use a '.h' notation - i.e global_header.h > > using makefile in Cygwin, > > For that I'm invoking following command > > > > make PETSC_DIR=/cygdrive/c/Scratch/petsc PETSC_ARCH=arch-mswin-c-opt cdat > > > > and I have placed my c files in petsc location " =/cygdrive/c/Scratch/petsc/arch-mswin-c-opt/cdat" > You can have your applications files in any location [don't need to add them to PETSc source or install dir] > > > I get following error > > $ make PETSC_DIR=/cygdrive/c/Scratch/petsc PETSC_ARCH=arch-mswin-c-opt cdat > > make: *** No rule to make target 'cdat'. Stop. > Are you sure you have the makefile in the same location as the sources? 
>>>>> include C:/Scratch/petsc/conf/variables include C:/Scratch/petsc/conf/rules <<<<<<< This is not really cygwin make format. Also you appear to link with /cygdrive/c/Scratch/petsc/arch-mswin-c-opt/lib/SDPFlex.lib You would have to rebuild this with cygwin gcc/g++/gfortran compilers - for this to work. If you are able to do that - the attached modified makefile is likely to work.. Satish > > > I want and .exe file to be generated for this project of mine eg : example.exe as my end result. > > I have attached makefile with this mail ,kindly help on this would be appreciated. > > > > Please help me in this . > > > > Regards > > Manish K > > [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] > > L&T Technology Services Ltd > > www.LntTechservices.com > > This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. > -------------- next part -------------- PETSC_DIR=/cygdrive/c/Scratch/petsc PETSC_ARCH=arch-mswin-c-opt SPDFLEX_LIB = -lSDPFlex CFLAGS = FFLAGS = CPPFLAGS = FPPFLAGS = EXAMPLESC = solidification_read_mesh_CaPS.c \ solidification_read_mat_property.c \ solidification_find_ijk.c \ solidification_find_node_number.c \ solidification_solver_tmatrixb.c \ solidification_solver_tgradient.c \ solidification_solver.c \ solidification_print_results_first.c \ solidification.c \ solidification_find_hotspot.c \ solidification_read_input_data.c \ solidification_solver_calc_property.c \ solidification_solver_calc_htc.c \ solidification_read_htc_data.c \ solidification_vdg.c CLEANFILES = cdat include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules cdat: chkopts ${OBJSC} -${CLINKER} -o cdat-petsc-lic ${OBJSC} ${PETSC_LIB} ${SPDFLEX_LIB} ${RM} ${OBJSC} From bsmith at mcs.anl.gov Wed Jun 17 09:43:34 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 17 Jun 2015 09:43:34 -0500 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev In-Reply-To: References: Message-ID: I introduced a bug in my cleanup. The kspest eigenestimator was created AFTER the KSPSetFromOptions_Chebyshev() hence the options for controlling the KSP estimator were never processed. I've attached a patch, please let me know how it goes. Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: fix-chebyshev-eigest-options.patch Type: application/octet-stream Size: 1309 bytes Desc: not available URL: -------------- next part -------------- > On Jun 17, 2015, at 6:24 AM, Garth N. Wells wrote: > > It used to be possible to pick the Krylov solver used to estimate the > eigenvalues for Chebyshev, via the parameter system > ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). I know there was > some clean-up, with the parameters becoming > "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov > solver appears to be missing. Or have I overlooked it? > > Garth From gnw20 at cam.ac.uk Wed Jun 17 12:59:06 2015 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 17 Jun 2015 18:59:06 +0100 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev In-Reply-To: References: Message-ID: On 17 June 2015 at 15:43, Barry Smith wrote: > > I introduced a bug in my cleanup. The kspest eigenestimator was created AFTER the KSPSetFromOptions_Chebyshev() hence the options for controlling the KSP estimator were never processed. 
> > I've attached a patch, please let me know how it goes. > I've applied the patch, but I still can't change the Krylov method for estimating the eigenvalues. Garth > Barry > > >> On Jun 17, 2015, at 6:24 AM, Garth N. Wells wrote: >> >> It used to be possible to pick the Krylov solver used to estimate the >> eigenvalues for Chebyshev, via the parameter system >> ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). I know there was >> some clean-up, with the parameters becoming >> "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov >> solver appears to be missing. Or have I overlooked it? >> >> Garth > > From bsmith at mcs.anl.gov Wed Jun 17 14:33:39 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 17 Jun 2015 14:33:39 -0500 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev In-Reply-To: References: Message-ID: <2F275D22-B6A5-4B89-8BA0-CE2E6A014048@mcs.anl.gov> Hmm. How can I reproduce this? I tested my fix with both -pc_type mg and -pc_type gamg and it changed in both cases -mg_levels_esteig_ksp_type cg Barry > On Jun 17, 2015, at 12:59 PM, Garth N. Wells wrote: > > On 17 June 2015 at 15:43, Barry Smith wrote: >> >> I introduced a bug in my cleanup. The kspest eigenestimator was created AFTER the KSPSetFromOptions_Chebyshev() hence the options for controlling the KSP estimator were never processed. >> >> I've attached a patch, please let me know how it goes. >> > > I've applied the patch, but I still can't change the Krylov method for > estimating the eigenvalues. > > Garth > >> Barry >> >> >>> On Jun 17, 2015, at 6:24 AM, Garth N. Wells wrote: >>> >>> It used to be possible to pick the Krylov solver used to estimate the >>> eigenvalues for Chebyshev, via the parameter system >>> ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). I know there was >>> some clean-up, with the parameters becoming >>> "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov >>> solver appears to be missing. Or have I overlooked it? >>> >>> Garth >> >> From gnw20 at cam.ac.uk Wed Jun 17 15:19:12 2015 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 17 Jun 2015 21:19:12 +0100 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev In-Reply-To: <2F275D22-B6A5-4B89-8BA0-CE2E6A014048@mcs.anl.gov> References: <2F275D22-B6A5-4B89-8BA0-CE2E6A014048@mcs.anl.gov> Message-ID: On 17 June 2015 at 20:33, Barry Smith wrote: > > Hmm. How can I reproduce this? I tested my fix with both -pc_type mg and -pc_type gamg and it changed in both cases -mg_levels_esteig_ksp_type cg > I've tried again, and it works. Thanks. I must have screwed up and not linked to the patched version of PETSc. Will the patch find its way into the dev version? Garth > Barry > >> On Jun 17, 2015, at 12:59 PM, Garth N. Wells wrote: >> >> On 17 June 2015 at 15:43, Barry Smith wrote: >>> >>> I introduced a bug in my cleanup. The kspest eigenestimator was created AFTER the KSPSetFromOptions_Chebyshev() hence the options for controlling the KSP estimator were never processed. >>> >>> I've attached a patch, please let me know how it goes. >>> >> >> I've applied the patch, but I still can't change the Krylov method for >> estimating the eigenvalues. >> >> Garth >> >>> Barry >>> >>> >>>> On Jun 17, 2015, at 6:24 AM, Garth N. Wells wrote: >>>> >>>> It used to be possible to pick the Krylov solver used to estimate the >>>> eigenvalues for Chebyshev, via the parameter system >>>> ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). 
I know there was >>>> some clean-up, with the parameters becoming >>>> "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov >>>> solver appears to be missing. Or have I overlooked it? >>>> >>>> Garth >>> >>> > From bsmith at mcs.anl.gov Wed Jun 17 15:27:23 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 17 Jun 2015 15:27:23 -0500 Subject: [petsc-users] Parameter to pick KSP for estimating eigenvalues for Chebyshev In-Reply-To: References: <2F275D22-B6A5-4B89-8BA0-CE2E6A014048@mcs.anl.gov> Message-ID: It is now in maint, master and next and will be in the next patch release. Barry > On Jun 17, 2015, at 3:19 PM, Garth N. Wells wrote: > > On 17 June 2015 at 20:33, Barry Smith wrote: >> >> Hmm. How can I reproduce this? I tested my fix with both -pc_type mg and -pc_type gamg and it changed in both cases -mg_levels_esteig_ksp_type cg >> > > I've tried again, and it works. Thanks. I must have screwed up and not > linked to the patched version of PETSc. > > Will the patch find its way into the dev version? > > Garth > >> Barry >> >>> On Jun 17, 2015, at 12:59 PM, Garth N. Wells wrote: >>> >>> On 17 June 2015 at 15:43, Barry Smith wrote: >>>> >>>> I introduced a bug in my cleanup. The kspest eigenestimator was created AFTER the KSPSetFromOptions_Chebyshev() hence the options for controlling the KSP estimator were never processed. >>>> >>>> I've attached a patch, please let me know how it goes. >>>> >>> >>> I've applied the patch, but I still can't change the Krylov method for >>> estimating the eigenvalues. >>> >>> Garth >>> >>>> Barry >>>> >>>> >>>>> On Jun 17, 2015, at 6:24 AM, Garth N. Wells wrote: >>>>> >>>>> It used to be possible to pick the Krylov solver used to estimate the >>>>> eigenvalues for Chebyshev, via the parameter system >>>>> ("mg_levels_est_ksp_type" and "gamg_est_ksp_type"). I know there was >>>>> some clean-up, with the parameters becoming >>>>> "mg_levels_ksp_chebyshev_esteig_foo", but an option to pick the Krylov >>>>> solver appears to be missing. Or have I overlooked it? >>>>> >>>>> Garth >>>> >>>> >> From Manishkumar.K at LntTechservices.com Thu Jun 18 04:34:31 2015 From: Manishkumar.K at LntTechservices.com (Manish Kumar K) Date: Thu, 18 Jun 2015 09:34:31 +0000 Subject: [petsc-users] Reg : Petsc installation failed Message-ID: Dear PETSc Team, I am configuring and trying to install PETSc libraries using Cygwin shell for windows I checked instructions at: http://www.mcs.anl.gov/petsc/documentation/installation.html#windows And invoked following command Step 1: ./configure --with-cc='win32fe cl' --with-fc=0 --download-f2cblaslapack=1 C compiler you provided with -with-cc=win32fe cl does not work. Cannot compile C with /package/petsc/petsc-3.5.3/bin/win32fe/win32fe cl. I am sending you the log file for this . Please help me out in this . Regards Manish K [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] L&T Technology Services Ltd www.LntTechservices.com This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: error.zip Type: application/x-zip-compressed Size: 22857 bytes Desc: error.zip URL: From italo at tasso.com.br Thu Jun 18 05:48:29 2015 From: italo at tasso.com.br (Italo Tasso) Date: Thu, 18 Jun 2015 07:48:29 -0300 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 Message-ID: I just upgraded to 3.6.0 and my code stopped working. All dt are rejected. I used the same configure line, same code, same everything. With 3.5.4 I get: 0 TS dt 1e-06 time 0 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 6.107056905987e-03 2 SNES Function norm 1.483881932064e-10 3 SNES Function norm 9.122873794272e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.790429171165e+04 1 SNES Function norm 7.289068747803e-04 2 SNES Function norm 8.227639633330e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.154356516184e+05 1 SNES Function norm 2.309925413255e-03 2 SNES Function norm 6.382141981406e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 With 3.6.0 I get: 0 TS dt 1e-06 time 0 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 6.107056925316e-03 2 SNES Function norm 1.519319591792e-10 3 SNES Function norm 9.070104116945e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.790429171165e+04 1 SNES Function norm 6.942541792651e-04 2 SNES Function norm 8.458781909516e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.154356516184e+05 1 SNES Function norm 2.287202942961e-03 2 SNES Function norm 6.585201377573e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= 324 family='arkimex' scheme=0:'3' dt=1.311e-07 Any ideas? I attached the full output. Options I use: -ts_view -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor -ts_monitor -snes_monitor -snes_converged_reason -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps -snes_rtol 0 -snes_atol 1e-10 -snes_stol 0 -------------- next part -------------- An HTML attachment was scrubbed... 
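For reference, the run-time options listed above map onto roughly the following API calls (a sketch assuming a TS object created elsewhere; it only restates the options from the report and is not a suggested fix):

#include <petscts.h>

PetscErrorCode ConfigureLikeTheOptions(TS ts)
{
  SNES           snes;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSSetType(ts, TSARKIMEX);CHKERRQ(ierr);                        /* -ts_type arkimex */
  ierr = TSARKIMEXSetType(ts, TSARKIMEX3);CHKERRQ(ierr);                /* scheme 0:'3' in the TSAdapt monitor */
  ierr = TSARKIMEXSetFullyImplicit(ts, PETSC_TRUE);CHKERRQ(ierr);       /* -ts_arkimex_fully_implicit */
  ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
  ierr = SNESSetTolerances(snes, 1e-10, 0.0, 0.0, 50, 10000);CHKERRQ(ierr); /* -snes_atol 1e-10 -snes_rtol 0 -snes_stol 0 */
  ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);                     /* -ksp_type preonly */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);                             /* -pc_type lu */
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr); /* -pc_factor_mat_solver_package mumps */
  PetscFunctionReturn(0);
}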
URL: -------------- next part -------------- 0 TS dt 1e-06 time 0 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 6.107056905987e-03 2 SNES Function norm 1.483881932064e-10 3 SNES Function norm 9.122873794272e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.790429171165e+04 1 SNES Function norm 7.289068747803e-04 2 SNES Function norm 8.227639633330e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.154356516184e+05 1 SNES Function norm 2.309925413255e-03 2 SNES Function norm 6.382141981406e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 1 TS dt 1e-05 time 1e-06 TS Object: 1 MPI processes type: arkimex maximum steps=1000000000 maximum time=1e-06 total number of nonlinear solver iterations=7 total number of nonlinear solve failures=0 total number of linear solver iterations=7 total number of rejected steps=0 ARK IMEX 3 Stiff abscissa ct = 0.000000 0.871733 0.600000 1.000000 Stiffly accurate: yes Explicit first stage: yes FSAL property: yes Nonstiff abscissa c = 0.000000 0.871733 0.600000 1.000000 TSAdapt Object: 1 MPI processes type: basic number of candidates 1 Basic: clip fastest decrease 0.1, fastest increase 10 Basic: safety factor 0.9, extra factor after step rejection 0.5 SNES Object: 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=0, absolute=1e-10, solution=0 total number of linear solver iterations=2 total number of function evaluations=3 SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 0, needed 0 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=30603, cols=30603 package used to perform factorization: mumps total: nonzeros=6.03922e+06, allocated nonzeros=6.03922e+06 total number of mallocs used during MatSetValues calls =0 MUMPS run parameters: SYM (matrix type): 0 PAR (host participation): 1 ICNTL(1) (output for error): 6 ICNTL(2) (output of diagnostic msg): 0 ICNTL(3) (output for global info): 0 ICNTL(4) (level of printing): 0 ICNTL(5) (input mat struct): 0 ICNTL(6) (matrix prescaling): 7 ICNTL(7) (sequentia matrix ordering):7 ICNTL(8) (scalling strategy): 77 ICNTL(10) (max num of refinements): 0 ICNTL(11) (error analysis): 0 ICNTL(12) (efficiency control): 1 ICNTL(13) (efficiency control): 0 ICNTL(14) (percentage of estimated workspace increase): 20 ICNTL(18) (input mat struct): 0 ICNTL(19) (Shur complement info): 0 ICNTL(20) (rhs sparse pattern): 0 ICNTL(21) (somumpstion struct): 0 ICNTL(22) (in-core/out-of-core facility): 0 ICNTL(23) (max size of memory can be allocated locally):0 ICNTL(24) (detection of null pivot rows): 0 ICNTL(25) (computation of a null space basis): 0 ICNTL(26) (Schur options for rhs or solution): 0 ICNTL(27) (experimental parameter): -8 ICNTL(28) (use parallel or sequential ordering): 1 ICNTL(29) (parallel 
ordering): 0 ICNTL(30) (user-specified set of entries in inv(A)): 0 ICNTL(31) (factors is discarded in the solve phase): 0 ICNTL(33) (compute determinant): 0 CNTL(1) (relative pivoting threshold): 0.01 CNTL(2) (stopping criterion of refinement): 1.49012e-08 CNTL(3) (absomumpste pivoting threshold): 0 CNTL(4) (vamumpse of static pivoting): -1 CNTL(5) (fixation for null pivots): 0 RINFO(1) (local estimated flops for the elimination after analysis): [0] 1.03709e+09 RINFO(2) (local estimated flops for the assembly after factorization): [0] 1.07576e+07 RINFO(3) (local estimated flops for the elimination after factorization): [0] 1.03736e+09 INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization): [0] 73 INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization): [0] 73 INFO(23) (num of pivots eliminated on this processor after factorization): [0] 30603 RINFOG(1) (global estimated flops for the elimination after analysis): 1.03709e+09 RINFOG(2) (global estimated flops for the assembly after factorization): 1.07576e+07 RINFOG(3) (global estimated flops for the elimination after factorization): 1.03736e+09 (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0,0)*(2^0) INFOG(3) (estimated real workspace for factors on all processors after analysis): 6039225 INFOG(4) (estimated integer workspace for factors on all processors after analysis): 352584 INFOG(5) (estimated maximum front size in the complete tree): 456 INFOG(6) (number of nodes in the complete tree): 2491 INFOG(7) (ordering option effectively use after analysis): 5 INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100 INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 6040215 INFOG(10) (total integer space store the matrix factors after factorization): 352602 INFOG(11) (order of largest frontal matrix after factorization): 456 INFOG(12) (number of off-diagonal pivots): 1207 INFOG(13) (number of delayed pivots after factorization): 9 INFOG(14) (number of memory compress after factorization): 0 INFOG(15) (number of steps of iterative refinement after solution): 0 INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 73 INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 73 INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 73 INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 73 INFOG(20) (estimated number of entries in the factors): 6039225 INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 63 INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 63 INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0 INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1 INFOG(25) (after factorization: number of pivots modified by static pivoting): 0 INFOG(28) (after factorization: number of null pivots encountered): 0 INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 6040215 INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 67, 67 INFOG(32) (after analysis: type of analysis done): 1 INFOG(33) (value used for ICNTL(8)): 7 INFOG(34) (exponent 
of the determinant if determinant is requested): 0 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30603, cols=30603, bs=3 total: nonzeros=815409, allocated nonzeros=815409 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 10201 nodes, limit used is 5 CONVERGED_TIME at time 1e-06 after 1 steps -------------- next part -------------- 0 TS dt 1e-06 time 0 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 6.107056925316e-03 2 SNES Function norm 1.519319591792e-10 3 SNES Function norm 9.070104116945e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.790429171165e+04 1 SNES Function norm 6.942541792651e-04 2 SNES Function norm 8.458781909516e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.154356516184e+05 1 SNES Function norm 2.287202942961e-03 2 SNES Function norm 6.585201377573e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= 324 family='arkimex' scheme=0:'3' dt=1.311e-07 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 1.030893399688e-02 2 SNES Function norm 1.227244086383e-10 3 SNES Function norm 8.680479900786e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.850186237744e+04 1 SNES Function norm 4.435591601555e-04 2 SNES Function norm 8.317684980141e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.165153426209e+05 1 SNES Function norm 1.813148503621e-03 2 SNES Function norm 5.613694043108e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 1.311e-07 wlte= 320 family='arkimex' scheme=0:'3' dt=1.724e-08 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 4.263053096085e-02 2 SNES Function norm 6.252919451896e-10 3 SNES Function norm 9.058384623510e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.858128091677e+04 1 SNES Function norm 3.963115544432e-04 2 SNES Function norm 8.278281256019e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166564177803e+05 1 SNES Function norm 9.631229966553e-04 2 SNES Function norm 5.708329543623e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 1.724e-08 wlte= 320 family='arkimex' scheme=0:'3' dt=2.269e-09 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 1.079124564076e-01 2 SNES Function norm 1.130538110070e-09 3 SNES Function norm 8.991479075319e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859174767183e+04 1 SNES Function norm 4.321261456929e-04 2 SNES Function norm 8.218414643337e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166749679782e+05 1 SNES Function norm 6.070076055571e-04 2 SNES Function norm 5.569380923621e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 2.269e-09 wlte= 320 family='arkimex' scheme=0:'3' dt=2.987e-10 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 2.312307641993e-02 2 SNES Function norm 2.026987092062e-10 3 SNES Function norm 8.926723703962e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859312557607e+04 1 SNES Function norm 4.828786679744e-04 2 SNES Function 
norm 8.341615291368e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166774092974e+05 1 SNES Function norm 1.835403825645e-03 2 SNES Function norm 5.577036922069e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 2.987e-10 wlte= 320 family='arkimex' scheme=0:'3' dt=3.931e-11 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 2.351722737234e-02 2 SNES Function norm 2.529720083595e-10 3 SNES Function norm 8.892352147112e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859330694478e+04 1 SNES Function norm 4.885408570662e-04 2 SNES Function norm 8.327396926987e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166777306269e+05 1 SNES Function norm 1.804733305674e-03 2 SNES Function norm 5.388931877150e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 3.931e-11 wlte= 320 family='arkimex' scheme=0:'3' dt=5.174e-12 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 9.516392283907e-02 2 SNES Function norm 1.084610044563e-09 3 SNES Function norm 9.117232285273e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859333081724e+04 1 SNES Function norm 1.951960746471e-04 2 SNES Function norm 8.022315535248e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166777729214e+05 1 SNES Function norm 1.707739879535e-03 2 SNES Function norm 6.069805583058e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 5.174e-12 wlte= 320 family='arkimex' scheme=0:'3' dt=6.810e-13 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 9.238561718262e-02 2 SNES Function norm 1.201780471186e-09 3 SNES Function norm 8.749890804482e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859333395939e+04 1 SNES Function norm 4.010933191742e-04 2 SNES Function norm 8.258990222529e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166777784888e+05 1 SNES Function norm 1.833489380639e-03 2 SNES Function norm 5.339548154287e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 6.810e-13 wlte= 320 family='arkimex' scheme=0:'3' dt=8.964e-14 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 7.725363551151e-02 2 SNES Function norm 3.035001046151e-10 3 SNES Function norm 9.104321363901e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 0 SNES Function norm 7.859333437279e+04 1 SNES Function norm 4.404284856037e-04 2 SNES Function norm 8.005734930179e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 1.166777792253e+05 1 SNES Function norm 9.764901758589e-04 2 SNES Function norm 5.365314630014e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 8.964e-14 wlte= 320 family='arkimex' scheme=0:'3' dt=1.180e-14 0 SNES Function norm 2.549981005316e+05 1 SNES Function norm 1.267027453118e-05 2 SNES Function norm 8.908505647078e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 7.859333442583e+04 1 SNES Function norm 4.668956513888e-04 2 SNES Function norm 8.348862687250e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 
SNES Function norm 1.166777793501e+05 1 SNES Function norm 9.432068723061e-04 2 SNES Function norm 5.390404010769e-11 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt 'basic': step 0 rejected t=0 + 1.180e-14 wlte= 320 family='arkimex' scheme=0:'3' dt=1.553e-15 DIVERGED_STEP_REJECTED at time 0 after 0 steps [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./planb on a arch-linux2-c-opt named localhost.localdomain by root Thu Jun 18 07:37:39 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --with-debugging=0 COPTFLAGS="-Ofast -march=native -mtune=native" CXXOPTFLAGS="-Ofast -march=native -mtune=native" FOPTFLAGS="-Ofast -march=native -mtune=native" [0]PETSC ERROR: #1 TSStep() line 3107 in /root/petsc/petsc-3.6.0/src/ts/interface/ts.c [0]PETSC ERROR: #2 TSSolve() line 3282 in /root/petsc/petsc-3.6.0/src/ts/interface/ts.c From balay at mcs.anl.gov Thu Jun 18 09:15:05 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 18 Jun 2015 09:15:05 -0500 Subject: [petsc-users] Reg : Petsc installation failed In-Reply-To: References: Message-ID: >>>>>>> Executing: /package/petsc/petsc-3.5.3/bin/win32fe/win32fe cl -c -o /cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers/conftest.o -I/cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers /cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers/conftest.c Possible ERROR while running compiler: exit code 50176 <<<<< For some reason the compiler is returning error codes. Just to eliminate cygwin from the equation - can you do the following: 1. reboot the machine 2. [make sure you don't start any cygwin processes] 3. run cygwin setup [it defaults to update mode] - and run it to completion. It should run rebaseall at the end. 4. Use petsc-3.6 And rerun configure [from compiler-cmd,bash per instructions] Satish On Thu, 18 Jun 2015, Manish Kumar K wrote: > Dear PETSc Team, > > > > I am configuring and trying to install PETSc libraries using Cygwin shell for windows > > > > I checked instructions at: > > http://www.mcs.anl.gov/petsc/documentation/installation.html#windows > > And invoked following command > > > > Step 1: > > ./configure --with-cc='win32fe cl' --with-fc=0 --download-f2cblaslapack=1 > > > C compiler you provided with -with-cc=win32fe cl does not work. > Cannot compile C with /package/petsc/petsc-3.5.3/bin/win32fe/win32fe cl. > > > > I am sending you the log file for this . > > > > Please help me out in this . > > > > Regards > > Manish K > > [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] > > L&T Technology Services Ltd > > www.LntTechservices.com > > This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. 
>
From jychang48 at gmail.com Thu Jun 18 10:44:34 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 18 Jun 2015 10:44:34 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM Message-ID:
I solved a transient diffusion problem across multiple cores using TAO BLMVM. When I simulate the same problem on different numbers of processing cores, the number of solve iterations changes quite drastically. The numerical solution is the same, but these changes are quite vast. I attached a PDF showing a comparison between KSP and TAO. KSP remains largely invariant with the number of processors, but TAO (with bounded constraints) fluctuates.
My question is, why is this happening? I understand that accumulation of numerical round-off may contribute to this, but the differences seem quite vast to me. My initial thought was that
1) the Hessian is only projected and not explicitly computed, which may have something to do with the rate of convergence
2) local problem size. Certain regions of my domain have different numbers of "violations" which need to be corrected by the bounded constraints, so the rate of convergence depends on how these regions are partitioned?
Any thoughts?
Thanks, Justin
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: chrom_solver-eps-converted-to.pdf Type: application/pdf Size: 129709 bytes Desc: not available URL:
From bsmith at mcs.anl.gov Thu Jun 18 11:08:23 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 18 Jun 2015 11:08:23 -0500 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: References: Message-ID: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov>
Here is the response from Emil, who made the changes to the code:
If you are solving a DAE then in the new version, we introduced a new flag that distinguishes between ODEs and DAEs, leading to different semantics. For DAEs this is needed because the problem may not have consistent initial conditions.
If solving DAEs, the user has to set the EquationType appropriately: e.g., ierr = TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr);
This is documented in the doc, but I didn't add it to the changelog (Jed pointed it out to me). Currently it only affects -ts_type arkimex. I'll work with Satish to add it to the changelog.
*If that's not the case:* let me know and we can dig deeper.
Emil
> On Jun 18, 2015, at 5:48 AM, Italo Tasso wrote: > > I just upgraded to 3.6.0 and my code stopped working. All dt are > rejected. I used the same configure line, same code, same everything.
> > With 3.5.4 I get: > > 0 TS dt 1e-06 time 0 > 0 SNES Function norm 2.549981005316e+05 > 1 SNES Function norm 6.107056905987e-03 > 2 SNES Function norm 1.483881932064e-10 > 3 SNES Function norm 9.122873794272e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 > 0 SNES Function norm 7.790429171165e+04 > 1 SNES Function norm 7.289068747803e-04 > 2 SNES Function norm 8.227639633330e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 1.154356516184e+05 > 1 SNES Function norm 2.309925413255e-03 > 2 SNES Function norm 6.382141981406e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 > > With 3.6.0 I get: > > 0 TS dt 1e-06 time 0 > 0 SNES Function norm 2.549981005316e+05 > 1 SNES Function norm 6.107056925316e-03 > 2 SNES Function norm 1.519319591792e-10 > 3 SNES Function norm 9.070104116945e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 > 0 SNES Function norm 7.790429171165e+04 > 1 SNES Function norm 6.942541792651e-04 > 2 SNES Function norm 8.458781909516e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 1.154356516184e+05 > 1 SNES Function norm 2.287202942961e-03 > 2 SNES Function norm 6.585201377573e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= 324 family='arkimex' scheme=0:'3' dt=1.311e-07 > > Any ideas? I attached the full output. > > Options I use: > > -ts_view -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor -ts_monitor -snes_monitor -snes_converged_reason -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps -snes_rtol 0 -snes_atol 1e-10 -snes_stol 0 > > From abhyshr at anl.gov Thu Jun 18 11:15:04 2015 From: abhyshr at anl.gov (Abhyankar, Shrirang G.) Date: Thu, 18 Jun 2015 16:15:04 +0000 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> References: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> Message-ID: Can the DAE equation type be supplied via a run-time option? Shri -----Original Message----- From: barry smith Date: Thursday, June 18, 2015 at 11:08 AM To: Italo Tasso , "Constantinescu, Emil M." Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 > > Here is the response from Emil who made the changes to the code > >If you are solving a DAE then in the new version, we introduced a new >flag that distinguishes between ODEs and DAEs leading to different >semantics. For DAEs this is needed b/c it may not have consistent initial >conditions. > >If solving DAEs, the user has to set the EquationType appropriately: >e.g., >ierr = TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr); > >This is documented in the doc, but I didn't add it to the changelog (Jed >pointed it out to me). > >Currently it only affects -ts_type arkimex. I'll work with Satish to add >it to the changelog. > >*If that's not the case:* let me know and we can dig deeper. > >Emil > >> On Jun 18, 2015, at 5:48 AM, Italo Tasso wrote: >> >> I just upgraded to 3.6.0 and my code stopped working. All dt are >>rejected. I used the same configure line, same code, same everything. 
>> >> With 3.5.4 I get: >> >> 0 TS dt 1e-06 time 0 >> 0 SNES Function norm 2.549981005316e+05 >> 1 SNES Function norm 6.107056905987e-03 >> 2 SNES Function norm 1.483881932064e-10 >> 3 SNES Function norm 9.122873794272e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >> 0 SNES Function norm 7.790429171165e+04 >> 1 SNES Function norm 7.289068747803e-04 >> 2 SNES Function norm 8.227639633330e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >> 0 SNES Function norm 1.154356516184e+05 >> 1 SNES Function norm 2.309925413255e-03 >> 2 SNES Function norm 6.382141981406e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >> TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 >>wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 >> >> With 3.6.0 I get: >> >> 0 TS dt 1e-06 time 0 >> 0 SNES Function norm 2.549981005316e+05 >> 1 SNES Function norm 6.107056925316e-03 >> 2 SNES Function norm 1.519319591792e-10 >> 3 SNES Function norm 9.070104116945e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >> 0 SNES Function norm 7.790429171165e+04 >> 1 SNES Function norm 6.942541792651e-04 >> 2 SNES Function norm 8.458781909516e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >> 0 SNES Function norm 1.154356516184e+05 >> 1 SNES Function norm 2.287202942961e-03 >> 2 SNES Function norm 6.585201377573e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >> TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= >> 324 family='arkimex' scheme=0:'3' dt=1.311e-07 >> >> Any ideas? I attached the full output. >> >> Options I use: >> >> -ts_view -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor >>-ts_monitor -snes_monitor -snes_converged_reason -ksp_type preonly >>-pc_type lu -pc_factor_mat_solver_package mumps -snes_rtol 0 -snes_atol >>1e-10 -snes_stol 0 >> >> > From emconsta at mcs.anl.gov Thu Jun 18 11:24:23 2015 From: emconsta at mcs.anl.gov (Emil Constantinescu) Date: Thu, 18 Jun 2015 11:24:23 -0500 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: References: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> Message-ID: <5582F0B7.60901@mcs.anl.gov> No, I'm not sure if it should. It belongs to the same category as "problem_type" == TS_LINEAR / == TS_NONLINEAR. Emil On 6/18/15 11:15 AM, Abhyankar, Shrirang G. wrote: > Can the DAE equation type be supplied via a run-time option? > > Shri > > -----Original Message----- > From: barry smith > Date: Thursday, June 18, 2015 at 11:08 AM > To: Italo Tasso , "Constantinescu, Emil M." > > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 > >> >> Here is the response from Emil who made the changes to the code >> >> If you are solving a DAE then in the new version, we introduced a new >> flag that distinguishes between ODEs and DAEs leading to different >> semantics. For DAEs this is needed b/c it may not have consistent initial >> conditions. >> >> If solving DAEs, the user has to set the EquationType appropriately: >> e.g., >> ierr = TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr); >> >> This is documented in the doc, but I didn't add it to the changelog (Jed >> pointed it out to me). >> >> Currently it only affects -ts_type arkimex. I'll work with Satish to add >> it to the changelog. >> >> *If that's not the case:* let me know and we can dig deeper. 
>> >> Emil >> >>> On Jun 18, 2015, at 5:48 AM, Italo Tasso wrote: >>> >>> I just upgraded to 3.6.0 and my code stopped working. All dt are >>> rejected. I used the same configure line, same code, same everything. >>> >>> With 3.5.4 I get: >>> >>> 0 TS dt 1e-06 time 0 >>> 0 SNES Function norm 2.549981005316e+05 >>> 1 SNES Function norm 6.107056905987e-03 >>> 2 SNES Function norm 1.483881932064e-10 >>> 3 SNES Function norm 9.122873794272e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >>> 0 SNES Function norm 7.790429171165e+04 >>> 1 SNES Function norm 7.289068747803e-04 >>> 2 SNES Function norm 8.227639633330e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>> 0 SNES Function norm 1.154356516184e+05 >>> 1 SNES Function norm 2.309925413255e-03 >>> 2 SNES Function norm 6.382141981406e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>> TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 >>> wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 >>> >>> With 3.6.0 I get: >>> >>> 0 TS dt 1e-06 time 0 >>> 0 SNES Function norm 2.549981005316e+05 >>> 1 SNES Function norm 6.107056925316e-03 >>> 2 SNES Function norm 1.519319591792e-10 >>> 3 SNES Function norm 9.070104116945e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >>> 0 SNES Function norm 7.790429171165e+04 >>> 1 SNES Function norm 6.942541792651e-04 >>> 2 SNES Function norm 8.458781909516e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>> 0 SNES Function norm 1.154356516184e+05 >>> 1 SNES Function norm 2.287202942961e-03 >>> 2 SNES Function norm 6.585201377573e-11 >>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>> TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= >>> 324 family='arkimex' scheme=0:'3' dt=1.311e-07 >>> >>> Any ideas? I attached the full output. >>> >>> Options I use: >>> >>> -ts_view -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor >>> -ts_monitor -snes_monitor -snes_converged_reason -ksp_type preonly >>> -pc_type lu -pc_factor_mat_solver_package mumps -snes_rtol 0 -snes_atol >>> 1e-10 -snes_stol 0 >>> >>> >> > From jason.sarich at gmail.com Thu Jun 18 12:15:06 2015 From: jason.sarich at gmail.com (Jason Sarich) Date: Thu, 18 Jun 2015 12:15:06 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: Hi Justin, I can't tell for sure why this is happening, have you tried using quad precision to make sure that numerical cutoffs isn't the problem? 1 The Hessian being approximate and the resulting implicit computation is the source of the cutoff, but would not be causing different convergence rates in infinite precision. 2 the local size may affect load balancing but not the resulting norms/convergence rate. Jason On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang wrote: > I solved a transient diffusion across multiple cores using TAO BLMVM. > When I simulate the same problem but on different numbers of processing > cores, the number of solve iterations change quite drastically. The > numerical solution is the same, but these changes are quite vast. I > attached a PDF showing a comparison between KSP and TAO. KSP remains > largely invariant with number of processors but TAO (with bounded > constraints) fluctuates. > > My question is, why is this happening? I understand that accumulation of > numerical round-offs may attribute to this, but the differences seem quite > vast to me. 
My initial thought was that > > 1) the Hessian is only projected and not explicitly computed, which may > have something to do with the rate of convergence > > 2) local problem size. Certain regions of my domain have different number > of "violations" which need to be corrected by the bounded constraints so > the rate of convergence depends on how these regions are partitioned? > > Any thoughts? > > Thanks, > Justin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 18 13:45:23 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 18 Jun 2015 13:45:23 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich wrote: > Hi Justin, > > I can't tell for sure why this is happening, have you tried using quad > precision to make sure that numerical cutoffs isn't the problem? > > 1 The Hessian being approximate and the resulting implicit computation is > the source of the cutoff, but would not be causing different convergence > rates in infinite precision. > > 2 the local size may affect load balancing but not the resulting > norms/convergence rate. > This sounds to be like the preconditioner is dependent on the partition. Can you send -tao_view -snes_view Matt > Jason > > > On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang > wrote: > >> I solved a transient diffusion across multiple cores using TAO BLMVM. >> When I simulate the same problem but on different numbers of processing >> cores, the number of solve iterations change quite drastically. The >> numerical solution is the same, but these changes are quite vast. I >> attached a PDF showing a comparison between KSP and TAO. KSP remains >> largely invariant with number of processors but TAO (with bounded >> constraints) fluctuates. >> >> My question is, why is this happening? I understand that accumulation of >> numerical round-offs may attribute to this, but the differences seem quite >> vast to me. My initial thought was that >> >> 1) the Hessian is only projected and not explicitly computed, which may >> have something to do with the rate of convergence >> >> 2) local problem size. Certain regions of my domain have different number >> of "violations" which need to be corrected by the bounded constraints so >> the rate of convergence depends on how these regions are partitioned? >> >> Any thoughts? >> >> Thanks, >> Justin >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.sarich at gmail.com Thu Jun 18 13:52:48 2015 From: jason.sarich at gmail.com (Jason Sarich) Date: Thu, 18 Jun 2015 13:52:48 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: BLMVM doesn't use a KSP or preconditioner, it updates using the L-BFGS-B formula On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley wrote: > On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich > wrote: > >> Hi Justin, >> >> I can't tell for sure why this is happening, have you tried using quad >> precision to make sure that numerical cutoffs isn't the problem? >> >> 1 The Hessian being approximate and the resulting implicit computation >> is the source of the cutoff, but would not be causing different convergence >> rates in infinite precision. 
>> >> 2 the local size may affect load balancing but not the resulting >> norms/convergence rate. >> > > This sounds to be like the preconditioner is dependent on the partition. > Can you send -tao_view -snes_view > > Matt > > >> Jason >> >> >> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >> wrote: >> >>> I solved a transient diffusion across multiple cores using TAO BLMVM. >>> When I simulate the same problem but on different numbers of processing >>> cores, the number of solve iterations change quite drastically. The >>> numerical solution is the same, but these changes are quite vast. I >>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>> largely invariant with number of processors but TAO (with bounded >>> constraints) fluctuates. >>> >>> My question is, why is this happening? I understand that accumulation of >>> numerical round-offs may attribute to this, but the differences seem quite >>> vast to me. My initial thought was that >>> >>> 1) the Hessian is only projected and not explicitly computed, which >>> may have something to do with the rate of convergence >>> >>> 2) local problem size. Certain regions of my domain have different >>> number of "violations" which need to be corrected by the bounded >>> constraints so the rate of convergence depends on how these regions are >>> partitioned? >>> >>> Any thoughts? >>> >>> Thanks, >>> Justin >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 18 15:50:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 18 Jun 2015 15:50:17 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich wrote: > BLMVM doesn't use a KSP or preconditioner, it updates using the L-BFGS-B > formula > Then this sounds like a bug, unless one of the constants is partition dependent. Matt > On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley > wrote: > >> On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich >> wrote: >> >>> Hi Justin, >>> >>> I can't tell for sure why this is happening, have you tried using quad >>> precision to make sure that numerical cutoffs isn't the problem? >>> >>> 1 The Hessian being approximate and the resulting implicit computation >>> is the source of the cutoff, but would not be causing different convergence >>> rates in infinite precision. >>> >>> 2 the local size may affect load balancing but not the resulting >>> norms/convergence rate. >>> >> >> This sounds to be like the preconditioner is dependent on the >> partition. Can you send -tao_view -snes_view >> >> Matt >> >> >>> Jason >>> >>> >>> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >>> wrote: >>> >>>> I solved a transient diffusion across multiple cores using TAO BLMVM. >>>> When I simulate the same problem but on different numbers of processing >>>> cores, the number of solve iterations change quite drastically. The >>>> numerical solution is the same, but these changes are quite vast. I >>>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>>> largely invariant with number of processors but TAO (with bounded >>>> constraints) fluctuates. >>>> >>>> My question is, why is this happening? 
I understand that accumulation >>>> of numerical round-offs may attribute to this, but the differences seem >>>> quite vast to me. My initial thought was that >>>> >>>> 1) the Hessian is only projected and not explicitly computed, which >>>> may have something to do with the rate of convergence >>>> >>>> 2) local problem size. Certain regions of my domain have different >>>> number of "violations" which need to be corrected by the bounded >>>> constraints so the rate of convergence depends on how these regions are >>>> partitioned? >>>> >>>> Any thoughts? >>>> >>>> Thanks, >>>> Justin >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From italo at tasso.com.br Thu Jun 18 16:10:28 2015 From: italo at tasso.com.br (Italo Tasso) Date: Thu, 18 Jun 2015 18:10:28 -0300 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: <5582F0B7.60901@mcs.anl.gov> References: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> <5582F0B7.60901@mcs.anl.gov> Message-ID: Thank you Barry and Emil. It works, but it takes twice as many steps than before. Was it wrong before? Should I have been using this option all along? I am solving the Navier-Stokes equations, full implicit and non-linear. I also noticed two things: In the 3.6.0 output, ts_monitor skips timestep #1. If I use the equation type in 3.5.4, I get segmentation fault. I attached the outputs. On Thu, Jun 18, 2015 at 1:24 PM, Emil Constantinescu wrote: > No, I'm not sure if it should. It belongs to the same category as > "problem_type" == TS_LINEAR / == TS_NONLINEAR. > > Emil > > > On 6/18/15 11:15 AM, Abhyankar, Shrirang G. wrote: > >> Can the DAE equation type be supplied via a run-time option? >> >> Shri >> >> -----Original Message----- >> From: barry smith >> Date: Thursday, June 18, 2015 at 11:08 AM >> To: Italo Tasso , "Constantinescu, Emil M." >> >> Cc: "petsc-users at mcs.anl.gov" >> Subject: Re: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 >> >> >>> Here is the response from Emil who made the changes to the code >>> >>> If you are solving a DAE then in the new version, we introduced a new >>> flag that distinguishes between ODEs and DAEs leading to different >>> semantics. For DAEs this is needed b/c it may not have consistent initial >>> conditions. >>> >>> If solving DAEs, the user has to set the EquationType appropriately: >>> e.g., >>> ierr = TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr); >>> >>> This is documented in the doc, but I didn't add it to the changelog (Jed >>> pointed it out to me). >>> >>> Currently it only affects -ts_type arkimex. I'll work with Satish to add >>> it to the changelog. >>> >>> *If that's not the case:* let me know and we can dig deeper. >>> >>> Emil >>> >>> On Jun 18, 2015, at 5:48 AM, Italo Tasso wrote: >>>> >>>> I just upgraded to 3.6.0 and my code stopped working. All dt are >>>> rejected. I used the same configure line, same code, same everything. 
>>>> >>>> With 3.5.4 I get: >>>> >>>> 0 TS dt 1e-06 time 0 >>>> 0 SNES Function norm 2.549981005316e+05 >>>> 1 SNES Function norm 6.107056905987e-03 >>>> 2 SNES Function norm 1.483881932064e-10 >>>> 3 SNES Function norm 9.122873794272e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >>>> 0 SNES Function norm 7.790429171165e+04 >>>> 1 SNES Function norm 7.289068747803e-04 >>>> 2 SNES Function norm 8.227639633330e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>>> 0 SNES Function norm 1.154356516184e+05 >>>> 1 SNES Function norm 2.309925413255e-03 >>>> 2 SNES Function norm 6.382141981406e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>>> TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 >>>> wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 >>>> >>>> With 3.6.0 I get: >>>> >>>> 0 TS dt 1e-06 time 0 >>>> 0 SNES Function norm 2.549981005316e+05 >>>> 1 SNES Function norm 6.107056925316e-03 >>>> 2 SNES Function norm 1.519319591792e-10 >>>> 3 SNES Function norm 9.070104116945e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 3 >>>> 0 SNES Function norm 7.790429171165e+04 >>>> 1 SNES Function norm 6.942541792651e-04 >>>> 2 SNES Function norm 8.458781909516e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>>> 0 SNES Function norm 1.154356516184e+05 >>>> 1 SNES Function norm 2.287202942961e-03 >>>> 2 SNES Function norm 6.585201377573e-11 >>>> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 >>>> TSAdapt 'basic': step 0 rejected t=0 + 1.000e-06 wlte= >>>> 324 family='arkimex' scheme=0:'3' dt=1.311e-07 >>>> >>>> Any ideas? I attached the full output. >>>> >>>> Options I use: >>>> >>>> -ts_view -ts_type arkimex -ts_arkimex_fully_implicit -ts_adapt_monitor >>>> -ts_monitor -snes_monitor -snes_converged_reason -ksp_type preonly >>>> -pc_type lu -pc_factor_mat_solver_package mumps -snes_rtol 0 -snes_atol >>>> 1e-10 -snes_stol 0 >>>> >>>> >>>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- 0 TS dt 1e-06 time 0 TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=0.723 family='arkimex' scheme=0:'1bee' dt=1.059e-06 TSAdapt 'basic': step 1 accepted t=1e-06 + 1.059e-06 wlte=0.000706 family='arkimex' scheme=0:'3' dt=1.059e-05 2 TS dt 1.05853e-05 time 2.05853e-06 TSAdapt 'basic': step 2 accepted t=2.05853e-06+ 1.059e-05 wlte=0.548 family='arkimex' scheme=0:'3' dt=1.164e-05 3 TS dt 1.16435e-05 time 1.26438e-05 TSAdapt 'basic': step 3 accepted t=1.26438e-05+ 1.164e-05 wlte=0.579 family='arkimex' scheme=0:'3' dt=1.257e-05 4 TS dt 1.25697e-05 time 2.42873e-05 TSAdapt 'basic': step 4 accepted t=2.42873e-05+ 1.257e-05 wlte=0.559 family='arkimex' scheme=0:'3' dt=1.373e-05 5 TS dt 1.37332e-05 time 3.6857e-05 TSAdapt 'basic': step 5 accepted t=3.6857e-05 + 1.373e-05 wlte=0.536 family='arkimex' scheme=0:'3' dt=1.522e-05 6 TS dt 1.52176e-05 time 5.05902e-05 TSAdapt 'basic': step 6 accepted t=5.05902e-05+ 1.522e-05 wlte=0.511 family='arkimex' scheme=0:'3' dt=1.713e-05 7 TS dt 1.71336e-05 time 6.58078e-05 TSAdapt 'basic': step 7 accepted t=6.58078e-05+ 1.713e-05 wlte=0.484 family='arkimex' scheme=0:'3' dt=1.964e-05 8 TS dt 1.96364e-05 time 8.29414e-05 TSAdapt 'basic': step 8 accepted t=8.29414e-05+ 1.964e-05 wlte=0.456 family='arkimex' scheme=0:'3' dt=2.295e-05 9 TS dt 2.29535e-05 time 0.000102578 TSAdapt 'basic': step 9 accepted t=0.000102578+ 2.295e-05 wlte=0.429 family='arkimex' scheme=0:'3' dt=2.740e-05 10 TS dt 2.7397e-05 time 0.000125531 TSAdapt 'basic': step 10 accepted t=0.000125531+ 2.740e-05 wlte=0.403 family='arkimex' scheme=0:'3' dt=3.339e-05 11 TS dt 3.33874e-05 time 0.000152928 TSAdapt 'basic': step 11 accepted t=0.000152928+ 3.339e-05 wlte=0.382 family='arkimex' scheme=0:'3' dt=4.142e-05 12 TS dt 4.14194e-05 time 0.000186316 TSAdapt 'basic': step 12 accepted t=0.000186316+ 4.142e-05 wlte=0.369 family='arkimex' scheme=0:'3' dt=5.199e-05 13 TS dt 5.19872e-05 time 0.000227735 TSAdapt 'basic': step 13 accepted t=0.000227735+ 5.199e-05 wlte=0.364 family='arkimex' scheme=0:'3' dt=6.553e-05 14 TS dt 6.55288e-05 time 0.000279722 TSAdapt 'basic': step 14 accepted t=0.000279722+ 6.553e-05 wlte=0.365 family='arkimex' scheme=0:'3' dt=8.254e-05 15 TS dt 8.25371e-05 time 0.000345251 TSAdapt 'basic': step 15 accepted t=0.000345251+ 8.254e-05 wlte=0.367 family='arkimex' scheme=0:'3' dt=1.037e-04 16 TS dt 0.00010372 time 0.000427788 TSAdapt 'basic': step 16 accepted t=0.000427788+ 1.037e-04 wlte=0.369 family='arkimex' scheme=0:'3' dt=1.301e-04 17 TS dt 0.000130137 time 0.000531508 TSAdapt 'basic': step 17 accepted t=0.000531508+ 1.301e-04 wlte= 0.37 family='arkimex' scheme=0:'3' dt=1.631e-04 18 TS dt 0.000163074 time 0.000661645 TSAdapt 'basic': step 18 accepted t=0.000661645+ 1.631e-04 wlte=0.372 family='arkimex' scheme=0:'3' dt=2.040e-04 19 TS dt 0.000204046 time 0.00082472 TSAdapt 'basic': step 19 accepted t=0.00082472 + 2.040e-04 wlte=0.374 family='arkimex' scheme=0:'3' dt=2.548e-04 20 TS dt 0.000254802 time 0.00102877 TSAdapt 'basic': step 20 accepted t=0.00102877 + 2.548e-04 wlte=0.377 family='arkimex' scheme=0:'3' dt=3.174e-04 21 TS dt 0.000317385 time 0.00128357 TSAdapt 'basic': step 21 accepted t=0.00128357 + 3.174e-04 wlte=0.381 family='arkimex' scheme=0:'3' dt=3.940e-04 22 TS dt 0.000394007 time 0.00160095 TSAdapt 'basic': step 22 accepted t=0.00160095 + 3.940e-04 wlte=0.386 family='arkimex' scheme=0:'3' dt=4.870e-04 23 TS dt 0.000486973 time 0.00199496 TSAdapt 'basic': step 23 accepted t=0.00199496 + 4.870e-04 wlte=0.393 family='arkimex' 
scheme=0:'3' dt=5.985e-04 24 TS dt 0.000598494 time 0.00248193 TSAdapt 'basic': step 24 accepted t=0.00248193 + 5.985e-04 wlte=0.401 family='arkimex' scheme=0:'3' dt=7.306e-04 25 TS dt 0.000730558 time 0.00308043 TSAdapt 'basic': step 25 accepted t=0.00308043 + 7.306e-04 wlte=0.409 family='arkimex' scheme=0:'3' dt=8.854e-04 26 TS dt 0.000885429 time 0.00381099 TSAdapt 'basic': step 26 accepted t=0.00381099 + 8.854e-04 wlte=0.408 family='arkimex' scheme=0:'3' dt=1.075e-03 27 TS dt 0.00107468 time 0.00469641 TSAdapt 'basic': step 27 accepted t=0.00469641 + 1.075e-03 wlte=0.347 family='arkimex' scheme=0:'3' dt=1.376e-03 28 TS dt 0.00137622 time 0.0057711 TSAdapt 'basic': step 28 accepted t=0.0057711 + 1.376e-03 wlte=0.326 family='arkimex' scheme=0:'3' dt=1.799e-03 29 TS dt 0.00179877 time 0.00714732 TSAdapt 'basic': step 29 accepted t=0.00714732 + 1.799e-03 wlte=0.313 family='arkimex' scheme=0:'3' dt=2.384e-03 30 TS dt 0.00238444 time 0.00894608 TSAdapt 'basic': step 30 accepted t=0.00894608 + 2.384e-03 wlte=0.307 family='arkimex' scheme=0:'3' dt=3.182e-03 31 TS dt 0.00318229 time 0.0113305 TSAdapt 'basic': step 31 accepted t=0.0113305 + 3.182e-03 wlte=0.301 family='arkimex' scheme=0:'3' dt=4.273e-03 32 TS dt 0.00427274 time 0.0145128 TSAdapt 'basic': step 32 accepted t=0.0145128 + 4.273e-03 wlte=0.296 family='arkimex' scheme=0:'3' dt=5.771e-03 33 TS dt 0.00577055 time 0.0187856 TSAdapt 'basic': step 33 accepted t=0.0187856 + 5.771e-03 wlte=0.291 family='arkimex' scheme=0:'3' dt=7.840e-03 34 TS dt 0.00784028 time 0.0245561 TSAdapt 'basic': step 34 accepted t=0.0245561 + 7.840e-03 wlte=0.286 family='arkimex' scheme=0:'3' dt=1.071e-02 35 TS dt 0.0107123 time 0.0323964 TSAdapt 'basic': step 35 accepted t=0.0323964 + 1.071e-02 wlte=0.269 family='arkimex' scheme=0:'3' dt=1.493e-02 36 TS dt 0.0149302 time 0.0431087 TSAdapt 'basic': step 36 accepted t=0.0431087 + 1.493e-02 wlte= 0.23 family='arkimex' scheme=0:'3' dt=2.193e-02 37 TS dt 0.0219264 time 0.0580389 TSAdapt 'basic': step 37 accepted t=0.0580389 + 2.193e-02 wlte= 0.19 family='arkimex' scheme=0:'3' dt=3.435e-02 38 TS dt 0.0343516 time 0.0799653 TSAdapt 'basic': step 38 accepted t=0.0799653 + 3.435e-02 wlte=0.132 family='arkimex' scheme=0:'3' dt=6.071e-02 39 TS dt 0.060712 time 0.114317 TSAdapt 'basic': step 39 accepted t=0.114317 + 6.071e-02 wlte=0.0437 family='arkimex' scheme=0:'3' dt=1.551e-01 40 TS dt 0.155117 time 0.175029 TSAdapt 'basic': step 40 accepted t=0.175029 + 1.551e-01 wlte=0.00333 family='arkimex' scheme=0:'3' dt=9.351e-01 41 TS dt 0.935113 time 0.330146 TSAdapt 'basic': step 41 accepted t=0.330146 + 9.351e-01 wlte=0.000792 family='arkimex' scheme=0:'3' dt=8.735e+00 42 TS dt 8.73474 time 1.26526 TSAdapt 'basic': step 42 accepted t=1.26526 + 8.735e+00 wlte=4.42e-05 family='arkimex' scheme=0:'3' dt=8.735e+01 43 TS dt 87.3474 time 10 CONVERGED_TIME at time 10 after 43 steps -------------- next part -------------- 0 TS dt 1e-06 time 0 TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=4.64e-05 family='arkimex' scheme=0:'3' dt=1.000e-05 1 TS dt 1e-05 time 1e-06 TSAdapt 'basic': step 1 accepted t=1e-06 + 1.000e-05 wlte=0.0274 family='arkimex' scheme=0:'3' dt=2.984e-05 2 TS dt 2.98427e-05 time 1.1e-05 TSAdapt 'basic': step 2 accepted t=1.1e-05 + 2.984e-05 wlte=0.214 family='arkimex' scheme=0:'3' dt=4.489e-05 3 TS dt 4.48912e-05 time 4.08427e-05 TSAdapt 'basic': step 3 accepted t=4.08427e-05+ 4.489e-05 wlte= 0.16 family='arkimex' scheme=0:'3' dt=7.443e-05 4 TS dt 7.44292e-05 time 8.57339e-05 TSAdapt 'basic': step 4 accepted 
t=8.57339e-05+ 7.443e-05 wlte=0.128 family='arkimex' scheme=0:'3' dt=1.331e-04 5 TS dt 0.000133086 time 0.000160163 TSAdapt 'basic': step 5 accepted t=0.000160163+ 1.331e-04 wlte=0.132 family='arkimex' scheme=0:'3' dt=2.350e-04 6 TS dt 0.000234956 time 0.000293249 TSAdapt 'basic': step 6 accepted t=0.000293249+ 2.350e-04 wlte=0.143 family='arkimex' scheme=0:'3' dt=4.047e-04 7 TS dt 0.000404705 time 0.000528205 TSAdapt 'basic': step 7 accepted t=0.000528205+ 4.047e-04 wlte=0.152 family='arkimex' scheme=0:'3' dt=6.819e-04 8 TS dt 0.00068193 time 0.000932909 TSAdapt 'basic': step 8 accepted t=0.000932909+ 6.819e-04 wlte=0.162 family='arkimex' scheme=0:'3' dt=1.126e-03 9 TS dt 0.0011258 time 0.00161484 TSAdapt 'basic': step 9 accepted t=0.00161484 + 1.126e-03 wlte=0.172 family='arkimex' scheme=0:'3' dt=1.822e-03 10 TS dt 0.00182154 time 0.00274064 TSAdapt 'basic': step 10 accepted t=0.00274064 + 1.822e-03 wlte=0.183 family='arkimex' scheme=0:'3' dt=2.889e-03 11 TS dt 0.0028885 time 0.00456217 TSAdapt 'basic': step 11 accepted t=0.00456217 + 2.889e-03 wlte=0.195 family='arkimex' scheme=0:'3' dt=4.485e-03 12 TS dt 0.00448532 time 0.00745068 TSAdapt 'basic': step 12 accepted t=0.00745068 + 4.485e-03 wlte=0.209 family='arkimex' scheme=0:'3' dt=6.800e-03 13 TS dt 0.00679993 time 0.011936 TSAdapt 'basic': step 13 accepted t=0.011936 + 6.800e-03 wlte=0.229 family='arkimex' scheme=0:'3' dt=1.001e-02 14 TS dt 0.0100082 time 0.0187359 TSAdapt 'basic': step 14 accepted t=0.0187359 + 1.001e-02 wlte=0.251 family='arkimex' scheme=0:'3' dt=1.428e-02 15 TS dt 0.0142831 time 0.0287441 TSAdapt 'basic': step 15 accepted t=0.0287441 + 1.428e-02 wlte=0.253 family='arkimex' scheme=0:'3' dt=2.031e-02 16 TS dt 0.0203146 time 0.0430273 TSAdapt 'basic': step 16 accepted t=0.0430273 + 2.031e-02 wlte=0.222 family='arkimex' scheme=0:'3' dt=3.020e-02 17 TS dt 0.0301966 time 0.0633418 TSAdapt 'basic': step 17 accepted t=0.0633418 + 3.020e-02 wlte=0.155 family='arkimex' scheme=0:'3' dt=5.059e-02 18 TS dt 0.0505899 time 0.0935385 TSAdapt 'basic': step 18 accepted t=0.0935385 + 5.059e-02 wlte=0.0653 family='arkimex' scheme=0:'3' dt=1.131e-01 19 TS dt 0.113076 time 0.144128 TSAdapt 'basic': step 19 accepted t=0.144128 + 1.131e-01 wlte=0.00237 family='arkimex' scheme=0:'3' dt=7.635e-01 20 TS dt 0.763492 time 0.257205 TSAdapt 'basic': step 20 accepted t=0.257205 + 7.635e-01 wlte=0.000642 family='arkimex' scheme=0:'3' dt=7.635e+00 21 TS dt 7.63492 time 1.0207 TSAdapt 'basic': step 21 accepted t=1.0207 + 7.635e+00 wlte=4.38e-05 family='arkimex' scheme=0:'3' dt=1.344e+00 22 TS dt 1.34439 time 8.65561 TSAdapt 'basic': step 22 accepted t=8.65561 + 1.344e+00 wlte=2.85e-07 family='arkimex' scheme=0:'3' dt=1.344e+01 23 TS dt 13.4439 time 10 CONVERGED_TIME at time 10 after 23 steps -------------- next part -------------- 0 TS dt 1e-06 time 0 TSAdapt 'basic': step 0 accepted t=0 + 1.000e-06 wlte=0.723 family='arkimex' scheme=0:'1bee' dt=1.059e-06 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. 
[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 [0]PETSC ERROR: ./planb on a arch-linux2-c-opt named localhost.localdomain by root Thu Jun 18 17:53:30 2015 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-scalapack --download-parmetis --download-metis --with-debugging=0 COPTFLAGS="-Ofast -march=native -mtune=native" CXXOPTFLAGS="-Ofast -march=native -mtune=native" FOPTFLAGS="-Ofast -march=native -mtune=native" [0]PETSC ERROR: #1 User provided function() line 0 in unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 29812 RUNNING AT localhost.localdomain = EXIT CODE: 59 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== From emconsta at mcs.anl.gov Thu Jun 18 20:58:06 2015 From: emconsta at mcs.anl.gov (Emil Constantinescu) Date: Thu, 18 Jun 2015 20:58:06 -0500 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: References: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> <5582F0B7.60901@mcs.anl.gov> Message-ID: <5583772E.7050308@mcs.anl.gov> The results enclosed for 3.5.4 (354.txt) are not correct. I can tell that by looking at the first step. Most implicit schemes in arkimex have the first stage explicit and therefore need to be "started" somehow when dealing with algebraic constraints. In this case, a different scheme is used to start the integration. Note that in 3.6 and 3.5.4_SEGV the first step uses scheme 1bee (scheme=0:'1bee') and then switches to '3'. This is the correct sequence otherwise inconsistent initial conditions are inadvertently used or the solver thinks it solves an ODE. I'm not sure why it's crashing in 3.5.4_SEGV. Configuring with --with-debugging=yes may reveal more. Emil On 6/18/15 4:10 PM, Italo Tasso wrote: > Thank you Barry and Emil. > > It works, but it takes twice as many steps than before. > > Was it wrong before? Should I have been using this option all along? > > I am solving the Navier-Stokes equations, full implicit and non-linear. > > I also noticed two things: > > In the 3.6.0 output, ts_monitor skips timestep #1. > > If I use the equation type in 3.5.4, I get segmentation fault. > > I attached the outputs. > > > > > > On Thu, Jun 18, 2015 at 1:24 PM, Emil Constantinescu > > wrote: > > No, I'm not sure if it should. It belongs to the same category as > "problem_type" == TS_LINEAR / == TS_NONLINEAR. > > Emil > > > On 6/18/15 11:15 AM, Abhyankar, Shrirang G. wrote: > > Can the DAE equation type be supplied via a run-time option? > > Shri > > -----Original Message----- > From: barry smith > > Date: Thursday, June 18, 2015 at 11:08 AM > To: Italo Tasso >, "Constantinescu, Emil M." 
> > > Cc: "petsc-users at mcs.anl.gov " > > > Subject: Re: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 > > > Here is the response from Emil who made the changes to > the code > > If you are solving a DAE then in the new version, we > introduced a new > flag that distinguishes between ODEs and DAEs leading to > different > semantics. For DAEs this is needed b/c it may not have > consistent initial > conditions. > > If solving DAEs, the user has to set the EquationType > appropriately: > e.g., > ierr = > TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr); > > This is documented in the doc, but I didn't add it to the > changelog (Jed > pointed it out to me). > > Currently it only affects -ts_type arkimex. I'll work with > Satish to add > it to the changelog. > > *If that's not the case:* let me know and we can dig deeper. > > Emil > > On Jun 18, 2015, at 5:48 AM, Italo Tasso > > wrote: > > I just upgraded to 3.6.0 and my code stopped working. > All dt are > rejected. I used the same configure line, same code, > same everything. > > With 3.5.4 I get: > > 0 TS dt 1e-06 time 0 > 0 SNES Function norm 2.549981005316e+05 > 1 SNES Function norm 6.107056905987e-03 > 2 SNES Function norm 1.483881932064e-10 > 3 SNES Function norm 9.122873794272e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 3 > 0 SNES Function norm 7.790429171165e+04 > 1 SNES Function norm 7.289068747803e-04 > 2 SNES Function norm 8.227639633330e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 2 > 0 SNES Function norm 1.154356516184e+05 > 1 SNES Function norm 2.309925413255e-03 > 2 SNES Function norm 6.382141981406e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 2 > TSAdapt 'basic': step 0 accepted t=0 > + 1.000e-06 > wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 > > With 3.6.0 I get: > > 0 TS dt 1e-06 time 0 > 0 SNES Function norm 2.549981005316e+05 > 1 SNES Function norm 6.107056925316e-03 > 2 SNES Function norm 1.519319591792e-10 > 3 SNES Function norm 9.070104116945e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 3 > 0 SNES Function norm 7.790429171165e+04 > 1 SNES Function norm 6.942541792651e-04 > 2 SNES Function norm 8.458781909516e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 2 > 0 SNES Function norm 1.154356516184e+05 > 1 SNES Function norm 2.287202942961e-03 > 2 SNES Function norm 6.585201377573e-11 > Nonlinear solve converged due to CONVERGED_FNORM_ABS > iterations 2 > TSAdapt 'basic': step 0 rejected t=0 > + 1.000e-06 wlte= > 324 family='arkimex' scheme=0:'3' dt=1.311e-07 > > Any ideas? I attached the full output. > > Options I use: > > -ts_view -ts_type arkimex -ts_arkimex_fully_implicit > -ts_adapt_monitor > -ts_monitor -snes_monitor -snes_converged_reason > -ksp_type preonly > -pc_type lu -pc_factor_mat_solver_package mumps > -snes_rtol 0 -snes_atol > 1e-10 -snes_stol 0 > > > > > > From italo at tasso.com.br Thu Jun 18 22:31:28 2015 From: italo at tasso.com.br (Italo Tasso) Date: Fri, 19 Jun 2015 00:31:28 -0300 Subject: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 In-Reply-To: <5583772E.7050308@mcs.anl.gov> References: <718CB9DE-700D-4FE6-86E1-FC5401F3940F@mcs.anl.gov> <5582F0B7.60901@mcs.anl.gov> <5583772E.7050308@mcs.anl.gov> Message-ID: I understand. Thanks again. On Thu, Jun 18, 2015 at 10:58 PM, Emil Constantinescu wrote: > The results enclosed for 3.5.4 (354.txt) are not correct. I can tell that > by looking at the first step. 
Most implicit schemes in arkimex have the > first stage explicit and therefore need to be "started" somehow when > dealing with algebraic constraints. In this case, a different scheme is > used to start the integration. > > Note that in 3.6 and 3.5.4_SEGV the first step uses scheme 1bee > (scheme=0:'1bee') and then switches to '3'. This is the correct sequence > otherwise inconsistent initial conditions are inadvertently used or the > solver thinks it solves an ODE. I'm not sure why it's crashing in > 3.5.4_SEGV. Configuring with --with-debugging=yes may reveal more. > > Emil > > On 6/18/15 4:10 PM, Italo Tasso wrote: > >> Thank you Barry and Emil. >> >> It works, but it takes twice as many steps than before. >> >> Was it wrong before? Should I have been using this option all along? >> >> I am solving the Navier-Stokes equations, full implicit and non-linear. >> >> I also noticed two things: >> >> In the 3.6.0 output, ts_monitor skips timestep #1. >> >> If I use the equation type in 3.5.4, I get segmentation fault. >> >> I attached the outputs. >> >> >> >> >> >> On Thu, Jun 18, 2015 at 1:24 PM, Emil Constantinescu >> > wrote: >> >> No, I'm not sure if it should. It belongs to the same category as >> "problem_type" == TS_LINEAR / == TS_NONLINEAR. >> >> Emil >> >> >> On 6/18/15 11:15 AM, Abhyankar, Shrirang G. wrote: >> >> Can the DAE equation type be supplied via a run-time option? >> >> Shri >> >> -----Original Message----- >> From: barry smith > >> >> Date: Thursday, June 18, 2015 at 11:08 AM >> To: Italo Tasso > >, "Constantinescu, Emil M." >> > >> Cc: "petsc-users at mcs.anl.gov " >> > >> Subject: Re: [petsc-users] arkimex rejecting all dt in petsc 3.6.0 >> >> >> Here is the response from Emil who made the changes to >> the code >> >> If you are solving a DAE then in the new version, we >> introduced a new >> flag that distinguishes between ODEs and DAEs leading to >> different >> semantics. For DAEs this is needed b/c it may not have >> consistent initial >> conditions. >> >> If solving DAEs, the user has to set the EquationType >> appropriately: >> e.g., >> ierr = >> TSSetEquationType(ts,TS_EQ_DAE_IMPLICIT_INDEX1);CHKERRQ(ierr); >> >> This is documented in the doc, but I didn't add it to the >> changelog (Jed >> pointed it out to me). >> >> Currently it only affects -ts_type arkimex. I'll work with >> Satish to add >> it to the changelog. >> >> *If that's not the case:* let me know and we can dig deeper. >> >> Emil >> >> On Jun 18, 2015, at 5:48 AM, Italo Tasso >> > wrote: >> >> I just upgraded to 3.6.0 and my code stopped working. >> All dt are >> rejected. I used the same configure line, same code, >> same everything. 
>> >> With 3.5.4 I get: >> >> 0 TS dt 1e-06 time 0 >> 0 SNES Function norm 2.549981005316e+05 >> 1 SNES Function norm 6.107056905987e-03 >> 2 SNES Function norm 1.483881932064e-10 >> 3 SNES Function norm 9.122873794272e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 3 >> 0 SNES Function norm 7.790429171165e+04 >> 1 SNES Function norm 7.289068747803e-04 >> 2 SNES Function norm 8.227639633330e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 2 >> 0 SNES Function norm 1.154356516184e+05 >> 1 SNES Function norm 2.309925413255e-03 >> 2 SNES Function norm 6.382141981406e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 2 >> TSAdapt 'basic': step 0 accepted t=0 >> + 1.000e-06 >> wlte=0.000654 family='arkimex' scheme=0:'3' dt=1.000e-05 >> >> With 3.6.0 I get: >> >> 0 TS dt 1e-06 time 0 >> 0 SNES Function norm 2.549981005316e+05 >> 1 SNES Function norm 6.107056925316e-03 >> 2 SNES Function norm 1.519319591792e-10 >> 3 SNES Function norm 9.070104116945e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 3 >> 0 SNES Function norm 7.790429171165e+04 >> 1 SNES Function norm 6.942541792651e-04 >> 2 SNES Function norm 8.458781909516e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 2 >> 0 SNES Function norm 1.154356516184e+05 >> 1 SNES Function norm 2.287202942961e-03 >> 2 SNES Function norm 6.585201377573e-11 >> Nonlinear solve converged due to CONVERGED_FNORM_ABS >> iterations 2 >> TSAdapt 'basic': step 0 rejected t=0 >> + 1.000e-06 wlte= >> 324 family='arkimex' scheme=0:'3' dt=1.311e-07 >> >> Any ideas? I attached the full output. >> >> Options I use: >> >> -ts_view -ts_type arkimex -ts_arkimex_fully_implicit >> -ts_adapt_monitor >> -ts_monitor -snes_monitor -snes_converged_reason >> -ksp_type preonly >> -pc_type lu -pc_factor_mat_solver_package mumps >> -snes_rtol 0 -snes_atol >> 1e-10 -snes_stol 0 >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Jun 19 11:35:45 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 19 Jun 2015 11:35:45 -0500 Subject: [petsc-users] Reg : Petsc installation failed In-Reply-To: References: Message-ID: Please keep the conversation on list. [or use petsc-maint] On Fri, 19 Jun 2015, Manish Kumar K wrote: > > Hi Satish, > > I did as per your following instructions mentioned below for PETSC to install on my system using MS compilers , > For some reasons its flagging me the same error . > I have attached configure.log file with this email and snap shot of error in bash. > > Steps followed > 1)I installed Cygwin freshly and used latest PETSc release 3.6. Hm - I see latest cygwin as: CYGWIN_NT-6.1 ps4 2.0.4(0.287/5/3) 2015-06-09 12:22 x86_64 Cygwin However configure.log has: ('CYGWIN_NT-6.1', 'EESBLRW106', '2.0.2(0.287/5/3)', '2015-05-08 17:00', 'x86_64', '') a slightly older version. Are you sure you've install a fresh cygwin? Also can you try the following? Copy/paste and e-mail the complete output. cd src/benchmarks /cygdrive/d/Software/Cygwin64/package/petsc/bin/win32fe/win32fe cl -c sizeof.c echo $? /cygdrive/d/Software/Cygwin64/package/petsc/bin/win32fe/win32fe cl -o sizeof.exe sizeof.c echo $? ./sizeof.exe Satish ----------- balay at ps4 ~/petsc/src/benchmarks $ ~/petsc/bin/win32fe/win32fe.exe cl -c sizeof.c sizeof.c balay at ps4 ~/petsc/src/benchmarks $ echo $? 
0 balay at ps4 ~/petsc/src/benchmarks $ ~/petsc/bin/win32fe/win32fe.exe cl -o sizeof.exe sizeof.o balay at ps4 ~/petsc/src/benchmarks $ echo $? 0 balay at ps4 ~/petsc/src/benchmarks $ ./sizeof.exe long double : 8 double : 8 int : 4 char : 1 short : 2 long : 4 long long : 8 int * : 8 size_t : 8 balay at ps4 ~/petsc/src/benchmarks $ > And followed same command as per the instructions mentioned at this link > > http://www.mcs.anl.gov/petsc/documentation/installation.html#windows > > Kindly help me out in this . > > Let me know if I missed anything. > > > Regards > Manish K > _______________________________________________________________________________________________________________________________________ > > -----Original Message----- > From: Satish Balay [mailto:balay at mcs.anl.gov] > Sent: Thursday, June 18, 2015 7:45 PM > To: Manish Kumar K > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Reg : Petsc installation failed > > >>>>>>> > Executing: /package/petsc/petsc-3.5.3/bin/win32fe/win32fe cl -c -o /cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers/conftest.o -I/cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers /cygdrive/c/Users/20055335/AppData/Local/Temp/petsc-leuQLz/config.setCompilers/conftest.c > Possible ERROR while running compiler: exit code 50176 <<<<< > > For some reason the compiler is returning error codes. Just to eliminate cygwin from the equation - can you do the following: > > 1. reboot the machine > 2. [make sure you don't start any cygwin processes] 3. run cygwin setup [it defaults to update mode] - and run it to completion. > It should run rebaseall at the end. > 4. Use petsc-3.6 > > And rerun configure [from compiler-cmd,bash per instructions] > > Satish > > On Thu, 18 Jun 2015, Manish Kumar K wrote: > > > Dear PETSc Team, > > > > > > > > I am configuring and trying to install PETSc libraries using > > Cygwin shell for windows > > > > > > > > I checked instructions at: > > > > http://www.mcs.anl.gov/petsc/documentation/installation.html#windows > > > > And invoked following command > > > > > > > > Step 1: > > > > ./configure --with-cc='win32fe cl' --with-fc=0 > > --download-f2cblaslapack=1 > > > > > > C compiler you provided with -with-cc=win32fe cl does not work. > > Cannot compile C with /package/petsc/petsc-3.5.3/bin/win32fe/win32fe cl. > > > > > > > > I am sending you the log file for this . > > > > > > > > Please help me out in this . > > > > > > > > Regards > > > > Manish K > > > > [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] > > > > L&T Technology Services Ltd > > > > www.LntTechservices.com > > > > This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. > > > > [https://relayq.larsentoubro.com/DigitalSignature2015.jpg] > L&T Technology Services Ltd > > www.LntTechservices.com > > This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. 
> From jychang48 at gmail.com Fri Jun 19 11:52:32 2015 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 19 Jun 2015 11:52:32 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: My code sort of requires HDF5 so installing quad precision might be a little difficult. I could try to work around this but that might take some effort. In the mean time, is there any other potential explanation or alternative to figuring this out? Thanks, Justin On Thursday, June 18, 2015, Matthew Knepley wrote: > On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich > wrote: > >> BLMVM doesn't use a KSP or preconditioner, it updates using the L-BFGS-B >> formula >> > > Then this sounds like a bug, unless one of the constants is partition > dependent. > > Matt > > >> On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley > > wrote: >> >>> On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich >> > wrote: >>> >>>> Hi Justin, >>>> >>>> I can't tell for sure why this is happening, have you tried using >>>> quad precision to make sure that numerical cutoffs isn't the problem? >>>> >>>> 1 The Hessian being approximate and the resulting implicit >>>> computation is the source of the cutoff, but would not be causing different >>>> convergence rates in infinite precision. >>>> >>>> 2 the local size may affect load balancing but not the resulting >>>> norms/convergence rate. >>>> >>> >>> This sounds to be like the preconditioner is dependent on the >>> partition. Can you send -tao_view -snes_view >>> >>> Matt >>> >>> >>>> Jason >>>> >>>> >>>> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >>> > wrote: >>>> >>>>> I solved a transient diffusion across multiple cores using TAO >>>>> BLMVM. When I simulate the same problem but on different numbers of >>>>> processing cores, the number of solve iterations change quite drastically. >>>>> The numerical solution is the same, but these changes are quite vast. I >>>>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>>>> largely invariant with number of processors but TAO (with bounded >>>>> constraints) fluctuates. >>>>> >>>>> My question is, why is this happening? I understand that accumulation >>>>> of numerical round-offs may attribute to this, but the differences seem >>>>> quite vast to me. My initial thought was that >>>>> >>>>> 1) the Hessian is only projected and not explicitly computed, which >>>>> may have something to do with the rate of convergence >>>>> >>>>> 2) local problem size. Certain regions of my domain have different >>>>> number of "violations" which need to be corrected by the bounded >>>>> constraints so the rate of convergence depends on how these regions are >>>>> partitioned? >>>>> >>>>> Any thoughts? >>>>> >>>>> Thanks, >>>>> Justin >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
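Since the exchange above centers on how TAO's bound-constrained BLMVM solver behaves on different partitions, here is a minimal sketch of the kind of setup being discussed, written against the TAO interface shipped with PETSc 3.5/3.6. Everything in it is a placeholder and not taken from Justin's application: the quadratic test objective FormFunctionGradient, the problem size, and the bounds are assumptions, intended only to show where the bounds enter and where -tao_monitor / -tao_view can be used to compare runs.

#include <petsctao.h>

/* Hypothetical objective/gradient: f(x) = 1/2 ||x||^2, grad f(x) = x.
   A real application evaluates its own functional here.             */
static PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ctx)
{
  PetscErrorCode ierr;
  PetscReal      nrm;

  PetscFunctionBeginUser;
  ierr = VecNorm(X, NORM_2, &nrm);CHKERRQ(ierr);
  *f   = 0.5 * nrm * nrm;
  ierr = VecCopy(X, G);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  Tao            tao;
  Vec            x, xl, xu;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* Solution vector and bound vectors (size and values are placeholders) */
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, 100, &x);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &xl);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &xu);CHKERRQ(ierr);
  ierr = VecSet(xl, 0.0);CHKERRQ(ierr);              /* lower bound x >= 0 */
  ierr = VecSet(xu, PETSC_INFINITY);CHKERRQ(ierr);   /* no upper bound     */

  ierr = TaoCreate(PETSC_COMM_WORLD, &tao);CHKERRQ(ierr);
  ierr = TaoSetType(tao, TAOBLMVM);CHKERRQ(ierr);    /* bound-constrained limited-memory variable metric */
  ierr = TaoSetInitialVector(tao, x);CHKERRQ(ierr);
  ierr = TaoSetVariableBounds(tao, xl, xu);CHKERRQ(ierr);
  ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, NULL);CHKERRQ(ierr);
  ierr = TaoSetFromOptions(tao);CHKERRQ(ierr);       /* picks up -tao_monitor, -tao_view, ... */
  ierr = TaoSolve(tao);CHKERRQ(ierr);

  ierr = TaoDestroy(&tao);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&xl);CHKERRQ(ierr);
  ierr = VecDestroy(&xu);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Running a sketch like this with different process counts and -tao_monitor shows directly whether the iteration history of a given objective depends on the partition.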
URL: From aph at email.arizona.edu Fri Jun 19 12:46:11 2015 From: aph at email.arizona.edu (Anthony Haas) Date: Fri, 19 Jun 2015 10:46:11 -0700 Subject: [petsc-users] from standard fortran program to PETSc Message-ID: <55845563.2000901@email.arizona.edu> Hi, I have a Fortran90 program that solves a complex linear generalized eigenvalue problem (GEVP) using standard fortran 90 programming: Subroutines, modules, allocatable arrays, real(8), int,... This program uses Lapack to solve the GEVP. The program is mainly made off: 1) set dimensions of problem and initialize arrays,... 2) compute the baseflow (for instance boundary layer flow) 3) build the (stability) complex generalized eigenvalue problem ==> build (dense) matrices A and B 4) solve the GEVP with Lapack Now I want to use PETSc + SLEPc to use sparse matrices. Do I need to rewrite/modify everything in terms of PETSc variables as follows: - int -> PetscInt - real(8) -> PetscScalar - complex*16 -> PetscScalar or is it possible to reuse all that F90 code? For instance I have a similarity solver that computes Blasius solution. If that similarity solver provides me with u and v velocities in terms of standard fortran90 real(8) variables, how should I do to use these variables to build my complex matrix? Should I convert them to Petsc variables? How? what should I do with my Fortran90 allocatable arrays? real(dp),allocatable,dimension(:,:) :: u--> PetscScalar,allocatable,dimension(:,:) :: u ???? Thanks a lot, Anthony From balay at mcs.anl.gov Fri Jun 19 13:08:09 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 19 Jun 2015 13:08:09 -0500 Subject: [petsc-users] from standard fortran program to PETSc In-Reply-To: <55845563.2000901@email.arizona.edu> References: <55845563.2000901@email.arizona.edu> Message-ID: On Fri, 19 Jun 2015, Anthony Haas wrote: > Hi, > > I have a Fortran90 program that solves a complex linear generalized eigenvalue > problem (GEVP) using standard fortran 90 programming: > > Subroutines, modules, allocatable arrays, real(8), int,... > > This program uses Lapack to solve the GEVP. The program is mainly made off: > > 1) set dimensions of problem and initialize arrays,... > 2) compute the baseflow (for instance boundary layer flow) > 3) build the (stability) complex generalized eigenvalue problem ==> build > (dense) matrices A and B > 4) solve the GEVP with Lapack > > Now I want to use PETSc + SLEPc to use sparse matrices. Do I need to > rewrite/modify everything in terms of PETSc variables as follows: > > - int -> PetscInt > - real(8) -> PetscScalar perhaps you mean: PetscReal > - complex*16 -> PetscScalar > > or is it possible to reuse all that F90 code? For instance I have a similarity > solver that computes Blasius solution. If that similarity solver provides me > with u and v velocities in terms of standard fortran90 real(8) variables, how > should I do to use these variables to build my complex matrix? Should I > convert them to Petsc variables? How? > You can use the current datatypes used in your code - And always make sure the types match manually. [Fortran does not have typecheck anyway..] > what should I do with my Fortran90 allocatable arrays? > > real(dp),allocatable,dimension(:,:) :: u--> > PetscScalar,allocatable,dimension(:,:) :: u ???? Either should work. 
You can add the following to your code: #if !defined(PETSC_USE_COMPLEX) #error "this code requires PETSc --with-scalar-type=complex build" #endif #if !defined(PETSC_USE_REAL_DOUBLE) #error "this code requires PETSc --with-precision=real build" #endif Satish From aph at email.arizona.edu Fri Jun 19 14:06:42 2015 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Fri, 19 Jun 2015 12:06:42 -0700 Subject: [petsc-users] from standard fortran program to PETSc In-Reply-To: References: <55845563.2000901@email.arizona.edu> Message-ID: Hi Satish, Thanks for your answer. In the attached program, I have declared the following standard fortran arrays: real(dp),dimension(:,:),allocatable :: D1X,D2X,D1Y,D2Y Let's say these are real matrix derivatives that I need to insert in my complex A matrix (it could also be some real baseflow from a similarity solver). I have filled D1Y with 999.d0 just to test (see line 169). Then I insert D1Y in the global (complex) matrix with call MatSetValues(A,ny,idxm2,ny,idxn2,D1Y,INSERT_VALUES,ierr). I expected that in the Matrix A, this would be automatically converted to 999.0 + 0.0i but when I view A, I see 999 + 999 i or even 999 + 1.0326e-321 i. Is there a way to insert D1Y as is and obtain the proper behavior? How would you do it? Thanks Anthony On Fri, Jun 19, 2015 at 11:08 AM, Satish Balay wrote: > On Fri, 19 Jun 2015, Anthony Haas wrote: > > > Hi, > > > > I have a Fortran90 program that solves a complex linear generalized > eigenvalue > > problem (GEVP) using standard fortran 90 programming: > > > > Subroutines, modules, allocatable arrays, real(8), int,... > > > > This program uses Lapack to solve the GEVP. The program is mainly made > off: > > > > 1) set dimensions of problem and initialize arrays,... > > 2) compute the baseflow (for instance boundary layer flow) > > 3) build the (stability) complex generalized eigenvalue problem ==> build > > (dense) matrices A and B > > 4) solve the GEVP with Lapack > > > > Now I want to use PETSc + SLEPc to use sparse matrices. Do I need to > > rewrite/modify everything in terms of PETSc variables as follows: > > > > - int -> PetscInt > > - real(8) -> PetscScalar > perhaps you mean: PetscReal > > > - complex*16 -> PetscScalar > > > > or is it possible to reuse all that F90 code? For instance I have a > similarity > > solver that computes Blasius solution. If that similarity solver > provides me > > with u and v velocities in terms of standard fortran90 real(8) > variables, how > > should I do to use these variables to build my complex matrix? Should I > > convert them to Petsc variables? How? > > > > You can use the current datatypes used in your code - And always make > sure the types match manually. [Fortran does not have typecheck anyway..] > > > what should I do with my Fortran90 allocatable arrays? > > > > real(dp),allocatable,dimension(:,:) :: u--> > > PetscScalar,allocatable,dimension(:,:) :: u ???? > > Either should work. > > You can add the following to your code: > > #if !defined(PETSC_USE_COMPLEX) > #error "this code requires PETSc --with-scalar-type=complex build" > #endif > #if !defined(PETSC_USE_REAL_DOUBLE) > #error "this code requires PETSc --with-precision=real build" > #endif > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Build_Matrices.F90 Type: text/x-fortran Size: 21559 bytes Desc: not available URL: From balay at mcs.anl.gov Fri Jun 19 14:23:48 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 19 Jun 2015 14:23:48 -0500 Subject: [petsc-users] from standard fortran program to PETSc In-Reply-To: References: <55845563.2000901@email.arizona.edu> Message-ID: Sorry - I should have checked the code closer. [real(dp) vs PetscScalar] MatSetValues() expects PetscScalar. You cannot pass in 'real*8' in place of 'complex*16' [i.e PetscScalar] - and expect it to work. This is what I meat by 'make sure the types match manually' You'll have to convert real*8 to complex*16 yourself before calling MatSetValues(). Satish On Fri, 19 Jun 2015, Anthony Paul Haas wrote: > Hi Satish, > > Thanks for your answer. In the attached program, I have declared the > following standard fortran arrays: real(dp),dimension(:,:),allocatable :: > D1X,D2X,D1Y,D2Y > Let's say these are real matrix derivatives that I need to insert in my > complex A matrix (it could also be some real baseflow from a similarity > solver). I have filled D1Y with 999.d0 just to test (see line 169). Then I > insert D1Y in the global (complex) matrix with call > MatSetValues(A,ny,idxm2,ny,idxn2,D1Y,INSERT_VALUES,ierr). I expected that > in the Matrix A, this would be automatically converted to 999.0 + 0.0i but > when I view A, I see 999 + 999 i or even 999 + 1.0326e-321 i. Is there a > way to insert D1Y as is and obtain the proper behavior? How would you do it? > > Thanks > > Anthony > > On Fri, Jun 19, 2015 at 11:08 AM, Satish Balay wrote: > > > On Fri, 19 Jun 2015, Anthony Haas wrote: > > > > > Hi, > > > > > > I have a Fortran90 program that solves a complex linear generalized > > eigenvalue > > > problem (GEVP) using standard fortran 90 programming: > > > > > > Subroutines, modules, allocatable arrays, real(8), int,... > > > > > > This program uses Lapack to solve the GEVP. The program is mainly made > > off: > > > > > > 1) set dimensions of problem and initialize arrays,... > > > 2) compute the baseflow (for instance boundary layer flow) > > > 3) build the (stability) complex generalized eigenvalue problem ==> build > > > (dense) matrices A and B > > > 4) solve the GEVP with Lapack > > > > > > Now I want to use PETSc + SLEPc to use sparse matrices. Do I need to > > > rewrite/modify everything in terms of PETSc variables as follows: > > > > > > - int -> PetscInt > > > - real(8) -> PetscScalar > > perhaps you mean: PetscReal > > > > > - complex*16 -> PetscScalar > > > > > > or is it possible to reuse all that F90 code? For instance I have a > > similarity > > > solver that computes Blasius solution. If that similarity solver > > provides me > > > with u and v velocities in terms of standard fortran90 real(8) > > variables, how > > > should I do to use these variables to build my complex matrix? Should I > > > convert them to Petsc variables? How? > > > > > > > You can use the current datatypes used in your code - And always make > > sure the types match manually. [Fortran does not have typecheck anyway..] > > > > > what should I do with my Fortran90 allocatable arrays? > > > > > > real(dp),allocatable,dimension(:,:) :: u--> > > > PetscScalar,allocatable,dimension(:,:) :: u ???? > > > > Either should work. 
> > > > You can add the following to your code: > > > > #if !defined(PETSC_USE_COMPLEX) > > #error "this code requires PETSc --with-scalar-type=complex build" > > #endif > > #if !defined(PETSC_USE_REAL_DOUBLE) > > #error "this code requires PETSc --with-precision=real build" > > #endif > > > > Satish > > > From mrosso at uci.edu Fri Jun 19 18:32:29 2015 From: mrosso at uci.edu (Michele Rosso) Date: Fri, 19 Jun 2015 16:32:29 -0700 Subject: [petsc-users] Petsc and cmake Message-ID: <1434756749.10836.18.camel@kolmog5> Hi, I am trying to move to cmake to build my code. How would you suggest to handle the dependency on PETSc? Currently my makefile relies on the PETSc variables for building the all code, namely FLINKER, CLINKER and so on. I found the FindPETSc.cmake module and I successfully used it but it does not import the aforementioned variables. Any suggestion on the matter is greatly appreciated. Thanks, Michele -------------- next part -------------- An HTML attachment was scrubbed... URL: From cuilongyin at gmail.com Fri Jun 19 22:42:51 2015 From: cuilongyin at gmail.com (Longyin Cui) Date: Fri, 19 Jun 2015 23:42:51 -0400 Subject: [petsc-users] About MatView Message-ID: Hi dear whoever reads this: I have a quick question: After matrix assembly, suppouse I have matrix A. Assuming I used 16 processors, if I want each processor to print out their local contents of the A, how do I proceed? (I simply want to know how the matrix is stored from generating to communicating to solving, so I get to display all the time to get better undersdtanding) I read the examples, and I tried things like below and sooooo many different codes from examples, it still is not working. PetscViewer viewer; PetscMPIInt my_rank; MPI_Comm_rank(PETSC_COMM_WORLD,&my_rank); PetscPrintf(MPI_COMM_SELF,"[%d] rank\n",my_rank); PetscViewerASCIIOpen(MPI_COMM_SELF,NULL,&viewer); PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_INFO); MatView(impOP,viewer); Plea......se give me some hints Thank you so very much! Longyin Cui (or you know me as Eric); Student from C.S. division; Cell: 7407047169; return 0; -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jun 20 00:34:16 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 20 Jun 2015 00:34:16 -0500 Subject: [petsc-users] About MatView In-Reply-To: References: Message-ID: <482CC387-136E-4746-85C1-C7FB4DB2C845@mcs.anl.gov> You need to cut and paste and send the entire error message: "not working" makes it very difficult for us to know what has gone wrong. Based on the code fragment you sent I guess one of your problems is that the viewer communicator is not the same as the matrix communicator. Since the matrix is on 16 processors (I am guessing PETSC_COMM_WORLD) the viewer communicator must also be the same (also PETSC_COMM_WORLD). The simplest code you can do is > PetscViewerASCIIOpen(PETSC_COMM_WORLD,"stdout",&viewer); > MatView(impOP,viewer); but you can get a similar effect with the command line option -mat_view and not write any code at all (the less code you have to write the better). Barry > On Jun 19, 2015, at 10:42 PM, Longyin Cui wrote: > > Hi dear whoever reads this: > > I have a quick question: > After matrix assembly, suppouse I have matrix A. Assuming I used 16 processors, if I want each processor to print out their local contents of the A, how do I proceed? 
(I simply want to know how the matrix is stored from generating to communicating to solving, so I get to display all the time to get better undersdtanding) > > I read the examples, and I tried things like below and sooooo many different codes from examples, it still is not working. > PetscViewer viewer; > PetscMPIInt my_rank; > MPI_Comm_rank(PETSC_COMM_WORLD,&my_rank); > PetscPrintf(MPI_COMM_SELF,"[%d] rank\n",my_rank); > PetscViewerASCIIOpen(MPI_COMM_SELF,NULL,&viewer); > PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_INFO); > MatView(impOP,viewer); > > Plea......se give me some hints > > Thank you so very much! > > > Longyin Cui (or you know me as Eric); > Student from C.S. division; > Cell: 7407047169; > return 0; > From bsmith at mcs.anl.gov Sat Jun 20 12:38:19 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 20 Jun 2015 12:38:19 -0500 Subject: [petsc-users] About MatView In-Reply-To: References: <482CC387-136E-4746-85C1-C7FB4DB2C845@mcs.anl.gov> Message-ID: <6DAC777F-EA19-4F12-8141-E3F505CA151C@mcs.anl.gov> Eric, > On Jun 20, 2015, at 1:42 AM, Longyin Cui wrote: > > OMG you are real!!! > OK, All my error message looks like this: > PETSc Error ... exiting > -------------------------------------------------------------------------- > mpirun has exited due to process rank 13 with PID 1816 on > node cnode174.local exiting improperly. There are two reasons this could occur: > > 1. this process did not call "init" before exiting, but others in > the job did. This can cause a job to hang indefinitely while it waits > for all processes to call "init". By rule, if one process calls "init", > then ALL processes must call "init" prior to termination. > > 2. this process called "init", but exited without calling "finalize". > By rule, all processes that call "init" MUST call "finalize" prior to > exiting or it will be considered an "abnormal termination" > > This may have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). This crash doesn't seem to have anything to do in particular with code below. Do the PETSc examples run in parallel? Does your code that you ran have a PetscInitialize() in it? What about running on two processors, does that work? > > You are right, I did use PETSC_COMM_SELE, and when I used PETSC_COMM_WORLD alone I could get the entire matrix printed. But, this is one whole matrix in one file. The reason I used: PetscViewerASCIIOpen ( PETSC_COMM_SELF, "mat.output", &viewer); and MatView (matrix,viewer); was because it says "Each processor can instead write its own independent output by specifying the communicator PETSC_COMM_SELF". Yikes, this is completely untrue and has been for decades. We have no way of saving the matrix in its parts; you cannot use use a PETSC_COMM_SELF viewer with a parallel matrix. Sorry about the wrong information in the documentation; I have fixed it. Why can't you just save the matrix in one file and then compare it? We don't provide a way to save objects one part per process because we think it is a bad model for parallel computing since the result depends on the number of processors you are using. Barry > > Also, I tried this as well, which failed, same error message : > PetscMPIInt my_rank; > MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); > string str = KILLME(my_rank); // KILLME is a int to string function... > const char * c = str.c_str(); > PetscViewer viewer; > PetscViewerASCIIOpen(PETSC_COMM_WORLD, c , &viewer); > MatView(impOP,viewer); //impOP is the huge matrix. 
> PetscViewerDestroy(&viewer); > > I was trying to generate 16 files recording each matrix hold by each processor so I can conpare them with the big matrix...so, what do you think > > Thank you very muuuch. > > Longyin Cui (or you know me as Eric); > Student from C.S. division; > Cell: 7407047169; > return 0; > > > On Sat, Jun 20, 2015 at 1:34 AM, Barry Smith wrote: > > You need to cut and paste and send the entire error message: "not working" makes it very difficult for us to know what has gone wrong. > Based on the code fragment you sent I guess one of your problems is that the viewer communicator is not the same as the matrix communicator. Since the matrix is on 16 processors (I am guessing PETSC_COMM_WORLD) the viewer communicator must also be the same (also PETSC_COMM_WORLD). > The simplest code you can do is > > > PetscViewerASCIIOpen(PETSC_COMM_WORLD,"stdout",&viewer); > > MatView(impOP,viewer); > > but you can get a similar effect with the command line option -mat_view and not write any code at all (the less code you have to write the better). > > Barry > > > > On Jun 19, 2015, at 10:42 PM, Longyin Cui wrote: > > > > Hi dear whoever reads this: > > > > I have a quick question: > > After matrix assembly, suppouse I have matrix A. Assuming I used 16 processors, if I want each processor to print out their local contents of the A, how do I proceed? (I simply want to know how the matrix is stored from generating to communicating to solving, so I get to display all the time to get better undersdtanding) > > > > I read the examples, and I tried things like below and sooooo many different codes from examples, it still is not working. > > PetscViewer viewer; > > PetscMPIInt my_rank; > > MPI_Comm_rank(PETSC_COMM_WORLD,&my_rank); > > PetscPrintf(MPI_COMM_SELF,"[%d] rank\n",my_rank); > > PetscViewerASCIIOpen(MPI_COMM_SELF,NULL,&viewer); > > PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_INFO); > > MatView(impOP,viewer); > > > > Plea......se give me some hints > > > > Thank you so very much! > > > > > > Longyin Cui (or you know me as Eric); > > Student from C.S. division; > > Cell: 7407047169; > > return 0; > > > > From jed at jedbrown.org Sat Jun 20 15:11:42 2015 From: jed at jedbrown.org (Jed Brown) Date: Sat, 20 Jun 2015 14:11:42 -0600 Subject: [petsc-users] Petsc and cmake In-Reply-To: <1434756749.10836.18.camel@kolmog5> References: <1434756749.10836.18.camel@kolmog5> Message-ID: <87wpyy5edd.fsf@jedbrown.org> Michele Rosso writes: > Hi, > > I am trying to move to cmake to build my code. How would you suggest to > handle the dependency on PETSc? > Currently my makefile relies on the PETSc variables for building the all > code, namely FLINKER, CLINKER and so on. I found the FindPETSc.cmake > module and I successfully used it > but it does not import the aforementioned variables. CMake insists on setting the compiler before discovering dependent packages. FindPETSc.cmake sets PETSC_COMPILER, but you can't set the compiler based on this information; you can only check and decide whether to error. That's life with CMake. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From cuilongyin at gmail.com Sat Jun 20 20:27:39 2015 From: cuilongyin at gmail.com (Longyin Cui) Date: Sat, 20 Jun 2015 21:27:39 -0400 Subject: [petsc-users] About MatView In-Reply-To: <6DAC777F-EA19-4F12-8141-E3F505CA151C@mcs.anl.gov> References: <482CC387-136E-4746-85C1-C7FB4DB2C845@mcs.anl.gov> <6DAC777F-EA19-4F12-8141-E3F505CA151C@mcs.anl.gov> Message-ID: Thank you so much for your heart warming in time reply, and I just need to ask you a few more questions: 1, Correct me if I'm wrong, I am kinda new in this field. So, I can only print the matrix or vectors entirely, to know the structure stored in each processor I could only cal MatGetLocalSize or MatGetOwnershipRange to get general ideas (Are there more of them? what is PetscViewerASCIISynchronizedPrintf used for?). Communicator PETSC_COMM_SELE is only useful when there is only one process going on. 2, Our project is based on another group's project, and they are physists.... So I am trying to get hold of when every processor communicates what with each other. The question is not that complicated, they first initialize the matrix and the vectors, set some operators and values, assembly them and now solve Ax=b using FGMRES. From this point I just want to know how the processors divide the matrix A, because I looked into KSPsolve(), there doesn't seem to be any communication right? (Maybe I wasn't paying enough attention). So, could you give me some hints how to achieve this? to know how they communicate? I didn't find much documentation about this. 3, related to question 2....how the matrix was generated generally, for example, all the processors genereate one matrix together? or each of them generate a whole matrix deparately by itself? Every one processor holds one copy or there's only one copy in Rank0? Thank you, you are the best! Longyin Cui (or you know me as Eric); Student from C.S. division; Cell: 7407047169; return 0; On Sat, Jun 20, 2015 at 1:38 PM, Barry Smith wrote: > > Eric, > > > > On Jun 20, 2015, at 1:42 AM, Longyin Cui wrote: > > > > OMG you are real!!! > > OK, All my error message looks like this: > > PETSc Error ... exiting > > > -------------------------------------------------------------------------- > > mpirun has exited due to process rank 13 with PID 1816 on > > node cnode174.local exiting improperly. There are two reasons this could > occur: > > > > 1. this process did not call "init" before exiting, but others in > > the job did. This can cause a job to hang indefinitely while it waits > > for all processes to call "init". By rule, if one process calls "init", > > then ALL processes must call "init" prior to termination. > > > > 2. this process called "init", but exited without calling "finalize". > > By rule, all processes that call "init" MUST call "finalize" prior to > > exiting or it will be considered an "abnormal termination" > > > > This may have caused other processes in the application to be > > terminated by signals sent by mpirun (as reported here). > > This crash doesn't seem to have anything to do in particular with code > below. Do the PETSc examples run in parallel? Does your code that you ran > have a PetscInitialize() in it? What about running on two processors, does > that work? > > > > > You are right, I did use PETSC_COMM_SELE, and when I used > PETSC_COMM_WORLD alone I could get the entire matrix printed. But, this is > one whole matrix in one file. 
The reason I used: PetscViewerASCIIOpen ( > PETSC_COMM_SELF, "mat.output", &viewer); and MatView (matrix,viewer); was > because it says "Each processor can instead write its own independent > output by specifying the communicator PETSC_COMM_SELF". > > Yikes, this is completely untrue and has been for decades. We have no > way of saving the matrix in its parts; you cannot use use a PETSC_COMM_SELF > viewer with a parallel matrix. Sorry about the wrong information in the > documentation; I have fixed it. > > Why can't you just save the matrix in one file and then compare it? We > don't provide a way to save objects one part per process because we think > it is a bad model for parallel computing since the result depends on the > number of processors you are using. > > Barry > > > > > Also, I tried this as well, which failed, same error message : > > PetscMPIInt my_rank; > > MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); > > string str = KILLME(my_rank); // KILLME is a int to string > function... > > const char * c = str.c_str(); > > PetscViewer viewer; > > PetscViewerASCIIOpen(PETSC_COMM_WORLD, c , &viewer); > > MatView(impOP,viewer); //impOP is the huge matrix. > > PetscViewerDestroy(&viewer); > > > > I was trying to generate 16 files recording each matrix hold by each > processor so I can conpare them with the big matrix...so, what do you think > > > > Thank you very muuuch. > > > > Longyin Cui (or you know me as Eric); > > Student from C.S. division; > > Cell: 7407047169; > > return 0; > > > > > > On Sat, Jun 20, 2015 at 1:34 AM, Barry Smith wrote: > > > > You need to cut and paste and send the entire error message: "not > working" makes it very difficult for us to know what has gone wrong. > > Based on the code fragment you sent I guess one of your problems is that > the viewer communicator is not the same as the matrix communicator. Since > the matrix is on 16 processors (I am guessing PETSC_COMM_WORLD) the viewer > communicator must also be the same (also PETSC_COMM_WORLD). > > The simplest code you can do is > > > > > PetscViewerASCIIOpen(PETSC_COMM_WORLD,"stdout",&viewer); > > > MatView(impOP,viewer); > > > > but you can get a similar effect with the command line option > -mat_view and not write any code at all (the less code you have to write > the better). > > > > Barry > > > > > > > On Jun 19, 2015, at 10:42 PM, Longyin Cui > wrote: > > > > > > Hi dear whoever reads this: > > > > > > I have a quick question: > > > After matrix assembly, suppouse I have matrix A. Assuming I used 16 > processors, if I want each processor to print out their local contents of > the A, how do I proceed? (I simply want to know how the matrix is stored > from generating to communicating to solving, so I get to display all the > time to get better undersdtanding) > > > > > > I read the examples, and I tried things like below and sooooo many > different codes from examples, it still is not working. > > > PetscViewer viewer; > > > PetscMPIInt my_rank; > > > MPI_Comm_rank(PETSC_COMM_WORLD,&my_rank); > > > PetscPrintf(MPI_COMM_SELF,"[%d] rank\n",my_rank); > > > PetscViewerASCIIOpen(MPI_COMM_SELF,NULL,&viewer); > > > PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_INFO); > > > MatView(impOP,viewer); > > > > > > Plea......se give me some hints > > > > > > Thank you so very much! > > > > > > > > > Longyin Cui (or you know me as Eric); > > > Student from C.S. division; > > > Cell: 7407047169; > > > return 0; > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
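Since part of the question above is how to see which rows of A each process owns (given that a PETSC_COMM_SELF viewer cannot be used on a parallel matrix), here is a small self-contained sketch using only the layout-query routines mentioned in the thread. The tridiagonal test matrix is a placeholder so the example has something assembled; MatGetOwnershipRange/MatGetLocalSize report the per-rank layout, PetscSynchronizedPrintf prints it in rank order, and the matrix itself is still viewed collectively on its own communicator.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, rstart, rend, m, n, N = 16;
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);

  /* Small parallel tridiagonal test matrix, just to have something assembled */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < N-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Per-rank layout information, printed in rank order */
  ierr = MatGetLocalSize(A, &m, &n);CHKERRQ(ierr);
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
           "[%d] owns global rows %D..%D (local size %D x %D)\n",
           rank, rstart, rend-1, m, n);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);

  /* The matrix itself is always viewed collectively on its communicator */
  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}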
URL: From mrosso at uci.edu Sat Jun 20 21:29:56 2015 From: mrosso at uci.edu (Michele Rosso) Date: Sat, 20 Jun 2015 19:29:56 -0700 Subject: [petsc-users] Petsc and cmake In-Reply-To: <87wpyy5edd.fsf@jedbrown.org> References: <1434756749.10836.18.camel@kolmog5> <87wpyy5edd.fsf@jedbrown.org> Message-ID: <1434853796.2469.4.camel@enterprise-A> Hi Jed, thank you for your reply. So basically I should use PETSC_COMPILER to check against the system compiler and proceed only if they match, correct? Also, I attached the output of cmake and FindPETSc.cmake: it complains that PETSc requires extra include paths and explicit linking to all dependencies. Despite that, I can compile and run my correctly. Should I worry about it? Finally, is there a way to retrieve the compiler flags I use to build PETSc? Thanks, Michele On Sat, 2015-06-20 at 14:11 -0600, Jed Brown wrote: > Michele Rosso writes: > > > Hi, > > > > I am trying to move to cmake to build my code. How would you suggest to > > handle the dependency on PETSc? > > Currently my makefile relies on the PETSc variables for building the all > > code, namely FLINKER, CLINKER and so on. I found the FindPETSc.cmake > > module and I successfully used it > > but it does not import the aforementioned variables. > > CMake insists on setting the compiler before discovering dependent > packages. FindPETSc.cmake sets PETSC_COMPILER, but you can't set the > compiler based on this information; you can only check and decide > whether to error. That's life with CMake. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is GNU 4.8.4 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- The Fortran compiler identification is GNU -- Check for working Fortran compiler: /usr/bin/gfortran -- Check for working Fortran compiler: /usr/bin/gfortran -- works -- Detecting Fortran compiler ABI info -- Detecting Fortran compiler ABI info - done -- Checking whether /usr/bin/gfortran supports Fortran 90 -- Checking whether /usr/bin/gfortran supports Fortran 90 -- yes -- petsc_lib_dir /opt/petsc/petsc-3.5.3/gnu-dbg-32idx/lib -- Recognized PETSc install with single library for all packages -- Performing Test MULTIPASS_TEST_1_petsc_works_minimal -- Performing Test MULTIPASS_TEST_1_petsc_works_minimal - Failed -- Performing Test MULTIPASS_TEST_2_petsc_works_allincludes -- Performing Test MULTIPASS_TEST_2_petsc_works_allincludes - Failed -- Performing Test MULTIPASS_TEST_3_petsc_works_alllibraries -- Performing Test MULTIPASS_TEST_3_petsc_works_alllibraries - Failed -- Performing Test MULTIPASS_TEST_4_petsc_works_all -- Performing Test MULTIPASS_TEST_4_petsc_works_all - Success -- PETSc requires extra include paths and explicit linking to all dependencies. This probably means you have static libraries and something unexpected in PETSc headers. 
-- Found PETSc: /opt/petsc/petsc-3.5.3/include;/opt/petsc/petsc-3.5.3/gnu-dbg-32idx/include;/usr/include/mpich -- Configuring done -- Generating done -- Build files have been written to: /home/mic/Documents/Dev/cmake_tests/build From knepley at gmail.com Sun Jun 21 07:17:09 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 Jun 2015 07:17:09 -0500 Subject: [petsc-users] About MatView In-Reply-To: References: <482CC387-136E-4746-85C1-C7FB4DB2C845@mcs.anl.gov> <6DAC777F-EA19-4F12-8141-E3F505CA151C@mcs.anl.gov> Message-ID: On Sat, Jun 20, 2015 at 8:27 PM, Longyin Cui wrote: > Thank you so much for your heart warming in time reply, and I just need to > ask you a few more questions: > > 1, Correct me if I'm wrong, I am kinda new in this field. So, I can only > print the matrix or vectors entirely, to know the structure stored in each > processor I could only cal MatGetLocalSize or MatGetOwnershipRange to get > general ideas (Are there more of them? what is PetscViewerASCIISynchronizedPrintf > used for?). Communicator PETSC_COMM_SELE is only useful when there is > only one process going on. > The PetscViewers are always used on the whole object so that it is independent of the partition. As you point out, you can get information about the pieces through the API. > 2, Our project is based on another group's project, and they are > physists.... So I am trying to get hold of when every processor > communicates what with each other. The question is not that complicated, > they first initialize the matrix and the vectors, set some operators and > values, assembly them and now solve Ax=b using FGMRES. From this point I > just want to know how the processors divide the matrix A, because I looked > into KSPsolve(), there doesn't seem to be any communication right? (Maybe I > wasn't paying enough attention). So, could you give me some hints how to > achieve this? to know how they communicate? I didn't find much > documentation about this. > At the top level, you can look at -log_summary to see how many messages were sent, how big they were, and how many reductions were done. Do you need more specificity than this? > 3, related to question 2....how the matrix was generated generally, for > example, all the processors genereate one matrix together? or each of them > generate a whole matrix deparately by itself? Every one processor holds one > copy or there's only one copy in Rank0? > The matrix is stored in a distributed fashion, with each process holding a piece. Hopefully, it is generated by all processes, with a small amount of communication near the boundaries. Thanks, Matt > Thank you, you are the best! > > Longyin Cui (or you know me as Eric); > Student from C.S. division; > Cell: 7407047169; > return 0; > > > On Sat, Jun 20, 2015 at 1:38 PM, Barry Smith wrote: > >> >> Eric, >> >> >> > On Jun 20, 2015, at 1:42 AM, Longyin Cui wrote: >> > >> > OMG you are real!!! >> > OK, All my error message looks like this: >> > PETSc Error ... exiting >> > >> -------------------------------------------------------------------------- >> > mpirun has exited due to process rank 13 with PID 1816 on >> > node cnode174.local exiting improperly. There are two reasons this >> could occur: >> > >> > 1. this process did not call "init" before exiting, but others in >> > the job did. This can cause a job to hang indefinitely while it waits >> > for all processes to call "init". By rule, if one process calls "init", >> > then ALL processes must call "init" prior to termination. >> > >> > 2. 
this process called "init", but exited without calling "finalize". >> > By rule, all processes that call "init" MUST call "finalize" prior to >> > exiting or it will be considered an "abnormal termination" >> > >> > This may have caused other processes in the application to be >> > terminated by signals sent by mpirun (as reported here). >> >> This crash doesn't seem to have anything to do in particular with code >> below. Do the PETSc examples run in parallel? Does your code that you ran >> have a PetscInitialize() in it? What about running on two processors, does >> that work? >> >> > >> > You are right, I did use PETSC_COMM_SELE, and when I used >> PETSC_COMM_WORLD alone I could get the entire matrix printed. But, this is >> one whole matrix in one file. The reason I used: PetscViewerASCIIOpen ( >> PETSC_COMM_SELF, "mat.output", &viewer); and MatView (matrix,viewer); was >> because it says "Each processor can instead write its own independent >> output by specifying the communicator PETSC_COMM_SELF". >> >> Yikes, this is completely untrue and has been for decades. We have no >> way of saving the matrix in its parts; you cannot use use a PETSC_COMM_SELF >> viewer with a parallel matrix. Sorry about the wrong information in the >> documentation; I have fixed it. >> >> Why can't you just save the matrix in one file and then compare it? We >> don't provide a way to save objects one part per process because we think >> it is a bad model for parallel computing since the result depends on the >> number of processors you are using. >> >> Barry >> >> > >> > Also, I tried this as well, which failed, same error message : >> > PetscMPIInt my_rank; >> > MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); >> > string str = KILLME(my_rank); // KILLME is a int to string >> function... >> > const char * c = str.c_str(); >> > PetscViewer viewer; >> > PetscViewerASCIIOpen(PETSC_COMM_WORLD, c , &viewer); >> > MatView(impOP,viewer); //impOP is the huge matrix. >> > PetscViewerDestroy(&viewer); >> > >> > I was trying to generate 16 files recording each matrix hold by each >> processor so I can conpare them with the big matrix...so, what do you think >> > >> > Thank you very muuuch. >> > >> > Longyin Cui (or you know me as Eric); >> > Student from C.S. division; >> > Cell: 7407047169; >> > return 0; >> > >> > >> > On Sat, Jun 20, 2015 at 1:34 AM, Barry Smith >> wrote: >> > >> > You need to cut and paste and send the entire error message: "not >> working" makes it very difficult for us to know what has gone wrong. >> > Based on the code fragment you sent I guess one of your problems is >> that the viewer communicator is not the same as the matrix communicator. >> Since the matrix is on 16 processors (I am guessing PETSC_COMM_WORLD) the >> viewer communicator must also be the same (also PETSC_COMM_WORLD). >> > The simplest code you can do is >> > >> > > PetscViewerASCIIOpen(PETSC_COMM_WORLD,"stdout",&viewer); >> > > MatView(impOP,viewer); >> > >> > but you can get a similar effect with the command line option >> -mat_view and not write any code at all (the less code you have to write >> the better). >> > >> > Barry >> > >> > >> > > On Jun 19, 2015, at 10:42 PM, Longyin Cui >> wrote: >> > > >> > > Hi dear whoever reads this: >> > > >> > > I have a quick question: >> > > After matrix assembly, suppouse I have matrix A. Assuming I used 16 >> processors, if I want each processor to print out their local contents of >> the A, how do I proceed? 
(I simply want to know how the matrix is stored >> from generating to communicating to solving, so I get to display all the >> time to get better undersdtanding) >> > > >> > > I read the examples, and I tried things like below and sooooo many >> different codes from examples, it still is not working. >> > > PetscViewer viewer; >> > > PetscMPIInt my_rank; >> > > MPI_Comm_rank(PETSC_COMM_WORLD,&my_rank); >> > > PetscPrintf(MPI_COMM_SELF,"[%d] rank\n",my_rank); >> > > PetscViewerASCIIOpen(MPI_COMM_SELF,NULL,&viewer); >> > > PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_INFO); >> > > MatView(impOP,viewer); >> > > >> > > Plea......se give me some hints >> > > >> > > Thank you so very much! >> > > >> > > >> > > Longyin Cui (or you know me as Eric); >> > > Student from C.S. division; >> > > Cell: 7407047169; >> > > return 0; >> > > >> > >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Jun 21 20:41:25 2015 From: jed at jedbrown.org (Jed Brown) Date: Sun, 21 Jun 2015 19:41:25 -0600 Subject: [petsc-users] Petsc and cmake In-Reply-To: <1434853796.2469.4.camel@enterprise-A> References: <1434756749.10836.18.camel@kolmog5> <87wpyy5edd.fsf@jedbrown.org> <1434853796.2469.4.camel@enterprise-A> Message-ID: <87ioag5xkq.fsf@jedbrown.org> Michele Rosso writes: > Hi Jed, > > thank you for your reply. > So basically I should use PETSC_COMPILER to check against the system > compiler and proceed only if they match, correct? > > Also, I attached the output of cmake and FindPETSc.cmake: it complains > that PETSc requires extra include paths and explicit linking to all > dependencies. > Despite that, I can compile and run my correctly. Should I worry about > it? No, it probably means you have some or all static libraries. It's only a showstopper if you are building for an environment that prohibits overlinking (like some Linux distros). For private users, it's rarely an issue. > Finally, is there a way to retrieve the compiler flags I use to build > PETSc? Not at this time and there is no single set of flags. If you just want all the PETSc makefile variables, include the makefile. CMake users normally want firm insulation from the flags used to build the package (it gets hokey at times, but it's what they ask for and what they expect). I would not be opposed if you want to create CMake variables for those flags (but be sure to namespace completely and accurately). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From orxan.shibli at gmail.com Mon Jun 22 05:04:06 2015 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Mon, 22 Jun 2015 04:04:06 -0600 Subject: [petsc-users] Very low CFL number for PETSC's GS Message-ID: I wanted to compare my own GS and the one of PETSC's. I used KSPRICHARDSON with PCSOR to obtain GS. I tested my GS with CFL=40 and solved Ax=b problem successfully and fast. However, PETSC failed to solve at CFL=40 and it gives an answer only for very low CFL numbers such as 0.1. Of course, the convergence was very slow. My question is that if A and b and the implementations are the same why PETSC fails to solve with same CFL number as for my GS solver? PS1: My GS solver is a plain GS, no fancy stuff. 
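Following Barry's suggestion of saving the matrix once and comparing it between runs, here is a minimal sketch of writing and reloading the parallel matrix with a collective binary viewer. It assumes impOP is the assembled parallel matrix from the code earlier in the thread, the file name A.bin is arbitrary, and MATAIJ is just one possible type for the reloaded copy; the binary file is written in global ordering, so it can be compared between runs regardless of the number of processes used.

  Mat            B;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  /* Write the whole parallel matrix to a single binary file */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(impOP, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* Read it back later, possibly on a different number of processes */
  ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
  ierr = MatSetType(B, MATAIJ);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatLoad(B, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);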
PS2: A is a block matrix (MATSEQBAIJ) for PETSC. Also, the process is sequential. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 22 06:18:39 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 22 Jun 2015 06:18:39 -0500 Subject: [petsc-users] Very low CFL number for PETSC's GS In-Reply-To: References: Message-ID: On Mon, Jun 22, 2015 at 5:04 AM, Orxan Shibliyev wrote: > I wanted to compare my own GS and the one of PETSC's. I used KSPRICHARDSON > with PCSOR to obtain GS. I tested my GS with CFL=40 and solved Ax=b problem > successfully and fast. However, PETSC failed to solve at CFL=40 and it > gives an answer only for very low CFL numbers such as 0.1. Of course, the > convergence was very slow. My question is that if A and b and the > implementations are the same why PETSC fails to solve with same CFL number > as for my GS solver? > > PS1: My GS solver is a plain GS, no fancy stuff. > PS2: A is a block matrix (MATSEQBAIJ) for PETSC. Also, the process is > sequential. > 1) For any solver question, send the output of -ksp_view -ksp_monitor_true_residual -ksp_converged_reason 2) Did you use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSOR.html and set the options so that the algorithm matches yours? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkateshgk.j at gmail.com Mon Jun 22 10:57:39 2015 From: venkateshgk.j at gmail.com (venkatesh g) Date: Mon, 22 Jun 2015 21:27:39 +0530 Subject: [petsc-users] MUMPS error and superLU error In-Reply-To: References: Message-ID: Hi I have restructured my matrix eigenvalue problem to see why B is singular as you suggested by changing the governing equations in different form. Now my matrix B is not singular. Both A and B are invertible in Ax=lambda Bx. Still I receive error in MUMPS as it uses large memory (attached is the error log) I gave the command: aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t -st_type sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5 -mat_mumps_icntl_4 2 -evecs v100t The matrix A is 60% with zeros. Kindly help me. Venkatesh On Sun, May 31, 2015 at 8:04 PM, Hong wrote: > venkatesh, > > As we discussed previously, even on smaller problems, > both mumps and superlu_dist failed, although Mumps gave "OOM" error in > numerical factorization. > > You acknowledged that B is singular, which may need additional > reformulation for your eigenvalue problems. The option '-st_type sinvert' > likely uses B^{-1} (have you read slepc manual?), which could be the source > of trouble. > > Please investigate your model, understand why B is singular; if there is a > way to dump null space before submitting large size simulation. > > Hong > > > On Sun, May 31, 2015 at 8:36 AM, Dave May wrote: > >> It failed due to a lack of memory. "OOM" stands for "out of memory". OOM >> killer terminated your job means you ran out of memory. >> >> >> >> >> On Sunday, 31 May 2015, venkatesh g wrote: >> >>> Hi all, >>> >>> I tried to run my Generalized Eigenproblem in 120 x 24 = 2880 cores. >>> The matrix size of A = 20GB and B = 5GB. >>> >>> It got killed after 7 Hrs of run time. Please see the mumps error log. >>> Why must it fail ? 
>>> I gave the command: >>> >>> aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1 >>> -log_summary -st_ksp_type preonly -st_pc_type lu >>> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2 >>> >>> Kindly let me know. >>> >>> cheers, >>> Venkatesh >>> >>> On Fri, May 29, 2015 at 10:46 PM, venkatesh g >>> wrote: >>> >>>> Hi Matt, users, >>>> >>>> Thanks for the info. Do you also use Petsc and Slepc with MUMPS ? I get >>>> into the segmentation error if I increase my matrix size. >>>> >>>> Can you suggest other software for direct solver for QR in parallel >>>> since as LU may not be good for a singular B matrix in Ax=lambda Bx ? I am >>>> attaching the working version mumps log. >>>> >>>> My matrix size here is around 47000x47000. If I am not wrong, the >>>> memory usage per core is 272MB. >>>> >>>> Can you tell me if I am wrong ? or really if its light on memory for >>>> this matrix ? >>>> >>>> Thanks >>>> cheers, >>>> Venkatesh >>>> >>>> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman < >>>> matt.landreman at gmail.com> wrote: >>>> >>>>> Dear Venkatesh, >>>>> >>>>> As you can see in the error log, you are now getting a segmentation >>>>> fault, which is almost certainly a separate issue from the info(1)=-9 >>>>> memory problem you had previously. Here is one idea which may or may not >>>>> help. I've used mumps on the NERSC Edison system, and I found that I >>>>> sometimes get segmentation faults when using the default Intel compiler. >>>>> When I switched to the cray compiler the problem disappeared. So you could >>>>> perhaps try a different compiler if one is available on your system. >>>>> >>>>> Matt >>>>> On May 29, 2015 4:04 AM, "venkatesh g" >>>>> wrote: >>>>> >>>>>> Hi Matt, >>>>>> >>>>>> I did what you told and read the manual of that CNTL parameters. I >>>>>> solve for that with CNTL(1)=1e-4. It is working. >>>>>> >>>>>> But it was a test matrix with size 46000x46000. Actual matrix size is >>>>>> 108900x108900 and will increase in the future. >>>>>> >>>>>> I get this error of memory allocation failed. And the binary matrix >>>>>> size of A is 20GB and B is 5 GB. >>>>>> >>>>>> Now I submit this in 240 processors each 4 GB RAM and also in 128 >>>>>> Processors with total 512 GB RAM. >>>>>> >>>>>> In both the cases, it fails with the following error like memory is >>>>>> not enough. But for 90000x90000 size it had run serially in Matlab with >>>>>> <256 GB RAM. >>>>>> >>>>>> Kindly let me know. >>>>>> >>>>>> Venkatesh >>>>>> >>>>>> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman < >>>>>> matt.landreman at gmail.com> wrote: >>>>>> >>>>>>> Hi Venkatesh, >>>>>>> >>>>>>> I've struggled a bit with mumps memory allocation too. I think the >>>>>>> behavior of mumps is roughly the following. First, in the "analysis step", >>>>>>> mumps computes a minimum memory required based on the structure of nonzeros >>>>>>> in the matrix. Then when it actually goes to factorize the matrix, if it >>>>>>> ever encounters an element smaller than CNTL(1) (default=0.01) in the >>>>>>> diagonal of a sub-matrix it is trying to factorize, it modifies the >>>>>>> ordering to avoid the small pivot, which increases the fill-in (hence >>>>>>> memory needed). ICNTL(14) sets the margin allowed for this unanticipated >>>>>>> fill-in. Setting ICNTL(14)=200000 as in your email is not the solution, >>>>>>> since this means mumps asks for a huge amount of memory at the start. >>>>>>> Better would be to lower CNTL(1) or (I think) use static pivoting >>>>>>> (CNTL(4)). 
Read the section in the mumps manual about these CNTL >>>>>>> parameters. I typically set CNTL(1)=1e-6, which eliminated all the >>>>>>> INFO(1)=-9 errors for my problem, without having to modify ICNTL(14). >>>>>>> >>>>>>> Also, I recommend running with ICNTL(4)=3 to display diagnostics. >>>>>>> Look for the line in standard output that says "TOTAL space in MBYTES >>>>>>> for IC factorization". This is the amount of memory that mumps is trying >>>>>>> to allocate, and for the default ICNTL(14), it should be similar to >>>>>>> matlab's need. >>>>>>> >>>>>>> Hope this helps, >>>>>>> -Matt Landreman >>>>>>> University of Maryland >>>>>>> >>>>>>> On Tue, May 26, 2015 at 10:03 AM, venkatesh g < >>>>>>> venkateshgk.j at gmail.com> wrote: >>>>>>> >>>>>>>> I posted a while ago in MUMPS forums but no one seems to reply. >>>>>>>> >>>>>>>> I am solving a large generalized Eigenvalue problem. >>>>>>>> >>>>>>>> I am getting the following error which is attached, after giving >>>>>>>> the command: >>>>>>>> >>>>>>>> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64 >>>>>>>> -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2 >>>>>>>> b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly >>>>>>>> -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 >>>>>>>> 200000 >>>>>>>> >>>>>>>> IT IS impossible to allocate so much memory per processor.. it is >>>>>>>> asking like around 70 GB per processor. >>>>>>>> >>>>>>>> A serial job in MATLAB for the same matrices takes < 60GB. >>>>>>>> >>>>>>>> After trying out superLU_dist, I have attached the error there also >>>>>>>> (segmentation error). >>>>>>>> >>>>>>>> Kindly help me. >>>>>>>> >>>>>>>> Venkatesh >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Generalized eigenproblem stored in file. Reading COMPLEX matrices from binary files... Entering ZMUMPS driver with JOB, N, NZ = 1 50000 0 ZMUMPS 4.10.0 L U Solver for unsymmetric matrices Type of parallelism: Working host ****** ANALYSIS STEP ******** ** Max-trans not allowed because matrix is distributed ... Structural symmetry (in percent)= 86 Density: NBdense, Average, Median = 01635120109 Ordering based on METIS A root of estimated size 21001 has been selected for Scalapack. Leaving analysis phase with ... INFOG(1) = 0 INFOG(2) = 0 -- (20) Number of entries in factors (estim.) = 1314700738 -- (3) Storage of factors (REAL, estimated) = 1416564824 -- (4) Storage of factors (INT , estimated) = 10444785 -- (5) Maximum frontal size (estimated) = 21500 -- (6) Number of nodes in the tree = 240 -- (32) Type of analysis effectively used = 1 -- (7) Ordering option effectively used = 5 ICNTL(6) Maximum transversal option = 0 ICNTL(7) Pivot order option = 7 Percentage of memory relaxation (effective) = 35 Number of level 2 nodes = 139 Number of split nodes = 2 RINFOG(1) Operations during elimination (estim)= 2.341D+13 Distributed matrix entry format (ICNTL(18)) = 3 ** Rank of proc needing largest memory in IC facto : 30 ** Estimated corresponding MBYTES for IC facto : 21593 ** Estimated avg. MBYTES per work. proc at facto (IC) : 7708 ** TOTAL space in MBYTES for IC factorization : 1850075 ** Rank of proc needing largest memory for OOC facto : 30 ** Estimated corresponding MBYTES for OOC facto : 21681 ** Estimated avg. MBYTES per work. 
proc at facto (OOC) : 7782 ** TOTAL space in MBYTES for OOC factorization : 1867805 Entering ZMUMPS driver with JOB, N, NZ = 2 50000 716459748 ****** FACTORIZATION STEP ******** GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... NUMBER OF WORKING PROCESSES = 240 OUT-OF-CORE OPTION (ICNTL(22)) = 0 REAL SPACE FOR FACTORS = 1416564824 INTEGER SPACE FOR FACTORS = 10444785 MAXIMUM FRONTAL SIZE (ESTIMATED) = 21500 NUMBER OF NODES IN THE TREE = 240 Convergence error after scaling for ONE-NORM (option 7/8) = 0.59D+01 Maximum effective relaxed size of S = 1181811925 Average effective relaxed size of S = 228990839 REDISTRIB: TOTAL DATA LOCAL/SENT = 328575589 1437471711 GLOBAL TIME FOR MATRIX DISTRIBUTION = 206.6792 ** Memory relaxation parameter ( ICNTL(14) ) : 35 ** Rank of processor needing largest memory in facto : 30 ** Space in MBYTES used by this processor for facto : 21593 ** Avg. Space in MBYTES per working proc during facto : 7708 [NID 01360] 2015-06-22 20:00:41 Apid 432433: initiated application termination [NID 01360] 2015-06-22 19:59:18 Apid 432433: OOM killer terminated this process. Application 432433 exit signals: Killed Application 432433 resources: utime ~0s, stime ~20s, Rss ~7716, inblocks ~16040, outblocks ~2380 From bsmith at mcs.anl.gov Mon Jun 22 12:43:13 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 22 Jun 2015 12:43:13 -0500 Subject: [petsc-users] MUMPS error and superLU error In-Reply-To: References: Message-ID: <464F9BA7-5FA7-46F9-A1BD-65CA71079D5E@mcs.anl.gov> There is nothing we can really do to help on the PETSc side. I do note from the output REDISTRIB: TOTAL DATA LOCAL/SENT = 328575589 1437471711 GLOBAL TIME FOR MATRIX DISTRIBUTION = 206.6792 ** Memory relaxation parameter ( ICNTL(14) ) : 35 ** Rank of processor needing largest memory in facto : 30 ** Space in MBYTES used by this processor for facto : 21593 ** Avg. Space in MBYTES per working proc during facto : 7708 some processes (like 30) require three times as much memory as other processes so perhaps a better load balancing of the matrix during the factorization would help with memory usage. Barry > On Jun 22, 2015, at 10:57 AM, venkatesh g wrote: > > Hi > I have restructured my matrix eigenvalue problem to see why B is singular as you suggested by changing the governing equations in different form. > > Now my matrix B is not singular. Both A and B are invertible in Ax=lambda Bx. > > Still I receive error in MUMPS as it uses large memory (attached is the error log) > > I gave the command: aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t -st_type sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5 -mat_mumps_icntl_4 2 -evecs v100t > > The matrix A is 60% with zeros. > > Kindly help me. > > Venkatesh > > On Sun, May 31, 2015 at 8:04 PM, Hong wrote: > venkatesh, > > As we discussed previously, even on smaller problems, > both mumps and superlu_dist failed, although Mumps gave "OOM" error in numerical factorization. > > You acknowledged that B is singular, which may need additional reformulation for your eigenvalue problems. The option '-st_type sinvert' likely uses B^{-1} (have you read slepc manual?), which could be the source of trouble. > > Please investigate your model, understand why B is singular; if there is a way to dump null space before submitting large size simulation. > > Hong > > > On Sun, May 31, 2015 at 8:36 AM, Dave May wrote: > It failed due to a lack of memory. "OOM" stands for "out of memory". 
OOM killer terminated your job means you ran out of memory. > > > > > On Sunday, 31 May 2015, venkatesh g wrote: > Hi all, > > I tried to run my Generalized Eigenproblem in 120 x 24 = 2880 cores. > The matrix size of A = 20GB and B = 5GB. > > It got killed after 7 Hrs of run time. Please see the mumps error log. Why must it fail ? > I gave the command: > > aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1 -log_summary -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2 > > Kindly let me know. > > cheers, > Venkatesh > > On Fri, May 29, 2015 at 10:46 PM, venkatesh g wrote: > Hi Matt, users, > > Thanks for the info. Do you also use Petsc and Slepc with MUMPS ? I get into the segmentation error if I increase my matrix size. > > Can you suggest other software for direct solver for QR in parallel since as LU may not be good for a singular B matrix in Ax=lambda Bx ? I am attaching the working version mumps log. > > My matrix size here is around 47000x47000. If I am not wrong, the memory usage per core is 272MB. > > Can you tell me if I am wrong ? or really if its light on memory for this matrix ? > > Thanks > cheers, > Venkatesh > > On Fri, May 29, 2015 at 4:00 PM, Matt Landreman wrote: > Dear Venkatesh, > > As you can see in the error log, you are now getting a segmentation fault, which is almost certainly a separate issue from the info(1)=-9 memory problem you had previously. Here is one idea which may or may not help. I've used mumps on the NERSC Edison system, and I found that I sometimes get segmentation faults when using the default Intel compiler. When I switched to the cray compiler the problem disappeared. So you could perhaps try a different compiler if one is available on your system. > > Matt > > On May 29, 2015 4:04 AM, "venkatesh g" wrote: > Hi Matt, > > I did what you told and read the manual of that CNTL parameters. I solve for that with CNTL(1)=1e-4. It is working. > > But it was a test matrix with size 46000x46000. Actual matrix size is 108900x108900 and will increase in the future. > > I get this error of memory allocation failed. And the binary matrix size of A is 20GB and B is 5 GB. > > Now I submit this in 240 processors each 4 GB RAM and also in 128 Processors with total 512 GB RAM. > > In both the cases, it fails with the following error like memory is not enough. But for 90000x90000 size it had run serially in Matlab with <256 GB RAM. > > Kindly let me know. > > Venkatesh > > On Tue, May 26, 2015 at 8:02 PM, Matt Landreman wrote: > Hi Venkatesh, > > I've struggled a bit with mumps memory allocation too. I think the behavior of mumps is roughly the following. First, in the "analysis step", mumps computes a minimum memory required based on the structure of nonzeros in the matrix. Then when it actually goes to factorize the matrix, if it ever encounters an element smaller than CNTL(1) (default=0.01) in the diagonal of a sub-matrix it is trying to factorize, it modifies the ordering to avoid the small pivot, which increases the fill-in (hence memory needed). ICNTL(14) sets the margin allowed for this unanticipated fill-in. Setting ICNTL(14)=200000 as in your email is not the solution, since this means mumps asks for a huge amount of memory at the start. Better would be to lower CNTL(1) or (I think) use static pivoting (CNTL(4)). Read the section in the mumps manual about these CNTL parameters. 
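> As a rough sketch of how this looks in practice (values purely illustrative; these are the same PETSc option names used elsewhere in this thread), the MUMPS controls can be set from the options database rather than by editing code:
>
>   -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-6 -mat_mumps_icntl_4 3
>
> where -mat_mumps_cntl_1 sets CNTL(1), -mat_mumps_icntl_4 sets ICNTL(4), and ICNTL(14) can likewise be adjusted with -mat_mumps_icntl_14 if needed.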
I typically set CNTL(1)=1e-6, which eliminated all the INFO(1)=-9 errors for my problem, without having to modify ICNTL(14). > > Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look for the line in standard output that says "TOTAL space in MBYTES for IC factorization". This is the amount of memory that mumps is trying to allocate, and for the default ICNTL(14), it should be similar to matlab's need. > > Hope this helps, > -Matt Landreman > University of Maryland > > On Tue, May 26, 2015 at 10:03 AM, venkatesh g wrote: > I posted a while ago in MUMPS forums but no one seems to reply. > > I am solving a large generalized Eigenvalue problem. > > I am getting the following error which is attached, after giving the command: > > /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64 -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2 b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 200000 > > IT IS impossible to allocate so much memory per processor.. it is asking like around 70 GB per processor. > > A serial job in MATLAB for the same matrices takes < 60GB. > > After trying out superLU_dist, I have attached the error there also (segmentation error). > > Kindly help me. > > Venkatesh > > > > > > > > > From s.kramer at imperial.ac.uk Mon Jun 22 13:13:55 2015 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Mon, 22 Jun 2015 19:13:55 +0100 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace Message-ID: <55885063.2060406@imperial.ac.uk> Dear petsc devs I've been trying to move our code from using KSPSetNullSpace to use MatSetNullSpace instead. Although I appreciate the convenience of the nullspace being propagated automatically through the solver hierarchy, I'm still a little confused on how to deal with the case that mat/=pmat in a ksp. If I read the code correctly I need to call MatSetNullSpace on the pmat in order for a ksp to project that nullspace out of the residual during the krylov iteration. However the nullspaces of mat and pmat are not necessarily the same. For instance, what should I do if the system that I want to solve has a nullspace (i.e. the `mat` has a null vector), but the preconditioner matrix does not. As an example of existing setups that we have for which this is the case: take a standard Stokes velocity, pressure system - where we want to solve for pressure through a Schur complement. Suppose the boundary conditions are Dirichlet for velocity on all boundaries, then the pressure equation has the standard, constant nullspace. A frequently used preconditioner is the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a pressure mass matrix that is scaled by the value of the viscosity. So in this case the actual system has a nullspace, but the preconditioner matrix does not. The way we previously used to solve this is by setting the nullspace on the ksp of the Schur system, where the mat comes from MatCreateSchurComplement and the pmat is the scaled mass matrix. We then set the pctype to PCKSP and do not set a nullspace on its associated ksp. I don't really see how I can do this using only MatSetNullSpace - unless I create a copy of the mass matrix and have one copy with and one copy without a nullspace so that I could use the one with the nullspace for the pmat of the Schur ksp (and thus the mat of the pcksp) and the copy without the nullspace as the pmat for the pcksp. 
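For concreteness, a minimal sketch of that previous setup (PETSc 3.5-style C; error checking and cleanup omitted; S and Mp are placeholder names for the Schur complement and the viscosity-scaled pressure mass matrix):

  #include <petscksp.h>

  /* Sketch only: S comes from MatCreateSchurComplement(), Mp is the scaled
     pressure mass matrix used as the preconditioner matrix (pmat). */
  void setup_schur_solve(Mat S, Mat Mp, KSP *schur_ksp)
  {
    MatNullSpace nullsp;
    PC           pc;
    KSP          inner_ksp;

    /* constant nullspace of the pressure equation */
    MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);

    KSPCreate(PETSC_COMM_WORLD, schur_ksp);
    KSPSetOperators(*schur_ksp, S, Mp);
    KSPSetNullSpace(*schur_ksp, nullsp);  /* old interface: nullspace on the Schur ksp only */

    /* inner mass matrix solve via PCKSP; its options are set as usual and
       no nullspace is attached to it */
    KSPGetPC(*schur_ksp, &pc);
    PCSetType(pc, PCKSP);
    PCKSPGetKSP(pc, &inner_ksp);
  }

With MatSetNullSpace the nullspace would instead have to be attached to Mp (the pmat), which is exactly the difficulty described above.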
We would very much appreciate some guidance on what the correct way to deal with this kind of situation is Cheers Stephan From dalcinl at gmail.com Mon Jun 22 14:25:23 2015 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 22 Jun 2015 14:25:23 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Dear Mikhail, Satish helped me and we managed to get petsc4py working with static builds and Cygwin's Open MPI. Please git pull and try again. On 11 June 2015 at 14:13, Mikhail Khodak wrote: > I have attached the log files. > I tried using Open MPI before but received the same result. I will try a > static build later and otherwise will get a VM. > Thanks for looking at this, > Mikhail > > > On Thu, Jun 11, 2015 at 10:48 AM, Satish Balay wrote: >> >> Also - should mention: >> >> - cygwin has OpenMPI pacakged - so you could also try that. >> - most of my builds are static [so never tried dll builds] >> >> Satish >> >> On Thu, 11 Jun 2015, Satish Balay wrote: >> >> > If you get errors when running basic petsc examples - send us the >> > relavant petsc logs [configure.log, make.log, test.log etc..] >> > >> > Also - note that mpich is not supported on cygwin/windows [it >> > generally works for us - when we try the --download-mpich build]. >> > >> > Unless you really need a cywin build/use of petsc4py - it might be >> > easier to install a linux VM - and use PETSc/petsc4py on it. >> > >> > Satish >> > >> > On Thu, 11 Jun 2015, Mikhail Khodak wrote: >> > >> > > I added these lines to the petsc4py test files (specifically >> > > test_comm.py) >> > > but the error remains the same. However, I have done the standard >> > > 'which >> > > mpiexec', 'which mpicc', 'which mpirun' and they are all in the same >> > > folder. In fact it is the only MPI installed. >> > > The reason I thought it might be a PETSc build problem is because one >> > > of >> > > the PETSc 'make test' tests (ex5f) fails with the same error, even >> > > though >> > > the 'make streams' test works fine with MPI processes. >> > > Thanks, >> > > Mikhail >> > > >> > > On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley >> > > wrote: >> > > >> > > > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak >> > > > >> > > > wrote: >> > > > >> > > >> Thank you for your help - the install seems to work, apart from >> > > >> routines >> > > >> requiring MPI, which fail due to the "Attempting to use an MPI >> > > >> routine >> > > >> before initializing MPI" error. This seems to be an error in the >> > > >> PETSc >> > > >> build itself. >> > > >> >> > > > >> > > > The MPI routines will not work until after >> > > > >> > > > import petsc4py, sys >> > > > petsc4py.init(sys.argv) >> > > > from petsc4py import PETSc >> > > > >> > > > If they fail after this, it is usually a mismatch between the >> > > > mpiexec >> > > > being used and the MPI >> > > > libraries being linked. >> > > > >> > > > Thanks, >> > > > >> > > > Matt >> > > > >> > > > Thanks again, >> > > >> Mikhail Khodak >> > > >> >> > > >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin >> > > >> wrote: >> > > >> >> > > >>> On 8 June 2015 at 02:50, Mikhail Khodak >> > > >>> wrote: >> > > >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on >> > > >>> > 64-bit >> > > >>> Windows 7. >> > > >>> > I have built PETSc 3.5.4 with shared and dynamic libraries using >> > > >>> > mpich2-1.2.1 and successfully ran the installation tests. I am >> > > >>> > using >> > > >>> Python >> > > >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. 
However, when I >> > > >>> > attempt >> > > >>> to >> > > >>> > install petsc4py (both with pip and distutils) I get a mpicc >> > > >>> > compiler >> > > >>> error >> > > >>> > due to undefined references/symbols. I have attached the output >> > > >>> > of >> > > >>> running >> > > >>> > >> > > >>> > pip install petsc petsc4py --allow-external petsc >> > > >>> > --allow-external >> > > >>> petsc4py >> > > >>> > >> > > >>> >> > > >>> I've never ever built or test petsc4py under Cygwin. The errors >> > > >>> you >> > > >>> see are expected. >> > > >>> Perhaps you can manually workaround the issues following the >> > > >>> following >> > > >>> steps: >> > > >>> >> > > >>> 1) Download the petsc4py tarball and unpack it. >> > > >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all >> > > >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just XYZ >> > > >>> 3) Use pip again: >> > > >>> >> > > >>> pip install petsc >> > > >>> pip install . >> > > >>> >> > > >>> The last line assumes your current working directory is the one >> > > >>> having >> > > >>> petsc4py's setup.py >> > > >>> >> > > >>> Finally, I do not guarantee this will work. I'm just guessing, >> > > >>> petsc4py never explicitly supported Windows and/or Cygwin. >> > > >>> >> > > >>> >> > > >>> -- >> > > >>> Lisandro Dalcin >> > > >>> ============ >> > > >>> Research Scientist >> > > >>> Computer, Electrical and Mathematical Sciences & Engineering >> > > >>> (CEMSE) >> > > >>> Numerical Porous Media Center (NumPor) >> > > >>> King Abdullah University of Science and Technology (KAUST) >> > > >>> http://numpor.kaust.edu.sa/ >> > > >>> >> > > >>> 4700 King Abdullah University of Science and Technology >> > > >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 >> > > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia >> > > >>> http://www.kaust.edu.sa >> > > >>> >> > > >>> Office Phone: +966 12 808-0459 >> > > >>> >> > > >> >> > > >> >> > > > >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> > > > experiments is infinitely more interesting than any results to which >> > > > their >> > > > experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > >> > >> > -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Numerical Porous Media Center (NumPor) King Abdullah University of Science and Technology (KAUST) http://numpor.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 4332 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From hzhang at mcs.anl.gov Mon Jun 22 15:29:56 2015 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 22 Jun 2015 15:29:56 -0500 Subject: [petsc-users] MUMPS error and superLU error In-Reply-To: <464F9BA7-5FA7-46F9-A1BD-65CA71079D5E@mcs.anl.gov> References: <464F9BA7-5FA7-46F9-A1BD-65CA71079D5E@mcs.anl.gov> Message-ID: Venkatesh, You may also test superlu_dist, which may use less memory. Hong On Mon, Jun 22, 2015 at 12:43 PM, Barry Smith wrote: > > There is nothing we can really do to help on the PETSc side. I do note > from the output > > REDISTRIB: TOTAL DATA LOCAL/SENT = 328575589 1437471711 > GLOBAL TIME FOR MATRIX DISTRIBUTION = 206.6792 > ** Memory relaxation parameter ( ICNTL(14) ) : 35 > ** Rank of processor needing largest memory in facto : 30 > ** Space in MBYTES used by this processor for facto : 21593 > ** Avg. 
Space in MBYTES per working proc during facto : 7708 > > some processes (like 30) require three times as much memory as other > processes so perhaps a better load balancing of the matrix during the > factorization would help with memory usage. > > Barry > > > > On Jun 22, 2015, at 10:57 AM, venkatesh g > wrote: > > > > Hi > > I have restructured my matrix eigenvalue problem to see why B is > singular as you suggested by changing the governing equations in different > form. > > > > Now my matrix B is not singular. Both A and B are invertible in > Ax=lambda Bx. > > > > Still I receive error in MUMPS as it uses large memory (attached is the > error log) > > > > I gave the command: aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t > -st_type sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5 > -mat_mumps_icntl_4 2 -evecs v100t > > > > The matrix A is 60% with zeros. > > > > Kindly help me. > > > > Venkatesh > > > > On Sun, May 31, 2015 at 8:04 PM, Hong wrote: > > venkatesh, > > > > As we discussed previously, even on smaller problems, > > both mumps and superlu_dist failed, although Mumps gave "OOM" error in > numerical factorization. > > > > You acknowledged that B is singular, which may need additional > reformulation for your eigenvalue problems. The option '-st_type sinvert' > likely uses B^{-1} (have you read slepc manual?), which could be the source > of trouble. > > > > Please investigate your model, understand why B is singular; if there is > a way to dump null space before submitting large size simulation. > > > > Hong > > > > > > On Sun, May 31, 2015 at 8:36 AM, Dave May > wrote: > > It failed due to a lack of memory. "OOM" stands for "out of memory". OOM > killer terminated your job means you ran out of memory. > > > > > > > > > > On Sunday, 31 May 2015, venkatesh g wrote: > > Hi all, > > > > I tried to run my Generalized Eigenproblem in 120 x 24 = 2880 cores. > > The matrix size of A = 20GB and B = 5GB. > > > > It got killed after 7 Hrs of run time. Please see the mumps error log. > Why must it fail ? > > I gave the command: > > > > aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1 > -log_summary -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2 > > > > Kindly let me know. > > > > cheers, > > Venkatesh > > > > On Fri, May 29, 2015 at 10:46 PM, venkatesh g > wrote: > > Hi Matt, users, > > > > Thanks for the info. Do you also use Petsc and Slepc with MUMPS ? I get > into the segmentation error if I increase my matrix size. > > > > Can you suggest other software for direct solver for QR in parallel > since as LU may not be good for a singular B matrix in Ax=lambda Bx ? I am > attaching the working version mumps log. > > > > My matrix size here is around 47000x47000. If I am not wrong, the memory > usage per core is 272MB. > > > > Can you tell me if I am wrong ? or really if its light on memory for > this matrix ? > > > > Thanks > > cheers, > > Venkatesh > > > > On Fri, May 29, 2015 at 4:00 PM, Matt Landreman < > matt.landreman at gmail.com> wrote: > > Dear Venkatesh, > > > > As you can see in the error log, you are now getting a segmentation > fault, which is almost certainly a separate issue from the info(1)=-9 > memory problem you had previously. Here is one idea which may or may not > help. I've used mumps on the NERSC Edison system, and I found that I > sometimes get segmentation faults when using the default Intel compiler. 
> When I switched to the cray compiler the problem disappeared. So you could > perhaps try a different compiler if one is available on your system. > > > > Matt > > > > On May 29, 2015 4:04 AM, "venkatesh g" wrote: > > Hi Matt, > > > > I did what you told and read the manual of that CNTL parameters. I solve > for that with CNTL(1)=1e-4. It is working. > > > > But it was a test matrix with size 46000x46000. Actual matrix size is > 108900x108900 and will increase in the future. > > > > I get this error of memory allocation failed. And the binary matrix size > of A is 20GB and B is 5 GB. > > > > Now I submit this in 240 processors each 4 GB RAM and also in 128 > Processors with total 512 GB RAM. > > > > In both the cases, it fails with the following error like memory is not > enough. But for 90000x90000 size it had run serially in Matlab with <256 GB > RAM. > > > > Kindly let me know. > > > > Venkatesh > > > > On Tue, May 26, 2015 at 8:02 PM, Matt Landreman < > matt.landreman at gmail.com> wrote: > > Hi Venkatesh, > > > > I've struggled a bit with mumps memory allocation too. I think the > behavior of mumps is roughly the following. First, in the "analysis step", > mumps computes a minimum memory required based on the structure of nonzeros > in the matrix. Then when it actually goes to factorize the matrix, if it > ever encounters an element smaller than CNTL(1) (default=0.01) in the > diagonal of a sub-matrix it is trying to factorize, it modifies the > ordering to avoid the small pivot, which increases the fill-in (hence > memory needed). ICNTL(14) sets the margin allowed for this unanticipated > fill-in. Setting ICNTL(14)=200000 as in your email is not the solution, > since this means mumps asks for a huge amount of memory at the start. > Better would be to lower CNTL(1) or (I think) use static pivoting > (CNTL(4)). Read the section in the mumps manual about these CNTL > parameters. I typically set CNTL(1)=1e-6, which eliminated all the > INFO(1)=-9 errors for my problem, without having to modify ICNTL(14). > > > > Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look > for the line in standard output that says "TOTAL space in MBYTES for IC > factorization". This is the amount of memory that mumps is trying to > allocate, and for the default ICNTL(14), it should be similar to matlab's > need. > > > > Hope this helps, > > -Matt Landreman > > University of Maryland > > > > On Tue, May 26, 2015 at 10:03 AM, venkatesh g > wrote: > > I posted a while ago in MUMPS forums but no one seems to reply. > > > > I am solving a large generalized Eigenvalue problem. > > > > I am getting the following error which is attached, after giving the > command: > > > > /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64 -hosts > compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2 b72t > -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly > -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 > 200000 > > > > IT IS impossible to allocate so much memory per processor.. it is asking > like around 70 GB per processor. > > > > A serial job in MATLAB for the same matrices takes < 60GB. > > > > After trying out superLU_dist, I have attached the error there also > (segmentation error). > > > > Kindly help me. > > > > Venkatesh > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Jun 22 16:55:54 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 22 Jun 2015 16:55:54 -0500 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55885063.2060406@imperial.ac.uk> References: <55885063.2060406@imperial.ac.uk> Message-ID: On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer wrote: > Dear petsc devs > > I've been trying to move our code from using KSPSetNullSpace to use > MatSetNullSpace instead. Although I appreciate the convenience of the > nullspace being propagated automatically through the solver hierarchy, I'm > still a little confused on how to deal with the case that mat/=pmat in a > ksp. > > If I read the code correctly I need to call MatSetNullSpace on the pmat in > order for a ksp to project that nullspace out of the residual during the > krylov iteration. However the nullspaces of mat and pmat are not > necessarily the same. For instance, what should I do if the system that I > want to solve has a nullspace (i.e. the `mat` has a null vector), but the > preconditioner matrix does not. > > As an example of existing setups that we have for which this is the case: > take a standard Stokes velocity, pressure system - where we want to solve > for pressure through a Schur complement. Suppose the boundary conditions > are Dirichlet for velocity on all boundaries, then the pressure equation > has the standard, constant nullspace. A frequently used preconditioner is > the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a > pressure mass matrix that is scaled by the value of the viscosity. So in > this case the actual system has a nullspace, but the preconditioner matrix > does not. > > The way we previously used to solve this is by setting the nullspace on > the ksp of the Schur system, where the mat comes from > MatCreateSchurComplement and the pmat is the scaled mass matrix. We then > set the pctype to PCKSP and do not set a nullspace on its associated ksp. I > don't really see how I can do this using only MatSetNullSpace - unless I > create a copy of the mass matrix and have one copy with and one copy > without a nullspace so that I could use the one with the nullspace for the > pmat of the Schur ksp (and thus the mat of the pcksp) and the copy without > the nullspace as the pmat for the pcksp. > > We would very much appreciate some guidance on what the correct way to > deal with this kind of situation is > I can understand that this is inconsistent (we have not really thought of a good way to make this consistent). However, does this produce the wrong results? If A has a null space, won't you get the same answer by attaching it to P, or am I missing the import of the above example? Thanks, Matt > Cheers > Stephan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s.kramer at imperial.ac.uk Mon Jun 22 18:02:27 2015 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Tue, 23 Jun 2015 00:02:27 +0100 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55885063.2060406@imperial.ac.uk> References: <55885063.2060406@imperial.ac.uk> Message-ID: <55889403.9090707@imperial.ac.uk> > On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer > wrote: > >> Dear petsc devs >> >> I've been trying to move our code from using KSPSetNullSpace to use >> MatSetNullSpace instead. Although I appreciate the convenience of the >> nullspace being propagated automatically through the solver hierarchy, I'm >> still a little confused on how to deal with the case that mat/=pmat in a >> ksp. >> >> If I read the code correctly I need to call MatSetNullSpace on the pmat in >> order for a ksp to project that nullspace out of the residual during the >> krylov iteration. However the nullspaces of mat and pmat are not >> necessarily the same. For instance, what should I do if the system that I >> want to solve has a nullspace (i.e. the `mat` has a null vector), but the >> preconditioner matrix does not. >> >> As an example of existing setups that we have for which this is the case: >> take a standard Stokes velocity, pressure system - where we want to solve >> for pressure through a Schur complement. Suppose the boundary conditions >> are Dirichlet for velocity on all boundaries, then the pressure equation >> has the standard, constant nullspace. A frequently used preconditioner is >> the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a >> pressure mass matrix that is scaled by the value of the viscosity. So in >> this case the actual system has a nullspace, but the preconditioner matrix >> does not. >> >> The way we previously used to solve this is by setting the nullspace on >> the ksp of the Schur system, where the mat comes from >> MatCreateSchurComplement and the pmat is the scaled mass matrix. We then >> set the pctype to PCKSP and do not set a nullspace on its associated ksp. I >> don't really see how I can do this using only MatSetNullSpace - unless I >> create a copy of the mass matrix and have one copy with and one copy >> without a nullspace so that I could use the one with the nullspace for the >> pmat of the Schur ksp (and thus the mat of the pcksp) and the copy without >> the nullspace as the pmat for the pcksp. >> >> We would very much appreciate some guidance on what the correct way to >> deal with this kind of situation is >> > > I can understand that this is inconsistent (we have not really thought of a > good way to make this consistent). However, does > this produce the wrong results? If A has a null space, won't you get the > same answer by attaching it to P, or am I missing the > import of the above example? Attaching the nullspace to P means it will be applied in the PCKSP solve as well - which in this case is just a mass matrix solve which doesn't have a nullspace. I was under the impression that removing a nullspace in a solve where the matrix doesn't have one, would lead to trouble - but I'm happy to be told wrong.
Cheers Stephan From knepley at gmail.com Mon Jun 22 20:01:40 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 22 Jun 2015 20:01:40 -0500 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55889403.9090707@imperial.ac.uk> References: <55885063.2060406@imperial.ac.uk> <55889403.9090707@imperial.ac.uk> Message-ID: On Mon, Jun 22, 2015 at 6:02 PM, Stephan Kramer wrote: > On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer > imperial.ac.uk> >> wrote: >> >> Dear petsc devs >>> >>> I've been trying to move our code from using KSPSetNullSpace to use >>> MatSetNullSpace instead. Although I appreciate the convenience of the >>> nullspace being propagated automatically through the solver hierarchy, >>> I'm >>> still a little confused on how to deal with the case that mat/=pmat in a >>> ksp. >>> >>> If I read the code correctly I need to call MatSetNullSpace on the pmat >>> in >>> order for a ksp to project that nullspace out of the residual during the >>> krylov iteration. However the nullspaces of mat and pmat are not >>> necessarily the same. For instance, what should I do if the system that I >>> want to solve has a nullspace (i.e. the `mat` has a null vector), but the >>> preconditioner matrix does not. >>> >>> As an example of existing setups that we have for which this is the case: >>> take a standard Stokes velocity, pressure system - where we want to solve >>> for pressure through a Schur complement. Suppose the boundary conditions >>> are Dirichlet for velocity on all boundaries, then the pressure equation >>> has the standard, constant nullspace. A frequently used preconditioner is >>> the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a >>> pressure mass matrix that is scaled by the value of the viscosity. So in >>> this case the actual system has a nullspace, but the preconditioner >>> matrix >>> does not. >>> >>> The way we previously used to solve this is by setting the nullspace on >>> the ksp of the Schur system, where the mat comes from >>> MatCreateSchurComplement and the pmat is the scaled mass matrix. We then >>> set the pctype to PCKSP and do not set a nullspace on its associated >>> ksp. I >>> don't really see how I can do this using only MatSetNullSpace - unless I >>> create a copy of the mass matrix and have one copy with and one copy >>> without a nullspace so that I could use the one with the nullspace for >>> the >>> pmat of the Schur ksp (and thus the mat of the pcksp) and the copy >>> without >>> the nullspace as the pmat for the pcksp. >>> >>> We would very much appreciate some guidance on what the correct way to >>> deal with this kind of situation is >>> >>> >> I can understand that this is inconsistent (we have not really thought of >> a >> good way to make this consistent). However, does >> this produce the wrong results? If A has a null space, won't you get the >> same answer by attaching it to P, or am I missing the >> import of the above example? >> > > Attaching the nullspace to P means it will be applied in the PCKSP solve as > well - which in this case is just a mass matrix solve which doesn't > have a nullspace. I was under the impression that removing a nullspace in > a solve > where the matrix doesn't having one, would lead to trouble - but I'm happy > to > be told wrong. > I guess I was saying that this solve is only applied after a system with that null space, so I did not think it would change the answer. 
Thanks, Matt > Cheers > Stephan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kramer at imperial.ac.uk Tue Jun 23 07:42:01 2015 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Tue, 23 Jun 2015 13:42:01 +0100 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: Message-ID: <55895419.7050500@imperial.ac.uk> >> On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer >> imperial.ac.uk> >>> wrote: >>> >>> Dear petsc devs >>>> >>>> I've been trying to move our code from using KSPSetNullSpace to use >>>> MatSetNullSpace instead. Although I appreciate the convenience of the >>>> nullspace being propagated automatically through the solver hierarchy, >>>> I'm >>>> still a little confused on how to deal with the case that mat/=pmat in a >>>> ksp. >>>> >>>> If I read the code correctly I need to call MatSetNullSpace on the pmat >>>> in >>>> order for a ksp to project that nullspace out of the residual during the >>>> krylov iteration. However the nullspaces of mat and pmat are not >>>> necessarily the same. For instance, what should I do if the system that I >>>> want to solve has a nullspace (i.e. the `mat` has a null vector), but the >>>> preconditioner matrix does not. >>>> >>>> As an example of existing setups that we have for which this is the case: >>>> take a standard Stokes velocity, pressure system - where we want to solve >>>> for pressure through a Schur complement. Suppose the boundary conditions >>>> are Dirichlet for velocity on all boundaries, then the pressure equation >>>> has the standard, constant nullspace. A frequently used preconditioner is >>>> the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a >>>> pressure mass matrix that is scaled by the value of the viscosity. So in >>>> this case the actual system has a nullspace, but the preconditioner >>>> matrix >>>> does not. >>>> >>>> The way we previously used to solve this is by setting the nullspace on >>>> the ksp of the Schur system, where the mat comes from >>>> MatCreateSchurComplement and the pmat is the scaled mass matrix. We then >>>> set the pctype to PCKSP and do not set a nullspace on its associated >>>> ksp. I >>>> don't really see how I can do this using only MatSetNullSpace - unless I >>>> create a copy of the mass matrix and have one copy with and one copy >>>> without a nullspace so that I could use the one with the nullspace for >>>> the >>>> pmat of the Schur ksp (and thus the mat of the pcksp) and the copy >>>> without >>>> the nullspace as the pmat for the pcksp. >>>> >>>> We would very much appreciate some guidance on what the correct way to >>>> deal with this kind of situation is >>>> >>>> >>> I can understand that this is inconsistent (we have not really thought of >>> a >>> good way to make this consistent). However, does >>> this produce the wrong results? If A has a null space, won't you get the >>> same answer by attaching it to P, or am I missing the >>> import of the above example? >>> >> >> Attaching the nullspace to P means it will be applied in the PCKSP solve as >> well - which in this case is just a mass matrix solve which doesn't >> have a nullspace. I was under the impression that removing a nullspace in >> a solve >> where the matrix doesn't having one, would lead to trouble - but I'm happy >> to >> be told wrong. 
>> > > I guess I was saying that this solve is only applied after a system with > that null space, > so I did not think it would change the answer. > > Thanks, > > Matt Sorry if we're talking completely cross-purposes here: but the pcksp solve (that doesn't actually have a nullspace) is inside a solve that does have a nullspace. If what you are saying is that applying the nullspace inside the pcksp solve does not affect the outer solve, I can only see that if the difference in outcome of the pcksp solve (with or without the nullspace) would be inside the nullspace, because such a difference would be projected out anyway straight after the pcksp solve. I don't see that this is true however. My reasoning is the following: Say the original PCKSP solve (that doesn't apply the nullspace) gets passed some residual r from the outer solve, so it solves: Pz=r where P is our mass matrix approximation to the system in the outer solve, and P is full rank. Now if we do remove the nullspace during the pcksp solve, effectively we change the system to the following: NM^{-1}P z = NM^{-1}r where M^{-1} is the preconditioner inside the pcksp solve (assuming left-preconditioning here), and N is the projection operator that projects out the nullspace. This system is now rank-deficient, where I can add: z -> z + P^{-1}M n for arbitrary n in the nullspace. So not only is the possible difference between solving with or without a nullspace not found in that nullspace, but worse, I've ended up with a preconditioner that's rank deficient in the outer solve. I've experimented with the example I described before with the Schur complement of a Stokes velocity, pressure system on the outside using fgmres and the viscosity scaled pressure mass matrix as preconditioner which is solved in the pcksp solve with cg+sor. With petsc 3.5 (so I can still use KSPSetNullSpace) if I set a nullspace only on the outside ksp, or if I set it both on the outside ksp and the pcksp sub_ksp, I get different answers (and also a different number of iterations). The difference in answer visually looks like some grid scale noise that indeed could very well be of the form of a constant vector multiplied by an inverse mass matrix. If you agree that applying the nullspace inside the pcksp indeed gives incorrect answers, I'm still not clear how I can set this up using MatSetNullSpace only. Cheers Stephan From knepley at gmail.com Tue Jun 23 10:20:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Jun 2015 10:20:17 -0500 Subject: [petsc-users] June 24th DMPlex Webinar Message-ID: Michael Lange will give a webinar tomorrow at 9:00am CDT which covers the DMPlex interface for unstructured mesh management. Thanks, Matt ------------------------------------------------------------------------------ ARCHER Webinar: Flexible, Scalable Mesh and Data Management using PETSc DMPlex ------------------------------------------------------------------------------ 15:00 BST, Wed 24th June 2015 http://www.archer.ac.uk/training/virtual/. Michael Lange, Imperial College, London Scalable file I/O and efficient domain topology management present important challenges for many scientific applications if they are to fully utilise HPC resources. The use of composable abstractions allows a wide variety of application codes to utilise a range of mesh file formats while automatically inheriting the benefits of well-known performance optimisations.
In this talk we give an overview of PETSc's DMPlex domain topology abstraction and it's recent integration with the Fluidity CFD application and the Firedrake automated finite element system. We highlight how both applications utilise DMPlex's mesh management capabilities to perform common mesh management tasks and data layout optimisation, while promoting application interoperability through shared file I/O routines. To join, click "Launch" at http://www.archer.ac.uk/training/virtual/. For more detailed information, including software requirements, seehttp://www.archer.ac.uk/training/virtual/blackboard_collaborate.php ----------------------------------------------------------------- -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkhodak at princeton.edu Tue Jun 23 12:15:59 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Tue, 23 Jun 2015 10:15:59 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Thank you for reviewing this. The errors for the command 'pip2 install .' inside the petsc4py directory are attached, along with the configure/make logs for the PETSc build. Best, Mikhail Khodak On Mon, Jun 22, 2015 at 12:25 PM, Lisandro Dalcin wrote: > Dear Mikhail, Satish helped me and we managed to get petsc4py working > with static builds and Cygwin's Open MPI. Please git pull and try > again. > > On 11 June 2015 at 14:13, Mikhail Khodak wrote: > > I have attached the log files. > > I tried using Open MPI before but received the same result. I will try a > > static build later and otherwise will get a VM. > > Thanks for looking at this, > > Mikhail > > > > > > On Thu, Jun 11, 2015 at 10:48 AM, Satish Balay > wrote: > >> > >> Also - should mention: > >> > >> - cygwin has OpenMPI pacakged - so you could also try that. > >> - most of my builds are static [so never tried dll builds] > >> > >> Satish > >> > >> On Thu, 11 Jun 2015, Satish Balay wrote: > >> > >> > If you get errors when running basic petsc examples - send us the > >> > relavant petsc logs [configure.log, make.log, test.log etc..] > >> > > >> > Also - note that mpich is not supported on cygwin/windows [it > >> > generally works for us - when we try the --download-mpich build]. > >> > > >> > Unless you really need a cywin build/use of petsc4py - it might be > >> > easier to install a linux VM - and use PETSc/petsc4py on it. > >> > > >> > Satish > >> > > >> > On Thu, 11 Jun 2015, Mikhail Khodak wrote: > >> > > >> > > I added these lines to the petsc4py test files (specifically > >> > > test_comm.py) > >> > > but the error remains the same. However, I have done the standard > >> > > 'which > >> > > mpiexec', 'which mpicc', 'which mpirun' and they are all in the same > >> > > folder. In fact it is the only MPI installed. > >> > > The reason I thought it might be a PETSc build problem is because > one > >> > > of > >> > > the PETSc 'make test' tests (ex5f) fails with the same error, even > >> > > though > >> > > the 'make streams' test works fine with MPI processes. 
> >> > > Thanks, > >> > > Mikhail > >> > > > >> > > On Thu, Jun 11, 2015 at 4:21 AM, Matthew Knepley > > >> > > wrote: > >> > > > >> > > > On Thu, Jun 11, 2015 at 5:07 AM, Mikhail Khodak > >> > > > > >> > > > wrote: > >> > > > > >> > > >> Thank you for your help - the install seems to work, apart from > >> > > >> routines > >> > > >> requiring MPI, which fail due to the "Attempting to use an MPI > >> > > >> routine > >> > > >> before initializing MPI" error. This seems to be an error in the > >> > > >> PETSc > >> > > >> build itself. > >> > > >> > >> > > > > >> > > > The MPI routines will not work until after > >> > > > > >> > > > import petsc4py, sys > >> > > > petsc4py.init(sys.argv) > >> > > > from petsc4py import PETSc > >> > > > > >> > > > If they fail after this, it is usually a mismatch between the > >> > > > mpiexec > >> > > > being used and the MPI > >> > > > libraries being linked. > >> > > > > >> > > > Thanks, > >> > > > > >> > > > Matt > >> > > > > >> > > > Thanks again, > >> > > >> Mikhail Khodak > >> > > >> > >> > > >> On Mon, Jun 8, 2015 at 5:11 AM, Lisandro Dalcin < > dalcinl at gmail.com> > >> > > >> wrote: > >> > > >> > >> > > >>> On 8 June 2015 at 02:50, Mikhail Khodak > >> > > >>> wrote: > >> > > >>> > Hello, I am trying to build petsc4py-3.5.1 using Cygwin on > >> > > >>> > 64-bit > >> > > >>> Windows 7. > >> > > >>> > I have built PETSc 3.5.4 with shared and dynamic libraries > using > >> > > >>> > mpich2-1.2.1 and successfully ran the installation tests. I am > >> > > >>> > using > >> > > >>> Python > >> > > >>> > 2.7 and NumPy 1.9.2 and have installed mpi4py. However, when I > >> > > >>> > attempt > >> > > >>> to > >> > > >>> > install petsc4py (both with pip and distutils) I get a mpicc > >> > > >>> > compiler > >> > > >>> error > >> > > >>> > due to undefined references/symbols. I have attached the > output > >> > > >>> > of > >> > > >>> running > >> > > >>> > > >> > > >>> > pip install petsc petsc4py --allow-external petsc > >> > > >>> > --allow-external > >> > > >>> petsc4py > >> > > >>> > > >> > > >>> > >> > > >>> I've never ever built or test petsc4py under Cygwin. The errors > >> > > >>> you > >> > > >>> see are expected. > >> > > >>> Perhaps you can manually workaround the issues following the > >> > > >>> following > >> > > >>> steps: > >> > > >>> > >> > > >>> 1) Download the petsc4py tarball and unpack it. > >> > > >>> 2) Open the file "src/libpetsc4py/libpetsc4py.h", add remove all > >> > > >>> occurences of DL_IMPORT, i.e, replace DL_IMPORT(XYZ) for just > XYZ > >> > > >>> 3) Use pip again: > >> > > >>> > >> > > >>> pip install petsc > >> > > >>> pip install . > >> > > >>> > >> > > >>> The last line assumes your current working directory is the one > >> > > >>> having > >> > > >>> petsc4py's setup.py > >> > > >>> > >> > > >>> Finally, I do not guarantee this will work. I'm just guessing, > >> > > >>> petsc4py never explicitly supported Windows and/or Cygwin. 
> >> > > >>> > >> > > >>> > >> > > >>> -- > >> > > >>> Lisandro Dalcin > >> > > >>> ============ > >> > > >>> Research Scientist > >> > > >>> Computer, Electrical and Mathematical Sciences & Engineering > >> > > >>> (CEMSE) > >> > > >>> Numerical Porous Media Center (NumPor) > >> > > >>> King Abdullah University of Science and Technology (KAUST) > >> > > >>> http://numpor.kaust.edu.sa/ > >> > > >>> > >> > > >>> 4700 King Abdullah University of Science and Technology > >> > > >>> al-Khawarizmi Bldg (Bldg 1), Office # 4332 > >> > > >>> Thuwal 23955-6900, Kingdom of Saudi Arabia > >> > > >>> http://www.kaust.edu.sa > >> > > >>> > >> > > >>> Office Phone: +966 12 808-0459 > >> > > >>> > >> > > >> > >> > > >> > >> > > > > >> > > > > >> > > > -- > >> > > > What most experimenters take for granted before they begin their > >> > > > experiments is infinitely more interesting than any results to > which > >> > > > their > >> > > > experiments lead. > >> > > > -- Norbert Wiener > >> > > > > >> > > > >> > > >> > > > > > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Numerical Porous Media Center (NumPor) > King Abdullah University of Science and Technology (KAUST) > http://numpor.kaust.edu.sa/ > > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 4332 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > > Office Phone: +966 12 808-0459 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Processing /cygdrive/c/cygwin64/lib/python2.7/site-packages/petsc4py-3.5.1 Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib/python2.7/site-packages (from petsc4py==3.6.0) Building wheels for collected packages: petsc4py Running setup.py bdist_wheel for petsc4py Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-yBfOJT-build/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpXdUvO2pip-wheel-: running bdist_wheel running build running build_src cythonizing 'petsc4py.PETSc.pyx' -> 'petsc4py.PETSc.c' cythonizing 'libpetsc4py/libpetsc4py.pyx' -> 'libpetsc4py/libpetsc4py.c' running build_py creating build creating build/lib.cygwin-2.0.4-i686-2.7 creating build/lib.cygwin-2.0.4-i686-2.7/petsc4py copying src/PETSc.py -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py copying src/__init__.py -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py copying src/__main__.py -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py creating build/lib.cygwin-2.0.4-i686-2.7/petsc4py/lib copying src/lib/__init__.py -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/lib creating build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include creating build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/numpy.h -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.h -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc.h -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc_api.h -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.i -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/PETSc.pxd -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pxd 
-> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pyx -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/include/petsc4py copying src/lib/petsc.cfg -> build/lib.cygwin-2.0.4-i686-2.7/petsc4py/lib running build_ext Traceback (most recent call last): File "", line 1, in File "/tmp/pip-yBfOJT-build/setup.py", line 266, in main() File "/tmp/pip-yBfOJT-build/setup.py", line 263, in main run_setup() File "/tmp/pip-yBfOJT-build/setup.py", line 136, in run_setup **setup_args) File "/usr/lib/python2.7/distutils/core.py", line 151, in setup dist.run_commands() File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/usr/lib/python2.7/site-packages/wheel/bdist_wheel.py", line 175, in run self.run_command('build') File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/usr/lib/python2.7/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/usr/lib/python2.7/site-packages/setuptools/command/build_ext.py", line 50, in run _build_ext.run(self) File "/usr/lib/python2.7/distutils/command/build_ext.py", line 337, in run self.build_extensions() File "conf/baseconf.py", line 510, in build_extensions _build_ext.build_extensions(self, *args,**kargs) File "/usr/lib/python2.7/distutils/command/build_ext.py", line 446, in build_extensions self.build_extension(ext) File "conf/baseconf.py", line 495, in build_extension config = self.get_config_arch(arch) File "conf/baseconf.py", line 486, in get_config_arch return config.Configure(self.petsc_dir, arch) File "conf/baseconf.py", line 108, in __init__ self.language = language_map[self['PETSC_LANGUAGE']] File "conf/baseconf.py", line 111, in __getitem__ return self.configdict[item] KeyError: 'PETSC_LANGUAGE' ---------------------------------------- Failed to build petsc4py Installing collected packages: petsc4py Running setup.py install for petsc4py Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-yBfOJT-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-0cQfDb-record/install-record.txt --single-version-externally-managed --compile: running install running build running build_src running build_py running build_ext Traceback (most recent call last): File "", line 1, in File "/tmp/pip-yBfOJT-build/setup.py", line 266, in main() File "/tmp/pip-yBfOJT-build/setup.py", line 263, in main run_setup() File "/tmp/pip-yBfOJT-build/setup.py", line 136, in run_setup **setup_args) File "/usr/lib/python2.7/distutils/core.py", line 151, in setup dist.run_commands() File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "conf/baseconf.py", line 561, in run _install.run(self) File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 61, in run return orig.install.run(self) File "/usr/lib/python2.7/distutils/command/install.py", line 563, in run self.run_command('build') File 
"/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/usr/lib/python2.7/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/usr/lib/python2.7/site-packages/setuptools/command/build_ext.py", line 50, in run _build_ext.run(self) File "/usr/lib/python2.7/distutils/command/build_ext.py", line 337, in run self.build_extensions() File "conf/baseconf.py", line 510, in build_extensions _build_ext.build_extensions(self, *args,**kargs) File "/usr/lib/python2.7/distutils/command/build_ext.py", line 446, in build_extensions self.build_extension(ext) File "conf/baseconf.py", line 495, in build_extension config = self.get_config_arch(arch) File "conf/baseconf.py", line 486, in get_config_arch return config.Configure(self.petsc_dir, arch) File "conf/baseconf.py", line 108, in __init__ self.language = language_map[self['PETSC_LANGUAGE']] File "conf/baseconf.py", line 111, in __getitem__ return self.configdict[item] KeyError: 'PETSC_LANGUAGE' ---------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 100 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 90 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jun 23 13:57:32 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 23 Jun 2015 13:57:32 -0500 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55895419.7050500@imperial.ac.uk> References: <55895419.7050500@imperial.ac.uk> Message-ID: <3A881365-E683-4DAB-B4EC-49236045F309@mcs.anl.gov> If we move the location of the nullspace used in removal from the pmat to the mat would that completely resolve the problem for you? Barry On Jun 23, 2015, at 7:42 AM, Stephan Kramer wrote: > >>> On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer >>> imperial.ac.uk> >>>> wrote: >>>> >>>> Dear petsc devs >>>>> >>>>> I've been trying to move our code from using KSPSetNullSpace to use >>>>> MatSetNullSpace instead. Although I appreciate the convenience of the >>>>> nullspace being propagated automatically through the solver hierarchy, >>>>> I'm >>>>> still a little confused on how to deal with the case that mat/=pmat in a >>>>> ksp. >>>>> >>>>> If I read the code correctly I need to call MatSetNullSpace on the pmat >>>>> in >>>>> order for a ksp to project that nullspace out of the residual during the >>>>> krylov iteration. However the nullspaces of mat and pmat are not >>>>> necessarily the same. For instance, what should I do if the system that I >>>>> want to solve has a nullspace (i.e. the `mat` has a null vector), but the >>>>> preconditioner matrix does not. >>>>> >>>>> As an example of existing setups that we have for which this is the case: >>>>> take a standard Stokes velocity, pressure system - where we want to solve >>>>> for pressure through a Schur complement. Suppose the boundary conditions >>>>> are Dirichlet for velocity on all boundaries, then the pressure equation >>>>> has the standard, constant nullspace. 
A frequently used preconditioner is >>>>> the "scaled pressure mass matrix" where G^T K^{-1} G is approximated by a >>>>> pressure mass matrix that is scaled by the value of the viscosity. So in >>>>> this case the actual system has a nullspace, but the preconditioner >>>>> matrix >>>>> does not. >>>>> >>>>> The way we previously used to solve this is by setting the nullspace on >>>>> the ksp of the Schur system, where the mat comes from >>>>> MatCreateSchurComplement and the pmat is the scaled mass matrix. We then >>>>> set the pctype to PCKSP and do not set a nullspace on its associated >>>>> ksp. I >>>>> don't really see how I can do this using only MatSetNullSpace - unless I >>>>> create a copy of the mass matrix and have one copy with and one copy >>>>> without a nullspace so that I could use the one with the nullspace for >>>>> the >>>>> pmat of the Schur ksp (and thus the mat of the pcksp) and the copy >>>>> without >>>>> the nullspace as the pmat for the pcksp. >>>>> >>>>> We would very much appreciate some guidance on what the correct way to >>>>> deal with this kind of situation is >>>>> >>>>> >>>> I can understand that this is inconsistent (we have not really thought of >>>> a >>>> good way to make this consistent). However, does >>>> this produce the wrong results? If A has a null space, won't you get the >>>> same answer by attaching it to P, or am I missing the >>>> import of the above example? >>>> >>> >>> Attaching the nullspace to P means it will be applied in the PCKSP solve as >>> well - which in this case is just a mass matrix solve which doesn't >>> have a nullspace. I was under the impression that removing a nullspace in >>> a solve >>> where the matrix doesn't having one, would lead to trouble - but I'm happy >>> to >>> be told wrong. >>> >> >> I guess I was saying that this solve is only applied after a system with >> that null space, >> so I did not think it would change the answer. >> >> Thanks, >> >> Matt > > Sorry if we're talking completely cross-purposes here: but the pcksp solve (that doesn't actually have a nullspace) is inside a solve that does have a nullspace. If what you are saying is that applying the nullspace inside the pcksp solve does not affect the outer solve, I can only see that if the difference in outcome of the pcksp solve (with or without the nullspace) would be inside the nullspace, because such a difference would be projected out anyway straight after the pcksp solve. I don't see that this is true however. My reasoning is the following: > > Say the original PCKSP solve (that doesn't apply the nullspace) gets passed some residual r from the outer solve, so it solves: > > Pz=r > > where P is our mass matrix approximation to the system in the outer solve, and P is full rank. Now if we do remove the nullspace during the pcksp solve, effectively we change the system to the following: > > NM^{-1}P z = NM^{-1}r > > where M^{-1} is the preconditioner inside the pcksp solve (assuming left-preconditiong here), and N is the projection operator that projects out the nullspace. This system is now rank-deficient, where I can add: > > z -> z + P^{-1}M n > > for arbitrary n in the nullspace. > > So not only is the possible difference between solving with or without a nullspace not found in that nullspace, but worse, I've ended up with a preconditioner that's rank deficient in the outer solve. 
> > I've experimented with the example I described before with the Schur complement of a Stokes velocity,pressure system on the outside using fgmres and the viscosit scaled pressure mass matrix as preconditioner which is solved in the pcksp solve with cg+sor. With petsc 3.5 (so I can still use KSPSetNullSpace) if I set a nullspace only on the outside ksp, or if I set it both on the outside ksp and the pcksp sub_ksp, I get different answers (and also a different a number of iterations). The difference in answer visually looks like some grid scale noise that indeed could very well be of the form of a constant vector multiplied by an inverse mass matrix. > > If you agree that applying the nullspace inside the pcksp indeed gives incorrect answers, I'm still not clear how I can set this up using MatSetNullSpace only. > > Cheers > Stephan From knepley at gmail.com Tue Jun 23 14:24:49 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Jun 2015 14:24:49 -0500 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55895419.7050500@imperial.ac.uk> References: <55895419.7050500@imperial.ac.uk> Message-ID: On Tue, Jun 23, 2015 at 7:42 AM, Stephan Kramer wrote: > On Mon, Jun 22, 2015 at 1:13 PM, Stephan Kramer >> >>>> imperial.ac.uk> >>>> wrote: >>>> >>>> Dear petsc devs >>>> >>>>> >>>>> I've been trying to move our code from using KSPSetNullSpace to use >>>>> MatSetNullSpace instead. Although I appreciate the convenience of the >>>>> nullspace being propagated automatically through the solver hierarchy, >>>>> I'm >>>>> still a little confused on how to deal with the case that mat/=pmat in >>>>> a >>>>> ksp. >>>>> >>>>> If I read the code correctly I need to call MatSetNullSpace on the pmat >>>>> in >>>>> order for a ksp to project that nullspace out of the residual during >>>>> the >>>>> krylov iteration. However the nullspaces of mat and pmat are not >>>>> necessarily the same. For instance, what should I do if the system >>>>> that I >>>>> want to solve has a nullspace (i.e. the `mat` has a null vector), but >>>>> the >>>>> preconditioner matrix does not. >>>>> >>>>> As an example of existing setups that we have for which this is the >>>>> case: >>>>> take a standard Stokes velocity, pressure system - where we want to >>>>> solve >>>>> for pressure through a Schur complement. Suppose the boundary >>>>> conditions >>>>> are Dirichlet for velocity on all boundaries, then the pressure >>>>> equation >>>>> has the standard, constant nullspace. A frequently used preconditioner >>>>> is >>>>> the "scaled pressure mass matrix" where G^T K^{-1} G is approximated >>>>> by a >>>>> pressure mass matrix that is scaled by the value of the viscosity. So >>>>> in >>>>> this case the actual system has a nullspace, but the preconditioner >>>>> matrix >>>>> does not. >>>>> >>>>> The way we previously used to solve this is by setting the nullspace on >>>>> the ksp of the Schur system, where the mat comes from >>>>> MatCreateSchurComplement and the pmat is the scaled mass matrix. We >>>>> then >>>>> set the pctype to PCKSP and do not set a nullspace on its associated >>>>> ksp. I >>>>> don't really see how I can do this using only MatSetNullSpace - unless >>>>> I >>>>> create a copy of the mass matrix and have one copy with and one copy >>>>> without a nullspace so that I could use the one with the nullspace for >>>>> the >>>>> pmat of the Schur ksp (and thus the mat of the pcksp) and the copy >>>>> without >>>>> the nullspace as the pmat for the pcksp. 
>>>>> >>>>> We would very much appreciate some guidance on what the correct way to >>>>> deal with this kind of situation is >>>>> >>>>> >>>>> I can understand that this is inconsistent (we have not really >>>> thought of >>>> a >>>> good way to make this consistent). However, does >>>> this produce the wrong results? If A has a null space, won't you get the >>>> same answer by attaching it to P, or am I missing the >>>> import of the above example? >>>> >>>> >>> Attaching the nullspace to P means it will be applied in the PCKSP solve >>> as >>> well - which in this case is just a mass matrix solve which doesn't >>> have a nullspace. I was under the impression that removing a nullspace in >>> a solve >>> where the matrix doesn't having one, would lead to trouble - but I'm >>> happy >>> to >>> be told wrong. >>> >>> >> I guess I was saying that this solve is only applied after a system with >> that null space, >> so I did not think it would change the answer. >> >> Thanks, >> >> Matt >> > > Sorry if we're talking completely cross-purposes here: but the pcksp solve > (that doesn't actually have a nullspace) is inside a solve that does have a > nullspace. If what you are saying is that applying the nullspace inside the > pcksp solve does not affect the outer solve, I can only see that if the > difference in outcome of the pcksp solve (with or without the nullspace) > would be inside the nullspace, because such a difference would be projected > out anyway straight after the pcksp solve. I don't see that this is true > however. My reasoning is the following: > > Say the original PCKSP solve (that doesn't apply the nullspace) gets > passed some residual r from the outer solve, so it solves: > > Pz=r > > where P is our mass matrix approximation to the system in the outer solve, > and P is full rank. Now if we do remove the nullspace during the pcksp > solve, effectively we change the system to the following: > > NM^{-1}P z = NM^{-1}r > > where M^{-1} is the preconditioner inside the pcksp solve (assuming > left-preconditiong here), and N is the projection operator that projects > out the nullspace. This system is now rank-deficient, where I can add: > > z -> z + P^{-1}M n > > for arbitrary n in the nullspace. > > So not only is the possible difference between solving with or without a > nullspace not found in that nullspace, but worse, I've ended up with a > preconditioner that's rank deficient in the outer solve. > This analysis does not make sense to me. The whole point of having all this nullspace stuff is so that we do not get a component of the null space in the solution. The way we avoid these in Krylov methods is to project them out, which is what happens when you attach it. Thus, your PCKSP solution 'z' will not have any 'n' component. Your outer solve will also not have any 'n' component, so I do not see where the inconsistency comes in. Thanks, Matt > I've experimented with the example I described before with the Schur > complement of a Stokes velocity,pressure system on the outside using fgmres > and the viscosit scaled pressure mass matrix as preconditioner which is > solved in the pcksp solve with cg+sor. With petsc 3.5 (so I can still use > KSPSetNullSpace) if I set a nullspace only on the outside ksp, or if I set > it both on the outside ksp and the pcksp sub_ksp, I get different answers > (and also a different a number of iterations). 
The difference in answer > visually looks like some grid scale noise that indeed could very well be of > the form of a constant vector multiplied by an inverse mass matrix. > > If you agree that applying the nullspace inside the pcksp indeed gives > incorrect answers, I'm still not clear how I can set this up using > MatSetNullSpace only. > > Cheers > Stephan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Tue Jun 23 18:36:55 2015 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 23 Jun 2015 18:36:55 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: I was unable to do quad precision or even with 64 bit integers because my data files rely on intricate binary files that have been written in 32 bit. However, I noticed a couple things which are puzzling to me: 1) I am solving a transient problem using my own backward euler function. Basically I call TaoSolve at each time level. What I find strange is that the number of TAO solve iterations vary at each time level for a given number of processors. The solution is roughly the same when I change the number of processors. Any idea why this is happening, or might this have more to do with the job scheduling/compute nodes on my HPC machine? 2) Sometimes, I get Tao Termination reason of -5, and from what I see from the online documentation, it means the number of function evaluations exceeds the maximum number of function evaluations. I only get this at certain time levels, and it also varies when I change the number of processors. I can understand the number of iterations going down the further in time i go (this is due to the nature of my problem), but I am not sure why the above two observations are happening. Any thoughts? Thanks, Justin On Fri, Jun 19, 2015 at 11:52 AM, Justin Chang wrote: > My code sort of requires HDF5 so installing quad precision might be a > little difficult. I could try to work around this but that might take some > effort. > > In the mean time, is there any other potential explanation or alternative > to figuring this out? > > Thanks, > Justin > > > On Thursday, June 18, 2015, Matthew Knepley wrote: > >> On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich >> wrote: >> >>> BLMVM doesn't use a KSP or preconditioner, it updates using the L-BFGS-B >>> formula >>> >> >> Then this sounds like a bug, unless one of the constants is partition >> dependent. >> >> Matt >> >> >>> On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley >>> wrote: >>> >>>> On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich < >>>> jason.sarich at gmail.com> wrote: >>>> >>>>> Hi Justin, >>>>> >>>>> I can't tell for sure why this is happening, have you tried using >>>>> quad precision to make sure that numerical cutoffs isn't the problem? >>>>> >>>>> 1 The Hessian being approximate and the resulting implicit >>>>> computation is the source of the cutoff, but would not be causing different >>>>> convergence rates in infinite precision. >>>>> >>>>> 2 the local size may affect load balancing but not the resulting >>>>> norms/convergence rate. >>>>> >>>> >>>> This sounds to be like the preconditioner is dependent on the >>>> partition. 
Can you send -tao_view -snes_view >>>> >>>> Matt >>>> >>>> >>>>> Jason >>>>> >>>>> >>>>> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >>>>> wrote: >>>>> >>>>>> I solved a transient diffusion across multiple cores using TAO >>>>>> BLMVM. When I simulate the same problem but on different numbers of >>>>>> processing cores, the number of solve iterations change quite drastically. >>>>>> The numerical solution is the same, but these changes are quite vast. I >>>>>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>>>>> largely invariant with number of processors but TAO (with bounded >>>>>> constraints) fluctuates. >>>>>> >>>>>> My question is, why is this happening? I understand that accumulation >>>>>> of numerical round-offs may attribute to this, but the differences seem >>>>>> quite vast to me. My initial thought was that >>>>>> >>>>>> 1) the Hessian is only projected and not explicitly computed, which >>>>>> may have something to do with the rate of convergence >>>>>> >>>>>> 2) local problem size. Certain regions of my domain have different >>>>>> number of "violations" which need to be corrected by the bounded >>>>>> constraints so the rate of convergence depends on how these regions are >>>>>> partitioned? >>>>>> >>>>>> Any thoughts? >>>>>> >>>>>> Thanks, >>>>>> Justin >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Tue Jun 23 19:52:13 2015 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 23 Jun 2015 20:52:13 -0400 Subject: [petsc-users] jacobians for multicomponent problems Message-ID: Suppose I have a problem for which the field variable has multiple components, in this case real and imaginary parts, and I have created a DM with two degrees of freedom. When assembling a Jacobian associated with a nonlinear problem on this data, is there any data management accounting for the different components that makes this ?easier?? -gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 23 22:05:00 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Jun 2015 22:05:00 -0500 Subject: [petsc-users] jacobians for multicomponent problems In-Reply-To: References: Message-ID: On Tue, Jun 23, 2015 at 7:52 PM, Gideon Simpson wrote: > Suppose I have a problem for which the field variable has multiple > components, in this case real and imaginary parts, and I have created a DM > with two degrees of freedom. When assembling a Jacobian associated with a > nonlinear problem on this data, is there any data management accounting for > the different components that makes this ?easier?? > DMDA knows about components so that MatStencil has a .c slot for indices. Is that what you mean? Thanks, Matt > -gideon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gideon.simpson at gmail.com Tue Jun 23 22:30:13 2015 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 23 Jun 2015 23:30:13 -0400 Subject: Re: [petsc-users] jacobians for multicomponent problems In-Reply-To: References: Message-ID: Yes, is there and example for this? -gideon > On Jun 23, 2015, at 11:05 PM, Matthew Knepley wrote: > >> On Tue, Jun 23, 2015 at 7:52 PM, Gideon Simpson wrote: >> Suppose I have a problem for which the field variable has multiple components, in this case real and imaginary parts, and I have created a DM with two degrees of freedom. When assembling a Jacobian associated with a nonlinear problem on this data, is there any data management accounting for the different components that makes this ?easier?? > > DMDA knows about components so that MatStencil has a .c slot for indices. Is that what you mean? > > Thanks, > > Matt > >> -gideon >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From domenico_lahaye at yahoo.com Wed Jun 24 02:39:25 2015 From: domenico_lahaye at yahoo.com (domenico lahaye) Date: Wed, 24 Jun 2015 07:39:25 +0000 (UTC) Subject: [petsc-users] jacobians for multicomponent problems In-Reply-To: References: Message-ID: <1381935005.127870.1435131565928.JavaMail.yahoo@mail.yahoo.com> Would the same work for non-structured problems, for instance for power flow equations? Thx. Domenico Lahaye. From: Matthew Knepley To: Gideon Simpson Cc: petsc-users Sent: Wednesday, June 24, 2015 5:05 AM Subject: Re: [petsc-users] jacobians for multicomponent problems On Tue, Jun 23, 2015 at 7:52 PM, Gideon Simpson wrote: Suppose I have a problem for which the field variable has multiple components, in this case real and imaginary parts, and I have created a DM with two degrees of freedom.? When assembling a Jacobian associated with a nonlinear problem on this data, is there any data management accounting for the different components that makes this ?easier?? DMDA knows about components so that MatStencil has a .c slot for indices. Is that what you mean? ? Thanks, ? ? Matt ? -gideon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed...
URL: From orxan.shibli at gmail.com Wed Jun 24 05:08:06 2015 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Wed, 24 Jun 2015 04:08:06 -0600 Subject: [petsc-users] nan after VecDuplicate Message-ID: I get a nan for solution vector, 'x' right after VecDuplicate (x, &b) where, 'b' is a Vec. This happens only with multiple processes while sequential one works perfectly. I even tried to set 'x' with VecZeroEntries(x) but didn't help. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 24 05:16:34 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 Jun 2015 05:16:34 -0500 Subject: [petsc-users] jacobians for multicomponent problems In-Reply-To: References: Message-ID: On Tue, Jun 23, 2015 at 10:30 PM, Gideon Simpson wrote: > Yes, is there and example for this? > src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c Thanks, Matt > -gideon > > On Jun 23, 2015, at 11:05 PM, Matthew Knepley wrote: > > On Tue, Jun 23, 2015 at 7:52 PM, Gideon Simpson > wrote: > >> Suppose I have a problem for which the field variable has multiple >> components, in this case real and imaginary parts, and I have created a DM >> with two degrees of freedom. When assembling a Jacobian associated with a >> nonlinear problem on this data, is there any data management accounting for >> the different components that makes this ?easier?? >> > > DMDA knows about components so that MatStencil has a .c slot for indices. > Is that what you mean? > > Thanks, > > Matt > > >> -gideon >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 24 05:17:48 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 Jun 2015 05:17:48 -0500 Subject: [petsc-users] jacobians for multicomponent problems In-Reply-To: <1381935005.127870.1435131565928.JavaMail.yahoo@mail.yahoo.com> References: <1381935005.127870.1435131565928.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Wed, Jun 24, 2015 at 2:39 AM, domenico lahaye wrote: > Would the same work for non-structured problems, for instance for power > flow equations? > Yes, PetscSection knows about components for fields, and this carries through to the DMNetwork setup. It will, for instance, use block matrix formats if the block size is constant. Thanks, Matt > Thx. Domenico Lahaye. > > ------------------------------ > *From:* Matthew Knepley > *To:* Gideon Simpson > *Cc:* petsc-users > *Sent:* Wednesday, June 24, 2015 5:05 AM > *Subject:* Re: [petsc-users] jacobians for multicomponent problems > > On Tue, Jun 23, 2015 at 7:52 PM, Gideon Simpson > wrote: > > Suppose I have a problem for which the field variable has multiple > components, in this case real and imaginary parts, and I have created a DM > with two degrees of freedom. When assembling a Jacobian associated with a > nonlinear problem on this data, is there any data management accounting for > the different components that makes this ?easier?? > > > DMDA knows about components so that MatStencil has a .c slot for indices. > Is that what you mean? 
> > Thanks, > > Matt > > > > > -gideon > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 24 05:32:31 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 Jun 2015 05:32:31 -0500 Subject: [petsc-users] nan after VecDuplicate In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 5:08 AM, Orxan Shibliyev wrote: > I get a nan for solution vector, 'x' right after VecDuplicate (x, &b) > where, 'b' is a Vec. This happens only with multiple processes while > sequential one works perfectly. I even tried to set 'x' with > VecZeroEntries(x) but didn't help. > VecDuplicate() does not affect the values in the initial vector. This is used everywhere in PETSc. It sounds like there is memory corruption somewhere else in your code. We recommend using valgrind to track this down. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Jun 24 09:15:22 2015 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 24 Jun 2015 09:15:22 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: On 23 June 2015 at 12:15, Mikhail Khodak wrote: > Thank you for reviewing this. The errors for the command 'pip2 install .' > inside the petsc4py directory are attached, along with the configure/make > logs for the PETSc build. > Best, > Mikhail Khodak > Not sure what's going on. Do you have PETSC_DIR and PETSC_ARCH in your environment? Does it work if you just do "python setup.py build" ? -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Numerical Porous Media Center (NumPor) King Abdullah University of Science and Technology (KAUST) http://numpor.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 4332 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From gideon.simpson at gmail.com Wed Jun 24 09:51:07 2015 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Wed, 24 Jun 2015 10:51:07 -0400 Subject: [petsc-users] DMDAVecGetArray vs DMDAVecGetArrayRead Message-ID: <0A33A759-0562-4C41-AC0E-E946DD56D8A8@gmail.com> Is there any difference between these two routines? Is there a reason to use one over the other? -gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 24 10:05:37 2015 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 Jun 2015 10:05:37 -0500 Subject: [petsc-users] DMDAVecGetArray vs DMDAVecGetArrayRead In-Reply-To: <0A33A759-0562-4C41-AC0E-E946DD56D8A8@gmail.com> References: <0A33A759-0562-4C41-AC0E-E946DD56D8A8@gmail.com> Message-ID: On Wed, Jun 24, 2015 at 9:51 AM, Gideon Simpson wrote: > Is there any difference between these two routines? Is there a reason to > use one over the other? > You can only change the values using the first. 
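A minimal sketch of the distinction, assuming a 1D DMDA with a single degree of freedom (the routine and variable names are illustrative only, not from the question above); it also shows that a vector obtained from VecDuplicate() starts with unspecified values and has to be filled explicitly:

  #include <petscdmda.h>

  /* Read x through DMDAVecGetArrayRead() and write its scaled copy through
     DMDAVecGetArray().  VecDuplicate() only copies the layout, not the values. */
  static PetscErrorCode ScaleIntoDuplicate(DM da, Vec x, Vec *y)
  {
    PetscInt          i, xs, xm;
    const PetscScalar *xa;   /* read-only window on x  */
    PetscScalar       *ya;   /* writable window on *y  */
    PetscErrorCode    ierr;

    PetscFunctionBeginUser;
    ierr = VecDuplicate(x, y);CHKERRQ(ierr);   /* same layout as x, values undefined */
    ierr = VecSet(*y, 0.0);CHKERRQ(ierr);      /* so initialize them explicitly      */

    ierr = DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);CHKERRQ(ierr);
    ierr = DMDAVecGetArrayRead(da, x, &xa);CHKERRQ(ierr);  /* values may not be changed */
    ierr = DMDAVecGetArray(da, *y, &ya);CHKERRQ(ierr);     /* values may be changed     */
    for (i = xs; i < xs + xm; ++i) ya[i] = 2.0*xa[i];
    ierr = DMDAVecRestoreArray(da, *y, &ya);CHKERRQ(ierr);
    ierr = DMDAVecRestoreArrayRead(da, x, &xa);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Using the Read variant documents the intent and does not mark the vector as modified, so cached quantities such as norms remain valid.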
Thanks, Matt > -gideon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From riseth at maths.ox.ac.uk Wed Jun 24 19:02:20 2015 From: riseth at maths.ox.ac.uk (=?UTF-8?Q?Asbj=C3=B8rn_Nilsen_Riseth?=) Date: Thu, 25 Jun 2015 00:02:20 +0000 Subject: [petsc-users] Set PC PythonContext within NPC Message-ID: Hi all, I'm currently trying to set up a nonlinear solver that uses NGMRES with NEWTONLS as a right preconditioner. The NEWTONLS has a custom linear preconditioner. Everything is accessed through petsc4py. *Is there a way I can configure a NPC NEWTONLS KSP CompositePC without first calling solve on my outer snes?* My NEWTONLS has the following KSP setup: FGMRES | PCCOMPOSITE || PYTHON || ILU The way I understand things, the NPC is not created/set up until SNESSolve_NGMRES() is called. Therefore, I cannot call snes.getNPC().ksp.pc.getCompositePC(0).setPythonContext(ctx) before I have called snes.solve(). What I currently do is a try/except on snes.solve to create/set up the pccomposite. Then I can set my pythoncontext and it runs fine. This is quite ugly though, so I was hoping anyone would have a better approach. Ozzy -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkhodak at princeton.edu Wed Jun 24 21:12:37 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Wed, 24 Jun 2015 19:12:37 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Yes, the variables are defined. Using setup.py gives the same error. As a note, I am using a download of the master branch. Mikhail Khodak On Wed, Jun 24, 2015 at 7:15 AM, Lisandro Dalcin wrote: > On 23 June 2015 at 12:15, Mikhail Khodak wrote: > > Thank you for reviewing this. The errors for the command 'pip2 install .' > > inside the petsc4py directory are attached, along with the configure/make > > logs for the PETSc build. > > Best, > > Mikhail Khodak > > > > Not sure what's going on. > > Do you have PETSC_DIR and PETSC_ARCH in your environment? > > Does it work if you just do "python setup.py build" ? > > > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Numerical Porous Media Center (NumPor) > King Abdullah University of Science and Technology (KAUST) > http://numpor.kaust.edu.sa/ > > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 4332 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > > Office Phone: +966 12 808-0459 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jun 24 21:14:46 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 24 Jun 2015 21:14:46 -0500 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: Perhaps the issue is petsc-3.5. We tried this petsc-3.6 - and that worked fine.. Satish On Wed, 24 Jun 2015, Mikhail Khodak wrote: > Yes, the variables are defined. Using setup.py gives the same error. > As a note, I am using a download of the master branch. > Mikhail Khodak > > On Wed, Jun 24, 2015 at 7:15 AM, Lisandro Dalcin wrote: > > > On 23 June 2015 at 12:15, Mikhail Khodak wrote: > > > Thank you for reviewing this. The errors for the command 'pip2 install .' 
> > > inside the petsc4py directory are attached, along with the configure/make > > > logs for the PETSc build. > > > Best, > > > Mikhail Khodak > > > > > > > Not sure what's going on. > > > > Do you have PETSC_DIR and PETSC_ARCH in your environment? > > > > Does it work if you just do "python setup.py build" ? > > > > > > -- > > Lisandro Dalcin > > ============ > > Research Scientist > > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > > Numerical Porous Media Center (NumPor) > > King Abdullah University of Science and Technology (KAUST) > > http://numpor.kaust.edu.sa/ > > > > 4700 King Abdullah University of Science and Technology > > al-Khawarizmi Bldg (Bldg 1), Office # 4332 > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > http://www.kaust.edu.sa > > > > Office Phone: +966 12 808-0459 > > > From mkhodak at princeton.edu Wed Jun 24 21:47:47 2015 From: mkhodak at princeton.edu (Mikhail Khodak) Date: Wed, 24 Jun 2015 19:47:47 -0700 Subject: [petsc-users] petsc4py Build Problem In-Reply-To: References: Message-ID: That was it. Installed with 3.6.0 and all tests passed. Thank you both for the help. Mikhail Khodak On Wed, Jun 24, 2015 at 7:14 PM, Satish Balay wrote: > Perhaps the issue is petsc-3.5. We tried this petsc-3.6 - and that > worked fine.. > > Satish > > On Wed, 24 Jun 2015, Mikhail Khodak wrote: > > > Yes, the variables are defined. Using setup.py gives the same error. > > As a note, I am using a download of the master branch. > > Mikhail Khodak > > > > On Wed, Jun 24, 2015 at 7:15 AM, Lisandro Dalcin > wrote: > > > > > On 23 June 2015 at 12:15, Mikhail Khodak > wrote: > > > > Thank you for reviewing this. The errors for the command 'pip2 > install .' > > > > inside the petsc4py directory are attached, along with the > configure/make > > > > logs for the PETSc build. > > > > Best, > > > > Mikhail Khodak > > > > > > > > > > Not sure what's going on. > > > > > > Do you have PETSC_DIR and PETSC_ARCH in your environment? > > > > > > Does it work if you just do "python setup.py build" ? > > > > > > > > > -- > > > Lisandro Dalcin > > > ============ > > > Research Scientist > > > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > > > Numerical Porous Media Center (NumPor) > > > King Abdullah University of Science and Technology (KAUST) > > > http://numpor.kaust.edu.sa/ > > > > > > 4700 King Abdullah University of Science and Technology > > > al-Khawarizmi Bldg (Bldg 1), Office # 4332 > > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > > http://www.kaust.edu.sa > > > > > > Office Phone: +966 12 808-0459 > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stoneszone at gmail.com Thu Jun 25 05:51:32 2015 From: stoneszone at gmail.com (Lei Shi) Date: Thu, 25 Jun 2015 05:51:32 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM Message-ID: Hello, I'm trying to improve the parallel efficiency of gmres solve in my. In my CFD solver, Petsc gmres is used to solve the linear system generated by the Newton's method. To test its efficiency, I started with a very simple inviscid subsonic 3D flow as the first testcase. The parallel efficiency of gmres solve with asm as the preconditioner is very bad. The results are from our latest cluster. Right now, I'm only looking at the wclock time of the ksp_solve. 1. First I tested ASM with gmres and ilu 0 for the sub domain , the cpu time of 2 cores is almost the same as the serial run. 
Here are the options for this case:

-ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
-ksp_gmres_restart 30 -ksp_pc_side right
-pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30
-sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0
-sub_pc_factor_fill 1.9

The iteration numbers increase a lot for the parallel run.

cores  iterations  err       petsc solve wclock time  speedup  efficiency
1      2           1.15E-04  11.95                    1
2      5           2.05E-02  10.5                     1.01     0.50
4      6           2.19E-02  7.64                     1.39     0.34

2. Then I tested ASM with ilu 0 as the preconditioner only; the cpu time of 2 cores is better than in the 1st test, but the speedup is still very bad. Here are the options I'm using:

-ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
-ksp_gmres_restart 30 -ksp_pc_side right
-pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9

cores  iterations  err       petsc solve cpu time  speedup  efficiency
1      10          4.54E-04  10.68                 1
2      11          9.55E-04  8.2                   1.30     0.65
4      12          3.59E-04  5.26                  2.03     0.50

Those results are from a third order "DG" scheme with a very coarse 3D mesh (480 elements). I believe I should get some speedups for this test even on this coarse mesh.

My question is why does the asm with a local solve take much longer time than the asm as a preconditioner only? Also, the accuracy is very bad. I have tested changing the overlap of asm to 2, but it makes things even worse.

If I use a larger mesh (~4000 elements), the 2nd case with asm as the preconditioner gives me a better speedup, but still not very good.

cores  iterations  err       petsc solve cpu time  speedup  efficiency
1      7           1.91E-02  97.32                 1
2      7           2.07E-02  64.94                 1.5      0.74
4      7           2.61E-02  36.97                 2.6      0.65

Attached are the log_summary dumped from petsc, any suggestions are welcome. I really appreciate it.

Sincerely Yours,

Lei Shi
---------

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc2_asm_sub_ksp.dat Type: application/octet-stream Size: 12233 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc1_asm_sub_ksp.dat Type: application/octet-stream Size: 11951 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc2_asm_pconly.dat Type: application/octet-stream Size: 12498 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc1_asm_pconly.dat Type: application/octet-stream Size: 12087 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 25 06:44:17 2015 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 Jun 2015 06:44:17 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM In-Reply-To: References: Message-ID: On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: > Hello, >
1) In order to understand this, we have to disentangle the various effects. First, run the STREAMS benchmark:

  make NPMAX=4 streams

This will tell you the maximum speedup you can expect on this machine. 2) For these test cases, also send the output of -ksp_view -ksp_converged_reason -ksp_monitor_true_residual Thanks, Matt > I'm trying to improve the parallel efficiency of gmres solve in my. In my > CFD solver, Petsc gmres is used to solve the linear system generated by the > Newton's method. To test its efficiency, I started with a very simple > inviscid subsonic 3D flow as the first testcase. The parallel efficiency of > gmres solve with asm as the preconditioner is very bad. The results are > from our latest cluster.
Right now, I'm only looking at the wclock time of > the ksp_solve. > > 1. First I tested ASM with gmres and ilu 0 for the sub domain , the > cpu time of 2 cores is almost the same as the serial run. Here is the > options for this case > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > -ksp_gmres_restart 30 -ksp_pc_side right > -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 > -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.9 > > The iteration numbers increase a lot for parallel run. > > coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-0411.951 > 252.05E-0210.51.010.50462.19E-027.641.390.34 > > > > > > > 2. Then I tested ASM with ilu 0 as the preconditoner only, the cpu > time of 2 cores is better than the 1st test, but the speedup is still very > bad. Here is the options i'm using > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > -ksp_gmres_restart 30 -ksp_pc_side right > -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill > 1.9 > > coresiterationserrpetsc solve cpu timespeedupefficiency1104.54E-0410.6812 > 119.55E-048.21.300.654123.59E-045.262.030.50 > > > > > > > Those results are from a third order "DG" scheme with a very coarse 3D > mesh (480 elements). I believe I should get some speedups for this test > even on this coarse mesh. > > My question is why does the asm with a local solve take much longer time > than the asm as a preconditioner only? Also the accuracy is very bad too I > have tested changing the overlap of asm to 2, but make it even worse. > > If I used a larger mesh ~4000 elements, the 2nd case with asm as the > preconditioner gives me a better speedup, but still not very good. > > > coresiterationserrpetsc solve cpu timespeedupefficiency171.91E-0297.32127 > 2.07E-0264.941.50.74472.61E-0236.972.60.65 > > > > Attached are the log_summary dumped from petsc, any suggestions are > welcome. I really appreciate it. > > > Sincerely Yours, > > Lei Shi > --------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.sarich at gmail.com Thu Jun 25 09:51:31 2015 From: jason.sarich at gmail.com (Jason Sarich) Date: Thu, 25 Jun 2015 09:51:31 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: Hi Justin, I don't see anything obviously wrong that would be causing this variation in iterations due to number of processors. Is it at all feasible to send be an example code that reproduces the problem (perhaps a smaller version)? I'm still guessing the problem lies in numerical precision, it would be nice to find a way to avoid them. I don't think the job scheduling or compute nodes would affect this. > Basically I call TaoSolve at each time level. What I find strange is that > the number of TAO solve iterations vary at each time level for a given > number of processors Just for clarification, do you mean that for a given problem, you run the same problem several times the only difference being the number of processors, and that on each time step you get (close to) the same solution for each run, just with a different number of TAO iterations? If you run the same problem twice using the same number of processors, is the output identical? No OpenMP threads or GPU's? 
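One quick way to check that is sketched below in C (the Tao object and the initial guess x0 are assumed to be set up already; this is not code from the thread): solve twice from an identical initial vector and compare the reported iteration counts, objective values and gradient norms.

  #include <petsctao.h>

  /* Sketch: run the same TaoSolve() twice from the same starting point and
     print the solver statistics, to see whether repeated runs on a fixed
     number of processes are reproducible. */
  static PetscErrorCode CompareRepeatedSolves(Tao tao, Vec x0)
  {
    Vec                x;
    PetscInt           run, its;
    PetscReal          f, gnorm, cnorm, xdiff;
    TaoConvergedReason reason;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecDuplicate(x0, &x);CHKERRQ(ierr);
    for (run = 0; run < 2; ++run) {
      ierr = VecCopy(x0, x);CHKERRQ(ierr);              /* identical starting point */
      ierr = TaoSetInitialVector(tao, x);CHKERRQ(ierr);
      ierr = TaoSolve(tao);CHKERRQ(ierr);
      ierr = TaoGetSolutionStatus(tao, &its, &f, &gnorm, &cnorm, &xdiff, &reason);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "run %D: iterations %D, f %g, gnorm %g, reason %D\n",
                         run, its, (double)f, (double)gnorm, (PetscInt)reason);CHKERRQ(ierr);
    }
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }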
> 2) Sometimes, I get Tao Termination reason of -5, and from what I see from > the online documentation, it means the number of function evaluations > exceeds the maximum number of function evaluations. I only get this at > certain time levels, and it also varies when I change the number of > processors. This looks like the same issue, unless the number of iterations is hugely different (say it converges on some number of processors after 200 iterations, but still hasn't converged after 2000 on a different number). Jason On Tue, Jun 23, 2015 at 6:36 PM, Justin Chang wrote: > I was unable to do quad precision or even with 64 bit integers > because my data files rely on intricate binary files that have been written > in 32 bit. > > However, I noticed a couple things which are puzzling to me: > > 1) I am solving a transient problem using my own backward euler function. > Basically I call TaoSolve at each time level. What I find strange is that > the number of TAO solve iterations vary at each time level for a given > number of processors. The solution is roughly the same when I change the > number of processors. Any idea why this is happening, or might this have > more to do with the job scheduling/compute nodes on my HPC machine? > > 2) Sometimes, I get Tao Termination reason of -5, and from what I see > from the online documentation, it means the number of function evaluations > exceeds the maximum number of function evaluations. I only get this at > certain time levels, and it also varies when I change the number of > processors. > > I can understand the number of iterations going down the further in time > i go (this is due to the nature of my problem), but I am not sure why the > above two observations are happening. Any thoughts? > > Thanks, > Justin > > On Fri, Jun 19, 2015 at 11:52 AM, Justin Chang > wrote: > >> My code sort of requires HDF5 so installing quad precision might be a >> little difficult. I could try to work around this but that might take some >> effort. >> >> In the mean time, is there any other potential explanation or >> alternative to figuring this out? >> >> Thanks, >> Justin >> >> >> On Thursday, June 18, 2015, Matthew Knepley wrote: >> >>> On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich >>> wrote: >>> >>>> BLMVM doesn't use a KSP or preconditioner, it updates using the >>>> L-BFGS-B formula >>>> >>> >>> Then this sounds like a bug, unless one of the constants is partition >>> dependent. >>> >>> Matt >>> >>> >>>> On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich < >>>>> jason.sarich at gmail.com> wrote: >>>>> >>>>>> Hi Justin, >>>>>> >>>>>> I can't tell for sure why this is happening, have you tried using >>>>>> quad precision to make sure that numerical cutoffs isn't the problem? >>>>>> >>>>>> 1 The Hessian being approximate and the resulting implicit >>>>>> computation is the source of the cutoff, but would not be causing different >>>>>> convergence rates in infinite precision. >>>>>> >>>>>> 2 the local size may affect load balancing but not the resulting >>>>>> norms/convergence rate. >>>>>> >>>>> >>>>> This sounds to be like the preconditioner is dependent on the >>>>> partition. Can you send -tao_view -snes_view >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Jason >>>>>> >>>>>> >>>>>> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >>>>>> wrote: >>>>>> >>>>>>> I solved a transient diffusion across multiple cores using TAO >>>>>>> BLMVM. 
When I simulate the same problem but on different numbers of >>>>>>> processing cores, the number of solve iterations change quite drastically. >>>>>>> The numerical solution is the same, but these changes are quite vast. I >>>>>>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>>>>>> largely invariant with number of processors but TAO (with bounded >>>>>>> constraints) fluctuates. >>>>>>> >>>>>>> My question is, why is this happening? I understand that >>>>>>> accumulation of numerical round-offs may attribute to this, but the >>>>>>> differences seem quite vast to me. My initial thought was that >>>>>>> >>>>>>> 1) the Hessian is only projected and not explicitly computed, >>>>>>> which may have something to do with the rate of convergence >>>>>>> >>>>>>> 2) local problem size. Certain regions of my domain have different >>>>>>> number of "violations" which need to be corrected by the bounded >>>>>>> constraints so the rate of convergence depends on how these regions are >>>>>>> partitioned? >>>>>>> >>>>>>> Any thoughts? >>>>>>> >>>>>>> Thanks, >>>>>>> Justin >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jun 25 11:42:53 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 25 Jun 2015 11:42:53 -0500 Subject: [petsc-users] Varying TAO optimization solve iterations using BLMVM In-Reply-To: References: Message-ID: Jason, I was experimenting with some smaller steady-state problems and I still get the same issue: every time I run the same problem on the same number of processors the number of iterations differs but the solutions remain the same. I think this is the root to why I get such erratic behavior in iterations in the transient case. Attached are the log files for the two runs I did. Is it possible to tell what's going on from these? Or should I reinstall PETSc? Because I only noticed this problem once I switched over to the master version of 3.6 (although it's possible I might have screwed up the installation a bit because I had to scp everything due to the firewall). I can send you the example code and working files if needed Thanks, Justin On Thu, Jun 25, 2015 at 9:51 AM, Jason Sarich wrote: > Hi Justin, > > I don't see anything obviously wrong that would be causing this variation > in iterations due to number of processors. Is it at all feasible to send be > an example code that reproduces the problem (perhaps a smaller version)? > I'm still guessing the problem lies in numerical precision, it would be > nice to find a way to avoid them. I don't think the job scheduling or > compute nodes would affect this. > >> Basically I call TaoSolve at each time level. 
What I find strange is >> that the number of TAO solve iterations vary at each time level for a given >> number of processors > > Just for clarification, do you mean that for a given problem, you run the > same problem several times the only difference being the number of > processors, and that on each time step you get (close to) the same solution > for each run, just with a different number of TAO iterations? > > If you run the same problem twice using the same number of processors, is > the output identical? No OpenMP threads or GPU's? > > >> 2) Sometimes, I get Tao Termination reason of -5, and from what I see >> from the online documentation, it means the number of function evaluations >> exceeds the maximum number of function evaluations. I only get this at >> certain time levels, and it also varies when I change the number of >> processors. > > > This looks like the same issue, unless the number of iterations is hugely > different (say it converges on some number of processors after 200 > iterations, but still hasn't converged after 2000 on a different number). > > > Jason > > > > > On Tue, Jun 23, 2015 at 6:36 PM, Justin Chang wrote: > >> I was unable to do quad precision or even with 64 bit integers >> because my data files rely on intricate binary files that have been written >> in 32 bit. >> >> However, I noticed a couple things which are puzzling to me: >> >> 1) I am solving a transient problem using my own backward euler >> function. Basically I call TaoSolve at each time level. What I find strange >> is that the number of TAO solve iterations vary at each time level for a >> given number of processors. The solution is roughly the same when I change >> the number of processors. Any idea why this is happening, or might this >> have more to do with the job scheduling/compute nodes on my HPC machine? >> >> 2) Sometimes, I get Tao Termination reason of -5, and from what I see >> from the online documentation, it means the number of function evaluations >> exceeds the maximum number of function evaluations. I only get this at >> certain time levels, and it also varies when I change the number of >> processors. >> >> I can understand the number of iterations going down the further in time >> i go (this is due to the nature of my problem), but I am not sure why the >> above two observations are happening. Any thoughts? >> >> Thanks, >> Justin >> >> On Fri, Jun 19, 2015 at 11:52 AM, Justin Chang >> wrote: >> >>> My code sort of requires HDF5 so installing quad precision might be a >>> little difficult. I could try to work around this but that might take some >>> effort. >>> >>> In the mean time, is there any other potential explanation or >>> alternative to figuring this out? >>> >>> Thanks, >>> Justin >>> >>> >>> On Thursday, June 18, 2015, Matthew Knepley wrote: >>> >>>> On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich >>>> wrote: >>>> >>>>> BLMVM doesn't use a KSP or preconditioner, it updates using the >>>>> L-BFGS-B formula >>>>> >>>> >>>> Then this sounds like a bug, unless one of the constants is partition >>>> dependent. >>>> >>>> Matt >>>> >>>> >>>>> On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich < >>>>>> jason.sarich at gmail.com> wrote: >>>>>> >>>>>>> Hi Justin, >>>>>>> >>>>>>> I can't tell for sure why this is happening, have you tried using >>>>>>> quad precision to make sure that numerical cutoffs isn't the problem? 
>>>>>>> >>>>>>> 1 The Hessian being approximate and the resulting implicit >>>>>>> computation is the source of the cutoff, but would not be causing different >>>>>>> convergence rates in infinite precision. >>>>>>> >>>>>>> 2 the local size may affect load balancing but not the resulting >>>>>>> norms/convergence rate. >>>>>>> >>>>>> >>>>>> This sounds to be like the preconditioner is dependent on the >>>>>> partition. Can you send -tao_view -snes_view >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang >>>>>>> wrote: >>>>>>> >>>>>>>> I solved a transient diffusion across multiple cores using TAO >>>>>>>> BLMVM. When I simulate the same problem but on different numbers of >>>>>>>> processing cores, the number of solve iterations change quite drastically. >>>>>>>> The numerical solution is the same, but these changes are quite vast. I >>>>>>>> attached a PDF showing a comparison between KSP and TAO. KSP remains >>>>>>>> largely invariant with number of processors but TAO (with bounded >>>>>>>> constraints) fluctuates. >>>>>>>> >>>>>>>> My question is, why is this happening? I understand that >>>>>>>> accumulation of numerical round-offs may attribute to this, but the >>>>>>>> differences seem quite vast to me. My initial thought was that >>>>>>>> >>>>>>>> 1) the Hessian is only projected and not explicitly computed, >>>>>>>> which may have something to do with the rate of convergence >>>>>>>> >>>>>>>> 2) local problem size. Certain regions of my domain have different >>>>>>>> number of "violations" which need to be corrected by the bounded >>>>>>>> constraints so the rate of convergence depends on how these regions are >>>>>>>> partitioned? >>>>>>>> >>>>>>>> Any thoughts? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Justin >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- ================================================== ================================================== MESHID = 4 ================================================== ================================================== ========================================== 16 processors: ========================================== TSTEP ANALYSIS TIME ITER FLOPS/s iter = 0, Function value: -1.57703e-07, Residual: 0.239098 iter = 1, Function value: -4.53878e-07, Residual: 0.223304 iter = 2, Function value: -7.89606e-06, Residual: 0.221146 iter = 3, Function value: -0.0808297, Residual: 0.225876 iter = 4, Function value: -1.67847, Residual: 0.351701 iter = 5, Function value: -3.2307, Residual: 0.18377 iter = 6, Function value: -3.80321, Residual: 0.168797 iter = 7, Function value: -4.25223, Residual: 0.208515 iter = 8, Function value: -4.54546, Residual: 0.211522 iter = 9, Function value: -4.76263, Residual: 0.121566 iter = 10, Function value: -4.93278, Residual: 0.11454 iter = 11, Function value: -5.1385, Residual: 0.149913 iter = 12, Function value: -5.3172, Residual: 0.188699 iter = 13, Function value: -5.35404, Residual: 0.414234 iter = 14, Function value: -5.46775, Residual: 0.166872 iter = 15, Function value: -5.5088, Residual: 0.109606 iter = 16, Function value: -5.55735, Residual: 0.137697 iter = 17, Function value: -5.61175, Residual: 0.20436 iter = 18, Function value: -5.67254, Residual: 0.145789 iter = 19, Function value: -5.7428, Residual: 0.131282 iter = 20, Function value: -5.78678, Residual: 0.168675 iter = 21, Function value: -5.83482, Residual: 0.107592 iter = 22, Function value: -5.88877, Residual: 0.120863 iter = 23, Function value: -5.94162, Residual: 0.170501 iter = 24, Function value: -5.99103, Residual: 0.137338 iter = 25, Function value: -6.03433, Residual: 0.11123 iter = 26, Function value: -6.08494, Residual: 0.123001 iter = 27, Function value: -6.12432, Residual: 0.158457 iter = 28, Function value: -6.16801, Residual: 0.100757 iter = 29, Function value: -6.21675, Residual: 0.100482 iter = 30, Function value: -6.25439, Residual: 0.150756 iter = 31, Function value: -6.29508, Residual: 0.10085 iter = 32, Function value: -6.33253, Residual: 0.0966374 iter = 33, Function value: -6.36431, Residual: 0.103262 iter = 34, Function value: -6.40181, Residual: 0.102187 iter = 35, Function value: -6.44427, Residual: 0.121893 iter = 36, Function value: -6.48004, Residual: 0.0983186 iter = 37, Function value: -6.51506, Residual: 0.092832 iter = 38, Function value: -6.5519, Residual: 0.116683 iter = 39, Function value: -6.58292, Residual: 0.116904 iter = 40, Function value: -6.60971, Residual: 0.0771122 iter = 41, Function value: -6.63585, Residual: 0.0757516 iter = 42, Function value: -6.6586, Residual: 0.10885 iter = 43, Function value: -6.68587, Residual: 0.0752019 iter = 44, Function value: -6.7155, Residual: 0.0769993 iter = 45, Function value: -6.7459, Residual: 0.1128 iter = 46, Function value: -6.77436, Residual: 0.0874185 iter = 47, Function value: -6.8015, Residual: 0.0772962 iter = 48, Function value: -6.82934, Residual: 0.0894704 iter = 49, Function value: -6.85383, Residual: 0.0930334 iter = 50, Function value: -6.87712, Residual: 0.0707267 iter = 51, Function value: -6.90227, Residual: 0.0704031 iter = 52, Function value: -6.92095, Residual: 0.11036 iter = 53, Function value: -6.94187, Residual: 0.0625794 iter = 54, Function value: -6.96158, Residual: 0.0585309 iter = 55, Function value: -6.98111, Residual: 0.07778 iter = 56, Function value: 
-7.00425, Residual: 0.0917501 iter = 57, Function value: -7.02495, Residual: 0.0680073 iter = 58, Function value: -7.04571, Residual: 0.0632846 iter = 59, Function value: -7.06812, Residual: 0.067345 iter = 60, Function value: -7.08734, Residual: 0.111259 iter = 61, Function value: -7.10785, Residual: 0.062663 iter = 62, Function value: -7.12178, Residual: 0.0546556 iter = 63, Function value: -7.13834, Residual: 0.0661359 iter = 64, Function value: -7.15288, Residual: 0.0977186 iter = 65, Function value: -7.16979, Residual: 0.0558807 iter = 66, Function value: -7.18517, Residual: 0.0514965 iter = 67, Function value: -7.19956, Residual: 0.0566699 iter = 68, Function value: -7.21503, Residual: 0.102862 iter = 69, Function value: -7.23295, Residual: 0.0520616 iter = 70, Function value: -7.24472, Residual: 0.0514462 iter = 71, Function value: -7.2607, Residual: 0.0603313 iter = 72, Function value: -7.27322, Residual: 0.114154 iter = 73, Function value: -7.29073, Residual: 0.0553255 iter = 74, Function value: -7.30301, Residual: 0.0468704 iter = 75, Function value: -7.31535, Residual: 0.0503862 iter = 76, Function value: -7.32483, Residual: 0.121667 iter = 77, Function value: -7.34239, Residual: 0.0473323 iter = 78, Function value: -7.35051, Residual: 0.0402786 iter = 79, Function value: -7.36252, Residual: 0.0513142 iter = 80, Function value: -7.37578, Residual: 0.0980691 iter = 81, Function value: -7.39103, Residual: 0.0561799 iter = 82, Function value: -7.40188, Residual: 0.0420525 iter = 83, Function value: -7.41352, Residual: 0.0506624 iter = 84, Function value: -7.42311, Residual: 0.0745317 iter = 85, Function value: -7.43349, Residual: 0.0418963 iter = 86, Function value: -7.44304, Residual: 0.0427164 iter = 87, Function value: -7.45309, Residual: 0.050941 iter = 88, Function value: -7.46557, Residual: 0.0855721 iter = 89, Function value: -7.47897, Residual: 0.0443531 iter = 90, Function value: -7.48731, Residual: 0.038289 iter = 91, Function value: -7.49667, Residual: 0.0433084 iter = 92, Function value: -7.50404, Residual: 0.0748006 iter = 93, Function value: -7.51295, Residual: 0.0379092 iter = 94, Function value: -7.52065, Residual: 0.0360034 iter = 95, Function value: -7.52815, Residual: 0.0407615 iter = 96, Function value: -7.53624, Residual: 0.0871539 iter = 97, Function value: -7.54733, Residual: 0.0370463 iter = 98, Function value: -7.55263, Residual: 0.0321752 iter = 99, Function value: -7.5611, Residual: 0.0380701 iter = 100, Function value: -7.56636, Residual: 0.0828749 iter = 101, Function value: -7.57523, Residual: 0.0364468 iter = 102, Function value: -7.58065, Residual: 0.0296879 iter = 103, Function value: -7.58617, Residual: 0.0331854 iter = 104, Function value: -7.59286, Residual: 0.0653273 iter = 105, Function value: -7.60073, Residual: 0.0311378 iter = 106, Function value: -7.605, Residual: 0.0271273 iter = 107, Function value: -7.61161, Residual: 0.0352389 iter = 108, Function value: -7.61636, Residual: 0.0633698 iter = 109, Function value: -7.62264, Residual: 0.0323803 iter = 110, Function value: -7.62822, Residual: 0.0286247 iter = 111, Function value: -7.63278, Residual: 0.0310643 iter = 112, Function value: -7.63766, Residual: 0.0695965 iter = 113, Function value: -7.64471, Residual: 0.0273006 iter = 114, Function value: -7.6477, Residual: 0.0241679 iter = 115, Function value: -7.65299, Residual: 0.0287846 iter = 116, Function value: -7.65608, Residual: 0.0686328 iter = 117, Function value: -7.66196, Residual: 0.029546 iter = 118, Function value: -7.66562, 
Residual: 0.022733 iter = 119, Function value: -7.6691, Residual: 0.0260237 iter = 120, Function value: -7.67366, Residual: 0.0518384 iter = 121, Function value: -7.67882, Residual: 0.0250519 iter = 122, Function value: -7.68159, Residual: 0.0208822 iter = 123, Function value: -7.68604, Residual: 0.0283777 iter = 124, Function value: -7.68918, Residual: 0.0502209 iter = 125, Function value: -7.69338, Residual: 0.026466 iter = 126, Function value: -7.69725, Residual: 0.0232737 iter = 127, Function value: -7.70029, Residual: 0.0246071 iter = 128, Function value: -7.70333, Residual: 0.0560219 iter = 129, Function value: -7.70795, Residual: 0.0219018 iter = 130, Function value: -7.70988, Residual: 0.0192943 iter = 131, Function value: -7.71333, Residual: 0.0223829 iter = 132, Function value: -7.71539, Residual: 0.0571394 iter = 133, Function value: -7.71939, Residual: 0.0241063 iter = 134, Function value: -7.72174, Residual: 0.0183287 iter = 135, Function value: -7.72395, Residual: 0.0206927 iter = 136, Function value: -7.7272, Residual: 0.0366094 iter = 137, Function value: -7.73034, Residual: 0.0229168 iter = 138, Function value: -7.73217, Residual: 0.0165316 iter = 139, Function value: -7.73523, Residual: 0.0207809 iter = 140, Function value: -7.7372, Residual: 0.0344397 iter = 141, Function value: -7.73964, Residual: 0.0197955 iter = 142, Function value: -7.74225, Residual: 0.0175415 iter = 143, Function value: -7.74421, Residual: 0.0204096 iter = 144, Function value: -7.74642, Residual: 0.0325727 iter = 145, Function value: -7.74861, Residual: 0.0168286 iter = 146, Function value: -7.74991, Residual: 0.0166282 iter = 147, Function value: -7.75202, Residual: 0.0187187 iter = 148, Function value: -7.75333, Residual: 0.043491 iter = 149, Function value: -7.75579, Residual: 0.0163257 iter = 150, Function value: -7.75685, Residual: 0.013081 iter = 151, Function value: -7.75808, Residual: 0.0146514 iter = 152, Function value: -7.75972, Residual: 0.0272383 iter = 153, Function value: -7.76143, Residual: 0.0146716 iter = 154, Function value: -7.76245, Residual: 0.012202 iter = 155, Function value: -7.76402, Residual: 0.0165763 iter = 156, Function value: -7.76485, Residual: 0.0264216 iter = 157, Function value: -7.76596, Residual: 0.0141675 iter = 158, Function value: -7.76711, Residual: 0.0118525 iter = 159, Function value: -7.76795, Residual: 0.0132682 iter = 160, Function value: -7.76906, Residual: 0.0307988 iter = 161, Function value: -7.77046, Residual: 0.0136723 iter = 162, Function value: -7.77101, Residual: 0.0105228 iter = 163, Function value: -7.77204, Residual: 0.0114438 iter = 164, Function value: -7.77257, Residual: 0.0246093 iter = 165, Function value: -7.77345, Residual: 0.0125104 iter = 166, Function value: -7.77417, Residual: 0.0102181 iter = 167, Function value: -7.77474, Residual: 0.0109432 iter = 168, Function value: -7.77532, Residual: 0.0199891 iter = 169, Function value: -7.77611, Residual: 0.0101163 iter = 170, Function value: -7.77655, Residual: 0.0105442 iter = 171, Function value: -7.77743, Residual: 0.0116992 iter = 172, Function value: -7.77779, Residual: 0.0293656 iter = 173, Function value: -7.77878, Residual: 0.0112078 iter = 174, Function value: -7.77919, Residual: 0.00808034 iter = 175, Function value: -7.77957, Residual: 0.00902743 iter = 176, Function value: -7.78023, Residual: 0.0122209 iter = 177, Function value: -7.78068, Residual: 0.0174619 iter = 178, Function value: -7.78127, Residual: 0.00842028 iter = 179, Function value: -7.78167, Residual: 0.00912139 
iter = 180, Function value: -7.78215, Residual: 0.0100989 iter = 181, Function value: -7.78253, Residual: 0.0198295 iter = 182, Function value: -7.78312, Residual: 0.00877936 iter = 183, Function value: -7.78342, Residual: 0.00805744 iter = 184, Function value: -7.78381, Residual: 0.00884593 iter = 185, Function value: -7.78426, Residual: 0.0172994 iter = 186, Function value: -7.78476, Residual: 0.00859095 iter = 187, Function value: -7.78503, Residual: 0.00730765 iter = 188, Function value: -7.78543, Residual: 0.00819728 iter = 189, Function value: -7.7857, Residual: 0.0163602 iter = 190, Function value: -7.78611, Residual: 0.00840991 iter = 191, Function value: -7.78641, Residual: 0.00748283 iter = 192, Function value: -7.7867, Residual: 0.00842632 iter = 193, Function value: -7.78717, Residual: 0.0143767 iter = 194, Function value: -7.78759, Residual: 0.00972408 iter = 195, Function value: -7.78783, Residual: 0.00682761 iter = 196, Function value: -7.78815, Residual: 0.00741374 iter = 197, Function value: -7.7884, Residual: 0.00941704 iter = 198, Function value: -7.78875, Residual: 0.00770444 iter = 199, Function value: -7.78917, Residual: 0.0085247 iter = 200, Function value: -7.78949, Residual: 0.0119359 iter = 201, Function value: -7.7898, Residual: 0.00737882 iter = 202, Function value: -7.79006, Residual: 0.00692514 iter = 203, Function value: -7.79031, Residual: 0.00784755 iter = 204, Function value: -7.79062, Residual: 0.0133266 iter = 205, Function value: -7.79097, Residual: 0.00759952 iter = 206, Function value: -7.7912, Residual: 0.00672236 iter = 207, Function value: -7.79144, Residual: 0.00698928 iter = 208, Function value: -7.79165, Residual: 0.0137271 iter = 209, Function value: -7.79195, Residual: 0.00704828 iter = 210, Function value: -7.79218, Residual: 0.00665079 iter = 211, Function value: -7.79241, Residual: 0.00752568 iter = 212, Function value: -7.79273, Residual: 0.0135994 iter = 213, Function value: -7.79307, Residual: 0.00727429 iter = 214, Function value: -7.79324, Residual: 0.00610776 iter = 215, Function value: -7.79355, Residual: 0.00704834 iter = 216, Function value: -7.79371, Residual: 0.0129413 iter = 217, Function value: -7.79398, Residual: 0.00699312 iter = 218, Function value: -7.79422, Residual: 0.00615925 iter = 219, Function value: -7.79442, Residual: 0.00670629 iter = 220, Function value: -7.79466, Residual: 0.0129022 iter = 221, Function value: -7.79497, Residual: 0.00653843 iter = 222, Function value: -7.79512, Residual: 0.00595371 iter = 223, Function value: -7.79543, Residual: 0.00677522 iter = 224, Function value: -7.79557, Residual: 0.0157612 iter = 225, Function value: -7.79589, Residual: 0.00704513 iter = 226, Function value: -7.79609, Residual: 0.00558273 iter = 227, Function value: -7.79628, Residual: 0.00635017 iter = 228, Function value: -7.79656, Residual: 0.0100848 iter = 229, Function value: -7.79683, Residual: 0.00716537 iter = 230, Function value: -7.79701, Residual: 0.00566117 iter = 231, Function value: -7.79735, Residual: 0.00681235 iter = 232, Function value: -7.79758, Residual: 0.0114013 iter = 233, Function value: -7.79786, Residual: 0.00690156 iter = 234, Function value: -7.79816, Residual: 0.00560736 iter = 235, Function value: -7.79836, Residual: 0.00676259 iter = 236, Function value: -7.79867, Residual: 0.00733547 iter = 237, Function value: -7.79891, Residual: 0.00691113 iter = 238, Function value: -7.79917, Residual: 0.00634128 iter = 239, Function value: -7.79947, Residual: 0.00755322 iter = 240, Function value: 
-7.79971, Residual: 0.00881029 iter = 241, Function value: -7.79994, Residual: 0.00591148 iter = 242, Function value: -7.80019, Residual: 0.00531922 iter = 243, Function value: -7.80038, Residual: 0.00782399 iter = 244, Function value: -7.80062, Residual: 0.00560367 iter = 245, Function value: -7.80087, Residual: 0.00646445 iter = 246, Function value: -7.8011, Residual: 0.00669491 iter = 247, Function value: -7.8013, Residual: 0.00542145 iter = 248, Function value: -7.80156, Residual: 0.00646472 iter = 249, Function value: -7.80176, Residual: 0.00831322 iter = 250, Function value: -7.80195, Residual: 0.00593352 iter = 251, Function value: -7.80221, Residual: 0.00510617 iter = 252, Function value: -7.80237, Residual: 0.00726524 iter = 253, Function value: -7.80255, Residual: 0.00561318 iter = 254, Function value: -7.80277, Residual: 0.00598889 iter = 255, Function value: -7.80293, Residual: 0.00663841 iter = 256, Function value: -7.80309, Residual: 0.00557977 iter = 257, Function value: -7.80339, Residual: 0.00662109 iter = 258, Function value: -7.80351, Residual: 0.0103698 iter = 259, Function value: -7.80369, Residual: 0.00547099 iter = 260, Function value: -7.80385, Residual: 0.00412179 iter = 261, Function value: -7.80398, Residual: 0.00469965 iter = 262, Function value: -7.80418, Residual: 0.00800895 iter = 263, Function value: -7.80436, Residual: 0.00682549 iter = 264, Function value: -7.8045, Residual: 0.00448242 iter = 265, Function value: -7.8047, Residual: 0.00449525 iter = 266, Function value: -7.80483, Residual: 0.00623266 iter = 267, Function value: -7.80499, Residual: 0.00432503 iter = 268, Function value: -7.80516, Residual: 0.00443986 iter = 269, Function value: -7.80527, Residual: 0.00654341 iter = 270, Function value: -7.8054, Residual: 0.00403316 iter = 271, Function value: -7.80557, Residual: 0.0042036 iter = 272, Function value: -7.8057, Residual: 0.00486939 iter = 273, Function value: -7.80582, Residual: 0.00914879 iter = 274, Function value: -7.80599, Residual: 0.00371046 iter = 275, Function value: -7.80606, Residual: 0.00343322 iter = 276, Function value: -7.80617, Residual: 0.00408214 iter = 277, Function value: -7.80626, Residual: 0.00960308 iter = 278, Function value: -7.80641, Residual: 0.00403028 iter = 279, Function value: -7.8065, Residual: 0.00326197 iter = 280, Function value: -7.80658, Residual: 0.00364705 iter = 281, Function value: -7.8067, Residual: 0.00772842 iter = 282, Function value: -7.80684, Residual: 0.00352321 iter = 283, Function value: -7.80691, Residual: 0.00312168 iter = 284, Function value: -7.80703, Residual: 0.00413708 iter = 285, Function value: -7.8071, Residual: 0.00844741 iter = 286, Function value: -7.80722, Residual: 0.00387447 iter = 287, Function value: -7.80731, Residual: 0.00321656 iter = 288, Function value: -7.80739, Residual: 0.0035753 iter = 289, Function value: -7.8075, Residual: 0.00709712 iter = 290, Function value: -7.80762, Residual: 0.0038076 iter = 291, Function value: -7.80768, Residual: 0.00276752 iter = 292, Function value: -7.80778, Residual: 0.00341213 iter = 293, Function value: -7.80784, Residual: 0.00630934 iter = 294, Function value: -7.80793, Residual: 0.00369558 iter = 295, Function value: -7.80804, Residual: 0.00316572 iter = 296, Function value: -7.80811, Residual: 0.00341499 iter = 297, Function value: -7.80817, Residual: 0.00733733 iter = 298, Function value: -7.80828, Residual: 0.00290179 iter = 299, Function value: -7.80832, Residual: 0.002659 iter = 300, Function value: -7.80841, Residual: 0.00325143 
iter = 301, Function value: -7.80846, Residual: 0.00851133 iter = 302, Function value: -7.80856, Residual: 0.00338366 iter = 303, Function value: -7.80862, Residual: 0.00242721 iter = 304, Function value: -7.80866, Residual: 0.00265834 iter = 305, Function value: -7.80874, Residual: 0.00468424 iter = 306, Function value: -7.80881, Residual: 0.00278952 iter = 307, Function value: -7.80886, Residual: 0.0024051 iter = 308, Function value: -7.80896, Residual: 0.00336636 iter = 309, Function value: -7.809, Residual: 0.00684994 iter = 310, Function value: -7.80907, Residual: 0.00284831 iter = 311, Function value: -7.80912, Residual: 0.0021039 iter = 312, Function value: -7.80915, Residual: 0.0024576 iter = 313, Function value: -7.80923, Residual: 0.00351195 iter = 314, Function value: -7.80925, Residual: 0.00653157 iter = 315, Function value: -7.80933, Residual: 0.00221326 iter = 316, Function value: -7.80935, Residual: 0.00193244 iter = 317, Function value: -7.8094, Residual: 0.00236798 iter = 318, Function value: -7.80944, Residual: 0.00544877 iter = 319, Function value: -7.80949, Residual: 0.00242602 iter = 320, Function value: -7.80952, Residual: 0.00187275 iter = 321, Function value: -7.80955, Residual: 0.00208739 iter = 322, Function value: -7.80959, Residual: 0.00369487 iter = 323, Function value: -7.80963, Residual: 0.00210818 iter = 324, Function value: -7.80965, Residual: 0.00171339 iter = 325, Function value: -7.8097, Residual: 0.00224769 iter = 326, Function value: -7.80972, Residual: 0.00409731 iter = 327, Function value: -7.80975, Residual: 0.00205957 iter = 328, Function value: -7.80978, Residual: 0.00166663 iter = 329, Function value: -7.8098, Residual: 0.00177559 iter = 330, Function value: -7.80984, Residual: 0.00401932 iter = 331, Function value: -7.80987, Residual: 0.0019732 iter = 332, Function value: -7.80988, Residual: 0.00146904 iter = 333, Function value: -7.80991, Residual: 0.00156285 iter = 334, Function value: -7.80993, Residual: 0.00366378 iter = 335, Function value: -7.80996, Residual: 0.00182864 iter = 336, Function value: -7.80998, Residual: 0.00143332 iter = 337, Function value: -7.80999, Residual: 0.00153632 iter = 338, Function value: -7.81001, Residual: 0.00300152 iter = 339, Function value: -7.81003, Residual: 0.00158852 iter = 340, Function value: -7.81005, Residual: 0.00139549 iter = 341, Function value: -7.81007, Residual: 0.0015773 iter = 342, Function value: -7.81008, Residual: 0.00381815 iter = 343, Function value: -7.8101, Residual: 0.00154118 iter = 344, Function value: -7.81011, Residual: 0.00110395 iter = 345, Function value: -7.81012, Residual: 0.00129509 iter = 346, Function value: -7.81014, Residual: 0.00187921 iter = 347, Function value: -7.81016, Residual: 0.00235816 iter = 348, Function value: -7.81017, Residual: 0.00112299 iter = 349, Function value: -7.81018, Residual: 0.00122119 iter = 350, Function value: -7.81019, Residual: 0.00136949 iter = 351, Function value: -7.81021, Residual: 0.00242993 iter = 352, Function value: -7.81022, Residual: 0.00113653 iter = 353, Function value: -7.81023, Residual: 0.00119118 iter = 354, Function value: -7.81024, Residual: 0.00131839 iter = 355, Function value: -7.81025, Residual: 0.00344071 iter = 356, Function value: -7.81027, Residual: 0.00107127 iter = 357, Function value: -7.81027, Residual: 0.000881068 iter = 358, Function value: -7.81028, Residual: 0.00103876 iter = 359, Function value: -7.81029, Residual: 0.00168859 iter = 360, Function value: -7.8103, Residual: 0.00128558 iter = 361, Function 
value: -7.81031, Residual: 0.000936338 iter = 362, Function value: -7.81032, Residual: 0.00112771 iter = 363, Function value: -7.81033, Residual: 0.00198573 iter = 364, Function value: -7.81034, Residual: 0.00111927 iter = 365, Function value: -7.81035, Residual: 0.000968416 iter = 366, Function value: -7.81036, Residual: 0.00106383 iter = 367, Function value: -7.81037, Residual: 0.00255311 iter = 368, Function value: -7.81038, Residual: 0.00109532 iter = 369, Function value: -7.81038, Residual: 0.000799371 iter = 370, Function value: -7.81039, Residual: 0.000852602 iter = 371, Function value: -7.8104, Residual: 0.00198778 iter = 372, Function value: -7.81041, Residual: 0.00113394 iter = 373, Function value: -7.81042, Residual: 0.000915895 iter = 374, Function value: -7.81043, Residual: 0.000993134 iter = 375, Function value: -7.81043, Residual: 0.00201892 iter = 376, Function value: -7.81044, Residual: 0.000831109 iter = 377, Function value: -7.81045, Residual: 0.000884957 iter = 378, Function value: -7.81045, Residual: 0.00110069 iter = 379, Function value: -7.81046, Residual: 0.0024113 iter = 380, Function value: -7.81047, Residual: 0.000842735 iter = 381, Function value: -7.81048, Residual: 0.000673622 iter = 382, Function value: -7.81048, Residual: 0.0007883 iter = 383, Function value: -7.81049, Residual: 0.00169046 iter = 384, Function value: -7.81049, Residual: 0.000872412 iter = 385, Function value: -7.8105, Residual: 0.000738495 iter = 386, Function value: -7.81051, Residual: 0.00109419 iter = 387, Function value: -7.81051, Residual: 0.00127752 iter = 388, Function value: -7.81052, Residual: 0.000781728 iter = 389, Function value: -7.81052, Residual: 0.000802141 iter = 390, Function value: -7.81053, Residual: 0.000918364 iter = 391, Function value: -7.81054, Residual: 0.00194135 iter = 392, Function value: -7.81054, Residual: 0.000837812 iter = 393, Function value: -7.81055, Residual: 0.000640064 iter = 394, Function value: -7.81055, Residual: 0.000663732 iter = 395, Function value: -7.81056, Residual: 0.00144857 iter = 396, Function value: -7.81056, Residual: 0.000820515 iter = 397, Function value: -7.81057, Residual: 0.00068299 iter = 398, Function value: -7.81057, Residual: 0.000751018 iter = 399, Function value: -7.81057, Residual: 0.00162709 iter = 400, Function value: -7.81058, Residual: 0.000692016 iter = 401, Function value: -7.81058, Residual: 0.000676616 iter = 402, Function value: -7.81059, Residual: 0.000839196 iter = 403, Function value: -7.81059, Residual: 0.00192361 iter = 404, Function value: -7.8106, Residual: 0.000734566 iter = 405, Function value: -7.8106, Residual: 0.000552234 iter = 406, Function value: -7.8106, Residual: 0.000666441 iter = 407, Function value: -7.81061, Residual: 0.00110812 iter = 408, Function value: -7.81061, Residual: 0.000805687 iter = 409, Function value: -7.81062, Residual: 0.000563736 iter = 410, Function value: -7.81062, Residual: 0.000747683 iter = 411, Function value: -7.81062, Residual: 0.00109469 iter = 412, Function value: -7.81063, Residual: 0.000705128 iter = 413, Function value: -7.81063, Residual: 0.000617784 iter = 414, Function value: -7.81064, Residual: 0.00119265 iter = 415, Function value: -7.81064, Residual: 0.000626528 iter = 416, Function value: -7.81064, Residual: 0.000587671 iter = 417, Function value: -7.81065, Residual: 0.00068946 iter = 418, Function value: -7.81065, Residual: 0.00129254 iter = 419, Function value: -7.81065, Residual: 0.000927012 iter = 420, Function value: -7.81066, Residual: 0.000498195 iter = 
421, Function value: -7.81066, Residual: 0.000540408 iter = 422, Function value: -7.81066, Residual: 0.000728311 iter = 423, Function value: -7.81067, Residual: 0.00109958 iter = 424, Function value: -7.81067, Residual: 0.000615443 iter = 425, Function value: -7.81067, Residual: 0.000579832 iter = 426, Function value: -7.81068, Residual: 0.000682068 iter = 427, Function value: -7.81068, Residual: 0.000860703 iter = 428, Function value: -7.81068, Residual: 0.000559616 iter = 429, Function value: -7.81069, Residual: 0.000618818 iter = 430, Function value: -7.81069, Residual: 0.000794079 iter = 431, Function value: -7.81069, Residual: 0.00073984 iter = 432, Function value: -7.8107, Residual: 0.000747683 iter = 433, Function value: -7.8107, Residual: 0.000616185 iter = 434, Function value: -7.8107, Residual: 0.000693478 iter = 435, Function value: -7.81071, Residual: 0.000640444 iter = 436, Function value: -7.81071, Residual: 0.000620902 iter = 437, Function value: -7.81072, Residual: 0.000854389 iter = 438, Function value: -7.81072, Residual: 0.00106588 iter = 439, Function value: -7.81072, Residual: 0.000556506 iter = 440, Function value: -7.81072, Residual: 0.000495603 iter = 441, Function value: -7.81073, Residual: 0.000674669 iter = 442, Function value: -7.81073, Residual: 0.00103042 iter = 443, Function value: -7.81073, Residual: 0.000655499 iter = 444, Function value: -7.81074, Residual: 0.000607126 iter = 445, Function value: -7.81074, Residual: 0.000705519 iter = 446, Function value: -7.81074, Residual: 0.00119595 iter = 447, Function value: -7.81075, Residual: 0.000646184 iter = 448, Function value: -7.81075, Residual: 0.000594044 iter = 449, Function value: -7.81075, Residual: 0.000641001 iter = 450, Function value: -7.81076, Residual: 0.00125829 iter = 451, Function value: -7.81076, Residual: 0.000555608 iter = 452, Function value: -7.81076, Residual: 0.000525893 iter = 453, Function value: -7.81077, Residual: 0.000642375 iter = 454, Function value: -7.81077, Residual: 0.00143457 iter = 455, Function value: -7.81077, Residual: 0.000603507 iter = 456, Function value: -7.81077, Residual: 0.0004898 iter = 457, Function value: -7.81078, Residual: 0.000552409 iter = 458, Function value: -7.81078, Residual: 0.00108441 iter = 459, Function value: -7.81078, Residual: 0.000449216 iter = 460, Function value: -7.81078, Residual: 0.000447034 iter = 461, Function value: -7.81078, Residual: 0.000582248 iter = 462, Function value: -7.81079, Residual: 0.00129805 iter = 463, Function value: -7.81079, Residual: 0.000628522 iter = 464, Function value: -7.81079, Residual: 0.000497836 iter = 465, Function value: -7.81079, Residual: 0.000525396 iter = 466, Function value: -7.8108, Residual: 0.00116966 iter = 467, Function value: -7.8108, Residual: 0.000449353 iter = 468, Function value: -7.8108, Residual: 0.000376025 iter = 469, Function value: -7.8108, Residual: 0.000468749 iter = 470, Function value: -7.8108, Residual: 0.00101476 iter = 471, Function value: -7.81081, Residual: 0.000525062 iter = 472, Function value: -7.81081, Residual: 0.000429858 iter = 473, Function value: -7.81081, Residual: 0.000451821 iter = 474, Function value: -7.81081, Residual: 0.00120145 iter = 475, Function value: -7.81081, Residual: 0.000391866 iter = 476, Function value: -7.81081, Residual: 0.000333985 iter = 477, Function value: -7.81082, Residual: 0.000428342 iter = 478, Function value: -7.81082, Residual: 0.000739816 iter = 479, Function value: -7.81082, Residual: 0.000514762 iter = 480, Function value: -7.81082, 
Residual: 0.000342221 iter = 481, Function value: -7.81082, Residual: 0.000402088 iter = 482, Function value: -7.81082, Residual: 0.000503487 iter = 483, Function value: -7.81083, Residual: 0.000375629 iter = 484, Function value: -7.81083, Residual: 0.000387146 iter = 485, Function value: -7.81083, Residual: 0.000767867 iter = 486, Function value: -7.81083, Residual: 0.000318778 iter = 487, Function value: -7.81083, Residual: 0.000275048 Tao Object: 16 MPI processes type: blmvm Gradient steps: 0 TaoLineSearch Object: 16 MPI processes type: more-thuente Active Set subset type: subvec convergence tolerances: fatol=1e-08, frtol=1e-08 convergence tolerances: gatol=1e-08, steptol=0, gttol=0 Residual in Function/Gradient:=0.000275048 Objective value=-7.81083 total number of iterations=487, (max: 50000) total number of function/gradient evaluations=488, (max: 4000) Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol it: 1 2.409978e+00 487 6.354219e+09 ========================================== Time summary: ========================================== Creating DMPlex: 1.1421 Distributing DMPlex: 0.738438 Refining DMPlex: 0.388502 Setting up problem: 0.545519 Overall analysis time: 2.81322 Overall FLOPS/s: 5.22992e+09 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main_wolf on a arch-no-hdf5-opt named wf323.localdomain with 16 processors, by jychang48 Thu Jun 25 10:33:17 2015 Using Petsc Development GIT revision: unknown GIT Date: unknown Max Max/Min Avg Total Time (sec): 5.637e+00 1.00020 5.637e+00 Objects: 3.920e+02 1.11681 3.536e+02 Flops: 1.029e+09 1.12330 9.661e+08 1.546e+10 Flops/sec: 1.826e+08 1.12340 1.714e+08 2.742e+09 MPI Messages: 5.230e+03 1.42222 4.597e+03 7.354e+04 MPI Message Lengths: 8.956e+07 5.74006 4.876e+03 3.586e+08 MPI Reductions: 1.269e+04 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.6369e+00 100.0% 1.5458e+10 100.0% 7.354e+04 100.0% 4.876e+03 100.0% 1.269e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage CreateMesh 489 1.0 2.7790e+00 1.0 2.81e+08 1.1 7.1e+04 3.7e+03 9.9e+02 49 27 96 74 8 49 27 96 74 8 1523 VecView 1 1.0 2.8899e-03 2.4 6.20e+04 2.6 8.6e+01 2.5e+04 0.0e+00 0 0 0 1 0 0 0 0 1 0 252 VecDot 10680 1.0 5.4635e-01 1.3 3.61e+08 1.1 0.0e+00 0.0e+00 1.1e+04 8 35 0 0 84 8 35 0 0 84 9915 VecNorm 488 1.0 3.0562e-02 1.8 1.65e+07 1.1 0.0e+00 0.0e+00 4.9e+02 0 2 0 0 4 0 2 0 0 4 8099 VecScale 1947 1.0 2.9742e-02 1.1 3.29e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 16602 VecCopy 6326 1.0 1.7738e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecSet 11 1.0 1.6385e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 8245 1.0 2.7241e-01 1.2 2.87e+08 1.1 0.0e+00 0.0e+00 0.0e+00 4 28 0 0 0 4 28 0 0 0 15804 VecAYPX 972 1.0 3.0343e-02 1.5 1.64e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 8124 VecWAXPY 1 1.0 4.1008e-05 1.8 1.69e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6185 VecPointwiseMult 2917 1.0 7.4186e-02 1.1 4.93e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 9972 VecScatterBegin 489 1.0 2.2702e-02 1.4 0.00e+00 0.0 6.8e+04 2.5e+03 0.0e+00 0 0 93 48 0 0 0 93 48 0 0 VecScatterEnd 489 1.0 2.3321e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 489 1.0 4.4136e-01 1.1 2.32e+08 1.1 6.8e+04 2.5e+03 0.0e+00 8 23 93 48 0 8 23 93 48 0 7922 MatAssemblyBegin 2 1.0 2.9637e-0211.0 0.00e+00 0.0 2.1e+02 7.3e+04 4.0e+00 0 0 0 4 0 0 0 0 4 0 0 MatAssemblyEnd 2 1.0 1.2829e-02 1.7 0.00e+00 0.0 2.8e+02 6.3e+02 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 3.3212e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 Mesh Partition 1 1.0 6.6892e-01 1.0 0.00e+00 0.0 1.6e+03 1.1e+04 4.0e+00 12 0 2 5 0 12 0 2 5 0 0 Mesh Migration 1 1.0 8.7067e-02 1.0 0.00e+00 0.0 3.0e+02 2.0e+05 2.0e+00 2 0 0 16 0 2 0 0 16 0 0 DMPlexInterp 3 1.0 9.7370e-014158.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 DMPlexDistribute 1 1.0 7.6774e-01 1.0 0.00e+00 0.0 1.9e+03 4.7e+04 6.0e+00 14 0 3 25 0 14 0 3 25 0 0 DMPlexDistCones 1 1.0 5.6297e-02 1.0 0.00e+00 0.0 1.3e+02 3.5e+05 0.0e+00 1 0 0 13 0 1 0 0 13 0 0 DMPlexDistLabels 1 1.0 2.6512e-0424.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistField 2 1.0 2.6672e-02 1.1 0.00e+00 0.0 2.6e+02 7.8e+04 4.0e+00 0 0 0 6 0 0 0 0 6 0 0 DMPlexDistData 1 1.0 4.5035e-0132.4 0.00e+00 0.0 1.6e+03 7.0e+03 0.0e+00 7 0 2 3 0 7 0 2 3 0 0 DMPlexStratify 12 1.2 3.5558e-01 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 DMPlexPrealloc 1 1.0 1.9255e-01 1.0 0.00e+00 0.0 1.7e+03 5.2e+03 1.7e+01 3 0 2 2 0 3 0 2 2 0 0 DMPlexResidualFE 1 1.0 1.0459e-01 1.1 5.23e+06 1.1 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 769 DMPlexJacobianFE 1 1.0 3.0138e-01 1.0 1.06e+07 1.1 2.1e+02 7.3e+04 2.0e+00 5 1 0 4 0 5 1 0 4 0 542 SFSetGraph 22 1.0 1.2767e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastBegin 35 1.0 4.6513e-01 7.6 0.00e+00 0.0 3.6e+03 3.0e+04 0.0e+00 8 0 5 30 0 8 0 5 30 0 0 SFBcastEnd 35 1.0 8.2554e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 SFReduceBegin 6 1.0 1.3256e-02 9.8 0.00e+00 0.0 8.0e+02 1.9e+04 0.0e+00 0 0 1 4 0 0 0 1 4 0 0 SFReduceEnd 6 1.0 1.9113e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpBegin 1 1.0 7.4863e-0524.2 0.00e+00 0.0 7.0e+01 1.2e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpEnd 1 1.0 4.8614e-04 2.3 0.00e+00 0.0 7.0e+01 1.2e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SNESFunctionEval 1 1.0 1.1039e-01 1.1 5.23e+06 1.1 1.7e+02 2.5e+04 0.0e+00 2 1 0 1 0 2 1 0 1 0 729 SNESJacobianEval 1 1.0 3.0186e-01 1.0 1.06e+07 1.1 4.2e+02 4.4e+04 2.0e+00 5 1 1 5 0 5 1 1 5 0 541 TaoSolve 1 1.0 2.0006e+00 1.0 1.00e+09 1.1 6.8e+04 2.5e+03 1.3e+04 35 97 93 48 99 35 97 93 48 99 7529 TaoLineSearchApply 487 1.0 7.7062e-01 1.0 3.63e+08 1.1 6.8e+04 2.5e+03 3.9e+03 14 35 93 48 31 14 35 93 48 31 7083 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 4 3 2264 0 Object 7 7 4032 0 Container 7 7 3976 0 Vector 49 49 35398640 0 Vector Scatter 1 1 1088 0 Matrix 4 4 3128844 0 Distributed Mesh 30 30 139040 0 GraphPartitioner 12 12 7248 0 Star Forest Bipartite Graph 78 78 63576 0 Discrete System 30 30 25440 0 Index Set 85 85 12912584 0 IS L to G Mapping 2 2 3821272 0 Section 70 70 46480 0 SNES 1 1 1332 0 SNESLineSearch 1 1 864 0 DMSNES 1 1 664 0 Krylov Solver 1 1 1216 0 Preconditioner 1 1 848 0 Linear Space 2 2 1280 0 Dual Space 2 2 1312 0 FE Space 2 2 1496 0 Tao 1 1 1752 0 TaoLineSearch 1 1 880 0 ======================================================================================================================== Average time to get PetscTime(): 5.96046e-07 Average time for MPI_Barrier(): 5.00679e-06 Average time for zero size MPI_Send(): 1.68383e-06 #PETSc Option Table entries: -al 1 -am 0 -at 0.001 -bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 -bcnum 7 -bcval 0,0,0,0,0,0,1 -dim 3 -dm_refine 1 -dt 0.001 -edges 3,3 -floc 0.25,0.75,0.25,0.75,0.25,0.75 -fnum 0 -ftime 0,99 -fval 1 -ksp_max_it 50000 -ksp_rtol 1.0e-10 -ksp_type cg -log_summary -lower 0,0 -mat_petscspace_order 0 -mesh datafiles/cube_with_hole4_mesh.dat -mu 1 -nonneg 1 -numsteps 0 -options_left 0 -pc_type jacobi -petscpartitioner_type parmetis -progress 0 -simplex 1 -solution_petscspace_order 1 -tao_fatol 1e-8 -tao_frtol 1e-8 -tao_max_it 50000 -tao_monitor -tao_type blmvm -tao_view -trans datafiles/cube_with_hole4_trans.dat -upper 1,1 -vtuname figures/cube_with_hole_4 -vtuprint 1 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-f2cblaslapack=/turquoise/users/jychang48/petsc-externalpackages/f2cblaslapack-3.4.2.q1.tar.gz --download-metis=/turquoise/users/jychang48/petsc-externalpackages/metis-5.1.0-p1.tar.gz --download-openmpi=/turquoise/users/jychang48/petsc-externalpackages/openmpi-1.8.5.tar.gz --download-parmetis=/turquoise/users/jychang48/petsc-externalpackages/parmetis-4.0.3-p1.tar.gz 
--download-sowing=/turquoise/users/jychang48/petsc-externalpackages/sowing-1.1.17-p1.tar.gz --with-cc=gcc --with-cxx=g++ --with-debugging=0 --with-fc=gfortran COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" PETSC_ARCH=arch-no-hdf5-opt --download-chaco=/turquoise/users/jychang48/petsc-externalpackages/Chaco-2.2-p2.tar.gz ----------------------------------------- Libraries compiled on Tue Jun 23 12:16:19 2015 on wf-fe2.lanl.gov Machine characteristics: Linux-2.6.32-431.29.2.2chaos.ch5.2.x86_64-x86_64-with-redhat-6.6-Santiago Using PETSc directory: /turquoise/users/jychang48/petsc-master Using PETSc arch: arch-no-hdf5-opt ----------------------------------------- Using C compiler: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/include -I/turquoise/users/jychang48/petsc-master/include -I/turquoise/users/jychang48/petsc-master/include -I/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/include ----------------------------------------- Using C linker: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpicc Using Fortran linker: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpif90 Using libraries: -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lpetsc -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lf2clapack -lf2cblas -lm -lparmetis -lmetis -lchaco -lX11 -lhwloc -lssl -lcrypto -lm -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -ldl -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lmpi -lgcc_s -lpthread -ldl 
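
For orientation, the option table above (-tao_type blmvm, -tao_fatol/-tao_frtol 1e-8, -tao_monitor, -tao_view, together with the variable bounds implied by -nonneg 1) drives a bound-constrained TAO solve with the limited-memory BLMVM method. A minimal sketch of such a driver is given below; it is not the main_wolf code that produced this log (there the objective and gradient come from a DMPlex/FE discretization), and the names FormFunctionGradient, xl, xu, n, and the quadratic stand-in objective are placeholders for the application-specific pieces.

  #include <petsctao.h>

  /* Stand-in objective f(x) = 0.5*||x - 1||^2 with gradient g = x - 1;
     the logged run evaluates a DMPlex/FE-based objective instead. */
  static PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ctx)
  {
    PetscErrorCode ierr;
    PetscReal      nrm;

    PetscFunctionBeginUser;
    ierr = VecCopy(X, G);CHKERRQ(ierr);
    ierr = VecShift(G, -1.0);CHKERRQ(ierr);
    ierr = VecNorm(G, NORM_2, &nrm);CHKERRQ(ierr);
    *f   = 0.5*nrm*nrm;
    PetscFunctionReturn(0);
  }

  int main(int argc, char **argv)
  {
    Tao            tao;
    Vec            x, xl, xu;   /* solution and bound vectors (placeholders) */
    PetscInt       n = 100;     /* placeholder global problem size */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &x);CHKERRQ(ierr);
    ierr = VecSet(x, 0.0);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &xl);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &xu);CHKERRQ(ierr);
    ierr = VecSet(xl, 0.0);CHKERRQ(ierr);            /* lower bound, e.g. nonnegativity */
    ierr = VecSet(xu, PETSC_INFINITY);CHKERRQ(ierr); /* no upper bound */

    ierr = TaoCreate(PETSC_COMM_WORLD, &tao);CHKERRQ(ierr);
    ierr = TaoSetType(tao, TAOBLMVM);CHKERRQ(ierr);  /* can be overridden by -tao_type */
    ierr = TaoSetInitialVector(tao, x);CHKERRQ(ierr);
    ierr = TaoSetVariableBounds(tao, xl, xu);CHKERRQ(ierr);
    ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, NULL);CHKERRQ(ierr);
    ierr = TaoSetFromOptions(tao);CHKERRQ(ierr);     /* reads -tao_fatol, -tao_monitor, -tao_view, ... */
    ierr = TaoSolve(tao);CHKERRQ(ierr);

    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&xl);CHKERRQ(ierr);
    ierr = VecDestroy(&xu);CHKERRQ(ierr);
    ierr = TaoDestroy(&tao);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

With this structure, run-time settings such as -tao_fatol 1e-8, -tao_monitor, and -tao_view are picked up by TaoSetFromOptions(), which is consistent with the tolerances and monitor output reported in the Tao view above.
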
----------------------------------------- -------------- next part -------------- ================================================== ================================================== MESHID = 4 ================================================== ================================================== ========================================== 16 processors: ========================================== TSTEP ANALYSIS TIME ITER FLOPS/s iter = 0, Function value: -1.57703e-07, Residual: 0.239098 iter = 1, Function value: -4.53878e-07, Residual: 0.223304 iter = 2, Function value: -7.89606e-06, Residual: 0.221146 iter = 3, Function value: -0.0808297, Residual: 0.225876 iter = 4, Function value: -1.67847, Residual: 0.351701 iter = 5, Function value: -3.2307, Residual: 0.18377 iter = 6, Function value: -3.80321, Residual: 0.168797 iter = 7, Function value: -4.25223, Residual: 0.208515 iter = 8, Function value: -4.54546, Residual: 0.211522 iter = 9, Function value: -4.76263, Residual: 0.121566 iter = 10, Function value: -4.93278, Residual: 0.11454 iter = 11, Function value: -5.1385, Residual: 0.149913 iter = 12, Function value: -5.3172, Residual: 0.188699 iter = 13, Function value: -5.35404, Residual: 0.414234 iter = 14, Function value: -5.46775, Residual: 0.166872 iter = 15, Function value: -5.5088, Residual: 0.109606 iter = 16, Function value: -5.55735, Residual: 0.137697 iter = 17, Function value: -5.61175, Residual: 0.20436 iter = 18, Function value: -5.67254, Residual: 0.145789 iter = 19, Function value: -5.7428, Residual: 0.131282 iter = 20, Function value: -5.78678, Residual: 0.168675 iter = 21, Function value: -5.83482, Residual: 0.107592 iter = 22, Function value: -5.88877, Residual: 0.120863 iter = 23, Function value: -5.94162, Residual: 0.170501 iter = 24, Function value: -5.99103, Residual: 0.137338 iter = 25, Function value: -6.03433, Residual: 0.11123 iter = 26, Function value: -6.08494, Residual: 0.123001 iter = 27, Function value: -6.12432, Residual: 0.158457 iter = 28, Function value: -6.16801, Residual: 0.100757 iter = 29, Function value: -6.21675, Residual: 0.100482 iter = 30, Function value: -6.25439, Residual: 0.150756 iter = 31, Function value: -6.29508, Residual: 0.10085 iter = 32, Function value: -6.33253, Residual: 0.0966374 iter = 33, Function value: -6.36431, Residual: 0.103262 iter = 34, Function value: -6.40181, Residual: 0.102187 iter = 35, Function value: -6.44427, Residual: 0.121893 iter = 36, Function value: -6.48004, Residual: 0.0983186 iter = 37, Function value: -6.51506, Residual: 0.092832 iter = 38, Function value: -6.5519, Residual: 0.116683 iter = 39, Function value: -6.58292, Residual: 0.116904 iter = 40, Function value: -6.60971, Residual: 0.0771122 iter = 41, Function value: -6.63585, Residual: 0.0757516 iter = 42, Function value: -6.6586, Residual: 0.10885 iter = 43, Function value: -6.68587, Residual: 0.0752019 iter = 44, Function value: -6.7155, Residual: 0.0769993 iter = 45, Function value: -6.7459, Residual: 0.1128 iter = 46, Function value: -6.77436, Residual: 0.0874185 iter = 47, Function value: -6.8015, Residual: 0.0772962 iter = 48, Function value: -6.82934, Residual: 0.0894704 iter = 49, Function value: -6.85383, Residual: 0.0930334 iter = 50, Function value: -6.87712, Residual: 0.0707267 iter = 51, Function value: -6.90227, Residual: 0.0704031 iter = 52, Function value: -6.92095, Residual: 0.11036 iter = 53, Function value: -6.94187, Residual: 0.0625794 iter = 54, Function value: -6.96158, Residual: 0.0585309 iter = 55, Function value: -6.98111, 
Residual: 0.07778 iter = 56, Function value: -7.00425, Residual: 0.0917501 iter = 57, Function value: -7.02495, Residual: 0.0680073 iter = 58, Function value: -7.04571, Residual: 0.0632846 iter = 59, Function value: -7.06812, Residual: 0.067345 iter = 60, Function value: -7.08734, Residual: 0.111259 iter = 61, Function value: -7.10785, Residual: 0.062663 iter = 62, Function value: -7.12178, Residual: 0.0546556 iter = 63, Function value: -7.13834, Residual: 0.0661359 iter = 64, Function value: -7.15288, Residual: 0.0977186 iter = 65, Function value: -7.16979, Residual: 0.0558807 iter = 66, Function value: -7.18517, Residual: 0.0514965 iter = 67, Function value: -7.19956, Residual: 0.0566699 iter = 68, Function value: -7.21503, Residual: 0.102862 iter = 69, Function value: -7.23295, Residual: 0.0520616 iter = 70, Function value: -7.24472, Residual: 0.0514462 iter = 71, Function value: -7.2607, Residual: 0.0603313 iter = 72, Function value: -7.27322, Residual: 0.114154 iter = 73, Function value: -7.29073, Residual: 0.0553255 iter = 74, Function value: -7.30301, Residual: 0.0468704 iter = 75, Function value: -7.31535, Residual: 0.0503862 iter = 76, Function value: -7.32483, Residual: 0.121667 iter = 77, Function value: -7.34239, Residual: 0.0473323 iter = 78, Function value: -7.35051, Residual: 0.0402786 iter = 79, Function value: -7.36252, Residual: 0.0513142 iter = 80, Function value: -7.37578, Residual: 0.0980691 iter = 81, Function value: -7.39103, Residual: 0.0561799 iter = 82, Function value: -7.40188, Residual: 0.0420525 iter = 83, Function value: -7.41352, Residual: 0.0506624 iter = 84, Function value: -7.42311, Residual: 0.0745317 iter = 85, Function value: -7.43349, Residual: 0.0418963 iter = 86, Function value: -7.44304, Residual: 0.0427164 iter = 87, Function value: -7.45309, Residual: 0.050941 iter = 88, Function value: -7.46557, Residual: 0.0855721 iter = 89, Function value: -7.47897, Residual: 0.0443531 iter = 90, Function value: -7.48731, Residual: 0.038289 iter = 91, Function value: -7.49667, Residual: 0.0433084 iter = 92, Function value: -7.50404, Residual: 0.0748006 iter = 93, Function value: -7.51295, Residual: 0.0379092 iter = 94, Function value: -7.52065, Residual: 0.0360034 iter = 95, Function value: -7.52815, Residual: 0.0407615 iter = 96, Function value: -7.53624, Residual: 0.0871539 iter = 97, Function value: -7.54733, Residual: 0.0370463 iter = 98, Function value: -7.55263, Residual: 0.0321752 iter = 99, Function value: -7.5611, Residual: 0.0380701 iter = 100, Function value: -7.56636, Residual: 0.0828749 iter = 101, Function value: -7.57523, Residual: 0.0364468 iter = 102, Function value: -7.58065, Residual: 0.0296879 iter = 103, Function value: -7.58617, Residual: 0.0331854 iter = 104, Function value: -7.59286, Residual: 0.0653273 iter = 105, Function value: -7.60073, Residual: 0.0311378 iter = 106, Function value: -7.605, Residual: 0.0271273 iter = 107, Function value: -7.61161, Residual: 0.0352389 iter = 108, Function value: -7.61636, Residual: 0.0633698 iter = 109, Function value: -7.62264, Residual: 0.0323803 iter = 110, Function value: -7.62822, Residual: 0.0286247 iter = 111, Function value: -7.63278, Residual: 0.0310643 iter = 112, Function value: -7.63766, Residual: 0.0695965 iter = 113, Function value: -7.64471, Residual: 0.0273006 iter = 114, Function value: -7.6477, Residual: 0.0241679 iter = 115, Function value: -7.65299, Residual: 0.0287846 iter = 116, Function value: -7.65608, Residual: 0.0686328 iter = 117, Function value: -7.66196, Residual: 
0.029546 iter = 118, Function value: -7.66562, Residual: 0.022733 iter = 119, Function value: -7.6691, Residual: 0.0260237 iter = 120, Function value: -7.67366, Residual: 0.0518384 iter = 121, Function value: -7.67882, Residual: 0.0250519 iter = 122, Function value: -7.68159, Residual: 0.0208822 iter = 123, Function value: -7.68604, Residual: 0.0283777 iter = 124, Function value: -7.68918, Residual: 0.0502209 iter = 125, Function value: -7.69338, Residual: 0.026466 iter = 126, Function value: -7.69725, Residual: 0.0232737 iter = 127, Function value: -7.70029, Residual: 0.0246071 iter = 128, Function value: -7.70333, Residual: 0.0560219 iter = 129, Function value: -7.70795, Residual: 0.0219018 iter = 130, Function value: -7.70988, Residual: 0.0192943 iter = 131, Function value: -7.71333, Residual: 0.0223829 iter = 132, Function value: -7.71539, Residual: 0.0571394 iter = 133, Function value: -7.71939, Residual: 0.0241063 iter = 134, Function value: -7.72174, Residual: 0.0183287 iter = 135, Function value: -7.72395, Residual: 0.0206927 iter = 136, Function value: -7.7272, Residual: 0.0366094 iter = 137, Function value: -7.73034, Residual: 0.0229168 iter = 138, Function value: -7.73217, Residual: 0.0165316 iter = 139, Function value: -7.73523, Residual: 0.0207809 iter = 140, Function value: -7.7372, Residual: 0.0344397 iter = 141, Function value: -7.73964, Residual: 0.0197955 iter = 142, Function value: -7.74225, Residual: 0.0175415 iter = 143, Function value: -7.74421, Residual: 0.0204096 iter = 144, Function value: -7.74642, Residual: 0.0325727 iter = 145, Function value: -7.74861, Residual: 0.0168286 iter = 146, Function value: -7.74991, Residual: 0.0166282 iter = 147, Function value: -7.75202, Residual: 0.0187187 iter = 148, Function value: -7.75333, Residual: 0.043491 iter = 149, Function value: -7.75579, Residual: 0.0163257 iter = 150, Function value: -7.75685, Residual: 0.013081 iter = 151, Function value: -7.75808, Residual: 0.0146514 iter = 152, Function value: -7.75972, Residual: 0.0272383 iter = 153, Function value: -7.76143, Residual: 0.0146716 iter = 154, Function value: -7.76245, Residual: 0.012202 iter = 155, Function value: -7.76402, Residual: 0.0165763 iter = 156, Function value: -7.76485, Residual: 0.0264216 iter = 157, Function value: -7.76596, Residual: 0.0141675 iter = 158, Function value: -7.76711, Residual: 0.0118525 iter = 159, Function value: -7.76795, Residual: 0.0132682 iter = 160, Function value: -7.76906, Residual: 0.0307988 iter = 161, Function value: -7.77046, Residual: 0.0136723 iter = 162, Function value: -7.77101, Residual: 0.0105228 iter = 163, Function value: -7.77204, Residual: 0.0114438 iter = 164, Function value: -7.77257, Residual: 0.0246093 iter = 165, Function value: -7.77345, Residual: 0.0125104 iter = 166, Function value: -7.77417, Residual: 0.0102181 iter = 167, Function value: -7.77474, Residual: 0.0109432 iter = 168, Function value: -7.77532, Residual: 0.0199892 iter = 169, Function value: -7.77611, Residual: 0.0101163 iter = 170, Function value: -7.77655, Residual: 0.0105442 iter = 171, Function value: -7.77743, Residual: 0.0116992 iter = 172, Function value: -7.77779, Residual: 0.0293656 iter = 173, Function value: -7.77878, Residual: 0.0112078 iter = 174, Function value: -7.77919, Residual: 0.00808034 iter = 175, Function value: -7.77957, Residual: 0.00902743 iter = 176, Function value: -7.78023, Residual: 0.0122209 iter = 177, Function value: -7.78068, Residual: 0.0174618 iter = 178, Function value: -7.78127, Residual: 0.00842028 iter = 179, 
Function value: -7.78167, Residual: 0.0091214 iter = 180, Function value: -7.78215, Residual: 0.0100989 iter = 181, Function value: -7.78253, Residual: 0.0198294 iter = 182, Function value: -7.78312, Residual: 0.00877935 iter = 183, Function value: -7.78342, Residual: 0.00805744 iter = 184, Function value: -7.78381, Residual: 0.00884593 iter = 185, Function value: -7.78426, Residual: 0.0172995 iter = 186, Function value: -7.78476, Residual: 0.00859094 iter = 187, Function value: -7.78503, Residual: 0.00730765 iter = 188, Function value: -7.78543, Residual: 0.00819726 iter = 189, Function value: -7.7857, Residual: 0.0163602 iter = 190, Function value: -7.78611, Residual: 0.00840989 iter = 191, Function value: -7.78641, Residual: 0.00748282 iter = 192, Function value: -7.7867, Residual: 0.00842633 iter = 193, Function value: -7.78717, Residual: 0.0143768 iter = 194, Function value: -7.78759, Residual: 0.00972398 iter = 195, Function value: -7.78783, Residual: 0.0068276 iter = 196, Function value: -7.78815, Residual: 0.00741378 iter = 197, Function value: -7.7884, Residual: 0.00941715 iter = 198, Function value: -7.78875, Residual: 0.00770439 iter = 199, Function value: -7.78917, Residual: 0.00852455 iter = 200, Function value: -7.78949, Residual: 0.0119365 iter = 201, Function value: -7.7898, Residual: 0.00737878 iter = 202, Function value: -7.79006, Residual: 0.00692509 iter = 203, Function value: -7.79031, Residual: 0.0078474 iter = 204, Function value: -7.79062, Residual: 0.0133278 iter = 205, Function value: -7.79097, Residual: 0.00759941 iter = 206, Function value: -7.7912, Residual: 0.00672229 iter = 207, Function value: -7.79144, Residual: 0.00698935 iter = 208, Function value: -7.79165, Residual: 0.0137285 iter = 209, Function value: -7.79195, Residual: 0.00704832 iter = 210, Function value: -7.79218, Residual: 0.00665052 iter = 211, Function value: -7.79241, Residual: 0.00752555 iter = 212, Function value: -7.79273, Residual: 0.013596 iter = 213, Function value: -7.79307, Residual: 0.00727608 iter = 214, Function value: -7.79324, Residual: 0.0061077 iter = 215, Function value: -7.79355, Residual: 0.0070488 iter = 216, Function value: -7.79371, Residual: 0.0129342 iter = 217, Function value: -7.79398, Residual: 0.00699342 iter = 218, Function value: -7.79422, Residual: 0.00615994 iter = 219, Function value: -7.79442, Residual: 0.00670625 iter = 220, Function value: -7.79466, Residual: 0.0129079 iter = 221, Function value: -7.79497, Residual: 0.00653888 iter = 222, Function value: -7.79512, Residual: 0.00595468 iter = 223, Function value: -7.79543, Residual: 0.00677409 iter = 224, Function value: -7.79557, Residual: 0.0157728 iter = 225, Function value: -7.79589, Residual: 0.00704583 iter = 226, Function value: -7.79609, Residual: 0.00558073 iter = 227, Function value: -7.79628, Residual: 0.00634841 iter = 228, Function value: -7.79655, Residual: 0.0100776 iter = 229, Function value: -7.79683, Residual: 0.00715925 iter = 230, Function value: -7.79701, Residual: 0.00566091 iter = 231, Function value: -7.79735, Residual: 0.00681582 iter = 232, Function value: -7.79758, Residual: 0.0114214 iter = 233, Function value: -7.79786, Residual: 0.00690348 iter = 234, Function value: -7.79816, Residual: 0.00560618 iter = 235, Function value: -7.79836, Residual: 0.00674194 iter = 236, Function value: -7.79867, Residual: 0.00742554 iter = 237, Function value: -7.79891, Residual: 0.00689359 iter = 238, Function value: -7.79916, Residual: 0.00634623 iter = 239, Function value: -7.79947, Residual: 
0.00757451 iter = 240, Function value: -7.79971, Residual: 0.00895436 iter = 241, Function value: -7.79993, Residual: 0.00593646 iter = 242, Function value: -7.80019, Residual: 0.0052903 iter = 243, Function value: -7.80038, Residual: 0.00750854 iter = 244, Function value: -7.80063, Residual: 0.00558416 iter = 245, Function value: -7.80087, Residual: 0.00693769 iter = 246, Function value: -7.80111, Residual: 0.00608144 iter = 247, Function value: -7.80131, Residual: 0.00532918 iter = 248, Function value: -7.80156, Residual: 0.0068985 iter = 249, Function value: -7.80177, Residual: 0.00830931 iter = 250, Function value: -7.80197, Residual: 0.00580026 iter = 251, Function value: -7.80221, Residual: 0.00495945 iter = 252, Function value: -7.80236, Residual: 0.00744075 iter = 253, Function value: -7.80255, Residual: 0.00554201 iter = 254, Function value: -7.80278, Residual: 0.00566253 iter = 255, Function value: -7.80295, Residual: 0.00688607 iter = 256, Function value: -7.80311, Residual: 0.00544887 iter = 257, Function value: -7.8034, Residual: 0.00636679 iter = 258, Function value: -7.80355, Residual: 0.00897564 iter = 259, Function value: -7.80371, Residual: 0.00549888 iter = 260, Function value: -7.8039, Residual: 0.00415368 iter = 261, Function value: -7.80402, Residual: 0.00468554 iter = 262, Function value: -7.8042, Residual: 0.00944556 iter = 263, Function value: -7.8044, Residual: 0.00516081 iter = 264, Function value: -7.8045, Residual: 0.0043641 iter = 265, Function value: -7.8047, Residual: 0.00495905 iter = 266, Function value: -7.80479, Residual: 0.00991048 iter = 267, Function value: -7.80495, Residual: 0.00465967 iter = 268, Function value: -7.80507, Residual: 0.00374079 iter = 269, Function value: -7.80516, Residual: 0.00395313 iter = 270, Function value: -7.80527, Residual: 0.00847105 iter = 271, Function value: -7.80544, Residual: 0.00366557 iter = 272, Function value: -7.80554, Residual: 0.00421187 iter = 273, Function value: -7.80573, Residual: 0.00506542 iter = 274, Function value: -7.8058, Residual: 0.0113156 iter = 275, Function value: -7.80599, Residual: 0.00403993 iter = 276, Function value: -7.80607, Residual: 0.00309473 iter = 277, Function value: -7.80614, Residual: 0.00370412 iter = 278, Function value: -7.80629, Residual: 0.00589863 iter = 279, Function value: -7.80642, Residual: 0.00610977 iter = 280, Function value: -7.80654, Residual: 0.00338659 iter = 281, Function value: -7.80664, Residual: 0.00361862 iter = 282, Function value: -7.80672, Residual: 0.00436093 iter = 283, Function value: -7.80683, Residual: 0.00412742 iter = 284, Function value: -7.80694, Residual: 0.0034514 iter = 285, Function value: -7.80707, Residual: 0.00470219 iter = 286, Function value: -7.80717, Residual: 0.00523596 iter = 287, Function value: -7.80726, Residual: 0.00350967 iter = 288, Function value: -7.80737, Residual: 0.00311707 iter = 289, Function value: -7.80744, Residual: 0.0046191 iter = 290, Function value: -7.80754, Residual: 0.00378936 iter = 291, Function value: -7.80763, Residual: 0.00564189 iter = 292, Function value: -7.80771, Residual: 0.00311771 iter = 293, Function value: -7.80776, Residual: 0.00315966 iter = 294, Function value: -7.80786, Residual: 0.00369862 iter = 295, Function value: -7.80792, Residual: 0.00827063 iter = 296, Function value: -7.80803, Residual: 0.00357855 iter = 297, Function value: -7.80811, Residual: 0.00265554 iter = 298, Function value: -7.80816, Residual: 0.00290326 iter = 299, Function value: -7.80822, Residual: 0.00596541 iter = 300, 
Function value: -7.8083, Residual: 0.00280405 iter = 301, Function value: -7.80836, Residual: 0.00260171 iter = 302, Function value: -7.80844, Residual: 0.00323139 iter = 303, Function value: -7.80849, Residual: 0.00698054 iter = 304, Function value: -7.80858, Residual: 0.00268255 iter = 305, Function value: -7.80862, Residual: 0.00233456 iter = 306, Function value: -7.80867, Residual: 0.00274033 iter = 307, Function value: -7.80874, Residual: 0.00525292 iter = 308, Function value: -7.80882, Residual: 0.00315494 iter = 309, Function value: -7.80887, Residual: 0.00224967 iter = 310, Function value: -7.80893, Residual: 0.00288279 iter = 311, Function value: -7.80896, Residual: 0.0034635 iter = 312, Function value: -7.809, Residual: 0.00250577 iter = 313, Function value: -7.80908, Residual: 0.00252035 iter = 314, Function value: -7.80913, Residual: 0.00382787 iter = 315, Function value: -7.80918, Residual: 0.00268336 iter = 316, Function value: -7.80924, Residual: 0.00316437 iter = 317, Function value: -7.80927, Residual: 0.00287941 iter = 318, Function value: -7.8093, Residual: 0.00233516 iter = 319, Function value: -7.80936, Residual: 0.00313269 iter = 320, Function value: -7.8094, Residual: 0.00424648 iter = 321, Function value: -7.80943, Residual: 0.00254562 iter = 322, Function value: -7.80948, Residual: 0.00197848 iter = 323, Function value: -7.80951, Residual: 0.00219143 iter = 324, Function value: -7.80954, Residual: 0.00559215 iter = 325, Function value: -7.8096, Residual: 0.00207975 iter = 326, Function value: -7.80962, Residual: 0.00178679 iter = 327, Function value: -7.80967, Residual: 0.00219704 iter = 328, Function value: -7.80968, Residual: 0.00618964 iter = 329, Function value: -7.80974, Residual: 0.00217387 iter = 330, Function value: -7.80976, Residual: 0.00159697 iter = 331, Function value: -7.80978, Residual: 0.00181745 iter = 332, Function value: -7.80981, Residual: 0.00342114 iter = 333, Function value: -7.80984, Residual: 0.00156656 iter = 334, Function value: -7.80987, Residual: 0.0015906 iter = 335, Function value: -7.8099, Residual: 0.00220417 iter = 336, Function value: -7.80992, Residual: 0.00465009 iter = 337, Function value: -7.80995, Residual: 0.00187269 iter = 338, Function value: -7.80997, Residual: 0.00141207 iter = 339, Function value: -7.80999, Residual: 0.00165522 iter = 340, Function value: -7.81002, Residual: 0.00275905 iter = 341, Function value: -7.81004, Residual: 0.00236444 iter = 342, Function value: -7.81005, Residual: 0.00126158 iter = 343, Function value: -7.81007, Residual: 0.0013511 iter = 344, Function value: -7.81008, Residual: 0.00159616 iter = 345, Function value: -7.8101, Residual: 0.00263468 iter = 346, Function value: -7.81012, Residual: 0.00130192 iter = 347, Function value: -7.81013, Residual: 0.00125099 iter = 348, Function value: -7.81014, Residual: 0.00134643 iter = 349, Function value: -7.81015, Residual: 0.00322428 iter = 350, Function value: -7.81017, Residual: 0.00112657 iter = 351, Function value: -7.81017, Residual: 0.00105449 iter = 352, Function value: -7.81019, Residual: 0.00124955 iter = 353, Function value: -7.8102, Residual: 0.00223142 iter = 354, Function value: -7.81021, Residual: 0.00113859 iter = 355, Function value: -7.81022, Residual: 0.00098257 iter = 356, Function value: -7.81023, Residual: 0.00131657 iter = 357, Function value: -7.81024, Residual: 0.00218694 iter = 358, Function value: -7.81025, Residual: 0.00104937 iter = 359, Function value: -7.81025, Residual: 0.000917616 iter = 360, Function value: -7.81026, 
Residual: 0.00108485 iter = 361, Function value: -7.81027, Residual: 0.00205437 iter = 362, Function value: -7.81028, Residual: 0.00123308 iter = 363, Function value: -7.81029, Residual: 0.000869878 iter = 364, Function value: -7.8103, Residual: 0.000962767 iter = 365, Function value: -7.8103, Residual: 0.00183068 iter = 366, Function value: -7.81031, Residual: 0.00111156 iter = 367, Function value: -7.81032, Residual: 0.000900357 iter = 368, Function value: -7.81033, Residual: 0.0010033 iter = 369, Function value: -7.81034, Residual: 0.00178238 iter = 370, Function value: -7.81034, Residual: 0.000996511 iter = 371, Function value: -7.81035, Residual: 0.000886717 iter = 372, Function value: -7.81036, Residual: 0.00106146 iter = 373, Function value: -7.81036, Residual: 0.00222379 iter = 374, Function value: -7.81037, Residual: 0.00099812 iter = 375, Function value: -7.81038, Residual: 0.000723174 iter = 376, Function value: -7.81038, Residual: 0.000738951 iter = 377, Function value: -7.81039, Residual: 0.00185185 iter = 378, Function value: -7.8104, Residual: 0.00075251 iter = 379, Function value: -7.8104, Residual: 0.000839343 iter = 380, Function value: -7.81041, Residual: 0.00103561 iter = 381, Function value: -7.81041, Residual: 0.0024586 iter = 382, Function value: -7.81042, Residual: 0.000744648 iter = 383, Function value: -7.81042, Residual: 0.000538622 iter = 384, Function value: -7.81043, Residual: 0.000718492 iter = 385, Function value: -7.81043, Residual: 0.00104014 iter = 386, Function value: -7.81044, Residual: 0.00136274 iter = 387, Function value: -7.81044, Residual: 0.00074841 iter = 388, Function value: -7.81045, Residual: 0.000785691 iter = 389, Function value: -7.81045, Residual: 0.000898177 iter = 390, Function value: -7.81046, Residual: 0.00146418 iter = 391, Function value: -7.81046, Residual: 0.000690708 iter = 392, Function value: -7.81047, Residual: 0.000647569 iter = 393, Function value: -7.81047, Residual: 0.000697967 iter = 394, Function value: -7.81048, Residual: 0.00131163 iter = 395, Function value: -7.81048, Residual: 0.000827399 iter = 396, Function value: -7.81049, Residual: 0.000845789 iter = 397, Function value: -7.8105, Residual: 0.00105409 iter = 398, Function value: -7.8105, Residual: 0.00161545 iter = 399, Function value: -7.8105, Residual: 0.000687262 iter = 400, Function value: -7.81051, Residual: 0.00059559 iter = 401, Function value: -7.81051, Residual: 0.000752873 iter = 402, Function value: -7.81051, Residual: 0.00120154 iter = 403, Function value: -7.81052, Residual: 0.000798778 iter = 404, Function value: -7.81052, Residual: 0.000712079 iter = 405, Function value: -7.81053, Residual: 0.00110496 iter = 406, Function value: -7.81053, Residual: 0.00127187 iter = 407, Function value: -7.81054, Residual: 0.000842919 iter = 408, Function value: -7.81055, Residual: 0.00069032 iter = 409, Function value: -7.81055, Residual: 0.000841593 iter = 410, Function value: -7.81056, Residual: 0.0012262 iter = 411, Function value: -7.81056, Residual: 0.000918047 iter = 412, Function value: -7.81057, Residual: 0.0007945 iter = 413, Function value: -7.81057, Residual: 0.000866942 iter = 414, Function value: -7.81058, Residual: 0.00155323 iter = 415, Function value: -7.81058, Residual: 0.000891976 iter = 416, Function value: -7.81059, Residual: 0.000747428 iter = 417, Function value: -7.81059, Residual: 0.000821924 iter = 418, Function value: -7.8106, Residual: 0.00105041 iter = 419, Function value: -7.8106, Residual: 0.000994719 iter = 420, Function value: 
-7.81061, Residual: 0.00088687 iter = 421, Function value: -7.81061, Residual: 0.00103494 iter = 422, Function value: -7.81062, Residual: 0.00130036 iter = 423, Function value: -7.81063, Residual: 0.000908119 iter = 424, Function value: -7.81063, Residual: 0.000781051 iter = 425, Function value: -7.81064, Residual: 0.00113501 iter = 426, Function value: -7.81064, Residual: 0.000816488 iter = 427, Function value: -7.81065, Residual: 0.000743231 iter = 428, Function value: -7.81065, Residual: 0.00100657 iter = 429, Function value: -7.81065, Residual: 0.000910451 iter = 430, Function value: -7.81066, Residual: 0.00194077 iter = 431, Function value: -7.81067, Residual: 0.000792676 iter = 432, Function value: -7.81067, Residual: 0.000635068 iter = 433, Function value: -7.81068, Residual: 0.000844652 iter = 434, Function value: -7.81068, Residual: 0.00173968 iter = 435, Function value: -7.81068, Residual: 0.000718673 iter = 436, Function value: -7.81069, Residual: 0.000571681 iter = 437, Function value: -7.81069, Residual: 0.000659658 iter = 438, Function value: -7.81069, Residual: 0.0013505 iter = 439, Function value: -7.8107, Residual: 0.000568503 iter = 440, Function value: -7.8107, Residual: 0.000650747 iter = 441, Function value: -7.8107, Residual: 0.000749039 iter = 442, Function value: -7.81071, Residual: 0.00174564 iter = 443, Function value: -7.81071, Residual: 0.000662028 iter = 444, Function value: -7.81071, Residual: 0.000523397 iter = 445, Function value: -7.81072, Residual: 0.000656138 iter = 446, Function value: -7.81072, Residual: 0.00109849 iter = 447, Function value: -7.81073, Residual: 0.000874994 iter = 448, Function value: -7.81073, Residual: 0.000531464 iter = 449, Function value: -7.81073, Residual: 0.000593073 iter = 450, Function value: -7.81073, Residual: 0.000707091 iter = 451, Function value: -7.81074, Residual: 0.000810123 iter = 452, Function value: -7.81074, Residual: 0.000603729 iter = 453, Function value: -7.81074, Residual: 0.000561513 iter = 454, Function value: -7.81075, Residual: 0.000729322 iter = 455, Function value: -7.81075, Residual: 0.000673021 iter = 456, Function value: -7.81075, Residual: 0.000644186 iter = 457, Function value: -7.81076, Residual: 0.000523062 iter = 458, Function value: -7.81076, Residual: 0.000587717 iter = 459, Function value: -7.81076, Residual: 0.000496603 iter = 460, Function value: -7.81076, Residual: 0.000532727 iter = 461, Function value: -7.81077, Residual: 0.000998028 iter = 462, Function value: -7.81077, Residual: 0.000573228 iter = 463, Function value: -7.81077, Residual: 0.000469068 iter = 464, Function value: -7.81077, Residual: 0.000588658 iter = 465, Function value: -7.81077, Residual: 0.000786977 iter = 466, Function value: -7.81078, Residual: 0.000519669 iter = 467, Function value: -7.81078, Residual: 0.00048518 iter = 468, Function value: -7.81078, Residual: 0.000624198 iter = 469, Function value: -7.81078, Residual: 0.000502941 iter = 470, Function value: -7.81078, Residual: 0.000492811 iter = 471, Function value: -7.81079, Residual: 0.00055812 iter = 472, Function value: -7.81079, Residual: 0.000851356 iter = 473, Function value: -7.81079, Residual: 0.00054883 iter = 474, Function value: -7.81079, Residual: 0.000465062 iter = 475, Function value: -7.8108, Residual: 0.000552709 iter = 476, Function value: -7.8108, Residual: 0.000708076 iter = 477, Function value: -7.8108, Residual: 0.000473574 iter = 478, Function value: -7.8108, Residual: 0.000442428 iter = 479, Function value: -7.8108, Residual: 0.000635062 iter 
= 480, Function value: -7.81081, Residual: 0.000419477 iter = 481, Function value: -7.81081, Residual: 0.000422442 iter = 482, Function value: -7.81081, Residual: 0.000578847 iter = 483, Function value: -7.81081, Residual: 0.000499854 iter = 484, Function value: -7.81081, Residual: 0.000460825 iter = 485, Function value: -7.81081, Residual: 0.000545967 iter = 486, Function value: -7.81082, Residual: 0.00060014 iter = 487, Function value: -7.81082, Residual: 0.000418945 iter = 488, Function value: -7.81082, Residual: 0.000410794 iter = 489, Function value: -7.81082, Residual: 0.000750432 iter = 490, Function value: -7.81082, Residual: 0.000379884 iter = 491, Function value: -7.81082, Residual: 0.000331549 iter = 492, Function value: -7.81082, Residual: 0.000416861 iter = 493, Function value: -7.81083, Residual: 0.000579213 iter = 494, Function value: -7.81083, Residual: 0.000378749 iter = 495, Function value: -7.81083, Residual: 0.000391863 iter = 496, Function value: -7.81083, Residual: 0.000469321 iter = 497, Function value: -7.81083, Residual: 0.000849723 iter = 498, Function value: -7.81083, Residual: 0.000365311 iter = 499, Function value: -7.81083, Residual: 0.00030933 iter = 500, Function value: -7.81083, Residual: 0.000356744 iter = 501, Function value: -7.81084, Residual: 0.000652476 iter = 502, Function value: -7.81084, Residual: 0.000389956 iter = 503, Function value: -7.81084, Residual: 0.00030185 iter = 504, Function value: -7.81084, Residual: 0.000430598 iter = 505, Function value: -7.81084, Residual: 0.000506603 iter = 506, Function value: -7.81084, Residual: 0.000319141 iter = 507, Function value: -7.81084, Residual: 0.000290193 iter = 508, Function value: -7.81084, Residual: 0.000334589 iter = 509, Function value: -7.81084, Residual: 0.000689399 iter = 510, Function value: -7.81084, Residual: 0.000335727 iter = 511, Function value: -7.81084, Residual: 0.000261672 Tao Object: 16 MPI processes type: blmvm Gradient steps: 0 TaoLineSearch Object: 16 MPI processes type: more-thuente Active Set subset type: subvec convergence tolerances: fatol=1e-08, frtol=1e-08 convergence tolerances: gatol=1e-08, steptol=0, gttol=0 Residual in Function/Gradient:=0.000261672 Objective value=-7.81084 total number of iterations=511, (max: 50000) total number of function/gradient evaluations=512, (max: 4000) Solution converged: estimated |f(x)-f(X*)|/|f(X*)| <= frtol it: 1 2.475548e+00 511 6.486348e+09 ========================================== Time summary: ========================================== Creating DMPlex: 1.16017 Distributing DMPlex: 0.726076 Refining DMPlex: 0.392075 Setting up problem: 0.55005 Overall analysis time: 2.88639 Overall FLOPS/s: 5.35456e+09 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main_wolf on a arch-no-hdf5-opt named wf148.localdomain with 16 processors, by jychang48 Thu Jun 25 10:36:25 2015 Using Petsc Development GIT revision: unknown GIT Date: unknown Max Max/Min Avg Total Time (sec): 5.725e+00 1.00016 5.724e+00 Objects: 3.920e+02 1.11681 3.536e+02 Flops: 1.079e+09 1.12340 1.013e+09 1.620e+10 Flops/sec: 1.885e+08 1.12351 1.769e+08 2.830e+09 MPI Messages: 5.470e+03 1.42250 4.807e+03 7.690e+04 MPI Message Lengths: 8.998e+07 5.60808 4.773e+03 3.671e+08 MPI Reductions: 1.332e+04 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.7241e+00 100.0% 1.6201e+10 100.0% 7.690e+04 100.0% 4.773e+03 100.0% 1.332e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage CreateMesh 513 1.0 2.8079e+00 1.0 2.95e+08 1.1 7.4e+04 3.7e+03 1.0e+03 49 27 96 74 8 49 27 96 74 8 1581 VecView 1 1.0 2.8341e-03 2.4 6.20e+04 2.6 8.6e+01 2.5e+04 0.0e+00 0 0 0 1 0 0 0 0 1 0 257 VecDot 11208 1.0 5.5847e-01 1.3 3.79e+08 1.1 0.0e+00 0.0e+00 1.1e+04 9 35 0 0 84 9 35 0 0 84 10179 VecNorm 512 1.0 2.0345e-02 1.5 1.73e+07 1.1 0.0e+00 0.0e+00 5.1e+02 0 2 0 0 4 0 2 0 0 4 12765 VecScale 2043 1.0 3.0195e-02 1.1 3.46e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 17159 VecCopy 6638 1.0 1.8342e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecSet 11 1.0 1.5627e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 8653 1.0 2.8274e-01 1.2 3.01e+08 1.1 0.0e+00 0.0e+00 0.0e+00 4 28 0 0 0 4 28 0 0 0 15981 VecAYPX 1020 1.0 3.0797e-02 1.5 1.73e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 8400 VecWAXPY 1 1.0 4.1008e-05 1.9 1.69e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6185 VecPointwiseMult 3061 1.0 7.7274e-02 1.1 5.18e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 10046 VecScatterBegin 513 1.0 2.4148e-02 1.5 0.00e+00 0.0 7.2e+04 2.5e+03 0.0e+00 0 0 93 49 0 0 0 93 49 0 0 VecScatterEnd 513 1.0 2.4307e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 513 1.0 4.5839e-01 1.1 2.44e+08 1.1 7.2e+04 2.5e+03 0.0e+00 8 23 93 49 0 8 23 93 49 0 8002 MatAssemblyBegin 2 1.0 2.8213e-0212.5 0.00e+00 0.0 2.1e+02 7.3e+04 4.0e+00 0 0 0 4 0 0 0 0 4 0 0 MatAssemblyEnd 2 1.0 1.2559e-02 1.6 0.00e+00 0.0 2.8e+02 6.3e+02 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 3.2997e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 Mesh Partition 1 1.0 6.5906e-01 1.1 0.00e+00 0.0 1.6e+03 1.1e+04 4.0e+00 11 0 2 5 0 11 0 2 5 0 0 Mesh Migration 1 1.0 8.5906e-02 1.0 0.00e+00 0.0 3.0e+02 2.0e+05 2.0e+00 1 0 0 16 0 1 0 0 16 0 0 DMPlexInterp 3 1.0 9.6569e-014074.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 DMPlexDistribute 1 1.0 7.5649e-01 1.0 0.00e+00 0.0 1.9e+03 4.7e+04 6.0e+00 13 0 3 25 0 13 0 3 25 0 0 DMPlexDistCones 1 1.0 5.5471e-02 1.0 0.00e+00 0.0 1.3e+02 3.5e+05 0.0e+00 1 0 0 12 0 1 0 0 12 0 0 DMPlexDistLabels 1 1.0 2.6703e-0429.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistField 2 1.0 2.6206e-02 1.1 0.00e+00 0.0 2.6e+02 7.8e+04 4.0e+00 0 0 0 5 0 0 0 0 5 0 0 DMPlexDistData 1 1.0 4.4285e-0131.5 0.00e+00 0.0 1.6e+03 7.0e+03 0.0e+00 7 0 2 3 0 7 0 2 3 0 0 DMPlexStratify 12 1.2 3.5430e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 DMPlexPrealloc 1 1.0 1.9255e-01 1.0 0.00e+00 0.0 1.7e+03 5.2e+03 1.7e+01 3 0 2 2 0 3 0 2 2 0 0 DMPlexResidualFE 1 1.0 1.0427e-01 1.1 5.23e+06 1.1 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 771 DMPlexJacobianFE 1 1.0 2.9981e-01 1.0 1.06e+07 1.1 2.1e+02 7.3e+04 2.0e+00 5 1 0 4 0 5 1 0 4 0 545 SFSetGraph 22 1.0 1.2753e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastBegin 35 1.0 4.5945e-01 7.6 0.00e+00 0.0 3.6e+03 3.0e+04 0.0e+00 8 0 5 29 0 8 0 5 29 0 0 SFBcastEnd 35 1.0 8.2271e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 SFReduceBegin 6 1.0 1.3219e-0210.0 0.00e+00 0.0 8.0e+02 1.9e+04 0.0e+00 0 0 1 4 0 0 0 1 4 0 0 SFReduceEnd 6 1.0 1.8614e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpBegin 1 1.0 7.2002e-0517.8 0.00e+00 0.0 7.0e+01 1.2e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpEnd 1 1.0 4.7493e-04 2.1 0.00e+00 0.0 7.0e+01 1.2e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SNESFunctionEval 1 1.0 1.0910e-01 1.1 5.23e+06 1.1 1.7e+02 2.5e+04 0.0e+00 2 0 0 1 0 2 0 0 1 0 737 SNESJacobianEval 1 1.0 3.0030e-01 1.0 1.06e+07 1.1 4.2e+02 4.4e+04 2.0e+00 5 1 1 5 0 5 1 1 5 0 544 TaoSolve 1 1.0 2.0681e+00 1.0 1.05e+09 1.1 7.2e+04 2.5e+03 1.3e+04 36 98 93 49100 36 98 93 49100 7643 TaoLineSearchApply 511 1.0 8.0252e-01 1.0 3.81e+08 1.1 7.2e+04 2.5e+03 4.1e+03 14 35 93 49 31 14 35 93 49 31 7136 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 4 3 2264 0 Object 7 7 4032 0 Container 7 7 3976 0 Vector 49 49 35398640 0 Vector Scatter 1 1 1088 0 Matrix 4 4 3128844 0 Distributed Mesh 30 30 139040 0 GraphPartitioner 12 12 7248 0 Star Forest Bipartite Graph 78 78 63576 0 Discrete System 30 30 25440 0 Index Set 85 85 12912584 0 IS L to G Mapping 2 2 3821272 0 Section 70 70 46480 0 SNES 1 1 1332 0 SNESLineSearch 1 1 864 0 DMSNES 1 1 664 0 Krylov Solver 1 1 1216 0 Preconditioner 1 1 848 0 Linear Space 2 2 1280 0 Dual Space 2 2 1312 0 FE Space 2 2 1496 0 Tao 1 1 1752 0 TaoLineSearch 1 1 880 0 ======================================================================================================================== Average time to get PetscTime(): 5.96046e-07 Average time for MPI_Barrier(): 3.19481e-06 Average time for zero size MPI_Send(): 1.68383e-06 #PETSc Option Table entries: -al 1 -am 0 -at 0.001 -bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 -bcnum 7 -bcval 0,0,0,0,0,0,1 -dim 3 -dm_refine 1 -dt 0.001 -edges 3,3 -floc 0.25,0.75,0.25,0.75,0.25,0.75 -fnum 0 -ftime 0,99 -fval 1 -ksp_max_it 50000 -ksp_rtol 1.0e-10 -ksp_type cg -log_summary -lower 0,0 -mat_petscspace_order 0 -mesh datafiles/cube_with_hole4_mesh.dat -mu 1 -nonneg 1 -numsteps 0 -options_left 0 -pc_type jacobi -petscpartitioner_type parmetis -progress 0 -simplex 1 -solution_petscspace_order 1 -tao_fatol 1e-8 -tao_frtol 1e-8 -tao_max_it 50000 -tao_monitor -tao_type blmvm -tao_view -trans datafiles/cube_with_hole4_trans.dat -upper 1,1 -vtuname figures/cube_with_hole_4 -vtuprint 1 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-f2cblaslapack=/turquoise/users/jychang48/petsc-externalpackages/f2cblaslapack-3.4.2.q1.tar.gz --download-metis=/turquoise/users/jychang48/petsc-externalpackages/metis-5.1.0-p1.tar.gz --download-openmpi=/turquoise/users/jychang48/petsc-externalpackages/openmpi-1.8.5.tar.gz --download-parmetis=/turquoise/users/jychang48/petsc-externalpackages/parmetis-4.0.3-p1.tar.gz 
--download-sowing=/turquoise/users/jychang48/petsc-externalpackages/sowing-1.1.17-p1.tar.gz --with-cc=gcc --with-cxx=g++ --with-debugging=0 --with-fc=gfortran COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" PETSC_ARCH=arch-no-hdf5-opt --download-chaco=/turquoise/users/jychang48/petsc-externalpackages/Chaco-2.2-p2.tar.gz ----------------------------------------- Libraries compiled on Tue Jun 23 12:16:19 2015 on wf-fe2.lanl.gov Machine characteristics: Linux-2.6.32-431.29.2.2chaos.ch5.2.x86_64-x86_64-with-redhat-6.6-Santiago Using PETSc directory: /turquoise/users/jychang48/petsc-master Using PETSc arch: arch-no-hdf5-opt ----------------------------------------- Using C compiler: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/include -I/turquoise/users/jychang48/petsc-master/include -I/turquoise/users/jychang48/petsc-master/include -I/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/include ----------------------------------------- Using C linker: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpicc Using Fortran linker: /turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/bin/mpif90 Using libraries: -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lpetsc -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lf2clapack -lf2cblas -lm -lparmetis -lmetis -lchaco -lX11 -lhwloc -lssl -lcrypto -lm -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -L/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib/gcc -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib64 -Wl,-rpath,/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -L/turquoise/usr/projects/hpcsoft/toss2/common/gcc/4.8.2/lib -ldl -Wl,-rpath,/turquoise/users/jychang48/petsc-master/arch-no-hdf5-opt/lib -lmpi -lgcc_s -lpthread -ldl 
----------------------------------------- From stoneszone at gmail.com Thu Jun 25 15:24:29 2015 From: stoneszone at gmail.com (Lei Shi) Date: Thu, 25 Jun 2015 15:24:29 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM In-Reply-To: References: Message-ID: Hi Matt, Thanks for your suggestions. Here is the output from Stream test on one node which has 20 cores. I run it up to 20. Attached are the dumped output with your suggested options. Really appreciate your help!!! Number of MPI processes 1 Function Rate (MB/s) Copy: 13816.9372 Scale: 8020.1809 Add: 12762.3830 Triad: 11852.5016 Number of MPI processes 2 Function Rate (MB/s) Copy: 22748.7681 Scale: 14081.4906 Add: 18998.4516 Triad: 18303.2494 Number of MPI processes 3 Function Rate (MB/s) Copy: 34045.2510 Scale: 23410.9767 Add: 30320.2702 Triad: 30163.7977 Number of MPI processes 4 Function Rate (MB/s) Copy: 36875.5349 Scale: 29440.1694 Add: 36971.1860 Triad: 37377.0103 Number of MPI processes 5 Function Rate (MB/s) Copy: 32272.8763 Scale: 30316.3435 Add: 38022.0193 Triad: 38815.4830 Number of MPI processes 6 Function Rate (MB/s) Copy: 35619.8925 Scale: 34457.5078 Add: 41419.3722 Triad: 35825.3621 Number of MPI processes 7 Function Rate (MB/s) Copy: 55284.2420 Scale: 47706.8009 Add: 59076.4735 Triad: 61680.5559 Number of MPI processes 8 Function Rate (MB/s) Copy: 44525.8901 Scale: 48949.9599 Add: 57437.7784 Triad: 56671.0593 Number of MPI processes 9 Function Rate (MB/s) Copy: 34375.7364 Scale: 29507.5293 Add: 45405.3120 Triad: 39518.7559 Number of MPI processes 10 Function Rate (MB/s) Copy: 34278.0415 Scale: 41721.7843 Add: 46642.2465 Triad: 45454.7000 Number of MPI processes 11 Function Rate (MB/s) Copy: 38093.7244 Scale: 35147.2412 Add: 45047.0853 Triad: 44983.2013 Number of MPI processes 12 Function Rate (MB/s) Copy: 39750.8760 Scale: 52038.0631 Add: 55552.9503 Triad: 54884.3839 Number of MPI processes 13 Function Rate (MB/s) Copy: 60839.0248 Scale: 74143.7458 Add: 85545.3135 Triad: 85667.6551 Number of MPI processes 14 Function Rate (MB/s) Copy: 37766.2343 Scale: 40279.1928 Add: 49992.8572 Triad: 50303.4809 Number of MPI processes 15 Function Rate (MB/s) Copy: 49762.3670 Scale: 59077.8251 Add: 60407.9651 Triad: 61691.9456 Number of MPI processes 16 Function Rate (MB/s) Copy: 31996.7169 Scale: 36962.4860 Add: 40183.5060 Triad: 41096.0512 Number of MPI processes 17 Function Rate (MB/s) Copy: 36348.3839 Scale: 39108.6761 Add: 46853.4476 Triad: 47266.1778 Number of MPI processes 18 Function Rate (MB/s) Copy: 40438.7558 Scale: 43195.5785 Add: 53063.4321 Triad: 53605.0293 Number of MPI processes 19 Function Rate (MB/s) Copy: 30739.4908 Scale: 34280.8118 Add: 40710.5155 Triad: 43330.9503 Number of MPI processes 20 Function Rate (MB/s) Copy: 37488.3777 Scale: 41791.8999 Add: 49518.9604 Triad: 48908.2677 ------------------------------------------------ np speedup 1 1.0 2 1.54 3 2.54 4 3.15 5 3.27 6 3.02 7 5.2 8 4.78 9 3.33 10 3.84 11 3.8 12 4.63 13 7.23 14 4.24 15 5.2 16 3.47 17 3.99 18 4.52 19 3.66 20 4.13 Sincerely Yours, Lei Shi --------- On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley wrote: > On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: > >> Hello, >> > > 1) In order to understand this, we have to disentagle the various effect. > First, run the STREAMS benchmark > > make NPMAX=4 streams > > This will tell you the maximum speedup you can expect on this machine. 
> > 2) For these test cases, also send the output of > > -ksp_view -ksp_converged_reason -ksp_monitor_true_residual > > Thanks, > > Matt > > >> I'm trying to improve the parallel efficiency of gmres solve in my. In my >> CFD solver, Petsc gmres is used to solve the linear system generated by the >> Newton's method. To test its efficiency, I started with a very simple >> inviscid subsonic 3D flow as the first testcase. The parallel efficiency of >> gmres solve with asm as the preconditioner is very bad. The results are >> from our latest cluster. Right now, I'm only looking at the wclock time of >> the ksp_solve. >> >> 1. First I tested ASM with gmres and ilu 0 for the sub domain , the >> cpu time of 2 cores is almost the same as the serial run. Here is the >> options for this case >> >> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >> -ksp_gmres_restart 30 -ksp_pc_side right >> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 >> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 >> -sub_pc_factor_fill 1.9 >> >> The iteration numbers increase a lot for parallel run. >> >> coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-0411.95 >> 1252.05E-0210.51.010.50462.19E-027.641.390.34 >> >> >> >> >> >> >> 2. Then I tested ASM with ilu 0 as the preconditoner only, the cpu >> time of 2 cores is better than the 1st test, but the speedup is still very >> bad. Here is the options i'm using >> >> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >> -ksp_gmres_restart 30 -ksp_pc_side right >> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 >> -sub_pc_factor_fill 1.9 >> >> coresiterationserrpetsc solve cpu timespeedupefficiency1104.54E-0410.6812 >> 119.55E-048.21.300.654123.59E-045.262.030.50 >> >> >> >> >> >> >> Those results are from a third order "DG" scheme with a very coarse 3D >> mesh (480 elements). I believe I should get some speedups for this test >> even on this coarse mesh. >> >> My question is why does the asm with a local solve take much longer >> time than the asm as a preconditioner only? Also the accuracy is very bad >> too I have tested changing the overlap of asm to 2, but make it even worse. >> >> If I used a larger mesh ~4000 elements, the 2nd case with asm as the >> preconditioner gives me a better speedup, but still not very good. >> >> >> coresiterationserrpetsc solve cpu timespeedupefficiency171.91E-0297.32127 >> 2.07E-0264.941.50.74472.61E-0236.972.60.65 >> >> >> >> Attached are the log_summary dumped from petsc, any suggestions are >> welcome. I really appreciate it. >> >> >> Sincerely Yours, >> >> Lei Shi >> --------- >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc2_asm_sub_ksp.dat Type: application/octet-stream Size: 12839 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc2_asm_pconly.dat Type: application/octet-stream Size: 13347 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: proc1_asm_sub_ksp.dat Type: application/octet-stream Size: 12323 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: proc1_asm_pconly.dat Type: application/octet-stream Size: 13066 bytes Desc: not available URL: From jychang48 at gmail.com Thu Jun 25 15:34:50 2015 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 25 Jun 2015 15:34:50 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM In-Reply-To: References: Message-ID: Hi Lei, Depending on your machine and MPI library, you may have to use smart process to core/socket bindings to achieve better speedup. Instructions can be found here: http://www.mcs.anl.gov/petsc/documentation/faq.html#computers Justin On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote: > Hi Matt, > > Thanks for your suggestions. Here is the output from Stream test on one > node which has 20 cores. I run it up to 20. Attached are the dumped output > with your suggested options. Really appreciate your help!!! > > Number of MPI processes 1 > Function Rate (MB/s) > Copy: 13816.9372 > Scale: 8020.1809 > Add: 12762.3830 > Triad: 11852.5016 > > Number of MPI processes 2 > Function Rate (MB/s) > Copy: 22748.7681 > Scale: 14081.4906 > Add: 18998.4516 > Triad: 18303.2494 > > Number of MPI processes 3 > Function Rate (MB/s) > Copy: 34045.2510 > Scale: 23410.9767 > Add: 30320.2702 > Triad: 30163.7977 > > Number of MPI processes 4 > Function Rate (MB/s) > Copy: 36875.5349 > Scale: 29440.1694 > Add: 36971.1860 > Triad: 37377.0103 > > Number of MPI processes 5 > Function Rate (MB/s) > Copy: 32272.8763 > Scale: 30316.3435 > Add: 38022.0193 > Triad: 38815.4830 > > Number of MPI processes 6 > Function Rate (MB/s) > Copy: 35619.8925 > Scale: 34457.5078 > Add: 41419.3722 > Triad: 35825.3621 > > Number of MPI processes 7 > Function Rate (MB/s) > Copy: 55284.2420 > Scale: 47706.8009 > Add: 59076.4735 > Triad: 61680.5559 > > Number of MPI processes 8 > Function Rate (MB/s) > Copy: 44525.8901 > Scale: 48949.9599 > Add: 57437.7784 > Triad: 56671.0593 > > Number of MPI processes 9 > Function Rate (MB/s) > Copy: 34375.7364 > Scale: 29507.5293 > Add: 45405.3120 > Triad: 39518.7559 > > Number of MPI processes 10 > Function Rate (MB/s) > Copy: 34278.0415 > Scale: 41721.7843 > Add: 46642.2465 > Triad: 45454.7000 > > Number of MPI processes 11 > Function Rate (MB/s) > Copy: 38093.7244 > Scale: 35147.2412 > Add: 45047.0853 > Triad: 44983.2013 > > Number of MPI processes 12 > Function Rate (MB/s) > Copy: 39750.8760 > Scale: 52038.0631 > Add: 55552.9503 > Triad: 54884.3839 > > Number of MPI processes 13 > Function Rate (MB/s) > Copy: 60839.0248 > Scale: 74143.7458 > Add: 85545.3135 > Triad: 85667.6551 > > Number of MPI processes 14 > Function Rate (MB/s) > Copy: 37766.2343 > Scale: 40279.1928 > Add: 49992.8572 > Triad: 50303.4809 > > Number of MPI processes 15 > Function Rate (MB/s) > Copy: 49762.3670 > Scale: 59077.8251 > Add: 60407.9651 > Triad: 61691.9456 > > Number of MPI processes 16 > Function Rate (MB/s) > Copy: 31996.7169 > Scale: 36962.4860 > Add: 40183.5060 > Triad: 41096.0512 > > Number of MPI processes 17 > Function Rate (MB/s) > Copy: 36348.3839 > Scale: 39108.6761 > Add: 46853.4476 > Triad: 47266.1778 > > Number of MPI processes 18 > Function Rate (MB/s) > Copy: 40438.7558 > Scale: 43195.5785 > Add: 53063.4321 > Triad: 53605.0293 > > Number of MPI processes 19 > Function Rate (MB/s) > Copy: 30739.4908 > Scale: 34280.8118 > Add: 40710.5155 > Triad: 43330.9503 > > Number of MPI processes 20 > 
Function Rate (MB/s) > Copy: 37488.3777 > Scale: 41791.8999 > Add: 49518.9604 > Triad: 48908.2677 > ------------------------------------------------ > np speedup > 1 1.0 > 2 1.54 > 3 2.54 > 4 3.15 > 5 3.27 > 6 3.02 > 7 5.2 > 8 4.78 > 9 3.33 > 10 3.84 > 11 3.8 > 12 4.63 > 13 7.23 > 14 4.24 > 15 5.2 > 16 3.47 > 17 3.99 > 18 4.52 > 19 3.66 > 20 4.13 > > > > > Sincerely Yours, > > Lei Shi > --------- > > On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley > wrote: > >> On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: >> >>> Hello, >>> >> >> 1) In order to understand this, we have to disentagle the various effect. >> First, run the STREAMS benchmark >> >> make NPMAX=4 streams >> >> This will tell you the maximum speedup you can expect on this machine. >> >> 2) For these test cases, also send the output of >> >> -ksp_view -ksp_converged_reason -ksp_monitor_true_residual >> >> Thanks, >> >> Matt >> >> >>> I'm trying to improve the parallel efficiency of gmres solve in my. In >>> my CFD solver, Petsc gmres is used to solve the linear system generated by >>> the Newton's method. To test its efficiency, I started with a very simple >>> inviscid subsonic 3D flow as the first testcase. The parallel efficiency of >>> gmres solve with asm as the preconditioner is very bad. The results are >>> from our latest cluster. Right now, I'm only looking at the wclock time of >>> the ksp_solve. >>> >>> 1. First I tested ASM with gmres and ilu 0 for the sub domain , the >>> cpu time of 2 cores is almost the same as the serial run. Here is the >>> options for this case >>> >>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>> -ksp_gmres_restart 30 -ksp_pc_side right >>> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 >>> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 >>> -sub_pc_factor_fill 1.9 >>> >>> The iteration numbers increase a lot for parallel run. >>> >>> coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-04 >>> 11.951252.05E-0210.51.010.50462.19E-027.641.390.34 >>> >>> >>> >>> >>> >>> >>> 2. Then I tested ASM with ilu 0 as the preconditoner only, the >>> cpu time of 2 cores is better than the 1st test, but the speedup is still >>> very bad. Here is the options i'm using >>> >>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>> -ksp_gmres_restart 30 -ksp_pc_side right >>> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 >>> -sub_pc_factor_fill 1.9 >>> >>> coresiterationserrpetsc solve cpu timespeedupefficiency1104.54E-0410.681 >>> 2119.55E-048.21.300.654123.59E-045.262.030.50 >>> >>> >>> >>> >>> >>> >>> Those results are from a third order "DG" scheme with a very coarse >>> 3D mesh (480 elements). I believe I should get some speedups for this test >>> even on this coarse mesh. >>> >>> My question is why does the asm with a local solve take much longer >>> time than the asm as a preconditioner only? Also the accuracy is very bad >>> too I have tested changing the overlap of asm to 2, but make it even worse. >>> >>> If I used a larger mesh ~4000 elements, the 2nd case with asm as the >>> preconditioner gives me a better speedup, but still not very good. >>> >>> >>> coresiterationserrpetsc solve cpu timespeedupefficiency171.91E-0297.3212 >>> 72.07E-0264.941.50.74472.61E-0236.972.60.65 >>> >>> >>> >>> Attached are the log_summary dumped from petsc, any suggestions are >>> welcome. I really appreciate it. 
>>>
>>> Sincerely Yours,
>>>
>>> Lei Shi
>>> ---------
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From stoneszone at gmail.com Thu Jun 25 15:48:06 2015
From: stoneszone at gmail.com (Lei Shi)
Date: Thu, 25 Jun 2015 15:48:06 -0500
Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM
In-Reply-To: 
References: 
Message-ID: 

Hi Justin,

Thanks for your suggestion. I will test it asap.

Another thing confusing me is that the wall-clock time with 2 cores is almost the same as the serial run when I use ASM with -sub_ksp_type gmres and ILU(0) on the subdomains. The serial run takes 11.95 sec and the parallel run takes 10.5 sec, so there is almost no speedup at all.

I have also seen other people report similarly poor speedups when comparing 2 cores with 1 core. Attached is one slide from J.A. Davis's presentation, which I found on the web. As you can see, ASM with 2 cores takes almost the same CPU time as 1 core there too. Maybe I am misunderstanding something fundamental about ASM.

cores  iterations  err       petsc solve wclock time  speedup  efficiency
1      2           1.15E-04  11.95                    1
2      5           2.05E-02  10.5                     1.01     0.50
4      6           2.19E-02  7.64                     1.39     0.34

Sincerely Yours,

Lei Shi
---------

On Thu, Jun 25, 2015 at 3:34 PM, Justin Chang wrote:

> Hi Lei,
>
> Depending on your machine and MPI library, you may have to use smart
> process to core/socket bindings to achieve better speedup. Instructions can
> be found here:
>
> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
>
>
> Justin
>
> On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote:
>
>> Hi Matt,
>>
>> Thanks for your suggestions. Here is the output from Stream test on one
>> node which has 20 cores. I run it up to 20. Attached are the dumped output
>> with your suggested options. Really appreciate your help!!!
>> >> Number of MPI processes 1 >> Function Rate (MB/s) >> Copy: 13816.9372 >> Scale: 8020.1809 >> Add: 12762.3830 >> Triad: 11852.5016 >> >> Number of MPI processes 2 >> Function Rate (MB/s) >> Copy: 22748.7681 >> Scale: 14081.4906 >> Add: 18998.4516 >> Triad: 18303.2494 >> >> Number of MPI processes 3 >> Function Rate (MB/s) >> Copy: 34045.2510 >> Scale: 23410.9767 >> Add: 30320.2702 >> Triad: 30163.7977 >> >> Number of MPI processes 4 >> Function Rate (MB/s) >> Copy: 36875.5349 >> Scale: 29440.1694 >> Add: 36971.1860 >> Triad: 37377.0103 >> >> Number of MPI processes 5 >> Function Rate (MB/s) >> Copy: 32272.8763 >> Scale: 30316.3435 >> Add: 38022.0193 >> Triad: 38815.4830 >> >> Number of MPI processes 6 >> Function Rate (MB/s) >> Copy: 35619.8925 >> Scale: 34457.5078 >> Add: 41419.3722 >> Triad: 35825.3621 >> >> Number of MPI processes 7 >> Function Rate (MB/s) >> Copy: 55284.2420 >> Scale: 47706.8009 >> Add: 59076.4735 >> Triad: 61680.5559 >> >> Number of MPI processes 8 >> Function Rate (MB/s) >> Copy: 44525.8901 >> Scale: 48949.9599 >> Add: 57437.7784 >> Triad: 56671.0593 >> >> Number of MPI processes 9 >> Function Rate (MB/s) >> Copy: 34375.7364 >> Scale: 29507.5293 >> Add: 45405.3120 >> Triad: 39518.7559 >> >> Number of MPI processes 10 >> Function Rate (MB/s) >> Copy: 34278.0415 >> Scale: 41721.7843 >> Add: 46642.2465 >> Triad: 45454.7000 >> >> Number of MPI processes 11 >> Function Rate (MB/s) >> Copy: 38093.7244 >> Scale: 35147.2412 >> Add: 45047.0853 >> Triad: 44983.2013 >> >> Number of MPI processes 12 >> Function Rate (MB/s) >> Copy: 39750.8760 >> Scale: 52038.0631 >> Add: 55552.9503 >> Triad: 54884.3839 >> >> Number of MPI processes 13 >> Function Rate (MB/s) >> Copy: 60839.0248 >> Scale: 74143.7458 >> Add: 85545.3135 >> Triad: 85667.6551 >> >> Number of MPI processes 14 >> Function Rate (MB/s) >> Copy: 37766.2343 >> Scale: 40279.1928 >> Add: 49992.8572 >> Triad: 50303.4809 >> >> Number of MPI processes 15 >> Function Rate (MB/s) >> Copy: 49762.3670 >> Scale: 59077.8251 >> Add: 60407.9651 >> Triad: 61691.9456 >> >> Number of MPI processes 16 >> Function Rate (MB/s) >> Copy: 31996.7169 >> Scale: 36962.4860 >> Add: 40183.5060 >> Triad: 41096.0512 >> >> Number of MPI processes 17 >> Function Rate (MB/s) >> Copy: 36348.3839 >> Scale: 39108.6761 >> Add: 46853.4476 >> Triad: 47266.1778 >> >> Number of MPI processes 18 >> Function Rate (MB/s) >> Copy: 40438.7558 >> Scale: 43195.5785 >> Add: 53063.4321 >> Triad: 53605.0293 >> >> Number of MPI processes 19 >> Function Rate (MB/s) >> Copy: 30739.4908 >> Scale: 34280.8118 >> Add: 40710.5155 >> Triad: 43330.9503 >> >> Number of MPI processes 20 >> Function Rate (MB/s) >> Copy: 37488.3777 >> Scale: 41791.8999 >> Add: 49518.9604 >> Triad: 48908.2677 >> ------------------------------------------------ >> np speedup >> 1 1.0 >> 2 1.54 >> 3 2.54 >> 4 3.15 >> 5 3.27 >> 6 3.02 >> 7 5.2 >> 8 4.78 >> 9 3.33 >> 10 3.84 >> 11 3.8 >> 12 4.63 >> 13 7.23 >> 14 4.24 >> 15 5.2 >> 16 3.47 >> 17 3.99 >> 18 4.52 >> 19 3.66 >> 20 4.13 >> >> >> >> >> Sincerely Yours, >> >> Lei Shi >> --------- >> >> On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley >> wrote: >> >>> On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: >>> >>>> Hello, >>>> >>> >>> 1) In order to understand this, we have to disentagle the various >>> effect. First, run the STREAMS benchmark >>> >>> make NPMAX=4 streams >>> >>> This will tell you the maximum speedup you can expect on this machine. 
>>> >>> 2) For these test cases, also send the output of >>> >>> -ksp_view -ksp_converged_reason -ksp_monitor_true_residual >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I'm trying to improve the parallel efficiency of gmres solve in my. In >>>> my CFD solver, Petsc gmres is used to solve the linear system generated by >>>> the Newton's method. To test its efficiency, I started with a very simple >>>> inviscid subsonic 3D flow as the first testcase. The parallel efficiency of >>>> gmres solve with asm as the preconditioner is very bad. The results are >>>> from our latest cluster. Right now, I'm only looking at the wclock time of >>>> the ksp_solve. >>>> >>>> 1. First I tested ASM with gmres and ilu 0 for the sub domain , the >>>> cpu time of 2 cores is almost the same as the serial run. Here is the >>>> options for this case >>>> >>>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>>> -ksp_gmres_restart 30 -ksp_pc_side right >>>> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 >>>> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 >>>> -sub_pc_factor_fill 1.9 >>>> >>>> The iteration numbers increase a lot for parallel run. >>>> >>>> coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-04 >>>> 11.951252.05E-0210.51.010.50462.19E-027.641.390.34 >>>> >>>> >>>> >>>> >>>> >>>> >>>> 2. Then I tested ASM with ilu 0 as the preconditoner only, the >>>> cpu time of 2 cores is better than the 1st test, but the speedup is still >>>> very bad. Here is the options i'm using >>>> >>>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>>> -ksp_gmres_restart 30 -ksp_pc_side right >>>> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 >>>> -sub_pc_factor_fill 1.9 >>>> >>>> coresiterationserrpetsc solve cpu timespeedupefficiency1104.54E-0410.68 >>>> 12119.55E-048.21.300.654123.59E-045.262.030.50 >>>> >>>> >>>> >>>> >>>> >>>> >>>> Those results are from a third order "DG" scheme with a very coarse >>>> 3D mesh (480 elements). I believe I should get some speedups for this test >>>> even on this coarse mesh. >>>> >>>> My question is why does the asm with a local solve take much longer >>>> time than the asm as a preconditioner only? Also the accuracy is very bad >>>> too I have tested changing the overlap of asm to 2, but make it even worse. >>>> >>>> If I used a larger mesh ~4000 elements, the 2nd case with asm as the >>>> preconditioner gives me a better speedup, but still not very good. >>>> >>>> >>>> coresiterationserrpetsc solve cpu timespeedupefficiency171.91E-0297.321 >>>> 272.07E-0264.941.50.74472.61E-0236.972.60.65 >>>> >>>> >>>> >>>> Attached are the log_summary dumped from petsc, any suggestions are >>>> welcome. I really appreciate it. >>>> >>>> >>>> Sincerely Yours, >>>> >>>> Lei Shi >>>> --------- >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot - 06252015 - 03:44:53 PM.png Type: image/png Size: 159230 bytes Desc: not available URL: From lawrence.mitchell at imperial.ac.uk Thu Jun 25 16:16:57 2015 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 Jun 2015 14:16:57 -0700 Subject: [petsc-users] Set PC PythonContext within NPC In-Reply-To: References: Message-ID: <6B055E13-DBAE-4FFE-8F89-A18DDC41D016@imperial.ac.uk> > On 24 Jun 2015, at 17:02, Asbj?rn Nilsen Riseth wrote: > > Hi all, > > I'm currently trying to set up a nonlinear solver that uses NGMRES with NEWTONLS as a right preconditioner. The NEWTONLS has a custom linear preconditioner. Everything is accessed through petsc4py. > > *Is there a way I can configure a NPC NEWTONLS KSP CompositePC without first calling solve on my outer snes?* > > My NEWTONLS has the following KSP setup: > FGMRES > | PCCOMPOSITE > || PYTHON > || ILU > > The way I understand things, the NPC is not created/set up until SNESSolve_NGMRES() is called. Therefore, I cannot call snes.getNPC().ksp.pc.getCompositePC(0).setPythonContext(ctx) before I have called snes.solve(). > > What I currently do is a try/except on snes.solve to create/set up the pccomposite. Then I can set my pythoncontext and it runs fine. This is quite ugly though, so I was hoping anyone would have a better approach. I think you can do the following: Define a class in some module: # (in module foo) class MyPC(object): def setUp(self, pc): # do any setup here. pass def apply(self, pc, x, y): # apply preconditioner pass And pass: -npc_sub_1_pc_type python -npc_sub_1_pc_python_context foo.MyPC You can pull the app context out in the class's setUp method by doing: ctx = pc.getDM().getAppCtx() Assuming that you're running with a branch that contains origin/knepley/fix-pc-composite-dm. Cheers, Lawrence -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 455 bytes Desc: Message signed with OpenPGP using GPGMail URL: From stoneszone at gmail.com Thu Jun 25 16:27:17 2015 From: stoneszone at gmail.com (Lei Shi) Date: Thu, 25 Jun 2015 16:27:17 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM In-Reply-To: References: Message-ID: Hi Justin, I tested with mpirun --binding cpu:sockets .... Unfortunately, the results are almost the same as before. Thanks. -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 -ksp_gmres_restart 30 -ksp_pc_side right -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9 coresiterationserrpetsc solve cpu time 1104.54E-0410.65 2119.55E-048.19 4123.59E-045.32 Sincerely Yours, Lei Shi --------- On Thu, Jun 25, 2015 at 3:48 PM, Lei Shi wrote: > Hi Justin, > > Thanks for your suggestion. I will test it asap. > > Another thing confusing me is the wclock time with 2 cores is almost the > same as the serial run when I use asm with sub_ksp_type gmres and ilu0 on > subdomains. Serial run takes 11.95 sec and parallel run takes 10.5 sec. > There is almost no speedup at all. > > And I found some other people got similar bad speedups when comparing 2 > cores with 1 core. Attached is one slide from J.A. Davis's presentation. I > just found it from the web. As you can see, asm with 2 cores takes almost > the same cpu times compare 1 core too! May be I miss understanding some > fundamental things related to asm. > > coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-0411.951 > 252.05E-0210.51.010.50462.19E-027.641.390.34 > > > > > > > ? 
> > Sincerely Yours, > > Lei Shi > --------- > > On Thu, Jun 25, 2015 at 3:34 PM, Justin Chang wrote: > >> Hi Lei, >> >> Depending on your machine and MPI library, you may have to use smart >> process to core/socket bindings to achieve better speedup. Instructions can >> be found here: >> >> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers >> >> >> Justin >> >> On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote: >> >>> Hi Matt, >>> >>> Thanks for your suggestions. Here is the output from Stream test on one >>> node which has 20 cores. I run it up to 20. Attached are the dumped output >>> with your suggested options. Really appreciate your help!!! >>> >>> Number of MPI processes 1 >>> Function Rate (MB/s) >>> Copy: 13816.9372 >>> Scale: 8020.1809 >>> Add: 12762.3830 >>> Triad: 11852.5016 >>> >>> Number of MPI processes 2 >>> Function Rate (MB/s) >>> Copy: 22748.7681 >>> Scale: 14081.4906 >>> Add: 18998.4516 >>> Triad: 18303.2494 >>> >>> Number of MPI processes 3 >>> Function Rate (MB/s) >>> Copy: 34045.2510 >>> Scale: 23410.9767 >>> Add: 30320.2702 >>> Triad: 30163.7977 >>> >>> Number of MPI processes 4 >>> Function Rate (MB/s) >>> Copy: 36875.5349 >>> Scale: 29440.1694 >>> Add: 36971.1860 >>> Triad: 37377.0103 >>> >>> Number of MPI processes 5 >>> Function Rate (MB/s) >>> Copy: 32272.8763 >>> Scale: 30316.3435 >>> Add: 38022.0193 >>> Triad: 38815.4830 >>> >>> Number of MPI processes 6 >>> Function Rate (MB/s) >>> Copy: 35619.8925 >>> Scale: 34457.5078 >>> Add: 41419.3722 >>> Triad: 35825.3621 >>> >>> Number of MPI processes 7 >>> Function Rate (MB/s) >>> Copy: 55284.2420 >>> Scale: 47706.8009 >>> Add: 59076.4735 >>> Triad: 61680.5559 >>> >>> Number of MPI processes 8 >>> Function Rate (MB/s) >>> Copy: 44525.8901 >>> Scale: 48949.9599 >>> Add: 57437.7784 >>> Triad: 56671.0593 >>> >>> Number of MPI processes 9 >>> Function Rate (MB/s) >>> Copy: 34375.7364 >>> Scale: 29507.5293 >>> Add: 45405.3120 >>> Triad: 39518.7559 >>> >>> Number of MPI processes 10 >>> Function Rate (MB/s) >>> Copy: 34278.0415 >>> Scale: 41721.7843 >>> Add: 46642.2465 >>> Triad: 45454.7000 >>> >>> Number of MPI processes 11 >>> Function Rate (MB/s) >>> Copy: 38093.7244 >>> Scale: 35147.2412 >>> Add: 45047.0853 >>> Triad: 44983.2013 >>> >>> Number of MPI processes 12 >>> Function Rate (MB/s) >>> Copy: 39750.8760 >>> Scale: 52038.0631 >>> Add: 55552.9503 >>> Triad: 54884.3839 >>> >>> Number of MPI processes 13 >>> Function Rate (MB/s) >>> Copy: 60839.0248 >>> Scale: 74143.7458 >>> Add: 85545.3135 >>> Triad: 85667.6551 >>> >>> Number of MPI processes 14 >>> Function Rate (MB/s) >>> Copy: 37766.2343 >>> Scale: 40279.1928 >>> Add: 49992.8572 >>> Triad: 50303.4809 >>> >>> Number of MPI processes 15 >>> Function Rate (MB/s) >>> Copy: 49762.3670 >>> Scale: 59077.8251 >>> Add: 60407.9651 >>> Triad: 61691.9456 >>> >>> Number of MPI processes 16 >>> Function Rate (MB/s) >>> Copy: 31996.7169 >>> Scale: 36962.4860 >>> Add: 40183.5060 >>> Triad: 41096.0512 >>> >>> Number of MPI processes 17 >>> Function Rate (MB/s) >>> Copy: 36348.3839 >>> Scale: 39108.6761 >>> Add: 46853.4476 >>> Triad: 47266.1778 >>> >>> Number of MPI processes 18 >>> Function Rate (MB/s) >>> Copy: 40438.7558 >>> Scale: 43195.5785 >>> Add: 53063.4321 >>> Triad: 53605.0293 >>> >>> Number of MPI processes 19 >>> Function Rate (MB/s) >>> Copy: 30739.4908 >>> Scale: 34280.8118 >>> Add: 40710.5155 >>> Triad: 43330.9503 >>> >>> Number of MPI processes 20 >>> Function Rate (MB/s) >>> Copy: 37488.3777 >>> Scale: 41791.8999 >>> Add: 49518.9604 >>> Triad: 48908.2677 
>>> ------------------------------------------------ >>> np speedup >>> 1 1.0 >>> 2 1.54 >>> 3 2.54 >>> 4 3.15 >>> 5 3.27 >>> 6 3.02 >>> 7 5.2 >>> 8 4.78 >>> 9 3.33 >>> 10 3.84 >>> 11 3.8 >>> 12 4.63 >>> 13 7.23 >>> 14 4.24 >>> 15 5.2 >>> 16 3.47 >>> 17 3.99 >>> 18 4.52 >>> 19 3.66 >>> 20 4.13 >>> >>> >>> >>> >>> Sincerely Yours, >>> >>> Lei Shi >>> --------- >>> >>> On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley >>> wrote: >>> >>>> On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: >>>> >>>>> Hello, >>>>> >>>> >>>> 1) In order to understand this, we have to disentagle the various >>>> effect. First, run the STREAMS benchmark >>>> >>>> make NPMAX=4 streams >>>> >>>> This will tell you the maximum speedup you can expect on this machine. >>>> >>>> 2) For these test cases, also send the output of >>>> >>>> -ksp_view -ksp_converged_reason -ksp_monitor_true_residual >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> I'm trying to improve the parallel efficiency of gmres solve in my. In >>>>> my CFD solver, Petsc gmres is used to solve the linear system generated by >>>>> the Newton's method. To test its efficiency, I started with a very simple >>>>> inviscid subsonic 3D flow as the first testcase. The parallel efficiency of >>>>> gmres solve with asm as the preconditioner is very bad. The results are >>>>> from our latest cluster. Right now, I'm only looking at the wclock time of >>>>> the ksp_solve. >>>>> >>>>> 1. First I tested ASM with gmres and ilu 0 for the sub domain , >>>>> the cpu time of 2 cores is almost the same as the serial run. Here is the >>>>> options for this case >>>>> >>>>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>>>> -ksp_gmres_restart 30 -ksp_pc_side right >>>>> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol >>>>> 1e-30 >>>>> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 >>>>> -sub_pc_factor_fill 1.9 >>>>> >>>>> The iteration numbers increase a lot for parallel run. >>>>> >>>>> coresiterationserrpetsc solve wclock timespeedupefficiency121.15E-04 >>>>> 11.951252.05E-0210.51.010.50462.19E-027.641.390.34 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2. Then I tested ASM with ilu 0 as the preconditoner only, the >>>>> cpu time of 2 cores is better than the 1st test, but the speedup is still >>>>> very bad. Here is the options i'm using >>>>> >>>>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 >>>>> -ksp_gmres_restart 30 -ksp_pc_side right >>>>> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 >>>>> -sub_pc_factor_fill 1.9 >>>>> >>>>> coresiterationserrpetsc solve cpu timespeedupefficiency1104.54E-04 >>>>> 10.6812119.55E-048.21.300.654123.59E-045.262.030.50 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Those results are from a third order "DG" scheme with a very coarse >>>>> 3D mesh (480 elements). I believe I should get some speedups for this test >>>>> even on this coarse mesh. >>>>> >>>>> My question is why does the asm with a local solve take much longer >>>>> time than the asm as a preconditioner only? Also the accuracy is very bad >>>>> too I have tested changing the overlap of asm to 2, but make it even worse. >>>>> >>>>> If I used a larger mesh ~4000 elements, the 2nd case with asm as the >>>>> preconditioner gives me a better speedup, but still not very good. 
>>>>> >>>>> >>>>> coresiterationserrpetsc solve cpu timespeedupefficiency171.91E-0297.32 >>>>> 1272.07E-0264.941.50.74472.61E-0236.972.60.65 >>>>> >>>>> >>>>> >>>>> Attached are the log_summary dumped from petsc, any suggestions are >>>>> welcome. I really appreciate it. >>>>> >>>>> >>>>> Sincerely Yours, >>>>> >>>>> Lei Shi >>>>> --------- >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot - 06252015 - 03:44:53 PM.png Type: image/png Size: 159230 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Jun 25 17:33:33 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 25 Jun 2015 17:33:33 -0500 Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM In-Reply-To: References: Message-ID: <228F0DC3-F1AB-41F5-8502-39314F2E2B52@mcs.anl.gov> > On Jun 25, 2015, at 3:48 PM, Lei Shi wrote: > > Hi Justin, > > Thanks for your suggestion. I will test it asap. > > Another thing confusing me is the wclock time with 2 cores is almost the same as the serial run when I use asm with sub_ksp_type gmres and ilu0 on subdomains. Serial run takes 11.95 sec and parallel run takes 10.5 sec. There is almost no speedup at all. On one process ASM is ilu(0), so the setup time is one ILU(0) factorization of the entire matrix. On two processes the ILU(0) is run on a matrix that is more than 1/2 the size of the matrix; due to the overlap of 1. In particular for small problems the overlap will pull in most of the matrix so the setup time is not 1/2 of the setup time of one process. Then the number of iterations increases a good amount in going from 1 to 2 processes. In combination this means that ASM going from one to two process requires one each process much more than 1/2 the work of running on 1 process so you should not expect great speedup in going from one to two processes. > > And I found some other people got similar bad speedups when comparing 2 cores with 1 core. Attached is one slide from J.A. Davis's presentation. I just found it from the web. As you can see, asm with 2 cores takes almost the same cpu times compare 1 core too! May be I miss understanding some fundamental things related to asm. > > cores iterations err petsc solve wclock time speedup efficiency > 1 2 1.15E-04 11.95 1 > 2 5 2.05E-02 10.5 1.01 0.50 > 4 6 2.19E-02 7.64 1.39 0.34 > > > > > > > > > ? > > Sincerely Yours, > > Lei Shi > --------- > > On Thu, Jun 25, 2015 at 3:34 PM, Justin Chang wrote: > Hi Lei, > > Depending on your machine and MPI library, you may have to use smart process to core/socket bindings to achieve better speedup. Instructions can be found here: > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > > Justin > > On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote: > Hi Matt, > > Thanks for your suggestions. Here is the output from Stream test on one node which has 20 cores. I run it up to 20. Attached are the dumped output with your suggested options. Really appreciate your help!!! 
> > Number of MPI processes 1 > Function Rate (MB/s) > Copy: 13816.9372 > Scale: 8020.1809 > Add: 12762.3830 > Triad: 11852.5016 > > Number of MPI processes 2 > Function Rate (MB/s) > Copy: 22748.7681 > Scale: 14081.4906 > Add: 18998.4516 > Triad: 18303.2494 > > Number of MPI processes 3 > Function Rate (MB/s) > Copy: 34045.2510 > Scale: 23410.9767 > Add: 30320.2702 > Triad: 30163.7977 > > Number of MPI processes 4 > Function Rate (MB/s) > Copy: 36875.5349 > Scale: 29440.1694 > Add: 36971.1860 > Triad: 37377.0103 > > Number of MPI processes 5 > Function Rate (MB/s) > Copy: 32272.8763 > Scale: 30316.3435 > Add: 38022.0193 > Triad: 38815.4830 > > Number of MPI processes 6 > Function Rate (MB/s) > Copy: 35619.8925 > Scale: 34457.5078 > Add: 41419.3722 > Triad: 35825.3621 > > Number of MPI processes 7 > Function Rate (MB/s) > Copy: 55284.2420 > Scale: 47706.8009 > Add: 59076.4735 > Triad: 61680.5559 > > Number of MPI processes 8 > Function Rate (MB/s) > Copy: 44525.8901 > Scale: 48949.9599 > Add: 57437.7784 > Triad: 56671.0593 > > Number of MPI processes 9 > Function Rate (MB/s) > Copy: 34375.7364 > Scale: 29507.5293 > Add: 45405.3120 > Triad: 39518.7559 > > Number of MPI processes 10 > Function Rate (MB/s) > Copy: 34278.0415 > Scale: 41721.7843 > Add: 46642.2465 > Triad: 45454.7000 > > Number of MPI processes 11 > Function Rate (MB/s) > Copy: 38093.7244 > Scale: 35147.2412 > Add: 45047.0853 > Triad: 44983.2013 > > Number of MPI processes 12 > Function Rate (MB/s) > Copy: 39750.8760 > Scale: 52038.0631 > Add: 55552.9503 > Triad: 54884.3839 > > Number of MPI processes 13 > Function Rate (MB/s) > Copy: 60839.0248 > Scale: 74143.7458 > Add: 85545.3135 > Triad: 85667.6551 > > Number of MPI processes 14 > Function Rate (MB/s) > Copy: 37766.2343 > Scale: 40279.1928 > Add: 49992.8572 > Triad: 50303.4809 > > Number of MPI processes 15 > Function Rate (MB/s) > Copy: 49762.3670 > Scale: 59077.8251 > Add: 60407.9651 > Triad: 61691.9456 > > Number of MPI processes 16 > Function Rate (MB/s) > Copy: 31996.7169 > Scale: 36962.4860 > Add: 40183.5060 > Triad: 41096.0512 > > Number of MPI processes 17 > Function Rate (MB/s) > Copy: 36348.3839 > Scale: 39108.6761 > Add: 46853.4476 > Triad: 47266.1778 > > Number of MPI processes 18 > Function Rate (MB/s) > Copy: 40438.7558 > Scale: 43195.5785 > Add: 53063.4321 > Triad: 53605.0293 > > Number of MPI processes 19 > Function Rate (MB/s) > Copy: 30739.4908 > Scale: 34280.8118 > Add: 40710.5155 > Triad: 43330.9503 > > Number of MPI processes 20 > Function Rate (MB/s) > Copy: 37488.3777 > Scale: 41791.8999 > Add: 49518.9604 > Triad: 48908.2677 > ------------------------------------------------ > np speedup > 1 1.0 > 2 1.54 > 3 2.54 > 4 3.15 > 5 3.27 > 6 3.02 > 7 5.2 > 8 4.78 > 9 3.33 > 10 3.84 > 11 3.8 > 12 4.63 > 13 7.23 > 14 4.24 > 15 5.2 > 16 3.47 > 17 3.99 > 18 4.52 > 19 3.66 > 20 4.13 > > > > > > Sincerely Yours, > > Lei Shi > --------- > > On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley wrote: > On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: > Hello, > > 1) In order to understand this, we have to disentagle the various effect. First, run the STREAMS benchmark > > make NPMAX=4 streams > > This will tell you the maximum speedup you can expect on this machine. > > 2) For these test cases, also send the output of > > -ksp_view -ksp_converged_reason -ksp_monitor_true_residual > > Thanks, > > Matt > > I'm trying to improve the parallel efficiency of gmres solve in my. In my CFD solver, Petsc gmres is used to solve the linear system generated by the Newton's method. 
To test its efficiency, I started with a very simple inviscid subsonic 3D flow as the first testcase. The parallel efficiency of gmres solve with asm as the preconditioner is very bad. The results are from our latest cluster. Right now, I'm only looking at the wclock time of the ksp_solve.
>
> 1. First I tested ASM with gmres and ilu 0 for the sub domain, the cpu time of 2 cores is almost the same as the serial run. Here are the options for this case
> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
> -ksp_gmres_restart 30 -ksp_pc_side right
> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30
> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0
> -sub_pc_factor_fill 1.9
> The iteration numbers increase a lot for the parallel run.
> cores  iterations  err       petsc solve wclock time  speedup  efficiency
> 1      2           1.15E-04  11.95                    1
> 2      5           2.05E-02  10.5                     1.01     0.50
> 4      6           2.19E-02  7.64                     1.39     0.34
>
> 2. Then I tested ASM with ilu 0 as the preconditioner only, the cpu time of 2 cores is better than the 1st test, but the speedup is still very bad. Here are the options I'm using
> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
> -ksp_gmres_restart 30 -ksp_pc_side right
> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9
> cores  iterations  err       petsc solve cpu time  speedup  efficiency
> 1      10          4.54E-04  10.68                 1
> 2      11          9.55E-04  8.2                   1.30     0.65
> 4      12          3.59E-04  5.26                  2.03     0.50
>
> Those results are from a third order "DG" scheme with a very coarse 3D mesh (480 elements). I believe I should get some speedups for this test even on this coarse mesh.
>
> My question is why does the asm with a local solve take much longer time than the asm as a preconditioner only? Also the accuracy is very bad. I have tested changing the overlap of asm to 2, but it makes it even worse.
>
> If I used a larger mesh ~4000 elements, the 2nd case with asm as the preconditioner gives me a better speedup, but still not very good.
>
> cores  iterations  err       petsc solve cpu time  speedup  efficiency
> 1      7           1.91E-02  97.32                 1
> 2      7           2.07E-02  64.94                 1.5      0.74
> 4      7           2.61E-02  36.97                 2.6      0.65
>
> Attached are the log_summary dumped from petsc, any suggestions are welcome. I really appreciate it.
>
> Sincerely Yours,
>
> Lei Shi
> ---------
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
From Xinya.Li at pnnl.gov Thu Jun 25 18:37:29 2015
From: Xinya.Li at pnnl.gov (Li, Xinya)
Date: Thu, 25 Jun 2015 23:37:29 +0000
Subject: [petsc-users] TSSolve problems
Message-ID: 

Dear Sir,

I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 x 1152 sparse complex matrix.
Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve.
Which parts should be taken into account to accelerate TSSolve?
Thank you very much.
Regards
__________________________________________________
Xinya Li
Scientist
EED/Hydrology
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K9-33
Richland, WA 99352 USA
Tel: 509-372-6248
Fax: 509-372-6089
Xinya.Li at pnl.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at mcs.anl.gov Thu Jun 25 19:46:59 2015
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Thu, 25 Jun 2015 19:46:59 -0500
Subject: [petsc-users] TSSolve problems
In-Reply-To: 
References: 
Message-ID: 

   Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent.

   Barry

> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote:
>
> Dear Sir,
>
> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix.
> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve.
> Which parts should be taking into account to accelerate TSSolve?
> Thank you very much.
> Regards
> __________________________________________________
> Xinya Li
> Scientist
> EED/Hydrology
> Pacific Northwest National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, MSIN K9-33
> Richland, WA 99352 USA
> Tel: 509-372-6248
> Fax: 509-372-6089
> Xinya.Li at pnl.gov

From stoneszone at gmail.com Thu Jun 25 21:25:01 2015
From: stoneszone at gmail.com (Lei Shi)
Date: Thu, 25 Jun 2015 21:25:01 -0500
Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM
In-Reply-To: <228F0DC3-F1AB-41F5-8502-39314F2E2B52@mcs.anl.gov>
References: <228F0DC3-F1AB-41F5-8502-39314F2E2B52@mcs.anl.gov>
Message-ID: 

Barry,

Thanks a lot for your reply. Your explanation helps me understand my test results. So in this case, to compute the speedup for a strong scalability test, I should use the wall clock time with multiple cores as a reference time instead of the serial run time?

e.g. for computing the speedup of 16 cores, I should use

   speedup = (4 x wclock_4core) / wclock_16core

instead of using

   speedup = wclock_1core / wclock_16core

Another question is when I use asm as a preconditioner only, the speedup of 2 cores is much better than the case using asm with a local solve sub_ksp_type gmres.

-ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
-ksp_gmres_restart 30 -ksp_pc_side right
-pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9

cores  iterations  err       petsc solve cpu time  speedup  efficiency
1      10          4.54E-04  10.68                 1
2      11          9.55E-04  8.2                   1.30     0.65
4      12          3.59E-04  5.26                  2.03     0.50

What are the main differences between those two? Thanks.

Would you please take a look at my profiling data? Do you think this is the best parallel efficiency I can get from Petsc? How can I improve it?

Best,

Lei Shi

Sincerely Yours,

Lei Shi
---------

On Thu, Jun 25, 2015 at 5:33 PM, Barry Smith wrote: > > > On Jun 25, 2015, at 3:48 PM, Lei Shi wrote: > > Hi Justin, > > Thanks for your suggestion. I will test it asap. > > Another thing confusing me is the wclock time with 2 cores is almost the same as the serial run when I use asm with sub_ksp_type gmres and ilu0 on subdomains. Serial run takes 11.95 sec and parallel run takes 10.5 sec. There is almost no speedup at all. > > On one process ASM is ilu(0), so the setup time is one ILU(0) factorization of the entire matrix. On two processes the ILU(0) is run on a matrix that is more than 1/2 the size of the matrix; due to the overlap of 1. In particular for small problems the overlap will pull in most of the matrix so the setup time is not 1/2 of the setup time of one process. Then the number of iterations increases a good amount in going from 1 to 2 processes.
In combination this means that ASM going from one to two process > requires one each process much more than 1/2 the work of running on 1 > process so you should not expect great speedup in going from one to two > processes. > > > > > And I found some other people got similar bad speedups when comparing 2 > cores with 1 core. Attached is one slide from J.A. Davis's presentation. I > just found it from the web. As you can see, asm with 2 cores takes almost > the same cpu times compare 1 core too! May be I miss understanding some > fundamental things related to asm. > > > > cores iterations err petsc solve wclock time speedup efficiency > > 1 2 1.15E-04 11.95 1 > > 2 5 2.05E-02 10.5 1.01 0.50 > > 4 6 2.19E-02 7.64 1.39 0.34 > > > > > > > > > > > > > > > > > > ? > > > > Sincerely Yours, > > > > Lei Shi > > --------- > > > > On Thu, Jun 25, 2015 at 3:34 PM, Justin Chang > wrote: > > Hi Lei, > > > > Depending on your machine and MPI library, you may have to use smart > process to core/socket bindings to achieve better speedup. Instructions can > be found here: > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > > > > > Justin > > > > On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote: > > Hi Matt, > > > > Thanks for your suggestions. Here is the output from Stream test on one > node which has 20 cores. I run it up to 20. Attached are the dumped output > with your suggested options. Really appreciate your help!!! > > > > Number of MPI processes 1 > > Function Rate (MB/s) > > Copy: 13816.9372 > > Scale: 8020.1809 > > Add: 12762.3830 > > Triad: 11852.5016 > > > > Number of MPI processes 2 > > Function Rate (MB/s) > > Copy: 22748.7681 > > Scale: 14081.4906 > > Add: 18998.4516 > > Triad: 18303.2494 > > > > Number of MPI processes 3 > > Function Rate (MB/s) > > Copy: 34045.2510 > > Scale: 23410.9767 > > Add: 30320.2702 > > Triad: 30163.7977 > > > > Number of MPI processes 4 > > Function Rate (MB/s) > > Copy: 36875.5349 > > Scale: 29440.1694 > > Add: 36971.1860 > > Triad: 37377.0103 > > > > Number of MPI processes 5 > > Function Rate (MB/s) > > Copy: 32272.8763 > > Scale: 30316.3435 > > Add: 38022.0193 > > Triad: 38815.4830 > > > > Number of MPI processes 6 > > Function Rate (MB/s) > > Copy: 35619.8925 > > Scale: 34457.5078 > > Add: 41419.3722 > > Triad: 35825.3621 > > > > Number of MPI processes 7 > > Function Rate (MB/s) > > Copy: 55284.2420 > > Scale: 47706.8009 > > Add: 59076.4735 > > Triad: 61680.5559 > > > > Number of MPI processes 8 > > Function Rate (MB/s) > > Copy: 44525.8901 > > Scale: 48949.9599 > > Add: 57437.7784 > > Triad: 56671.0593 > > > > Number of MPI processes 9 > > Function Rate (MB/s) > > Copy: 34375.7364 > > Scale: 29507.5293 > > Add: 45405.3120 > > Triad: 39518.7559 > > > > Number of MPI processes 10 > > Function Rate (MB/s) > > Copy: 34278.0415 > > Scale: 41721.7843 > > Add: 46642.2465 > > Triad: 45454.7000 > > > > Number of MPI processes 11 > > Function Rate (MB/s) > > Copy: 38093.7244 > > Scale: 35147.2412 > > Add: 45047.0853 > > Triad: 44983.2013 > > > > Number of MPI processes 12 > > Function Rate (MB/s) > > Copy: 39750.8760 > > Scale: 52038.0631 > > Add: 55552.9503 > > Triad: 54884.3839 > > > > Number of MPI processes 13 > > Function Rate (MB/s) > > Copy: 60839.0248 > > Scale: 74143.7458 > > Add: 85545.3135 > > Triad: 85667.6551 > > > > Number of MPI processes 14 > > Function Rate (MB/s) > > Copy: 37766.2343 > > Scale: 40279.1928 > > Add: 49992.8572 > > Triad: 50303.4809 > > > > Number of MPI processes 15 > > Function Rate (MB/s) > > Copy: 49762.3670 > > 
Scale: 59077.8251 > > Add: 60407.9651 > > Triad: 61691.9456 > > > > Number of MPI processes 16 > > Function Rate (MB/s) > > Copy: 31996.7169 > > Scale: 36962.4860 > > Add: 40183.5060 > > Triad: 41096.0512 > > > > Number of MPI processes 17 > > Function Rate (MB/s) > > Copy: 36348.3839 > > Scale: 39108.6761 > > Add: 46853.4476 > > Triad: 47266.1778 > > > > Number of MPI processes 18 > > Function Rate (MB/s) > > Copy: 40438.7558 > > Scale: 43195.5785 > > Add: 53063.4321 > > Triad: 53605.0293 > > > > Number of MPI processes 19 > > Function Rate (MB/s) > > Copy: 30739.4908 > > Scale: 34280.8118 > > Add: 40710.5155 > > Triad: 43330.9503 > > > > Number of MPI processes 20 > > Function Rate (MB/s) > > Copy: 37488.3777 > > Scale: 41791.8999 > > Add: 49518.9604 > > Triad: 48908.2677 > > ------------------------------------------------ > > np speedup > > 1 1.0 > > 2 1.54 > > 3 2.54 > > 4 3.15 > > 5 3.27 > > 6 3.02 > > 7 5.2 > > 8 4.78 > > 9 3.33 > > 10 3.84 > > 11 3.8 > > 12 4.63 > > 13 7.23 > > 14 4.24 > > 15 5.2 > > 16 3.47 > > 17 3.99 > > 18 4.52 > > 19 3.66 > > 20 4.13 > > > > > > > > > > > > Sincerely Yours, > > > > Lei Shi > > --------- > > > > On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley > wrote: > > On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: > > Hello, > > > > 1) In order to understand this, we have to disentagle the various > effect. First, run the STREAMS benchmark > > > > make NPMAX=4 streams > > > > This will tell you the maximum speedup you can expect on this machine. > > > > 2) For these test cases, also send the output of > > > > -ksp_view -ksp_converged_reason -ksp_monitor_true_residual > > > > Thanks, > > > > Matt > > > > I'm trying to improve the parallel efficiency of gmres solve in my. In > my CFD solver, Petsc gmres is used to solve the linear system generated by > the Newton's method. To test its efficiency, I started with a very simple > inviscid subsonic 3D flow as the first testcase. The parallel efficiency of > gmres solve with asm as the preconditioner is very bad. The results are > from our latest cluster. Right now, I'm only looking at the wclock time of > the ksp_solve. > > ? First I tested ASM with gmres and ilu 0 for the sub domain , the > cpu time of 2 cores is almost the same as the serial run. Here is the > options for this case > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > > -ksp_gmres_restart 30 -ksp_pc_side right > > -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 > > -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 > > -sub_pc_factor_fill 1.9 > > The iteration numbers increase a lot for parallel run. > > cores iterations err petsc solve wclock time speedup efficiency > > 1 2 1.15E-04 11.95 1 > > 2 5 2.05E-02 10.5 1.01 0.50 > > 4 6 2.19E-02 7.64 1.39 0.34 > > > > > > > > > > > > > > > > 2. Then I tested ASM with ilu 0 as the preconditoner only, the > cpu time of 2 cores is better than the 1st test, but the speedup is still > very bad. Here is the options i'm using > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > > -ksp_gmres_restart 30 -ksp_pc_side right > > -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 > -sub_pc_factor_fill 1.9 > > cores iterations err petsc solve cpu time speedup efficiency > > 1 10 4.54E-04 10.68 1 > > 2 11 9.55E-04 8.2 1.30 0.65 > > 4 12 3.59E-04 5.26 2.03 0.50 > > > > > > > > > > > > > > > > Those results are from a third order "DG" scheme with a very coarse > 3D mesh (480 elements). 
I believe I should get some speedups for this test > even on this coarse mesh. > > My question is why does the asm with a local solve take much longer > time than the asm as a preconditioner only? Also the accuracy is very bad > too I have tested changing the overlap of asm to 2, but make it even worse. > > If I used a larger mesh ~4000 elements, the 2nd case with asm as the > preconditioner gives me a better speedup, but still not very good. > > cores iterations err petsc solve cpu time speedup efficiency > 1 7 1.91E-02 97.32 1 > 2 7 2.07E-02 64.94 1.5 0.74 > 4 7 2.61E-02 36.97 2.6 0.65 > > > Attached are the log_summary dumped from petsc, any suggestions are > welcome. I really appreciate it. > > > Sincerely Yours, > > Lei Shi > --------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > >
-------------- next part --------------
An HTML attachment was scrubbed...
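Plugging the numbers from the 480-element case quoted above into the two definitions (wall-clock times of 10.68 s, 8.2 s and 5.26 s on 1, 2 and 4 cores) gives, purely as an illustration of how the reference time changes the reported number:

\[
S_{4}^{\mathrm{(vs\;1\;core)}} = \frac{wclock_{1core}}{wclock_{4core}} = \frac{10.68}{5.26} \approx 2.03,
\qquad
S_{4}^{\mathrm{(vs\;2\;cores)}} = \frac{2 \times wclock_{2core}}{wclock_{4core}} = \frac{2 \times 8.2}{5.26} \approx 3.1
\]

The underlying wall-clock times are the same in both cases; only the baseline differs.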
URL: 
From bsmith at mcs.anl.gov Thu Jun 25 21:35:12 2015
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Thu, 25 Jun 2015 21:35:12 -0500
Subject: [petsc-users] Parallel efficiency of the gmres solver with ASM
In-Reply-To: 
References: <228F0DC3-F1AB-41F5-8502-39314F2E2B52@mcs.anl.gov>
Message-ID: <0F28D43B-A03E-44D6-92D1-400A22397A1C@mcs.anl.gov>

> On Jun 25, 2015, at 9:25 PM, Lei Shi wrote: > > Barry, > > Thanks a lot for your reply. Your explanation helps me understand my test results. So In this case, to compute the speedup for a strong scalability test, I should use the the wall clock time with multiple cores as a reference time instead of serial run time? > > e.g. for computing speed up of 16 cores, i should use > > > > instead of using > > > > Another question is when I use asm as a preconditioner only, the speedup of 2 cores is much better than the case using asm with a local solve sub_ksp_type gmres. > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > -ksp_gmres_restart 30 -ksp_pc_side right > -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9 > cores iterations err petsc solve cpu time speedup efficiency > 1 10 4.54E-04 10.68 1 > 2 11 9.55E-04 8.2 1.30 0.65 > 4 12 3.59E-04 5.26 2.03 0.50

   Using -sub_ksp_type gmres results in "oversolving" the local problems which takes more time but does not improve (by much) the convergence of the "outer" linear solver. Unfortunately I don't know any way to automatically adjust the accuracy of the inner solver to minimize the solve time of the outer solve.

   The (relative) performance of the many solvers depends greatly on the size of the problem (for example for small problems using no preconditioner is often best, but for larger problems something like GAMG is best). So, in order to come to some conclusion of what solver to use you need to run the tests for the size of the problem you want to solve (not a smaller problem). So I recommend running a new set of tests using the problem size you need to solve with no preconditioner, bjacobi, ASM, and GAMG; this will help you decide what works best for your problem.

   Barry

> > > > > > > What is the main differences between those two? Thanks. > > Would you please take a look of my profiling data? Do you think this is the best parallel efficiency I can get from Petsc? How can I improve it? > > Best, > > Lei Shi > > > > > > > Sincerely Yours, > > Lei Shi > --------- > > On Thu, Jun 25, 2015 at 5:33 PM, Barry Smith wrote: > > > On Jun 25, 2015, at 3:48 PM, Lei Shi wrote: > > > > Hi Justin, > > > > Thanks for your suggestion. I will test it asap. > > > > Another thing confusing me is the wclock time with 2 cores is almost the > same as the serial run when I use asm with sub_ksp_type gmres and ilu0 on > subdomains. Serial run takes 11.95 sec and parallel run takes 10.5 sec. > There is almost no speedup at all. > > On one process ASM is ilu(0), so the setup time is one ILU(0) > factorization of the entire matrix. On two processes the ILU(0) is run on a > matrix that is more than 1/2 the size of the matrix; due to the overlap of > 1. In particular for small problems the overlap will pull in most of the > matrix so the setup time is not 1/2 of the setup time of one process. Then > the number of iterations increases a good amount in going from 1 to 2 > processes. In combination this means that ASM going from one to two process > requires one each process much more than 1/2 the work of running on 1 > process so you should not expect great speedup in going from one to two > processes.
> > > > > > And I found some other people got similar bad speedups when comparing 2 cores with 1 core. Attached is one slide from J.A. Davis's presentation. I just found it from the web. As you can see, asm with 2 cores takes almost the same cpu times compare 1 core too! May be I miss understanding some fundamental things related to asm. > > > > cores iterations err petsc solve wclock time speedup efficiency > > 1 2 1.15E-04 11.95 1 > > 2 5 2.05E-02 10.5 1.01 0.50 > > 4 6 2.19E-02 7.64 1.39 0.34 > > > > > > > > > > > > > > > > > > ? > > > > Sincerely Yours, > > > > Lei Shi > > --------- > > > > On Thu, Jun 25, 2015 at 3:34 PM, Justin Chang wrote: > > Hi Lei, > > > > Depending on your machine and MPI library, you may have to use smart process to core/socket bindings to achieve better speedup. Instructions can be found here: > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > > > > > Justin > > > > On Thu, Jun 25, 2015 at 3:24 PM, Lei Shi wrote: > > Hi Matt, > > > > Thanks for your suggestions. Here is the output from Stream test on one node which has 20 cores. I run it up to 20. Attached are the dumped output with your suggested options. Really appreciate your help!!! > > > > Number of MPI processes 1 > > Function Rate (MB/s) > > Copy: 13816.9372 > > Scale: 8020.1809 > > Add: 12762.3830 > > Triad: 11852.5016 > > > > Number of MPI processes 2 > > Function Rate (MB/s) > > Copy: 22748.7681 > > Scale: 14081.4906 > > Add: 18998.4516 > > Triad: 18303.2494 > > > > Number of MPI processes 3 > > Function Rate (MB/s) > > Copy: 34045.2510 > > Scale: 23410.9767 > > Add: 30320.2702 > > Triad: 30163.7977 > > > > Number of MPI processes 4 > > Function Rate (MB/s) > > Copy: 36875.5349 > > Scale: 29440.1694 > > Add: 36971.1860 > > Triad: 37377.0103 > > > > Number of MPI processes 5 > > Function Rate (MB/s) > > Copy: 32272.8763 > > Scale: 30316.3435 > > Add: 38022.0193 > > Triad: 38815.4830 > > > > Number of MPI processes 6 > > Function Rate (MB/s) > > Copy: 35619.8925 > > Scale: 34457.5078 > > Add: 41419.3722 > > Triad: 35825.3621 > > > > Number of MPI processes 7 > > Function Rate (MB/s) > > Copy: 55284.2420 > > Scale: 47706.8009 > > Add: 59076.4735 > > Triad: 61680.5559 > > > > Number of MPI processes 8 > > Function Rate (MB/s) > > Copy: 44525.8901 > > Scale: 48949.9599 > > Add: 57437.7784 > > Triad: 56671.0593 > > > > Number of MPI processes 9 > > Function Rate (MB/s) > > Copy: 34375.7364 > > Scale: 29507.5293 > > Add: 45405.3120 > > Triad: 39518.7559 > > > > Number of MPI processes 10 > > Function Rate (MB/s) > > Copy: 34278.0415 > > Scale: 41721.7843 > > Add: 46642.2465 > > Triad: 45454.7000 > > > > Number of MPI processes 11 > > Function Rate (MB/s) > > Copy: 38093.7244 > > Scale: 35147.2412 > > Add: 45047.0853 > > Triad: 44983.2013 > > > > Number of MPI processes 12 > > Function Rate (MB/s) > > Copy: 39750.8760 > > Scale: 52038.0631 > > Add: 55552.9503 > > Triad: 54884.3839 > > > > Number of MPI processes 13 > > Function Rate (MB/s) > > Copy: 60839.0248 > > Scale: 74143.7458 > > Add: 85545.3135 > > Triad: 85667.6551 > > > > Number of MPI processes 14 > > Function Rate (MB/s) > > Copy: 37766.2343 > > Scale: 40279.1928 > > Add: 49992.8572 > > Triad: 50303.4809 > > > > Number of MPI processes 15 > > Function Rate (MB/s) > > Copy: 49762.3670 > > Scale: 59077.8251 > > Add: 60407.9651 > > Triad: 61691.9456 > > > > Number of MPI processes 16 > > Function Rate (MB/s) > > Copy: 31996.7169 > > Scale: 36962.4860 > > Add: 40183.5060 > > Triad: 41096.0512 > > > > Number of MPI processes 17 > 
> Function Rate (MB/s) > > Copy: 36348.3839 > > Scale: 39108.6761 > > Add: 46853.4476 > > Triad: 47266.1778 > > > > Number of MPI processes 18 > > Function Rate (MB/s) > > Copy: 40438.7558 > > Scale: 43195.5785 > > Add: 53063.4321 > > Triad: 53605.0293 > > > > Number of MPI processes 19 > > Function Rate (MB/s) > > Copy: 30739.4908 > > Scale: 34280.8118 > > Add: 40710.5155 > > Triad: 43330.9503 > > > > Number of MPI processes 20 > > Function Rate (MB/s) > > Copy: 37488.3777 > > Scale: 41791.8999 > > Add: 49518.9604 > > Triad: 48908.2677 > > ------------------------------------------------ > > np speedup > > 1 1.0 > > 2 1.54 > > 3 2.54 > > 4 3.15 > > 5 3.27 > > 6 3.02 > > 7 5.2 > > 8 4.78 > > 9 3.33 > > 10 3.84 > > 11 3.8 > > 12 4.63 > > 13 7.23 > > 14 4.24 > > 15 5.2 > > 16 3.47 > > 17 3.99 > > 18 4.52 > > 19 3.66 > > 20 4.13 > > > > > > > > > > > > Sincerely Yours, > > > > Lei Shi > > --------- > > > > On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley wrote: > > On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi wrote: > > Hello, > > > > 1) In order to understand this, we have to disentagle the various effect. First, run the STREAMS benchmark > > > > make NPMAX=4 streams > > > > This will tell you the maximum speedup you can expect on this machine. > > > > 2) For these test cases, also send the output of > > > > -ksp_view -ksp_converged_reason -ksp_monitor_true_residual > > > > Thanks, > > > > Matt > > > > I'm trying to improve the parallel efficiency of gmres solve in my. In my CFD solver, Petsc gmres is used to solve the linear system generated by the Newton's method. To test its efficiency, I started with a very simple inviscid subsonic 3D flow as the first testcase. The parallel efficiency of gmres solve with asm as the preconditioner is very bad. The results are from our latest cluster. Right now, I'm only looking at the wclock time of the ksp_solve. > > ? First I tested ASM with gmres and ilu 0 for the sub domain , the cpu time of 2 cores is almost the same as the serial run. Here is the options for this case > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > > -ksp_gmres_restart 30 -ksp_pc_side right > > -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30 > > -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0 > > -sub_pc_factor_fill 1.9 > > The iteration numbers increase a lot for parallel run. > > cores iterations err petsc solve wclock time speedup efficiency > > 1 2 1.15E-04 11.95 1 > > 2 5 2.05E-02 10.5 1.01 0.50 > > 4 6 2.19E-02 7.64 1.39 0.34 > > > > > > > > > > > > > > > > 2. Then I tested ASM with ilu 0 as the preconditoner only, the cpu time of 2 cores is better than the 1st test, but the speedup is still very bad. Here is the options i'm using > > -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50 > > -ksp_gmres_restart 30 -ksp_pc_side right > > -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9 > > cores iterations err petsc solve cpu time speedup efficiency > > 1 10 4.54E-04 10.68 1 > > 2 11 9.55E-04 8.2 1.30 0.65 > > 4 12 3.59E-04 5.26 2.03 0.50 > > > > > > > > > > > > > > > > Those results are from a third order "DG" scheme with a very coarse 3D mesh (480 elements). I believe I should get some speedups for this test even on this coarse mesh. > > > > My question is why does the asm with a local solve take much longer time than the asm as a preconditioner only? Also the accuracy is very bad too I have tested changing the overlap of asm to 2, but make it even worse. 
> > > > If I used a larger mesh ~4000 elements, the 2nd case with asm as the > preconditioner gives me a better speedup, but still not very good. > > > > cores iterations err petsc solve cpu time speedup efficiency > > 1 7 1.91E-02 97.32 1 > > 2 7 2.07E-02 64.94 1.5 0.74 > > 4 7 2.61E-02 36.97 2.6 0.65 > > > > > > Attached are the log_summary dumped from petsc, any suggestions are > welcome. I really appreciate it. > > > > > > Sincerely Yours, > > > > Lei Shi > > --------- > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > >
From jychang48 at gmail.com Fri Jun 26 01:45:51 2015 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 26 Jun 2015 01:45:51 -0500 Subject: [petsc-users] PetscBinaryRead and quad precision Message-ID: Hi all, I need to run simulations that rely on several of my custom binary datafiles (written in 32 bit int and 64 bit doubles). These date files were generated from MATLAB. In my PETSc code I invoke PetscBinaryRead(...)
into these binary files, which gives me mesh data, auxiliaries, etc. However, when I now configure with quad precision (--with-precision=__float128 and --download-f2cblaslapack) my PetscBinaryRead() functions give me segmentation violation errors. I am guessing this is because I have binary files written in double precision but have PETSc which reads in quad precision, meaning I will be reading past the end of these files due to the larger strides. So my question is, is there a way to circumvent this issue? That is, to read double-precision binary data into a PETSc program configured with quad-precision? Otherwise I would have to rewrite or redo all of my datafiles, which I would prefer not to do if possible. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 26 11:24:10 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 26 Jun 2015 11:24:10 -0500 Subject: [petsc-users] PetscBinaryRead and quad precision In-Reply-To: References: Message-ID: We have an undocumented option -binary_read_double that will read in double precision from the file and place it into a quad precision array. This is exactly what you need Barry Yes, we should fix up our binary viewers to allow reading and writing generally for any precision but need a volunteer to do it. > On Jun 26, 2015, at 1:45 AM, Justin Chang wrote: > > Hi all, > > I need to run simulations that rely on several of my custom binary datafiles (written in 32 bit int and 64 bit doubles). These date files were generated from MATLAB. In my PETSc code I invoke PetscBinaryRead(...) into these binary files, which gives me mesh data, auxiliaries, etc. > > However, when I now configure with quad precision (--with-precision=__float128 and --download-f2cblaslapack) my PetscBinaryRead() functions give me segmentation violation errors. I am guessing this is because I have binary files written in double precision but have PETSc which reads in quad precision, meaning I will be reading past the end of these files due to the larger strides. > > So my question is, is there a way to circumvent this issue? That is, to read double-precision binary data into a PETSc program configured with quad-precision? Otherwise I would have to rewrite or redo all of my datafiles, which I would prefer not to do if possible. > > Thanks, > Justin > From Xinya.Li at pnnl.gov Fri Jun 26 12:16:07 2015 From: Xinya.Li at pnnl.gov (Li, Xinya) Date: Fri, 26 Jun 2015 17:16:07 +0000 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Barry, Thank you for your response. Attached is the output. SNESSolve was taking most of the time. Xinya -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Thursday, June 25, 2015 5:47 PM To: Li, Xinya Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] TSSolve problems Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. Barry > On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: > > Dear Sir, > > I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. > Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. > Which parts should be taking into account to accelerate TSSolve? > Thank you very much. 
> Regards > __________________________________________________ > Xinya Li > Scientist > EED/Hydrology > Pacific Northwest National Laboratory > 902 Battelle Boulevard > P.O. Box 999, MSIN K9-33 > Richland, WA 99352 USA > Tel: 509-372-6248 > Fax: 509-372-6089 > Xinya.Li at pnl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: 288g1081b_short.log Type: application/octet-stream Size: 27560 bytes Desc: 288g1081b_short.log URL: From gianmail at gmail.com Fri Jun 26 12:27:42 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Fri, 26 Jun 2015 10:27:42 -0700 Subject: [petsc-users] DM coordinates interpolations Message-ID: Dear all, I would like to solve a PDE discretized on a nonuniform --- but rectangular --- grid and I wanted to use the DM coordinates vector to compute the metric terms by finite difference. The only alternative I see is to recompute the coordinates (and then the metric terms) at every function and jacobian evaluation call. My question is what is the best (or even correct) way to provide the coordinates to the newly created da objects. Is there anything like a DMDASetNonUniformCoordinates to which to pass a function computing the coordinates? As far as I can tell the fine grid coordinates are currently linearly interpolated from the coarse grid ones. Please also let me thank you for your great work: it has been and it currently is of enormous help. Best Gianluca -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 26 13:06:44 2015 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 26 Jun 2015 13:06:44 -0500 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: References: Message-ID: On Fri, Jun 26, 2015 at 12:27 PM, Gianluca Meneghello wrote: > Dear all, > > I would like to solve a PDE discretized on a nonuniform --- but > rectangular --- grid and I wanted to use the DM coordinates vector to > compute the metric terms by finite difference. The only alternative I see > is to recompute the coordinates (and then the metric terms) at every > function and jacobian evaluation call. > > My question is what is the best (or even correct) way to provide the > coordinates to the newly created da objects. Is there anything like a > DMDASetNonUniformCoordinates to which to pass a function computing the > coordinates? As far as I can tell the fine grid coordinates are currently > linearly interpolated from the coarse grid ones. > We do not have that function, however you can operate on the coordinate vector the same as you would any other vector, namely you call DMDAVecGetArrayDOF() on the coordinate vector, write your nested loops over the vertices, and set coords[k][j][i][d] to the d-th coordinate of the point. You can ask for the interpolator from the coordinate DM to project these to coarse levels, or just set them directly. Does this make sense? Thanks, Matt > Please also let me thank you for your great work: it has been and it > currently is of enormous help. > > Best > > Gianluca > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
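A minimal 2D sketch of the kind of loop described above, assuming a 2D DMDA named da has already been created (the squaring is only a stand-in for whatever stretching is actually wanted, and the snippet is untested):

  DM             cda;
  Vec            gc;
  DMDACoor2d   **coords;
  PetscInt       i, j, xs, ys, xm, ym;
  PetscErrorCode ierr;

  /* start from uniform coordinates, then overwrite them in place */
  ierr = DMDASetUniformCoordinates(da, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0);CHKERRQ(ierr);
  ierr = DMGetCoordinateDM(da, &cda);CHKERRQ(ierr);
  ierr = DMGetCoordinates(da, &gc);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(cda, gc, &coords);CHKERRQ(ierr);
  ierr = DMDAGetCorners(cda, &xs, &ys, NULL, &xm, &ym, NULL);CHKERRQ(ierr);
  for (j = ys; j < ys+ym; j++) {
    for (i = xs; i < xs+xm; i++) {
      coords[j][i].x = PetscSqr(coords[j][i].x);   /* illustrative stretching x -> x^2 */
      coords[j][i].y = PetscSqr(coords[j][i].y);
    }
  }
  ierr = DMDAVecRestoreArray(cda, gc, &coords);CHKERRQ(ierr);

The same loop can be repeated on the DMDA of each level of a hierarchy so that every level carries the intended non-uniform coordinates.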
URL: From bsmith at mcs.anl.gov Fri Jun 26 13:09:30 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 26 Jun 2015 13:09:30 -0500 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: References: Message-ID: > On Jun 26, 2015, at 12:27 PM, Gianluca Meneghello wrote: > > Dear all, > > I would like to solve a PDE discretized on a nonuniform --- but rectangular --- grid and I wanted to use the DM coordinates vector to compute the metric terms by finite difference. The only alternative I see is to recompute the coordinates (and then the metric terms) at every function and jacobian evaluation call. > > My question is what is the best (or even correct) way to provide the coordinates to the newly created da objects. Is there anything like a DMDASetNonUniformCoordinates to which to pass a function computing the coordinates? As far as I can tell the fine grid coordinates are currently linearly interpolated from the coarse grid ones. Call DMDASetUniformCoordinates() on each level then call DMGetCoordinates() and put the coordinate values you want in. You can call DMGetCoordinateDM(dm, &dmcoor) to get the DM that goes with the coordinate vector and use DMDAVecGetArray(dmcoor,coor,&array) to give easy access to the entries. > > Please also let me thank you for your great work: it has been and it currently is of enormous help. > > Best > > Gianluca From mrosso at uci.edu Fri Jun 26 13:20:10 2015 From: mrosso at uci.edu (Michele Rosso) Date: Fri, 26 Jun 2015 11:20:10 -0700 Subject: [petsc-users] DMDASetInterpolationType Message-ID: <1435342810.6261.4.camel@kolmog5> Hi, I am wondering if there is a way to use a custom interpolation routine for DMDA. The reason is that DMDA_Q0 does not support periodic BCs and I need that. Thanks, Michele -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Jun 26 13:28:59 2015 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 26 Jun 2015 20:28:59 +0200 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: References: Message-ID: Also note that if you go and modify the global coordinate vector after calling DMDASetUniformCoordinates, you will need to explicitly call the vecscatter yourself to update the local coordinates. On Friday, 26 June 2015, Barry Smith wrote: > > > On Jun 26, 2015, at 12:27 PM, Gianluca Meneghello > wrote: > > > > Dear all, > > > > I would like to solve a PDE discretized on a nonuniform --- but > rectangular --- grid and I wanted to use the DM coordinates vector to > compute the metric terms by finite difference. The only alternative I see > is to recompute the coordinates (and then the metric terms) at every > function and jacobian evaluation call. > > > > My question is what is the best (or even correct) way to provide the > coordinates to the newly created da objects. Is there anything like a > DMDASetNonUniformCoordinates to which to pass a function computing the > coordinates? As far as I can tell the fine grid coordinates are currently > linearly interpolated from the coarse grid ones. > > Call DMDASetUniformCoordinates() on each level then call > DMGetCoordinates() and put the coordinate values you want in. You can call > DMGetCoordinateDM(dm, &dmcoor) to get the DM that goes with the coordinate > vector and use DMDAVecGetArray(dmcoor,coor,&array) to give easy access to > the entries. > > > > > Please also let me thank you for your great work: it has been and it > currently is of enormous help. 
> > > > Best > > > > Gianluca > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 26 13:39:54 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 26 Jun 2015 13:39:54 -0500 Subject: [petsc-users] DMDASetInterpolationType In-Reply-To: <1435342810.6261.4.camel@kolmog5> References: <1435342810.6261.4.camel@kolmog5> Message-ID: <6EC86404-581D-4C14-A715-646494236F50@mcs.anl.gov> We don't currently have a "setter" allowing you to overwrite the function pointer. The easiest thing for you to do is "fix" the DMDA_Q0 to handle also handle periodic boundary conditions. It is DMCreateInterpolation_DA_2D_Q0() or 1d or 3d in src/dm/impls/da then send us a patch or make a pull request and we'll stick it in the development copy. Barry > On Jun 26, 2015, at 1:20 PM, Michele Rosso wrote: > > Hi, > > I am wondering if there is a way to use a custom interpolation routine for DMDA. > The reason is that DMDA_Q0 does not support periodic BCs and I need that. > Thanks, > > Michele From bsmith at mcs.anl.gov Fri Jun 26 14:04:26 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 26 Jun 2015 14:04:26 -0500 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## First you need to configure PETSc again without all the debugging. So do, for example, export PETSC=arch-opt ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-scalar-type=complex --with-clanguage=C++ --with-cxx-dialect=C++11 --download-mpich --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental make all test then recompile and rerun your example again with -log_summary and send the output. Note that you should not pass --download-fblaslapack nor the fortran kernel stuff. Barry > On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: > > Barry, > > Thank you for your response. > > Attached is the output. SNESSolve was taking most of the time. > > > Xinya > > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Thursday, June 25, 2015 5:47 PM > To: Li, Xinya > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSSolve problems > > > Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. > > Barry > >> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >> >> Dear Sir, >> >> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. >> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. >> Which parts should be taking into account to accelerate TSSolve? >> Thank you very much. >> Regards >> __________________________________________________ >> Xinya Li >> Scientist >> EED/Hydrology >> Pacific Northwest National Laboratory >> 902 Battelle Boulevard >> P.O. 
Box 999, MSIN K9-33 >> Richland, WA 99352 USA >> Tel: 509-372-6248 >> Fax: 509-372-6089 >> Xinya.Li at pnl.gov > > <288g1081b_short.log> From jychang48 at gmail.com Fri Jun 26 22:46:57 2015 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 26 Jun 2015 22:46:57 -0500 Subject: [petsc-users] PetscBinaryRead and quad precision In-Reply-To: References: Message-ID: Barry, Thank you that kind of did the trick, although I have run into a couple issues: 1) Is it expected for quad precision to have a significantly worse performance than double precision in terms of wall-clock time? What took 60 seconds to solve with double precision now takes approximately 1500 seconds to solve. I am guessing this has a lot to do with the fact that scalars are now 16 bytes large as opposed to 8? 2) The program crashes at DMPlexDistribute() when I used two or more processors. Here's the error log: Input Error: Incorrect sum of 0.000000 for tpwgts for constraint 0. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: Error in METIS_PartGraphKway() [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.6-815-g2d9afd9 GIT Date: 2015-06-26 18:48:28 -0500 [0]PETSC ERROR: ./main on a arch-linux2-quad-opt named pacotaco-xps by justin Fri Jun 26 22:29:36 2015 [0]PETSC ERROR: Configure options --download-f2cblaslapack --download-metis --download-mpich --download-parmetis --download-triangle --with-cc=gcc --with-cmake=cmake --with-cxx=g++ --with-debugging=0 --with-fc=gfortran --with-valgrind=1 PETSC_ARCH=arch-linux2-quad-opt --with-precision=__float128 [0]PETSC ERROR: #1 PetscPartitionerPartition_ParMetis() line 1181 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c [0]PETSC ERROR: #2 PetscPartitionerPartition() line 653 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c [0]PETSC ERROR: #3 DMPlexDistribute() line 1505 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexdistribute.c [0]PETSC ERROR: #4 CreateMesh() line 762 in /home/justin/Dropbox/DMPlex-nonneg/main.c [0]PETSC ERROR: #5 main() line 993 in /home/justin/Dropbox/DMPlex-nonneg/main.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -al 1 [0]PETSC ERROR: -am 0 [0]PETSC ERROR: -at 0.001 [0]PETSC ERROR: -bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 [0]PETSC ERROR: -bcnum 7 [0]PETSC ERROR: -bcval 0,0,0,0,0,0,1 [0]PETSC ERROR: -binary_read_double [0]PETSC ERROR: -dim 3 [0]PETSC ERROR: -dm_refine 1 [0]PETSC ERROR: -dt 0.001 [0]PETSC ERROR: -edges 3,3 [0]PETSC ERROR: -floc 0.25,0.75,0.25,0.75,0.25,0.75 [0]PETSC ERROR: -fnum 0 [0]PETSC ERROR: -ftime 0,99 [0]PETSC ERROR: -fval 1 [0]PETSC ERROR: -ksp_max_it 50000 [0]PETSC ERROR: -ksp_rtol 1.0e-10 [0]PETSC ERROR: -ksp_type cg [0]PETSC ERROR: -log_summary [0]PETSC ERROR: -lower 0,0 [0]PETSC ERROR: -mat_petscspace_order 0 [0]PETSC ERROR: -mesh datafiles/cube_with_hole4_mesh.dat [0]PETSC ERROR: -mu 1 [0]PETSC ERROR: -nonneg 1 [0]PETSC ERROR: -numsteps 0 [0]PETSC ERROR: -options_left 0 [0]PETSC ERROR: -pc_type jacobi [0]PETSC ERROR: -petscpartitioner_type parmetis [0]PETSC ERROR: -progress 1 [0]PETSC ERROR: -simplex 1 [0]PETSC ERROR: -solution_petscspace_order 1 [0]PETSC ERROR: -tao_fatol 1e-8 [0]PETSC ERROR: -tao_frtol 1e-8 [0]PETSC ERROR: -tao_max_it 50000 [0]PETSC ERROR: -tao_monitor [0]PETSC ERROR: 
-tao_type blmvm [0]PETSC ERROR: -tao_view [0]PETSC ERROR: -trans datafiles/cube_with_hole4_trans.dat [0]PETSC ERROR: -upper 1,1 [0]PETSC ERROR: -vtuname figures/cube_with_hole_4 [0]PETSC ERROR: -vtuprint 1 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 12626 RUNNING AT pacotaco-xps = EXIT CODE: 76 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== Know what's going on here? Thanks, Justin On Fri, Jun 26, 2015 at 11:24 AM, Barry Smith wrote: > > We have an undocumented option -binary_read_double that will read in > double precision from the file and place it into a quad precision array. > This is exactly what you need > > Barry > > Yes, we should fix up our binary viewers to allow reading and writing > generally for any precision but need a volunteer to do it. > > > > > On Jun 26, 2015, at 1:45 AM, Justin Chang wrote: > > > > Hi all, > > > > I need to run simulations that rely on several of my custom binary > datafiles (written in 32 bit int and 64 bit doubles). These date files were > generated from MATLAB. In my PETSc code I invoke PetscBinaryRead(...) into > these binary files, which gives me mesh data, auxiliaries, etc. > > > > However, when I now configure with quad precision > (--with-precision=__float128 and --download-f2cblaslapack) my > PetscBinaryRead() functions give me segmentation violation errors. I am > guessing this is because I have binary files written in double precision > but have PETSc which reads in quad precision, meaning I will be reading > past the end of these files due to the larger strides. > > > > So my question is, is there a way to circumvent this issue? That is, to > read double-precision binary data into a PETSc program configured with > quad-precision? Otherwise I would have to rewrite or redo all of my > datafiles, which I would prefer not to do if possible. > > > > Thanks, > > Justin > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sat Jun 27 02:05:13 2015 From: jychang48 at gmail.com (Justin Chang) Date: Sat, 27 Jun 2015 02:05:13 -0500 Subject: [petsc-users] Parallel IO of DMPlex Message-ID: Hi everyone, I see that parallel IO of various input mesh formats for DMPlex is not currently supported. However, is there a way to read a custom mesh file in parallel? For instance, I have a binary data file formatted like this: Reading this will allow me to create a DMPlex with DMPlexCreateFromDAG(). Currently, only the root process reads this binary mesh file and creates the DMPlex whereas the other processes create an empty DMPlex. Then all processes invoke DMPlexDistribute(). This one-to-all distribution seems to be a bottleneck, and I want to know if it's possible to have each process read a local portion of the connectivity and coordinates arrays and let DMPlex/METIS/ParMETIS handle the load balancing and redistribution. Intuitively this would be easy to write, but again I want to know how to do this through leveraging the functions and routines within DMPlex. 
Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 27 05:54:06 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 27 Jun 2015 05:54:06 -0500 Subject: [petsc-users] PetscBinaryRead and quad precision In-Reply-To: References: Message-ID: On Fri, Jun 26, 2015 at 10:46 PM, Justin Chang wrote: > Barry, > > Thank you that kind of did the trick, although I have run into a couple > issues: > > 1) Is it expected for quad precision to have a significantly worse > performance than double precision in terms of wall-clock time? What took 60 > seconds to solve with double precision now takes approximately 1500 seconds > to solve. I am guessing this has a lot to do with the fact that scalars are > now 16 bytes large as opposed to 8? > > 2) The program crashes at DMPlexDistribute() when I used two or more > processors. Here's the error log: > I am probably not careful to insulate ParMetis from PetscReal. I have to look at it. Thanks, Matt > Input Error: Incorrect sum of 0.000000 for tpwgts for constraint 0. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error in METIS_PartGraphKway() > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.6-815-g2d9afd9 GIT > Date: 2015-06-26 18:48:28 -0500 > [0]PETSC ERROR: ./main on a arch-linux2-quad-opt named pacotaco-xps by > justin Fri Jun 26 22:29:36 2015 > [0]PETSC ERROR: Configure options --download-f2cblaslapack > --download-metis --download-mpich --download-parmetis --download-triangle > --with-cc=gcc --with-cmake=cmake --with-cxx=g++ --with-debugging=0 > --with-fc=gfortran --with-valgrind=1 PETSC_ARCH=arch-linux2-quad-opt > --with-precision=__float128 > [0]PETSC ERROR: #1 PetscPartitionerPartition_ParMetis() line 1181 in > /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c > [0]PETSC ERROR: #2 PetscPartitionerPartition() line 653 in > /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c > [0]PETSC ERROR: #3 DMPlexDistribute() line 1505 in > /home/justin/Software/petsc-dev/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #4 CreateMesh() line 762 in > /home/justin/Dropbox/DMPlex-nonneg/main.c > [0]PETSC ERROR: #5 main() line 993 in > /home/justin/Dropbox/DMPlex-nonneg/main.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -al 1 > [0]PETSC ERROR: -am 0 > [0]PETSC ERROR: -at 0.001 > [0]PETSC ERROR: -bcloc > 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 > [0]PETSC ERROR: -bcnum 7 > [0]PETSC ERROR: -bcval 0,0,0,0,0,0,1 > [0]PETSC ERROR: -binary_read_double > [0]PETSC ERROR: -dim 3 > [0]PETSC ERROR: -dm_refine 1 > [0]PETSC ERROR: -dt 0.001 > [0]PETSC ERROR: -edges 3,3 > [0]PETSC ERROR: -floc 0.25,0.75,0.25,0.75,0.25,0.75 > [0]PETSC ERROR: -fnum 0 > [0]PETSC ERROR: -ftime 0,99 > [0]PETSC ERROR: -fval 1 > [0]PETSC ERROR: -ksp_max_it 50000 > [0]PETSC ERROR: -ksp_rtol 1.0e-10 > [0]PETSC ERROR: -ksp_type cg > [0]PETSC ERROR: -log_summary > [0]PETSC ERROR: -lower 0,0 > [0]PETSC ERROR: -mat_petscspace_order 0 > [0]PETSC ERROR: -mesh datafiles/cube_with_hole4_mesh.dat > [0]PETSC ERROR: -mu 1 > [0]PETSC ERROR: -nonneg 1 > [0]PETSC ERROR: -numsteps 0 > [0]PETSC ERROR: -options_left 0 > [0]PETSC ERROR: -pc_type jacobi > [0]PETSC ERROR: -petscpartitioner_type 
parmetis > [0]PETSC ERROR: -progress 1 > [0]PETSC ERROR: -simplex 1 > [0]PETSC ERROR: -solution_petscspace_order 1 > [0]PETSC ERROR: -tao_fatol 1e-8 > [0]PETSC ERROR: -tao_frtol 1e-8 > [0]PETSC ERROR: -tao_max_it 50000 > [0]PETSC ERROR: -tao_monitor > [0]PETSC ERROR: -tao_type blmvm > [0]PETSC ERROR: -tao_view > [0]PETSC ERROR: -trans datafiles/cube_with_hole4_trans.dat > [0]PETSC ERROR: -upper 1,1 > [0]PETSC ERROR: -vtuname figures/cube_with_hole_4 > [0]PETSC ERROR: -vtuprint 1 > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 12626 RUNNING AT pacotaco-xps > = EXIT CODE: 76 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > > Know what's going on here? > > Thanks, > Justin > > On Fri, Jun 26, 2015 at 11:24 AM, Barry Smith wrote: > >> >> We have an undocumented option -binary_read_double that will read in >> double precision from the file and place it into a quad precision array. >> This is exactly what you need >> >> Barry >> >> Yes, we should fix up our binary viewers to allow reading and writing >> generally for any precision but need a volunteer to do it. >> >> >> >> > On Jun 26, 2015, at 1:45 AM, Justin Chang wrote: >> > >> > Hi all, >> > >> > I need to run simulations that rely on several of my custom binary >> datafiles (written in 32 bit int and 64 bit doubles). These date files were >> generated from MATLAB. In my PETSc code I invoke PetscBinaryRead(...) into >> these binary files, which gives me mesh data, auxiliaries, etc. >> > >> > However, when I now configure with quad precision >> (--with-precision=__float128 and --download-f2cblaslapack) my >> PetscBinaryRead() functions give me segmentation violation errors. I am >> guessing this is because I have binary files written in double precision >> but have PETSc which reads in quad precision, meaning I will be reading >> past the end of these files due to the larger strides. >> > >> > So my question is, is there a way to circumvent this issue? That is, to >> read double-precision binary data into a PETSc program configured with >> quad-precision? Otherwise I would have to rewrite or redo all of my >> datafiles, which I would prefer not to do if possible. >> > >> > Thanks, >> > Justin >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 27 06:01:03 2015 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 27 Jun 2015 06:01:03 -0500 Subject: [petsc-users] Parallel IO of DMPlex In-Reply-To: References: Message-ID: On Sat, Jun 27, 2015 at 2:05 AM, Justin Chang wrote: > Hi everyone, > > I see that parallel IO of various input mesh formats for DMPlex is not > currently supported. However, is there a way to read a custom mesh file in > parallel? 
> > For instance, I have a binary data file formatted like this: > > > > > > ....> > ....> > > ....> > > Reading this will allow me to create a DMPlex with DMPlexCreateFromDAG(). > Currently, only the root process reads this binary mesh file and creates > the DMPlex whereas the other processes create an empty DMPlex. Then all > processes invoke DMPlexDistribute(). This one-to-all distribution seems to > be a bottleneck, and I want to know if it's possible to have each process > read a local portion of the connectivity and coordinates arrays and let > DMPlex/METIS/ParMETIS handle the load balancing and redistribution. > Intuitively this would be easy to write, but again I want to know how to do > this through leveraging the functions and routines within DMPlex. > This is on our agenda for the fall, but I can describe the process: a) Do a naive partition of the cells (easy) b) Read the connectivity for "your" cells (easy) c) Read "your" coordinates (tricky) d) Read "your" exterior (tricky) e) Create local DAGs (easy) f) Create SF for shared boundary (hard) g) Repartition and redistribute (easy) You could start writing this for your format and we could help. I probably will not get to the generic one until late in the year. Thanks, Matt Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianmail at gmail.com Sat Jun 27 13:25:04 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Sat, 27 Jun 2015 11:25:04 -0700 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: References: Message-ID: Dear Matthew, Berry and Dave, Thanks for your reply. I will do as you suggest: only two more questions: - is the beginning of formFunction the right place to set up the coordinates? As far as I understand I do not have access to the refined DM before running the code with -snes_grid_sequence. - if so, what should I check in order to avoid to recompute the coordinates at each formFunction call? e.g., what does DMGetCoordinates return NULL as the coordinate vector (or something else) if called when the coordinates have not yet been set up? Thanks again Gianluca On Fri, Jun 26, 2015 at 11:28 AM, Dave May wrote: > Also note that if you go and modify the global coordinate vector after > calling DMDASetUniformCoordinates, you will need to explicitly call the > vecscatter yourself to update the local coordinates. > > > > On Friday, 26 June 2015, Barry Smith wrote: > >> >> > On Jun 26, 2015, at 12:27 PM, Gianluca Meneghello >> wrote: >> > >> > Dear all, >> > >> > I would like to solve a PDE discretized on a nonuniform --- but >> rectangular --- grid and I wanted to use the DM coordinates vector to >> compute the metric terms by finite difference. The only alternative I see >> is to recompute the coordinates (and then the metric terms) at every >> function and jacobian evaluation call. >> > >> > My question is what is the best (or even correct) way to provide the >> coordinates to the newly created da objects. Is there anything like a >> DMDASetNonUniformCoordinates to which to pass a function computing the >> coordinates? As far as I can tell the fine grid coordinates are currently >> linearly interpolated from the coarse grid ones. >> >> Call DMDASetUniformCoordinates() on each level then call >> DMGetCoordinates() and put the coordinate values you want in. 
You can call >> DMGetCoordinateDM(dm, &dmcoor) to get the DM that goes with the coordinate >> vector and use DMDAVecGetArray(dmcoor,coor,&array) to give easy access to >> the entries. >> >> > >> > Please also let me thank you for your great work: it has been and it >> currently is of enormous help. >> > >> > Best >> > >> > Gianluca >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jun 27 13:28:11 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 27 Jun 2015 13:28:11 -0500 Subject: [petsc-users] PetscBinaryRead and quad precision In-Reply-To: References: Message-ID: > On Jun 26, 2015, at 10:46 PM, Justin Chang wrote: > > Barry, > > Thank you that kind of did the trick, although I have run into a couple issues: > > 1) Is it expected for quad precision to have a significantly worse performance than double precision in terms of wall-clock time? What took 60 seconds to solve with double precision now takes approximately 1500 seconds to solve. This is possible. You can look at -log_summary output for both cases to see where the time is spent. Double precision is done in hardware but quad precision is done in software by using the double precision operation multiple times so it is much slower. > > 2) The program crashes at DMPlexDistribute() when I used two or more processors. Here's the error log: > > Input Error: Incorrect sum of 0.000000 for tpwgts for constraint 0. > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error in METIS_PartGraphKway() > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.6-815-g2d9afd9 GIT Date: 2015-06-26 18:48:28 -0500 > [0]PETSC ERROR: ./main on a arch-linux2-quad-opt named pacotaco-xps by justin Fri Jun 26 22:29:36 2015 > [0]PETSC ERROR: Configure options --download-f2cblaslapack --download-metis --download-mpich --download-parmetis --download-triangle --with-cc=gcc --with-cmake=cmake --with-cxx=g++ --with-debugging=0 --with-fc=gfortran --with-valgrind=1 PETSC_ARCH=arch-linux2-quad-opt --with-precision=__float128 > [0]PETSC ERROR: #1 PetscPartitionerPartition_ParMetis() line 1181 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c > [0]PETSC ERROR: #2 PetscPartitionerPartition() line 653 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexpartition.c > [0]PETSC ERROR: #3 DMPlexDistribute() line 1505 in /home/justin/Software/petsc-dev/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #4 CreateMesh() line 762 in /home/justin/Dropbox/DMPlex-nonneg/main.c > [0]PETSC ERROR: #5 main() line 993 in /home/justin/Dropbox/DMPlex-nonneg/main.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -al 1 > [0]PETSC ERROR: -am 0 > [0]PETSC ERROR: -at 0.001 > [0]PETSC ERROR: -bcloc 0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1,0,1,0,1,0,0,0,1,0,1,1,1,0,1,0.45,0.55,0.45,0.55,0.45,0.55 > [0]PETSC ERROR: -bcnum 7 > [0]PETSC ERROR: -bcval 0,0,0,0,0,0,1 > [0]PETSC ERROR: -binary_read_double > [0]PETSC ERROR: -dim 3 > [0]PETSC ERROR: -dm_refine 1 > [0]PETSC ERROR: -dt 0.001 > [0]PETSC ERROR: -edges 3,3 > [0]PETSC ERROR: -floc 0.25,0.75,0.25,0.75,0.25,0.75 > [0]PETSC ERROR: -fnum 0 > [0]PETSC ERROR: -ftime 0,99 > [0]PETSC ERROR: -fval 1 > [0]PETSC ERROR: -ksp_max_it 50000 > [0]PETSC ERROR: -ksp_rtol 1.0e-10 > [0]PETSC ERROR: -ksp_type cg > [0]PETSC ERROR: 
-log_summary > [0]PETSC ERROR: -lower 0,0 > [0]PETSC ERROR: -mat_petscspace_order 0 > [0]PETSC ERROR: -mesh datafiles/cube_with_hole4_mesh.dat > [0]PETSC ERROR: -mu 1 > [0]PETSC ERROR: -nonneg 1 > [0]PETSC ERROR: -numsteps 0 > [0]PETSC ERROR: -options_left 0 > [0]PETSC ERROR: -pc_type jacobi > [0]PETSC ERROR: -petscpartitioner_type parmetis > [0]PETSC ERROR: -progress 1 > [0]PETSC ERROR: -simplex 1 > [0]PETSC ERROR: -solution_petscspace_order 1 > [0]PETSC ERROR: -tao_fatol 1e-8 > [0]PETSC ERROR: -tao_frtol 1e-8 > [0]PETSC ERROR: -tao_max_it 50000 > [0]PETSC ERROR: -tao_monitor > [0]PETSC ERROR: -tao_type blmvm > [0]PETSC ERROR: -tao_view > [0]PETSC ERROR: -trans datafiles/cube_with_hole4_trans.dat > [0]PETSC ERROR: -upper 1,1 > [0]PETSC ERROR: -vtuname figures/cube_with_hole_4 > [0]PETSC ERROR: -vtuprint 1 > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0 > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 12626 RUNNING AT pacotaco-xps > = EXIT CODE: 76 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > =================================================================================== > > Know what's going on here? > > Thanks, > Justin > > On Fri, Jun 26, 2015 at 11:24 AM, Barry Smith wrote: > > We have an undocumented option -binary_read_double that will read in double precision from the file and place it into a quad precision array. This is exactly what you need > > Barry > > Yes, we should fix up our binary viewers to allow reading and writing generally for any precision but need a volunteer to do it. > > > > > On Jun 26, 2015, at 1:45 AM, Justin Chang wrote: > > > > Hi all, > > > > I need to run simulations that rely on several of my custom binary datafiles (written in 32 bit int and 64 bit doubles). These date files were generated from MATLAB. In my PETSc code I invoke PetscBinaryRead(...) into these binary files, which gives me mesh data, auxiliaries, etc. > > > > However, when I now configure with quad precision (--with-precision=__float128 and --download-f2cblaslapack) my PetscBinaryRead() functions give me segmentation violation errors. I am guessing this is because I have binary files written in double precision but have PETSc which reads in quad precision, meaning I will be reading past the end of these files due to the larger strides. > > > > So my question is, is there a way to circumvent this issue? That is, to read double-precision binary data into a PETSc program configured with quad-precision? Otherwise I would have to rewrite or redo all of my datafiles, which I would prefer not to do if possible. > > > > Thanks, > > Justin > > > > From bsmith at mcs.anl.gov Sat Jun 27 13:31:23 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 27 Jun 2015 13:31:23 -0500 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: References: Message-ID: <92F3F647-6DDE-4BD1-A657-DCA520039DB2@mcs.anl.gov> > On Jun 27, 2015, at 1:25 PM, Gianluca Meneghello wrote: > > Dear Matthew, Berry and Dave, > > Thanks for your reply. I will do as you suggest: only two more questions: > > - is the beginning of formFunction the right place to set up the coordinates? 
As far as I understand I do not have access to the refined DM before running the code with -snes_grid_sequence. > > - if so, what should I check in order to avoid to recompute the coordinates at each formFunction call? e.g., what does DMGetCoordinates return NULL as the coordinate vector (or something else) if called when the coordinates have not yet been set up? Yes it returns NULL so you can use that as your check. > > Thanks again > > Gianluca > > > > On Fri, Jun 26, 2015 at 11:28 AM, Dave May wrote: > Also note that if you go and modify the global coordinate vector after calling DMDASetUniformCoordinates, you will need to explicitly call the vecscatter yourself to update the local coordinates. > > > > On Friday, 26 June 2015, Barry Smith wrote: > > > On Jun 26, 2015, at 12:27 PM, Gianluca Meneghello wrote: > > > > Dear all, > > > > I would like to solve a PDE discretized on a nonuniform --- but rectangular --- grid and I wanted to use the DM coordinates vector to compute the metric terms by finite difference. The only alternative I see is to recompute the coordinates (and then the metric terms) at every function and jacobian evaluation call. > > > > My question is what is the best (or even correct) way to provide the coordinates to the newly created da objects. Is there anything like a DMDASetNonUniformCoordinates to which to pass a function computing the coordinates? As far as I can tell the fine grid coordinates are currently linearly interpolated from the coarse grid ones. > > Call DMDASetUniformCoordinates() on each level then call DMGetCoordinates() and put the coordinate values you want in. You can call DMGetCoordinateDM(dm, &dmcoor) to get the DM that goes with the coordinate vector and use DMDAVecGetArray(dmcoor,coor,&array) to give easy access to the entries. > > > > > Please also let me thank you for your great work: it has been and it currently is of enormous help. > > > > Best > > > > Gianluca > > From s.kramer at imperial.ac.uk Sun Jun 28 11:52:15 2015 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Sun, 28 Jun 2015 17:52:15 +0100 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: Message-ID: <5590263F.9000107@imperial.ac.uk> >> Sorry if we're talking completely cross-purposes here: but the pcksp solve >> (that doesn't actually have a nullspace) is inside a solve that does have a >> nullspace. If what you are saying is that applying the nullspace inside the >> pcksp solve does not affect the outer solve, I can only see that if the >> difference in outcome of the pcksp solve (with or without the nullspace) >> would be inside the nullspace, because such a difference would be projected >> out anyway straight after the pcksp solve. I don't see that this is true >> however. My reasoning is the following: >> >> Say the original PCKSP solve (that doesn't apply the nullspace) gets >> passed some residual r from the outer solve, so it solves: >> >> Pz=r >> >> where P is our mass matrix approximation to the system in the outer solve, >> and P is full rank. Now if we do remove the nullspace during the pcksp >> solve, effectively we change the system to the following: >> >> NM^{-1}P z = NM^{-1}r >> >> where M^{-1} is the preconditioner inside the pcksp solve (assuming >> left-preconditiong here), and N is the projection operator that projects >> out the nullspace. This system is now rank-deficient, where I can add: >> >> z -> z + P^{-1}M n >> >> for arbitrary n in the nullspace. 
>> >> So not only is the possible difference between solving with or without a >> nullspace not found in that nullspace, but worse, I've ended up with a >> preconditioner that's rank deficient in the outer solve. >> > > This analysis does not make sense to me. The whole point of having all this > nullspace stuff is so that we do not get > a component of the null space in the solution. The way we avoid these in > Krylov methods is to project them out, > which is what happens when you attach it. Thus, your PCKSP solution 'z' > will not have any 'n' component. Your > outer solve will also not have any 'n' component, so I do not see where the > inconsistency comes in. The nullspace component does indeed get projected out in the pcksp solve. However that does not mean it gives the right answer - the right answer would be the same answer I would get before (when we weren't subtracting the nullspace in the pcksp) modulo the nullspace. For instance if my initial residual r is exactly Mn I would get the answer 0 instead of P^{-1}Mn - or equally correct would have been P^{-1}Mn with the nullspace component projected out afterwards, but not 0. So the inconsistency is in having a preconditioner that's not full-rank and whose nullspace is different than the nullspace of the real matrix in the outer solve. Some more observations: * if I switch the preconditioner inside the pcksp to be a right-preconditioner (-ksp_ksp_pc_side right) I get back the correct answer - which is kind of what I expect as in that case it should just give the "correct" pcksp answer with the nullspace projected out * if instead of gmres+sor for the pcksp solve, I select cg+sor (which is what I actually want), the pcksp solve fails with an indefinite pc. Both cases produce the same wrong answer in the outer solve (both using default left-preconditioning) Hope this makes sense Cheers Stephan From s.kramer at imperial.ac.uk Sun Jun 28 12:20:40 2015 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Sun, 28 Jun 2015 18:20:40 +0100 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <3A881365-E683-4DAB-B4EC-49236045F309@mcs.anl.gov> References: <3A881365-E683-4DAB-B4EC-49236045F309@mcs.anl.gov> Message-ID: <55902CE8.1050702@imperial.ac.uk> > If we move the location of the nullspace used in removal from the pmat to the mat would that completely resolve the problem for you? > > Barry That would indeed resolve the issue (and to me would also make the most sense) Stephan From bsmith at mcs.anl.gov Sun Jun 28 14:00:02 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 28 Jun 2015 14:00:02 -0500 Subject: [petsc-users] move from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <55902CE8.1050702@imperial.ac.uk> References: <3A881365-E683-4DAB-B4EC-49236045F309@mcs.anl.gov> <55902CE8.1050702@imperial.ac.uk> Message-ID: > On Jun 28, 2015, at 12:20 PM, Stephan Kramer wrote: > >> If we move the location of the nullspace used in removal from the pmat to the mat would that completely resolve the problem for you? >> >> Barry > > That would indeed resolve the issue (and to me would also make the most sense) Good. Unfortunately I realized later that this may not work when users use -snes_mf_operator which overwrites the operator provided by the user and any null space they attached to it. So the fix needs to be a little more subtle. 
Barry > > Stephan From jychang48 at gmail.com Mon Jun 29 03:20:30 2015 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 29 Jun 2015 03:20:30 -0500 Subject: [petsc-users] Parallel IO of DMPlex In-Reply-To: References: Message-ID: Given the format I described in the previous post, I have the following mesh: 8 9 3 2 0 1 3 1 4 3 1 2 4 2 5 4 3 4 6 4 7 6 4 5 7 5 8 7 0.0 0.0 0.5 0.0 1.0 0.0 0.0 0.5 0.5 0.5 1.0 0.5 0.0 1.0 0.5 1.0 1.0 1.0 8 0 1 2 3 5 6 7 8 If I If i want to partition across 2 MPI processes, I would: a) have all processes open the file and read the first four entires. b) declare that rank 0 holds cells 0-3 and rank 1 holds cell 4-7. A special algorithm would be needed to handle cases where the number of cells is not divisible by the number of MPI processes c) rank 0 stores 0 1 3 1 4 3 1 2 4 2 5 4, rank 1 reads 3 4 6 4 7 6 4 5 7 5 8 7 d) traverse the respective arrays and read in the coordinates associated with the vertices. I am guessing this part is "tricky" because I cannot simply do a naive partitioning of the coordinates the way I could with the connectivity? e) Wont worry about exterior nodes for now f) With the above I invoke DMPlexCreateFromDAG(), (of course making sure that the numbering is of right format) g) When creating a SF for shared boundary, does this mean determining which nodes are ghosted and which ones are local? For a structured looking grid this would be easy since I could employ some sort of round-robin scheme. h) Does the DMPlexDistribute() also take care of the repartitioning and redistribution? Thanks, Justin On Sat, Jun 27, 2015 at 6:01 AM, Matthew Knepley wrote: > On Sat, Jun 27, 2015 at 2:05 AM, Justin Chang wrote: > >> Hi everyone, >> >> I see that parallel IO of various input mesh formats for DMPlex is not >> currently supported. However, is there a way to read a custom mesh file in >> parallel? >> >> For instance, I have a binary data file formatted like this: >> >> >> >> >> >> > ....> >> > ....> >> >> > ....> >> >> Reading this will allow me to create a DMPlex with DMPlexCreateFromDAG(). >> Currently, only the root process reads this binary mesh file and creates >> the DMPlex whereas the other processes create an empty DMPlex. Then all >> processes invoke DMPlexDistribute(). This one-to-all distribution seems to >> be a bottleneck, and I want to know if it's possible to have each process >> read a local portion of the connectivity and coordinates arrays and let >> DMPlex/METIS/ParMETIS handle the load balancing and redistribution. >> Intuitively this would be easy to write, but again I want to know how to do >> this through leveraging the functions and routines within DMPlex. >> > > This is on our agenda for the fall, but I can describe the process: > > a) Do a naive partition of the cells (easy) > > b) Read the connectivity for "your" cells (easy) > > c) Read "your" coordinates (tricky) > > d) Read "your" exterior (tricky) > > e) Create local DAGs (easy) > > f) Create SF for shared boundary (hard) > > g) Repartition and redistribute (easy) > > You could start writing this for your format and we could help. I probably > will not get to the generic one until late in the year. > > Thanks, > > Matt > > Thanks, >> Justin >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Carol.Brickley at awe.co.uk Mon Jun 29 09:22:10 2015 From: Carol.Brickley at awe.co.uk (Carol.Brickley at awe.co.uk) Date: Mon, 29 Jun 2015 14:22:10 +0000 Subject: [petsc-users] QCG and KSPSetPCSide Message-ID: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> All, I am trying to use KSPSolve for a QCG solver using petsc 3.4.3 and it gives me the error of "PC does not have left symmetric apply!". Further reading on the web pages suggest I need to use KSPSetPCSide to set up a symmetric side for the solver. However, this does not seem to work. I am doing the following in an F90 code: call KSPSetPCType(ksp_ptr%ksp, 'qcg', petsc_err) call KSPSetPCSide(ksp_ptr%ksp, PC_SYMMETRIC, petsc_err) but I still get the same failure message when I call KSPSolve(ksp_ptr%ksp..) later on. Any ideas as to why this is not working? Carol Dr Carol Brickley BSc,PhD,ARCS,DIC,MBCS Senior Software Engineer Applied Computer Science DS+T, AWE Aldermaston Reading Berkshire RG7 4PR Direct: 0118 9855035 ___________________________________________________ ____________________________ The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR From hzhang at mcs.anl.gov Mon Jun 29 09:53:53 2015 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 29 Jun 2015 09:53:53 -0500 Subject: [petsc-users] QCG and KSPSetPCSide In-Reply-To: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> References: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> Message-ID: Carol, I tested qcg with petsc/src/ksp/ksp/examples/tutorials/ex2.c and found it crashes with np=2, pc=bjacobi. Then I found in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPQCG.html Notes: This is rarely used directly Currently we allow symmetric preconditioning with the following scaling matricesPCNONE : D = Identity matrix PCJACOBI : D = diag [d_1, d_2, ...., d_n], where d_i = sqrt(H[i,i]) PCICC : D = L^T, implemented with forward and backward solves. Here L is an incomplete Cholesky factor of H. Hong On Mon, Jun 29, 2015 at 9:22 AM, wrote: > All, > > I am trying to use KSPSolve for a QCG solver using petsc 3.4.3 and it > gives me the error of "PC does not have left symmetric apply!". Further > reading on the web pages suggest I need to use KSPSetPCSide to set up a > symmetric side for the solver. However, this does not seem to work. I am > doing the following in an F90 code: > > call KSPSetPCType(ksp_ptr%ksp, 'qcg', petsc_err) > call KSPSetPCSide(ksp_ptr%ksp, PC_SYMMETRIC, petsc_err) > > but I still get the same failure message when I call > KSPSolve(ksp_ptr%ksp..) later on. > > Any ideas as to why this is not working? > > Carol > > > > > > Dr Carol Brickley > BSc,PhD,ARCS,DIC,MBCS > > Senior Software Engineer > Applied Computer Science > DS+T, > AWE > Aldermaston > Reading > Berkshire > RG7 4PR > > Direct: 0118 9855035 > > > > ___________________________________________________ > ____________________________ > > The information in this email and in any attachment(s) is > commercial in confidence. 
If you are not the named addressee(s) > or > if you receive this email in error then any distribution, copying or > use of this communication or the information in it is strictly > prohibited. Please notify us immediately by email at > admin.internet(at)awe.co.uk, and then delete this message from > your computer. While attachments are virus checked, AWE plc > does not accept any liability in respect of any virus which is not > detected. > > AWE Plc > Registered in England and Wales > Registration No 02763902 > AWE, Aldermaston, Reading, RG7 4PR > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 29 09:54:45 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 Jun 2015 09:54:45 -0500 Subject: [petsc-users] QCG and KSPSetPCSide In-Reply-To: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> References: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> Message-ID: On Mon, Jun 29, 2015 at 9:22 AM, wrote: > All, > > I am trying to use KSPSolve for a QCG solver using petsc 3.4.3 and it > gives me the error of "PC does not have left symmetric apply!". Further > reading on the web pages suggest I CG is the Krylov Solver (KSP), but the error message is about the Preconditioner (PC). QCG by default uses symmetric application, and by default the PC is ILU which does not have a symmetric apply. You can use -ksp_pc_side left Thanks, Matt > need to use KSPSetPCSide to set up a symmetric side for the solver. > However, this does not seem to work. I am doing the following in an F90 > code: > > call KSPSetPCType(ksp_ptr%ksp, 'qcg', petsc_err) > call KSPSetPCSide(ksp_ptr%ksp, PC_SYMMETRIC, petsc_err) > > but I still get the same failure message when I call > KSPSolve(ksp_ptr%ksp..) later on. > > Any ideas as to why this is not working? > > Carol > > > > > > Dr Carol Brickley > BSc,PhD,ARCS,DIC,MBCS > > Senior Software Engineer > Applied Computer Science > DS+T, > AWE > Aldermaston > Reading > Berkshire > RG7 4PR > > Direct: 0118 9855035 > > > > ___________________________________________________ > ____________________________ > > The information in this email and in any attachment(s) is > commercial in confidence. If you are not the named addressee(s) > or > if you receive this email in error then any distribution, copying or > use of this communication or the information in it is strictly > prohibited. Please notify us immediately by email at > admin.internet(at)awe.co.uk, and then delete this message from > your computer. While attachments are virus checked, AWE plc > does not accept any liability in respect of any virus which is not > detected. > > AWE Plc > Registered in England and Wales > Registration No 02763902 > AWE, Aldermaston, Reading, RG7 4PR > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hong at aspiritech.org Mon Jun 29 09:59:24 2015 From: hong at aspiritech.org (hong at aspiritech.org) Date: Mon, 29 Jun 2015 09:59:24 -0500 Subject: [petsc-users] QCG and KSPSetPCSide In-Reply-To: References: <201506291422.t5TEM0SC024680@msw2.awe.co.uk> Message-ID: PETSc implementation of qcg only allows symmetric preconditioner, which is set in the code. 
Hong On Mon, Jun 29, 2015 at 9:54 AM, Matthew Knepley wrote: > On Mon, Jun 29, 2015 at 9:22 AM, wrote: > >> All, >> >> I am trying to use KSPSolve for a QCG solver using petsc 3.4.3 and it >> gives me the error of "PC does not have left symmetric apply!". Further >> reading on the web pages suggest I > > > CG is the Krylov Solver (KSP), but the error message is about the > Preconditioner (PC). QCG by default uses symmetric application, and by > default the PC is ILU which does not have a symmetric apply. You can use > -ksp_pc_side left > > Thanks, > > Matt > > >> need to use KSPSetPCSide to set up a symmetric side for the solver. >> However, this does not seem to work. I am doing the following in an F90 >> code: >> >> call KSPSetPCType(ksp_ptr%ksp, 'qcg', petsc_err) >> call KSPSetPCSide(ksp_ptr%ksp, PC_SYMMETRIC, petsc_err) >> >> but I still get the same failure message when I call >> KSPSolve(ksp_ptr%ksp..) later on. >> >> Any ideas as to why this is not working? >> >> Carol >> >> >> >> >> >> Dr Carol Brickley >> BSc,PhD,ARCS,DIC,MBCS >> >> Senior Software Engineer >> Applied Computer Science >> DS+T, >> AWE >> Aldermaston >> Reading >> Berkshire >> RG7 4PR >> >> Direct: 0118 9855035 >> >> >> >> ___________________________________________________ >> ____________________________ >> >> The information in this email and in any attachment(s) is >> commercial in confidence. If you are not the named addressee(s) >> or >> if you receive this email in error then any distribution, copying or >> use of this communication or the information in it is strictly >> prohibited. Please notify us immediately by email at >> admin.internet(at)awe.co.uk, and then delete this message from >> your computer. While attachments are virus checked, AWE plc >> does not accept any liability in respect of any virus which is not >> detected. >> >> AWE Plc >> Registered in England and Wales >> Registration No 02763902 >> AWE, Aldermaston, Reading, RG7 4PR >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Mon Jun 29 09:59:36 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 10:59:36 -0400 Subject: [petsc-users] Question about the error message Message-ID: <55915D58.2080707@purdue.edu> Dear PETSc developers and users, what does this error message mean? [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Argument out of range! [0]PETSC ERROR: nnz cannot be greater than row length: local row 0 value 108 rowlength 72! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: /home/mpovolot/NEMO5/prototype/nemo on a linux-static named conte-fe02.rcac.purdue.edu by mpovolot Mon Jun 29 10:49:32 2015 [0]PETSC ERROR: Libraries linked from /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/linux-static/lib [0]PETSC ERROR: Configure run at Sun Jan 19 12:47:22 2014 [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --download-sowing --with-scalar-type=complex --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ --with-fortran=1 --with-fortran-kernels=0 --with-debugging=0 --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=yes --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --download-mumps=yes --download-hypre=no --download-spooles=yes [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: MatAXPY_SeqAIJ() line 2710 in /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: MatAXPY() line 39 in /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/utils/axpy.c terminate called after throwing an instance of 'n5_runtime_error' what(): [PetscMatrixNemo] PETSc gave error with code 63: Argument out of range . Program received signal SIGABRT, Aborted. Thank you, Michael. From knepley at gmail.com Mon Jun 29 10:08:44 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 Jun 2015 10:08:44 -0500 Subject: [petsc-users] Question about the error message In-Reply-To: <55915D58.2080707@purdue.edu> References: <55915D58.2080707@purdue.edu> Message-ID: On Mon, Jun 29, 2015 at 9:59 AM, Michael Povolotskyi wrote: > Dear PETSc developers and users, > what does this error message mean? > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Argument out of range! > [0]PETSC ERROR: nnz cannot be greater than row length: local row 0 value > 108 rowlength 72! > It looks like a bug in our AXPY implementation. We try to do preallocation of the result, but it seems here that we are overallocating. Can you run with the latest release? Thanks, Matt > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: /home/mpovolot/NEMO5/prototype/nemo on a linux-static > named conte-fe02.rcac.purdue.edu by mpovolot Mon Jun 29 10:49:32 2015 > [0]PETSC ERROR: Libraries linked from > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/linux-static/lib > [0]PETSC ERROR: Configure run at Sun Jan 19 12:47:22 2014 > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc > --with-fc=mpiifort --download-sowing --with-scalar-type=complex > --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ > --with-fortran=1 --with-fortran-kernels=0 --with-debugging=0 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so > --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" > --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=yes > --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes > --download-mumps=yes --download-hypre=no --download-spooles=yes > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatAXPY_SeqAIJ() line 2710 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatAXPY() line 39 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/utils/axpy.c > terminate called after throwing an instance of 'n5_runtime_error' > what(): [PetscMatrixNemo] PETSc gave error with code 63: > Argument out of range > . > > Program received signal SIGABRT, Aborted. > > Thank you, > Michael. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Mon Jun 29 10:12:23 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 11:12:23 -0400 Subject: [petsc-users] Question about the error message In-Reply-To: References: <55915D58.2080707@purdue.edu> Message-ID: <55916057.4040008@purdue.edu> Thank you, I can try with a newer release. Have you re-implemented the function there? The problem is we have not move to the new API completely. Is there any workaround with Petsc 3.4? On 06/29/2015 11:08 AM, Matthew Knepley wrote: > On Mon, Jun 29, 2015 at 9:59 AM, Michael Povolotskyi > > wrote: > > Dear PETSc developers and users, > what does this error message mean? > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Argument out of range! > [0]PETSC ERROR: nnz cannot be greater than row length: local row 0 > value 108 rowlength 72! > > > It looks like a bug in our AXPY implementation. We try to do > preallocation of the result, but it seems > here that we are overallocating. 
Can you run with the latest release? > > Thanks, > > Matt > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: /home/mpovolot/NEMO5/prototype/nemo on a > linux-static named conte-fe02.rcac.purdue.edu > by mpovolot Mon Jun 29 > 10:49:32 2015 > [0]PETSC ERROR: Libraries linked from > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/linux-static/lib > [0]PETSC ERROR: Configure run at Sun Jan 19 12:47:22 2014 > [0]PETSC ERROR: Configure options --with-cc=mpiicc > --with-cxx=mpiicpc --with-fc=mpiifort --download-sowing > --with-scalar-type=complex --with-shared-libraries=0 --with-pic=1 > --with-clanguage=C++ --with-fortran=1 --with-fortran-kernels=0 > --with-debugging=0 > --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so > --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" > --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 > --download-hdf5=yes --download-metis=yes --download-parmetis=yes > --download-superlu_dist=yes --download-mumps=yes > --download-hypre=no --download-spooles=yes > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatAXPY_SeqAIJ() line 2710 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: MatAXPY() line 39 in > /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/utils/axpy.c > terminate called after throwing an instance of 'n5_runtime_error' > what(): [PetscMatrixNemo] PETSc gave error with code 63: > Argument out of range > . > > Program received signal SIGABRT, Aborted. > > Thank you, > Michael. > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 29 10:13:56 2015 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 Jun 2015 10:13:56 -0500 Subject: [petsc-users] Question about the error message In-Reply-To: <55916057.4040008@purdue.edu> References: <55915D58.2080707@purdue.edu> <55916057.4040008@purdue.edu> Message-ID: On Mon, Jun 29, 2015 at 10:12 AM, Michael Povolotskyi wrote: > Thank you, I can try with a newer release. 
> Have you re-implemented the function there? > The problem is we have not move to the new API completely. > Is there any workaround with Petsc 3.4? > I think there was a bugfix here, but I would have to confirm in the logs. If it works, we could try and backport the solution since the interface did not change. Thanks, Matt > On 06/29/2015 11:08 AM, Matthew Knepley wrote: > > On Mon, Jun 29, 2015 at 9:59 AM, Michael Povolotskyi > wrote: > >> Dear PETSc developers and users, >> what does this error message mean? >> >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Argument out of range! >> [0]PETSC ERROR: nnz cannot be greater than row length: local row 0 value >> 108 rowlength 72! >> > > It looks like a bug in our AXPY implementation. We try to do > preallocation of the result, but it seems > here that we are overallocating. Can you run with the latest release? > > Thanks, > > Matt > > >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 >> [0]PETSC ERROR: See docs/changes/index.html for recent updates. >> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. >> [0]PETSC ERROR: See docs/index.html for manual pages. >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: /home/mpovolot/NEMO5/prototype/nemo on a linux-static >> named conte-fe02.rcac.purdue.edu by mpovolot Mon Jun 29 10:49:32 2015 >> [0]PETSC ERROR: Libraries linked from >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/linux-static/lib >> [0]PETSC ERROR: Configure run at Sun Jan 19 12:47:22 2014 >> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc >> --with-fc=mpiifort --download-sowing --with-scalar-type=complex >> --with-shared-libraries=0 --with-pic=1 --with-clanguage=C++ >> --with-fortran=1 --with-fortran-kernels=0 --with-debugging=0 >> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so >> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" >> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --download-hdf5=yes >> --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes >> --download-mumps=yes --download-hypre=no --download-spooles=yes >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatAXPY_SeqAIJ() line 2710 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatAXPY() line 39 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/utils/axpy.c >> terminate called after throwing an instance of 'n5_runtime_error' >> what(): [PetscMatrixNemo] PETSc gave error with code 63: >> Argument out of range >> . 
>> >> Program received signal SIGABRT, Aborted. >> >> Thank you, >> Michael. >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Mon Jun 29 10:15:14 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 11:15:14 -0400 Subject: [petsc-users] Question about the error message In-Reply-To: References: <55915D58.2080707@purdue.edu> <55916057.4040008@purdue.edu> Message-ID: <55916102.40003@purdue.edu> This would be great. Thank you, Michael. On 06/29/2015 11:13 AM, Matthew Knepley wrote: > On Mon, Jun 29, 2015 at 10:12 AM, Michael Povolotskyi > > wrote: > > Thank you, I can try with a newer release. > Have you re-implemented the function there? > The problem is we have not move to the new API completely. > Is there any workaround with Petsc 3.4? > > > I think there was a bugfix here, but I would have to confirm in the > logs. If it works, we could try and backport > the solution since the interface did not change. > > Thanks, > > Matt > > On 06/29/2015 11:08 AM, Matthew Knepley wrote: >> On Mon, Jun 29, 2015 at 9:59 AM, Michael Povolotskyi >> > wrote: >> >> Dear PETSc developers and users, >> what does this error message mean? >> >> [0]PETSC ERROR: --------------------- Error Message >> ------------------------------------ >> [0]PETSC ERROR: Argument out of range! >> [0]PETSC ERROR: nnz cannot be greater than row length: local >> row 0 value 108 rowlength 72! >> >> >> It looks like a bug in our AXPY implementation. We try to do >> preallocation of the result, but it seems >> here that we are overallocating. Can you run with the latest release? >> >> Thanks, >> >> Matt >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013 >> [0]PETSC ERROR: See docs/changes/index.html for recent updates. >> [0]PETSC ERROR: See docs/faq.html for hints about trouble >> shooting. >> [0]PETSC ERROR: See docs/index.html for manual pages. 
>> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: /home/mpovolot/NEMO5/prototype/nemo on a >> linux-static named conte-fe02.rcac.purdue.edu >> by mpovolot Mon Jun 29 >> 10:49:32 2015 >> [0]PETSC ERROR: Libraries linked from >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/linux-static/lib >> [0]PETSC ERROR: Configure run at Sun Jan 19 12:47:22 2014 >> [0]PETSC ERROR: Configure options --with-cc=mpiicc >> --with-cxx=mpiicpc --with-fc=mpiifort --download-sowing >> --with-scalar-type=complex --with-shared-libraries=0 >> --with-pic=1 --with-clanguage=C++ --with-fortran=1 >> --with-fortran-kernels=0 --with-debugging=0 >> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so >> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64 >> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" >> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include >> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >> --download-hdf5=yes --download-metis=yes >> --download-parmetis=yes --download-superlu_dist=yes >> --download-mumps=yes --download-hypre=no --download-spooles=yes >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 >> in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatAXPY_SeqAIJ() line 2710 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/impls/aij/seq/aij.c >> [0]PETSC ERROR: MatAXPY() line 39 in >> /apps/rhel6/petsc/3.4.3_impi-4.1.1.036_intel-13.1.1.163/src/mat/utils/axpy.c >> terminate called after throwing an instance of 'n5_runtime_error' >> what(): [PetscMatrixNemo] PETSc gave error with code 63: >> Argument out of range >> . >> >> Program received signal SIGABRT, Aborted. >> >> Thank you, >> Michael. >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone(765) 4949396 > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mpovolot at purdue.edu Mon Jun 29 10:49:42 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 11:49:42 -0400 Subject: [petsc-users] Another question about MatAXPY Message-ID: <55916916.6050504@purdue.edu> Dear developers, I have the following question: Imagine that I have two matrices A and B Both are created using MatCreateSeqAIJ (MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) with the same first 5 arguments. In addition, for both matrices I call MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); After this I set some entries to both A and B matrices and assemble them. Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? Thank you, Michael. -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Xinya.Li at pnnl.gov Mon Jun 29 11:57:56 2015 From: Xinya.Li at pnnl.gov (Li, Xinya) Date: Mon, 29 Jun 2015 16:57:56 +0000 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Barry, Here is the new output without debugging. Thank you. Xinya ******************************************************************** -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Friday, June 26, 2015 12:04 PM To: Li, Xinya Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] TSSolve problems ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## First you need to configure PETSc again without all the debugging. So do, for example, export PETSC=arch-opt ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-scalar-type=complex --with-clanguage=C++ --with-cxx-dialect=C++11 --download-mpich --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental make all test then recompile and rerun your example again with -log_summary and send the output. Note that you should not pass --download-fblaslapack nor the fortran kernel stuff. Barry > On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: > > Barry, > > Thank you for your response. > > Attached is the output. SNESSolve was taking most of the time. > > > Xinya > > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Thursday, June 25, 2015 5:47 PM > To: Li, Xinya > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSSolve problems > > > Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. > > Barry > >> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >> >> Dear Sir, >> >> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. >> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. >> Which parts should be taking into account to accelerate TSSolve? >> Thank you very much. >> Regards >> __________________________________________________ >> Xinya Li >> Scientist >> EED/Hydrology >> Pacific Northwest National Laboratory >> 902 Battelle Boulevard >> P.O. 
Box 999, MSIN K9-33 >> Richland, WA 99352 USA >> Tel: 509-372-6248 >> Fax: 509-372-6089 >> Xinya.Li at pnl.gov > > <288g1081b_short.log> -------------- next part -------------- A non-text attachment was scrubbed... Name: d288gen.log Type: application/octet-stream Size: 17837 bytes Desc: d288gen.log URL: From bsmith at mcs.anl.gov Mon Jun 29 12:58:38 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 12:58:38 -0500 Subject: [petsc-users] Another question about MatAXPY In-Reply-To: <55916916.6050504@purdue.edu> References: <55916916.6050504@purdue.edu> Message-ID: > On Jun 29, 2015, at 10:49 AM, Michael Povolotskyi wrote: > > Dear developers, > I have the following question: > > Imagine that I have two matrices A and B > Both are created using MatCreateSeqAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) > with the same first 5 arguments. > In addition, for both matrices I call > MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); > > After this I set some entries to both A and B matrices and assemble them. > Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? Only if you KNOW that they have the exact same nonzero pattern. The MatAssembly "squeezes" out any "extra" space that was preallocated but was not actually used when calling MatSetValues() so even if two matrices have the same preallocation it doesn't mean they have the same nonzero pattern. Barry > Thank you, > Michael. > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > From bsmith at mcs.anl.gov Mon Jun 29 13:26:49 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 13:26:49 -0500 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Is your code C++? count time percent time --------------------------------------------------------------------- TSStep 600 3.1174e+02 100 TSFunctionEval 2937 1.4288e+02 46 TSJacobianEval 1737 1.3074e+02 42 KSPSolve 1737 3.5144e+01 11 Ok I pulled out the important time from the table. 46 percent of the time is in your function evaluation, 42 percent in the Jacobian evaluation and 11 percent in the linear solve. The only way to improve the time significantly is by speeding up the function and Jacobian computations. What is happening in those routines, can you email them? Barry > On Jun 29, 2015, at 11:57 AM, Li, Xinya wrote: > > Barry, > > Here is the new output without debugging. > > Thank you. > > Xinya > > ******************************************************************** > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Friday, June 26, 2015 12:04 PM > To: Li, Xinya > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSSolve problems > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > First you need to configure PETSc again without all the debugging. 
So do, for example, > > export PETSC=arch-opt > ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-scalar-type=complex --with-clanguage=C++ --with-cxx-dialect=C++11 --download-mpich --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental make all test > > then recompile and rerun your example again with -log_summary and send the output. > > Note that you should not pass --download-fblaslapack nor the fortran kernel stuff. > > Barry > > >> On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: >> >> Barry, >> >> Thank you for your response. >> >> Attached is the output. SNESSolve was taking most of the time. >> >> >> Xinya >> >> >> >> -----Original Message----- >> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> Sent: Thursday, June 25, 2015 5:47 PM >> To: Li, Xinya >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] TSSolve problems >> >> >> Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. >> >> Barry >> >>> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >>> >>> Dear Sir, >>> >>> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. >>> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. >>> Which parts should be taking into account to accelerate TSSolve? >>> Thank you very much. >>> Regards >>> __________________________________________________ >>> Xinya Li >>> Scientist >>> EED/Hydrology >>> Pacific Northwest National Laboratory >>> 902 Battelle Boulevard >>> P.O. Box 999, MSIN K9-33 >>> Richland, WA 99352 USA >>> Tel: 509-372-6248 >>> Fax: 509-372-6089 >>> Xinya.Li at pnl.gov >> >> <288g1081b_short.log> > > From mpovolot at purdue.edu Mon Jun 29 13:28:36 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 14:28:36 -0400 Subject: [petsc-users] Another question about MatAXPY In-Reply-To: References: <55916916.6050504@purdue.edu> Message-ID: <55918E54.6020204@purdue.edu> Thank you. Is there a way to disable the squeeze? Michael. On 06/29/2015 01:58 PM, Barry Smith wrote: >> On Jun 29, 2015, at 10:49 AM, Michael Povolotskyi wrote: >> >> Dear developers, >> I have the following question: >> >> Imagine that I have two matrices A and B >> Both are created using MatCreateSeqAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) >> with the same first 5 arguments. >> In addition, for both matrices I call >> MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); >> >> After this I set some entries to both A and B matrices and assemble them. >> Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? > Only if you KNOW that they have the exact same nonzero pattern. The MatAssembly "squeezes" out any "extra" space that was preallocated but was not actually used when calling MatSetValues() so even if two matrices have the same preallocation it doesn't mean they have the same nonzero pattern. > > Barry > > >> Thank you, >> Michael. >> -- >> Michael Povolotskyi, PhD >> Research Assistant Professor >> Network for Computational Nanotechnology >> Hall for Discover and Learning Research, Room 441 >> West Lafayette, IN 47907 >> Phone (765) 4949396 >> From Xinya.Li at pnnl.gov Mon Jun 29 13:35:43 2015 From: Xinya.Li at pnnl.gov (Li, Xinya) Date: Mon, 29 Jun 2015 18:35:43 +0000 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Yes. it is C++. 
I attached the major routines for TSSolve. Thanks Xinya -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Monday, June 29, 2015 11:27 AM To: Li, Xinya Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] TSSolve problems Is your code C++? count time percent time --------------------------------------------------------------------- TSStep 600 3.1174e+02 100 TSFunctionEval 2937 1.4288e+02 46 TSJacobianEval 1737 1.3074e+02 42 KSPSolve 1737 3.5144e+01 11 Ok I pulled out the important time from the table. 46 percent of the time is in your function evaluation, 42 percent in the Jacobian evaluation and 11 percent in the linear solve. The only way to improve the time significantly is by speeding up the function and Jacobian computations. What is happening in those routines, can you email them? Barry > On Jun 29, 2015, at 11:57 AM, Li, Xinya wrote: > > Barry, > > Here is the new output without debugging. > > Thank you. > > Xinya > > ******************************************************************** > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Friday, June 26, 2015 12:04 PM > To: Li, Xinya > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSSolve problems > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > First you need to configure PETSc again without all the debugging. So do, for example, > > export PETSC=arch-opt > ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-scalar-type=complex --with-clanguage=C++ --with-cxx-dialect=C++11 --download-mpich --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental make all test > > then recompile and rerun your example again with -log_summary and send the output. > > Note that you should not pass --download-fblaslapack nor the fortran kernel stuff. > > Barry > > >> On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: >> >> Barry, >> >> Thank you for your response. >> >> Attached is the output. SNESSolve was taking most of the time. >> >> >> Xinya >> >> >> >> -----Original Message----- >> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> Sent: Thursday, June 25, 2015 5:47 PM >> To: Li, Xinya >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] TSSolve problems >> >> >> Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. >> >> Barry >> >>> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >>> >>> Dear Sir, >>> >>> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. >>> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. >>> Which parts should be taking into account to accelerate TSSolve? >>> Thank you very much. >>> Regards >>> __________________________________________________ >>> Xinya Li >>> Scientist >>> EED/Hydrology >>> Pacific Northwest National Laboratory >>> 902 Battelle Boulevard >>> P.O. 
Box 999, MSIN K9-33 >>> Richland, WA 99352 USA >>> Tel: 509-372-6248 >>> Fax: 509-372-6089 >>> Xinya.Li at pnl.gov >> >> <288g1081b_short.log> > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: simulation.C URL: From bsmith at mcs.anl.gov Mon Jun 29 13:38:12 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 13:38:12 -0500 Subject: [petsc-users] Another question about MatAXPY In-Reply-To: <55918E54.6020204@purdue.edu> References: <55916916.6050504@purdue.edu> <55918E54.6020204@purdue.edu> Message-ID: > On Jun 29, 2015, at 1:28 PM, Michael Povolotskyi wrote: > > Thank you. > Is there a way to disable the squeeze? Not directly (it won't know what to do with the empty space). If you want to insure that a sparse matrix has a particular nonzero structure then just set values of 0.0 into the "extra" locations you want represented in the "nonzero pattern". One simple way to do this is call MatSetValues() for the nonzero structure you want with all zero entries, then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) and then call MatSetValues() to put the numerical values you want to set (which will be a subset of the location you put in with zeros) then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) again. Barry > Michael. > > On 06/29/2015 01:58 PM, Barry Smith wrote: >>> On Jun 29, 2015, at 10:49 AM, Michael Povolotskyi wrote: >>> >>> Dear developers, >>> I have the following question: >>> >>> Imagine that I have two matrices A and B >>> Both are created using MatCreateSeqAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) >>> with the same first 5 arguments. >>> In addition, for both matrices I call >>> MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); >>> >>> After this I set some entries to both A and B matrices and assemble them. >>> Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? >> Only if you KNOW that they have the exact same nonzero pattern. The MatAssembly "squeezes" out any "extra" space that was preallocated but was not actually used when calling MatSetValues() so even if two matrices have the same preallocation it doesn't mean they have the same nonzero pattern. >> >> Barry >> >> >>> Thank you, >>> Michael. >>> -- >>> Michael Povolotskyi, PhD >>> Research Assistant Professor >>> Network for Computational Nanotechnology >>> Hall for Discover and Learning Research, Room 441 >>> West Lafayette, IN 47907 >>> Phone (765) 4949396 >>> > From mpovolot at purdue.edu Mon Jun 29 13:42:26 2015 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 29 Jun 2015 14:42:26 -0400 Subject: [petsc-users] Another question about MatAXPY In-Reply-To: References: <55916916.6050504@purdue.edu> <55918E54.6020204@purdue.edu> Message-ID: <55919192.6020200@purdue.edu> Will a call to MatZeroEntries and MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) do same as MatSetValues() with zero values? Michael. On 06/29/2015 02:38 PM, Barry Smith wrote: >> On Jun 29, 2015, at 1:28 PM, Michael Povolotskyi wrote: >> >> Thank you. >> Is there a way to disable the squeeze? > Not directly (it won't know what to do with the empty space). If you want to insure that a sparse matrix has a particular nonzero structure then just set values of 0.0 into the "extra" locations you want represented in the "nonzero pattern". 
One simple way to do this is call MatSetValues() for the nonzero structure you want with all zero entries, then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) and then call MatSetValues() to put the numerical values you want to set (which will be a subset of the location you put in with zeros) then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) again. > > Barry > >> Michael. >> >> On 06/29/2015 01:58 PM, Barry Smith wrote: >>>> On Jun 29, 2015, at 10:49 AM, Michael Povolotskyi wrote: >>>> >>>> Dear developers, >>>> I have the following question: >>>> >>>> Imagine that I have two matrices A and B >>>> Both are created using MatCreateSeqAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) >>>> with the same first 5 arguments. >>>> In addition, for both matrices I call >>>> MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); >>>> >>>> After this I set some entries to both A and B matrices and assemble them. >>>> Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? >>> Only if you KNOW that they have the exact same nonzero pattern. The MatAssembly "squeezes" out any "extra" space that was preallocated but was not actually used when calling MatSetValues() so even if two matrices have the same preallocation it doesn't mean they have the same nonzero pattern. >>> >>> Barry >>> >>> >>>> Thank you, >>>> Michael. >>>> -- >>>> Michael Povolotskyi, PhD >>>> Research Assistant Professor >>>> Network for Computational Nanotechnology >>>> Hall for Discover and Learning Research, Room 441 >>>> West Lafayette, IN 47907 >>>> Phone (765) 4949396 >>>> -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 From bsmith at mcs.anl.gov Mon Jun 29 13:49:50 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 13:49:50 -0500 Subject: [petsc-users] Another question about MatAXPY In-Reply-To: <55919192.6020200@purdue.edu> References: <55916916.6050504@purdue.edu> <55918E54.6020204@purdue.edu> <55919192.6020200@purdue.edu> Message-ID: <00F161BE-0E78-4DDC-9DC6-1B0220CFD618@mcs.anl.gov> > On Jun 29, 2015, at 1:42 PM, Michael Povolotskyi wrote: > > Will a call to MatZeroEntries > and MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) do same as MatSetValues() with zero values? No, unfortunately not. If you call MatZeroEntires() before you have put any entries (and their locations) in it has no way to know what nonzero pattern you actually want. You have to tell it the "nonzero" entries so it knows where you want them. Barry > Michael. > > > On 06/29/2015 02:38 PM, Barry Smith wrote: >>> On Jun 29, 2015, at 1:28 PM, Michael Povolotskyi wrote: >>> >>> Thank you. >>> Is there a way to disable the squeeze? >> Not directly (it won't know what to do with the empty space). If you want to insure that a sparse matrix has a particular nonzero structure then just set values of 0.0 into the "extra" locations you want represented in the "nonzero pattern". One simple way to do this is call MatSetValues() for the nonzero structure you want with all zero entries, then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) and then call MatSetValues() to put the numerical values you want to set (which will be a subset of the location you put in with zeros) then call MatAssemblyBegin/End(mat,MAT_FINAL_ASSEMBLY) again. >> >> Barry >> >>> Michael. 
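A minimal sketch of that two-pass approach for two SeqAIJ matrices (the size and the diagonal/superdiagonal pattern here are purely illustrative and not from the thread; error checking omitted): first insert explicit zeros over the whole intended pattern in both matrices and assemble, then insert the real values and assemble again, and only then rely on SAME_NONZERO_PATTERN in MatAXPY.

  Mat         A, B;
  PetscInt    i, n = 100, cols[2];
  PetscScalar zeros[2] = {0.0, 0.0}, v;

  MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 2, NULL, &A);
  MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 2, NULL, &B);

  /* Pass 1: explicit zeros at every location of the common pattern
     (diagonal plus superdiagonal).  Explicit zeros are stored by default,
     so assembly does not "squeeze" these locations out. */
  for (i = 0; i < n; i++) {
    PetscInt ncols = (i < n - 1) ? 2 : 1;
    cols[0] = i; cols[1] = i + 1;
    MatSetValues(A, 1, &i, ncols, cols, zeros, INSERT_VALUES);
    MatSetValues(B, 1, &i, ncols, cols, zeros, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);

  /* Pass 2: the actual numerical values, a subset of the zeroed locations,
     so the assembled nonzero pattern of A and B no longer changes. */
  for (i = 0; i < n; i++) {
    v = 2.0; MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);
    v = 1.0; MatSetValues(B, 1, &i, 1, &i, &v, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);

  /* Both matrices now share the same nonzero pattern by construction,
     so B += A with SAME_NONZERO_PATTERN is safe. */
  MatAXPY(B, 1.0, A, SAME_NONZERO_PATTERN);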
>>> >>> On 06/29/2015 01:58 PM, Barry Smith wrote: >>>>> On Jun 29, 2015, at 10:49 AM, Michael Povolotskyi wrote: >>>>> >>>>> Dear developers, >>>>> I have the following question: >>>>> >>>>> Imagine that I have two matrices A and B >>>>> Both are created using MatCreateSeqAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt nz,const PetscInt nnz[],Mat *A) >>>>> with the same first 5 arguments. >>>>> In addition, for both matrices I call >>>>> MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); >>>>> >>>>> After this I set some entries to both A and B matrices and assemble them. >>>>> Question: can I use MatAXPY with SAME_NONZERO_PATTERN argument? >>>> Only if you KNOW that they have the exact same nonzero pattern. The MatAssembly "squeezes" out any "extra" space that was preallocated but was not actually used when calling MatSetValues() so even if two matrices have the same preallocation it doesn't mean they have the same nonzero pattern. >>>> >>>> Barry >>>> >>>> >>>>> Thank you, >>>>> Michael. >>>>> -- >>>>> Michael Povolotskyi, PhD >>>>> Research Assistant Professor >>>>> Network for Computational Nanotechnology >>>>> Hall for Discover and Learning Research, Room 441 >>>>> West Lafayette, IN 47907 >>>>> Phone (765) 4949396 >>>>> > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > From gianmail at gmail.com Mon Jun 29 21:34:37 2015 From: gianmail at gmail.com (Gianluca Meneghello) Date: Mon, 29 Jun 2015 19:34:37 -0700 Subject: [petsc-users] DM coordinates interpolations In-Reply-To: <92F3F647-6DDE-4BD1-A657-DCA520039DB2@mcs.anl.gov> References: <92F3F647-6DDE-4BD1-A657-DCA520039DB2@mcs.anl.gov> Message-ID: Thanks! Gianluca On Saturday, June 27, 2015, Barry Smith wrote: > > > On Jun 27, 2015, at 1:25 PM, Gianluca Meneghello > wrote: > > > > Dear Matthew, Berry and Dave, > > > > Thanks for your reply. I will do as you suggest: only two more questions: > > > > - is the beginning of formFunction the right place to set up the > coordinates? As far as I understand I do not have access to the refined DM > before running the code with -snes_grid_sequence. > > > > - if so, what should I check in order to avoid to recompute the > coordinates at each formFunction call? e.g., what does DMGetCoordinates > return NULL as the coordinate vector (or something else) if called when the > coordinates have not yet been set up? > > Yes it returns NULL so you can use that as your check. > > > > > Thanks again > > > > Gianluca > > > > > > > > On Fri, Jun 26, 2015 at 11:28 AM, Dave May > wrote: > > Also note that if you go and modify the global coordinate vector after > calling DMDASetUniformCoordinates, you will need to explicitly call the > vecscatter yourself to update the local coordinates. > > > > > > > > On Friday, 26 June 2015, Barry Smith > > wrote: > > > > > On Jun 26, 2015, at 12:27 PM, Gianluca Meneghello > wrote: > > > > > > Dear all, > > > > > > I would like to solve a PDE discretized on a nonuniform --- but > rectangular --- grid and I wanted to use the DM coordinates vector to > compute the metric terms by finite difference. The only alternative I see > is to recompute the coordinates (and then the metric terms) at every > function and jacobian evaluation call. > > > > > > My question is what is the best (or even correct) way to provide the > coordinates to the newly created da objects. 
Is there anything like a > DMDASetNonUniformCoordinates to which to pass a function computing the > coordinates? As far as I can tell the fine grid coordinates are currently > linearly interpolated from the coarse grid ones. > > > > Call DMDASetUniformCoordinates() on each level then call > DMGetCoordinates() and put the coordinate values you want in. You can call > DMGetCoordinateDM(dm, &dmcoor) to get the DM that goes with the coordinate > vector and use DMDAVecGetArray(dmcoor,coor,&array) to give easy access to > the entries. > > > > > > > > Please also let me thank you for your great work: it has been and it > currently is of enormous help. > > > > > > Best > > > > > > Gianluca > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon Jun 29 21:50:58 2015 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 30 Jun 2015 14:50:58 +1200 Subject: [petsc-users] DMDAVecGetArrayReadF90 Message-ID: <55920412.1060105@auckland.ac.nz> hi, I'm using a SNES to solve a nonlinear problem on a DMDA, using Fortran. In my function to be minimized I need an array on the solution vector. If I use DMDAVecGetArrayF90(), I get an 'object is in wrong state' error because the solution vector has been locked read-only by the SNES. If I'm using DMPLex I use VecGetArrayReadF90() to get a read-only array, but for DMDA there doesn't seem to be a corresponding DMDAVecGetArrayReadF90(). Is there something else I should use instead? - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 84611 From bsmith at mcs.anl.gov Mon Jun 29 22:25:36 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 22:25:36 -0500 Subject: [petsc-users] DMDAVecGetArrayReadF90 In-Reply-To: <55920412.1060105@auckland.ac.nz> References: <55920412.1060105@auckland.ac.nz> Message-ID: You are correct that those routines should exist but simply do not exist in the PETSc 3.6 release. I have added them in the branch barry/add-dmdavecgetarrayreadf90/maint as well as the attached patch. Please let me know if you have any trouble with them, Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: add-dmdavecgetarrayreadf90.patch Type: application/octet-stream Size: 13210 bytes Desc: not available URL: -------------- next part -------------- > On Jun 29, 2015, at 9:50 PM, Adrian Croucher wrote: > > hi, I'm using a SNES to solve a nonlinear problem on a DMDA, using Fortran. > > In my function to be minimized I need an array on the solution vector. If I use DMDAVecGetArrayF90(), I get an 'object is in wrong state' error because the solution vector has been locked read-only by the SNES. > > If I'm using DMPLex I use VecGetArrayReadF90() to get a read-only array, but for DMDA there doesn't seem to be a corresponding DMDAVecGetArrayReadF90(). Is there something else I should use instead? > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 84611 > From bsmith at mcs.anl.gov Mon Jun 29 22:28:40 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 Jun 2015 22:28:40 -0500 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: > On Jun 29, 2015, at 1:35 PM, Li, Xinya wrote: > > Yes. it is C++. > I attached the major routines for TSSolve. 
Thanks. There is not much in the functions to explain why they are taking the vast bulk of the time but I am going to go out on a limb and guess that it is the C++ implementation of complex numbers and the use of the cos() and sin() of those complex numbers that it causing the terrible performance. Since your code has actually very little C++ specific stuff in it I would suggest building PETSc without the --with-clanguage=c++ and use C for everything, changing your code as needed to remove the C++ isms. In C99 complex is a native data type (not a C++ class) so can potentially give better performance. Barry > > Thanks > Xinya > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Monday, June 29, 2015 11:27 AM > To: Li, Xinya > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] TSSolve problems > > > Is your code C++? > > count time percent time > --------------------------------------------------------------------- > TSStep 600 3.1174e+02 100 > TSFunctionEval 2937 1.4288e+02 46 > TSJacobianEval 1737 1.3074e+02 42 > KSPSolve 1737 3.5144e+01 11 > > Ok I pulled out the important time from the table. 46 percent of the time is in your function evaluation, 42 percent in the Jacobian evaluation and 11 percent in the linear solve. > > The only way to improve the time significantly is by speeding up the function and Jacobian computations. What is happening in those routines, can you email them? > > Barry > > > > > >> On Jun 29, 2015, at 11:57 AM, Li, Xinya wrote: >> >> Barry, >> >> Here is the new output without debugging. >> >> Thank you. >> >> Xinya >> >> ******************************************************************** >> >> >> -----Original Message----- >> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> Sent: Friday, June 26, 2015 12:04 PM >> To: Li, Xinya >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] TSSolve problems >> >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # This code was compiled with a debugging option, # >> # To get timing results run ./configure # >> # using --with-debugging=no, the performance will # >> # be generally two or three times faster. # >> # # >> ########################################################## >> >> First you need to configure PETSc again without all the debugging. So do, for example, >> >> export PETSC=arch-opt >> ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-scalar-type=complex --with-clanguage=C++ --with-cxx-dialect=C++11 --download-mpich --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental make all test >> >> then recompile and rerun your example again with -log_summary and send the output. >> >> Note that you should not pass --download-fblaslapack nor the fortran kernel stuff. >> >> Barry >> >> >>> On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: >>> >>> Barry, >>> >>> Thank you for your response. >>> >>> Attached is the output. SNESSolve was taking most of the time. >>> >>> >>> Xinya >>> >>> >>> >>> -----Original Message----- >>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>> Sent: Thursday, June 25, 2015 5:47 PM >>> To: Li, Xinya >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] TSSolve problems >>> >>> >>> Run with -ts_view -log_summary and send the output. This will tell the current solvers and where the time is being spent. 
>>> >>> Barry >>> >>>> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >>>> >>>> Dear Sir, >>>> >>>> I am using the ts solver to solve a set of ODE and DAE. The Jacobian matrix is a 1152 *1152 sparse complex matrix. >>>> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to improve the speed of TSSolve. >>>> Which parts should be taking into account to accelerate TSSolve? >>>> Thank you very much. >>>> Regards >>>> __________________________________________________ >>>> Xinya Li >>>> Scientist >>>> EED/Hydrology >>>> Pacific Northwest National Laboratory >>>> 902 Battelle Boulevard >>>> P.O. Box 999, MSIN K9-33 >>>> Richland, WA 99352 USA >>>> Tel: 509-372-6248 >>>> Fax: 509-372-6089 >>>> Xinya.Li at pnl.gov >>> >>> <288g1081b_short.log> >> >> > > From sander.land at gmail.com Tue Jun 30 05:43:40 2015 From: sander.land at gmail.com (Sander Land) Date: Tue, 30 Jun 2015 11:43:40 +0100 Subject: [petsc-users] Fieldsplit PC null pointer error on getksp Message-ID: I am trying to use the Schur complement preconditioner in petsc, but am encountering a null argument error calling PCFieldSplitGetSubKSP. This only happens on PC_COMPOSITE_SCHUR, the multiplicative/additive options do return a KSP array. Error message: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Object: Parameter # 1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: ./elecmech on a arch-linux2-cxx-debug [0]PETSC ERROR: Configure options --with-shared-libraries=1 --with-debugging=1 --download-openmpi=1 --with-clanguage=c++ --download-fblaslapack=1 --download-scalapack=1 --download-blacs=1 --download-suitesparse=1 --download-pastix=1 --download-superlu=1 --dowload-essl=1 --download-ptscotch=1 --download-mumps=1 --download-lusol=1 [0]PETSC ERROR: #1 MatSchurComplementGetKSP() line 317 in petsc-3.6/src/ksp/ksp/utils/schurm.c [0]PETSC ERROR: #2 PCFieldSplitGetSubKSP_FieldSplit_Schur() line 1264 in petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCFieldSplitGetSubKSP() line 1668 in petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c Code snippet: ISCreateGeneral(PETSC_COMM_SELF,xp_dofs,&ii[0],PETSC_COPY_VALUES,&is_xp); ISCreateGeneral(PETSC_COMM_SELF,all_dofs - xp_dofs,&ii[xp_dofs],PETSC_COPY_VALUES,&is_wk); PCSetType(pc,PCFIELDSPLIT); PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR); PCFieldSplitSetIS(pc,"xp",is_xp); PCFieldSplitSetIS(pc,"wk",is_wk); int n; KSP* ksps; PC subpc; PCFieldSplitGetSubKSP(pc,&n,&ksps); ii here is simply an array with ii[i] = i. There is probably a better way to simply indicate two blocks of different size, but I couldn't find it. Thanks, Sander -------------- next part -------------- An HTML attachment was scrubbed... URL: From jianjun.xiao at kit.edu Tue Jun 30 07:10:14 2015 From: jianjun.xiao at kit.edu (Xiao, Jianjun (IKET)) Date: Tue, 30 Jun 2015 14:10:14 +0200 Subject: [petsc-users] periodic boundary condition does not work after upgrading from 3.5.1 to 3.6.0 Message-ID: <56D054AF2E93E044AC1D2685709D28680197C29596B9@KIT-MSX-07.kit.edu> I am running a case with periodic boundary condition. It worked fine with PETSc 3.5.1. When PETSc was upgraded to 3.6.0, I got the error below. 
[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Available only for boundary none or with parallism in y direction [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 [0]PETSC ERROR: #295 DMLocalToGlobalBegin_DA() line 54 in /home/xiao/Local/petsc-3.6.0-debug/src/dm/impls/da/dagtol.c [0]PETSC ERROR: #296 DMLocalToGlobalBegin() line 1944 in /home/xiao/Local/petsc-3.6.0-debug/src/dm/interface/dm.c I have periodic boundary condition in y direction, and I used INSERT_VALUES in DMLocalToGlobal. CALL DMLocalToGlobalBegin(da_dof1,wb0_gfpp_loc,INSERT_VALUES,wb0_gfpp,ierr) CALL DMLocalToGlobalEnd(da_dof1,wb0_gfpp_loc,INSERT_VALUES,wb0_gfpp,ierr) Please let me know if you need more information. Thanks in advance. Best regards Jianjun From knepley at gmail.com Tue Jun 30 08:09:43 2015 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 30 Jun 2015 14:09:43 +0100 Subject: [petsc-users] Fieldsplit PC null pointer error on getksp In-Reply-To: References: Message-ID: On Tue, Jun 30, 2015 at 11:43 AM, Sander Land wrote: > I am trying to use the Schur complement preconditioner in petsc, but am > encountering a null argument error calling PCFieldSplitGetSubKSP. > This only happens on PC_COMPOSITE_SCHUR, the multiplicative/additive > options do return a KSP array. > > Error message: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [0]PETSC ERROR: Null Object: Parameter # 1 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: ./elecmech on a arch-linux2-cxx-debug > [0]PETSC ERROR: Configure options --with-shared-libraries=1 > --with-debugging=1 --download-openmpi=1 --with-clanguage=c++ > --download-fblaslapack=1 --download-scalapack=1 --download-blacs=1 > --download-suitesparse=1 --download-pastix=1 --download-superlu=1 > --dowload-essl=1 --download-ptscotch=1 --download-mumps=1 --download-lusol=1 > [0]PETSC ERROR: #1 MatSchurComplementGetKSP() line 317 in > petsc-3.6/src/ksp/ksp/utils/schurm.c > [0]PETSC ERROR: #2 PCFieldSplitGetSubKSP_FieldSplit_Schur() line 1264 in > petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCFieldSplitGetSubKSP() line 1668 in > petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > Code snippet: > > ISCreateGeneral(PETSC_COMM_SELF,xp_dofs,&ii[0],PETSC_COPY_VALUES,&is_xp); > ISCreateGeneral(PETSC_COMM_SELF,all_dofs - > xp_dofs,&ii[xp_dofs],PETSC_COPY_VALUES,&is_wk); > > PCSetType(pc,PCFIELDSPLIT); > PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR); > PCFieldSplitSetIS(pc,"xp",is_xp); > PCFieldSplitSetIS(pc,"wk",is_wk); > int n; > KSP* ksps; > PC subpc; > PCFieldSplitGetSubKSP(pc,&n,&ksps); > > > ii here is simply an array with ii[i] = i. There is probably a better way > to simply indicate two blocks of different size, but I couldn't find it. > It looks like the PC is not setup. Can you try calling PCSetUp(pc) before GetSubKSP()? Thanks, Matt > Thanks, > Sander > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at anl.gov Tue Jun 30 09:00:01 2015 From: abhyshr at anl.gov (Abhyankar, Shrirang G.) Date: Tue, 30 Jun 2015 14:00:01 +0000 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Note also that the use of DMDA is incorrect in your example. DMDA is for managing structured grids, while your example application (power grid dynamics simulation) has an unstructured network. Using DMNetwork or DMPlex would be more appropriate. Shri -----Original Message----- From: barry smith Date: Monday, June 29, 2015 at 10:28 PM To: "Li, Xinya" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] TSSolve problems > >> On Jun 29, 2015, at 1:35 PM, Li, Xinya wrote: >> >> Yes. it is C++. >> I attached the major routines for TSSolve. > > Thanks. There is not much in the functions to explain why they are >taking the vast bulk of the time but I am going to go out on a limb and >guess that it is the C++ implementation of complex numbers and the use of >the cos() and sin() of those complex numbers that it causing the terrible >performance. > > Since your code has actually very little C++ specific stuff in it I >would suggest building PETSc without the --with-clanguage=c++ and use C >for everything, changing your code as needed to remove the C++ isms. In >C99 complex is a native data type (not a C++ class) so can potentially >give better performance. > > Barry > >> >> Thanks >> Xinya >> >> >> -----Original Message----- >> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> Sent: Monday, June 29, 2015 11:27 AM >> To: Li, Xinya >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] TSSolve problems >> >> >> Is your code C++? >> >> count time percent time >> --------------------------------------------------------------------- >> TSStep 600 3.1174e+02 100 >> TSFunctionEval 2937 1.4288e+02 46 >> TSJacobianEval 1737 1.3074e+02 42 >> KSPSolve 1737 3.5144e+01 11 >> >> Ok I pulled out the important time from the table. 46 percent of the >>time is in your function evaluation, 42 percent in the Jacobian >>evaluation and 11 percent in the linear solve. >> >> The only way to improve the time significantly is by speeding up the >>function and Jacobian computations. What is happening in those routines, >>can you email them? >> >> Barry >> >> >> >> >> >>> On Jun 29, 2015, at 11:57 AM, Li, Xinya wrote: >>> >>> Barry, >>> >>> Here is the new output without debugging. >>> >>> Thank you. >>> >>> Xinya >>> >>> ******************************************************************** >>> >>> >>> -----Original Message----- >>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>> Sent: Friday, June 26, 2015 12:04 PM >>> To: Li, Xinya >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] TSSolve problems >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # This code was compiled with a debugging option, # >>> # To get timing results run ./configure # >>> # using --with-debugging=no, the performance will # >>> # be generally two or three times faster. # >>> # # >>> ########################################################## >>> >>> First you need to configure PETSc again without all the debugging. 
So >>>do, for example, >>> >>> export PETSC=arch-opt >>> ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ >>>--with-scalar-type=complex --with-clanguage=C++ >>>--with-cxx-dialect=C++11 --download-mpich --download-superlu_dist >>>--download-mumps --download-scalapack --download-parmetis >>>--download-metis --download-elemental make all test >>> >>> then recompile and rerun your example again with -log_summary and send >>>the output. >>> >>> Note that you should not pass --download-fblaslapack nor the fortran >>>kernel stuff. >>> >>> Barry >>> >>> >>>> On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: >>>> >>>> Barry, >>>> >>>> Thank you for your response. >>>> >>>> Attached is the output. SNESSolve was taking most of the time. >>>> >>>> >>>> Xinya >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>>> Sent: Thursday, June 25, 2015 5:47 PM >>>> To: Li, Xinya >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] TSSolve problems >>>> >>>> >>>> Run with -ts_view -log_summary and send the output. This will tell >>>>the current solvers and where the time is being spent. >>>> >>>> Barry >>>> >>>>> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >>>>> >>>>> Dear Sir, >>>>> >>>>> I am using the ts solver to solve a set of ODE and DAE. The Jacobian >>>>>matrix is a 1152 *1152 sparse complex matrix. >>>>> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to >>>>>improve the speed of TSSolve. >>>>> Which parts should be taking into account to accelerate TSSolve? >>>>> Thank you very much. >>>>> Regards >>>>> __________________________________________________ >>>>> Xinya Li >>>>> Scientist >>>>> EED/Hydrology >>>>> Pacific Northwest National Laboratory >>>>> 902 Battelle Boulevard >>>>> P.O. Box 999, MSIN K9-33 >>>>> Richland, WA 99352 USA >>>>> Tel: 509-372-6248 >>>>> Fax: 509-372-6089 >>>>> Xinya.Li at pnl.gov >>>> >>>> <288g1081b_short.log> >>> >>> >> >> > From sander.land at gmail.com Tue Jun 30 09:10:16 2015 From: sander.land at gmail.com (Sander Land) Date: Tue, 30 Jun 2015 15:10:16 +0100 Subject: [petsc-users] Fieldsplit PC null pointer error on getksp In-Reply-To: References: Message-ID: This gives [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix must be set first The PC is from a SNESGetKSP, and I have called SNESSetJacobian before this. Thanks, Sander On Tue, Jun 30, 2015 at 2:09 PM, Matthew Knepley wrote: > On Tue, Jun 30, 2015 at 11:43 AM, Sander Land > wrote: > >> I am trying to use the Schur complement preconditioner in petsc, but am >> encountering a null argument error calling PCFieldSplitGetSubKSP. >> This only happens on PC_COMPOSITE_SCHUR, the multiplicative/additive >> options do return a KSP array. >> >> Error message: >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Null argument, when expecting valid pointer >> [0]PETSC ERROR: Null Object: Parameter # 1 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 >> [0]PETSC ERROR: ./elecmech on a arch-linux2-cxx-debug >> [0]PETSC ERROR: Configure options --with-shared-libraries=1 >> --with-debugging=1 --download-openmpi=1 --with-clanguage=c++ >> --download-fblaslapack=1 --download-scalapack=1 --download-blacs=1 >> --download-suitesparse=1 --download-pastix=1 --download-superlu=1 >> --dowload-essl=1 --download-ptscotch=1 --download-mumps=1 --download-lusol=1 >> [0]PETSC ERROR: #1 MatSchurComplementGetKSP() line 317 in >> petsc-3.6/src/ksp/ksp/utils/schurm.c >> [0]PETSC ERROR: #2 PCFieldSplitGetSubKSP_FieldSplit_Schur() line 1264 in >> petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c >> [0]PETSC ERROR: #3 PCFieldSplitGetSubKSP() line 1668 in >> petsc-3.6/src/ksp/pc/impls/fieldsplit/fieldsplit.c >> >> Code snippet: >> >> ISCreateGeneral(PETSC_COMM_SELF,xp_dofs,&ii[0],PETSC_COPY_VALUES,&is_xp); >> ISCreateGeneral(PETSC_COMM_SELF,all_dofs - >> xp_dofs,&ii[xp_dofs],PETSC_COPY_VALUES,&is_wk); >> >> PCSetType(pc,PCFIELDSPLIT); >> PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR); >> PCFieldSplitSetIS(pc,"xp",is_xp); >> PCFieldSplitSetIS(pc,"wk",is_wk); >> int n; >> KSP* ksps; >> PC subpc; >> PCFieldSplitGetSubKSP(pc,&n,&ksps); >> >> >> ii here is simply an array with ii[i] = i. There is probably a better way >> to simply indicate two blocks of different size, but I couldn't find it. >> > > It looks like the PC is not setup. Can you try calling PCSetUp(pc) before > GetSubKSP()? > > Thanks, > > Matt > > >> Thanks, >> Sander >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Xinya.Li at pnnl.gov Tue Jun 30 11:54:41 2015 From: Xinya.Li at pnnl.gov (Li, Xinya) Date: Tue, 30 Jun 2015 16:54:41 +0000 Subject: [petsc-users] TSSolve problems In-Reply-To: References: Message-ID: Thank you both very much. I will modify the codes according to the suggestions and retest the performance. xinya -----Original Message----- From: Abhyankar, Shrirang G. [mailto:abhyshr at anl.gov] Sent: Tuesday, June 30, 2015 7:00 AM To: Li, Xinya Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] TSSolve problems Note also that the use of DMDA is incorrect in your example. DMDA is for managing structured grids, while your example application (power grid dynamics simulation) has an unstructured network. Using DMNetwork or DMPlex would be more appropriate. Shri -----Original Message----- From: barry smith Date: Monday, June 29, 2015 at 10:28 PM To: "Li, Xinya" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] TSSolve problems > >> On Jun 29, 2015, at 1:35 PM, Li, Xinya wrote: >> >> Yes. it is C++. >> I attached the major routines for TSSolve. > > Thanks. There is not much in the functions to explain why they are >taking the vast bulk of the time but I am going to go out on a limb and >guess that it is the C++ implementation of complex numbers and the use >of the cos() and sin() of those complex numbers that it causing the >terrible performance. > > Since your code has actually very little C++ specific stuff in it I >would suggest building PETSc without the --with-clanguage=c++ and use C >for everything, changing your code as needed to remove the C++ isms. In >C99 complex is a native data type (not a C++ class) so can potentially >give better performance. 
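For reference, a small sketch of what that looks like in plain C (this assumes a PETSc build configured with --with-scalar-type=complex and without --with-clanguage=c++, so that PetscScalar is the native C99 double complex; the function and variable names here are made up for illustration, not taken from the attached routines):

  #include <math.h>
  #include <complex.h>
  #include <petscsys.h>

  /* Form a phasor V*e^{i*theta} using the native C99 complex type and the
     standard real cos()/sin(), rather than calling sin/cos on a C++
     std::complex object. */
  static PetscScalar phasor(PetscReal V, PetscReal theta)
  {
    return V * (cos(theta) + I * sin(theta));  /* equivalently V*cexp(I*theta) */
  }

Whether a change like this recovers much of the 46%/42% spent in the function and Jacobian evaluations would of course have to be confirmed by rerunning with -log_summary.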
> > Barry > >> >> Thanks >> Xinya >> >> >> -----Original Message----- >> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> Sent: Monday, June 29, 2015 11:27 AM >> To: Li, Xinya >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] TSSolve problems >> >> >> Is your code C++? >> >> count time percent time >> --------------------------------------------------------------------- >> TSStep 600 3.1174e+02 100 >> TSFunctionEval 2937 1.4288e+02 46 >> TSJacobianEval 1737 1.3074e+02 42 >> KSPSolve 1737 3.5144e+01 11 >> >> Ok I pulled out the important time from the table. 46 percent of >>the time is in your function evaluation, 42 percent in the Jacobian >>evaluation and 11 percent in the linear solve. >> >> The only way to improve the time significantly is by speeding up the >>function and Jacobian computations. What is happening in those >>routines, can you email them? >> >> Barry >> >> >> >> >> >>> On Jun 29, 2015, at 11:57 AM, Li, Xinya wrote: >>> >>> Barry, >>> >>> Here is the new output without debugging. >>> >>> Thank you. >>> >>> Xinya >>> >>> ******************************************************************** >>> >>> >>> -----Original Message----- >>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>> Sent: Friday, June 26, 2015 12:04 PM >>> To: Li, Xinya >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] TSSolve problems >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # This code was compiled with a debugging option, # >>> # To get timing results run ./configure # >>> # using --with-debugging=no, the performance will # >>> # be generally two or three times faster. # >>> # # >>> ########################################################## >>> >>> First you need to configure PETSc again without all the debugging. >>>So do, for example, >>> >>> export PETSC=arch-opt >>> ./configure --with-cc=gcc --with-fc=gfortran --with-cxx=g++ >>>--with-scalar-type=complex --with-clanguage=C++ >>>--with-cxx-dialect=C++11 --download-mpich --download-superlu_dist >>>--download-mumps --download-scalapack --download-parmetis >>>--download-metis --download-elemental make all test >>> >>> then recompile and rerun your example again with -log_summary and >>>send the output. >>> >>> Note that you should not pass --download-fblaslapack nor the fortran >>>kernel stuff. >>> >>> Barry >>> >>> >>>> On Jun 26, 2015, at 12:16 PM, Li, Xinya wrote: >>>> >>>> Barry, >>>> >>>> Thank you for your response. >>>> >>>> Attached is the output. SNESSolve was taking most of the time. >>>> >>>> >>>> Xinya >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>>> Sent: Thursday, June 25, 2015 5:47 PM >>>> To: Li, Xinya >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] TSSolve problems >>>> >>>> >>>> Run with -ts_view -log_summary and send the output. This will tell >>>>the current solvers and where the time is being spent. >>>> >>>> Barry >>>> >>>>> On Jun 25, 2015, at 6:37 PM, Li, Xinya wrote: >>>>> >>>>> Dear Sir, >>>>> >>>>> I am using the ts solver to solve a set of ODE and DAE. The >>>>>Jacobian matrix is a 1152 *1152 sparse complex matrix. >>>>> Each TSStep in TSSolve is taking nearly 1 second. Thus, I need to >>>>>improve the speed of TSSolve. >>>>> Which parts should be taking into account to accelerate TSSolve? >>>>> Thank you very much. 
>>>>> Regards >>>>> __________________________________________________ >>>>> Xinya Li >>>>> Scientist >>>>> EED/Hydrology >>>>> Pacific Northwest National Laboratory >>>>> 902 Battelle Boulevard >>>>> P.O. Box 999, MSIN K9-33 >>>>> Richland, WA 99352 USA >>>>> Tel: 509-372-6248 >>>>> Fax: 509-372-6089 >>>>> Xinya.Li at pnl.gov >>>> >>>> <288g1081b_short.log> >>> >>> >> >> > From mhasan8 at vols.utk.edu Tue Jun 30 12:03:54 2015 From: mhasan8 at vols.utk.edu (Hasan, Fahad) Date: Tue, 30 Jun 2015 17:03:54 +0000 Subject: [petsc-users] TSSUNDIALS Message-ID: Hello PETSc team, Do you have any example for solving ODE/DAE using TSSUNDIALS solver? Regards, Fahad Vogt Research Group, Buehler Hall, Room-304. University of Tennessee-Knoxville. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jun 30 12:11:52 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 30 Jun 2015 11:11:52 -0600 Subject: [petsc-users] TSSUNDIALS In-Reply-To: References: Message-ID: <87h9ppnmsn.fsf@jedbrown.org> "Hasan, Fahad" writes: > Hello PETSc team, > > Do you have any example for solving ODE/DAE using TSSUNDIALS solver? Run any TS example with -ts_type sundials. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hzhang at mcs.anl.gov Tue Jun 30 12:21:16 2015 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 30 Jun 2015 12:21:16 -0500 Subject: [petsc-users] TSSUNDIALS In-Reply-To: References: Message-ID: petsc/src/ts/examples/tutorials/ex13.c for ODE using TSSUNDIALS solver. We do not have interface for solving DAE with TSSUNDIALS. Hong On Tue, Jun 30, 2015 at 12:03 PM, Hasan, Fahad wrote: > Hello PETSc team, > > > > Do you have any example for solving ODE/DAE using TSSUNDIALS solver? > > > > Regards, > > Fahad > > Vogt Research Group, > > Buehler Hall, Room-304. > > University of Tennessee-Knoxville. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Jun 30 12:24:04 2015 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 30 Jun 2015 12:24:04 -0500 Subject: [petsc-users] TSSUNDIALS In-Reply-To: <87h9ppnmsn.fsf@jedbrown.org> References: <87h9ppnmsn.fsf@jedbrown.org> Message-ID: Jed, Do we support DAE with sundials? Hong On Tue, Jun 30, 2015 at 12:11 PM, Jed Brown wrote: > "Hasan, Fahad" writes: > > > Hello PETSc team, > > > > Do you have any example for solving ODE/DAE using TSSUNDIALS solver? > > Run any TS example with -ts_type sundials. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jun 30 12:27:36 2015 From: jed at jedbrown.org (Jed Brown) Date: Tue, 30 Jun 2015 11:27:36 -0600 Subject: [petsc-users] TSSUNDIALS In-Reply-To: References: <87h9ppnmsn.fsf@jedbrown.org> Message-ID: <87bnfxnm2f.fsf@jedbrown.org> Hong writes: > Jed, > Do we support DAE with sundials? We do not currently have an IDA interface, only CVODE. An IDA interface would be nearly a direct copy with the new namespace. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jun 30 12:50:56 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 30 Jun 2015 12:50:56 -0500 Subject: [petsc-users] periodic boundary condition does not work after upgrading from 3.5.1 to 3.6.0 In-Reply-To: <56D054AF2E93E044AC1D2685709D28680197C29596B9@KIT-MSX-07.kit.edu> References: <56D054AF2E93E044AC1D2685709D28680197C29596B9@KIT-MSX-07.kit.edu> Message-ID: We had to remove the support for DMLocalToGlobalBegin/End() for this case since it was too difficult to maintain. Why are you using DMLocalToGlobalBegin/End() here? All it does it discard the ghost values. Instead of putting values into the wb0_gfpp_loc and then calling DMLocalToGlobalBegin to move the values into wb0_gfpp you should just put the values directly into wb0_gfpp and never even have a wb0_gfpp_loc Note that if you use DMDAVecGetArray() on wb0_gfpp_loc you can use it also on wb0_gfpp instead. Barry > On Jun 30, 2015, at 7:10 AM, Xiao, Jianjun (IKET) wrote: > > I am running a case with periodic boundary condition. It worked fine with PETSc 3.5.1. When PETSc was upgraded to 3.6.0, I got the error below. > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Available only for boundary none or with parallism in y direction > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015 > [0]PETSC ERROR: #295 DMLocalToGlobalBegin_DA() line 54 in /home/xiao/Local/petsc-3.6.0-debug/src/dm/impls/da/dagtol.c > [0]PETSC ERROR: #296 DMLocalToGlobalBegin() line 1944 in /home/xiao/Local/petsc-3.6.0-debug/src/dm/interface/dm.c > > I have periodic boundary condition in y direction, and I used INSERT_VALUES in DMLocalToGlobal. > > CALL DMLocalToGlobalBegin(da_dof1,wb0_gfpp_loc,INSERT_VALUES,wb0_gfpp,ierr) > CALL DMLocalToGlobalEnd(da_dof1,wb0_gfpp_loc,INSERT_VALUES,wb0_gfpp,ierr) > > Please let me know if you need more information. Thanks in advance. > > Best regards > Jianjun From torquil at gmail.com Tue Jun 30 16:05:17 2015 From: torquil at gmail.com (=?UTF-8?B?VG9ycXVpbCBNYWNkb25hbGQgU8O4cmVuc2Vu?=) Date: Tue, 30 Jun 2015 23:05:17 +0200 Subject: [petsc-users] Order of PetscRandom and SVD Message-ID: <5593048D.7050208@gmail.com> Hi! I'm experiencing some problems using PetscRandom and SVD from SLEPc. For some reason, the order of two seemingly unrelated parts of the program below affects the random values in a vector. The test program below, run with "mpiexec -n 2", will print a vector with two identical component values. If I move the four lines of SVD code up above "PetscRandom r", the two vector component are different, as expected. Why does this matter, when the SVD code is unrelated to the PetscRandom r and Vec u? 
#include "petscdmda.h" #include "slepcsvd.h" int main(int argc, char **argv) { PetscErrorCode ierr = SlepcInitialize(&argc, &argv, 0, 0); CHKERRQ(ierr); DM da; ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 2, 1, 1, 0, &da); CHKERRQ(ierr); ierr = DMSetFromOptions(da); CHKERRQ(ierr); Vec u; ierr = DMCreateGlobalVector(da, &u); CHKERRQ(ierr); PetscRandom r; ierr = PetscRandomCreate(PETSC_COMM_WORLD, &r); CHKERRQ(ierr); ierr = PetscRandomSetFromOptions(r); SVD svd; ierr = SVDCreate(PETSC_COMM_WORLD, &svd); CHKERRQ(ierr); ierr = SVDSetFromOptions(svd); CHKERRQ(ierr); ierr = SVDDestroy(&svd); CHKERRQ(ierr); ierr = VecSetRandom(u, r); CHKERRQ(ierr); ierr = VecView(u, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); ierr = PetscRandomDestroy(&r); CHKERRQ(ierr); ierr = VecDestroy(&u); CHKERRQ(ierr); ierr = DMDestroy(&da); CHKERRQ(ierr); ierr = SlepcFinalize(); CHKERRQ(ierr); return 0; } Output of the program as written above: Vec Object: 2 MPI processes type: mpi Vec Object:Vec_0xd00a30_0 2 MPI processes type: mpi Process [0] 0.720032 + 0.061794 i Process [1] 0.720032 + 0.061794 i Output of the program with the four lines of SVD code moved above the line "PetscRandom r": Vec Object: 2 MPI processes type: mpi Vec Object:Vec_0x117aa30_0 2 MPI processes type: mpi Process [0] 0.720032 + 0.061794 i Process [1] 0.541144 + 0.529699 i Other information: $ gcc --version gcc (Debian 5.1.1-12) 5.1.1 20150622 The PETSc version was obtained with the command: git checkout 75ae60a9cdad23ddd49e9e39341b43db353bd940 because the newest GIT version wouldn't compile. SLEPc is the newest from the master branch. The -log_summary output of the program is: ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./randtest.exe on a arch-linux2-c-debug named lenovo with 2 processors, by tmac Tue Jun 30 22:42:49 2015 Using Petsc Development GIT revision: v3.6-89-g75ae60a GIT Date: 2015-06-29 18:37:23 -0500 Max Max/Min Avg Total Time (sec): 1.274e-02 1.00000 1.274e-02 Objects: 3.900e+01 1.00000 3.900e+01 Flops: 0.000e+00 0.00000 0.000e+00 0.000e+00 Flops/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00 Memory: 1.273e+05 1.00000 2.546e+05 MPI Messages: 6.500e+00 1.00000 6.500e+00 1.300e+01 MPI Message Lengths: 3.600e+01 1.00000 5.538e+00 7.200e+01 MPI Reductions: 9.000e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.2723e-02 99.9% 0.0000e+00 0.0% 1.300e+01 100.0% 5.538e+00 100.0% 8.900e+01 98.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ########################################################## # # # WARNING!!! # # # # The code for various complex numbers numerical # # kernels uses C++, which generally is not well # # optimized. For performance that is about 4-5 times # # faster, specify --with-fortran-kernels=1 # # when running ./configure.py. # # # ########################################################## Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecView 1 1.0 1.0881e-03 1.2 0.00e+00 0.0 5.0e+00 8.0e+00 1.5e+01 8 0 38 56 17 8 0 38 56 17 0 VecSet 1 1.0 1.5020e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 1 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterEnd 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSetRandom 1 1.0 1.6928e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 8.4877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 1 0 0 0 2 1 0 0 0 2 0 MatAssemblyEnd 1 1.0 6.7306e-04 1.0 0.00e+00 0.0 4.0e+00 4.0e+00 1.4e+01 5 0 31 22 16 5 0 31 22 16 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Vector 7 7 11064 0 Vector Scatter 3 3 3280 0 Matrix 3 3 8064 0 Distributed Mesh 1 1 4972 0 Star Forest Bipartite Graph 2 2 1712 0 Discrete System 1 1 848 0 Index Set 6 6 4660 0 IS L to G Mapping 1 1 632 0 PetscRandom 3 3 1920 0 SVD Solver 1 1 984 0 EPS Solver 1 1 1272 0 Spectral Transform 1 1 784 0 Krylov Solver 1 1 1304 0 Preconditioner 1 1 848 0 Basis Vectors 3 3 3024 0 Direct Solver 2 2 2640 0 Region 1 1 648 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 1.43051e-06 Average time for zero size MPI_Send(): 4.52995e-06 #PETSc Option Table entries: -ksp_converged_reason -log_summary -objects_dump 0 -options_table #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --prefix=/home/tmac/usr/stow/petsc --with-scalar-type=complex --with-shared-libraries=yes --with-debugging=yes --download-superlu=yes --download-ptscotch=yes --with-sowing=no --download-sowing=no ----------------------------------------- Libraries compiled on Tue Jun 30 20:18:19 2015 on lenovo Machine characteristics: Linux-4.0.0-2-amd64-x86_64-with-debian-stretch-sid Using PETSc directory: /home/tmac/tmp/src/petsc Using PETSc arch: arch-linux2-c-debug ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -g -O0 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include -I/home/tmac/tmp/src/petsc/include -I/home/tmac/tmp/src/petsc/include -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include -I/home/tmac/usr/stow/petsc/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib -L/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/tmac/usr/stow/petsc/lib -L/home/tmac/usr/stow/petsc/lib -lsuperlu_4.3 -llapack -lblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lX11 -lssl -lcrypto -lm -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lz -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl ----------------------------------------- Best regards Torquil S?rensen From jroman at dsic.upv.es Tue Jun 30 16:43:20 2015 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Tue, 30 Jun 2015 23:43:20 +0200 Subject: [petsc-users] Order of PetscRandom and SVD In-Reply-To: <5593048D.7050208@gmail.com> References: <5593048D.7050208@gmail.com> Message-ID: SLEPc solvers use a PetscRandom object internally, and may generate a different sequence of random values in each process. Jose > El 30/6/2015, a las 23:05, Torquil Macdonald S?rensen escribi?: > > Hi! > > I'm experiencing some problems using PetscRandom and SVD from SLEPc. For > some reason, the order of two seemingly unrelated parts of the program > below affects the random values in a vector. The test program below, run > with "mpiexec -n 2", will print a vector with two identical component > values. If I move the four lines of SVD code up above "PetscRandom r", > the two vector component are different, as expected. Why does this > matter, when the SVD code is unrelated to the PetscRandom r and Vec u? > > #include "petscdmda.h" > #include "slepcsvd.h" > > int main(int argc, char **argv) > { > PetscErrorCode ierr = SlepcInitialize(&argc, &argv, 0, 0); > CHKERRQ(ierr); > > DM da; > ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 2, 1, 1, 0, > &da); CHKERRQ(ierr); > ierr = DMSetFromOptions(da); CHKERRQ(ierr); > > Vec u; > ierr = DMCreateGlobalVector(da, &u); CHKERRQ(ierr); > > PetscRandom r; > ierr = PetscRandomCreate(PETSC_COMM_WORLD, &r); CHKERRQ(ierr); > ierr = PetscRandomSetFromOptions(r); > > SVD svd; > ierr = SVDCreate(PETSC_COMM_WORLD, &svd); CHKERRQ(ierr); > ierr = SVDSetFromOptions(svd); CHKERRQ(ierr); > ierr = SVDDestroy(&svd); CHKERRQ(ierr); > > ierr = VecSetRandom(u, r); CHKERRQ(ierr); > ierr = VecView(u, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); > > ierr = PetscRandomDestroy(&r); CHKERRQ(ierr); > ierr = VecDestroy(&u); CHKERRQ(ierr); > ierr = DMDestroy(&da); CHKERRQ(ierr); > ierr = SlepcFinalize(); CHKERRQ(ierr); > > return 0; > } > > Output of the program as written above: > > Vec Object: 2 MPI processes > type: mpi > Vec Object:Vec_0xd00a30_0 2 MPI processes > type: mpi > Process [0] > 0.720032 + 0.061794 i > Process [1] > 0.720032 + 0.061794 i > > Output of the program with the four lines of SVD code moved above the > line "PetscRandom r": > > Vec Object: 2 MPI processes > type: mpi > Vec Object:Vec_0x117aa30_0 2 MPI processes > type: mpi > Process [0] > 0.720032 + 0.061794 i > Process [1] > 0.541144 + 0.529699 i > > Other information: > > $ gcc --version > gcc (Debian 5.1.1-12) 5.1.1 20150622 > > The PETSc version was obtained with the command: > git checkout 75ae60a9cdad23ddd49e9e39341b43db353bd940 > > because the newest GIT version wouldn't compile. SLEPc is the newest > from the master branch. 
The -log_summary output of the program is: > > ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > > ./randtest.exe on a arch-linux2-c-debug named lenovo with 2 processors, > by tmac Tue Jun 30 22:42:49 2015 > Using Petsc Development GIT revision: v3.6-89-g75ae60a GIT Date: > 2015-06-29 18:37:23 -0500 > > Max Max/Min Avg Total > Time (sec): 1.274e-02 1.00000 1.274e-02 > Objects: 3.900e+01 1.00000 3.900e+01 > Flops: 0.000e+00 0.00000 0.000e+00 0.000e+00 > Flops/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00 > Memory: 1.273e+05 1.00000 2.546e+05 > MPI Messages: 6.500e+00 1.00000 6.500e+00 1.300e+01 > MPI Message Lengths: 3.600e+01 1.00000 5.538e+00 7.200e+01 > MPI Reductions: 9.000e+01 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length > N --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 1.2723e-02 99.9% 0.0000e+00 0.0% 1.300e+01 > 100.0% 5.538e+00 100.0% 8.900e+01 98.9% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() > and PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message > lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > > > ########################################################## > # # > # WARNING!!! # > # # > # The code for various complex numbers numerical # > # kernels uses C++, which generally is not well # > # optimized. For performance that is about 4-5 times # > # faster, specify --with-fortran-kernels=1 # > # when running ./configure.py. 
# > # # > ########################################################## > > > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecView 1 1.0 1.0881e-03 1.2 0.00e+00 0.0 5.0e+00 8.0e+00 > 1.5e+01 8 0 38 56 17 8 0 38 56 17 0 > VecSet 1 1.0 1.5020e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 1 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterEnd 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSetRandom 1 1.0 1.6928e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 8.4877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 1 0 0 0 2 1 0 0 0 2 0 > MatAssemblyEnd 1 1.0 6.7306e-04 1.0 0.00e+00 0.0 4.0e+00 4.0e+00 > 1.4e+01 5 0 31 22 16 5 0 31 22 16 0 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 7 7 11064 0 > Vector Scatter 3 3 3280 0 > Matrix 3 3 8064 0 > Distributed Mesh 1 1 4972 0 > Star Forest Bipartite Graph 2 2 1712 0 > Discrete System 1 1 848 0 > Index Set 6 6 4660 0 > IS L to G Mapping 1 1 632 0 > PetscRandom 3 3 1920 0 > SVD Solver 1 1 984 0 > EPS Solver 1 1 1272 0 > Spectral Transform 1 1 784 0 > Krylov Solver 1 1 1304 0 > Preconditioner 1 1 848 0 > Basis Vectors 3 3 3024 0 > Direct Solver 2 2 2640 0 > Region 1 1 648 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Average time for MPI_Barrier(): 1.43051e-06 > Average time for zero size MPI_Send(): 4.52995e-06 > #PETSc Option Table entries: > -ksp_converged_reason > -log_summary > -objects_dump 0 > -options_table > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --prefix=/home/tmac/usr/stow/petsc > --with-scalar-type=complex --with-shared-libraries=yes > --with-debugging=yes --download-superlu=yes --download-ptscotch=yes > --with-sowing=no --download-sowing=no > ----------------------------------------- > Libraries compiled on Tue Jun 30 20:18:19 2015 on lenovo > Machine characteristics: Linux-4.0.0-2-amd64-x86_64-with-debian-stretch-sid > Using PETSc directory: /home/tmac/tmp/src/petsc > Using PETSc arch: arch-linux2-c-debug > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable > -ffree-line-length-0 -g -O0 ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: > -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include > -I/home/tmac/tmp/src/petsc/include -I/home/tmac/tmp/src/petsc/include > -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include > -I/home/tmac/usr/stow/petsc/include 
-I/usr/lib/openmpi/include > -I/usr/lib/openmpi/include/openmpi > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: > -Wl,-rpath,/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib > -L/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib -lpetsc > -Wl,-rpath,/home/tmac/usr/stow/petsc/lib -L/home/tmac/usr/stow/petsc/lib > -lsuperlu_4.3 -llapack -lblas -lptesmumps -lptscotch -lptscotcherr > -lscotch -lscotcherr -lX11 -lssl -lcrypto -lm > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran > -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lz > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl > ----------------------------------------- > > Best regards > Torquil S?rensen From wadud.miah at gmail.com Tue Jun 30 17:02:50 2015 From: wadud.miah at gmail.com (W. Miah) Date: Tue, 30 Jun 2015 23:02:50 +0100 Subject: [petsc-users] preprocessing problem Message-ID: Hi, This is probably a simple mistake, but I am getting the following error when trying to compile a Fortran PETSc example: [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F ex2f.F(82): error #5082: Syntax error, found ',' when expecting one of: ( % [ : . = => Vec x,b,u ------------------^ Any help will be greatly appreciated. Thanks in advance. Wadud. From balay at mcs.anl.gov Tue Jun 30 17:11:00 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 30 Jun 2015 17:11:00 -0500 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: On Tue, 30 Jun 2015, W. Miah wrote: > Hi, > > This is probably a simple mistake, but I am getting the following > error when trying to compile a Fortran PETSc example: > > [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include > -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F > ex2f.F(82): error #5082: Syntax error, found ',' when expecting one > of: ( % [ : . = => > Vec x,b,u > ------------------^ > > Any help will be greatly appreciated. Thanks in advance. Are you not using the correspoding makefile that came with this example? Satish From wadud.miah at gmail.com Tue Jun 30 17:15:15 2015 From: wadud.miah at gmail.com (W. Miah) Date: Tue, 30 Jun 2015 23:15:15 +0100 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Hi Satish, I can't find the Makefile that came with it. I'll have a trawl through the petsc download again, but if you could give me the compilation line that would be appreciated. Best regards, Wadud. On 30 June 2015 at 23:11, Satish Balay wrote: > On Tue, 30 Jun 2015, W. Miah wrote: > >> Hi, >> >> This is probably a simple mistake, but I am getting the following >> error when trying to compile a Fortran PETSc example: >> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one >> of: ( % [ : . 
= => >> Vec x,b,u >> ------------------^ >> >> Any help will be greatly appreciated. Thanks in advance. > > Are you not using the correspoding makefile that came with this example? > > Satish > -- email: wadud.miah at gmail.com mobile: 07905 755604 gnupg: 2E29 B22F From balay at mcs.anl.gov Tue Jun 30 17:20:56 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 30 Jun 2015 17:20:56 -0500 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Where did you grab this example from? the 'makefile' should be in the same location. For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile The users manual has a section explaining the format. http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 Satish On Tue, 30 Jun 2015, W. Miah wrote: > Hi Satish, > > I can't find the Makefile that came with it. I'll have a trawl through > the petsc download again, but if you could give me the compilation > line that would be appreciated. > > Best regards, > Wadud. > > On 30 June 2015 at 23:11, Satish Balay wrote: > > On Tue, 30 Jun 2015, W. Miah wrote: > > > >> Hi, > >> > >> This is probably a simple mistake, but I am getting the following > >> error when trying to compile a Fortran PETSc example: > >> > >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include > >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F > >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one > >> of: ( % [ : . = => > >> Vec x,b,u > >> ------------------^ > >> > >> Any help will be greatly appreciated. Thanks in advance. > > > > Are you not using the correspoding makefile that came with this example? > > > > Satish > > > > > > From wadud.miah at gmail.com Tue Jun 30 17:33:20 2015 From: wadud.miah at gmail.com (W. Miah) Date: Tue, 30 Jun 2015 23:33:20 +0100 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Hi Satish, Thanks for the Makefile. Regards, Wadud. On 30 June 2015 at 23:20, Satish Balay wrote: > Where did you grab this example from? the 'makefile' should be in the > same location. > > For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile > > The users manual has a section explaining the format. > http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 > > Satish > > On Tue, 30 Jun 2015, W. Miah wrote: > >> Hi Satish, >> >> I can't find the Makefile that came with it. I'll have a trawl through >> the petsc download again, but if you could give me the compilation >> line that would be appreciated. >> >> Best regards, >> Wadud. >> >> On 30 June 2015 at 23:11, Satish Balay wrote: >> > On Tue, 30 Jun 2015, W. Miah wrote: >> > >> >> Hi, >> >> >> >> This is probably a simple mistake, but I am getting the following >> >> error when trying to compile a Fortran PETSc example: >> >> >> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include >> >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F >> >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one >> >> of: ( % [ : . = => >> >> Vec x,b,u >> >> ------------------^ >> >> >> >> Any help will be greatly appreciated. Thanks in advance. >> > >> > Are you not using the correspoding makefile that came with this example? 
>> > >> > Satish >> > >> >> >> >> > -- email: wadud.miah at gmail.com mobile: 07905 755604 gnupg: 2E29 B22F From torquil at gmail.com Tue Jun 30 17:42:12 2015 From: torquil at gmail.com (=?UTF-8?B?VG9ycXVpbCBNYWNkb25hbGQgU8O4cmVuc2Vu?=) Date: Wed, 01 Jul 2015 00:42:12 +0200 Subject: [petsc-users] Order of PetscRandom and SVD In-Reply-To: References: <5593048D.7050208@gmail.com> Message-ID: <55931B44.8050700@gmail.com> Thanks. So what can I do to create a PetscRandom object that is not affected by the SLEPc-functions that I'm calling? Best regards, Torquil S?rensen On 30/06/15 23:43, Jose E. Roman wrote: > SLEPc solvers use a PetscRandom object internally, and may generate a different sequence of random values in each process. > Jose > > > >> El 30/6/2015, a las 23:05, Torquil Macdonald S?rensen escribi?: >> >> Hi! >> >> I'm experiencing some problems using PetscRandom and SVD from SLEPc. For >> some reason, the order of two seemingly unrelated parts of the program >> below affects the random values in a vector. The test program below, run >> with "mpiexec -n 2", will print a vector with two identical component >> values. If I move the four lines of SVD code up above "PetscRandom r", >> the two vector component are different, as expected. Why does this >> matter, when the SVD code is unrelated to the PetscRandom r and Vec u? >> >> #include "petscdmda.h" >> #include "slepcsvd.h" >> >> int main(int argc, char **argv) >> { >> PetscErrorCode ierr = SlepcInitialize(&argc, &argv, 0, 0); >> CHKERRQ(ierr); >> >> DM da; >> ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 2, 1, 1, 0, >> &da); CHKERRQ(ierr); >> ierr = DMSetFromOptions(da); CHKERRQ(ierr); >> >> Vec u; >> ierr = DMCreateGlobalVector(da, &u); CHKERRQ(ierr); >> >> PetscRandom r; >> ierr = PetscRandomCreate(PETSC_COMM_WORLD, &r); CHKERRQ(ierr); >> ierr = PetscRandomSetFromOptions(r); >> >> SVD svd; >> ierr = SVDCreate(PETSC_COMM_WORLD, &svd); CHKERRQ(ierr); >> ierr = SVDSetFromOptions(svd); CHKERRQ(ierr); >> ierr = SVDDestroy(&svd); CHKERRQ(ierr); >> >> ierr = VecSetRandom(u, r); CHKERRQ(ierr); >> ierr = VecView(u, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); >> >> ierr = PetscRandomDestroy(&r); CHKERRQ(ierr); >> ierr = VecDestroy(&u); CHKERRQ(ierr); >> ierr = DMDestroy(&da); CHKERRQ(ierr); >> ierr = SlepcFinalize(); CHKERRQ(ierr); >> >> return 0; >> } >> >> Output of the program as written above: >> >> Vec Object: 2 MPI processes >> type: mpi >> Vec Object:Vec_0xd00a30_0 2 MPI processes >> type: mpi >> Process [0] >> 0.720032 + 0.061794 i >> Process [1] >> 0.720032 + 0.061794 i >> >> Output of the program with the four lines of SVD code moved above the >> line "PetscRandom r": >> >> Vec Object: 2 MPI processes >> type: mpi >> Vec Object:Vec_0x117aa30_0 2 MPI processes >> type: mpi >> Process [0] >> 0.720032 + 0.061794 i >> Process [1] >> 0.541144 + 0.529699 i >> >> Other information: >> >> $ gcc --version >> gcc (Debian 5.1.1-12) 5.1.1 20150622 >> >> The PETSc version was obtained with the command: >> git checkout 75ae60a9cdad23ddd49e9e39341b43db353bd940 >> >> because the newest GIT version wouldn't compile. SLEPc is the newest >> from the master branch. 
The -log_summary output of the program is: >> >> ---------------------------------------------- PETSc Performance >> Summary: ---------------------------------------------- >> >> ./randtest.exe on a arch-linux2-c-debug named lenovo with 2 processors, >> by tmac Tue Jun 30 22:42:49 2015 >> Using Petsc Development GIT revision: v3.6-89-g75ae60a GIT Date: >> 2015-06-29 18:37:23 -0500 >> >> Max Max/Min Avg Total >> Time (sec): 1.274e-02 1.00000 1.274e-02 >> Objects: 3.900e+01 1.00000 3.900e+01 >> Flops: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> Flops/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> Memory: 1.273e+05 1.00000 2.546e+05 >> MPI Messages: 6.500e+00 1.00000 6.500e+00 1.300e+01 >> MPI Message Lengths: 3.600e+01 1.00000 5.538e+00 7.200e+01 >> MPI Reductions: 9.000e+01 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length >> N --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 1.2723e-02 99.9% 0.0000e+00 0.0% 1.300e+01 >> 100.0% 5.538e+00 100.0% 8.900e+01 98.9% >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> ------------------------------------------------------------------------------------------------------------------------ >> >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # This code was compiled with a debugging option, # >> # To get timing results run ./configure # >> # using --with-debugging=no, the performance will # >> # be generally two or three times faster. # >> # # >> ########################################################## >> >> >> >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # The code for various complex numbers numerical # >> # kernels uses C++, which generally is not well # >> # optimized. For performance that is about 4-5 times # >> # faster, specify --with-fortran-kernels=1 # >> # when running ./configure.py. 
# >> # # >> ########################################################## >> >> >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> VecView 1 1.0 1.0881e-03 1.2 0.00e+00 0.0 5.0e+00 8.0e+00 >> 1.5e+01 8 0 38 56 17 8 0 38 56 17 0 >> VecSet 1 1.0 1.5020e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 1 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterEnd 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSetRandom 1 1.0 1.6928e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyBegin 1 1.0 8.4877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 2.0e+00 1 0 0 0 2 1 0 0 0 2 0 >> MatAssemblyEnd 1 1.0 6.7306e-04 1.0 0.00e+00 0.0 4.0e+00 4.0e+00 >> 1.4e+01 5 0 31 22 16 5 0 31 22 16 0 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Vector 7 7 11064 0 >> Vector Scatter 3 3 3280 0 >> Matrix 3 3 8064 0 >> Distributed Mesh 1 1 4972 0 >> Star Forest Bipartite Graph 2 2 1712 0 >> Discrete System 1 1 848 0 >> Index Set 6 6 4660 0 >> IS L to G Mapping 1 1 632 0 >> PetscRandom 3 3 1920 0 >> SVD Solver 1 1 984 0 >> EPS Solver 1 1 1272 0 >> Spectral Transform 1 1 784 0 >> Krylov Solver 1 1 1304 0 >> Preconditioner 1 1 848 0 >> Basis Vectors 3 3 3024 0 >> Direct Solver 2 2 2640 0 >> Region 1 1 648 0 >> Viewer 1 0 0 0 >> ======================================================================================================================== >> Average time to get PetscTime(): 9.53674e-08 >> Average time for MPI_Barrier(): 1.43051e-06 >> Average time for zero size MPI_Send(): 4.52995e-06 >> #PETSc Option Table entries: >> -ksp_converged_reason >> -log_summary >> -objects_dump 0 >> -options_table >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >> Configure options: --prefix=/home/tmac/usr/stow/petsc >> --with-scalar-type=complex --with-shared-libraries=yes >> --with-debugging=yes --download-superlu=yes --download-ptscotch=yes >> --with-sowing=no --download-sowing=no >> ----------------------------------------- >> Libraries compiled on Tue Jun 30 20:18:19 2015 on lenovo >> Machine characteristics: Linux-4.0.0-2-amd64-x86_64-with-debian-stretch-sid >> Using PETSc directory: /home/tmac/tmp/src/petsc >> Using PETSc arch: arch-linux2-c-debug >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings >> -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS} >> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable >> -ffree-line-length-0 -g -O0 ${FOPTFLAGS} ${FFLAGS} >> ----------------------------------------- >> >> Using include paths: >> -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include >> -I/home/tmac/tmp/src/petsc/include -I/home/tmac/tmp/src/petsc/include >> 
-I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include >> -I/home/tmac/usr/stow/petsc/include -I/usr/lib/openmpi/include >> -I/usr/lib/openmpi/include/openmpi >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: >> -Wl,-rpath,/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib >> -L/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib -lpetsc >> -Wl,-rpath,/home/tmac/usr/stow/petsc/lib -L/home/tmac/usr/stow/petsc/lib >> -lsuperlu_4.3 -llapack -lblas -lptesmumps -lptscotch -lptscotcherr >> -lscotch -lscotcherr -lX11 -lssl -lcrypto -lm >> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 >> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu >> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu >> -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran >> -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lz >> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 >> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu >> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu >> -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu >> -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl >> ----------------------------------------- >> >> Best regards >> Torquil S?rensen From wadud.miah at gmail.com Tue Jun 30 17:54:04 2015 From: wadud.miah at gmail.com (W. Miah) Date: Tue, 30 Jun 2015 23:54:04 +0100 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Hi again Satish, When I use the supplied makefile, I get the following error: [hamilton1 petsc-examples]$ make ex2f /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/hypre-2.10.0b/include -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o ex2f.o ex2f.F ex2f.F:48: error: finclude/petscsys.h: No such file or directory ex2f.F:49: error: finclude/petscvec.h: No such file or directory ex2f.F:50: error: finclude/petscmat.h: No such file or directory ex2f.F:51: error: finclude/petscpc.h: No such file or directory ex2f.F:52: error: finclude/petscksp.h: No such file or directory It obviously cannot find the Fortran header files. 
When I include the path to the Fortran header files, it still complains: /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include/petsc -I/home/fcwm/hypre-2.10.0b/include -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o ex2f.o ex2f.F Warning: Nonconforming tab character in column 3 of line 1180 Warning: Nonconforming tab character in column 3 of line 1201 Warning: Nonconforming tab character in column 3 of line 1209 Warning: Nonconforming tab character in column 3 of line 3483 Warning: Nonconforming tab character in column 3 of line 3504 Warning: Nonconforming tab character in column 3 of line 3512 Warning: Nonconforming tab character in column 3 of line 4578 Warning: Nonconforming tab character in column 3 of line 4599 Warning: Nonconforming tab character in column 3 of line 4607 ex2f.F:82.6: Vec x,b,u 1 Error: Unclassifiable statement at (1) ex2f.F:83.6: I think I need to tell the Fortran compiler to use the C preprocessor instead of the Fortran preprocessor. Regards, Wadud. On 30 June 2015 at 23:33, W. Miah wrote: > Hi Satish, > > Thanks for the Makefile. > > Regards, > Wadud. > > On 30 June 2015 at 23:20, Satish Balay wrote: >> Where did you grab this example from? the 'makefile' should be in the >> same location. >> >> For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: >> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile >> >> The users manual has a section explaining the format. >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 >> >> Satish >> >> On Tue, 30 Jun 2015, W. Miah wrote: >> >>> Hi Satish, >>> >>> I can't find the Makefile that came with it. I'll have a trawl through >>> the petsc download again, but if you could give me the compilation >>> line that would be appreciated. >>> >>> Best regards, >>> Wadud. >>> >>> On 30 June 2015 at 23:11, Satish Balay wrote: >>> > On Tue, 30 Jun 2015, W. Miah wrote: >>> > >>> >> Hi, >>> >> >>> >> This is probably a simple mistake, but I am getting the following >>> >> error when trying to compile a Fortran PETSc example: >>> >> >>> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include >>> >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F >>> >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one >>> >> of: ( % [ : . = => >>> >> Vec x,b,u >>> >> ------------------^ >>> >> >>> >> Any help will be greatly appreciated. Thanks in advance. >>> > >>> > Are you not using the correspoding makefile that came with this example? >>> > >>> > Satish >>> > >>> >>> >>> >>> >> > > > > -- > email: wadud.miah at gmail.com > mobile: 07905 755604 > gnupg: 2E29 B22F -- email: wadud.miah at gmail.com mobile: 07905 755604 gnupg: 2E29 B22F From bsmith at mcs.anl.gov Tue Jun 30 19:28:05 2015 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 30 Jun 2015 19:28:05 -0500 Subject: [petsc-users] Order of PetscRandom and SVD In-Reply-To: <55931B44.8050700@gmail.com> References: <5593048D.7050208@gmail.com> <55931B44.8050700@gmail.com> Message-ID: <6E4FB546-7D70-40E7-95BC-1FE695910293@mcs.anl.gov> It looks like all three random number generators that PETSc use keep their "state" in some global data structure which means that using the random number generator to get the second set of values will be affected by the first set of values. 
So the trick would be to use a different random number generator for your code then that SLEPc uses. My guess is that SLEPc uses PETSCRAND48 so you should run ./configure with --download-sprng and then after you create your PetscRandom object call PetscRandomSetType(r,PETSCSPRNG) Barry > On Jun 30, 2015, at 5:42 PM, Torquil Macdonald S?rensen wrote: > > Thanks. So what can I do to create a PetscRandom object that is not > affected by the SLEPc-functions that I'm calling? > > Best regards, > Torquil S?rensen > > On 30/06/15 23:43, Jose E. Roman wrote: >> SLEPc solvers use a PetscRandom object internally, and may generate a different sequence of random values in each process. >> Jose >> >> >> >>> El 30/6/2015, a las 23:05, Torquil Macdonald S?rensen escribi?: >>> >>> Hi! >>> >>> I'm experiencing some problems using PetscRandom and SVD from SLEPc. For >>> some reason, the order of two seemingly unrelated parts of the program >>> below affects the random values in a vector. The test program below, run >>> with "mpiexec -n 2", will print a vector with two identical component >>> values. If I move the four lines of SVD code up above "PetscRandom r", >>> the two vector component are different, as expected. Why does this >>> matter, when the SVD code is unrelated to the PetscRandom r and Vec u? >>> >>> #include "petscdmda.h" >>> #include "slepcsvd.h" >>> >>> int main(int argc, char **argv) >>> { >>> PetscErrorCode ierr = SlepcInitialize(&argc, &argv, 0, 0); >>> CHKERRQ(ierr); >>> >>> DM da; >>> ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 2, 1, 1, 0, >>> &da); CHKERRQ(ierr); >>> ierr = DMSetFromOptions(da); CHKERRQ(ierr); >>> >>> Vec u; >>> ierr = DMCreateGlobalVector(da, &u); CHKERRQ(ierr); >>> >>> PetscRandom r; >>> ierr = PetscRandomCreate(PETSC_COMM_WORLD, &r); CHKERRQ(ierr); >>> ierr = PetscRandomSetFromOptions(r); >>> >>> SVD svd; >>> ierr = SVDCreate(PETSC_COMM_WORLD, &svd); CHKERRQ(ierr); >>> ierr = SVDSetFromOptions(svd); CHKERRQ(ierr); >>> ierr = SVDDestroy(&svd); CHKERRQ(ierr); >>> >>> ierr = VecSetRandom(u, r); CHKERRQ(ierr); >>> ierr = VecView(u, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); >>> >>> ierr = PetscRandomDestroy(&r); CHKERRQ(ierr); >>> ierr = VecDestroy(&u); CHKERRQ(ierr); >>> ierr = DMDestroy(&da); CHKERRQ(ierr); >>> ierr = SlepcFinalize(); CHKERRQ(ierr); >>> >>> return 0; >>> } >>> >>> Output of the program as written above: >>> >>> Vec Object: 2 MPI processes >>> type: mpi >>> Vec Object:Vec_0xd00a30_0 2 MPI processes >>> type: mpi >>> Process [0] >>> 0.720032 + 0.061794 i >>> Process [1] >>> 0.720032 + 0.061794 i >>> >>> Output of the program with the four lines of SVD code moved above the >>> line "PetscRandom r": >>> >>> Vec Object: 2 MPI processes >>> type: mpi >>> Vec Object:Vec_0x117aa30_0 2 MPI processes >>> type: mpi >>> Process [0] >>> 0.720032 + 0.061794 i >>> Process [1] >>> 0.541144 + 0.529699 i >>> >>> Other information: >>> >>> $ gcc --version >>> gcc (Debian 5.1.1-12) 5.1.1 20150622 >>> >>> The PETSc version was obtained with the command: >>> git checkout 75ae60a9cdad23ddd49e9e39341b43db353bd940 >>> >>> because the newest GIT version wouldn't compile. SLEPc is the newest >>> from the master branch. 
The -log_summary output of the program is: >>> >>> ---------------------------------------------- PETSc Performance >>> Summary: ---------------------------------------------- >>> >>> ./randtest.exe on a arch-linux2-c-debug named lenovo with 2 processors, >>> by tmac Tue Jun 30 22:42:49 2015 >>> Using Petsc Development GIT revision: v3.6-89-g75ae60a GIT Date: >>> 2015-06-29 18:37:23 -0500 >>> >>> Max Max/Min Avg Total >>> Time (sec): 1.274e-02 1.00000 1.274e-02 >>> Objects: 3.900e+01 1.00000 3.900e+01 >>> Flops: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> Flops/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> Memory: 1.273e+05 1.00000 2.546e+05 >>> MPI Messages: 6.500e+00 1.00000 6.500e+00 1.300e+01 >>> MPI Message Lengths: 3.600e+01 1.00000 5.538e+00 7.200e+01 >>> MPI Reductions: 9.000e+01 1.00000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type >>> (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N >>> --> 2N flops >>> and VecAXPY() for complex vectors of length >>> N --> 8N flops >>> >>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >>> --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total counts >>> %Total Avg %Total counts %Total >>> 0: Main Stage: 1.2723e-02 99.9% 0.0000e+00 0.0% 1.300e+01 >>> 100.0% 5.538e+00 100.0% 8.900e+01 98.9% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on >>> interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flops: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> Avg. len: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() >>> and PetscLogStagePop(). >>> %T - percent time in this phase %F - percent flops in this >>> phase >>> %M - percent messages in this phase %L - percent message >>> lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >>> over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # This code was compiled with a debugging option, # >>> # To get timing results run ./configure # >>> # using --with-debugging=no, the performance will # >>> # be generally two or three times faster. # >>> # # >>> ########################################################## >>> >>> >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # The code for various complex numbers numerical # >>> # kernels uses C++, which generally is not well # >>> # optimized. For performance that is about 4-5 times # >>> # faster, specify --with-fortran-kernels=1 # >>> # when running ./configure.py. 
# >>> # # >>> ########################################################## >>> >>> >>> Event Count Time (sec) >>> Flops --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len >>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecView 1 1.0 1.0881e-03 1.2 0.00e+00 0.0 5.0e+00 8.0e+00 >>> 1.5e+01 8 0 38 56 17 8 0 38 56 17 0 >>> VecSet 1 1.0 1.5020e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 1 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterEnd 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSetRandom 1 1.0 1.6928e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyBegin 1 1.0 8.4877e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 2.0e+00 1 0 0 0 2 1 0 0 0 2 0 >>> MatAssemblyEnd 1 1.0 6.7306e-04 1.0 0.00e+00 0.0 4.0e+00 4.0e+00 >>> 1.4e+01 5 0 31 22 16 5 0 31 22 16 0 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Vector 7 7 11064 0 >>> Vector Scatter 3 3 3280 0 >>> Matrix 3 3 8064 0 >>> Distributed Mesh 1 1 4972 0 >>> Star Forest Bipartite Graph 2 2 1712 0 >>> Discrete System 1 1 848 0 >>> Index Set 6 6 4660 0 >>> IS L to G Mapping 1 1 632 0 >>> PetscRandom 3 3 1920 0 >>> SVD Solver 1 1 984 0 >>> EPS Solver 1 1 1272 0 >>> Spectral Transform 1 1 784 0 >>> Krylov Solver 1 1 1304 0 >>> Preconditioner 1 1 848 0 >>> Basis Vectors 3 3 3024 0 >>> Direct Solver 2 2 2640 0 >>> Region 1 1 648 0 >>> Viewer 1 0 0 0 >>> ======================================================================================================================== >>> Average time to get PetscTime(): 9.53674e-08 >>> Average time for MPI_Barrier(): 1.43051e-06 >>> Average time for zero size MPI_Send(): 4.52995e-06 >>> #PETSc Option Table entries: >>> -ksp_converged_reason >>> -log_summary >>> -objects_dump 0 >>> -options_table >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >>> sizeof(PetscScalar) 16 sizeof(PetscInt) 4 >>> Configure options: --prefix=/home/tmac/usr/stow/petsc >>> --with-scalar-type=complex --with-shared-libraries=yes >>> --with-debugging=yes --download-superlu=yes --download-ptscotch=yes >>> --with-sowing=no --download-sowing=no >>> ----------------------------------------- >>> Libraries compiled on Tue Jun 30 20:18:19 2015 on lenovo >>> Machine characteristics: Linux-4.0.0-2-amd64-x86_64-with-debian-stretch-sid >>> Using PETSc directory: /home/tmac/tmp/src/petsc >>> Using PETSc arch: arch-linux2-c-debug >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings >>> -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS} >>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable >>> -ffree-line-length-0 -g -O0 ${FOPTFLAGS} ${FFLAGS} >>> ----------------------------------------- >>> >>> Using include paths: >>> -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include >>> 
-I/home/tmac/tmp/src/petsc/include -I/home/tmac/tmp/src/petsc/include >>> -I/home/tmac/tmp/src/petsc/arch-linux2-c-debug/include >>> -I/home/tmac/usr/stow/petsc/include -I/usr/lib/openmpi/include >>> -I/usr/lib/openmpi/include/openmpi >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: >>> -Wl,-rpath,/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib >>> -L/home/tmac/tmp/src/petsc/arch-linux2-c-debug/lib -lpetsc >>> -Wl,-rpath,/home/tmac/usr/stow/petsc/lib -L/home/tmac/usr/stow/petsc/lib >>> -lsuperlu_4.3 -llapack -lblas -lptesmumps -lptscotch -lptscotcherr >>> -lscotch -lscotcherr -lX11 -lssl -lcrypto -lm >>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib >>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 >>> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu >>> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu >>> -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran >>> -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lz >>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib >>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 >>> -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu >>> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu >>> -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu >>> -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl >>> ----------------------------------------- >>> >>> Best regards >>> Torquil Sørensen > From wadud.miah at gmail.com Tue Jun 30 19:49:48 2015 From: wadud.miah at gmail.com (W. Miah) Date: Wed, 1 Jul 2015 01:49:48 +0100 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Hi Satish, I managed to fix it by using the flag --with-fortran-interfaces=1 and removed --with-fortran-datatypes=1. Regards, Wadud. On 30 June 2015 at 23:54, W. Miah wrote: > Hi again Satish, > > When I use the supplied makefile, I get the following error: > > [hamilton1 petsc-examples]$ make ex2f > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > -Wall -Wno-unused-variable -ffree-line-length-0 -O > -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include > -I/home/fcwm/hypre-2.10.0b/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > ex2f.o ex2f.F > ex2f.F:48: error: finclude/petscsys.h: No such file or directory > ex2f.F:49: error: finclude/petscvec.h: No such file or directory > ex2f.F:50: error: finclude/petscmat.h: No such file or directory > ex2f.F:51: error: finclude/petscpc.h: No such file or directory > ex2f.F:52: error: finclude/petscksp.h: No such file or directory > > It obviously cannot find the Fortran header files. 
When I include the > path to the Fortran header files, it still complains: > > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > -Wall -Wno-unused-variable -ffree-line-length-0 -O > -I/home/fcwm/petsc-3.6.0/include > -I/home/fcwm/petsc-3.6.0/include/petsc > -I/home/fcwm/hypre-2.10.0b/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > ex2f.o ex2f.F > Warning: Nonconforming tab character in column 3 of line 1180 > Warning: Nonconforming tab character in column 3 of line 1201 > Warning: Nonconforming tab character in column 3 of line 1209 > Warning: Nonconforming tab character in column 3 of line 3483 > Warning: Nonconforming tab character in column 3 of line 3504 > Warning: Nonconforming tab character in column 3 of line 3512 > Warning: Nonconforming tab character in column 3 of line 4578 > Warning: Nonconforming tab character in column 3 of line 4599 > Warning: Nonconforming tab character in column 3 of line 4607 > ex2f.F:82.6: > > Vec x,b,u > 1 > Error: Unclassifiable statement at (1) > ex2f.F:83.6: > > I think I need to tell the Fortran compiler to use the C preprocessor > instead of the Fortran preprocessor. > > Regards, > Wadud. > > On 30 June 2015 at 23:33, W. Miah wrote: >> Hi Satish, >> >> Thanks for the Makefile. >> >> Regards, >> Wadud. >> >> On 30 June 2015 at 23:20, Satish Balay wrote: >>> Where did you grab this example from? the 'makefile' should be in the >>> same location. >>> >>> For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: >>> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile >>> >>> The users manual has a section explaining the format. >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 >>> >>> Satish >>> >>> On Tue, 30 Jun 2015, W. Miah wrote: >>> >>>> Hi Satish, >>>> >>>> I can't find the Makefile that came with it. I'll have a trawl through >>>> the petsc download again, but if you could give me the compilation >>>> line that would be appreciated. >>>> >>>> Best regards, >>>> Wadud. >>>> >>>> On 30 June 2015 at 23:11, Satish Balay wrote: >>>> > On Tue, 30 Jun 2015, W. Miah wrote: >>>> > >>>> >> Hi, >>>> >> >>>> >> This is probably a simple mistake, but I am getting the following >>>> >> error when trying to compile a Fortran PETSc example: >>>> >> >>>> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include >>>> >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F >>>> >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one >>>> >> of: ( % [ : . = => >>>> >> Vec x,b,u >>>> >> ------------------^ >>>> >> >>>> >> Any help will be greatly appreciated. Thanks in advance. >>>> > >>>> > Are you not using the correspoding makefile that came with this example? >>>> > >>>> > Satish >>>> > >>>> >>>> >>>> >>>> >>> >> >> >> >> -- >> email: wadud.miah at gmail.com >> mobile: 07905 755604 >> gnupg: 2E29 B22F > > > > -- > email: wadud.miah at gmail.com > mobile: 07905 755604 > gnupg: 2E29 B22F -- email: wadud.miah at gmail.com mobile: 07905 755604 gnupg: 2E29 B22F From balay at mcs.anl.gov Tue Jun 30 20:14:08 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 30 Jun 2015 20:14:08 -0500 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: Where did you get this example source from? You can't mix and match examples from different versions of PETSc Satish On Tue, 30 Jun 2015, W. 
Miah wrote: > Hi again Satish, > > When I use the supplied makefile, I get the following error: > > [hamilton1 petsc-examples]$ make ex2f > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > -Wall -Wno-unused-variable -ffree-line-length-0 -O > -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include > -I/home/fcwm/hypre-2.10.0b/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > ex2f.o ex2f.F > ex2f.F:48: error: finclude/petscsys.h: No such file or directory > ex2f.F:49: error: finclude/petscvec.h: No such file or directory > ex2f.F:50: error: finclude/petscmat.h: No such file or directory > ex2f.F:51: error: finclude/petscpc.h: No such file or directory > ex2f.F:52: error: finclude/petscksp.h: No such file or directory > > It obviously cannot find the Fortran header files. When I include the > path to the Fortran header files, it still complains: > > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > -Wall -Wno-unused-variable -ffree-line-length-0 -O > -I/home/fcwm/petsc-3.6.0/include > -I/home/fcwm/petsc-3.6.0/include/petsc > -I/home/fcwm/hypre-2.10.0b/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > ex2f.o ex2f.F > Warning: Nonconforming tab character in column 3 of line 1180 > Warning: Nonconforming tab character in column 3 of line 1201 > Warning: Nonconforming tab character in column 3 of line 1209 > Warning: Nonconforming tab character in column 3 of line 3483 > Warning: Nonconforming tab character in column 3 of line 3504 > Warning: Nonconforming tab character in column 3 of line 3512 > Warning: Nonconforming tab character in column 3 of line 4578 > Warning: Nonconforming tab character in column 3 of line 4599 > Warning: Nonconforming tab character in column 3 of line 4607 > ex2f.F:82.6: > > Vec x,b,u > 1 > Error: Unclassifiable statement at (1) > ex2f.F:83.6: > > I think I need to tell the Fortran compiler to use the C preprocessor > instead of the Fortran preprocessor. > > Regards, > Wadud. > > On 30 June 2015 at 23:33, W. Miah wrote: > > Hi Satish, > > > > Thanks for the Makefile. > > > > Regards, > > Wadud. > > > > On 30 June 2015 at 23:20, Satish Balay wrote: > >> Where did you grab this example from? the 'makefile' should be in the > >> same location. > >> > >> For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: > >> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile > >> > >> The users manual has a section explaining the format. > >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 > >> > >> Satish > >> > >> On Tue, 30 Jun 2015, W. Miah wrote: > >> > >>> Hi Satish, > >>> > >>> I can't find the Makefile that came with it. I'll have a trawl through > >>> the petsc download again, but if you could give me the compilation > >>> line that would be appreciated. > >>> > >>> Best regards, > >>> Wadud. > >>> > >>> On 30 June 2015 at 23:11, Satish Balay wrote: > >>> > On Tue, 30 Jun 2015, W. 
Miah wrote: > >>> > > >>> >> Hi, > >>> >> > >>> >> This is probably a simple mistake, but I am getting the following > >>> >> error when trying to compile a Fortran PETSc example: > >>> >> > >>> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include > >>> >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F > >>> >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one > >>> >> of: ( % [ : . = => > >>> >> Vec x,b,u > >>> >> ------------------^ > >>> >> > >>> >> Any help will be greatly appreciated. Thanks in advance. > >>> > > >>> > Are you not using the correspoding makefile that came with this example? > >>> > > >>> > Satish > >>> > > >>> > >>> > >>> > >>> > >> > > > > > > > > -- > > email: wadud.miah at gmail.com > > mobile: 07905 755604 > > gnupg: 2E29 B22F > > > > From balay at mcs.anl.gov Tue Jun 30 20:15:40 2015 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 30 Jun 2015 20:15:40 -0500 Subject: [petsc-users] preprocessing problem In-Reply-To: References: Message-ID: most fortran examples wont work with --with-fortran-datatypes=1 option. Satish On Tue, 30 Jun 2015, W. Miah wrote: > Hi Satish, > > I managed to fix it by using the flag --with-fortran-interfaces=1 and > removed --with-fortran-datatypes=1. > > Regards, > Wadud. > > On 30 June 2015 at 23:54, W. Miah wrote: > > Hi again Satish, > > > > When I use the supplied makefile, I get the following error: > > > > [hamilton1 petsc-examples]$ make ex2f > > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > > -Wall -Wno-unused-variable -ffree-line-length-0 -O > > -I/home/fcwm/petsc-3.6.0/include -I/home/fcwm/petsc-3.6.0/include > > -I/home/fcwm/hypre-2.10.0b/include > > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > > ex2f.o ex2f.F > > ex2f.F:48: error: finclude/petscsys.h: No such file or directory > > ex2f.F:49: error: finclude/petscvec.h: No such file or directory > > ex2f.F:50: error: finclude/petscmat.h: No such file or directory > > ex2f.F:51: error: finclude/petscpc.h: No such file or directory > > ex2f.F:52: error: finclude/petscksp.h: No such file or directory > > > > It obviously cannot find the Fortran header files. When I include the > > path to the Fortran header files, it still complains: > > > > /usr/local/intel/ics_2013/impi/4.1.0.024/intel64/bin/mpif90 -c -fPIC > > -Wall -Wno-unused-variable -ffree-line-length-0 -O > > -I/home/fcwm/petsc-3.6.0/include > > -I/home/fcwm/petsc-3.6.0/include/petsc > > -I/home/fcwm/hypre-2.10.0b/include > > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include > > -I/usr/local/intel/ics_2013/impi/4.1.0.024/intel64/include -o > > ex2f.o ex2f.F > > Warning: Nonconforming tab character in column 3 of line 1180 > > Warning: Nonconforming tab character in column 3 of line 1201 > > Warning: Nonconforming tab character in column 3 of line 1209 > > Warning: Nonconforming tab character in column 3 of line 3483 > > Warning: Nonconforming tab character in column 3 of line 3504 > > Warning: Nonconforming tab character in column 3 of line 3512 > > Warning: Nonconforming tab character in column 3 of line 4578 > > Warning: Nonconforming tab character in column 3 of line 4599 > > Warning: Nonconforming tab character in column 3 of line 4607 > > ex2f.F:82.6: > > > > Vec x,b,u > > 1 > > Error: Unclassifiable statement at (1) > > ex2f.F:83.6: > > > > I think I need to tell the Fortran compiler to use the C preprocessor > > instead of the Fortran preprocessor. 
> > > > Regards, > > Wadud. > > > > On 30 June 2015 at 23:33, W. Miah wrote: > >> Hi Satish, > >> > >> Thanks for the Makefile. > >> > >> Regards, > >> Wadud. > >> > >> On 30 June 2015 at 23:20, Satish Balay wrote: > >>> Where did you grab this example from? the 'makefile' should be in the > >>> same location. > >>> > >>> For eg: if you grabbed src/ksp/ksp/examples/tutorials/ex2f.F - then you would use: > >>> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/makefile > >>> > >>> The users manual has a section explaining the format. > >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf#subsection.16.2.1 > >>> > >>> Satish > >>> > >>> On Tue, 30 Jun 2015, W. Miah wrote: > >>> > >>>> Hi Satish, > >>>> > >>>> I can't find the Makefile that came with it. I'll have a trawl through > >>>> the petsc download again, but if you could give me the compilation > >>>> line that would be appreciated. > >>>> > >>>> Best regards, > >>>> Wadud. > >>>> > >>>> On 30 June 2015 at 23:11, Satish Balay wrote: > >>>> > On Tue, 30 Jun 2015, W. Miah wrote: > >>>> > > >>>> >> Hi, > >>>> >> > >>>> >> This is probably a simple mistake, but I am getting the following > >>>> >> error when trying to compile a Fortran PETSc example: > >>>> >> > >>>> >> [host petsc-examples]$ mpiifort -c -I/home/fcwm/petsc-3.6.0/include > >>>> >> -I/home/fcwm/petsc-3.6.0/include/petsc ex2f.F > >>>> >> ex2f.F(82): error #5082: Syntax error, found ',' when expecting one > >>>> >> of: ( % [ : . = => > >>>> >> Vec x,b,u > >>>> >> ------------------^ > >>>> >> > >>>> >> Any help will be greatly appreciated. Thanks in advance. > >>>> > > >>>> > Are you not using the correspoding makefile that came with this example? > >>>> > > >>>> > Satish > >>>> > > >>>> > >>>> > >>>> > >>>> > >>> > >> > >> > >> > >> -- > >> email: wadud.miah at gmail.com > >> mobile: 07905 755604 > >> gnupg: 2E29 B22F > > > > > > > > -- > > email: wadud.miah at gmail.com > > mobile: 07905 755604 > > gnupg: 2E29 B22F > > > >
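For readers hitting the same problem: the advice above amounts to building the example through PETSc's own makefile machinery instead of invoking mpif90 by hand, so that the preprocessing of the .F file, the include paths and the link line all come from the configured PETSC_DIR/PETSC_ARCH. Below is a minimal sketch of such a user makefile, modeled on the ksp tutorials makefile linked earlier in this thread. It is an illustration only: it assumes the PETSc 3.6 directory layout (makefile fragments under lib/petsc/conf, Fortran headers under include/petsc/finclude) and that PETSC_DIR and PETSC_ARCH are set in the environment; older releases keep the fragments under ${PETSC_DIR}/conf instead.

# minimal sketch of a user makefile for the Fortran tutorial ex2f.F
# (recipe lines must start with a tab; PETSC_DIR and PETSC_ARCH must be set)
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

# The .F suffix already makes mpif90 run the C preprocessor; the PETSc
# rules supply the -I paths and compiler flags matching the configuration.
ex2f: ex2f.o chkopts
	-${FLINKER} -o ex2f ex2f.o ${PETSC_KSP_LIB}
	${RM} ex2f.o

With this in place, "make ex2f" should reproduce essentially the compile and link lines shown earlier in the thread. Nonstandard configure options can still break the build, though: as noted above, most of the Fortran examples will not work against a library configured with --with-fortran-datatypes=1.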