From rlwalker at usc.edu Thu Jun 1 00:14:35 2017 From: rlwalker at usc.edu (Robert Walker) Date: Wed, 31 May 2017 22:14:35 -0700 Subject: [petsc-users] [petsc-dev] PETSc User Meeting 2017, June 14-16 in Boulder, Colorado In-Reply-To: <87zidsokp8.fsf@jedbrown.org> References: <87y3wbtk1i.fsf@jedbrown.org> <87shjsxmyh.fsf@jedbrown.org> <87zidsokp8.fsf@jedbrown.org> Message-ID: Any chance this will either be streamed live or have recordings available soon after? Robert L. Walker MS Petroleum Engineering Mork Family Department of Chemicals and Materials Sciences University of Southern California ---------------------------------------------- Mobile US: +1 (213) - 290 -7101 Mobile EU: +34 62 274 66 40 rlwalker at usc.edu On Wed, May 31, 2017 at 1:06 PM, Jed Brown wrote: > Correction: it is still possible to book lodging today (closes at > midnight Mountain Time). > > See you in two short weeks. Thanks! > > Jed Brown writes: > > > The program is up on the website: > > > > https://www.mcs.anl.gov/petsc/meetings/2017/ > > > > If you haven't registered yet, we can still accommodate you, but please > > register soon. If you haven't booked lodging, please do that soon -- > > the on-campus lodging option will close on *Tuesday, May 30*. > > > > https://confreg.colorado.edu/CSM2017 > > > > We are looking forward to seeing you in Boulder! > > > > Jed Brown writes: > > > >> We'd like to invite you to join us at the 2017 PETSc User Meeting held > >> at the University of Colorado Boulder on June 14-16, 2017. > >> > >> http://www.mcs.anl.gov/petsc/meetings/2017/ > >> > >> The first day consists of tutorials on various aspects and features of > >> PETSc. The second and third days will be devoted to exchange, > >> discussions, and a refinement of strategies for the future with our > >> users. We encourage you to present work illustrating your own use of > >> PETSc, for example in applications or in libraries built on top of > >> PETSc. > >> > >> Registration for the PETSc User Meeting 2017 is free for students and > >> $75 for non-students. We can host a maximum of 150 participants, so > >> register soon (and by May 15). > >> > >> http://www.eventzilla.net/web/e/petsc-user-meeting-2017-2138890185 > >> > >> We are also offering low-cost lodging on campus. A lodging registration > >> site will be available soon and announced here and on the website. > >> > >> Thanks to the generosity of Intel, we will be able to offer a limited > >> number of student travel grants. We are also soliciting additional > >> sponsors -- please contact us if you are interested. > >> > >> > >> We are looking forward to seeing you in Boulder! > >> > >> Please contact us at petsc2017 at mcs.anl.gov if you have any questions or > >> comments. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 1 07:52:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Jun 2017 07:52:02 -0500 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: <1496286009918.20206@auckland.ac.nz> References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz> Message-ID: On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik wrote: > Thanks Matt! That works perfectly now. I have another question regarding > accessing the quadrature information. > > > When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what > I expect regarding point locations, weights. 
> > > However, when I try to use PetscQuadratureGetData() the pointers seem to > point to random memory locations. > > > The exact line from my test problem is: call PetscQuadratureGetData(quad,q_ > nc,q_dim,q_num,pq_points,pq_weights,ierr); > > where the pq_* are the pointers giving strange output. The q_nc, q_dim, > and q_num are all giving what I would expect to see. > You are clearly the first Fortran user interested in this stuff ;) Handling of arrays in Fortran demands some more work from us. I need to write a wrapper for that function. I will do it as soon as I can. Thanks, Matt > Happy to send along the file if that helps. > > > Thanks again, > > > Justin > ------------------------------ > *From:* Matthew Knepley > *Sent:* Thursday, June 1, 2017 1:34 AM > *To:* Justin Pogacnik > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran > > On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley > wrote: > >> On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik < >> j.pogacnik at auckland.ac.nz> wrote: >> >>> Hello, >>> >>> I'm developing a finite element code in fortran 90. I recently updated >>> my PETSc and am now getting the following error during compile/linking on >>> an existing application: >>> >>> Undefined symbols for architecture x86_64: >>> >>> "_petscfecreatedefault_", referenced from: >>> >>> _MAIN__ in fe_test.o >>> >>> ld: symbol(s) not found for architecture x86_64 >>> >>> collect2: error: ld returned 1 exit status >>> >>> make: *** [dist/fe_test] Error 1 >>> >>> >>> I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working >>> example" (attached) that re-creates the problem. It's basically >>> just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a >>> PetscFE object. Everything goes fine and the DM looks like what is expected >>> if PetscFECreateDefault is commented out. Any idea what am I missing? >>> >> Yes, I had not made a Fortran binding for this function. I will do it now. >> > > I have merged it to the 'next' branch, and it will be in 'master' soon. > > Thanks, > > Matt > > >> Thanks, >> >> Matt >> >> >>> Many thanks! >>> >>> Justin >>> >>> >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Thu Jun 1 09:30:53 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Thu, 1 Jun 2017 16:30:53 +0200 Subject: [petsc-users] Using FAS with SNES Message-ID: Dear PETSc team, I have interfaced our fortran legacy code with PETSC SNES. I mainly followed the examples you provide. 
What I conceptually used is : ------------------------------------------------------------ -------------------------------------------------------------------------- call SNESCreate(PETSC_COMM_WORLD,snes,ierr) call SNESSetFromOptions(snes,ierr) call SNESSETFunction(snes, pf, nonlinFormFunction, PETSC_NULL_OBJECT, ierr) call SNESSetJacobian(snes, mat, mat, nonlinFormJacobian, PETSC_NULL_OBJECT, ierr) call SNESSetKSP(snes, myksp, ierr) ------------------------------------------------------------ -------------------------------------------------------------------------- The code runs fine with -snes_type newtonls or newtontr. But when using -snes_type fas, it complains with the message : ------------------------------------------------------------ -------------------------------------------------------------------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: Fortran callback not set on this object [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 [0]PETSC ERROR: on a arch-linux2-c-debug named dsp0780450 by niko Thu Jun 1 16:18:43 2017 [0]PETSC ERROR: Configure options --prefix=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/Install --with-mpi=yes --with-x=yes --download-ml=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/ml-6.2-p3.tar.gz --with-mumps-lib="-L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Mumps-502_consortium_aster1/MPI/lib -lzmumps -ldmumps -lmumps_common -lpord -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Scotch_aster-604_aster6/MPI/lib -lesmumps -lptscotch -lptscotcherr -lptscotcherrexit -lscotch -lscotcherr -lscotcherrexit -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Parmetis_aster-403_aster/lib -lparmetis -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Metis_aster-510_aster1/lib -lmetis -L/usr/lib -lscalapack-openmpi -L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi -lblacsF77init-openmpi -L/usr/lib/x86_64-linux-gnu -lgomp " --with-mumps-include=/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Mumps-502_consortium_aster1/MPI/include --with-scalapack-lib="-L/usr/lib -lscalapack-openmpi" --with-blacs-lib="-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi -lblacsF77init-openmpi" --with-blas-lib="-L/usr/lib -lopenblas -lcblas" --with-lapack-lib="-L/usr/lib -llapack" [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/sys/objects/inherit.c [0]PETSC ERROR: #2 oursnesjacobian() line 105 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/ftn-custom/zsnesf.c [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/ls/ls.c [0]PETSC ERROR: #5 SNESSolve() line 4008 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #7 SNESFASCycle_Multiplicative() line 816 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in 
/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #9 SNESSolve() line 4008 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c ------------------------------------------------------------ -------------------------------------------------------------------------- When exploring a little bit with a debugger, it seems that the object snes->vec_rhs which is used is fas.c is a null pointer. What is this object vec_rhs and how to set it? Thank you in advance, Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Thu Jun 1 10:51:00 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 1 Jun 2017 16:51:00 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> Message-ID: <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> On 25/05/17 21:00, Matthew Knepley wrote: > On Thu, May 25, 2017 at 2:22 PM, Lawrence Mitchell > > wrote: > > > > On 25 May 2017, at 20:03, Matthew Knepley > wrote: > > > > > > Hmm, I thought I made adjacency per field. I have to look. That way, no problem with the Stokes example. DG is still weird. > > You might, we don't right now. We just make the topological > adjacency that is "large enough", and then make fields on that. > > > > > That seems baroque. So this is just another adjacency pattern. You should be able to easily define it, or if you are a patient person, > > wait for me to do it. Its here > > > > https://bitbucket.org/petsc/petsc/src/01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/plexdistribute.c?at=master&fileviewer=file-view-default#plexdistribute.c-239 > > > > > I am more than willing to make this overridable by the user through function composition or another mechanism. > > Hmm, that naive thing of just modifying the XXX_Support_Internal > to compute with DMPlexGetTransitiveClosure rather than > DMPlexGetCone didn't do what I expected, but I don't understand > the way this bootstrapping is done very well. > > > It should do the right thing. Notice that you have to be careful about > the arrays that you use since I reuse them for efficiency here. > What is going wrong? Coming back to this, I think I understand the problem a little better. Consider this mesh: +----+ |\ 3 | | \ | |2 \ | | \| +----+ |\ 1 | | \ | |0 \ | | \| +----+ Let's say I run on 3 processes and the initial (non-overlapped) cell partition is: rank 0: cell 0 rank 1: cell 1 & 2 rank 2: cell 3 Now I'd like to grow the overlap such that any cell I can see through a facet (and its closure) lives in the overlap. So great, I just need a new adjacency relation that gathers closure(support(point)) But, that isn't right, because now on rank 0, I will get a mesh that looks like: + | | | | +----+ |\ 1 | | \ | |0 \ | | \| +----+ Because I grab all the mesh points in the adjacency of the initial cell: + |\ | \ |0 \ | \ +----+ And on the top vertex that pulls in the facet (but not the cell). So I can write a "DMPlexGetAdjacency" information that only returns non-empty adjacencies for facets. But it's sort of lying about what it does now. Thoughts? Lawrence -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From lawrence.mitchell at imperial.ac.uk Thu Jun 1 12:17:03 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 1 Jun 2017 18:17:03 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> Message-ID: <18cfce27-845a-41c5-a314-e673df6cfec3@imperial.ac.uk> On 01/06/17 16:51, Lawrence Mitchell wrote: ... > So I can write a "DMPlexGetAdjacency" information that only returns > non-empty adjacencies for facets. But it's sort of lying about what > it does now. Proposed a PR that allows the user to specify what they want adjacency to mean by providing a callback function. That might be best? https://bitbucket.org/petsc/petsc/pull-requests/690/plex-support-user-defined-adjacencies-via/diff Lawrence -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From stefano.zampini at gmail.com Thu Jun 1 13:12:53 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 1 Jun 2017 20:12:53 +0200 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> Message-ID: <213E52CF-100B-494D-BB8E-B186E6FA832E@gmail.com> > On Jun 1, 2017, at 5:51 PM, Lawrence Mitchell wrote: > > On 25/05/17 21:00, Matthew Knepley wrote: >> On Thu, May 25, 2017 at 2:22 PM, Lawrence Mitchell >> > > wrote: >> >> >>> On 25 May 2017, at 20:03, Matthew Knepley > wrote: >>> >>> >>> Hmm, I thought I made adjacency per field. I have to look. That way, no problem with the Stokes example. DG is still weird. >> >> You might, we don't right now. We just make the topological >> adjacency that is "large enough", and then make fields on that. >> >>> >>> That seems baroque. So this is just another adjacency pattern. You should be able to easily define it, or if you are a patient person, >>> wait for me to do it. Its here >>> >>> https://bitbucket.org/petsc/petsc/src/01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/plexdistribute.c?at=master&fileviewer=file-view-default#plexdistribute.c-239 >> >>> >>> I am more than willing to make this overridable by the user through function composition or another mechanism. >> >> Hmm, that naive thing of just modifying the XXX_Support_Internal >> to compute with DMPlexGetTransitiveClosure rather than >> DMPlexGetCone didn't do what I expected, but I don't understand >> the way this bootstrapping is done very well. >> >> >> It should do the right thing. Notice that you have to be careful about >> the arrays that you use since I reuse them for efficiency here. >> What is going wrong? > > Coming back to this, I think I understand the problem a little better. 
> > Consider this mesh: > > +----+ > |\ 3 | > | \ | > |2 \ | > | \| > +----+ > |\ 1 | > | \ | > |0 \ | > | \| > +----+ > > Let's say I run on 3 processes and the initial (non-overlapped) cell > partition is: > > rank 0: cell 0 > rank 1: cell 1 & 2 > rank 2: cell 3 > > Now I'd like to grow the overlap such that any cell I can see through > a facet (and its closure) lives in the overlap. > Lawrence, why do you need the closure here? Why facet adjacency is not enough? > So great, I just need a new adjacency relation that gathers > closure(support(point)) > > But, that isn't right, because now on rank 0, I will get a mesh that > looks like: > > + > | > | > | > | > +----+ > |\ 1 | > | \ | > |0 \ | > | \| > +----+ > > Because I grab all the mesh points in the adjacency of the initial cell: > > + > |\ > | \ > |0 \ > | \ > +----+ > > And on the top vertex that pulls in the facet (but not the cell). > > So I can write a "DMPlexGetAdjacency" information that only returns > non-empty adjacencies for facets. But it's sort of lying about what > it does now. > > Thoughts? > > Lawrence > From lawrence.mitchell at imperial.ac.uk Thu Jun 1 13:38:19 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 1 Jun 2017 19:38:19 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <213E52CF-100B-494D-BB8E-B186E6FA832E@gmail.com> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> <213E52CF-100B-494D-BB8E-B186E6FA832E@gmail.com> Message-ID: On 1 Jun 2017, at 19:12, Stefano Zampini wrote: >> Now I'd like to grow the overlap such that any cell I can see through >> a facet (and its closure) lives in the overlap. >> > > Lawrence, why do you need the closure here? Why facet adjacency is not enough? Sorry, bad punctuation. The closure subclause is attached to the cell. So the adjacency I want is "go from the facets to the cells, gather all the points in the closure of those cells". Lawrence > From bikash at umich.edu Thu Jun 1 14:07:51 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 1 Jun 2017 15:07:51 -0400 Subject: [petsc-users] MatSetNullSpace Message-ID: Hi, I'm trying to solve a linear system of equations Ax=b, where A has a null space (say Q) and x is known to be orthogonal to Q. In order to avoid ill-conditioning, I was trying to do the following: 1. Create A as a shell matrix 2. Overload the MATOP_MULT operation for with my own function which returns y = A*(I - QQ^T)x instead of y = Ax 3. Upon convergence, solution = (I-QQ^T)x instead of x. However, I realized that the linear solver can make x have any arbitrary component along Q and still y = A*(I-QQ^T)x will remain unaffected, and hence can cause convergence issues. Indeed, I saw such convergence problems. What fixed the problem was using MatSetNullSpace for A with Q as the nullspace, in addition to the above three steps. So my question is what exactly is MatSetNullSpace doing? And since the full A information is not present and A is only accessed through MAT_OP_MULT, I'm confused as how MatSetNullSpace might be fixing the convergence issue. Thanks, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From j.pogacnik at auckland.ac.nz Thu Jun 1 14:59:22 2017 From: j.pogacnik at auckland.ac.nz (Justin Pogacnik) Date: Thu, 1 Jun 2017 19:59:22 +0000 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz>, Message-ID: <1496347162251.24784@auckland.ac.nz> ?All good. Thanks Matt. Will keep an eye out for that update. :) -Justin ________________________________ From: Matthew Knepley Sent: Friday, June 2, 2017 12:52 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik > wrote: Thanks Matt! That works perfectly now. I have another question regarding accessing the quadrature information. When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what I expect regarding point locations, weights. However, when I try to use PetscQuadratureGetData() the pointers seem to point to random memory locations. The exact line from my test problem is: call PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); where the pq_* are the pointers giving strange output. The q_nc, q_dim, and q_num are all giving what I would expect to see. You are clearly the first Fortran user interested in this stuff ;) Handling of arrays in Fortran demands some more work from us. I need to write a wrapper for that function. I will do it as soon as I can. Thanks, Matt Happy to send along the file if that helps. Thanks again, Justin ________________________________ From: Matthew Knepley > Sent: Thursday, June 1, 2017 1:34 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley > wrote: On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik > wrote: Hello, I'm developing a finite element code in fortran 90. I recently updated my PETSc and am now getting the following error during compile/linking on an existing application: Undefined symbols for architecture x86_64: "_petscfecreatedefault_", referenced from: _MAIN__ in fe_test.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make: *** [dist/fe_test] Error 1 I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working example" (attached) that re-creates the problem. It's basically just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a PetscFE object. Everything goes fine and the DM looks like what is expected if PetscFECreateDefault is commented out. Any idea what am I missing? Yes, I had not made a Fortran binding for this function. I will do it now. I have merged it to the 'next' branch, and it will be in 'master' soon. Thanks, Matt Thanks, Matt Many thanks! Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Jun 1 16:15:18 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 01 Jun 2017 18:45:18 -0230 Subject: [petsc-users] [petsc-dev] PETSc User Meeting 2017, June 14-16 in Boulder, Colorado In-Reply-To: References: <87y3wbtk1i.fsf@jedbrown.org> <87shjsxmyh.fsf@jedbrown.org> <87zidsokp8.fsf@jedbrown.org> Message-ID: <87y3tbmmu1.fsf@jedbrown.org> I haven't been on attempting live streaming. We'll collect slides at a minimum and may do video recording. I'm on travel this week, but will discuss recording options early next week. Thanks. Robert Walker writes: > Any chance this will either be streamed live or have recordings available > soon after? > > Robert L. Walker > MS Petroleum Engineering > Mork Family Department of Chemicals and Materials Sciences > University of Southern California > ---------------------------------------------- > Mobile US: +1 (213) - 290 -7101 > Mobile EU: +34 62 274 66 40 > rlwalker at usc.edu > > On Wed, May 31, 2017 at 1:06 PM, Jed Brown wrote: > >> Correction: it is still possible to book lodging today (closes at >> midnight Mountain Time). >> >> See you in two short weeks. Thanks! >> >> Jed Brown writes: >> >> > The program is up on the website: >> > >> > https://www.mcs.anl.gov/petsc/meetings/2017/ >> > >> > If you haven't registered yet, we can still accommodate you, but please >> > register soon. If you haven't booked lodging, please do that soon -- >> > the on-campus lodging option will close on *Tuesday, May 30*. >> > >> > https://confreg.colorado.edu/CSM2017 >> > >> > We are looking forward to seeing you in Boulder! >> > >> > Jed Brown writes: >> > >> >> We'd like to invite you to join us at the 2017 PETSc User Meeting held >> >> at the University of Colorado Boulder on June 14-16, 2017. >> >> >> >> http://www.mcs.anl.gov/petsc/meetings/2017/ >> >> >> >> The first day consists of tutorials on various aspects and features of >> >> PETSc. The second and third days will be devoted to exchange, >> >> discussions, and a refinement of strategies for the future with our >> >> users. We encourage you to present work illustrating your own use of >> >> PETSc, for example in applications or in libraries built on top of >> >> PETSc. >> >> >> >> Registration for the PETSc User Meeting 2017 is free for students and >> >> $75 for non-students. We can host a maximum of 150 participants, so >> >> register soon (and by May 15). >> >> >> >> http://www.eventzilla.net/web/e/petsc-user-meeting-2017-2138890185 >> >> >> >> We are also offering low-cost lodging on campus. A lodging registration >> >> site will be available soon and announced here and on the website. >> >> >> >> Thanks to the generosity of Intel, we will be able to offer a limited >> >> number of student travel grants. We are also soliciting additional >> >> sponsors -- please contact us if you are interested. >> >> >> >> >> >> We are looking forward to seeing you in Boulder! >> >> >> >> Please contact us at petsc2017 at mcs.anl.gov if you have any questions or >> >> comments. >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 1 16:25:49 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Jun 2017 16:25:49 -0500 Subject: [petsc-users] MatSetNullSpace In-Reply-To: References: Message-ID: On Thu, Jun 1, 2017 at 2:07 PM, Bikash Kanungo wrote: > Hi, > > I'm trying to solve a linear system of equations Ax=b, where A has a null > space (say Q) and x is known to be orthogonal to Q. In order to avoid > ill-conditioning, I was trying to do the following: > 1) This implies that A is symmetric 2) I think you mean the b is orthogonal to Q > > 1. Create A as a shell matrix > 2. Overload the MATOP_MULT operation for with my own function which > returns > y = A*(I - QQ^T)x instead of y = Ax > 3. Upon convergence, solution = (I-QQ^T)x instead of x. > > However, I realized that the linear solver can make x have any arbitrary > component along Q and still y = A*(I-QQ^T)x will remain unaffected, and > hence can cause convergence issues. Indeed, I saw such convergence > problems. What fixed the problem was using MatSetNullSpace for A with Q as > the nullspace, in addition to the above three steps. > > So my question is what exactly is MatSetNullSpace doing? And since the > full A information is not present and A is only accessed through > MAT_OP_MULT, I'm confused as how MatSetNullSpace might be fixing the > convergence issue. > Its not MatSetNullSpace() that fixes this issue, it is the linear solver itself. Many Krylov solvers can converge to the minimum norm solution of this rank deficient problem. Second, we remove components in the nullspace from each iterate, which is not what you are doing above. It seems easier to just give your Q as the nullspace. Thanks, Matt > Thanks, > Bikash > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 1 16:57:05 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Jun 2017 16:57:05 -0500 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: <1496347162251.24784@auckland.ac.nz> References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz> <1496347162251.24784@auckland.ac.nz> Message-ID: On Thu, Jun 1, 2017 at 2:59 PM, Justin Pogacnik wrote: > ?All good. Thanks Matt. Will keep an eye out for that update. :) > I have checked, and I was wrong before. I did write the code for PetscQuadratureGet/RestoreData() in Fortran. Since it uses array arguments, I used F90. Thus you have to use F90 pointers, in the same style as DMPlexGet/RestoreCone() in this example: https://bitbucket.org/petsc/petsc/src/a19fbe4d52f99f875359274419a2d40a87edfba3/src/dm/impls/plex/examples/tutorials/ex1f90.F90?at=master&fileviewer=file-view-default Let me know if that is not understandable Thanks, Matt > -Justin > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, June 2, 2017 12:52 AM > > *To:* Justin Pogacnik > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran > > On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik < > j.pogacnik at auckland.ac.nz> wrote: > >> Thanks Matt! 
That works perfectly now. I have another question regarding >> accessing the quadrature information. >> >> >> When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what >> I expect regarding point locations, weights. >> >> >> However, when I try to use PetscQuadratureGetData() the pointers seem to >> point to random memory locations. >> >> >> The exact line from my test problem is: call >> PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); >> >> where the pq_* are the pointers giving strange output. The q_nc, q_dim, >> and q_num are all giving what I would expect to see. >> > You are clearly the first Fortran user interested in this stuff ;) > Handling of arrays in Fortran demands some more work > from us. I need to write a wrapper for that function. I will do it as soon > as I can. > > Thanks, > > Matt > >> Happy to send along the file if that helps. >> >> >> Thanks again, >> >> >> Justin >> ------------------------------ >> *From:* Matthew Knepley >> *Sent:* Thursday, June 1, 2017 1:34 AM >> *To:* Justin Pogacnik >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran >> >> On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley >> wrote: >> >>> On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik < >>> j.pogacnik at auckland.ac.nz> wrote: >>> >>>> Hello, >>>> >>>> I'm developing a finite element code in fortran 90. I recently updated >>>> my PETSc and am now getting the following error during compile/linking on >>>> an existing application: >>>> >>>> Undefined symbols for architecture x86_64: >>>> >>>> "_petscfecreatedefault_", referenced from: >>>> >>>> _MAIN__ in fe_test.o >>>> >>>> ld: symbol(s) not found for architecture x86_64 >>>> >>>> collect2: error: ld returned 1 exit status >>>> >>>> make: *** [dist/fe_test] Error 1 >>>> >>>> >>>> I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum >>>> working example" (attached) that re-creates the problem. It's basically >>>> just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a >>>> PetscFE object. Everything goes fine and the DM looks like what is expected >>>> if PetscFECreateDefault is commented out. Any idea what am I missing? >>>> >>> Yes, I had not made a Fortran binding for this function. I will do it >>> now. >>> >> >> I have merged it to the 'next' branch, and it will be in 'master' soon. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Many thanks! >>>> >>>> Justin >>>> >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bikash at umich.edu Thu Jun 1 16:58:03 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 1 Jun 2017 17:58:03 -0400 Subject: [petsc-users] MatSetNullSpace In-Reply-To: References: Message-ID: Thank you Matthew for the quick response. I should provide some more details. 1) This implies that A is symmetric Yes A is symmetric 2) I think you mean the b is orthogonal to Q No. I've an extra condition on the solution x - that it has to be orthogonal to Q. Without this condition the problem will have multiple solutions. It's same as finding the minimum norm solution > Its not MatSetNullSpace() that fixes this issue, it is the linear solver > itself. Many Krylov solvers can converge to the minimum norm solution of this rank deficient problem. I was unable to get convergence without using MatSetNullSpace(). So it seems that the Krlov solvers are inadequate just by themselves. Second, we remove components in the nullspace from each iterate, which is > not what you are doing above. > This might be the reason why without MatSetNullSpace() I'm unable to attain convergence. > It seems easier to just give your Q as the nullspace. Yeah using Q as nullspace is the easiest option. But my basis is non-orthogonal and has an overlap matrix S. So in order to provide orthonormal nullvectors as Q to Petsc, I need to evaluate S^{-1/2}. As of now, for small problems I can evalaute S^{-1/2}, however, it's a show stopper for large ones. I can possibly use MatNullSpaceSetFunction() to circumvent the requirement of orthonormal nullvectors in MatSetNullSpace(). I would appreciate your input on this. Lastly, I would like to know what operations do MatSetNullSpace enforce within a KSP solve. Thanks, Bikash On Thu, Jun 1, 2017 at 5:25 PM, Matthew Knepley wrote: > On Thu, Jun 1, 2017 at 2:07 PM, Bikash Kanungo wrote: > >> Hi, >> >> I'm trying to solve a linear system of equations Ax=b, where A has a >> null space (say Q) and x is known to be orthogonal to Q. In order to avoid >> ill-conditioning, I was trying to do the following: >> > > 1) This implies that A is symmetric > > 2) I think you mean the b is orthogonal to Q > > >> >> 1. Create A as a shell matrix >> 2. Overload the MATOP_MULT operation for with my own function which >> returns >> y = A*(I - QQ^T)x instead of y = Ax >> 3. Upon convergence, solution = (I-QQ^T)x instead of x. >> >> However, I realized that the linear solver can make x have any arbitrary >> component along Q and still y = A*(I-QQ^T)x will remain unaffected, and >> hence can cause convergence issues. Indeed, I saw such convergence >> problems. What fixed the problem was using MatSetNullSpace for A with Q as >> the nullspace, in addition to the above three steps. >> >> So my question is what exactly is MatSetNullSpace doing? And since the >> full A information is not present and A is only accessed through >> MAT_OP_MULT, I'm confused as how MatSetNullSpace might be fixing the >> convergence issue. >> > > Its not MatSetNullSpace() that fixes this issue, it is the linear solver > itself. Many Krylov solvers can converge to the > minimum norm solution of this rank deficient problem. Second, we remove > components in the nullspace from each > iterate, which is not what you are doing above. It seems easier to just > give your Q as the nullspace. > > Thanks, > > Matt > > >> Thanks, >> Bikash >> >> -- >> Bikash S. 
Kanungo >> PhD Student >> Computational Materials Physics Group >> Mechanical Engineering >> University of Michigan >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 1 17:00:58 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Jun 2017 17:00:58 -0500 Subject: [petsc-users] MatSetNullSpace In-Reply-To: References: Message-ID: On Thu, Jun 1, 2017 at 4:58 PM, Bikash Kanungo wrote: > Thank you Matthew for the quick response. > > I should provide some more details. > > 1) This implies that A is symmetric > > > Yes A is symmetric > > 2) I think you mean the b is orthogonal to Q > > > No. I've an extra condition on the solution x - that it has to be > orthogonal to Q. Without this condition the problem will have multiple > solutions. It's same as finding the minimum norm solution > > >> Its not MatSetNullSpace() that fixes this issue, it is the linear solver >> itself. Many Krylov solvers can converge to the > > minimum norm solution of this rank deficient problem. > > > I was unable to get convergence without using MatSetNullSpace(). So it > seems that the Krlov solvers are inadequate just by themselves. > > Second, we remove components in the nullspace from each iterate, which is >> not what you are doing above. >> > > This might be the reason why without MatSetNullSpace() I'm unable to > attain convergence. > > >> It seems easier to just give your Q as the nullspace. > > Yeah using Q as nullspace is the easiest option. But my basis is > non-orthogonal and has an overlap matrix S. So in order to provide > orthonormal nullvectors as Q to Petsc, I need to evaluate S^{-1/2}. As of > now, for small problems I can evalaute S^{-1/2}, however, it's a show > stopper for large ones. I can possibly use MatNullSpaceSetFunction() to > circumvent the requirement of orthonormal nullvectors in MatSetNullSpace(). > I would appreciate your input on this. > You do not need this. Just use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceSetFunction.html#MatNullSpaceSetFunction to implement your (I - Q^T Q). Thanks, Matt > Lastly, I would like to know what operations do MatSetNullSpace enforce > within a KSP solve. > > Thanks, > Bikash > > On Thu, Jun 1, 2017 at 5:25 PM, Matthew Knepley wrote: > >> On Thu, Jun 1, 2017 at 2:07 PM, Bikash Kanungo wrote: >> >>> Hi, >>> >>> I'm trying to solve a linear system of equations Ax=b, where A has a >>> null space (say Q) and x is known to be orthogonal to Q. In order to avoid >>> ill-conditioning, I was trying to do the following: >>> >> >> 1) This implies that A is symmetric >> >> 2) I think you mean the b is orthogonal to Q >> >> >>> >>> 1. Create A as a shell matrix >>> 2. Overload the MATOP_MULT operation for with my own function which >>> returns >>> y = A*(I - QQ^T)x instead of y = Ax >>> 3. Upon convergence, solution = (I-QQ^T)x instead of x. >>> >>> However, I realized that the linear solver can make x have any arbitrary >>> component along Q and still y = A*(I-QQ^T)x will remain unaffected, and >>> hence can cause convergence issues. Indeed, I saw such convergence >>> problems. 
What fixed the problem was using MatSetNullSpace for A with Q as >>> the nullspace, in addition to the above three steps. >>> >>> So my question is what exactly is MatSetNullSpace doing? And since the >>> full A information is not present and A is only accessed through >>> MAT_OP_MULT, I'm confused as how MatSetNullSpace might be fixing the >>> convergence issue. >>> >> >> Its not MatSetNullSpace() that fixes this issue, it is the linear solver >> itself. Many Krylov solvers can converge to the >> minimum norm solution of this rank deficient problem. Second, we remove >> components in the nullspace from each >> iterate, which is not what you are doing above. It seems easier to just >> give your Q as the nullspace. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Bikash >>> >>> -- >>> Bikash S. Kanungo >>> PhD Student >>> Computational Materials Physics Group >>> Mechanical Engineering >>> University of Michigan >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Thu Jun 1 18:10:24 2017 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 1 Jun 2017 19:10:24 -0400 Subject: [petsc-users] MatSetNullSpace In-Reply-To: References: Message-ID: Thanks again Matthew. I'll try to use MatSetNullSpaceFunction and see if it resolves the issue. On Thu, Jun 1, 2017 at 6:00 PM, Matthew Knepley wrote: > On Thu, Jun 1, 2017 at 4:58 PM, Bikash Kanungo wrote: > >> Thank you Matthew for the quick response. >> >> I should provide some more details. >> >> 1) This implies that A is symmetric >> >> >> Yes A is symmetric >> >> 2) I think you mean the b is orthogonal to Q >> >> >> No. I've an extra condition on the solution x - that it has to be >> orthogonal to Q. Without this condition the problem will have multiple >> solutions. It's same as finding the minimum norm solution >> >> >>> Its not MatSetNullSpace() that fixes this issue, it is the linear solver >>> itself. Many Krylov solvers can converge to the >> >> minimum norm solution of this rank deficient problem. >> >> >> I was unable to get convergence without using MatSetNullSpace(). So it >> seems that the Krlov solvers are inadequate just by themselves. >> >> Second, we remove components in the nullspace from each iterate, which >>> is not what you are doing above. >>> >> >> This might be the reason why without MatSetNullSpace() I'm unable to >> attain convergence. >> >> >>> It seems easier to just give your Q as the nullspace. >> >> Yeah using Q as nullspace is the easiest option. But my basis is >> non-orthogonal and has an overlap matrix S. So in order to provide >> orthonormal nullvectors as Q to Petsc, I need to evaluate S^{-1/2}. As of >> now, for small problems I can evalaute S^{-1/2}, however, it's a show >> stopper for large ones. I can possibly use MatNullSpaceSetFunction() to >> circumvent the requirement of orthonormal nullvectors in MatSetNullSpace(). >> I would appreciate your input on this. 
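A minimal C sketch of that MatNullSpaceSetFunction() route might look like the following; all of it is illustrative rather than code from this thread. The context layout and variable names are invented, the shell-matrix sizes and the UserMult callback are placeholders, and the removal step assumes the null-space vectors are mutually S-orthogonal (otherwise a small Gram system would have to be solved instead of the simple loop).

--------------------------------------------------------------------------
#include <petscksp.h>

/* Illustrative context: the overlap matrix S of the non-orthogonal basis,
   the null-space vectors q[] (assumed mutually S-orthogonal here), and a
   work vector.  All names are placeholders for this sketch. */
typedef struct {
  Mat      S;
  Vec     *q;
  PetscInt nq;
  Vec      work;
} NullCtx;

/* Remove the S-inner-product projection onto span{q_i} from v:
     v <- v - sum_i [ (q_i^T S v) / (q_i^T S q_i) ] q_i                    */
static PetscErrorCode RemoveNullSpace(MatNullSpace sp, Vec v, void *ctx)
{
  NullCtx        *c = (NullCtx*)ctx;
  PetscScalar     num, den;
  PetscInt        i;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  for (i = 0; i < c->nq; ++i) {
    ierr = MatMult(c->S, c->q[i], c->work);CHKERRQ(ierr); /* work = S q_i       */
    ierr = VecDot(v, c->work, &num);CHKERRQ(ierr);        /* num  = q_i^T S v   */
    ierr = VecDot(c->q[i], c->work, &den);CHKERRQ(ierr);  /* den  = q_i^T S q_i */
    ierr = VecAXPY(v, -num/den, c->q[i]);CHKERRQ(ierr);   /* v -= (num/den) q_i */
  }
  PetscFunctionReturn(0);
}

/* Attaching it to the shell operator; nlocal, N, appctx and UserMult stand
   in for whatever the application already has, and nullctx is a NullCtx
   filled with S, q, nq and work elsewhere: */
  Mat          A;
  MatNullSpace nsp;
  NullCtx      nullctx;

  ierr = MatCreateShell(PETSC_COMM_WORLD,nlocal,nlocal,N,N,&appctx,&A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))UserMult);CHKERRQ(ierr);
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,0,NULL,&nsp);CHKERRQ(ierr);
  ierr = MatNullSpaceSetFunction(nsp,RemoveNullSpace,&nullctx);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A,nsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
--------------------------------------------------------------------------

With the removal function attached through MatSetNullSpace(), the shell multiply can simply apply A; the solver strips the span{q_i} component from each iterate, which matches the behaviour that was restoring convergence above.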
>> > > You do not need this. Just use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatNullSpaceSetFunction.html#MatNullSpaceSetFunction > > to implement your (I - Q^T Q). > > Thanks, > > Matt > > >> Lastly, I would like to know what operations do MatSetNullSpace enforce >> within a KSP solve. >> >> Thanks, >> Bikash >> >> On Thu, Jun 1, 2017 at 5:25 PM, Matthew Knepley >> wrote: >> >>> On Thu, Jun 1, 2017 at 2:07 PM, Bikash Kanungo wrote: >>> >>>> Hi, >>>> >>>> I'm trying to solve a linear system of equations Ax=b, where A has a >>>> null space (say Q) and x is known to be orthogonal to Q. In order to avoid >>>> ill-conditioning, I was trying to do the following: >>>> >>> >>> 1) This implies that A is symmetric >>> >>> 2) I think you mean the b is orthogonal to Q >>> >>> >>>> >>>> 1. Create A as a shell matrix >>>> 2. Overload the MATOP_MULT operation for with my own function >>>> which returns >>>> y = A*(I - QQ^T)x instead of y = Ax >>>> 3. Upon convergence, solution = (I-QQ^T)x instead of x. >>>> >>>> However, I realized that the linear solver can make x have any >>>> arbitrary component along Q and still y = A*(I-QQ^T)x will remain >>>> unaffected, and hence can cause convergence issues. Indeed, I saw such >>>> convergence problems. What fixed the problem was using MatSetNullSpace for >>>> A with Q as the nullspace, in addition to the above three steps. >>>> >>>> So my question is what exactly is MatSetNullSpace doing? And since the >>>> full A information is not present and A is only accessed through >>>> MAT_OP_MULT, I'm confused as how MatSetNullSpace might be fixing the >>>> convergence issue. >>>> >>> >>> Its not MatSetNullSpace() that fixes this issue, it is the linear solver >>> itself. Many Krylov solvers can converge to the >>> minimum norm solution of this rank deficient problem. Second, we remove >>> components in the nullspace from each >>> iterate, which is not what you are doing above. It seems easier to just >>> give your Q as the nullspace. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Bikash >>>> >>>> -- >>>> Bikash S. Kanungo >>>> PhD Student >>>> Computational Materials Physics Group >>>> Mechanical Engineering >>>> University of Michigan >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> -- >> Bikash S. Kanungo >> PhD Student >> Computational Materials Physics Group >> Mechanical Engineering >> University of Michigan >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.pogacnik at auckland.ac.nz Thu Jun 1 18:29:26 2017 From: j.pogacnik at auckland.ac.nz (Justin Pogacnik) Date: Thu, 1 Jun 2017 23:29:26 +0000 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz> <1496347162251.24784@auckland.ac.nz>, Message-ID: <1496359766243.4125@auckland.ac.nz> Hi Matt, ?Yes, I was using the QuadratureGetData in the past and the results were good. 
I wasn't sure why that would have changed recently. Looking at the example (in test directory, not tutorial), it looks like the two interesting pointers are pEC and pES. pEC's target is EC, which is explicitly set. Does pES have no target? In my code, I set (basically the same for q_points as well): PetscReal, target, dimension(8) :: q_weights PetscReal, pointer :: pq_weights(:) then: pq_weights => q_weights Then call: PetscQuadratureGetData() When I try to write the quadrature points and weights to see their output, I get values like NaN, 3.0e-310, etc. I tried removing the q_weights target (like pES example in the test) and that returns a segfault. I've attached my example if that helps. You can uncomment the QuadratureView() and see that the points and weights are correct.? Thanks again, Justin ________________________________ From: Matthew Knepley Sent: Friday, June 2, 2017 9:57 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Thu, Jun 1, 2017 at 2:59 PM, Justin Pogacnik > wrote: ?All good. Thanks Matt. Will keep an eye out for that update. :) I have checked, and I was wrong before. I did write the code for PetscQuadratureGet/RestoreData() in Fortran. Since it uses array arguments, I used F90. Thus you have to use F90 pointers, in the same style as DMPlexGet/RestoreCone() in this example: https://bitbucket.org/petsc/petsc/src/a19fbe4d52f99f875359274419a2d40a87edfba3/src/dm/impls/plex/examples/tutorials/ex1f90.F90?at=master&fileviewer=file-view-default Let me know if that is not understandable Thanks, Matt -Justin ________________________________ From: Matthew Knepley > Sent: Friday, June 2, 2017 12:52 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik > wrote: Thanks Matt! That works perfectly now. I have another question regarding accessing the quadrature information. When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what I expect regarding point locations, weights. However, when I try to use PetscQuadratureGetData() the pointers seem to point to random memory locations. The exact line from my test problem is: call PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); where the pq_* are the pointers giving strange output. The q_nc, q_dim, and q_num are all giving what I would expect to see. You are clearly the first Fortran user interested in this stuff ;) Handling of arrays in Fortran demands some more work from us. I need to write a wrapper for that function. I will do it as soon as I can. Thanks, Matt Happy to send along the file if that helps. Thanks again, Justin ________________________________ From: Matthew Knepley > Sent: Thursday, June 1, 2017 1:34 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley > wrote: On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik > wrote: Hello, I'm developing a finite element code in fortran 90. I recently updated my PETSc and am now getting the following error during compile/linking on an existing application: Undefined symbols for architecture x86_64: "_petscfecreatedefault_", referenced from: _MAIN__ in fe_test.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make: *** [dist/fe_test] Error 1 I'm running Mac OS X Yosemite (10.10.5). 
I've created a "minimum working example" (attached) that re-creates the problem. It's basically just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a PetscFE object. Everything goes fine and the DM looks like what is expected if PetscFECreateDefault is commented out. Any idea what am I missing? Yes, I had not made a Fortran binding for this function. I will do it now. I have merged it to the 'next' branch, and it will be in 'master' soon. Thanks, Matt Thanks, Matt Many thanks! Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fe_test.F90 Type: application/octet-stream Size: 2595 bytes Desc: fe_test.F90 URL: From knepley at gmail.com Thu Jun 1 23:09:55 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Jun 2017 23:09:55 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> Message-ID: On Thu, Jun 1, 2017 at 10:51 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > On 25/05/17 21:00, Matthew Knepley wrote: > > On Thu, May 25, 2017 at 2:22 PM, Lawrence Mitchell > > > > wrote: > > > > > > > On 25 May 2017, at 20:03, Matthew Knepley > wrote: > > > > > > > > > Hmm, I thought I made adjacency per field. I have to look. That > way, no problem with the Stokes example. DG is still weird. > > > > You might, we don't right now. We just make the topological > > adjacency that is "large enough", and then make fields on that. > > > > > > > > That seems baroque. So this is just another adjacency pattern. You > should be able to easily define it, or if you are a patient person, > > > wait for me to do it. Its here > > > > > > https://bitbucket.org/petsc/petsc/src/ > 01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/ > plexdistribute.c?at=master&fileviewer=file-view-default# > plexdistribute.c-239 > > 01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/ > plexdistribute.c?at=master&fileviewer=file-view-default# > plexdistribute.c-239> > > > > > > I am more than willing to make this overridable by the user > through function composition or another mechanism. 
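For reference, a small C sketch of the adjacency switches that exist at this point (the ones in plexdistribute.c linked above), used to grow an overlap, is below. The function name and the one-cell overlap depth are made up for illustration, and this face-only ("FVM") setting still does not give the facet-neighbours-plus-their-closure pattern being asked for in this thread; that gap is what a user-defined adjacency callback would fill.

--------------------------------------------------------------------------
#include <petscdmplex.h>
#include <petscsf.h>

/* Sketch only: select face-based ("FVM") adjacency on an already
   distributed Plex and grow a one-cell overlap with it. */
static PetscErrorCode GrowFacetOverlap(DM dm, DM *dmOverlap)
{
  PetscSF        sf;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE);CHKERRQ(ierr);     /* see neighbours through faces   */
  ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE);CHKERRQ(ierr); /* not the FEM closure adjacency  */
  ierr = DMPlexDistributeOverlap(dm, 1, &sf, dmOverlap);CHKERRQ(ierr);
  ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
--------------------------------------------------------------------------

DMPlexDistribute() consults the same switches when it builds the initial overlap during distribution.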
> > > > Hmm, that naive thing of just modifying the XXX_Support_Internal > > to compute with DMPlexGetTransitiveClosure rather than > > DMPlexGetCone didn't do what I expected, but I don't understand > > the way this bootstrapping is done very well. > > > > > > It should do the right thing. Notice that you have to be careful about > > the arrays that you use since I reuse them for efficiency here. > > What is going wrong? > > Coming back to this, I think I understand the problem a little better. > > Consider this mesh: > > +----+ > |\ 3 | > | \ | > |2 \ | > | \| > +----+ > |\ 1 | > | \ | > |0 \ | > | \| > +----+ > > Let's say I run on 3 processes and the initial (non-overlapped) cell > partition is: > > rank 0: cell 0 > rank 1: cell 1 & 2 > rank 2: cell 3 > > Now I'd like to grow the overlap such that any cell I can see through > a facet (and its closure) lives in the overlap. > > So great, I just need a new adjacency relation that gathers > closure(support(point)) > > But, that isn't right, because now on rank 0, I will get a mesh that > looks like: > I do not understand why you think its not right. Toby and I are trying to push a formalism for this understanding, in https://arxiv.org/abs/1508.02470. So you say that if sigma is a dual basis function associated with point p, then the support of its matching psi, sigma(psi) = 1 in the biorthogonal bases, is exactly star(p). So, if you have no sigma sitting on your vertex, who cares if you put that extra edge and node in. It will not affect the communication pattern for dofs. If you do, then shouldn't you be including that edge? Matt > + > | > | > | > | > +----+ > |\ 1 | > | \ | > |0 \ | > | \| > +----+ > > Because I grab all the mesh points in the adjacency of the initial cell: > > + > |\ > | \ > |0 \ > | \ > +----+ > > And on the top vertex that pulls in the facet (but not the cell). > > So I can write a "DMPlexGetAdjacency" information that only returns > non-empty adjacencies for facets. But it's sort of lying about what > it does now. > > Thoughts? > > Lawrence > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 2 08:52:50 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Jun 2017 08:52:50 -0500 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: <1496359766243.4125@auckland.ac.nz> References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz> <1496347162251.24784@auckland.ac.nz> <1496359766243.4125@auckland.ac.nz> Message-ID: On Thu, Jun 1, 2017 at 6:29 PM, Justin Pogacnik wrote: > Hi Matt, > > ?Yes, I was using the QuadratureGetData in the past and the results were > good. I wasn't sure why that would have changed recently. Looking at the > example (in test directory, not tutorial), it looks like the two > interesting pointers are pEC and pES. pEC's target is EC, which is > explicitly set. Does pES have no target? In my code, I set (basically the > same for q_points as well): > > PetscReal, target, dimension(8) :: q_weights > > PetscReal, pointer :: pq_weights(:) > > > then: > > pq_weights => q_weights > > > Then call: PetscQuadratureGetData() > > When I try to write the quadrature points and weights to see their output, > I get values like NaN, 3.0e-310, etc. 
> > > I tried removing the q_weights target (like pES example in the test) and > that returns a segfault. I've attached my example if that helps. You can > uncomment the QuadratureView() and see that the points and weights are > correct.? > Okay, when we did the conversion of the F90 interface, the DT module was left out. I put it back in and your example started working again. You will need to use 'next' or my fix branch right now. If you rebuild, the module will be rebuilt and it should work. Thanks, Matt > Thanks again, > > > Justin > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, June 2, 2017 9:57 AM > *To:* Justin Pogacnik > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran > > On Thu, Jun 1, 2017 at 2:59 PM, Justin Pogacnik > wrote: > >> ?All good. Thanks Matt. Will keep an eye out for that update. :) >> > I have checked, and I was wrong before. I did write the code for > PetscQuadratureGet/RestoreData() > in Fortran. Since it uses array arguments, I used F90. Thus you have to > use F90 pointers, in the > same style as DMPlexGet/RestoreCone() in this example: > > https://bitbucket.org/petsc/petsc/src/a19fbe4d52f99f875359274419a2d4 > 0a87edfba3/src/dm/impls/plex/examples/tutorials/ex1f90.F90? > at=master&fileviewer=file-view-default > > Let me know if that is not understandable > > Thanks, > > Matt > >> -Justin >> ------------------------------ >> *From:* Matthew Knepley >> *Sent:* Friday, June 2, 2017 12:52 AM >> >> *To:* Justin Pogacnik >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran >> >> On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik < >> j.pogacnik at auckland.ac.nz> wrote: >> >>> Thanks Matt! That works perfectly now. I have another question regarding >>> accessing the quadrature information. >>> >>> >>> When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see >>> what I expect regarding point locations, weights. >>> >>> >>> However, when I try to use PetscQuadratureGetData() the pointers seem to >>> point to random memory locations. >>> >>> >>> The exact line from my test problem is: call >>> PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); >>> >>> where the pq_* are the pointers giving strange output. The q_nc, q_dim, >>> and q_num are all giving what I would expect to see. >>> >> You are clearly the first Fortran user interested in this stuff ;) >> Handling of arrays in Fortran demands some more work >> from us. I need to write a wrapper for that function. I will do it as >> soon as I can. >> >> Thanks, >> >> Matt >> >>> Happy to send along the file if that helps. >>> >>> >>> Thanks again, >>> >>> >>> Justin >>> ------------------------------ >>> *From:* Matthew Knepley >>> *Sent:* Thursday, June 1, 2017 1:34 AM >>> *To:* Justin Pogacnik >>> *Cc:* petsc-users at mcs.anl.gov >>> *Subject:* Re: [petsc-users] PetscFECreateDefault in Fortran >>> >>> On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley >>> wrote: >>> >>>> On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik < >>>> j.pogacnik at auckland.ac.nz> wrote: >>>> >>>>> Hello, >>>>> >>>>> I'm developing a finite element code in fortran 90. 
I recently updated >>>>> my PETSc and am now getting the following error during compile/linking on >>>>> an existing application: >>>>> >>>>> Undefined symbols for architecture x86_64: >>>>> >>>>> "_petscfecreatedefault_", referenced from: >>>>> >>>>> _MAIN__ in fe_test.o >>>>> >>>>> ld: symbol(s) not found for architecture x86_64 >>>>> >>>>> collect2: error: ld returned 1 exit status >>>>> >>>>> make: *** [dist/fe_test] Error 1 >>>>> >>>>> >>>>> I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum >>>>> working example" (attached) that re-creates the problem. It's basically >>>>> just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a >>>>> PetscFE object. Everything goes fine and the DM looks like what is expected >>>>> if PetscFECreateDefault is commented out. Any idea what am I missing? >>>>> >>>> Yes, I had not made a Fortran binding for this function. I will do it >>>> now. >>>> >>> >>> I have merged it to the 'next' branch, and it will be in 'master' soon. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Many thanks! >>>>> >>>>> Justin >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Fri Jun 2 10:07:53 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Fri, 2 Jun 2017 17:07:53 +0200 Subject: [petsc-users] Using FAS with SNES In-Reply-To: References: Message-ID: Dear All, In order to ease the investigation, I reproduced this segfault by running snes/examples/tutorials/ex1f.F with the option -snes_type fas. I have the feeling that this is due to the nullity of the context object of FormJacobian (PETSC_NULL_OBJECT in Fortran). Best regards, Nicolas 2017-06-01 16:30 GMT+02:00 Karin&NiKo : > Dear PETSc team, > > I have interfaced our fortran legacy code with PETSC SNES. I mainly > followed the examples you provide. 
What I conceptually used is : > > ------------------------------------------------------------ > -------------------------------------------------------------------------- > > call SNESCreate(PETSC_COMM_WORLD,snes,ierr) > > call SNESSetFromOptions(snes,ierr) > > call SNESSETFunction(snes, pf, nonlinFormFunction, PETSC_NULL_OBJECT, > ierr) > > call SNESSetJacobian(snes, mat, mat, nonlinFormJacobian, > PETSC_NULL_OBJECT, ierr) > > call SNESSetKSP(snes, myksp, ierr) > ------------------------------------------------------------ > -------------------------------------------------------------------------- > > The code runs fine with -snes_type newtonls or newtontr. But when using > -snes_type fas, it complains with the message : > > ------------------------------------------------------------ > -------------------------------------------------------------------------- > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [0]PETSC ERROR: Fortran callback not set on this object > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > [0]PETSC ERROR: > > > > on a arch-linux2-c-debug named dsp0780450 by niko Thu Jun 1 16:18:43 2017 > [0]PETSC ERROR: Configure options --prefix=/home/niko/dev/ > codeaster-prerequisites/petsc-3.7.2/Install --with-mpi=yes --with-x=yes > --download-ml=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/ml-6.2-p3.tar.gz > --with-mumps-lib="-L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Mumps-502_consortium_aster1/MPI/lib -lzmumps -ldmumps > -lmumps_common -lpord -L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Scotch_aster-604_aster6/MPI/lib -lesmumps -lptscotch > -lptscotcherr -lptscotcherrexit -lscotch -lscotcherr -lscotcherrexit > -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Parmetis_aster-403_aster/lib > -lparmetis -L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Metis_aster-510_aster1/lib -lmetis -L/usr/lib > -lscalapack-openmpi -L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi > -lblacsF77init-openmpi -L/usr/lib/x86_64-linux-gnu -lgomp " > --with-mumps-include=/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Mumps-502_consortium_aster1/MPI/include > --with-scalapack-lib="-L/usr/lib -lscalapack-openmpi" > --with-blacs-lib="-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi > -lblacsF77init-openmpi" --with-blas-lib="-L/usr/lib -lopenblas -lcblas" > --with-lapack-lib="-L/usr/lib -llapack" > [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > sys/objects/inherit.c > [0]PETSC ERROR: #2 oursnesjacobian() line 105 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/ftn-custom/zsnesf.c > [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/interface/snes.c > [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #5 SNESSolve() line 4008 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/snes.c > [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/impls/fas/fas.c > [0]PETSC ERROR: #7 
SNESFASCycle_Multiplicative() line 816 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/impls/fas/fas.c > [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c > [0]PETSC ERROR: #9 SNESSolve() line 4008 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/snes.c > ------------------------------------------------------------ > -------------------------------------------------------------------------- > > When exploring a little bit with a debugger, it seems that the object > snes->vec_rhs which is used is fas.c is a null pointer. > > What is this object vec_rhs and how to set it? > > Thank you in advance, > Nicolas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Fri Jun 2 10:13:57 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Fri, 2 Jun 2017 17:13:57 +0200 Subject: [petsc-users] Using FAS with SNES In-Reply-To: References: Message-ID: I emphasize that snes/examples/tutorials/ex1.c works perfectly with the option -snes_type fas. 2017-06-02 17:07 GMT+02:00 Karin&NiKo : > Dear All, > > In order to ease the investigation, I reproduced this segfault by running > snes/examples/tutorials/ex1f.F with the option -snes_type fas. > I have the feeling that this is due to the nullity of the context object > of FormJacobian (PETSC_NULL_OBJECT in Fortran). > > Best regards, > Nicolas > > 2017-06-01 16:30 GMT+02:00 Karin&NiKo : > >> Dear PETSc team, >> >> I have interfaced our fortran legacy code with PETSC SNES. I mainly >> followed the examples you provide. What I conceptually used is : >> >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> -------------- >> >> call SNESCreate(PETSC_COMM_WORLD,snes,ierr) >> >> call SNESSetFromOptions(snes,ierr) >> >> call SNESSETFunction(snes, pf, nonlinFormFunction, PETSC_NULL_OBJECT, >> ierr) >> >> call SNESSetJacobian(snes, mat, mat, nonlinFormJacobian, >> PETSC_NULL_OBJECT, ierr) >> >> call SNESSetKSP(snes, myksp, ierr) >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> -------------- >> >> The code runs fine with -snes_type newtonls or newtontr. But when using >> -snes_type fas, it complains with the message : >> >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> -------------- >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/d >> ocumentation/faq.html#valgrind >> [0]PETSC ERROR: Fortran callback not set on this object >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 >> [0]PETSC ERROR: >> >> >> >> on a arch-linux2-c-debug named dsp0780450 by niko Thu Jun 1 16:18:43 2017 >> [0]PETSC ERROR: Configure options --prefix=/home/niko/dev/codeas >> ter-prerequisites/petsc-3.7.2/Install --with-mpi=yes --with-x=yes >> --download-ml=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/ml-6.2-p3.tar.gz >> --with-mumps-lib="-L/home/niko/dev/codeaster-prerequisites/ >> v13/prerequisites/Mumps-502_consortium_aster1/MPI/lib -lzmumps -ldmumps >> -lmumps_common -lpord -L/home/niko/dev/codeaster-pre >> requisites/v13/prerequisites/Scotch_aster-604_aster6/MPI/lib -lesmumps >> -lptscotch -lptscotcherr -lptscotcherrexit -lscotch -lscotcherr >> -lscotcherrexit -L/home/niko/dev/codeaster-pre >> requisites/v13/prerequisites/Parmetis_aster-403_aster/lib -lparmetis >> -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Metis_aster-510_aster1/lib >> -lmetis -L/usr/lib -lscalapack-openmpi -L/usr/lib -lblacs-openmpi >> -lblacsCinit-openmpi -lblacsF77init-openmpi -L/usr/lib/x86_64-linux-gnu >> -lgomp " --with-mumps-include=/home/niko/dev/codeaster-prerequisites/ >> v13/prerequisites/Mumps-502_consortium_aster1/MPI/include >> --with-scalapack-lib="-L/usr/lib -lscalapack-openmpi" >> --with-blacs-lib="-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi >> -lblacsF77init-openmpi" --with-blas-lib="-L/usr/lib -lopenblas -lcblas" >> --with-lapack-lib="-L/usr/lib -llapack" >> [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/sys/ >> objects/inherit.c >> [0]PETSC ERROR: #2 oursnesjacobian() line 105 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/ >> interface/ftn-custom/zsnesf.c >> [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/ >> interface/snes.c >> [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/ls/ls.c >> [0]PETSC ERROR: #5 SNESSolve() line 4008 in /home/niko/dev/codeaster-prere >> quisites/petsc-3.7.2/src/snes/interface/snes.c >> [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/ >> impls/fas/fas.c >> [0]PETSC ERROR: #7 SNESFASCycle_Multiplicative() line 816 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/ >> impls/fas/fas.c >> [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in >> /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/ >> impls/fas/fas.c >> [0]PETSC ERROR: #9 SNESSolve() line 4008 in /home/niko/dev/codeaster-prere >> quisites/petsc-3.7.2/src/snes/interface/snes.c >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> -------------- >> >> When exploring a little bit with a debugger, it seems that the object >> snes->vec_rhs which is used is fas.c is a null pointer. >> >> What is this object vec_rhs and how to set it? >> >> Thank you in advance, >> Nicolas >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From j.pogacnik at auckland.ac.nz Fri Jun 2 14:38:52 2017 From: j.pogacnik at auckland.ac.nz (Justin Pogacnik) Date: Fri, 2 Jun 2017 19:38:52 +0000 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: References: <1496200773990.42892@auckland.ac.nz> <1496286009918.20206@auckland.ac.nz> <1496347162251.24784@auckland.ac.nz> <1496359766243.4125@auckland.ac.nz>, Message-ID: <1496432331355.22478@auckland.ac.nz> ?Works perfectly! Thanks Matt ________________________________ From: Matthew Knepley Sent: Saturday, June 3, 2017 1:52 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Thu, Jun 1, 2017 at 6:29 PM, Justin Pogacnik > wrote: Hi Matt, ?Yes, I was using the QuadratureGetData in the past and the results were good. I wasn't sure why that would have changed recently. Looking at the example (in test directory, not tutorial), it looks like the two interesting pointers are pEC and pES. pEC's target is EC, which is explicitly set. Does pES have no target? In my code, I set (basically the same for q_points as well): PetscReal, target, dimension(8) :: q_weights PetscReal, pointer :: pq_weights(:) then: pq_weights => q_weights Then call: PetscQuadratureGetData() When I try to write the quadrature points and weights to see their output, I get values like NaN, 3.0e-310, etc. I tried removing the q_weights target (like pES example in the test) and that returns a segfault. I've attached my example if that helps. You can uncomment the QuadratureView() and see that the points and weights are correct.? Okay, when we did the conversion of the F90 interface, the DT module was left out. I put it back in and your example started working again. You will need to use 'next' or my fix branch right now. If you rebuild, the module will be rebuilt and it should work. Thanks, Matt Thanks again, Justin ________________________________ From: Matthew Knepley > Sent: Friday, June 2, 2017 9:57 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Thu, Jun 1, 2017 at 2:59 PM, Justin Pogacnik > wrote: ?All good. Thanks Matt. Will keep an eye out for that update. :) I have checked, and I was wrong before. I did write the code for PetscQuadratureGet/RestoreData() in Fortran. Since it uses array arguments, I used F90. Thus you have to use F90 pointers, in the same style as DMPlexGet/RestoreCone() in this example: https://bitbucket.org/petsc/petsc/src/a19fbe4d52f99f875359274419a2d40a87edfba3/src/dm/impls/plex/examples/tutorials/ex1f90.F90?at=master&fileviewer=file-view-default Let me know if that is not understandable Thanks, Matt -Justin ________________________________ From: Matthew Knepley > Sent: Friday, June 2, 2017 12:52 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 10:00 PM, Justin Pogacnik > wrote: Thanks Matt! That works perfectly now. I have another question regarding accessing the quadrature information. When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what I expect regarding point locations, weights. However, when I try to use PetscQuadratureGetData() the pointers seem to point to random memory locations. The exact line from my test problem is: call PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); where the pq_* are the pointers giving strange output. The q_nc, q_dim, and q_num are all giving what I would expect to see. 
You are clearly the first Fortran user interested in this stuff ;) Handling of arrays in Fortran demands some more work from us. I need to write a wrapper for that function. I will do it as soon as I can. Thanks, Matt Happy to send along the file if that helps. Thanks again, Justin ________________________________ From: Matthew Knepley > Sent: Thursday, June 1, 2017 1:34 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley > wrote: On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik > wrote: Hello, I'm developing a finite element code in fortran 90. I recently updated my PETSc and am now getting the following error during compile/linking on an existing application: Undefined symbols for architecture x86_64: "_petscfecreatedefault_", referenced from: _MAIN__ in fe_test.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make: *** [dist/fe_test] Error 1 I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working example" (attached) that re-creates the problem. It's basically just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a PetscFE object. Everything goes fine and the DM looks like what is expected if PetscFECreateDefault is commented out. Any idea what am I missing? Yes, I had not made a Fortran binding for this function. I will do it now. I have merged it to the 'next' branch, and it will be in 'master' soon. Thanks, Matt Thanks, Matt Many thanks! Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Sun Jun 4 17:51:57 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 5 Jun 2017 00:51:57 +0200 Subject: [petsc-users] BDDC assembly question Message-ID: Hello I obtained two different matrices when assembling with MATIS and MATMPIAIJ. With MATIS I used MatISSetPreallocation to allocate and MatSetLocalToGlobalMapping to provide the mapping. However I still used MatSetValues and MatAssemblyBegin/End with MATIS. Is it the correct way to do so? In case that BDDC required assembling the matrix using local index, is there a way to assemble using global index to keep the same assembly interface as MATMPIAIJ? Thanks Giang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Sun Jun 4 18:48:38 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 4 Jun 2017 18:48:38 -0500 Subject: [petsc-users] BDDC assembly question In-Reply-To: References: Message-ID: <1FE2819E-9D30-4CA7-95CF-98BE6458C87E@mcs.anl.gov> > On Jun 4, 2017, at 5:51 PM, Hoang Giang Bui wrote: > > Hello > > I obtained two different matrices when assembling with MATIS and MATMPIAIJ. With MATIS I used MatISSetPreallocation to allocate and MatSetLocalToGlobalMapping to provide the mapping. However I still used MatSetValues and MatAssemblyBegin/End with MATIS. Is it the correct way to do so? In case that BDDC required assembling the matrix using local index, is there a way to assemble using global index to keep the same assembly interface as MATMPIAIJ? You can use MatSetValuesLocal() in both cases; this the efficient way. Barry > > Thanks > Giang From quanwang.us at gmail.com Sun Jun 4 19:08:59 2017 From: quanwang.us at gmail.com (Wang) Date: Sun, 4 Jun 2017 20:08:59 -0400 Subject: [petsc-users] interpreting results of ISLocalToGlobalMappingView Message-ID: Hello. I have some confusions about the results given by ISLocalToGlobalMappingView. After reading a simple mesh and associate each vertex with a scalar dof, the test code uses DMPlexDistribute to get a distributed dm. Then I use the following calls call DMGetLocalToGlobalMapping(dm,ltog,ierr) call ISLocalToGlobalMappingView(ltog, PETSC_VIEWER_STDOUT_WORLD, ierr); and get following results for l2g. (MatGetOwnershipRange gives [0 3] for rank 0 and [3 9] for rank 1) ISLocalToGlobalMapping Object: 2 MPI processes type: basic [0] 0 0 [0] 1 1 [0] 2 5 [0] 3 2 [0] 4 6 [0] 5 8 [1] 0 3 [1] 1 4 [1] 2 5 [1] 3 6 [1] 4 7 [1] 5 8 The question is why, on rank 0, the global indices (I assume the third column) are not grouped into local chunks and ghost chunks. I understand how to do local to global mapping without any concern of the actual ordering, but I have some impression that in PETSC the ghost information is always coming later in the local vector. In this case, on rank 0, global index 5 should appear later than 0,1,2, because it is ghost vertex for rank 0. I'm not trying to use this for FEM, but instead using the mesh management in dmplex for other tasks. So I need to know more details. Thank you. QW -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Jun 4 19:15:13 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 4 Jun 2017 19:15:13 -0500 Subject: [petsc-users] interpreting results of ISLocalToGlobalMappingView In-Reply-To: References: Message-ID: On Sun, Jun 4, 2017 at 7:08 PM, Wang wrote: > Hello. I have some confusions about the results given > by ISLocalToGlobalMappingView. > > After reading a simple mesh and associate each vertex with a scalar dof, > the test code uses DMPlexDistribute to get a distributed dm. Then I use the > following calls > > call DMGetLocalToGlobalMapping(dm,ltog,ierr) > call ISLocalToGlobalMappingView(ltog, PETSC_VIEWER_STDOUT_WORLD, ierr); > > and get following results for l2g. (MatGetOwnershipRange gives [0 3] for > rank 0 and [3 9] for rank 1) > > ISLocalToGlobalMapping Object: 2 MPI processes > type: basic > [0] 0 0 > [0] 1 1 > [0] 2 5 > [0] 3 2 > [0] 4 6 > [0] 5 8 > [1] 0 3 > [1] 1 4 > [1] 2 5 > [1] 3 6 > [1] 4 7 > [1] 5 8 > > > The question is why, on rank 0, the global indices (I assume the third > column) are not grouped into local chunks and ghost chunks. 
I understand > how to do local to global mapping without any concern of the actual > ordering, but I have some impression that in PETSC the ghost information is > always coming later in the local vector. > Nope. That is only guaranteed when using VecGhost. > In this case, on rank 0, global index 5 should appear later than 0,1,2, > because it is ghost vertex for rank 0. > > I'm not trying to use this for FEM, but instead using the mesh management > in dmplex for other tasks. So I need to know more details. > We base the dof ordering on the mesh ordering. If you ordered the shared parts of the mesh last, this would produce the ordering you expect. Matt > Thank you. > > QW > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From quanwang.us at gmail.com Sun Jun 4 20:53:03 2017 From: quanwang.us at gmail.com (Wang) Date: Sun, 4 Jun 2017 21:53:03 -0400 Subject: [petsc-users] interpreting results of ISLocalToGlobalMappingView In-Reply-To: References: Message-ID: Thanks for your quick response. (Sorry, Matt, I didn't reply to all for the first time.) When I add another field, I got the following ( Now, MatGetOwnershipRange gives [0 6] for rank 0 and [6 18] for rank 1 ) It seems that the global index for the second field is also starting from zero,, instead of starting from N_{first field} (9 for this case). But, in the second column, the local index is accumulating. Is it reasonable or did I make some mistakes when defining the dofs to setup the section? I also have confusions when I play with ISLocalToGlobalMappingGetInfo(ltog, nproc_nbr,procs_nbr, numprocs, indices_nbr, ierr). What does *numprocs* mean here? I'm assuming the ith of its elements is the number of indices that have "ghost" copies at processor proc_nbr(i). But for this example, my test code gives numprocs of (\48, 24 \) on rank 0, which is larger than the total dofs number in my problem, which is only 18. ISLocalToGlobalMpngGetInfoSize(ltog, nproc_nbr, numprocmax, ierr) gives nproc_nbr=2 and numprocmax=48 on both processors. I attach the code and input file, both of which were found in this mail list. ISLocalToGlobalMapping results: ISLocalToGlobalMapping Object: 2 MPI processes type: basic [0] 0 0 [0] 1 0 [0] 2 1 [0] 3 1 [0] 4 5 [0] 5 5 [0] 6 2 [0] 7 2 [0] 8 6 [0] 9 6 [0] 10 8 [0] 11 8 [1] 0 3 [1] 1 3 [1] 2 4 [1] 3 4 [1] 4 5 [1] 5 5 [1] 6 6 [1] 7 6 [1] 8 7 [1] 9 7 [1] 10 8 [1] 11 8 On Sun, Jun 4, 2017 at 8:15 PM, Matthew Knepley wrote: > On Sun, Jun 4, 2017 at 7:08 PM, Wang wrote: > >> Hello. I have some confusions about the results given >> by ISLocalToGlobalMappingView. >> >> After reading a simple mesh and associate each vertex with a scalar dof, >> the test code uses DMPlexDistribute to get a distributed dm. Then I use the >> following calls >> >> call DMGetLocalToGlobalMapping(dm,ltog,ierr) >> call ISLocalToGlobalMappingView(ltog, PETSC_VIEWER_STDOUT_WORLD, ierr); >> >> and get following results for l2g. 
(MatGetOwnershipRange gives [0 3] for >> rank 0 and [3 9] for rank 1) >> >> ISLocalToGlobalMapping Object: 2 MPI processes >> type: basic >> [0] 0 0 >> [0] 1 1 >> [0] 2 5 >> [0] 3 2 >> [0] 4 6 >> [0] 5 8 >> [1] 0 3 >> [1] 1 4 >> [1] 2 5 >> [1] 3 6 >> [1] 4 7 >> [1] 5 8 >> >> >> The question is why, on rank 0, the global indices (I assume the third >> column) are not grouped into local chunks and ghost chunks. I understand >> how to do local to global mapping without any concern of the actual >> ordering, but I have some impression that in PETSC the ghost information is >> always coming later in the local vector. >> > > Nope. That is only guaranteed when using VecGhost. > > >> In this case, on rank 0, global index 5 should appear later than 0,1,2, >> because it is ghost vertex for rank 0. >> >> I'm not trying to use this for FEM, but instead using the mesh management >> in dmplex for other tasks. So I need to know more details. >> > > We base the dof ordering on the mesh ordering. If you ordered the shared > parts of the mesh last, this would produce the ordering you expect. > > Matt > > >> Thank you. >> >> QW >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diy.f90 Type: text/x-fortran Size: 15295 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Q1_4cells.msh Type: model/mesh Size: 440 bytes Desc: not available URL: From niko.karin at gmail.com Mon Jun 5 03:12:30 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Mon, 5 Jun 2017 10:12:30 +0200 Subject: [petsc-users] "snes/examples/tutorials/ex1f -snes_type fas" fails with segfault Message-ID: Dear PETSc team, If I run "snes/examples/tutorials/ex1 -snes_type fas", everything is OK. But with its Fortran version "snes/examples/tutorials/ex1f -snes_type fas", I get a segfault (see error below). Do you confirm or did I miss something? Best regards, Nicolas -------------------------------------------------------------------------------------------------------------------------------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: Fortran callback not set on this object [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 [0]PETSC ERROR: on a arch-linux2-c-debug named dsp0780450 by niko Thu Jun 1 16:18:43 2017 [0]PETSC ERROR: Configure options --prefix=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/Install --with-mpi=yes --with-x=yes --download-ml=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/ml-6.2-p3.tar.gz --with-mumps-lib="-L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Mumps-502_consortium_aster1/MPI/lib -lzmumps -ldmumps -lmumps_common -lpord -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Scotch_aster-604_aster6/MPI/lib -lesmumps -lptscotch -lptscotcherr -lptscotcherrexit -lscotch -lscotcherr -lscotcherrexit -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Parmetis_aster-403_aster/lib -lparmetis -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Metis_aster-510_aster1/lib -lmetis -L/usr/lib -lscalapack-openmpi -L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi -lblacsF77init-openmpi -L/usr/lib/x86_64-linux-gnu -lgomp " --with-mumps-include=/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Mumps-502_consortium_aster1/MPI/include --with-scalapack-lib="-L/usr/lib -lscalapack-openmpi" --with-blacs-lib="-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi -lblacsF77init-openmpi" --with-blas-lib="-L/usr/lib -lopenblas -lcblas" --with-lapack-lib="-L/usr/lib -llapack" [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/sys/objects/inherit.c [0]PETSC ERROR: #2 oursnesjacobian() line 105 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/ftn-custom/zsnesf.c [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/ls/ls.c [0]PETSC ERROR: #5 SNESSolve() line 4008 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #7 SNESFASCycle_Multiplicative() line 816 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c [0]PETSC ERROR: #9 SNESSolve() line 4008 in /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/interface/snes.c -------------------------------------------------------------------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From francescomigliorini93 at gmail.com Mon Jun 5 09:12:46 2017 From: francescomigliorini93 at gmail.com (Francesco Migliorini) Date: Mon, 5 Jun 2017 16:12:46 +0200 Subject: [petsc-users] Parallel vector with shared memory in Fortran Message-ID: Hello there! I am working with an MPI code in which I should create a petsc vector such that all the processes can access to all its entries. So, I tried with VecCreateShared but it does not work with my machine. Then I tried VecCreateMPI but it seems to me that it does not change anything from the usual VecCreate. Finally I found the scatter commands but the examples are a bit tricky. So, are there any other way? If no, could someone please show me how to use scatter in this simple code? 
Vec feP !The vector to be shared with all the processes (...) mpi_np = 2 !The number of processes ind(1) = 10 !The global dimension of the vector call VecCreate(PETSC_COMM_WORLD,feP,perr) call VecSetSizes(feP,PETSC_DECIDE,ind,perr) call VecSetFromOptions(feP,perr) (...) !Here feP is filled in call VecAssemblyBegin(feP,perr) call VecAssemblyEnd(feP,perr) Many thanks, Francesco -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Jun 5 09:43:31 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 5 Jun 2017 16:43:31 +0200 Subject: [petsc-users] Parallel vector with shared memory in Fortran In-Reply-To: References: Message-ID: petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html Il 05 Giu 2017 4:12 PM, "Francesco Migliorini" < francescomigliorini93 at gmail.com> ha scritto: > Hello there! > > I am working with an MPI code in which I should create a petsc vector such > that all the processes can access to all its entries. So, I tried with > VecCreateShared but it does not work with my machine. Then I tried > VecCreateMPI but it seems to me that it does not change anything from the > usual VecCreate. Finally I found the scatter commands but the examples are > a bit tricky. So, are there any other way? If no, could someone please show > me how to use scatter in this simple code? > > Vec feP !The vector to be shared with all the processes > (...) > mpi_np = 2 !The number of processes > ind(1) = 10 !The global dimension of the vector > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > (...) !Here feP is filled in > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > > Many thanks, > Francesco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Jun 5 09:44:26 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 5 Jun 2017 16:44:26 +0200 Subject: [petsc-users] Parallel vector with shared memory in Fortran In-Reply-To: References: Message-ID: Sorry, bad copy and paste http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html Il 05 Giu 2017 4:43 PM, "Stefano Zampini" ha scritto: > petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html > > Il 05 Giu 2017 4:12 PM, "Francesco Migliorini" < > francescomigliorini93 at gmail.com> ha scritto: > >> Hello there! >> >> I am working with an MPI code in which I should create a petsc vector >> such that all the processes can access to all its entries. So, I tried with >> VecCreateShared but it does not work with my machine. Then I tried >> VecCreateMPI but it seems to me that it does not change anything from the >> usual VecCreate. Finally I found the scatter commands but the examples are >> a bit tricky. So, are there any other way? If no, could someone please show >> me how to use scatter in this simple code? >> >> Vec feP !The vector to be shared with all the processes >> (...) >> mpi_np = 2 !The number of processes >> ind(1) = 10 !The global dimension of the vector >> call VecCreate(PETSC_COMM_WORLD,feP,perr) >> call VecSetSizes(feP,PETSC_DECIDE,ind,perr) >> call VecSetFromOptions(feP,perr) >> (...) !Here feP is filled in >> call VecAssemblyBegin(feP,perr) >> call VecAssemblyEnd(feP,perr) >> >> Many thanks, >> Francesco >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From quanwang.us at gmail.com Mon Jun 5 10:46:49 2017 From: quanwang.us at gmail.com (Wang) Date: Mon, 5 Jun 2017 11:46:49 -0400 Subject: [petsc-users] fields of ISLocalToGlobalMapping Message-ID: Sorry to open up another thread on local to global mappings I saw from the source code of ISLocalToGlobalMappingCreate that some fields (started with info_) in ISLocalToGlobalMapping are not updated from information of indices. If not updated when created, where or when are they updated so that I can get them through ISLocalToGlobalMappingGetInfo. Here is the link to the source. http://www.mcs.anl.gov/petsc/petsc-current/src/vec/is/utils/isltog.c.html#ISLocalToGlobalMappingCreate Thanks. QW -------------- next part -------------- An HTML attachment was scrubbed... URL: From francescomigliorini93 at gmail.com Mon Jun 5 11:20:55 2017 From: francescomigliorini93 at gmail.com (Francesco Migliorini) Date: Mon, 5 Jun 2017 18:20:55 +0200 Subject: [petsc-users] Parallel vector with shared memory in Fortran In-Reply-To: References: Message-ID: Dear Stefano, Thank you for your answer. I tried to use VecScatterCreateToAll as you suggested but it does not work since the first processor can only view its part of the vector. Here's how I managed the code: Vec fePS VecScatter Scatter (...) call VecScatterCreateToAll(feP,Scatter,fePS,perr) call VecScatterBegin(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) call VecScatterEnd(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) call VecScatterDestroy(Scatter,perr) call VecDestroy(fePS,perr) As I said, after this piece of code, if I print all the entries of feP from one processor, the values are correct if they belong to the part of the processor randon values. Bests, Francesco 2017-06-05 16:44 GMT+02:00 Stefano Zampini : > Sorry, bad copy and paste > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/ > VecScatterCreateToAll.html > > Il 05 Giu 2017 4:43 PM, "Stefano Zampini" ha > scritto: > >> petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html >> >> Il 05 Giu 2017 4:12 PM, "Francesco Migliorini" < >> francescomigliorini93 at gmail.com> ha scritto: >> >>> Hello there! >>> >>> I am working with an MPI code in which I should create a petsc vector >>> such that all the processes can access to all its entries. So, I tried with >>> VecCreateShared but it does not work with my machine. Then I tried >>> VecCreateMPI but it seems to me that it does not change anything from the >>> usual VecCreate. Finally I found the scatter commands but the examples are >>> a bit tricky. So, are there any other way? If no, could someone please show >>> me how to use scatter in this simple code? >>> >>> Vec feP !The vector to be shared with all the processes >>> (...) >>> mpi_np = 2 !The number of processes >>> ind(1) = 10 !The global dimension of the vector >>> call VecCreate(PETSC_COMM_WORLD,feP,perr) >>> call VecSetSizes(feP,PETSC_DECIDE,ind,perr) >>> call VecSetFromOptions(feP,perr) >>> (...) !Here feP is filled in >>> call VecAssemblyBegin(feP,perr) >>> call VecAssemblyEnd(feP,perr) >>> >>> Many thanks, >>> Francesco >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Jun 5 11:36:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 Jun 2017 11:36:44 -0500 Subject: [petsc-users] Parallel vector with shared memory in Fortran In-Reply-To: References: Message-ID: On Mon, Jun 5, 2017 at 11:20 AM, Francesco Migliorini < francescomigliorini93 at gmail.com> wrote: > Dear Stefano, > Thank you for your answer. I tried to use VecScatterCreateToAll as you > suggested but it does not work since the first processor can only view its > part of the vector. > Then there is a bug in your code. The example src/vec/vec/examples/tests/ex33.c tests this. Take a look at it. Thanks, Matt > Here's how I managed the code: > > Vec fePS > VecScatter Scatter > (...) > call VecScatterCreateToAll(feP,Scatter,fePS,perr) > call VecScatterBegin(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) > call VecScatterEnd(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) > call VecScatterDestroy(Scatter,perr) > call VecDestroy(fePS,perr) > > As I said, after this piece of code, if I print all the entries of feP > from one processor, the values are correct if they belong to the part of > the processor randon values. > > Bests, > Francesco > > 2017-06-05 16:44 GMT+02:00 Stefano Zampini : > >> Sorry, bad copy and paste >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/ >> Vec/VecScatterCreateToAll.html >> >> Il 05 Giu 2017 4:43 PM, "Stefano Zampini" ha >> scritto: >> >>> petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html >>> >>> Il 05 Giu 2017 4:12 PM, "Francesco Migliorini" < >>> francescomigliorini93 at gmail.com> ha scritto: >>> >>>> Hello there! >>>> >>>> I am working with an MPI code in which I should create a petsc vector >>>> such that all the processes can access to all its entries. So, I tried with >>>> VecCreateShared but it does not work with my machine. Then I tried >>>> VecCreateMPI but it seems to me that it does not change anything from the >>>> usual VecCreate. Finally I found the scatter commands but the examples are >>>> a bit tricky. So, are there any other way? If no, could someone please show >>>> me how to use scatter in this simple code? >>>> >>>> Vec feP !The vector to be shared with all the processes >>>> (...) >>>> mpi_np = 2 !The number of processes >>>> ind(1) = 10 !The global dimension of the vector >>>> call VecCreate(PETSC_COMM_WORLD,feP,perr) >>>> call VecSetSizes(feP,PETSC_DECIDE,ind,perr) >>>> call VecSetFromOptions(feP,perr) >>>> (...) !Here feP is filled in >>>> call VecAssemblyBegin(feP,perr) >>>> call VecAssemblyEnd(feP,perr) >>>> >>>> Many thanks, >>>> Francesco >>>> >>> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lawrence.mitchell at imperial.ac.uk Mon Jun 5 12:18:52 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Mon, 5 Jun 2017 18:18:52 +0100 Subject: [petsc-users] DMPlex distribution with custom adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> Message-ID: <7E354BC8-1419-4803-B8F6-CA78F868972A@imperial.ac.uk> > On 2 Jun 2017, at 05:09, Matthew Knepley wrote: > > > Coming back to this, I think I understand the problem a little better. > > Consider this mesh: > > +----+ > |\ 3 | > | \ | > |2 \ | > | \| > +----+ > |\ 1 | > | \ | > |0 \ | > | \| > +----+ > > Let's say I run on 3 processes and the initial (non-overlapped) cell > partition is: > > rank 0: cell 0 > rank 1: cell 1 & 2 > rank 2: cell 3 > > Now I'd like to grow the overlap such that any cell I can see through > a facet (and its closure) lives in the overlap. > > So great, I just need a new adjacency relation that gathers > closure(support(point)) > > But, that isn't right, because now on rank 0, I will get a mesh that > looks like: > > I do not understand why you think its not right. Toby and I are trying to push a formalism for > this understanding, in https://arxiv.org/abs/1508.02470. So you say that if sigma is a dual > basis function associated with point p, then the support of its matching psi, sigma(psi) = 1 > in the biorthogonal bases, is exactly star(p). > > So, if you have no sigma sitting on your vertex, who cares if you put that extra edge and node > in. It will not affect the communication pattern for dofs. If you do, then shouldn't you be including > that edge? Hmm, I think we are talking at cross-purposes. Let me try and explain again where I am coming from: To do a FEM integral on some cell c I need: i) to evaluate coefficients at quadrature points (on the cell) ii) to evaluate basis functions at quadrature points (on the cell) for (i), I need all the dofs in closure(c). for (ii), I just need the definition of the finite element. To do a FEM integral on a facet f I need: i) to evaluate coefficients at quadrature points (on the facet) ii) to evaluate basis functions at quadrature points (on the facet) for (i), I need all the dofs in closure(support(f)). for (ii), I just need the definition of the finite element. So now, my model for how I want to global assembly of a facet integral is: loop over all facets: gather from global coefficient to local data evaluate coefficient at quad points perform integral local to global In parallel, I just make a partition of the facets (so that each facet is integrated exactly once). OK, so what data do I need in parallel? Exactly the dofs that correspond to closure(support(facet)) for all owned facets in my partition. So I was hoping to be able to grow a distributed cell partition by exactly that means: add in those remote cells which are in the support of owned facets (I'm happy if this is symmetrically grown, although I think I can do it with one-sided growth). So that's my rationale for wanting this "strange" adjacency. I can get enough adjacency by using the current FEM adjacency and filtering which entities I iterate over, but it seems a bit wasteful. 
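To be concrete, the per-facet gather I have in mind is just something like the sketch below (the function name is made up, adj[] is assumed to be big enough, and I haven't bothered removing duplicate points):

  static PetscErrorCode FacetClosureSupport(DM dm, PetscInt f, PetscInt *nadj, PetscInt adj[])
  {
    const PetscInt *support;
    PetscInt        supportSize, s, n = 0;
    PetscErrorCode  ierr;

    PetscFunctionBegin;
    ierr = DMPlexGetSupportSize(dm, f, &supportSize); CHKERRQ(ierr);
    ierr = DMPlexGetSupport(dm, f, &support); CHKERRQ(ierr);
    for (s = 0; s < supportSize; ++s) {
      PetscInt *closure = NULL, closureSize, c;

      ierr = DMPlexGetTransitiveClosure(dm, support[s], PETSC_TRUE, &closureSize, &closure); CHKERRQ(ierr);
      /* closure entries come as (point, orientation) pairs */
      for (c = 0; c < closureSize; ++c) adj[n++] = closure[2*c];
      ierr = DMPlexRestoreTransitiveClosure(dm, support[s], PETSC_TRUE, &closureSize, &closure); CHKERRQ(ierr);
    }
    *nadj = n; /* duplicates not removed -- a real version would unique-ify */
    PetscFunctionReturn(0);
  }

That is cheap to compute facet-by-facet, which is why carrying the full FEM adjacency around just to get this data feels heavy-handed.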
For the currently implemented adjacencies, however, the code definitely assumes in various places that the closure of the cells on a partition covers all the points. And doing, say, DMPlexSetAdjacencyUseCone/Closure(PETSC_FALSE) results in meshes where that is not true. See the below: Lawrence $ mpiexec -n 3 ./bork [0] face 7 has support: 0 [0] face 8 has support: 0 [0] face 9 has support: 0 1 [0] face 10 has support: 1 [0] face 11 has support: 1 [0] face 12 has support: [1] face 10 has support: 0 2 [1] face 11 has support: 0 1 [1] face 12 has support: 0 [1] face 13 has support: 1 [1] face 14 has support: 2 [1] face 15 has support: 2 [1] face 16 has support: 1 3 [1] face 17 has support: 3 [1] face 18 has support: 3 [2] face 7 has support: 0 1 [2] face 8 has support: 0 [2] face 9 has support: 0 [2] face 10 has support: 1 [2] face 11 has support: [2] face 12 has support: 1[2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Petsc has generated inconsistent data [2]PETSC ERROR: Number of depth 2 faces 6 does not match permuted nubmer 5 [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3857-gda66ab19e3 GIT Date: 2017-05-10 09:02:09 -0500 [2]PETSC ERROR: ./bork on a arch-darwin-c-opt named yam-laptop.local by lmitche1 Mon Jun 5 18:15:31 2017 [2]PETSC ERROR: Configure options --download-chaco=1 --download-ctetgen=1 --download-eigen --download-exodusii=1 --download-hdf5=1 --download-hypre=1 --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 --download-parmetis=1 --download-ptscotch=1 --download-scalapack=1 --download-triangle=1 --with-c2html=0 --with-debugging=0 --with-shared-libraries=1 PETSC_ARCH=arch-darwin-c-opt [2]PETSC ERROR: #1 DMPlexCreateOrderingClosure_Static() line 41 in /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/impls/plex/plexreorder.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Number of depth 2 faces 6 does not match permuted nubmer 5 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3857-gda66ab19e3 GIT Date: 2017-05-10 09:02:09 -0500 [0]PETSC ERROR: ./bork on a arch-darwin-c-opt named yam-laptop.local by lmitche1 Mon Jun 5 18:15:31 2017 [0]PETSC ERROR: Configure options --download-chaco=1 --download-ctetgen=1 --download-eigen --download-exodusii=1 --download-hdf5=1 --download-hypre=1 --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 --download-parmetis=1 --download-ptscotch=1 --download-scalapack=1 --download-triangle=1 --with-c2html=0 --with-debugging=0 --with-shared-libraries=1 PETSC_ARCH=arch-darwin-c-opt [0]PETSC ERROR: #1 DMPlexCreateOrderingClosure_Static() line 41 in /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/impls/plex/plexreorder.c [0]PETSC ERROR: #2 DMPlexGetOrdering() line 133 in /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/impls/plex/plexreorder.c [0]PETSC ERROR: #3 main() line 87 in /Users/lmitche1/Documents/work/src/petsc-doodles/bork.c [0]PETSC ERROR: No PETSc Option Table entries [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [2]PETSC ERROR: #2 DMPlexGetOrdering() line 133 in /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/impls/plex/plexreorder.c [2]PETSC ERROR: #3 main() line 87 in /Users/lmitche1/Documents/work/src/petsc-doodles/bork.c [2]PETSC ERROR: No PETSc Option Table entries [2]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 77) - process 0 #include #include int main(int argc, char **argv) { PetscErrorCode ierr; DM dm = NULL, dmParallel = NULL; PetscSF sf = NULL; PetscPartitioner partitioner = NULL; IS perm = NULL; const PetscReal coords[12] = {0, 0, 0, 0.5, 0, 1, 1, 0, 1, 0.5, 1, 1}; const PetscInt cells[12] = {0, 1, 3, 1, 4, 3, 1, 2, 4, 2, 5, 4}; const PetscInt sizes[3] = {1, 2, 1}; const PetscInt points[4] = {0, 1, 2, 3}; PetscInt fStart, fEnd; PetscMPIInt rank, size; MPI_Comm comm; PetscInitialize(&argc, &argv, NULL, NULL); comm = PETSC_COMM_WORLD; ierr = MPI_Comm_rank(comm, &rank); CHKERRQ(ierr); ierr = MPI_Comm_size(comm, &size); CHKERRQ(ierr); if (size != 3) SETERRQ(comm, PETSC_ERR_ARG_WRONG, "Requires 3 processes"); if (!rank) { ierr = DMPlexCreateFromCellList(comm, 2, 4, 6, 3, PETSC_TRUE, cells, 2, coords, &dm); CHKERRQ(ierr); } else { ierr = DMPlexCreateFromCellList(comm, 2, 0, 0, 3, PETSC_TRUE, NULL, 2, NULL, &dm); CHKERRQ(ierr); } ierr = PetscPartitionerCreate(comm, &partitioner); CHKERRQ(ierr); ierr = PetscPartitionerSetType(partitioner, PETSCPARTITIONERSHELL); CHKERRQ(ierr); if (!rank) { ierr = PetscPartitionerShellSetPartition(partitioner, size, sizes, points); CHKERRQ(ierr); } else { ierr = PetscPartitionerShellSetPartition(partitioner, 3, NULL, NULL); CHKERRQ(ierr); } ierr = DMPlexSetPartitioner(dm, partitioner); CHKERRQ(ierr); ierr = DMPlexDistribute(dm, 0, &sf, &dmParallel); CHKERRQ(ierr); ierr = DMDestroy(&dm); CHKERRQ(ierr); ierr = PetscSFDestroy(&sf); CHKERRQ(ierr); ierr = PetscPartitionerDestroy(&partitioner); CHKERRQ(ierr); ierr = DMViewFromOptions(dmParallel, NULL, "-parallel_dm_view"); CHKERRQ(ierr); ierr = DMPlexSetAdjacencyUseCone(dmParallel, PETSC_FALSE); CHKERRQ(ierr); ierr = DMPlexSetAdjacencyUseClosure(dmParallel, PETSC_FALSE); CHKERRQ(ierr); ierr = DMPlexDistributeOverlap(dmParallel, 1, &sf, &dm); CHKERRQ(ierr); ierr = DMDestroy(&dmParallel); CHKERRQ(ierr); ierr = PetscSFDestroy(&sf); CHKERRQ(ierr); ierr = 
DMViewFromOptions(dm, NULL, "-overlap_dm_view"); CHKERRQ(ierr); ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd); CHKERRQ(ierr); for (PetscInt f = fStart; f < fEnd; f++) { const PetscInt *support = NULL; PetscInt supportSize; ierr = DMPlexGetSupportSize(dm, f, &supportSize); CHKERRQ(ierr); ierr = DMPlexGetSupport(dm, f, &support); CHKERRQ(ierr); ierr = PetscSynchronizedPrintf(comm, "[%d] face %d has support:", rank, f); CHKERRQ(ierr); for (PetscInt p = 0; p < supportSize; p++) { ierr = PetscSynchronizedPrintf(comm, " %d", support[p]); CHKERRQ(ierr); } ierr = PetscSynchronizedPrintf(comm, "\n"); CHKERRQ(ierr); } ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT); CHKERRQ(ierr); ierr = DMPlexGetOrdering(dm, MATORDERINGRCM, NULL, &perm); CHKERRQ(ierr); if (perm) { ierr = ISViewFromOptions(perm, NULL, "-ordering_is_view"); CHKERRQ(ierr); ierr = ISDestroy(&perm); CHKERRQ(ierr); } ierr = DMDestroy(&dm); CHKERRQ(ierr); ierr = PetscFinalize(); return 0; } From kannanr at ornl.gov Mon Jun 5 12:37:44 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Mon, 5 Jun 2017 17:37:44 +0000 Subject: [petsc-users] slepc trap for large matrix Message-ID: I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. -- Regards, Ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slepc.e609742.zip Type: application/zip Size: 711685 bytes Desc: slepc.e609742.zip URL: From jroman at dsic.upv.es Mon Jun 5 13:00:41 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 5 Jun 2017 20:00:41 +0200 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: Message-ID: > El 5 jun 2017, a las 19:37, Kannan, Ramakrishnan escribi?: > > I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. > > -- > Regards, > Ramki > > In the log it seems that you are using more than 36 processors... Could you try running with the additional option -bv_type vecs to see if the error persists? Thanks. Jose From knepley at gmail.com Mon Jun 5 17:01:31 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 Jun 2017 17:01:31 -0500 Subject: [petsc-users] DMPlex distribution with custom adjacency In-Reply-To: <7E354BC8-1419-4803-B8F6-CA78F868972A@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> <7E354BC8-1419-4803-B8F6-CA78F868972A@imperial.ac.uk> Message-ID: On Mon, Jun 5, 2017 at 12:18 PM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 2 Jun 2017, at 05:09, Matthew Knepley wrote: > > > > > > Coming back to this, I think I understand the problem a little better. 
> > > > Consider this mesh: > > > > +----+ > > |\ 3 | > > | \ | > > |2 \ | > > | \| > > +----+ > > |\ 1 | > > | \ | > > |0 \ | > > | \| > > +----+ > > > > Let's say I run on 3 processes and the initial (non-overlapped) cell > > partition is: > > > > rank 0: cell 0 > > rank 1: cell 1 & 2 > > rank 2: cell 3 > > > > Now I'd like to grow the overlap such that any cell I can see through > > a facet (and its closure) lives in the overlap. > > > > So great, I just need a new adjacency relation that gathers > > closure(support(point)) > > > > But, that isn't right, because now on rank 0, I will get a mesh that > > looks like: > > > > I do not understand why you think its not right. Toby and I are trying > to push a formalism for > > this understanding, in https://arxiv.org/abs/1508.02470. So you say > that if sigma is a dual > > basis function associated with point p, then the support of its matching > psi, sigma(psi) = 1 > > in the biorthogonal bases, is exactly star(p). > > > > So, if you have no sigma sitting on your vertex, who cares if you put > that extra edge and node > > in. It will not affect the communication pattern for dofs. If you do, > then shouldn't you be including > > that edge? > > Hmm, I think we are talking at cross-purposes. Let me try and explain > again where I am coming from: > > To do a FEM integral on some cell c I need: > > i) to evaluate coefficients at quadrature points (on the cell) > ii) to evaluate basis functions at quadrature points (on the cell) > > for (i), I need all the dofs in closure(c). > for (ii), I just need the definition of the finite element. > > To do a FEM integral on a facet f I need: > > i) to evaluate coefficients at quadrature points (on the facet) > ii) to evaluate basis functions at quadrature points (on the facet) > > for (i), I need all the dofs in closure(support(f)). > So this is a jump term, since I need both sides. Is it nonlinear? If its linear, things are easy and just compute from both sides and add, but of course that does not work for nonlinear things. > for (ii), I just need the definition of the finite element. > > So now, my model for how I want to global assembly of a facet integral is: > > loop over all facets: > gather from global coefficient to local data > evaluate coefficient at quad points > perform integral > local to global > > In parallel, I just make a partition of the facets (so that each facet is > integrated exactly once). > > OK, so what data do I need in parallel? > > Exactly the dofs that correspond to closure(support(facet)) for all owned > facets in my partition. > > So I was hoping to be able to grow a distributed cell partition by exactly > that means: add in those remote cells which are in the support of owned > facets (I'm happy if this is symmetrically grown, although I think I can do > it with one-sided growth). > > So that's my rationale for wanting this "strange" adjacency. I can get > enough adjacency by using the current FEM adjacency and filtering which > entities I iterate over, but it seems a bit wasteful. > I see that you have edges you do not necessarily want, but do they mess up your loop? It seems like you will not encounter them looping over facets. This is exactly what happens to me in FV, where I just ignore them. If you do really want to prune them, then I guess overriding the DMPlexGetAdjacency() as you propose is probably the best way. I would be willing to put it in. Please send me a reminder email since this week is pretty heinous for me. 
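A minimal sketch of the adjacency relation being discussed here, written with existing Plex queries (this is not the proposed DMPlexGetAdjacency() override, just the closure(support(f)) relation it would encode; the caller is assumed to pass a large enough adj[] and to deduplicate the result):

#include <petscdmplex.h>

/* Points adjacent to facet f through its support: closure(support(f)). */
static PetscErrorCode FacetClosureSupport(DM dm, PetscInt f, PetscInt *nadj, PetscInt adj[])
{
  const PetscInt *support;
  PetscInt        supportSize, s, n = 0;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetSupportSize(dm, f, &supportSize); CHKERRQ(ierr);
  ierr = DMPlexGetSupport(dm, f, &support); CHKERRQ(ierr);
  for (s = 0; s < supportSize; s++) {
    PetscInt *closure = NULL, closureSize, c;

    /* closure of each cell next to the facet; entries come as (point, orientation) pairs */
    ierr = DMPlexGetTransitiveClosure(dm, support[s], PETSC_TRUE, &closureSize, &closure); CHKERRQ(ierr);
    for (c = 0; c < closureSize; c++) adj[n++] = closure[2*c];
    ierr = DMPlexRestoreTransitiveClosure(dm, support[s], PETSC_TRUE, &closureSize, &closure); CHKERRQ(ierr);
  }
  *nadj = n;
  PetscFunctionReturn(0);
}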
Thanks, Matt > For the currently implemented adjacencies, however, the code definitely > assumes in various places that the closure of the cells on a partition > covers all the points. And doing, say, DMPlexSetAdjacencyUseCone/Closure(PETSC_FALSE) > results in meshes where that is not true. > > See the below: > > Lawrence > > $ mpiexec -n 3 ./bork > [0] face 7 has support: 0 > [0] face 8 has support: 0 > [0] face 9 has support: 0 1 > [0] face 10 has support: 1 > [0] face 11 has support: 1 > [0] face 12 has support: > [1] face 10 has support: 0 2 > [1] face 11 has support: 0 1 > [1] face 12 has support: 0 > [1] face 13 has support: 1 > [1] face 14 has support: 2 > [1] face 15 has support: 2 > [1] face 16 has support: 1 3 > [1] face 17 has support: 3 > [1] face 18 has support: 3 > [2] face 7 has support: 0 1 > [2] face 8 has support: 0 > [2] face 9 has support: 0 > [2] face 10 has support: 1 > [2] face 11 has support: > [2] face 12 has support: 1[2]PETSC ERROR: --------------------- Error > Message -------------------------------------------------------------- > [2]PETSC ERROR: Petsc has generated inconsistent data > [2]PETSC ERROR: Number of depth 2 faces 6 does not match permuted nubmer 5 > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3857-gda66ab19e3 > GIT Date: 2017-05-10 09:02:09 -0500 > [2]PETSC ERROR: ./bork on a arch-darwin-c-opt named yam-laptop.local by > lmitche1 Mon Jun 5 18:15:31 2017 > [2]PETSC ERROR: Configure options --download-chaco=1 --download-ctetgen=1 > --download-eigen --download-exodusii=1 --download-hdf5=1 --download-hypre=1 > --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 > --download-parmetis=1 --download-ptscotch=1 --download-scalapack=1 > --download-triangle=1 --with-c2html=0 --with-debugging=0 > --with-shared-libraries=1 PETSC_ARCH=arch-darwin-c-opt > [2]PETSC ERROR: #1 DMPlexCreateOrderingClosure_Static() line 41 in > /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/ > impls/plex/plexreorder.c > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Number of depth 2 faces 6 does not match permuted nubmer 5 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3857-gda66ab19e3 > GIT Date: 2017-05-10 09:02:09 -0500 > [0]PETSC ERROR: ./bork on a arch-darwin-c-opt named yam-laptop.local by > lmitche1 Mon Jun 5 18:15:31 2017 > [0]PETSC ERROR: Configure options --download-chaco=1 --download-ctetgen=1 > --download-eigen --download-exodusii=1 --download-hdf5=1 --download-hypre=1 > --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 > --download-parmetis=1 --download-ptscotch=1 --download-scalapack=1 > --download-triangle=1 --with-c2html=0 --with-debugging=0 > --with-shared-libraries=1 PETSC_ARCH=arch-darwin-c-opt > [0]PETSC ERROR: #1 DMPlexCreateOrderingClosure_Static() line 41 in > /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/ > impls/plex/plexreorder.c > [0]PETSC ERROR: #2 DMPlexGetOrdering() line 133 in > /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/ > impls/plex/plexreorder.c > [0]PETSC ERROR: #3 main() line 87 in /Users/lmitche1/Documents/ > work/src/petsc-doodles/bork.c > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > [2]PETSC ERROR: #2 DMPlexGetOrdering() line 133 in > /Users/lmitche1/Documents/work/src/deps/petsc/src/dm/ > impls/plex/plexreorder.c > [2]PETSC ERROR: #3 main() line 87 in /Users/lmitche1/Documents/ > work/src/petsc-doodles/bork.c > [2]PETSC ERROR: No PETSc Option Table entries > [2]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 77) - process 0 > > #include > #include > > int main(int argc, char **argv) > { > PetscErrorCode ierr; > DM dm = NULL, dmParallel = NULL; > PetscSF sf = NULL; > PetscPartitioner partitioner = NULL; > IS perm = NULL; > const PetscReal coords[12] = {0, 0, > 0, 0.5, > 0, 1, > 1, 0, > 1, 0.5, > 1, 1}; > const PetscInt cells[12] = {0, 1, 3, > 1, 4, 3, > 1, 2, 4, > 2, 5, 4}; > > const PetscInt sizes[3] = {1, 2, 1}; > const PetscInt points[4] = {0, 1, 2, 3}; > PetscInt fStart, fEnd; > PetscMPIInt rank, size; > MPI_Comm comm; > > PetscInitialize(&argc, &argv, NULL, NULL); > > comm = PETSC_COMM_WORLD; > > ierr = MPI_Comm_rank(comm, &rank); CHKERRQ(ierr); > ierr = MPI_Comm_size(comm, &size); CHKERRQ(ierr); > > if (size != 3) SETERRQ(comm, PETSC_ERR_ARG_WRONG, "Requires 3 > processes"); > > if (!rank) { > ierr = DMPlexCreateFromCellList(comm, 2, 4, 6, 3, PETSC_TRUE, > cells, 2, coords, &dm); > CHKERRQ(ierr); > } else { > ierr = DMPlexCreateFromCellList(comm, 2, 0, 0, 3, PETSC_TRUE, > NULL, 2, NULL, &dm); CHKERRQ(ierr); > } > > ierr = PetscPartitionerCreate(comm, &partitioner); CHKERRQ(ierr); > ierr = PetscPartitionerSetType(partitioner, PETSCPARTITIONERSHELL); > CHKERRQ(ierr); > if (!rank) { > ierr = PetscPartitionerShellSetPartition(partitioner, size, > sizes, points); CHKERRQ(ierr); > } else { > ierr = PetscPartitionerShellSetPartition(partitioner, 3, NULL, > NULL); CHKERRQ(ierr); > } > ierr = DMPlexSetPartitioner(dm, partitioner); CHKERRQ(ierr); > > ierr = DMPlexDistribute(dm, 0, &sf, &dmParallel); CHKERRQ(ierr); > ierr = DMDestroy(&dm); CHKERRQ(ierr); > ierr = PetscSFDestroy(&sf); CHKERRQ(ierr); > ierr = PetscPartitionerDestroy(&partitioner); CHKERRQ(ierr); > > ierr = DMViewFromOptions(dmParallel, NULL, "-parallel_dm_view"); > CHKERRQ(ierr); > > ierr = DMPlexSetAdjacencyUseCone(dmParallel, PETSC_FALSE); > CHKERRQ(ierr); > ierr = DMPlexSetAdjacencyUseClosure(dmParallel, 
PETSC_FALSE); > CHKERRQ(ierr); > > ierr = DMPlexDistributeOverlap(dmParallel, 1, &sf, &dm); > CHKERRQ(ierr); > > ierr = DMDestroy(&dmParallel); CHKERRQ(ierr); > ierr = PetscSFDestroy(&sf); CHKERRQ(ierr); > > ierr = DMViewFromOptions(dm, NULL, "-overlap_dm_view"); CHKERRQ(ierr); > > ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd); CHKERRQ(ierr); > > for (PetscInt f = fStart; f < fEnd; f++) { > const PetscInt *support = NULL; > PetscInt supportSize; > ierr = DMPlexGetSupportSize(dm, f, &supportSize); CHKERRQ(ierr); > ierr = DMPlexGetSupport(dm, f, &support); CHKERRQ(ierr); > ierr = PetscSynchronizedPrintf(comm, "[%d] face %d has support:", > rank, f); CHKERRQ(ierr); > for (PetscInt p = 0; p < supportSize; p++) { > ierr = PetscSynchronizedPrintf(comm, " %d", support[p]); > CHKERRQ(ierr); > } > ierr = PetscSynchronizedPrintf(comm, "\n"); CHKERRQ(ierr); > } > ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT); CHKERRQ(ierr); > > ierr = DMPlexGetOrdering(dm, MATORDERINGRCM, NULL, &perm); > CHKERRQ(ierr); > if (perm) { > ierr = ISViewFromOptions(perm, NULL, "-ordering_is_view"); > CHKERRQ(ierr); > ierr = ISDestroy(&perm); CHKERRQ(ierr); > } > ierr = DMDestroy(&dm); CHKERRQ(ierr); > ierr = PetscFinalize(); > > return 0; > } > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon Jun 5 20:40:01 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 6 Jun 2017 13:40:01 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model Message-ID: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> I am about to generalise the geothermal subsurface flow simulator I'm working on so that it can simulate dual-porosity systems, and need some advice on the best way to to this within the PETSc framework. 1) Dual porosity: For anyone not familiar with the idea, dual porosity models are a way of simulating flow in fractured media. The main flow in the system takes place along the fractures, but fluid is also stored in the 'matrix' rock between the fractures. The rocks in the fracture and matrix cells usually have different properties (e.g. permeability, porosity). Starting from a conventional single-porosity finite volume model, dual porosity effectively adds a one-dimensional sub-model inside each cell (or possibly only in some cells), representing storage in the 'matrix' rock. The original single-porosity cells now represent the fractures in the rock. Their effective volumes are reduced but their connectivity to other fracture cells is unchanged. Each one gets an additional connection to a matrix cell. However, the matrix cells are usually not connected to each other. In the simplest case there is just one matrix cell for every fracture cell, though it is often desirable to use two or more, to get better representation of the flow in and out of the matrix. In that case the higher-level matrix cells form a one-dimensional sub-model for each cell. 2) Simple solution method: The simplest way to implement this approach is just to add in all the matrix cells and treat them the same way as the fracture cells. If the original number of single-porosity cells was n, and the number of matrix cells per fracture cell is m, then the Jacobian for the system is of size (m+1) * n. 
(In practice we are almost always solving for more than one variable per cell, e.g. pressure and temperature, so if there are p variables per cell, the total Jacobian size is really (m+1) * n * p, but if we think of the Jacobian as a block matrix with blocksize p then the total number of blocks is still (m+1) * n.) 2) More efficient solution method: The straight-forward approach is not very efficient, because it doesn't take any advantage of the particular sparsity pattern of the Jacobian. Because all the matrix rock sub-models are one-dimensional, the Jacobian has a block-tridiagonal high-level structure, in which all of the the block matrices (of size n) are themselves block-diagonal (with blocksize p), except for the upper left one which has the same sparsity pattern as the Jacobian for the original single-porosity system. An efficient way to solve this linear system is described by, among others, Zyvoloski et al (2008): http://www.sciencedirect.com/science/article/pii/S0309170807001741 The method is detailed on p. 537, equations 1 - 6. What it amounts to is taking successive 2-block-by-2-block sub-systems, starting from the bottom right, and replacing the upper left sub-matrix in each case by its Schur complement (with a similar modification to the right hand side vector). Because all the sub-matrices are block diagonal, these can be computed quite cheaply (and in parallel, if the sub-matrices are distributed appropriately across the processors). When you get back up to the top left of the whole matrix, the remaining modified sub-matrix of size n can be solved on its own to get the solution in the fracture cells. You can then back-substitute to get the solution in the matrix rock cells. In other words, after a relatively inexpensive reduction process, you wind up solving a linear system of size n (same as the original single-porosity system, and with the same sparsity pattern- only the diagonal and RHS are modified) instead of (m+1) * n. This whole process is usually a lot faster than solving the whole (m+1) * n sized linear system. 3) PETSc implementation So, how would I implement this approach in the PETSc framework? Because the flow equations are non-linear, solving these linear systems happens multiple times each time-step as part of a SNES solve. One way might be to form the whole Jacobian but somehow use a modified KSP solve which would implement the reduction process, do a KSP solve on the reduced system of size n, and finally back-substitute to find the unknowns in the matrix rock cells. Another way might be to form only the reduced-size Jacobian and the other block-diagonal matrices separately, use KSP to solve the reduced system but first incorporate the reduction process into the Jacobian calculation routine, and somewhere a post-solve step to back-substitute for the unknowns in the matrix cells. However currently we are using finite differences to compute these Jacobians and it seems to me it would be messy to try to do that separately for each of the sub-matrices. Doing it the first way above would avoid all that. Any suggestions for what might be a good approach? or any other ideas that could be easier to implement with PETSc but have similar efficiency? I didn't see anything currently in PETSc specifically for solving block-tridiagonal systems. 
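One possible way to get this elimination with existing PETSc pieces is PCFIELDSPLIT with a Schur complement: register the matrix-rock unknowns as the first (eliminated) split and the fracture unknowns as the second, so that the Schur complement is the reduced fracture-sized system. A rough sketch, assuming the application has already built index sets isMatrix and isFrac for the two families of cells (those names, and the split names, are illustrative only):

#include <petscsnes.h>

static PetscErrorCode SetupDualPorosityPC(SNES snes, IS isMatrix, IS isFrac)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetKSP(snes, &ksp); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT); CHKERRQ(ierr);
  /* first split = the block that gets eliminated (block-diagonal matrix-rock cells) */
  ierr = PCFieldSplitSetIS(pc, "matrixrock", isMatrix); CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "fracture", isFrac); CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR); CHKERRQ(ierr);
  ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL); CHKERRQ(ierr);
  /* one option for preconditioning the reduced (fracture) solve */
  ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the matrix-rock block is block diagonal, its inner solve can be made cheap and local (for example something like -fieldsplit_matrixrock_pc_type bjacobi with a direct sub-solver), in which case the outer Krylov work happens only on the fracture unknowns, much like the reduction in Zyvoloski et al.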
Thanks, Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From lawrence.mitchell at imperial.ac.uk Tue Jun 6 03:01:54 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 6 Jun 2017 09:01:54 +0100 Subject: [petsc-users] DMPlex distribution with custom adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> <0b55eaf7-bf06-3876-a6bb-ce8e54422fa1@imperial.ac.uk> <7E354BC8-1419-4803-B8F6-CA78F868972A@imperial.ac.uk> Message-ID: <68964A27-CA4F-4527-8F18-6F231997451F@imperial.ac.uk> > On 5 Jun 2017, at 23:01, Matthew Knepley wrote: > > To do a FEM integral on a facet f I need: > > i) to evaluate coefficients at quadrature points (on the facet) > ii) to evaluate basis functions at quadrature points (on the facet) > > for (i), I need all the dofs in closure(support(f)). > > So this is a jump term, since I need both sides. Is it nonlinear? If its linear, things are easy > and just compute from both sides and add, but of course that does not work for nonlinear things. I can't guarantee what downstream users will write. Most of the time it will probably be nonlinear in some way. Hence wanting to compute on the facet (getting both sides of the contribution at once). Rather than spinning over cells, computing the one-sided facet contribution and adding in. > > for (ii), I just need the definition of the finite element. > > So now, my model for how I want to global assembly of a facet integral is: > > loop over all facets: > gather from global coefficient to local data > evaluate coefficient at quad points > perform integral > local to global > > In parallel, I just make a partition of the facets (so that each facet is integrated exactly once). > > OK, so what data do I need in parallel? > > Exactly the dofs that correspond to closure(support(facet)) for all owned facets in my partition. > > So I was hoping to be able to grow a distributed cell partition by exactly that means: add in those remote cells which are in the support of owned facets (I'm happy if this is symmetrically grown, although I think I can do it with one-sided growth). > > So that's my rationale for wanting this "strange" adjacency. I can get enough adjacency by using the current FEM adjacency and filtering which entities I iterate over, but it seems a bit wasteful. > > I see that you have edges you do not necessarily want, but do they mess up your loop? It seems like you will not encounter them looping over facets. > This is exactly what happens to me in FV, where I just ignore them. Remember in 2D the edges are the facets. I haven't checked what happens in 3D (but I expect it will be similar), because I can't draw 3D meshes. > If you do really want to prune them, then I guess overriding the DMPlexGetAdjacency() as you propose is probably the best way. I would > be willing to put it in. Please send me a reminder email since this week is pretty heinous for me. Sure. I think this is quite a cute usage because I can make the ghost region "one-sided" quite easily by only growing the adjacency through the facets on the ranks that I need. So the halo exchange between two processes is not symmetric. The code I sketched that did this seems to work properly, once the adjacency computation is right. 
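A small sketch of the per-facet coefficient gather described above, pulling in every dof in closure(support(f)) via the cells in support(f); dm, section and the ghosted local vector locCoeff are assumed to already exist in the application:

#include <petscdmplex.h>

static PetscErrorCode GatherFacetCoefficient(DM dm, PetscSection section, Vec locCoeff, PetscInt f)
{
  const PetscInt *support;
  PetscInt        supportSize, s;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetSupportSize(dm, f, &supportSize); CHKERRQ(ierr);
  ierr = DMPlexGetSupport(dm, f, &support); CHKERRQ(ierr);
  for (s = 0; s < supportSize; s++) {
    PetscScalar *vals = NULL;
    PetscInt     csize;

    /* all dofs in the closure of one cell next to f; for an interior facet both sides appear */
    ierr = DMPlexVecGetClosure(dm, section, locCoeff, support[s], &csize, &vals); CHKERRQ(ierr);
    /* ... evaluate the coefficient at the facet quadrature points and integrate ... */
    ierr = DMPlexVecRestoreClosure(dm, section, locCoeff, support[s], &csize, &vals); CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

This only works in parallel if the dofs of every cell in support(f) are present in locCoeff for each owned facet f, which is exactly the one-sided overlap growth being discussed.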
Lawrence From franck.houssen at inria.fr Tue Jun 6 11:45:48 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 6 Jun 2017 18:45:48 +0200 (CEST) Subject: [petsc-users] How to VecScatter from global to local vector, and then, VecGather back ? In-Reply-To: <1808812566.2848156.1496767085102.JavaMail.zimbra@inria.fr> Message-ID: <1020917960.2852820.1496767548318.JavaMail.zimbra@inria.fr> How to VecScatter from global to local vector, and then, VecGather back ? This is a very simple use case: I need to split a global vector in local (possibly overlapping) pieces, then I need to modify each local piece (x2), and finally I need to assemble (+=) back local parts into a global vector. Read the doc and went through examples... But still can't make this work: can I get some help on this ? Note: running petsc-3.7.6 on debian with gcc-6.3 Thanks, Franck ~> head -n 12 vecScatterGather.cpp // How to VecScatter from global to local vector, and then, VecGather back ? // // global vector: 3x1 2 overlapping local vector: 2x1 global vector: 3x1 // // x2 // |1 -> |2 // |1 scatter |1 |2 gather |2 // |1 -> -> |4 // |1 |1 -> |2 |2 // |1 |2 // // ~> g++ -o vecScatterGather.exe vecScatterGather.cpp -lpetsc -lm; mpirun -n 2 vecScatterGather.exe -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vecScatterGather.cpp Type: text/x-c++src Size: 2437 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vecScatterGather.log Type: text/x-log Size: 381 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vecScatterGather.log.expected Type: application/octet-stream Size: 374 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Tue Jun 6 14:02:08 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 6 Jun 2017 20:02:08 +0100 Subject: [petsc-users] How to VecScatter from global to local vector, and then, VecGather back ? In-Reply-To: <1020917960.2852820.1496767548318.JavaMail.zimbra@inria.fr> References: <1808812566.2848156.1496767085102.JavaMail.zimbra@inria.fr> <1020917960.2852820.1496767548318.JavaMail.zimbra@inria.fr> Message-ID: On 6 June 2017 at 17:45, Franck Houssen wrote: > How to VecScatter from global to local vector, and then, VecGather back ? > > This is a very simple use case: I need to split a global vector in local > (possibly overlapping) pieces, then I need to modify each local piece (x2), > and finally I need to assemble (+=) back local parts into a global vector. > Read the doc and went through examples... But still can't make this work: > can I get some help on this ? > > Your usage of VecScatter in the code is fine. The reason you don't get the expected result of (-2,-4,-2) is because your vector (globVec) contains a bunch of -1's prior to the gather operation. Just call VecZeroEntries(globVec); before the call to VecScatterBegin(scatCtx, locVec, globVec, ADD_VALUES, SCATTER_REVERSE); VecScatterEnd (scatCtx, locVec, globVec, ADD_VALUES, SCATTER_REVERSE); and you'll get the correct result. Thanks, Dave > Note: running petsc-3.7.6 on debian with gcc-6.3 > > Thanks, > > Franck > > ~> head -n 12 vecScatterGather.cpp > // How to VecScatter from global to local vector, and then, VecGather back > ? 
> // > // global vector: 3x1 2 overlapping local vector: > 2x1 global vector: 3x1 > // > // x2 > // |1 -> |2 > // |1 scatter |1 |2 > gather |2 > // |1 -> > -> |4 > // |1 |1 -> |2 > |2 > // |1 |2 > // > // ~> g++ -o vecScatterGather.exe vecScatterGather.cpp -lpetsc -lm; mpirun > -n 2 vecScatterGather.exe > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 6 16:36:16 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 Jun 2017 16:36:16 -0500 Subject: [petsc-users] Parallel vector with shared memory in Fortran In-Reply-To: References: Message-ID: <8A5E4351-D887-405C-AD34-C391245A04E6@mcs.anl.gov> > On Jun 5, 2017, at 11:20 AM, Francesco Migliorini wrote: > > Dear Stefano, > Thank you for your answer. I tried to use VecScatterCreateToAll as you suggested but it does not work since the first processor can only view its part of the vector. Here's how I managed the code: > > Vec fePS > VecScatter Scatter > (...) > call VecScatterCreateToAll(feP,Scatter,fePS,perr) > call VecScatterBegin(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) > call VecScatterEnd(Scatter,feP,fePS,INSERT_VALUES,SCATTER_FORWARD,perr) > call VecScatterDestroy(Scatter,perr) > call VecDestroy(fePS,perr) > > As I said, after this piece of code, if I print all the entries of feP The vector feP which is parallel remains the same in these calls > from one processor, the values are correct if they belong to the part of the processor randon values. The vector fePS contains all the values from all the processes on each process after this call. Barry > > Bests, > Francesco > > 2017-06-05 16:44 GMT+02:00 Stefano Zampini : > Sorry, bad copy and paste > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html > > Il 05 Giu 2017 4:43 PM, "Stefano Zampini" ha scritto: > petsc-current/docs/manualpages/Vec/VecScatterCreateToAll.html > > Il 05 Giu 2017 4:12 PM, "Francesco Migliorini" ha scritto: > Hello there! > > I am working with an MPI code in which I should create a petsc vector such that all the processes can access to all its entries. So, I tried with VecCreateShared but it does not work with my machine. Then I tried VecCreateMPI but it seems to me that it does not change anything from the usual VecCreate. Finally I found the scatter commands but the examples are a bit tricky. So, are there any other way? If no, could someone please show me how to use scatter in this simple code? > > Vec feP !The vector to be shared with all the processes > (...) > mpi_np = 2 !The number of processes > ind(1) = 10 !The global dimension of the vector > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > (...) !Here feP is filled in > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > > Many thanks, > Francesco > From bsmith at mcs.anl.gov Tue Jun 6 20:06:44 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 6 Jun 2017 20:06:44 -0500 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: Message-ID: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> The resulting matrix has something like >>> 119999808*119999808*1.e-6 14,399,953,920.036863 nonzero entries. It is possible that some integer operations are overflowing since C int can only go up to about 4 billion before overflowing. 
You can building with a different PETSC_ARCH value using the additional ./configure option for PETSc of --with-64-bit-indices and see if the problem is resolved. Barry > On Jun 5, 2017, at 12:37 PM, Kannan, Ramakrishnan wrote: > > I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. > > -- > Regards, > Ramki > > From zonexo at gmail.com Wed Jun 7 01:57:37 2017 From: zonexo at gmail.com (TAY wee-beng) Date: Wed, 7 Jun 2017 14:57:37 +0800 Subject: [petsc-users] Strange Segmentation Violation error Message-ID: <635a7754-72ec-1c96-2d0a-783615079b25@gmail.com> Hi, I have been PETSc together with my CFD code. There seems to be a bug with the Intel compiler such that when I call some DM routines such as DMLocalToLocalBegin, a segmentation violation will occur if full optimization is used. I had posted this question a while back. So the current solution is to use -O1 -ip instead of -O3 -ipo -ip for certain source files which uses DMLocalToLocalBegin etc. Recently, I made some changes to the code, mainly adding some stuffs. However, depending on my options. some cases still go thru the same program path. Now when I tried to run those same cases, I got segmentation violation, which didn't happen before: / IIB_I_cell_no_uvw_total2 14 10 6 3// // 2 1/ /[0]PETSC ERROR: ------------------------------------------------------------------------// //[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range// //[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger// //[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind// //[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors// //[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run // //[0]PETSC ERROR: to get more information on the crash.// //[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------// //[0]PETSC ERROR: Signal received// //[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.// //[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 // //[0]PETSC ERROR: ./a.out / I can't debug using VS since the codes have been optimized. I tried to print messages (if (myid == 0) print "1") to pinpoint the error. Strangely, after adding these print messages, the error disappears. / IIB_I_cell_no_uvw_total2 14 10 6 3// // 2 1// // 1// // 2// // 3// // 4// // 5// // 1 0.26873613 0.12620288 0.12949340 1.11422363 0.43983516E-06 -0.59311066E-01 0.25546227E+04// // 2 0.22236892 0.14528589 0.16939270 1.10459102 0.74556128E-02 -0.55168234E-01 0.25532419E+04// // 3 0.20764796 0.14832689 0.18780489 1.08039569 0.80299767E-02 -0.46972411E-01 0.25523174E+04/ Can anyone give a logical explanation why this is happening? Moreover, if I removed printing 1 to 3, and only print 4 and 5, segmentation violation appears again. I am using Intel Fortran 2016.1.150. I wonder if it helps if I post in the Intel Fortran forum. I can provide more info if require. -- Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng (Zheng Weiming) ??? 
Personal research webpage: http://tayweebeng.wixsite.com/website Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA linkedin: www.linkedin.com/in/tay-weebeng ================================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Lukasz.Kaczmarczyk at glasgow.ac.uk Wed Jun 7 02:22:55 2017 From: Lukasz.Kaczmarczyk at glasgow.ac.uk (Lukasz Kaczmarczyk) Date: Wed, 7 Jun 2017 07:22:55 +0000 Subject: [petsc-users] Strange Segmentation Violation error In-Reply-To: <635a7754-72ec-1c96-2d0a-783615079b25@gmail.com> References: <635a7754-72ec-1c96-2d0a-783615079b25@gmail.com> Message-ID: <055B880D-7D0C-49C9-84B5-2C64B003FC0A@glasgow.ac.uk> On 7 Jun 2017, at 07:57, TAY wee-beng > wrote: Hi, I have been PETSc together with my CFD code. There seems to be a bug with the Intel compiler such that when I call some DM routines such as DMLocalToLocalBegin, a segmentation violation will occur if full optimization is used. I had posted this question a while back. So the current solution is to use -O1 -ip instead of -O3 -ipo -ip for certain source files which uses DMLocalToLocalBegin etc. Recently, I made some changes to the code, mainly adding some stuffs. However, depending on my options. some cases still go thru the same program path. Now when I tried to run those same cases, I got segmentation violation, which didn't happen before: IIB_I_cell_no_uvw_total2 14 10 6 3 2 1 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 [0]PETSC ERROR: ./a.out I can't debug using VS since the codes have been optimized. I tried to print messages (if (myid == 0) print "1") to pinpoint the error. Strangely, after adding these print messages, the error disappears. IIB_I_cell_no_uvw_total2 14 10 6 3 2 1 1 2 3 4 5 1 0.26873613 0.12620288 0.12949340 1.11422363 0.43983516E-06 -0.59311066E-01 0.25546227E+04 2 0.22236892 0.14528589 0.16939270 1.10459102 0.74556128E-02 -0.55168234E-01 0.25532419E+04 3 0.20764796 0.14832689 0.18780489 1.08039569 0.80299767E-02 -0.46972411E-01 0.25523174E+04 Can anyone give a logical explanation why this is happening? Moreover, if I removed printing 1 to 3, and only print 4 and 5, segmentation violation appears again. I am using Intel Fortran 2016.1.150. I wonder if it helps if I post in the Intel Fortran forum. I can provide more info if require. You very likely write on the memory, for example when you exceed the size of arrays. Depending on your compilation options, starting parameters, etc. you write in an uncontrolled way on the part of memory which belongs to your process or protected by operation system. In the second case, you have a segmentation fault. 
You can have correct results for some runs, but your bug is there hiding in the dark. To put light on it, you need Valgrind. Compile the code with debugging on, no optimisation and start searching. You can run as well generate core file and in gdb/ldb buck track error. Lukasz -------------- next part -------------- An HTML attachment was scrubbed... URL: From natacha.bereux at gmail.com Wed Jun 7 03:52:06 2017 From: natacha.bereux at gmail.com (Natacha BEREUX) Date: Wed, 7 Jun 2017 10:52:06 +0200 Subject: [petsc-users] "snes/examples/tutorials/ex1f -snes_type fas" fails with segfault In-Reply-To: References: Message-ID: Hello Nicolas, I ran snes/examples/tutorials/ex1f -snes_type fas with a recent version (3.7.6) and I confirm the problem. The C version works fine, but the Fortran version complains about a Fortran callback problem. My output looks quite similar to yours ... Best regards, Natacha mpirun ex1f -snes_type fas [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: Fortran callback not set on this object [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 [0]PETSC ERROR: ex1f on a linux-opt-mumps-ml-hypre named dsp0780444 by H03755 Wed Jun 7 10:42:55 2017 [0]PETSC ERROR: Configure options --with-mpi=1 --with-debugging=0 --with-mumps-lib="-L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Mumps-511_consortium_aster/MPI/lib -lzmumps -ldmumps -lmumps_common -lpord -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Parmetis_aster-403_aster/lib -lparmetis -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Scotch_aster-604_aster6/MPI/lib -lptscotch -lptscotcherr -lptscotcherrexit -lptscotchparmetis -lesmumps -lscotch -lscotcherr -lscotcherrexit -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Metis_aster-510_aster1/lib -lmetis" --with-mumps-include=/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Mumps-511_consortium_aster/MPI/include --download-hypre=/home/H03755/Librairies/hypre-2.11.1.tar.gz --download-ml=/home/H03755/Librairies/petsc-pkg-ml-e5040d11aa07.tar.gz --with-openmp=0 --with-scalapack-lib="-lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi" --with-blas-lapack-lib="-llapack -lopenblas" --PETSC_ARCH=linux-opt-mumps-ml-hypre LIBS=-lgomp --prefix=/home/H03755/local/petsc/petsc-3.7.6 [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in /home/H03755/Librairies/petsc-3.7.6/src/sys/objects/inherit.c [0]PETSC ERROR: #2 oursnesjacobian() line 105 in /home/H03755/Librairies/petsc-3.7.6/src/snes/interface/ftn-custom/zsnesf.c [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in /home/H03755/Librairies/petsc-3.7.6/src/snes/interface/snes.c [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 in /home/H03755/Librairies/petsc-3.7.6/src/snes/impls/ls/ls.c [0]PETSC ERROR: #5 SNESSolve() line 4005 in /home/H03755/Librairies/petsc-3.7.6/src/snes/interface/snes.c [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in /home/H03755/Librairies/petsc-3.7.6/src/snes/impls/fas/fas.c [0]PETSC ERROR: #7 SNESFASCycle_Multiplicative() line 816 in /home/H03755/Librairies/petsc-3.7.6/src/snes/impls/fas/fas.c [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in /home/H03755/Librairies/petsc-3.7.6/src/snes/impls/fas/fas.c [0]PETSC ERROR: #9 
SNESSolve() line 4005 in /home/H03755/Librairies/petsc-3.7.6/src/snes/interface/snes.c Number of SNES iterations = 0 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ex1f on a linux-opt-mumps-ml-hypre named dsp0780444 with 1 processor, by H03755 Wed Jun 7 10:42:55 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 7.548e-03 1.00000 7.548e-03 Objects: 2.300e+01 1.00000 2.300e+01 Flops: 3.000e+00 1.00000 3.000e+00 3.000e+00 Flops/sec: 3.975e+02 1.00000 3.975e+02 3.975e+02 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 7.5421e-03 99.9% 3.0000e+00 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage SNESFunctionEval 1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNorm 1 1.0 6.1989e-06 1.0 3.00e+00 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 0 VecSet 8 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage SNES 2 2 2828 0. SNESLineSearch 2 2 1992 0. 
DMSNES 1 1 672 0. Vector 6 6 9312 0. Matrix 2 2 6536 0. Distributed Mesh 1 1 4624 0. Star Forest Bipartite Graph 2 2 1616 0. Discrete System 1 1 872 0. Krylov Solver 2 2 2704 0. DMKSP interface 1 1 656 0. Preconditioner 2 2 1832 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 0. #PETSc Option Table entries: -ksp_monitor -ksp_view -log_view -snes_type fas #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-mpi=1 --with-debugging=0 --with-mumps-lib="-L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Mumps-511_consortium_aster/MPI/lib -lzmumps -ldmumps -lmumps_common -lpord -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Parmetis_aster-403_aster/lib -lparmetis -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Scotch_aster-604_aster6/MPI/lib -lptscotch -lptscotcherr -lptscotcherrexit -lptscotchparmetis -lesmumps -lscotch -lscotcherr -lscotcherrexit -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Metis_aster-510_aster1/lib -lmetis" --with-mumps-include=/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Mumps-511_consortium_aster/MPI/include --download-hypre=/home/H03755/Librairies/hypre-2.11.1.tar.gz --download-ml=/home/H03755/Librairies/petsc-pkg-ml-e5040d11aa07.tar.gz --with-openmp=0 --with-scalapack-lib="-lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi" --with-blas-lapack-lib="-llapack -lopenblas" --PETSC_ARCH=linux-opt-mumps-ml-hypre LIBS=-lgomp --prefix=/home/H03755/local/petsc/petsc-3.7.6 ----------------------------------------- Libraries compiled on Fri Apr 28 15:23:58 2017 on dsp0780444 Machine characteristics: Linux-3.16.0-4-amd64-x86_64-with-debian-8.7 Using PETSc directory: /home/H03755/Librairies/petsc-3.7.6 Using PETSc arch: linux-opt-mumps-ml-hypre ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/H03755/Librairies/petsc-3.7.6/linux-opt-mumps-ml-hypre/include -I/home/H03755/Librairies/petsc-3.7.6/include -I/home/H03755/Librairies/petsc-3.7.6/include -I/home/H03755/Librairies/petsc-3.7.6/linux-opt-mumps-ml-hypre/include -I/home/H03755/dev/codeaster-prerequisites/v13/prerequisites/Mumps-511_consortium_aster/MPI/include -I/home/H03755/local/petsc/petsc-3.7.6/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/home/H03755/Librairies/petsc-3.7.6/linux-opt-mumps-ml-hypre/lib -L/home/H03755/Librairies/petsc-3.7.6/linux-opt-mumps-ml-hypre/lib -lpetsc -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Mumps-511_consortium_aster/MPI/lib -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Parmetis_aster-403_aster/lib -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Scotch_aster-604_aster6/MPI/lib -L/home/H03755/dev/codeaster-prerequisites/v13/prerequisites//Metis_aster-510_aster1/lib -Wl,-rpath,/home/H03755/local/petsc/petsc-3.7.6/lib 
-L/home/H03755/local/petsc/petsc-3.7.6/lib -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lzmumps -ldmumps -lmumps_common -lpord -lparmetis -lptscotch -lptscotcherr -lptscotcherrexit -lptscotchparmetis -lesmumps -lscotch -lscotcherr -lscotcherrexit -lmetis -lHYPRE -lmpi_cxx -lstdc++ -lm -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi -lblacsCinit-openmpi -lml -lmpi_cxx -lstdc++ -lm -llapack -lopenblas -lX11 -lssl -lcrypto -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lmpi_cxx -lstdc++ -lm -ldl -lgomp -lmpi -lhwloc -lgcc_s -lpthread -ldl -lgomp On Mon, Jun 5, 2017 at 10:12 AM, Karin&NiKo wrote: > Dear PETSc team, > > If I run "snes/examples/tutorials/ex1 -snes_type fas", everything is OK. > But with its Fortran version "snes/examples/tutorials/ex1f -snes_type > fas", I get a segfault (see error below). > Do you confirm or did I miss something? > > Best regards, > Nicolas > > ------------------------------------------------------------ > -------------------------------------------------------------------------- > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [0]PETSC ERROR: Fortran callback not set on this object > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > [0]PETSC ERROR: > > > on a arch-linux2-c-debug > named dsp0780450 by niko Thu Jun 1 16:18:43 2017 > [0]PETSC ERROR: Configure options --prefix=/home/niko/dev/ > codeaster-prerequisites/petsc-3.7.2/Install --with-mpi=yes --with-x=yes > --download-ml=/home/niko/dev/codeaster-prerequisites/petsc-3.7.2/ml-6.2-p3.tar.gz > --with-mumps-lib="-L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Mumps-502_consortium_aster1/MPI/lib -lzmumps -ldmumps > -lmumps_common -lpord -L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Scotch_aster-604_aster6/MPI/lib -lesmumps -lptscotch > -lptscotcherr -lptscotcherrexit -lscotch -lscotcherr -lscotcherrexit > -L/home/niko/dev/codeaster-prerequisites/v13/prerequisites/Parmetis_aster-403_aster/lib > -lparmetis -L/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Metis_aster-510_aster1/lib -lmetis -L/usr/lib > -lscalapack-openmpi -L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi > -lblacsF77init-openmpi -L/usr/lib/x86_64-linux-gnu -lgomp " > --with-mumps-include=/home/niko/dev/codeaster-prerequisites/v13/ > prerequisites/Mumps-502_consortium_aster1/MPI/include > --with-scalapack-lib="-L/usr/lib -lscalapack-openmpi" > --with-blacs-lib="-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi > -lblacsF77init-openmpi" --with-blas-lib="-L/usr/lib -lopenblas -lcblas" > --with-lapack-lib="-L/usr/lib -llapack" > [0]PETSC ERROR: #1 PetscObjectGetFortranCallback() line 263 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > sys/objects/inherit.c > [0]PETSC ERROR: #2 oursnesjacobian() line 105 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/ftn-custom/zsnesf.c > [0]PETSC ERROR: #3 SNESComputeJacobian() line 2312 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/interface/snes.c > [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 228 
in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #5 SNESSolve() line 4008 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/snes.c > [0]PETSC ERROR: #6 SNESFASDownSmooth_Private() line 512 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/impls/fas/fas.c > [0]PETSC ERROR: #7 SNESFASCycle_Multiplicative() line 816 in > /home/niko/dev/codeaster-prerequisites/petsc-3.7.2/src/ > snes/impls/fas/fas.c > [0]PETSC ERROR: #8 SNESSolve_FAS() line 987 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/impls/fas/fas.c > [0]PETSC ERROR: #9 SNESSolve() line 4008 in /home/niko/dev/codeaster- > prerequisites/petsc-3.7.2/src/snes/interface/snes.c > ------------------------------------------------------------ > -------------------------------------------------------------------------- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Wed Jun 7 08:18:40 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Wed, 7 Jun 2017 15:18:40 +0200 Subject: [petsc-users] BDDC assembly question In-Reply-To: <1FE2819E-9D30-4CA7-95CF-98BE6458C87E@mcs.anl.gov> References: <1FE2819E-9D30-4CA7-95CF-98BE6458C87E@mcs.anl.gov> Message-ID: But the MatSetValuesLocal requires local index. I do not have that. By the way, use MatISGetMPIXAIJ to get the assembled matrix I obtained the one as in the figure A00.ps. It seems that MatSetValues does not assemble the remote entries, because the coupling blocks are zero. The range of the matrix in each process is below: 0: Istart: 0, Iend: 17 1: Istart: 17, Iend: 54 Giang On Mon, Jun 5, 2017 at 1:48 AM, Barry Smith wrote: > > > On Jun 4, 2017, at 5:51 PM, Hoang Giang Bui wrote: > > > > Hello > > > > I obtained two different matrices when assembling with MATIS and > MATMPIAIJ. With MATIS I used MatISSetPreallocation to allocate and > MatSetLocalToGlobalMapping to provide the mapping. However I still used > MatSetValues and MatAssemblyBegin/End with MATIS. Is it the correct way to > do so? In case that BDDC required assembling the matrix using local index, > is there a way to assemble using global index to keep the same assembly > interface as MATMPIAIJ? > > You can use MatSetValuesLocal() in both cases; this the efficient way. > > Barry > > > > > Thanks > > Giang > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: A00.mm Type: text/x-troff-mm Size: 22498 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: A00.ps Type: application/postscript Size: 1815069 bytes Desc: not available URL: From stefano.zampini at gmail.com Wed Jun 7 08:35:44 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 7 Jun 2017 15:35:44 +0200 Subject: [petsc-users] BDDC assembly question In-Reply-To: References: <1FE2819E-9D30-4CA7-95CF-98BE6458C87E@mcs.anl.gov> Message-ID: <1CA90724-C7F3-48C6-9D4B-460B410E7990@gmail.com> Which version of PETSc are you using? If you use the dev version and try to call MatSetValues on a MATIS with a non-owned (subdomain-wise) dof it will raise an error, as MATIS does not implement any caching mechanisms for off-proc entries. Off-proc entries are a concept related with the AIJ format. 
The local row and columns distribution of a Mat (any type, MATAIJ, MATIS or whatever type you want to use) are related with the local sizes of the vectors used in MatMult; for MATIS, the size of the subdomain problem (call it nl) is inferred from the size of the ISLocalToGlobalMapping object used in the constructor (or passed in via MatSetLocalToGlobalMapping). So, you can either do A) loop over elements and call MatSetValuesLocal(A,element_dofs_in_subdomain_ordering?) B) loop over elements and call MatSetValues(A,element_dofs_in_global_ordering?) in case A), if you want a code independent on the matrix type (AIJ or IS), you need to call MatSetLocalToGlobalMapping(A,l2g,l2g) before being able to call MatSetValuesLocal. The l2g map should map dofs from subdomain (0 to nl) to global ordering in case B), the l2g map is only needed to create the MATIS object; in this case, when you call MatSetValues, the dofs in global ordering are mapped back to the subdomain ordering via ISGlobalToLocalMappingApply, that may not be memory scalable. So this is why Barry suggested you to use approach A). You may want to take a look at http://epubs.siam.org/doi/abs/10.1137/15M1025785 to better understand how MATIS works. > On Jun 7, 2017, at 3:18 PM, Hoang Giang Bui wrote: > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kannanr at ornl.gov Wed Jun 7 09:37:18 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 7 Jun 2017 14:37:18 +0000 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> References: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> Message-ID: Barry, Thanks for the kind response. I am building slepc 3.7.3 and when I configure ?with-64-bit-indices=1, I am getting the following error. ./configure --with-64-bit-indices=1 --prefix=/lustre/atlas/proj-shared/csc209/ramki/slepc ERROR: Invalid arguments --with-64-bit-indices=1 Use -h for help When I run ./configure ?h, I am getting the following options. Let me know if I am missing something. 
SLEPc Configure Help -------------------------------------------------------------------------------- SLEPc: --with-clean= : Delete prior build files including externalpackages --with-cmake= : Enable builds with CMake (disabled by default) --prefix= : Specify location to install SLEPc (e.g., /usr/local) --DATAFILESPATH= : Specify location of datafiles (for SLEPc developers) ARPACK: --download-arpack[=] : Download and install ARPACK in SLEPc directory --with-arpack= : Indicate if you wish to test for ARPACK --with-arpack-dir= : Indicate the directory for ARPACK libraries --with-arpack-flags= : Indicate comma-separated flags for linking ARPACK BLOPEX: --download-blopex[=] : Download and install BLOPEX in SLEPc directory BLZPACK: --with-blzpack= : Indicate if you wish to test for BLZPACK --with-blzpack-dir= : Indicate the directory for BLZPACK libraries --with-blzpack-flags= : Indicate comma-separated flags for linking BLZPACK FEAST: --with-feast= : Indicate if you wish to test for FEAST --with-feast-dir= : Indicate the directory for FEAST libraries --with-feast-flags= : Indicate comma-separated flags for linking FEAST PRIMME: --download-primme[=] : Download and install PRIMME in SLEPc directory --with-primme= : Indicate if you wish to test for PRIMME --with-primme-dir= : Indicate the directory for PRIMME libraries --with-primme-flags= : Indicate comma-separated flags for linking PRIMME TRLAN: --download-trlan[=] : Download and install TRLAN in SLEPc directory --with-trlan= : Indicate if you wish to test for TRLAN --with-trlan-dir= : Indicate the directory for TRLAN libraries --with-trlan-flags= : Indicate comma-separated flags for linking TRLAN SOWING: --download-sowing[=] : Download and install SOWING in SLEPc directory -- Regards, Ramki On 6/6/17, 9:06 PM, "Barry Smith" wrote: The resulting matrix has something like >>> 119999808*119999808*1.e-6 14,399,953,920.036863 nonzero entries. It is possible that some integer operations are overflowing since C int can only go up to about 4 billion before overflowing. You can building with a different PETSC_ARCH value using the additional ./configure option for PETSc of --with-64-bit-indices and see if the problem is resolved. Barry > On Jun 5, 2017, at 12:37 PM, Kannan, Ramakrishnan wrote: > > I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. > > -- > Regards, > Ramki > > From jroman at dsic.upv.es Wed Jun 7 09:41:16 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 7 Jun 2017 16:41:16 +0200 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> Message-ID: This option belongs to PETSc's configure, not SLEPc's configure. Jose > El 7 jun 2017, a las 16:37, Kannan, Ramakrishnan escribi?: > > Barry, > > Thanks for the kind response. I am building slepc 3.7.3 and when I configure ?with-64-bit-indices=1, I am getting the following error. > > ./configure --with-64-bit-indices=1 --prefix=/lustre/atlas/proj-shared/csc209/ramki/slepc > ERROR: Invalid arguments --with-64-bit-indices=1 > Use -h for help > > When I run ./configure ?h, I am getting the following options. Let me know if I am missing something. 
> > SLEPc Configure Help > -------------------------------------------------------------------------------- > SLEPc: > --with-clean= : Delete prior build files including externalpackages > --with-cmake= : Enable builds with CMake (disabled by default) > --prefix= : Specify location to install SLEPc (e.g., /usr/local) > --DATAFILESPATH= : Specify location of datafiles (for SLEPc developers) > ARPACK: > --download-arpack[=] : Download and install ARPACK in SLEPc directory > --with-arpack= : Indicate if you wish to test for ARPACK > --with-arpack-dir= : Indicate the directory for ARPACK libraries > --with-arpack-flags= : Indicate comma-separated flags for linking ARPACK > BLOPEX: > --download-blopex[=] : Download and install BLOPEX in SLEPc directory > BLZPACK: > --with-blzpack= : Indicate if you wish to test for BLZPACK > --with-blzpack-dir= : Indicate the directory for BLZPACK libraries > --with-blzpack-flags= : Indicate comma-separated flags for linking BLZPACK > FEAST: > --with-feast= : Indicate if you wish to test for FEAST > --with-feast-dir= : Indicate the directory for FEAST libraries > --with-feast-flags= : Indicate comma-separated flags for linking FEAST > PRIMME: > --download-primme[=] : Download and install PRIMME in SLEPc directory > --with-primme= : Indicate if you wish to test for PRIMME > --with-primme-dir= : Indicate the directory for PRIMME libraries > --with-primme-flags= : Indicate comma-separated flags for linking PRIMME > TRLAN: > --download-trlan[=] : Download and install TRLAN in SLEPc directory > --with-trlan= : Indicate if you wish to test for TRLAN > --with-trlan-dir= : Indicate the directory for TRLAN libraries > --with-trlan-flags= : Indicate comma-separated flags for linking TRLAN > SOWING: > --download-sowing[=] : Download and install SOWING in SLEPc directory > > -- > Regards, > Ramki > > > On 6/6/17, 9:06 PM, "Barry Smith" wrote: > > > The resulting matrix has something like > >>>> 119999808*119999808*1.e-6 > 14,399,953,920.036863 > > nonzero entries. It is possible that some integer operations are overflowing since C int can only go up to about 4 billion before overflowing. > > You can building with a different PETSC_ARCH value using the additional ./configure option for PETSc of --with-64-bit-indices and see if the problem is resolved. > > Barry > > >> On Jun 5, 2017, at 12:37 PM, Kannan, Ramakrishnan wrote: >> >> I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. >> >> -- >> Regards, >> Ramki >> >> > > > > From kannanr at ornl.gov Wed Jun 7 09:41:56 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 7 Jun 2017 14:41:56 +0000 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> Message-ID: Jose, I am running in the super computer environment. I just do a ?module load cray-petsc-64/3.7.4.0?. I don?t compile PETSc. -- Regards, Ramki On 6/7/17, 10:41 AM, "Jose E. Roman" wrote: This option belongs to PETSc's configure, not SLEPc's configure. Jose > El 7 jun 2017, a las 16:37, Kannan, Ramakrishnan escribi?: > > Barry, > > Thanks for the kind response. I am building slepc 3.7.3 and when I configure ?with-64-bit-indices=1, I am getting the following error. 
> > ./configure --with-64-bit-indices=1 --prefix=/lustre/atlas/proj-shared/csc209/ramki/slepc > ERROR: Invalid arguments --with-64-bit-indices=1 > Use -h for help > > When I run ./configure ?h, I am getting the following options. Let me know if I am missing something. > > SLEPc Configure Help > -------------------------------------------------------------------------------- > SLEPc: > --with-clean= : Delete prior build files including externalpackages > --with-cmake= : Enable builds with CMake (disabled by default) > --prefix= : Specify location to install SLEPc (e.g., /usr/local) > --DATAFILESPATH= : Specify location of datafiles (for SLEPc developers) > ARPACK: > --download-arpack[=] : Download and install ARPACK in SLEPc directory > --with-arpack= : Indicate if you wish to test for ARPACK > --with-arpack-dir= : Indicate the directory for ARPACK libraries > --with-arpack-flags= : Indicate comma-separated flags for linking ARPACK > BLOPEX: > --download-blopex[=] : Download and install BLOPEX in SLEPc directory > BLZPACK: > --with-blzpack= : Indicate if you wish to test for BLZPACK > --with-blzpack-dir= : Indicate the directory for BLZPACK libraries > --with-blzpack-flags= : Indicate comma-separated flags for linking BLZPACK > FEAST: > --with-feast= : Indicate if you wish to test for FEAST > --with-feast-dir= : Indicate the directory for FEAST libraries > --with-feast-flags= : Indicate comma-separated flags for linking FEAST > PRIMME: > --download-primme[=] : Download and install PRIMME in SLEPc directory > --with-primme= : Indicate if you wish to test for PRIMME > --with-primme-dir= : Indicate the directory for PRIMME libraries > --with-primme-flags= : Indicate comma-separated flags for linking PRIMME > TRLAN: > --download-trlan[=] : Download and install TRLAN in SLEPc directory > --with-trlan= : Indicate if you wish to test for TRLAN > --with-trlan-dir= : Indicate the directory for TRLAN libraries > --with-trlan-flags= : Indicate comma-separated flags for linking TRLAN > SOWING: > --download-sowing[=] : Download and install SOWING in SLEPc directory > > -- > Regards, > Ramki > > > On 6/6/17, 9:06 PM, "Barry Smith" wrote: > > > The resulting matrix has something like > >>>> 119999808*119999808*1.e-6 > 14,399,953,920.036863 > > nonzero entries. It is possible that some integer operations are overflowing since C int can only go up to about 4 billion before overflowing. > > You can building with a different PETSC_ARCH value using the additional ./configure option for PETSc of --with-64-bit-indices and see if the problem is resolved. > > Barry > > >> On Jun 5, 2017, at 12:37 PM, Kannan, Ramakrishnan wrote: >> >> I am running EPS for NHEP on a matrix of size 119999808x119999808 and I am experiencing the attached trapped. This is a 1D row distributed sparse uniform random matrix with 1e-6 sparsity over 36 processors. It works fine for smaller matrices of sizes with 1.2 million x 1.2 million. Let me know if you are looking for more information. >> >> -- >> Regards, >> Ramki >> >> > > > > From fande.kong at inl.gov Wed Jun 7 09:44:16 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 7 Jun 2017 08:44:16 -0600 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> Message-ID: On Wed, Jun 7, 2017 at 8:37 AM, Kannan, Ramakrishnan wrote: > Barry, > > Thanks for the kind response. I am building slepc 3.7.3 and when I > configure ?with-64-bit-indices=1, I am getting the following error. 
> > ./configure --with-64-bit-indices=1 --prefix=/lustre/atlas/proj- > shared/csc209/ramki/slepc > ERROR: Invalid arguments --with-64-bit-indices=1 > Use -h for help > I think you need to do "configure --with-64-bit-indices=1" for PETSc (Not SLEPc). Fande, > When I run ./configure ?h, I am getting the following options. Let me know > if I am missing something. > > SLEPc Configure Help > ------------------------------------------------------------ > -------------------- > SLEPc: > --with-clean= : Delete prior build files including > externalpackages > --with-cmake= : Enable builds with CMake (disabled by > default) > --prefix= : Specify location to install SLEPc (e.g., > /usr/local) > --DATAFILESPATH= : Specify location of datafiles (for SLEPc > developers) > ARPACK: > --download-arpack[=] : Download and install ARPACK in SLEPc > directory > --with-arpack= : Indicate if you wish to test for ARPACK > --with-arpack-dir= : Indicate the directory for ARPACK > libraries > --with-arpack-flags= : Indicate comma-separated flags for > linking ARPACK > BLOPEX: > --download-blopex[=] : Download and install BLOPEX in SLEPc > directory > BLZPACK: > --with-blzpack= : Indicate if you wish to test for BLZPACK > --with-blzpack-dir= : Indicate the directory for BLZPACK > libraries > --with-blzpack-flags= : Indicate comma-separated flags for > linking BLZPACK > FEAST: > --with-feast= : Indicate if you wish to test for FEAST > --with-feast-dir= : Indicate the directory for FEAST libraries > --with-feast-flags= : Indicate comma-separated flags for > linking FEAST > PRIMME: > --download-primme[=] : Download and install PRIMME in SLEPc > directory > --with-primme= : Indicate if you wish to test for PRIMME > --with-primme-dir= : Indicate the directory for PRIMME > libraries > --with-primme-flags= : Indicate comma-separated flags for > linking PRIMME > TRLAN: > --download-trlan[=] : Download and install TRLAN in SLEPc > directory > --with-trlan= : Indicate if you wish to test for TRLAN > --with-trlan-dir= : Indicate the directory for TRLAN libraries > --with-trlan-flags= : Indicate comma-separated flags for > linking TRLAN > SOWING: > --download-sowing[=] : Download and install SOWING in SLEPc > directory > > -- > Regards, > Ramki > > > On 6/6/17, 9:06 PM, "Barry Smith" wrote: > > > The resulting matrix has something like > > >>> 119999808*119999808*1.e-6 > 14,399,953,920.036863 > > nonzero entries. It is possible that some integer operations are > overflowing since C int can only go up to about 4 billion before > overflowing. > > You can building with a different PETSC_ARCH value using the > additional ./configure option for PETSc of --with-64-bit-indices and see if > the problem is resolved. > > Barry > > > > On Jun 5, 2017, at 12:37 PM, Kannan, Ramakrishnan > wrote: > > > > I am running EPS for NHEP on a matrix of size 119999808x119999808 > and I am experiencing the attached trapped. This is a 1D row distributed > sparse uniform random matrix with 1e-6 sparsity over 36 processors. It > works fine for smaller matrices of sizes with 1.2 million x 1.2 million. > Let me know if you are looking for more information. > > > > -- > > Regards, > > Ramki > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hgbk2008 at gmail.com Wed Jun 7 09:55:17 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Wed, 7 Jun 2017 16:55:17 +0200 Subject: [petsc-users] BDDC assembly question In-Reply-To: <1CA90724-C7F3-48C6-9D4B-460B410E7990@gmail.com> References: <1FE2819E-9D30-4CA7-95CF-98BE6458C87E@mcs.anl.gov> <1CA90724-C7F3-48C6-9D4B-460B410E7990@gmail.com> Message-ID: Hi Stefano I used case B) to not change the current code significantly. Nevertheless case A) is worth to look when the number of domains grow. In case B) I noticed that the l2g passing to MatSetLocalToGlobalMapping must also contain the off-proc entries, in order to assemble correctly. As you said, calling MatSetValues with non-owned dof will raise error, so we have to include that in the current sub-domain. I used v3.7.4 though. Thanks all for the help. Giang On Wed, Jun 7, 2017 at 3:35 PM, Stefano Zampini wrote: > Which version of PETSc are you using? If you use the dev version and try > to call MatSetValues on a MATIS with a non-owned (subdomain-wise) dof it > will raise an error, as MATIS does not implement any caching mechanisms for > off-proc entries. > Off-proc entries are a concept related with the AIJ format. > > The local row and columns distribution of a Mat (any type, MATAIJ, MATIS > or whatever type you want to use) are related with the local sizes of the > vectors used in MatMult; > for MATIS, the size of the subdomain problem (call it nl) is inferred from > the size of the ISLocalToGlobalMapping object used in the constructor (or > passed in via MatSetLocalToGlobalMapping). > > So, you can either do > > A) loop over elements and call MatSetValuesLocal(A,element_ > dofs_in_subdomain_ordering?) > B) loop over elements and call MatSetValues(A,element_dofs_ > in_global_ordering?) > > in case A), if you want a code independent on the matrix type (AIJ or IS), > you need to call MatSetLocalToGlobalMapping(A,l2g,l2g) before being able > to call MatSetValuesLocal. The l2g map should map dofs from subdomain (0 to > nl) to global ordering > > in case B), the l2g map is only needed to create the MATIS object; in this > case, when you call MatSetValues, the dofs in global ordering are mapped > back to the subdomain ordering via ISGlobalToLocalMappingApply, that may > not be memory scalable. So this is why Barry suggested you to use approach > A). > > > You may want to take a look at http://epubs.siam.org/doi/ > abs/10.1137/15M1025785 to better understand how MATIS works. > > > On Jun 7, 2017, at 3:18 PM, Hoang Giang Bui wrote: > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Jun 7 10:04:15 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 7 Jun 2017 17:04:15 +0200 Subject: [petsc-users] slepc trap for large matrix In-Reply-To: References: <6B43B495-9711-4F89-88AC-730A2DE38949@mcs.anl.gov> Message-ID: > El 7 jun 2017, a las 16:41, Kannan, Ramakrishnan escribi?: > > Jose, > > I am running in the super computer environment. I just do a ?module load cray-petsc-64/3.7.4.0?. I don?t compile PETSc. > -- > Regards, > Ramki In $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/configure.log you should be able to see which options were used in PETSc's configure. If --with-64-bit-indices=1 is not there, ask the sysadmin to create another PETSc module with this option. 
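A run-time sanity check against whichever petsc module is loaded can also settle this. The following is a minimal sketch only; the dimension 119999808 and the 1e-6 density are taken from the report above, everything else is illustrative:

#include <petscsys.h>

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscReal      n   = 119999808.0;   /* global matrix dimension from the report   */
  PetscReal      nnz = n*n*1.e-6;     /* roughly 1.44e10 estimated nonzero entries */

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD,"sizeof(PetscInt) = %d bytes\n",(int)sizeof(PetscInt));CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"estimated nonzeros %g vs PETSC_MAX_INT %g\n",(double)nnz,(double)PETSC_MAX_INT);CHKERRQ(ierr);
  if (sizeof(PetscInt) == 4 && nnz > (PetscReal)PETSC_MAX_INT) {
    ierr = PetscPrintf(PETSC_COMM_WORLD,"nonzero count does not fit in 32-bit PetscInt; a --with-64-bit-indices build is needed\n");CHKERRQ(ierr);
  }
  ierr = PetscFinalize();
  return ierr;
}

If sizeof(PetscInt) comes out as 4 bytes, the estimated nonzero count cannot be represented, which would be consistent with the trap appearing for the 119999808x119999808 case while the 1.2 million x 1.2 million case works.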
Alternatively, configure it yourself from source with the same options adding --with-64-bit-indices=1 Jose From sb020287 at gmail.com Thu Jun 8 11:20:17 2017 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Fri, 9 Jun 2017 00:20:17 +0800 Subject: [petsc-users] example for TS with AMR? Message-ID: hi, is there any example of using TS with AMR (e.g p4est)? from what I understand, I can use posttimestep and poststage to perform similar job, but I think it will be too complicated for high level of refinement (say levelmax=8). is there any example where only TS is used with SAMR grid topology? -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jun 8 12:34:53 2017 From: epscodes at gmail.com (Xiangdong) Date: Thu, 8 Jun 2017 13:34:53 -0400 Subject: [petsc-users] questions on BAIJ matrix Message-ID: Hello everyone, I have a few quick questions on BAIJ matrix in petsc. 1) In the remark of the function MatCreateMPIBAIJWithArrays, it says " bs - the block size, only a block size of 1 is supported". Why must the block size be 1? Is this a typo? http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ MatCreateMPIBAIJWithArrays.html 2) In the Line 4040 of the implemention of MatCreateMPIBAIJWithArrays, would the matrix type be matmpibaij instead of matmpiSbaij? http://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/ baij/mpi/mpibaij.c.html#MatCreateMPIBAIJWithArrays 4031: PetscErrorCode MatCreateMPIBAIJWithArrays(MPI_Comm comm,PetscInt bs,PetscInt m,PetscInt n,PetscInt M,PetscInt N,const PetscInt i[],const PetscInt j[],const PetscScalar a[],Mat *mat) 4032: { 4036: if (i[0]) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"i (row indices) must start with 0"); 4037: if (m < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"local number of rows (m) cannot be PETSC_DECIDE, or negative"); 4038: MatCreate(comm,mat); 4039: MatSetSizes(*mat,m,n,M,N); 4040: MatSetType(*mat,MATMPISBAIJ); 4041: MatSetOption(*mat,MAT_ROW_ORIENTED,PETSC_FALSE); 4042: MatMPIBAIJSetPreallocationCSR(*mat,bs,i,j,a); 4043: MatSetOption(*mat,MAT_ROW_ORIENTED,PETSC_TRUE); 4044: return(0); 4045: } 3) I want to create a petsc matrix M equivalent to the sum of two block csr matrix/array (M1csr, M2csr). What is the best way to achieve it? I am thinking of created two petsc baij matrix (M1baij and M2baij) by calling MatCreateMPIBAIJWithArrays twice and then call MATAXPY to get the sum M=M1baij + M2baij. Is there a better way to do it? Thank you. Best, Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jun 8 14:17:43 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 8 Jun 2017 14:17:43 -0500 Subject: [petsc-users] questions on BAIJ matrix In-Reply-To: References: Message-ID: Xiangdong: MatCreateMPIBAIJWithArrays() is obviously buggy, and not been tested. > 1) In the remark of the function MatCreateMPIBAIJWithArrays, it says " bs - > the block size, only a block size of 1 is supported". Why must the block > size be 1? Is this a typo? > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/ > Mat/MatCreateMPIBAIJWithArrays.html > It seems only bs=1 was implemented. I would not trust it without a test example. > > 2) In the Line 4040 of the implemention of MatCreateMPIBAIJWithArrays, > would the matrix type be matmpibaij instead of matmpiSbaij? > This is an error. It should be matmpibaij. 
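Until the library routine is fixed, a user-side construction along the lines sketched below can stand in for it. This mirrors the argument list of the manual page and is essentially the sequence Xiangdong arrives at further down the thread; it is an illustration, not the patched PETSc code:

#include <petscmat.h>

/* Sketch: build an MPIBAIJ matrix from block-CSR arrays, with the type set to
   MATMPIBAIJ rather than MATMPISBAIJ.  m,n,M,N are scalar (not block) sizes as
   in MatSetSizes, and i[],j[] index block rows/columns. */
static PetscErrorCode CreateMPIBAIJFromCSR(MPI_Comm comm,PetscInt bs,PetscInt m,PetscInt n,
                                           PetscInt M,PetscInt N,const PetscInt i[],
                                           const PetscInt j[],const PetscScalar a[],Mat *mat)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(comm,mat);CHKERRQ(ierr);
  ierr = MatSetSizes(*mat,m,n,M,N);CHKERRQ(ierr);
  ierr = MatSetType(*mat,MATMPIBAIJ);CHKERRQ(ierr);
  ierr = MatMPIBAIJSetPreallocationCSR(*mat,bs,i,j,a);CHKERRQ(ierr);  /* copies i, j, a */
  ierr = MatAssemblyBegin(*mat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*mat,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Given two matrices built this way, the sum asked about in question 3 can then be formed with MatDuplicate(M1baij,MAT_COPY_VALUES,&M) followed by MatAXPY(M,1.0,M2baij,DIFFERENT_NONZERO_PATTERN).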
> > http://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/bai > j/mpi/mpibaij.c.html#MatCreateMPIBAIJWithArrays > > 4031: PetscErrorCode MatCreateMPIBAIJWithArrays(MPI_Comm comm,PetscInt > bs,PetscInt m,PetscInt n,PetscInt M,PetscInt N,const PetscInt i[],const > PetscInt j[],const PetscScalar a[],Mat *mat) > 4032: { > > 4036: if (i[0]) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"i > (row indices) must start with 0"); > 4037: if (m < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"local > number of rows (m) cannot be PETSC_DECIDE, or negative"); > 4038: MatCreate(comm,mat); > 4039: MatSetSizes(*mat,m,n,M,N); > 4040: MatSetType(*mat,MATMPISBAIJ); > It should be MATMPIBAIJ. > > 3) I want to create a petsc matrix M equivalent to the sum of two block > csr matrix/array (M1csr, M2csr). What is the best way to achieve it? I am > thinking of created two petsc baij matrix (M1baij and M2baij) by > calling MatCreateMPIBAIJWithArrays twice and then call MATAXPY to get the > sum M=M1baij + M2baij. Is there a better way to do it? > This is an approach. However MatCreateMPIBAIJWithArrays() needs to be fixed, tested and implemented with requested bs. What bs do you need? Why not use MatCreate(), MatSetValuses() (set a block values at time) to create two MPIBAIJ matrices, then call MATAXPY. Since petsc MPIBAIJ matrix has different internal data structure than csr, "The i, j, and a arrays ARE copied by MatCreateMPIBAIJWithArrays() into the internal format used by PETSc;", so this approach would give similar performance. Hong -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jun 8 14:56:08 2017 From: epscodes at gmail.com (Xiangdong) Date: Thu, 8 Jun 2017 15:56:08 -0400 Subject: [petsc-users] questions on BAIJ matrix In-Reply-To: References: Message-ID: On Thu, Jun 8, 2017 at 3:17 PM, Hong wrote: > Xiangdong: > MatCreateMPIBAIJWithArrays() is obviously buggy, and not been tested. > > >> 1) In the remark of the function MatCreateMPIBAIJWithArrays, it says " bs - >> the block size, only a block size of 1 is supported". Why must the block >> size be 1? Is this a typo? >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/ >> Mat/MatCreateMPIBAIJWithArrays.html >> > > It seems only bs=1 was implemented. I would not trust it without a test > example. > >> >> 2) In the Line 4040 of the implemention of MatCreateMPIBAIJWithArrays, >> would the matrix type be matmpibaij instead of matmpiSbaij? >> > > This is an error. It should be matmpibaij. > >> >> http://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/bai >> j/mpi/mpibaij.c.html#MatCreateMPIBAIJWithArrays >> >> 4031: PetscErrorCode MatCreateMPIBAIJWithArrays(MPI_Comm comm,PetscInt >> bs,PetscInt m,PetscInt n,PetscInt M,PetscInt N,const PetscInt i[],const >> PetscInt j[],const PetscScalar a[],Mat *mat) >> 4032: { >> >> 4036: if (i[0]) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"i >> (row indices) must start with 0"); >> 4037: if (m < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"local >> number of rows (m) cannot be PETSC_DECIDE, or negative"); >> 4038: MatCreate(comm,mat); >> 4039: MatSetSizes(*mat,m,n,M,N); >> 4040: MatSetType(*mat,MATMPISBAIJ); >> > > It should be MATMPIBAIJ. > >> >> 3) I want to create a petsc matrix M equivalent to the sum of two block >> csr matrix/array (M1csr, M2csr). What is the best way to achieve it? 
I am >> thinking of created two petsc baij matrix (M1baij and M2baij) by >> calling MatCreateMPIBAIJWithArrays twice and then call MATAXPY to get >> the sum M=M1baij + M2baij. Is there a better way to do it? >> > > This is an approach. However MatCreateMPIBAIJWithArrays() needs to be > fixed, tested and implemented with requested bs. What bs do you need? > Why does each bs need to be implemented separately? In the mean time, I modifed the implementation of MatCreateMPIBAIJWithArrays() a little bit to create a baij matrix with csr arrays. MatCreate(comm,mat); MatSetSizes(*mat,m,n,M,N); MatSetType(*mat,MATMPIBAIJ); MatMPIBAIJSetPreallocationCSR(*mat,bs,i,j,a); MatSetOption(*mat,MAT_ROW_ORIENTED,PETSC_FALSE); MatAssemblyBegin(M,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(M,MAT_FINAL_ASSEMBLY); I just set the type to MATMPIBAIJ and delete the line MatSetOption before preallocation (otherwise I get error at runtime complaining using set options before preallocation) and it works fine. The only thing missing is that setting mat_row_oriented to be petsc_false has no effect on the final matrix, which I do not know how to fix. > > Why not use MatCreate(), MatSetValuses() (set a block values at time) to > create two MPIBAIJ matrices, then call MATAXPY. Since petsc MPIBAIJ matrix > has different internal data structure than csr, > "The i, j, and a arrays ARE copied by MatCreateMPIBAIJWithArrays() into > the internal format used by PETSc;", so this approach would give similar > performance. > I will try this option as well. Thanks for your suggestions. Xiangdong > > Hong > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 8 21:21:51 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 8 Jun 2017 21:21:51 -0500 Subject: [petsc-users] example for TS with AMR? In-Reply-To: References: Message-ID: On Thu, Jun 8, 2017 at 11:20 AM, Somdeb Bandopadhyay wrote: > hi, is there any example of using TS with AMR (e.g p4est)? > from what I understand, I can use posttimestep and poststage to perform > similar job, but I think it will be too complicated for high level of > refinement (say levelmax=8). is there any example where only TS is used > with SAMR grid topology? > No, we really do not have something yet. You can see us trying things out in TS ex11, but I would not characterize this as a solution, but more of an experiment. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sb020287 at gmail.com Thu Jun 8 22:41:51 2017 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Fri, 9 Jun 2017 11:41:51 +0800 Subject: [petsc-users] example for TS with AMR? In-Reply-To: References: Message-ID: Alright, thank you for the update. On Fri, Jun 9, 2017 at 10:21 AM, Matthew Knepley wrote: > On Thu, Jun 8, 2017 at 11:20 AM, Somdeb Bandopadhyay > wrote: > >> hi, is there any example of using TS with AMR (e.g p4est)? >> from what I understand, I can use posttimestep and poststage to perform >> similar job, but I think it will be too complicated for high level of >> refinement (say levelmax=8). is there any example where only TS is used >> with SAMR grid topology? >> > > No, we really do not have something yet. 
You can see us trying things out > in TS ex11, but I would not > characterize this as a solution, but more of an experiment. > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xgarnaud at gmail.com Fri Jun 9 09:41:00 2017 From: xgarnaud at gmail.com (Xavier Garnaud) Date: Fri, 9 Jun 2017 16:41:00 +0200 Subject: [petsc-users] Ghost edges in DMPlex Message-ID: Dear all, I am working on a vertex-centered Finite Volume CFD solver (with only tetrahedral or triangular cells). I'd like to be able to use DMPlex, but I have a couple of questions: For me, the most convenient way to store the mesh is to have, in addition to the standard DMPlex built from a cell to node connectivity is to also have 2 types of fictitious edges: 1- for periodic surfaces, I build ghost cells and ghost nodes prior to building the DMPlex, and I'd like to add "edges" that will give the matching between the ghost nodes and the corresponding nodes. These edges will not belong to any face or cell --> what would be the most convenient way to add such edges ? 2- for each node on each boundary surface, I'd like to add a fictitious edge to easily compute the Finite Volume operators. --> can I have edges that link a node to itself (possible several times if the node belongs to several surfaces)? alternatively, should I add a fictitious node for each surface, and link all the nodes in the surface to this fictitious node? For the mesh partition, which option should I use in ierr = DMPlexSetAdjacencyUseCone(_dm,PETSC_TRUE); ierr = DMPlexSetAdjacencyUseClosure(_dm,PETSC_TRUE); knowing that the unknowns are stores at the vertices and that two vertices are connected if there is an edge between them? Thank you very much for your library, and for your help. Best regards, Xavier -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 9 13:42:56 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Jun 2017 13:42:56 -0500 Subject: [petsc-users] Ghost edges in DMPlex In-Reply-To: References: Message-ID: On Fri, Jun 9, 2017 at 9:41 AM, Xavier Garnaud wrote: > Dear all, > > I am working on a vertex-centered Finite Volume CFD solver (with only > tetrahedral or triangular cells). I'd like to be able to use DMPlex, but > I have a couple of questions: > > For me, the most convenient way to store the mesh is to have, in addition > to the standard DMPlex built from a cell to node connectivity is to also > have 2 types of fictitious edges: > 1- for periodic surfaces, I build ghost cells and ghost nodes prior to > building the DMPlex, and I'd like to add "edges" that will give the > matching between the ghost nodes and the corresponding nodes. These edges > will not belong to any face or cell --> what would be the most convenient > way to add such edges ? > I do not understand doing it that way, although I think you could do it. You can build any kind of adjacency just by adding a point and its cone. I manage periodic surfaces just by discretizing the periodic topology directly, since topology and geometry are decoupled. For example, on the circle, we just directly have the edges of the circle. 
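To make "adding a point and its cone" concrete, a minimal hand-built example might look like the sketch below. The point numbers and the idea of a cell-less "matching" edge are purely illustrative:

#include <petscdmplex.h>

/* Toy 1D mesh: points 0-1 are edges, points 2-4 are vertices.  Edge 1 is an
   extra edge that belongs to no cell and just ties vertex 3 to vertex 4,
   playing the role of a periodic/ghost matching edge. */
static PetscErrorCode BuildToyPlexWithExtraEdge(MPI_Comm comm,DM *dm)
{
  PetscErrorCode ierr;
  PetscInt       cone[2];

  PetscFunctionBegin;
  ierr = DMPlexCreate(comm,dm);CHKERRQ(ierr);
  ierr = DMSetDimension(*dm,1);CHKERRQ(ierr);
  ierr = DMPlexSetChart(*dm,0,5);CHKERRQ(ierr);       /* points 0..4           */
  ierr = DMPlexSetConeSize(*dm,0,2);CHKERRQ(ierr);    /* ordinary edge         */
  ierr = DMPlexSetConeSize(*dm,1,2);CHKERRQ(ierr);    /* extra "matching" edge */
  ierr = DMSetUp(*dm);CHKERRQ(ierr);
  cone[0] = 2; cone[1] = 3;
  ierr = DMPlexSetCone(*dm,0,cone);CHKERRQ(ierr);
  cone[0] = 3; cone[1] = 4;
  ierr = DMPlexSetCone(*dm,1,cone);CHKERRQ(ierr);
  ierr = DMPlexSymmetrize(*dm);CHKERRQ(ierr);
  ierr = DMPlexStratify(*dm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

How such cell-less edges interact with mesh partitioning and the adjacency flags discussed just below would still need to be checked.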
> 2- for each node on each boundary surface, I'd like to add a > fictitious edge to easily compute the Finite Volume operators. --> can I > have edges that link a node to itself (possible several times if the node > belongs to several surfaces)? alternatively, should I add a fictitious node > for each surface, and link all the nodes in the surface to this fictitious > node? > I think I understand this one. This would be the analogue of my "ghost cells". Then you put the boundary concentration in the ghost cell and compute the relevant boundary flux along the boundary face. You are talking about the topology as the dual to what I am used to, so you have ghost nodes instead of cells and ghost edges instead of faces. > For the mesh partition, which option should I use in > > ierr = DMPlexSetAdjacencyUseCone(_dm,PETSC_TRUE); > ierr = DMPlexSetAdjacencyUseClosure(_dm,PETSC_TRUE); > > knowing that the unknowns are stores at the vertices and that two vertices > are connected if there is an edge between them? > For the topology I use, I believe you want PETSC_TRUE and PETSC_FALSE, as I show on this page http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMPlexGetAdjacencyUseClosure.html but if you are using the dual, as you do above, then it would be PETSC_FALSE and PETSC_FALSE I realize that the FV support is not as developed as the FEM support, so feel free to mail when you have problems. Thanks, Matt > Thank you very much for your library, and for your help. > > Best regards, > > Xavier > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dnolte at dim.uchile.cl Sat Jun 10 20:25:35 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Sat, 10 Jun 2017 21:25:35 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners Message-ID: Dear all, I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 finite elements discretization on tetrahedral meshes resulting in ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and the right hand side is a function of noisy measurement data. In other settings of "standard" Stokes flow problems I have obtained good convergence with an "upper" Schur complement preconditioner, using AMG (ML or Hypre) on the velocity block and approximating the Schur complement matrix by the diagonal of the pressure mass matrix: -ksp_converged_reason -ksp_monitor_true_residual -ksp_initial_guess_nonzero -ksp_diagonal_scale -ksp_diagonal_scale_fix -ksp_type fgmres -ksp_rtol 1.0e-8 -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point -pc_fieldsplit_schur_fact_type upper -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type ml -fieldsplit_1_ksp_type preonly -fieldsplit_1_pc_type jacobi In my present case this setup gives rather slow convergence (varies for different geometries between 200-500 or several thousands!). I obtain better convergence with "-pc_fieldsplit_schur_precondition selfp"and using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think this is optimal, though). I don't understand why the pressure mass matrix approach performs so poorly and wonder what I could try to improve the convergence. Until now I have been using ML and Hypre BoomerAMG mostly with default parameters. 
Surely they can be improved by tuning some parameters. Which could be a good starting point? Are there other options I should consider? With the above setup (jacobi) for a case that works better than others, the KSP terminates with 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 You can find the output of -ksp_view below. Let me know if you need more details. Thanks in advance for your advice! Best wishes David KSP Object: 1 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000 tolerances: relative=1e-08, absolute=1e-50, divergence=10000. right preconditioning diagonally scaled system using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, factorization UPPER Preconditioner for the Schur complement formed from user provided matrix Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ml MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=3, cols=3 package used to perform factorization: petsc total: nonzeros=3, allocated nonzeros=3 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=3, cols=3 total: nonzeros=3, allocated nonzeros=3 total number of mallocs used during MatSetValues calls =0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=15, cols=15 total: nonzeros=69, allocated nonzeros=69 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes type: richardson Richardson: damping factor=1. 
maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=304, cols=304 total: nonzeros=7354, allocated nonzeros=7354 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30236, cols=30236 total: nonzeros=2730644, allocated nonzeros=2730644 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=894132, cols=894132 total: nonzeros=70684164, allocated nonzeros=70684164 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=894132, cols=894132 total: nonzeros=70684164, allocated nonzeros=70684164 total number of mallocs used during MatSetValues calls =0 not using I-node routines KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: jacobi linear system matrix followed by preconditioner matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: schurcomplement rows=42025, cols=42025 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_1_) 1 MPI processes type: seqaij rows=42025, cols=42025 total: nonzeros=554063, allocated nonzeros=554063 total number of mallocs used during MatSetValues calls =0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=42025, cols=894132 total: nonzeros=6850107, allocated nonzeros=6850107 total number of mallocs used during MatSetValues calls =0 not using I-node routines KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ml MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=3, cols=3 package used to perform factorization: petsc total: nonzeros=3, allocated nonzeros=3 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=3, cols=3 total: nonzeros=3, allocated nonzeros=3 total number of mallocs used during MatSetValues calls =0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=15, cols=15 total: nonzeros=69, allocated nonzeros=69 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=304, cols=304 total: nonzeros=7354, allocated nonzeros=7354 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=30236, cols=30236 total: nonzeros=2730644, allocated nonzeros=2730644 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=894132, cols=894132 total: nonzeros=70684164, allocated nonzeros=70684164 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=894132, cols=894132 total: nonzeros=70684164, allocated nonzeros=70684164 total number of mallocs used during MatSetValues calls =0 not using I-node routines A01 Mat Object: 1 MPI processes type: seqaij rows=894132, cols=42025 total: nonzeros=6850107, allocated nonzeros=6850107 total number of mallocs used during MatSetValues calls =0 not using I-node routines Mat Object: 1 MPI processes type: seqaij rows=42025, cols=42025 total: nonzeros=554063, allocated nonzeros=554063 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=936157, cols=936157 total: nonzeros=84938441, allocated nonzeros=84938441 total number of mallocs used during MatSetValues calls =0 not using I-node routines From knepley at gmail.com Sun Jun 11 07:53:38 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 11 Jun 2017 07:53:38 -0500 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: On Sat, Jun 10, 2017 at 8:25 PM, David Nolte wrote: > Dear all, > > I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 > finite elements discretization on tetrahedral meshes resulting in > ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and > the right hand side is a function of noisy measurement data. 
> > In other settings of "standard" Stokes flow problems I have obtained > good convergence with an "upper" Schur complement preconditioner, using > AMG (ML or Hypre) on the velocity block and approximating the Schur > complement matrix by the diagonal of the pressure mass matrix: > > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_initial_guess_nonzero > -ksp_diagonal_scale > -ksp_diagonal_scale_fix > -ksp_type fgmres > -ksp_rtol 1.0e-8 > > -pc_type fieldsplit > -pc_fieldsplit_type schur > -pc_fieldsplit_detect_saddle_point > -pc_fieldsplit_schur_fact_type upper > -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix > > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type ml > > -fieldsplit_1_ksp_type preonly > -fieldsplit_1_pc_type jacobi > 1) I always recommend starting from an exact solver and backing off in small steps for optimization. Thus I would start with LU on the upper block and GMRES/LU with toelrance 1e-10 on the Schur block. This should converge in 1 iterate. 2) I don't think you want preonly on the Schur system. You might want GMRES/Jacobi to invert the mass matrix. 3) You probably want to tighten the tolerance on the Schur solve, at least to start, and then slowly let it out. The tight tolerance will show you how effective the preconditioner is using that Schur operator. Then you can start to evaluate how effective the Schur linear sovler is. Does this make sense? Thanks, Matt > In my present case this setup gives rather slow convergence (varies for > different geometries between 200-500 or several thousands!). I obtain > better convergence with "-pc_fieldsplit_schur_precondition selfp"and > using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think > this is optimal, though). > > I don't understand why the pressure mass matrix approach performs so > poorly and wonder what I could try to improve the convergence. Until now > I have been using ML and Hypre BoomerAMG mostly with default parameters. > Surely they can be improved by tuning some parameters. Which could be a > good starting point? Are there other options I should consider? > > With the above setup (jacobi) for a case that works better than others, > the KSP terminates with > 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm > 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 > > You can find the output of -ksp_view below. Let me know if you need more > details. > > Thanks in advance for your advice! > Best wishes > David > > > KSP Object: 1 MPI processes > type: fgmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000 > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > right preconditioning > diagonally scaled system > using nonzero initial guess > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, factorization UPPER > Preconditioner for the Schur complement formed from user provided > matrix > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ml > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI > processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_coarse_) 1 MPI > processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > package used to perform factorization: petsc > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Down solver (pre-smoother) on level 1 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_1_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_1_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=15, cols=15 > total: nonzeros=69, allocated nonzeros=69 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_2_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_2_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=304, cols=304 > total: nonzeros=7354, allocated nonzeros=7354 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_3_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_3_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=30236, cols=30236 > total: nonzeros=2730644, allocated nonzeros=2730644 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_4_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_4_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI > processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: jacobi > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=42025, cols=42025 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_1_) 1 > MPI processes > type: seqaij > rows=42025, cols=42025 > total: nonzeros=554063, allocated nonzeros=554063 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=42025, cols=894132 > total: nonzeros=6850107, allocated nonzeros=6850107 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > KSP of A00 > KSP Object: (fieldsplit_0_) 1 > MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 > MPI processes > type: ml > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------ > - > KSP Object: > (fieldsplit_0_mg_coarse_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_coarse_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero > pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI > processes > type: seqaij > rows=3, cols=3 > package used to perform factorization: petsc > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during > MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Down solver (pre-smoother) on level 1 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_1_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_1_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=15, cols=15 > total: nonzeros=69, allocated nonzeros=69 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 2 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_2_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_2_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=304, cols=304 > total: nonzeros=7354, allocated nonzeros=7354 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 3 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_3_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_3_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=30236, cols=30236 > total: nonzeros=2730644, allocated nonzeros=2730644 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 4 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_4_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_4_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: > (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > linear system matrix = precond matrix: > Mat Object: > (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=894132, cols=42025 > total: nonzeros=6850107, allocated nonzeros=6850107 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Mat Object: 1 MPI processes > type: seqaij > rows=42025, cols=42025 > total: nonzeros=554063, allocated nonzeros=554063 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=936157, cols=936157 > total: nonzeros=84938441, allocated nonzeros=84938441 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Sun Jun 11 12:34:01 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Sun, 11 Jun 2017 19:34:01 +0200 Subject: [petsc-users] empty split for fieldsplit Message-ID: Hello I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Jun 11 13:11:40 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 11 Jun 2017 13:11:40 -0500 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: Message-ID: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> Could be, send us a simple example that demonstrates the problem and we'll track it down. > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > Hello > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? > > Giang From dnolte at dim.uchile.cl Sun Jun 11 23:06:14 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Mon, 12 Jun 2017 00:06:14 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: Thanks Matt, makes sense to me! I skipped direct solvers at first because for these 'real' configurations LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would be reasonable to take one more step back and play with synthetic examples. 
I managed to run one case though with 936k dofs using: ("user" =pressure mass matrix) <...> -pc_fieldsplit_schur_fact_type upper -pc_fieldsplit_schur_precondition user -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu -fieldsplit_0_pc_factor_mat_solver_package mumps -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_monitor_true_residuals -fieldsplit_1_ksp_rtol 1e-10 -fieldsplit_1_pc_type lu -fieldsplit_1_pc_factor_mat_solver_package mumps It takes 2 outer iterations, as expected. However the fieldsplit_1 solve takes very long. 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_1_ solve. 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 Residual norms for fieldsplit_1_ solve. 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 <...> 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 Linear solve converged due to CONVERGED_RTOL iterations 2 Does the slow convergence of the Schur block mean that my preconditioning matrix Sp is a poor choice? Thanks, David On 06/11/2017 08:53 AM, Matthew Knepley wrote: > On Sat, Jun 10, 2017 at 8:25 PM, David Nolte > wrote: > > Dear all, > > I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 > finite elements discretization on tetrahedral meshes resulting in > ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and > the right hand side is a function of noisy measurement data. 
> > In other settings of "standard" Stokes flow problems I have obtained > good convergence with an "upper" Schur complement preconditioner, > using > AMG (ML or Hypre) on the velocity block and approximating the Schur > complement matrix by the diagonal of the pressure mass matrix: > > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_initial_guess_nonzero > -ksp_diagonal_scale > -ksp_diagonal_scale_fix > -ksp_type fgmres > -ksp_rtol 1.0e-8 > > -pc_type fieldsplit > -pc_fieldsplit_type schur > -pc_fieldsplit_detect_saddle_point > -pc_fieldsplit_schur_fact_type upper > -pc_fieldsplit_schur_precondition user # <-- pressure mass > matrix > > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type ml > > -fieldsplit_1_ksp_type preonly > -fieldsplit_1_pc_type jacobi > > > 1) I always recommend starting from an exact solver and backing off in > small steps for optimization. Thus > I would start with LU on the upper block and GMRES/LU with > toelrance 1e-10 on the Schur block. > This should converge in 1 iterate. > > 2) I don't think you want preonly on the Schur system. You might want > GMRES/Jacobi to invert the mass matrix. > > 3) You probably want to tighten the tolerance on the Schur solve, at > least to start, and then slowly let it out. The > tight tolerance will show you how effective the preconditioner is > using that Schur operator. Then you can start > to evaluate how effective the Schur linear sovler is. > > Does this make sense? > > Thanks, > > Matt > > > In my present case this setup gives rather slow convergence > (varies for > different geometries between 200-500 or several thousands!). I obtain > better convergence with "-pc_fieldsplit_schur_precondition selfp"and > using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think > this is optimal, though). > > I don't understand why the pressure mass matrix approach performs so > poorly and wonder what I could try to improve the convergence. > Until now > I have been using ML and Hypre BoomerAMG mostly with default > parameters. > Surely they can be improved by tuning some parameters. Which could > be a > good starting point? Are there other options I should consider? > > With the above setup (jacobi) for a case that works better than > others, > the KSP terminates with > 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm > 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 > > You can find the output of -ksp_view below. Let me know if you > need more > details. > > Thanks in advance for your advice! > Best wishes > David > > > KSP Object: 1 MPI processes > type: fgmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000 > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > right preconditioning > diagonally scaled system > using nonzero initial guess > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, factorization UPPER > Preconditioner for the Schur complement formed from user > provided matrix > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ml > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (fieldsplit_0_mg_coarse_) > 1 MPI > processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_coarse_) > 1 MPI > processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > package used to perform factorization: petsc > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Down solver (pre-smoother) on level 1 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_1_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_1_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=15, cols=15 > total: nonzeros=69, allocated nonzeros=69 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_2_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_2_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=304, cols=304 > total: nonzeros=7354, allocated nonzeros=7354 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_3_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_3_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=30236, cols=30236 > total: nonzeros=2730644, allocated nonzeros=2730644 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 > ------------------------------- > KSP Object: (fieldsplit_0_mg_levels_4_) 1 > MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (fieldsplit_0_mg_levels_4_) 1 > MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI > processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated nonzeros=70684164 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: jacobi > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=42025, cols=42025 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_1_) 1 > MPI processes > type: seqaij > rows=42025, cols=42025 > total: nonzeros=554063, allocated nonzeros=554063 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=42025, cols=894132 > total: nonzeros=6850107, allocated nonzeros=6850107 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > KSP of A00 > KSP Object: (fieldsplit_0_) 1 > MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 > MPI processes > type: ml > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level > ------------------------------- > KSP Object: > (fieldsplit_0_mg_coarse_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_coarse_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero > pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI > processes > type: seqaij > rows=3, cols=3 > package used to perform factorization: > petsc > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during > MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3, cols=3 > total: nonzeros=3, allocated nonzeros=3 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Down solver (pre-smoother) on level 1 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_1_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_1_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, > local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=15, cols=15 > total: nonzeros=69, allocated nonzeros=69 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 2 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_2_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_2_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, > local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=304, cols=304 > total: nonzeros=7354, allocated nonzeros=7354 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 3 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_3_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_3_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, > local > iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=30236, cols=30236 > total: nonzeros=2730644, allocated > nonzeros=2730644 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > Down solver (pre-smoother) on level 4 > ------------------------------- > KSP Object: > (fieldsplit_0_mg_levels_4_) 1 MPI processes > type: richardson > Richardson: damping factor=1. > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: > (fieldsplit_0_mg_levels_4_) 1 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, > local > iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: > (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated > nonzeros=70684164 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver > (pre-smoother) > linear system matrix = precond matrix: > Mat Object: > (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=894132, cols=894132 > total: nonzeros=70684164, allocated > nonzeros=70684164 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=894132, cols=42025 > total: nonzeros=6850107, allocated nonzeros=6850107 > total number of mallocs used during MatSetValues > calls =0 > not using I-node routines > Mat Object: 1 MPI processes > type: seqaij > rows=42025, cols=42025 > total: nonzeros=554063, allocated nonzeros=554063 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=936157, cols=936157 > total: nonzeros=84938441, allocated nonzeros=84938441 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 12 06:50:13 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 Jun 2017 06:50:13 -0500 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: On Sun, Jun 11, 2017 at 11:06 PM, David Nolte wrote: > Thanks Matt, makes sense to me! > > I skipped direct solvers at first because for these 'real' configurations > LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would > be reasonable to take one more step back and play with synthetic examples. > I managed to run one case though with 936k dofs using: ("user" =pressure > mass matrix) > > <...> > -pc_fieldsplit_schur_fact_type upper > -pc_fieldsplit_schur_precondition user > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type lu > -fieldsplit_0_pc_factor_mat_solver_package mumps > > -fieldsplit_1_ksp_type gmres > -fieldsplit_1_ksp_monitor_true_residuals > -fieldsplit_1_ksp_rtol 1e-10 > -fieldsplit_1_pc_type lu > -fieldsplit_1_pc_factor_mat_solver_package mumps > > It takes 2 outer iterations, as expected. However the fieldsplit_1 solve > takes very long. > 1) It should take 1 outer iterate, not two. The problem is that your Schur tolerance is way too high. Use -fieldsplit_1_ksp_rtol 1e-10 or something like that. Then it will take 1 iterate. 2) There is a problem with the Schur solve. Now from the iterates 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 it is clear that the preconditioner is really screwing stuff up. For testing, you can use -pc_fieldsplit_schur_precondition full and your same setup here. It should take one iterate. I think there is something wrong with your mass matrix. 
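(A couple of cheap consistency checks on the assembled pressure mass matrix can rule out the most common assembly mistakes. The sketch below is only illustrative and assumes the matrix is available as a Mat named Mp: the entries of a P1 mass matrix should sum to the domain volume, the diagonal should be strictly positive, and the matrix should be symmetric; rows modified by boundary conditions will show up in these numbers as well.)

    #include <petscmat.h>

    /* Sketch: sanity checks for a pressure mass matrix Mp (assumed name). */
    PetscErrorCode CheckPressureMassMatrix(Mat Mp)
    {
      Vec            ones, v;
      PetscBool      sym;
      PetscReal      dmin;
      PetscScalar    total;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* symmetry check (cheap for seqaij; skip if the matrix type does not support it) */
      ierr = MatIsSymmetric(Mp, 1.0e-12, &sym);CHKERRQ(ierr);
      ierr = MatCreateVecs(Mp, &ones, &v);CHKERRQ(ierr);
      ierr = MatGetDiagonal(Mp, v);CHKERRQ(ierr);       /* diagonal must be > 0 */
      ierr = VecMin(v, NULL, &dmin);CHKERRQ(ierr);
      ierr = VecSet(ones, 1.0);CHKERRQ(ierr);
      ierr = MatMult(Mp, ones, v);CHKERRQ(ierr);        /* row sums */
      ierr = VecSum(v, &total);CHKERRQ(ierr);           /* should equal |Omega| */
      ierr = PetscPrintf(PETSC_COMM_WORLD,
                         "Mp: symmetric=%d  min(diag)=%g  sum=%g (expect domain volume)\n",
                         (int)sym, (double)dmin, (double)PetscRealPart(total));CHKERRQ(ierr);
      ierr = VecDestroy(&ones);CHKERRQ(ierr);
      ierr = VecDestroy(&v);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }
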
Thanks, Matt > 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm > 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_1_ solve. > 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm > 0.000000000000e+00 ||r(i)||/||b|| -nan > Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0 > 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm > 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 > Residual norms for fieldsplit_1_ solve. > 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm > 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm > 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 > 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm > 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 > 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm > 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 > 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm > 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 > 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm > 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 > 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm > 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 > 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm > 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 > <...> > 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm > 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 > 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm > 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423 > 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm > 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 > Linear solve converged due to CONVERGED_RTOL iterations 2 > > > Does the slow convergence of the Schur block mean that my preconditioning > matrix Sp is a poor choice? > > Thanks, > David > > > On 06/11/2017 08:53 AM, Matthew Knepley wrote: > > On Sat, Jun 10, 2017 at 8:25 PM, David Nolte wrote: > >> Dear all, >> >> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 >> finite elements discretization on tetrahedral meshes resulting in >> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and >> the right hand side is a function of noisy measurement data. >> >> In other settings of "standard" Stokes flow problems I have obtained >> good convergence with an "upper" Schur complement preconditioner, using >> AMG (ML or Hypre) on the velocity block and approximating the Schur >> complement matrix by the diagonal of the pressure mass matrix: >> >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_initial_guess_nonzero >> -ksp_diagonal_scale >> -ksp_diagonal_scale_fix >> -ksp_type fgmres >> -ksp_rtol 1.0e-8 >> >> -pc_type fieldsplit >> -pc_fieldsplit_type schur >> -pc_fieldsplit_detect_saddle_point >> -pc_fieldsplit_schur_fact_type upper >> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix >> >> -fieldsplit_0_ksp_type preonly >> -fieldsplit_0_pc_type ml >> >> -fieldsplit_1_ksp_type preonly >> -fieldsplit_1_pc_type jacobi >> > > 1) I always recommend starting from an exact solver and backing off in > small steps for optimization. 
Thus > I would start with LU on the upper block and GMRES/LU with toelrance > 1e-10 on the Schur block. > This should converge in 1 iterate. > > 2) I don't think you want preonly on the Schur system. You might want > GMRES/Jacobi to invert the mass matrix. > > 3) You probably want to tighten the tolerance on the Schur solve, at least > to start, and then slowly let it out. The > tight tolerance will show you how effective the preconditioner is > using that Schur operator. Then you can start > to evaluate how effective the Schur linear sovler is. > > Does this make sense? > > Thanks, > > Matt > > >> In my present case this setup gives rather slow convergence (varies for >> different geometries between 200-500 or several thousands!). I obtain >> better convergence with "-pc_fieldsplit_schur_precondition selfp"and >> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think >> this is optimal, though). >> >> I don't understand why the pressure mass matrix approach performs so >> poorly and wonder what I could try to improve the convergence. Until now >> I have been using ML and Hypre BoomerAMG mostly with default parameters. >> Surely they can be improved by tuning some parameters. Which could be a >> good starting point? Are there other options I should consider? >> >> With the above setup (jacobi) for a case that works better than others, >> the KSP terminates with >> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm >> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >> >> You can find the output of -ksp_view below. Let me know if you need more >> details. >> >> Thanks in advance for your advice! >> Best wishes >> David >> >> >> KSP Object: 1 MPI processes >> type: fgmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10000 >> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >> right preconditioning >> diagonally scaled system >> using nonzero initial guess >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: fieldsplit >> FieldSplit with Schur preconditioner, factorization UPPER >> Preconditioner for the Schur complement formed from user provided >> matrix >> Split info: >> Split number 0 Defined by IS >> Split number 1 Defined by IS >> KSP solver for A00 block >> KSP Object: (fieldsplit_0_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------- >> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI >> processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI >> processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot >> [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. 
>> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_1_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_1_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_2_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_2_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_3_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_3_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_4_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_4_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> KSP solver for S = A11 - A10 inv(A00) A01 >> KSP Object: (fieldsplit_1_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI processes >> type: jacobi >> linear system matrix followed by preconditioner matrix: >> Mat Object: (fieldsplit_1_) 1 MPI processes >> type: schurcomplement >> rows=42025, cols=42025 >> Schur complement A11 - A10 inv(A00) A01 >> A11 >> Mat Object: (fieldsplit_1_) 1 >> MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> A10 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=894132 >> total: nonzeros=6850107, allocated nonzeros=6850107 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> KSP of A00 >> KSP Object: (fieldsplit_0_) 1 >> MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 >> MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------ >> - >> KSP Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero >> pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. 
>> Factored matrix follows: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls >> =0 >> not using I-node routines >> A01 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=894132, cols=42025 >> total: nonzeros=6850107, allocated nonzeros=6850107 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=936157, cols=936157 >> total: nonzeros=84938441, allocated nonzeros=84938441 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jun 12 10:19:00 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 12 Jun 2017 17:19:00 +0200 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> Message-ID: Dear Barry I made a small example with 2 process with one empty split in proc 0. But it gives another strange error [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Arguments are incompatible [1]PETSC ERROR: Local size 31 not compatible with block size 2 The local size is always 60, so this is confusing. Giang On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > Could be, send us a simple example that demonstrates the problem and > we'll track it down. > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui > wrote: > > > > Hello > > > > I noticed that my code stopped very long, possibly hang, at > PCFieldSplitSetIS. There are two splits and one split is empty in one > process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > Giang > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex.c Type: text/x-csrc Size: 3165 bytes Desc: not available URL: From dnolte at dim.uchile.cl Mon Jun 12 10:36:37 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Mon, 12 Jun 2017 11:36:37 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: On 06/12/2017 07:50 AM, Matthew Knepley wrote: > On Sun, Jun 11, 2017 at 11:06 PM, David Nolte > wrote: > > Thanks Matt, makes sense to me! > > I skipped direct solvers at first because for these 'real' > configurations LU (mumps/superlu_dist) usally goes out of memory > (got 32GB RAM). It would be reasonable to take one more step back > and play with synthetic examples. > I managed to run one case though with 936k dofs using: ("user" > =pressure mass matrix) > > <...> > -pc_fieldsplit_schur_fact_type upper > -pc_fieldsplit_schur_precondition user > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type lu > -fieldsplit_0_pc_factor_mat_solver_package mumps > > -fieldsplit_1_ksp_type gmres > -fieldsplit_1_ksp_monitor_true_residuals > -fieldsplit_1_ksp_rtol 1e-10 > -fieldsplit_1_pc_type lu > -fieldsplit_1_pc_factor_mat_solver_package mumps > > It takes 2 outer iterations, as expected. However the fieldsplit_1 > solve takes very long. > > > 1) It should take 1 outer iterate, not two. The problem is that your > Schur tolerance is way too high. Use > > -fieldsplit_1_ksp_rtol 1e-10 > > or something like that. Then it will take 1 iterate. Shouldn't it take 2 with a triangular Schur factorization and exact preconditioners, and 1 with a full factorization? (cf. Benzi et al 2005, p.66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf) That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and the Schur solver does drop below "rtol < 1e-10" > > 2) There is a problem with the Schur solve. Now from the iterates > > 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm > 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 > > it is clear that the preconditioner is really screwing stuff up. For > testing, you can use > > -pc_fieldsplit_schur_precondition full > > and your same setup here. It should take one iterate. I think there is > something wrong with your > mass matrix. I agree. I forgot to mention that I am considering an "enclosed flow" problem, with u=0 on all the boundary and a Dirichlet condition for the pressure in one point for fixing the constant pressure. Maybe the preconditioner is not consistent with this setup, need to check this.. Thanks a lot > > Thanks, > > Matt > > > 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid > norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_1_ solve. > 0 KSP preconditioned resid norm 0.000000000000e+00 true resid > norm 0.000000000000e+00 ||r(i)||/||b|| -nan > Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL > iterations 0 > 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid > norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 > Residual norms for fieldsplit_1_ solve. 
> 0 KSP preconditioned resid norm 2.965546249872e+08 true resid > norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.347596594634e+08 true resid > norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 > 2 KSP preconditioned resid norm 5.913230136403e+07 true resid > norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 > 3 KSP preconditioned resid norm 4.629700028930e+07 true resid > norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 > 4 KSP preconditioned resid norm 3.804431276819e+07 true resid > norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 > 5 KSP preconditioned resid norm 3.178769422140e+07 true resid > norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 > 6 KSP preconditioned resid norm 2.648669043919e+07 true resid > norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 > 7 KSP preconditioned resid norm 2.203522108614e+07 true resid > norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 > <...> > 422 KSP preconditioned resid norm 2.984888715147e-02 true > resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 > 423 KSP preconditioned resid norm 2.638419658982e-02 true > resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL > iterations 423 > 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid > norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 > Linear solve converged due to CONVERGED_RTOL iterations 2 > > > Does the slow convergence of the Schur block mean that my > preconditioning matrix Sp is a poor choice? > > Thanks, > David > > > On 06/11/2017 08:53 AM, Matthew Knepley wrote: >> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >> > wrote: >> >> Dear all, >> >> I am solving a Stokes problem in 3D aorta geometries, using a >> P2/P1 >> finite elements discretization on tetrahedral meshes resulting in >> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted >> arbitrarily), and >> the right hand side is a function of noisy measurement data. >> >> In other settings of "standard" Stokes flow problems I have >> obtained >> good convergence with an "upper" Schur complement >> preconditioner, using >> AMG (ML or Hypre) on the velocity block and approximating the >> Schur >> complement matrix by the diagonal of the pressure mass matrix: >> >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_initial_guess_nonzero >> -ksp_diagonal_scale >> -ksp_diagonal_scale_fix >> -ksp_type fgmres >> -ksp_rtol 1.0e-8 >> >> -pc_type fieldsplit >> -pc_fieldsplit_type schur >> -pc_fieldsplit_detect_saddle_point >> -pc_fieldsplit_schur_fact_type upper >> -pc_fieldsplit_schur_precondition user # <-- pressure >> mass matrix >> >> -fieldsplit_0_ksp_type preonly >> -fieldsplit_0_pc_type ml >> >> -fieldsplit_1_ksp_type preonly >> -fieldsplit_1_pc_type jacobi >> >> >> 1) I always recommend starting from an exact solver and backing >> off in small steps for optimization. Thus >> I would start with LU on the upper block and GMRES/LU with >> toelrance 1e-10 on the Schur block. >> This should converge in 1 iterate. >> >> 2) I don't think you want preonly on the Schur system. You might >> want GMRES/Jacobi to invert the mass matrix. >> >> 3) You probably want to tighten the tolerance on the Schur solve, >> at least to start, and then slowly let it out. The >> tight tolerance will show you how effective the >> preconditioner is using that Schur operator. Then you can start >> to evaluate how effective the Schur linear sovler is. 
>> >> Does this make sense? >> >> Thanks, >> >> Matt >> >> >> In my present case this setup gives rather slow convergence >> (varies for >> different geometries between 200-500 or several thousands!). >> I obtain >> better convergence with "-pc_fieldsplit_schur_precondition >> selfp"and >> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I >> don't think >> this is optimal, though). >> >> I don't understand why the pressure mass matrix approach >> performs so >> poorly and wonder what I could try to improve the >> convergence. Until now >> I have been using ML and Hypre BoomerAMG mostly with default >> parameters. >> Surely they can be improved by tuning some parameters. Which >> could be a >> good starting point? Are there other options I should consider? >> >> With the above setup (jacobi) for a case that works better >> than others, >> the KSP terminates with >> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true >> resid norm >> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >> >> You can find the output of -ksp_view below. Let me know if >> you need more >> details. >> >> Thanks in advance for your advice! >> Best wishes >> David >> >> >> KSP Object: 1 MPI processes >> type: fgmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10000 >> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >> right preconditioning >> diagonally scaled system >> using nonzero initial guess >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: fieldsplit >> FieldSplit with Schur preconditioner, factorization UPPER >> Preconditioner for the Schur complement formed from user >> provided matrix >> Split info: >> Split number 0 Defined by IS >> Split number 1 Defined by IS >> KSP solver for A00 block >> KSP Object: (fieldsplit_0_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_coarse_) >> 1 MPI >> processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_coarse_) >> 1 MPI >> processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero >> pivot >> [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. 
>> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_1_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_1_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_2_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_2_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_3_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_3_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_4_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_4_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) >> 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> KSP solver for S = A11 - A10 inv(A00) A01 >> KSP Object: (fieldsplit_1_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI processes >> type: jacobi >> linear system matrix followed by preconditioner matrix: >> Mat Object: (fieldsplit_1_) 1 MPI >> processes >> type: schurcomplement >> rows=42025, cols=42025 >> Schur complement A11 - A10 inv(A00) A01 >> A11 >> Mat Object: (fieldsplit_1_) >> 1 >> MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> A10 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=894132 >> total: nonzeros=6850107, allocated >> nonzeros=6850107 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> KSP of A00 >> KSP Object: (fieldsplit_0_) >> 1 >> MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) >> 1 >> MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess >> is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to >> prevent zero >> pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. 
>> Factored matrix follows: >> Mat Object: >> 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> package used to perform >> factorization: petsc >> total: nonzeros=3, allocated >> nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated >> nonzeros=7354 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated >> nonzeros=2730644 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> A01 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=894132, cols=42025 >> total: nonzeros=6850107, allocated >> nonzeros=6850107 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=936157, cols=936157 >> total: nonzeros=84938441, allocated nonzeros=84938441 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 12 11:41:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 Jun 2017 11:41:02 -0500 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: On Mon, Jun 12, 2017 at 10:36 AM, David Nolte wrote: > > On 06/12/2017 07:50 AM, Matthew Knepley wrote: > > On Sun, Jun 11, 2017 at 11:06 PM, David Nolte > wrote: > >> Thanks Matt, makes sense to me! >> >> I skipped direct solvers at first because for these 'real' configurations >> LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would >> be reasonable to take one more step back and play with synthetic examples. >> I managed to run one case though with 936k dofs using: ("user" =pressure >> mass matrix) >> >> <...> >> -pc_fieldsplit_schur_fact_type upper >> -pc_fieldsplit_schur_precondition user >> -fieldsplit_0_ksp_type preonly >> -fieldsplit_0_pc_type lu >> -fieldsplit_0_pc_factor_mat_solver_package mumps >> >> -fieldsplit_1_ksp_type gmres >> -fieldsplit_1_ksp_monitor_true_residuals >> -fieldsplit_1_ksp_rtol 1e-10 >> -fieldsplit_1_pc_type lu >> -fieldsplit_1_pc_factor_mat_solver_package mumps >> >> It takes 2 outer iterations, as expected. However the fieldsplit_1 solve >> takes very long. >> > > 1) It should take 1 outer iterate, not two. 
The problem is that your Schur > tolerance is way too high. Use > > -fieldsplit_1_ksp_rtol 1e-10 > > or something like that. Then it will take 1 iterate. > > > Shouldn't it take 2 with a triangular Schur factorization and exact > preconditioners, and 1 with a full factorization? (cf. Benzi et al 2005, > p.66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf) > > That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and the Schur > solver does drop below "rtol < 1e-10" > Oh, yes. Take away the upper until things are worked out. Thanks, Matt > > 2) There is a problem with the Schur solve. Now from the iterates > > 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm > 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 > > it is clear that the preconditioner is really screwing stuff up. For > testing, you can use > > -pc_fieldsplit_schur_precondition full > > and your same setup here. It should take one iterate. I think there is > something wrong with your > mass matrix. > > > I agree. I forgot to mention that I am considering an "enclosed flow" > problem, with u=0 on all the boundary and a Dirichlet condition for the > pressure in one point for fixing the constant pressure. Maybe the > preconditioner is not consistent with this setup, need to check this.. > > Thanks a lot > > > > Thanks, > > Matt > > >> 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm >> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 >> Residual norms for fieldsplit_1_ solve. >> 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm >> 0.000000000000e+00 ||r(i)||/||b|| -nan >> Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0 >> 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm >> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 >> Residual norms for fieldsplit_1_ solve. >> 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm >> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm >> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 >> 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm >> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 >> 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm >> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 >> 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm >> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 >> 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm >> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 >> 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm >> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 >> 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm >> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 >> <...> >> 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm >> 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 >> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm >> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations >> 423 >> 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm >> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 >> Linear solve converged due to CONVERGED_RTOL iterations 2 >> >> >> Does the slow convergence of the Schur block mean that my preconditioning >> matrix Sp is a poor choice? 
>> >> Thanks, >> David >> >> >> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >> >> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >> wrote: >> >>> Dear all, >>> >>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 >>> finite elements discretization on tetrahedral meshes resulting in >>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and >>> the right hand side is a function of noisy measurement data. >>> >>> In other settings of "standard" Stokes flow problems I have obtained >>> good convergence with an "upper" Schur complement preconditioner, using >>> AMG (ML or Hypre) on the velocity block and approximating the Schur >>> complement matrix by the diagonal of the pressure mass matrix: >>> >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_initial_guess_nonzero >>> -ksp_diagonal_scale >>> -ksp_diagonal_scale_fix >>> -ksp_type fgmres >>> -ksp_rtol 1.0e-8 >>> >>> -pc_type fieldsplit >>> -pc_fieldsplit_type schur >>> -pc_fieldsplit_detect_saddle_point >>> -pc_fieldsplit_schur_fact_type upper >>> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix >>> >>> -fieldsplit_0_ksp_type preonly >>> -fieldsplit_0_pc_type ml >>> >>> -fieldsplit_1_ksp_type preonly >>> -fieldsplit_1_pc_type jacobi >>> >> >> 1) I always recommend starting from an exact solver and backing off in >> small steps for optimization. Thus >> I would start with LU on the upper block and GMRES/LU with toelrance >> 1e-10 on the Schur block. >> This should converge in 1 iterate. >> >> 2) I don't think you want preonly on the Schur system. You might want >> GMRES/Jacobi to invert the mass matrix. >> >> 3) You probably want to tighten the tolerance on the Schur solve, at >> least to start, and then slowly let it out. The >> tight tolerance will show you how effective the preconditioner is >> using that Schur operator. Then you can start >> to evaluate how effective the Schur linear sovler is. >> >> Does this make sense? >> >> Thanks, >> >> Matt >> >> >>> In my present case this setup gives rather slow convergence (varies for >>> different geometries between 200-500 or several thousands!). I obtain >>> better convergence with "-pc_fieldsplit_schur_precondition selfp"and >>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think >>> this is optimal, though). >>> >>> I don't understand why the pressure mass matrix approach performs so >>> poorly and wonder what I could try to improve the convergence. Until now >>> I have been using ML and Hypre BoomerAMG mostly with default parameters. >>> Surely they can be improved by tuning some parameters. Which could be a >>> good starting point? Are there other options I should consider? >>> >>> With the above setup (jacobi) for a case that works better than others, >>> the KSP terminates with >>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm >>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >>> >>> You can find the output of -ksp_view below. Let me know if you need more >>> details. >>> >>> Thanks in advance for your advice! >>> Best wishes >>> David >>> >>> >>> KSP Object: 1 MPI processes >>> type: fgmres >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> GMRES: happy breakdown tolerance 1e-30 >>> maximum iterations=10000 >>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
>>> right preconditioning >>> diagonally scaled system >>> using nonzero initial guess >>> using UNPRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: fieldsplit >>> FieldSplit with Schur preconditioner, factorization UPPER >>> Preconditioner for the Schur complement formed from user provided >>> matrix >>> Split info: >>> Split number 0 Defined by IS >>> Split number 1 Defined by IS >>> KSP solver for A00 block >>> KSP Object: (fieldsplit_0_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_) 1 MPI processes >>> type: ml >>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid matrices >>> Coarse grid solver -- level ------------------------------- >>> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI >>> processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI >>> processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> using diagonal shift on blocks to prevent zero pivot >>> [INBLOCKS] >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1. >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=3, cols=3 >>> package used to perform factorization: petsc >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=3, cols=3 >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Down solver (pre-smoother) on level 1 >>> ------------------------------- >>> KSP Object: (fieldsplit_0_mg_levels_1_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_levels_1_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=15, cols=15 >>> total: nonzeros=69, allocated nonzeros=69 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> Down solver (pre-smoother) on level 2 >>> ------------------------------- >>> KSP Object: (fieldsplit_0_mg_levels_2_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_levels_2_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. 
>>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=304, cols=304 >>> total: nonzeros=7354, allocated nonzeros=7354 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> Down solver (pre-smoother) on level 3 >>> ------------------------------- >>> KSP Object: (fieldsplit_0_mg_levels_3_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_levels_3_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=30236, cols=30236 >>> total: nonzeros=2730644, allocated nonzeros=2730644 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> Down solver (pre-smoother) on level 4 >>> ------------------------------- >>> KSP Object: (fieldsplit_0_mg_levels_4_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_levels_4_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) 1 MPI >>> processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated nonzeros=70684164 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver (pre-smoother) >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated nonzeros=70684164 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> KSP solver for S = A11 - A10 inv(A00) A01 >>> KSP Object: (fieldsplit_1_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_1_) 1 MPI processes >>> type: jacobi >>> linear system matrix followed by preconditioner matrix: >>> Mat Object: (fieldsplit_1_) 1 MPI processes >>> type: schurcomplement >>> rows=42025, cols=42025 >>> Schur complement A11 - A10 inv(A00) A01 >>> A11 >>> Mat Object: (fieldsplit_1_) 1 >>> MPI processes >>> type: seqaij >>> rows=42025, cols=42025 >>> total: nonzeros=554063, allocated nonzeros=554063 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> A10 >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=42025, cols=894132 >>> total: nonzeros=6850107, allocated nonzeros=6850107 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> KSP of A00 >>> KSP Object: (fieldsplit_0_) 1 >>> MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_) 1 >>> MPI processes >>> type: ml >>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid matrices >>> Coarse grid solver -- level >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> using diagonal shift on blocks to prevent zero >>> pivot [INBLOCKS] >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1. >>> Factored matrix follows: >>> Mat Object: 1 MPI >>> processes >>> type: seqaij >>> rows=3, cols=3 >>> package used to perform factorization: petsc >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=3, cols=3 >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> Down solver (pre-smoother) on level 1 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_1_) 1 MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_1_) 1 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=15, cols=15 >>> total: nonzeros=69, allocated nonzeros=69 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 2 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_2_) 1 MPI processes >>> type: richardson >>> Richardson: damping factor=1. 
>>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_2_) 1 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=304, cols=304 >>> total: nonzeros=7354, allocated nonzeros=7354 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 3 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_3_) 1 MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_3_) 1 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=30236, cols=30236 >>> total: nonzeros=2730644, allocated nonzeros=2730644 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 4 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_4_) 1 MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_4_) 1 MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = 1, local >>> iterations = 1, omega = 1. 
>>> linear system matrix = precond matrix: >>> Mat Object: >>> (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated >>> nonzeros=70684164 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> linear system matrix = precond matrix: >>> Mat Object: >>> (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated nonzeros=70684164 >>> total number of mallocs used during MatSetValues calls >>> =0 >>> not using I-node routines >>> A01 >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=894132, cols=42025 >>> total: nonzeros=6850107, allocated nonzeros=6850107 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=42025, cols=42025 >>> total: nonzeros=554063, allocated nonzeros=554063 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=936157, cols=936157 >>> total: nonzeros=84938441, allocated nonzeros=84938441 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dnolte at dim.uchile.cl Mon Jun 12 14:20:07 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Mon, 12 Jun 2017 15:20:07 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: Ok. With "-pc_fieldsplit_schur_fact_type full" the outer iteration converges in 1 step. The problem remain the Schur iterations. I was not sure if the problem was maybe the singular pressure or the pressure Dirichlet BC. I tested the solver with a standard Stokes flow in a pipe with a constriction (zero Neumann BC for the pressure at the outlet) and in a 3D cavity (enclosed flow, no pressure BC or fixed at one point). I am not sure if I need to attach the constant pressure nullspace to the matrix for GMRES. Not doing so does not alter the convergence of GMRES in the Schur solver (nor the pressure solution), using a pressure Dirichlet BC however slows down convergence (I suppose because of the scaling of the matrix). I also checked the pressure mass matrix that I give PETSc, it looks correct. In all these cases, the solver behaves just as before. With LU in fieldsplit_0 and GMRES/LU with rtol 1e-10 in fieldsplit_1, it converges after 1 outer iteration, but the inner Schur solver converges slowly. How should the convergence of GMRES/LU of the Schur complement *normally* behave? Thanks again! 
David On 06/12/2017 12:41 PM, Matthew Knepley wrote: > On Mon, Jun 12, 2017 at 10:36 AM, David Nolte > wrote: > > > On 06/12/2017 07:50 AM, Matthew Knepley wrote: >> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte >> > wrote: >> >> Thanks Matt, makes sense to me! >> >> I skipped direct solvers at first because for these 'real' >> configurations LU (mumps/superlu_dist) usally goes out of >> memory (got 32GB RAM). It would be reasonable to take one >> more step back and play with synthetic examples. >> I managed to run one case though with 936k dofs using: >> ("user" =pressure mass matrix) >> >> <...> >> -pc_fieldsplit_schur_fact_type upper >> -pc_fieldsplit_schur_precondition user >> -fieldsplit_0_ksp_type preonly >> -fieldsplit_0_pc_type lu >> -fieldsplit_0_pc_factor_mat_solver_package mumps >> >> -fieldsplit_1_ksp_type gmres >> -fieldsplit_1_ksp_monitor_true_residuals >> -fieldsplit_1_ksp_rtol 1e-10 >> -fieldsplit_1_pc_type lu >> -fieldsplit_1_pc_factor_mat_solver_package mumps >> >> It takes 2 outer iterations, as expected. However the >> fieldsplit_1 solve takes very long. >> >> >> 1) It should take 1 outer iterate, not two. The problem is that >> your Schur tolerance is way too high. Use >> >> -fieldsplit_1_ksp_rtol 1e-10 >> >> or something like that. Then it will take 1 iterate. > > Shouldn't it take 2 with a triangular Schur factorization and > exact preconditioners, and 1 with a full factorization? (cf. Benzi > et al 2005, p.66, > http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf > ) > > That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and the > Schur solver does drop below "rtol < 1e-10" > > > Oh, yes. Take away the upper until things are worked out. > > Thanks, > > Matt > >> >> 2) There is a problem with the Schur solve. Now from the iterates >> >> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid >> norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >> >> it is clear that the preconditioner is really screwing stuff up. >> For testing, you can use >> >> -pc_fieldsplit_schur_precondition full >> >> and your same setup here. It should take one iterate. I think >> there is something wrong with your >> mass matrix. > > I agree. I forgot to mention that I am considering an "enclosed > flow" problem, with u=0 on all the boundary and a Dirichlet > condition for the pressure in one point for fixing the constant > pressure. Maybe the preconditioner is not consistent with this > setup, need to check this.. > > Thanks a lot > > >> >> Thanks, >> >> Matt >> >> >> 0 KSP unpreconditioned resid norm 4.038466809302e-03 true >> resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 >> Residual norms for fieldsplit_1_ solve. >> 0 KSP preconditioned resid norm 0.000000000000e+00 true >> resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan >> Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL >> iterations 0 >> 1 KSP unpreconditioned resid norm 4.860095964831e-06 true >> resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 >> Residual norms for fieldsplit_1_ solve. 
>> 0 KSP preconditioned resid norm 2.965546249872e+08 true >> resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 1.347596594634e+08 true >> resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 >> 2 KSP preconditioned resid norm 5.913230136403e+07 true >> resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 >> 3 KSP preconditioned resid norm 4.629700028930e+07 true >> resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 >> 4 KSP preconditioned resid norm 3.804431276819e+07 true >> resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 >> 5 KSP preconditioned resid norm 3.178769422140e+07 true >> resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 >> 6 KSP preconditioned resid norm 2.648669043919e+07 true >> resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 >> 7 KSP preconditioned resid norm 2.203522108614e+07 true >> resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 >> <...> >> 422 KSP preconditioned resid norm 2.984888715147e-02 true >> resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 >> 423 KSP preconditioned resid norm 2.638419658982e-02 true >> resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL >> iterations 423 >> 2 KSP unpreconditioned resid norm 3.539889585599e-16 true >> resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 >> Linear solve converged due to CONVERGED_RTOL iterations 2 >> >> >> Does the slow convergence of the Schur block mean that my >> preconditioning matrix Sp is a poor choice? >> >> Thanks, >> David >> >> >> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >>> > wrote: >>> >>> Dear all, >>> >>> I am solving a Stokes problem in 3D aorta geometries, >>> using a P2/P1 >>> finite elements discretization on tetrahedral meshes >>> resulting in >>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted >>> arbitrarily), and >>> the right hand side is a function of noisy measurement data. >>> >>> In other settings of "standard" Stokes flow problems I >>> have obtained >>> good convergence with an "upper" Schur complement >>> preconditioner, using >>> AMG (ML or Hypre) on the velocity block and >>> approximating the Schur >>> complement matrix by the diagonal of the pressure mass >>> matrix: >>> >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_initial_guess_nonzero >>> -ksp_diagonal_scale >>> -ksp_diagonal_scale_fix >>> -ksp_type fgmres >>> -ksp_rtol 1.0e-8 >>> >>> -pc_type fieldsplit >>> -pc_fieldsplit_type schur >>> -pc_fieldsplit_detect_saddle_point >>> -pc_fieldsplit_schur_fact_type upper >>> -pc_fieldsplit_schur_precondition user # <-- >>> pressure mass matrix >>> >>> -fieldsplit_0_ksp_type preonly >>> -fieldsplit_0_pc_type ml >>> >>> -fieldsplit_1_ksp_type preonly >>> -fieldsplit_1_pc_type jacobi >>> >>> >>> 1) I always recommend starting from an exact solver and >>> backing off in small steps for optimization. Thus >>> I would start with LU on the upper block and GMRES/LU >>> with toelrance 1e-10 on the Schur block. >>> This should converge in 1 iterate. >>> >>> 2) I don't think you want preonly on the Schur system. You >>> might want GMRES/Jacobi to invert the mass matrix. >>> >>> 3) You probably want to tighten the tolerance on the Schur >>> solve, at least to start, and then slowly let it out. 
The >>> tight tolerance will show you how effective the >>> preconditioner is using that Schur operator. Then you can start >>> to evaluate how effective the Schur linear sovler is. >>> >>> Does this make sense? >>> >>> Thanks, >>> >>> Matt >>> >>> >>> In my present case this setup gives rather slow >>> convergence (varies for >>> different geometries between 200-500 or several >>> thousands!). I obtain >>> better convergence with >>> "-pc_fieldsplit_schur_precondition selfp"and >>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I >>> don't think >>> this is optimal, though). >>> >>> I don't understand why the pressure mass matrix approach >>> performs so >>> poorly and wonder what I could try to improve the >>> convergence. Until now >>> I have been using ML and Hypre BoomerAMG mostly with >>> default parameters. >>> Surely they can be improved by tuning some parameters. >>> Which could be a >>> good starting point? Are there other options I should >>> consider? >>> >>> With the above setup (jacobi) for a case that works >>> better than others, >>> the KSP terminates with >>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 >>> true resid norm >>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >>> >>> You can find the output of -ksp_view below. Let me know >>> if you need more >>> details. >>> >>> Thanks in advance for your advice! >>> Best wishes >>> David >>> >>> >>> KSP Object: 1 MPI processes >>> type: fgmres >>> GMRES: restart=30, using Classical (unmodified) >>> Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> GMRES: happy breakdown tolerance 1e-30 >>> maximum iterations=10000 >>> tolerances: relative=1e-08, absolute=1e-50, >>> divergence=10000. >>> right preconditioning >>> diagonally scaled system >>> using nonzero initial guess >>> using UNPRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: fieldsplit >>> FieldSplit with Schur preconditioner, factorization >>> UPPER >>> Preconditioner for the Schur complement formed from >>> user provided matrix >>> Split info: >>> Split number 0 Defined by IS >>> Split number 1 Defined by IS >>> KSP solver for A00 block >>> KSP Object: (fieldsplit_0_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_) 1 MPI processes >>> type: ml >>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid matrices >>> Coarse grid solver -- level >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_coarse_) 1 MPI >>> processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_mg_coarse_) >>> 1 MPI >>> processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> using diagonal shift on blocks to prevent >>> zero pivot >>> [INBLOCKS] >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1. 
>>> Factored matrix follows: >>> Mat Object: 1 MPI >>> processes >>> type: seqaij >>> rows=3, cols=3 >>> package used to perform >>> factorization: petsc >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during >>> MatSetValues >>> calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=3, cols=3 >>> total: nonzeros=3, allocated nonzeros=3 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Down solver (pre-smoother) on level 1 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_1_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_1_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = >>> 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=15, cols=15 >>> total: nonzeros=69, allocated nonzeros=69 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 2 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_2_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_2_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = >>> 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=304, cols=304 >>> total: nonzeros=7354, allocated nonzeros=7354 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 3 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_3_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_3_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = >>> 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=30236, cols=30236 >>> total: nonzeros=2730644, allocated >>> nonzeros=2730644 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> Down solver (pre-smoother) on level 4 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_4_) 1 >>> MPI processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. 
>>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence test >>> PC Object: >>> (fieldsplit_0_mg_levels_4_) 1 >>> MPI processes >>> type: sor >>> SOR: type = local_symmetric, iterations = >>> 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) >>> 1 MPI >>> processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated >>> nonzeros=70684164 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down solver >>> (pre-smoother) >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) 1 MPI >>> processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated >>> nonzeros=70684164 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> KSP solver for S = A11 - A10 inv(A00) A01 >>> KSP Object: (fieldsplit_1_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_1_) 1 MPI processes >>> type: jacobi >>> linear system matrix followed by preconditioner >>> matrix: >>> Mat Object: (fieldsplit_1_) 1 MPI >>> processes >>> type: schurcomplement >>> rows=42025, cols=42025 >>> Schur complement A11 - A10 inv(A00) A01 >>> A11 >>> Mat Object: (fieldsplit_1_) >>> 1 >>> MPI processes >>> type: seqaij >>> rows=42025, cols=42025 >>> total: nonzeros=554063, allocated >>> nonzeros=554063 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> A10 >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=42025, cols=894132 >>> total: nonzeros=6850107, allocated >>> nonzeros=6850107 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> KSP of A00 >>> KSP Object: (fieldsplit_0_) >>> 1 >>> MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess >>> is zero >>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_) >>> 1 >>> MPI processes >>> type: ml >>> MG: type is MULTIPLICATIVE, levels=5 >>> cycles=v >>> Cycles per PCApply=1 >>> Using Galerkin computed coarse grid >>> matrices >>> Coarse grid solver -- level >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial >>> guess is zero >>> tolerances: relative=1e-05, >>> absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence >>> test >>> PC Object: >>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> using diagonal shift on blocks to >>> prevent zero >>> pivot [INBLOCKS] >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1. 
>>> Factored matrix follows: >>> Mat Object: >>> 1 MPI >>> processes >>> type: seqaij >>> rows=3, cols=3 >>> package used to perform >>> factorization: petsc >>> total: nonzeros=3, allocated >>> nonzeros=3 >>> total number of mallocs used >>> during >>> MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 >>> MPI processes >>> type: seqaij >>> rows=3, cols=3 >>> total: nonzeros=3, allocated >>> nonzeros=3 >>> total number of mallocs used >>> during MatSetValues >>> calls =0 >>> not using I-node routines >>> Down solver (pre-smoother) on level 1 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_1_) 1 MPI >>> processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, >>> absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence >>> test >>> PC Object: >>> (fieldsplit_0_mg_levels_1_) 1 MPI >>> processes >>> type: sor >>> SOR: type = local_symmetric, >>> iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 >>> MPI processes >>> type: seqaij >>> rows=15, cols=15 >>> total: nonzeros=69, allocated >>> nonzeros=69 >>> total number of mallocs used >>> during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down >>> solver (pre-smoother) >>> Down solver (pre-smoother) on level 2 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_2_) 1 MPI >>> processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, >>> absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence >>> test >>> PC Object: >>> (fieldsplit_0_mg_levels_2_) 1 MPI >>> processes >>> type: sor >>> SOR: type = local_symmetric, >>> iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 >>> MPI processes >>> type: seqaij >>> rows=304, cols=304 >>> total: nonzeros=7354, allocated >>> nonzeros=7354 >>> total number of mallocs used >>> during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down >>> solver (pre-smoother) >>> Down solver (pre-smoother) on level 3 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_3_) 1 MPI >>> processes >>> type: richardson >>> Richardson: damping factor=1. >>> maximum iterations=2 >>> tolerances: relative=1e-05, >>> absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence >>> test >>> PC Object: >>> (fieldsplit_0_mg_levels_3_) 1 MPI >>> processes >>> type: sor >>> SOR: type = local_symmetric, >>> iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: 1 >>> MPI processes >>> type: seqaij >>> rows=30236, cols=30236 >>> total: nonzeros=2730644, allocated >>> nonzeros=2730644 >>> total number of mallocs used >>> during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down >>> solver (pre-smoother) >>> Down solver (pre-smoother) on level 4 >>> ------------------------------- >>> KSP Object: >>> (fieldsplit_0_mg_levels_4_) 1 MPI >>> processes >>> type: richardson >>> Richardson: damping factor=1. 
>>> maximum iterations=2 >>> tolerances: relative=1e-05, >>> absolute=1e-50, >>> divergence=10000. >>> left preconditioning >>> using nonzero initial guess >>> using NONE norm type for convergence >>> test >>> PC Object: >>> (fieldsplit_0_mg_levels_4_) 1 MPI >>> processes >>> type: sor >>> SOR: type = local_symmetric, >>> iterations = 1, local >>> iterations = 1, omega = 1. >>> linear system matrix = precond matrix: >>> Mat Object: >>> (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, >>> allocated nonzeros=70684164 >>> total number of mallocs used >>> during MatSetValues >>> calls =0 >>> not using I-node routines >>> Up solver (post-smoother) same as down >>> solver (pre-smoother) >>> linear system matrix = precond matrix: >>> Mat Object: >>> (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=894132, cols=894132 >>> total: nonzeros=70684164, allocated >>> nonzeros=70684164 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> A01 >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=894132, cols=42025 >>> total: nonzeros=6850107, allocated >>> nonzeros=6850107 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=42025, cols=42025 >>> total: nonzeros=554063, allocated nonzeros=554063 >>> total number of mallocs used during >>> MatSetValues calls =0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=936157, cols=936157 >>> total: nonzeros=84938441, allocated nonzeros=84938441 >>> total number of mallocs used during MatSetValues >>> calls =0 >>> not using I-node routines >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin >>> their experiments is infinitely more interesting than any >>> results to which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Mon Jun 12 15:52:32 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 12 Jun 2017 21:52:32 +0100 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: I've been following the discussion and have a couple of comments: 1/ For the preconditioners that you are using (Schur factorisation LDU, or upper block triangular DU), the convergence properties (e.g. 1 iterate for LDU and 2 iterates for DU) come from analysis involving exact inverses of A_00 and S Once you switch from using exact inverses of A_00 and S, you have to rely on spectral equivalence of operators. That is fine, but the spectral equivalence does not tell you how many iterates LDU or DU will require to converge. 
What it does inform you about is that if you have a spectrally equivalent operator for A_00 and S (Schur complement), then under mesh refinement, your iteration count (whatever it was prior to refinement) will not increase.

2/ Looking at your first set of options, I see you have opted to use -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice as it creates a linear operator, so you don't need something like FGMRES or GCR applied to the saddle point problem.

Your choice for Schur is fine in the sense that the diagonal of M is spectrally equivalent to M, and M is spectrally equivalent to S. Whether it is "fine" in terms of the iteration count for Schur systems, we cannot say a priori (since the spectral equivalence doesn't give us direct info about the iterations we should expect).

Your preconditioner for A_00 relies on AMG producing a spectrally equivalent operator with bounds which are tight enough to ensure convergence of the saddle point problem. I'll try to explain this. In my experience, for many problems (unstructured FE with variable coefficients, structured FE meshes with variable coefficients) AMG and preonly is not a robust choice. To control the approximation (the spectral equiv bounds), I typically run a stationary or Krylov method on split 0 (e.g. -fieldsplit_0_ksp_type xxx -fieldsplit_0_ksp_rtol yyy). Since the AMG preconditioner generated is spectrally equivalent (usually!), these solves will converge to a chosen rtol in a constant number of iterates under h-refinement. In practice, if I don't enforce that I hit something like rtol=1.0e-1 (or 1.0e-2) on the 0th split, saddle point iterates will typically increase for "hard" problems under mesh refinement (1e4-1e7 coefficient variation), and may not even converge at all when just using -fieldsplit_0_ksp_type preonly. Failure ultimately depends on how "strong" the preconditioner for the A_00 block is (consider re-discretized geometric multigrid versus AMG). Running an iterative solve on the 0th split lets you control, and recover from, weak/poor but spectrally equivalent preconditioners for A_00. Note that people hate this approach as it invariably nests Krylov methods, and subsequently adds more global reductions. However, it is scalable, optimal, tuneable and converges faster than the case which didn't converge at all :D

3/ I agree with Matt's comments, but I'd do a couple of other things first.

* I'd first check the discretization is implemented correctly. Your P2/P1 element is inf-sup stable - thus the condition number of S (unpreconditioned) should be independent of the mesh resolution (h). An easy way to verify this is to run either LDU (schur_fact_type full) or DU (schur_fact_type upper) and monitor the iterations required for those S solves. Use

  -fieldsplit_1_pc_type none
  -fieldsplit_1_ksp_rtol 1.0e-8
  -fieldsplit_1_ksp_monitor_true_residual
  -fieldsplit_1_ksp_pc_side right
  -fieldsplit_1_ksp_type gmres
  -fieldsplit_0_pc_type lu

Then refine the mesh (ideally via sub-division) and repeat the experiment. If the S iterates don't asymptote, but instead grow with each refinement - you likely have a problem with the discretisation.

* Do the same experiment, but this time use your mass matrix as the preconditioner for S and use -fieldsplit_1_pc_type lu. If the iterates, compared with the previous experiment (without a Schur PC), have gone up, your mass matrix is not defined correctly. (A complete options sketch for these two experiments is given below.)
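As a concrete starting point, the full options set for the first experiment could look something like the following. This is only a sketch: it assumes the default fieldsplit_0_/fieldsplit_1_ prefixes, and it reuses the fgmres outer solve, the saddle point detection and the full Schur factorization already shown earlier in this thread, with an exact (LU) solve for A_00 so that only the S iterations are being measured.

  -ksp_type fgmres
  -ksp_monitor_true_residual
  -pc_type fieldsplit
  -pc_fieldsplit_type schur
  -pc_fieldsplit_detect_saddle_point
  -pc_fieldsplit_schur_fact_type full
  -fieldsplit_0_ksp_type preonly
  -fieldsplit_0_pc_type lu
  -fieldsplit_1_ksp_type gmres
  -fieldsplit_1_ksp_pc_side right
  -fieldsplit_1_ksp_rtol 1.0e-8
  -fieldsplit_1_ksp_monitor_true_residual
  -fieldsplit_1_pc_type none

For the second experiment, keep everything identical, additionally pass the pressure mass matrix as the preconditioning matrix for S (e.g. with -pc_fieldsplit_schur_precondition user, as in the earlier runs in this thread), and replace the last option with

  -fieldsplit_1_pc_type lu

In both cases the quantity to record under mesh refinement is the iteration count reported for the fieldsplit_1_ (Schur) solves.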
If in the previous experiment (without a Schur PC) iterates on the S solves were bounded, but now when preconditioned with the mass matrix the iterates go up, then your mass matrix is definitely not correct.

4/ To finally get to your question regarding whether +400 iterates for solving the Schur system seems "reasonable" and what is "normal behaviour": it seems "high" to me. However the specifics of your discretisation, mesh topology, element quality, boundary conditions render it almost impossible to say what should be expected. When I use a Q2-P2* discretisation on a structured mesh with a non-constant viscosity I'd expect something like 50-60 for 1.0e-10 with a mass matrix scaled by the inverse (local) viscosity. For constant viscosity maybe 30 iterates. I think this kind of statement is not particularly useful or helpful though.

Given you use an unstructured tet mesh, it is possible that some elements have very bad quality (high aspect ratio (AR), highly skewed). I am certain that P2/P1 has an inf-sup constant which is sensitive to the element aspect ratio (I don't recall the exact scaling wrt AR). From experience I know that using the mass matrix as a preconditioner for Schur is not robust as AR increases (e.g. iterations for the S solve grow). Hence, with a couple of "bad" elements in your mesh, I could imagine that you could end up having to perform +400 iterations.

5/ Lastly, definitely don't impose one Dirichlet BC on pressure to make the pressure unique. This really screws up all the nice properties of your matrices. Just enforce the constant null space for p. And as you noticed, GMRES magically just does it automatically if the RHS of your original system was consistent.

Thanks,
Dave

On 12 June 2017 at 20:20, David Nolte wrote: > Ok. With "-pc_fieldsplit_schur_fact_type full" the outer iteration > converges in 1 step. The problem remain the Schur iterations. > > I was not sure if the problem was maybe the singular pressure or the > pressure Dirichlet BC. I tested the solver with a standard Stokes flow in a > pipe with a constriction (zero Neumann BC for the pressure at the outlet) > and in a 3D cavity (enclosed flow, no pressure BC or fixed at one point). I > am not sure if I need to attach the constant pressure nullspace to the > matrix for GMRES. Not doing so does not alter the convergence of GMRES in > the Schur solver (nor the pressure solution), using a pressure Dirichlet BC > however slows down convergence (I suppose because of the scaling of the > matrix). > > I also checked the pressure mass matrix that I give PETSc, it looks > correct. > > In all these cases, the solver behaves just as before. With LU in > fieldsplit_0 and GMRES/LU with rtol 1e-10 in fieldsplit_1, it converges > after 1 outer iteration, but the inner Schur solver converges slowly. > > How should the convergence of GMRES/LU of the Schur complement *normally* > behave? > > Thanks again! > David > > > > > On 06/12/2017 12:41 PM, Matthew Knepley wrote: > > On Mon, Jun 12, 2017 at 10:36 AM, David Nolte > wrote: > >> >> On 06/12/2017 07:50 AM, Matthew Knepley wrote: >> >> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte >> wrote: >> >>> Thanks Matt, makes sense to me! >>> >>> I skipped direct solvers at first because for these 'real' >>> configurations LU (mumps/superlu_dist) usally goes out of memory (got 32GB >>> RAM). It would be reasonable to take one more step back and play with >>> synthetic examples.
>>> I managed to run one case though with 936k dofs using: ("user" =pressure >>> mass matrix) >>> >>> <...> >>> -pc_fieldsplit_schur_fact_type upper >>> -pc_fieldsplit_schur_precondition user >>> -fieldsplit_0_ksp_type preonly >>> -fieldsplit_0_pc_type lu >>> -fieldsplit_0_pc_factor_mat_solver_package mumps >>> >>> -fieldsplit_1_ksp_type gmres >>> -fieldsplit_1_ksp_monitor_true_residuals >>> -fieldsplit_1_ksp_rtol 1e-10 >>> -fieldsplit_1_pc_type lu >>> -fieldsplit_1_pc_factor_mat_solver_package mumps >>> >>> It takes 2 outer iterations, as expected. However the fieldsplit_1 solve >>> takes very long. >>> >> >> 1) It should take 1 outer iterate, not two. The problem is that your >> Schur tolerance is way too high. Use >> >> -fieldsplit_1_ksp_rtol 1e-10 >> >> or something like that. Then it will take 1 iterate. >> >> >> Shouldn't it take 2 with a triangular Schur factorization and exact >> preconditioners, and 1 with a full factorization? (cf. Benzi et al 2005, >> p.66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf) >> >> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and the Schur >> solver does drop below "rtol < 1e-10" >> > > Oh, yes. Take away the upper until things are worked out. > > Thanks, > > Matt > >> >> 2) There is a problem with the Schur solve. Now from the iterates >> >> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm >> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >> >> it is clear that the preconditioner is really screwing stuff up. For >> testing, you can use >> >> -pc_fieldsplit_schur_precondition full >> >> and your same setup here. It should take one iterate. I think there is >> something wrong with your >> mass matrix. >> >> >> I agree. I forgot to mention that I am considering an "enclosed flow" >> problem, with u=0 on all the boundary and a Dirichlet condition for the >> pressure in one point for fixing the constant pressure. Maybe the >> preconditioner is not consistent with this setup, need to check this.. >> >> Thanks a lot >> >> >> >> Thanks, >> >> Matt >> >> >>> 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm >>> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 >>> Residual norms for fieldsplit_1_ solve. >>> 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm >>> 0.000000000000e+00 ||r(i)||/||b|| -nan >>> Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0 >>> 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm >>> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 >>> Residual norms for fieldsplit_1_ solve. 
>>> 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm >>> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 >>> 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm >>> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 >>> 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm >>> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 >>> 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm >>> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 >>> 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm >>> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 >>> 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm >>> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 >>> 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm >>> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 >>> 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm >>> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 >>> <...> >>> 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm >>> 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 >>> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm >>> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >>> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations >>> 423 >>> 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm >>> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 >>> Linear solve converged due to CONVERGED_RTOL iterations 2 >>> >>> >>> Does the slow convergence of the Schur block mean that my >>> preconditioning matrix Sp is a poor choice? >>> >>> Thanks, >>> David >>> >>> >>> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >>> >>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >>> wrote: >>> >>>> Dear all, >>>> >>>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 >>>> finite elements discretization on tetrahedral meshes resulting in >>>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and >>>> the right hand side is a function of noisy measurement data. >>>> >>>> In other settings of "standard" Stokes flow problems I have obtained >>>> good convergence with an "upper" Schur complement preconditioner, using >>>> AMG (ML or Hypre) on the velocity block and approximating the Schur >>>> complement matrix by the diagonal of the pressure mass matrix: >>>> >>>> -ksp_converged_reason >>>> -ksp_monitor_true_residual >>>> -ksp_initial_guess_nonzero >>>> -ksp_diagonal_scale >>>> -ksp_diagonal_scale_fix >>>> -ksp_type fgmres >>>> -ksp_rtol 1.0e-8 >>>> >>>> -pc_type fieldsplit >>>> -pc_fieldsplit_type schur >>>> -pc_fieldsplit_detect_saddle_point >>>> -pc_fieldsplit_schur_fact_type upper >>>> -pc_fieldsplit_schur_precondition user # <-- pressure mass >>>> matrix >>>> >>>> -fieldsplit_0_ksp_type preonly >>>> -fieldsplit_0_pc_type ml >>>> >>>> -fieldsplit_1_ksp_type preonly >>>> -fieldsplit_1_pc_type jacobi >>>> >>> >>> 1) I always recommend starting from an exact solver and backing off in >>> small steps for optimization. Thus >>> I would start with LU on the upper block and GMRES/LU with toelrance >>> 1e-10 on the Schur block. >>> This should converge in 1 iterate. >>> >>> 2) I don't think you want preonly on the Schur system. You might want >>> GMRES/Jacobi to invert the mass matrix. >>> >>> 3) You probably want to tighten the tolerance on the Schur solve, at >>> least to start, and then slowly let it out. 
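For anyone wiring this up from code instead of the options database, the options quoted above map onto the PC API roughly as follows. This is a sketch under assumptions: ksp, is_u, is_p and Mp (the pressure mass matrix) are placeholder names, explicit index sets are used instead of -pc_fieldsplit_detect_saddle_point, and error checking is omitted.

    #include <petscksp.h>

    /* Sketch: Schur-complement fieldsplit with an upper factorization and a
       user-provided pressure mass matrix Mp as the Schur preconditioning matrix.
       Roughly equivalent to:
         -pc_type fieldsplit -pc_fieldsplit_type schur
         -pc_fieldsplit_schur_fact_type upper
         -pc_fieldsplit_schur_precondition user                               */
    static PetscErrorCode SetupSchurFieldSplit(KSP ksp, IS is_u, IS is_p, Mat Mp)
    {
      PC pc;

      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCFIELDSPLIT);
      PCFieldSplitSetIS(pc, "0", is_u);   /* velocity split, prefix fieldsplit_0_ */
      PCFieldSplitSetIS(pc, "1", is_p);   /* pressure split, prefix fieldsplit_1_ */
      PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
      PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_UPPER);
      PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Mp);
      /* The inner solvers remain easiest to choose on the command line, e.g.
         -fieldsplit_0_pc_type ml
         -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_rtol 1e-10
         -fieldsplit_1_pc_type jacobi                                          */
      PCSetFromOptions(pc);
      return 0;
    }

Starting from the exact-solver configuration suggested in 1) (LU on the 0-block, a tight GMRES/LU solve on the Schur block) and then backing off is then purely a matter of command-line options.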
The >>> tight tolerance will show you how effective the preconditioner is >>> using that Schur operator. Then you can start >>> to evaluate how effective the Schur linear sovler is. >>> >>> Does this make sense? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> In my present case this setup gives rather slow convergence (varies for >>>> different geometries between 200-500 or several thousands!). I obtain >>>> better convergence with "-pc_fieldsplit_schur_precondition selfp"and >>>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think >>>> this is optimal, though). >>>> >>>> I don't understand why the pressure mass matrix approach performs so >>>> poorly and wonder what I could try to improve the convergence. Until now >>>> I have been using ML and Hypre BoomerAMG mostly with default parameters. >>>> Surely they can be improved by tuning some parameters. Which could be a >>>> good starting point? Are there other options I should consider? >>>> >>>> With the above setup (jacobi) for a case that works better than others, >>>> the KSP terminates with >>>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm >>>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >>>> >>>> You can find the output of -ksp_view below. Let me know if you need more >>>> details. >>>> >>>> Thanks in advance for your advice! >>>> Best wishes >>>> David >>>> >>>> >>>> KSP Object: 1 MPI processes >>>> type: fgmres >>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>>> Orthogonalization with no iterative refinement >>>> GMRES: happy breakdown tolerance 1e-30 >>>> maximum iterations=10000 >>>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >>>> right preconditioning >>>> diagonally scaled system >>>> using nonzero initial guess >>>> using UNPRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: fieldsplit >>>> FieldSplit with Schur preconditioner, factorization UPPER >>>> Preconditioner for the Schur complement formed from user provided >>>> matrix >>>> Split info: >>>> Split number 0 Defined by IS >>>> Split number 1 Defined by IS >>>> KSP solver for A00 block >>>> KSP Object: (fieldsplit_0_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_) 1 MPI processes >>>> type: ml >>>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>>> Cycles per PCApply=1 >>>> Using Galerkin computed coarse grid matrices >>>> Coarse grid solver -- level ------------------------------- >>>> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: lu >>>> LU: out-of-place factorization >>>> tolerance for zero pivot 2.22045e-14 >>>> using diagonal shift on blocks to prevent zero pivot >>>> [INBLOCKS] >>>> matrix ordering: nd >>>> factor fill ratio given 5., needed 1. 
>>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> package used to perform factorization: petsc >>>> total: nonzeros=3, allocated nonzeros=3 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> total: nonzeros=3, allocated nonzeros=3 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> Down solver (pre-smoother) on level 1 >>>> ------------------------------- >>>> KSP Object: (fieldsplit_0_mg_levels_1_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_mg_levels_1_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=15, cols=15 >>>> total: nonzeros=69, allocated nonzeros=69 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 2 >>>> ------------------------------- >>>> KSP Object: (fieldsplit_0_mg_levels_2_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_mg_levels_2_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=304, cols=304 >>>> total: nonzeros=7354, allocated nonzeros=7354 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 3 >>>> ------------------------------- >>>> KSP Object: (fieldsplit_0_mg_levels_3_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_mg_levels_3_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=30236, cols=30236 >>>> total: nonzeros=2730644, allocated nonzeros=2730644 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 4 >>>> ------------------------------- >>>> KSP Object: (fieldsplit_0_mg_levels_4_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. 
>>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_mg_levels_4_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_0_) 1 MPI >>>> processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated nonzeros=70684164 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated nonzeros=70684164 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> KSP solver for S = A11 - A10 inv(A00) A01 >>>> KSP Object: (fieldsplit_1_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_1_) 1 MPI processes >>>> type: jacobi >>>> linear system matrix followed by preconditioner matrix: >>>> Mat Object: (fieldsplit_1_) 1 MPI processes >>>> type: schurcomplement >>>> rows=42025, cols=42025 >>>> Schur complement A11 - A10 inv(A00) A01 >>>> A11 >>>> Mat Object: (fieldsplit_1_) 1 >>>> MPI processes >>>> type: seqaij >>>> rows=42025, cols=42025 >>>> total: nonzeros=554063, allocated nonzeros=554063 >>>> total number of mallocs used during MatSetValues calls >>>> =0 >>>> not using I-node routines >>>> A10 >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=42025, cols=894132 >>>> total: nonzeros=6850107, allocated nonzeros=6850107 >>>> total number of mallocs used during MatSetValues calls >>>> =0 >>>> not using I-node routines >>>> KSP of A00 >>>> KSP Object: (fieldsplit_0_) 1 >>>> MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_) 1 >>>> MPI processes >>>> type: ml >>>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>>> Cycles per PCApply=1 >>>> Using Galerkin computed coarse grid matrices >>>> Coarse grid solver -- level >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>>> type: lu >>>> LU: out-of-place factorization >>>> tolerance for zero pivot 2.22045e-14 >>>> using diagonal shift on blocks to prevent zero >>>> pivot [INBLOCKS] >>>> matrix ordering: nd >>>> factor fill ratio given 5., needed 1. 
>>>> Factored matrix follows: >>>> Mat Object: 1 MPI >>>> processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> package used to perform factorization: petsc >>>> total: nonzeros=3, allocated nonzeros=3 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> total: nonzeros=3, allocated nonzeros=3 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Down solver (pre-smoother) on level 1 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_1_) 1 MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_1_) 1 MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=15, cols=15 >>>> total: nonzeros=69, allocated nonzeros=69 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver >>>> (pre-smoother) >>>> Down solver (pre-smoother) on level 2 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_2_) 1 MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_2_) 1 MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=304, cols=304 >>>> total: nonzeros=7354, allocated nonzeros=7354 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver >>>> (pre-smoother) >>>> Down solver (pre-smoother) on level 3 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_3_) 1 MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_3_) 1 MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=30236, cols=30236 >>>> total: nonzeros=2730644, allocated >>>> nonzeros=2730644 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver >>>> (pre-smoother) >>>> Down solver (pre-smoother) on level 4 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_4_) 1 MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. 
>>>> maximum iterations=2 >>>> tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_4_) 1 MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: >>>> (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated >>>> nonzeros=70684164 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down solver >>>> (pre-smoother) >>>> linear system matrix = precond matrix: >>>> Mat Object: >>>> (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated nonzeros=70684164 >>>> total number of mallocs used during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> A01 >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=42025 >>>> total: nonzeros=6850107, allocated nonzeros=6850107 >>>> total number of mallocs used during MatSetValues calls >>>> =0 >>>> not using I-node routines >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=42025, cols=42025 >>>> total: nonzeros=554063, allocated nonzeros=554063 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=936157, cols=936157 >>>> total: nonzeros=84938441, allocated nonzeros=84938441 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node routines >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gotofd at gmail.com Mon Jun 12 19:11:51 2017 From: gotofd at gmail.com (Ji Zhang) Date: Tue, 13 Jun 2017 08:11:51 +0800 Subject: [petsc-users] Why the convergence is much slower when I use two nodes Message-ID: Dear all, I'm a PETSc user. I'm using GMRES method to solve some linear equations. I'm using boundary element method, so the matrix type is dense (or mpidense). I'm using MPICH2, I found that the convergence is fast if I only use one computer node; and much more slower if I use two or more nodes. I'm interested in why this happen, and how can I improve the convergence performance when I use multi-nodes. Thanks a lot. ?? ?? ????????? ?????????? ???????????10????9?? ?100193? Best, Regards, Zhang Ji, PhD student Beijing Computational Science Research Center Zhongguancun Software Park II, No. 
10 Dongbeiwang West Road, Haidian District, Beijing 100193, China -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jun 12 20:34:54 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 Jun 2017 20:34:54 -0500 Subject: [petsc-users] Why the convergence is much slower when I use two nodes In-Reply-To: References: Message-ID: You need to provide more information. What is the output of -ksp_view? and -log_view? for both cases > On Jun 12, 2017, at 7:11 PM, Ji Zhang wrote: > > Dear all, > > I'm a PETSc user. I'm using GMRES method to solve some linear equations. I'm using boundary element method, so the matrix type is dense (or mpidense). I'm using MPICH2, I found that the convergence is fast if I only use one computer node; and much more slower if I use two or more nodes. I'm interested in why this happen, and how can I improve the convergence performance when I use multi-nodes. > > Thanks a lot. > > ?? > ?? > ????????? > ?????????? > ???????????10????9?? ?100193? > > Best, > Regards, > Zhang Ji, PhD student > Beijing Computational Science Research Center > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China From G.Vaz at marin.nl Tue Jun 13 03:48:51 2017 From: G.Vaz at marin.nl (Vaz, Guilherme) Date: Tue, 13 Jun 2017 08:48:51 +0000 Subject: [petsc-users] PETSC on Cray Hazelhen Message-ID: <1497343731587.65032@marin.nl> Dear all, I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the usual Cray wrappers for Intel compilers, with some chosen external packages and MKL libraries. I read some threads in the mailing list about this, and I tried the petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. After trying this (please abstract from my own env vars), CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ --with-cc=cc \ --with-cxx=CC \ --with-fc=ftn \ --with-clib-autodetect=0 \ --with-cxxlib-autodetect=0 \ --with-fortranlib-autodetect=0 \ --COPTFLAGS=-fast -mp \ --CXXOPTFLAGS=-fast -mp \ --FOPTFLAGS=-fast -mp \ --with-shared-libraries=0 \ --with-batch=1 \ --with-x=0 \ --with-mpe=0 \ --with-debugging=0 \ --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ --with-blas-lapack-dir=$BLASDIR \ --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ --with-external-packages-dir=$INSTALL_DIR \ --with-ssl=0 " I get the following error: TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Fortran could not successfully link C++ objects ******************************************************************************* Does it ring a bell? Any tips? Thanks, Guilherme V. dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: Maritime Safety seminar, September 12, Singapore -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: imageb0ee12.PNG Type: image/png Size: 293 bytes Desc: imageb0ee12.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imaged8f08c.PNG Type: image/png Size: 331 bytes Desc: imaged8f08c.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagedacd80.PNG Type: image/png Size: 333 bytes Desc: imagedacd80.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1bd020.PNG Type: image/png Size: 253 bytes Desc: image1bd020.PNG URL: From knepley at gmail.com Tue Jun 13 07:34:24 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 13 Jun 2017 07:34:24 -0500 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: <1497343731587.65032@marin.nl> References: <1497343731587.65032@marin.nl> Message-ID: On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme wrote: > Dear all, > > I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the > usual Cray wrappers for Intel compilers, with some chosen external packages > and MKL libraries. > > I read some threads in the mailing list about this, and I tried the > petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. > After trying this (please abstract from my own env vars), > CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ > --with-cc=cc \ > --with-cxx=CC \ > --with-fc=ftn \ > --with-clib-autodetect=0 \ > --with-cxxlib-autodetect=0 \ > --with-fortranlib-autodetect=0 \ > --COPTFLAGS=-fast -mp \ > --CXXOPTFLAGS=-fast -mp \ > --FOPTFLAGS=-fast -mp \ > --with-shared-libraries=0 \ > --with-batch=1 \ > --with-x=0 \ > --with-mpe=0 \ > --with-debugging=0 \ > --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ > --with-blas-lapack-dir=$BLASDIR \ > --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ > --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ > --with-external-packages-dir=$INSTALL_DIR \ > --with-ssl=0 " > > > I get the following error: > > > TESTING: checkFortranLinkingCxx from config.compilers(/zhome/ > academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/ > petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) > ************************************************************ > ******************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > ------------------------------------------------------------ > ------------------- > Fortran could not successfully link C++ objects > ************************************************************ > ******************* > > Does it ring a bell? Any tips? > You turned off autodetection, so it will not find libstdc++. That either has to be put in LIBS, or I would recommend --with-cxx=0 since nothing you have there requires C++. Thanks, Matt > Thanks, > Guilherme V. > > dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | > Research & Development > MARIN | T +31 317 49 33 25 <+31%20317%20493%20325> | M +31 621 13 11 97 | > G.Vaz at marin.nl | www.marin.nl > > [image: LinkedIn] [image: > YouTube] [image: Twitter] > [image: Facebook] > > MARIN news: Maritime Safety seminar, September 12, Singapore > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: imageb0ee12.PNG Type: image/png Size: 293 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1bd020.PNG Type: image/png Size: 253 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imaged8f08c.PNG Type: image/png Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagedacd80.PNG Type: image/png Size: 333 bytes Desc: not available URL: From gotofd at gmail.com Tue Jun 13 07:38:17 2017 From: gotofd at gmail.com (Ji Zhang) Date: Tue, 13 Jun 2017 20:38:17 +0800 Subject: [petsc-users] Why the convergence is much slower when I use two nodes In-Reply-To: References: Message-ID: mpirun -n 1 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_1.txt mpirun -n 2 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_2.txt mpirun -n 4 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_3.txt mpirun -n 6 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_4.txt mpirun -n 2 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_5.txt mpirun -n 4 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_6.txt mpirun -n 6 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_7.txt Dear Barry, The following tests are runing in our cluster using one, two or three nodes. Each node have 64GB memory and 24 cups (Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz). Basic information of each node are listed below. 
$ lstopo Machine (64GB) NUMANode L#0 (P#0 32GB) Socket L#0 + L3 L#0 (30MB) L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4) L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5) L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6) L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7) L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8) L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9) L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10) L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11) HostBridge L#0 PCIBridge PCI 1000:0097 Block L#0 "sda" PCIBridge PCI 8086:1523 Net L#1 "eth0" PCI 8086:1523 Net L#2 "eth1" PCIBridge PCIBridge PCI 1a03:2000 PCI 8086:8d02 NUMANode L#1 (P#1 32GB) Socket L#1 + L3 L#1 (30MB) L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12) L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13) L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14) L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15) L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16) L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17) L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18) L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19) L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20) L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21) L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22) L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23) HostBridge L#5 PCIBridge PCI 15b3:1003 Net L#3 "ib0" OpenFabrics L#4 "mlx4_0" I have tested seven different cases. Each case solving three different linear equation systems A*x1=b1, A*x2=b2, A*x3=b3. The matrix A is a mpidense matrix and b1, b2, b3 are different vectors. I'm using GMRES method without precondition method . I have set -ksp_mat_it 1000 process nodes eq1_residual_norms eq1_duration eq2_residual_norms eq2_duration eq3_residual_norms eq3_duration mpi_1.txt: 1 1 9.884635e-04 88.631310s 4.144572e-04 88.855811s 4.864481e-03 88.673738s mpi_2.txt: 2 2 6.719300e-01 84.212435s 6.782443e-01 85.153371s 7.223828e-01 85.246724s mpi_3.txt: 4 4 5.813354e-01 52.616490s 5.397962e-01 52.413213s 9.503432e-01 52.495871s mpi_4.txt: 6 6 4.621066e-01 42.929705s 4.661823e-01 43.367914s 1.047436e+00 43.108877s mpi_5.txt: 2 1 6.719300e-01 141.490945s 6.782443e-01 142.746243s 7.223828e-01 142.042608s mpi_6.txt: 3 1 5.813354e-01 165.061162s 5.397962e-01 196.539286s 9.503432e-01 180.240947s mpi_7.txt: 6 1 4.621066e-01 213.683270s 4.661823e-01 208.180939s 1.047436e+00 194.251886s I found that all residual norms are on the order of 1 except the first case, which one I only use one process at one node. See the attach files for more details, please. ?? ?? ????????? ?????????? ???????????10????9?? ?100193? 
Best, Regards, Zhang Ji, PhD student Beijing Computational Science Research Center Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China On Tue, Jun 13, 2017 at 9:34 AM, Barry Smith wrote: > > You need to provide more information. What is the output of -ksp_view? > and -log_view? for both cases > > > On Jun 12, 2017, at 7:11 PM, Ji Zhang wrote: > > > > Dear all, > > > > I'm a PETSc user. I'm using GMRES method to solve some linear equations. > I'm using boundary element method, so the matrix type is dense (or > mpidense). I'm using MPICH2, I found that the convergence is fast if I only > use one computer node; and much more slower if I use two or more nodes. I'm > interested in why this happen, and how can I improve the convergence > performance when I use multi-nodes. > > > > Thanks a lot. > > > > ?? > > ?? > > ????????? > > ?????????? > > ???????????10????9?? ?100193? > > > > Best, > > Regards, > > Zhang Ji, PhD student > > Beijing Computational Science Research Center > > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian > District, Beijing 100193, China > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 6 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 3.737578s: _00001/00001_b=0.000100: calculate boundary condation use: 1.798243s KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 213.683270s, with residual norm 4.621066e-01 KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 208.180939s, with residual norm 4.661823e-01 KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 194.251886s, with residual norm 1.047436e+00 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn10 with 6 processors, by zhangji Tue Jun 13 18:11:46 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 6.236e+02 1.00013 6.235e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 5.073e+11 1.00081 5.070e+11 3.042e+12 Flops/sec: 8.136e+08 1.00092 8.131e+08 4.879e+09 MPI Messages: 4.200e+01 2.33333 3.000e+01 1.800e+02 MPI Message Lengths: 2.520e+02 2.33333 6.000e+00 1.080e+03 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 6.2355e+02 100.0% 3.0421e+12 100.0% 1.800e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 1.7038e+02 1.7 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20 0 0 0 31 20 0 0 0 31 12 VecNorm 3102 1.0 7.8933e+01 1.2 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11 0 0 0 33 11 0 0 0 33 2 VecScale 3102 1.0 3.2920e-02 3.1 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2085 VecCopy 3204 1.0 1.1629e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 1.1544e-0212.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 1.8733e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4749 VecMAXPY 3102 1.0 3.3990e-01 2.0 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6406 VecAssemblyBegin 9 1.0 1.8613e-03 2.0 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 3.7193e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 8.3257e+01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 10 0 0 0 33 10 0 0 0 33 0 VecNormalize 3102 1.0 7.8971e+01 1.2 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 11 0 0 0 33 11 0 0 0 33 3 MatMult 3105 1.0 4.4362e+02 1.2 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 67100 0 0 33 67100 0 0 33 6848 MatAssemblyBegin 2 1.0 7.7588e-0211.5 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 1.7595e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 6.1056e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 1.1835e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 1.7062e+02 1.7 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03 20 0 0 0 31 20 0 0 0 31 24 KSPSetUp 3 1.0 3.8290e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 6.1546e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 99100 0 0 96 99100 0 0 96 4938 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 78244 0. Vector 205 205 6265288 0. Vector Scatter 26 26 161064 0. Matrix 9 9 653332008 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 191044 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. ======================================================================================================================== Average time to get PetscTime(): 0. 
Average time for MPI_Barrier(): 6.19888e-06 Average time for zero size MPI_Send(): 2.30471e-06 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl ----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 
0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 4 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 4.977263s: _00001/00001_b=0.000100: calculate boundary condation use: 1.769788s KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 165.061162s, with residual norm 5.813354e-01 KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 196.539286s, with residual norm 5.397962e-01 KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 180.240947s, with residual norm 9.503432e-01 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn10 with 4 processors, by zhangji Tue Jun 13 18:01:22 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 5.506e+02 1.00007 5.506e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 7.605e+11 1.00000 7.605e+11 3.042e+12 Flops/sec: 1.381e+09 1.00007 1.381e+09 5.525e+09 MPI Messages: 3.000e+01 1.66667 2.700e+01 1.080e+02 MPI Message Lengths: 1.800e+02 1.66667 6.000e+00 6.480e+02 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.5060e+02 100.0% 3.0421e+12 100.0% 1.080e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 1.0788e+02 1.9 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03 14 0 0 0 31 14 0 0 0 31 19 VecNorm 3102 1.0 5.0609e+01 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 9 0 0 0 33 9 0 0 0 33 3 VecScale 3102 1.0 1.3757e-02 1.1 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4990 VecCopy 3204 1.0 9.1627e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 1.4656e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 2.9745e-03 1.6 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2991 VecMAXPY 3102 1.0 4.4902e-01 1.6 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4849 VecAssemblyBegin 9 1.0 1.2916e-01 1.6 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 3.3617e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 5.2983e+01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 7 0 0 0 33 7 0 0 0 33 0 VecNormalize 3102 1.0 5.0635e+01 1.1 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03 9 0 0 0 33 9 0 0 0 33 4 MatMult 3105 1.0 4.3607e+02 1.1 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 76100 0 0 33 76100 0 0 33 6966 MatAssemblyBegin 2 1.0 2.7158e-013390.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 1.6093e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 3.9665e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 9.2988e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 1.0832e+02 1.9 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 14 0 0 0 31 14 0 0 0 31 38 KSPSetUp 3 1.0 3.7909e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 5.4123e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 98100 0 0 96 98100 0 0 96 5615 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 112628 0. Vector 205 205 8343112 0. Vector Scatter 26 26 229832 0. Matrix 9 9 979454472 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 225428 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 2.6226e-06 Average time for zero size MPI_Send(): 2.26498e-06 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl 
----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 2 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 8.694003s: _00001/00001_b=0.000100: calculate boundary condation use: 1.975384s KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 141.490945s, with residual norm 6.719300e-01 KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 142.746243s, with residual norm 6.782443e-01 KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 142.042608s, with residual norm 7.223828e-01 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn10 with 2 processors, by zhangji Tue Jun 13 17:52:11 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 4.388e+02 1.00006 4.387e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 1.521e+12 1.00000 1.521e+12 3.042e+12 Flops/sec: 3.467e+09 1.00006 3.467e+09 6.934e+09 MPI Messages: 1.200e+01 1.00000 1.200e+01 2.400e+01 MPI Message Lengths: 1.080e+02 1.00000 9.000e+00 2.160e+02 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.3875e+02 100.0% 3.0421e+12 100.0% 2.400e+01 100.0% 9.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 2.6931e+01 4.4 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 76 VecNorm 3102 1.0 1.4946e+00 2.1 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 92 VecScale 3102 1.0 1.9959e-02 1.4 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3439 VecCopy 3204 1.0 8.3234e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 2.2550e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 2.7293e-01 1.2 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33 VecMAXPY 3102 1.0 5.8037e-01 1.7 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3752 VecAssemblyBegin 9 1.0 1.0383e-0215.2 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 2.5272e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 8.7240e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 0 VecNormalize 3102 1.0 1.5182e+00 2.1 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03 0 0 0 0 33 0 0 0 0 33 136 MatMult 3105 1.0 4.1876e+02 1.1 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 93100 0 0 33 93100 0 0 33 7254 MatAssemblyBegin 2 1.0 5.4870e-02676.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 1.6594e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.1323e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 8.3565e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 2.7499e+01 4.3 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 149 KSPSetUp 3 1.0 3.8886e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 4.2584e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 97100 0 0 96 97100 0 0 96 7137 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 215892 0. Vector 205 205 14583280 0. Vector Scatter 26 26 436360 0. Matrix 9 9 1958884008 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 328692 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. 
======================================================================================================================== Average time to get PetscTime(): 1.19209e-07 Average time for MPI_Barrier(): 1.7643e-06 Average time for zero size MPI_Send(): 3.45707e-06 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl 
----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 6 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 3.769214s: _00001/00001_b=0.000100: calculate boundary condation use: 1.710873s KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 42.929705s, with residual norm 4.621066e-01 KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 43.367914s, with residual norm 4.661823e-01 KSP Object: 6 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 6 MPI processes type: none linear system matrix = precond matrix: Mat Object: 6 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 43.108877s, with residual norm 1.047436e+00 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn5 with 6 processors, by zhangji Tue Jun 13 17:44:52 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 1.367e+02 1.00029 1.367e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 5.073e+11 1.00081 5.070e+11 3.042e+12 Flops/sec: 3.713e+09 1.00111 3.710e+09 2.226e+10 MPI Messages: 4.200e+01 2.33333 3.000e+01 1.800e+02 MPI Message Lengths: 2.520e+02 2.33333 6.000e+00 1.080e+03 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.3666e+02 100.0% 3.0421e+12 100.0% 1.800e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 8.5456e+00 5.3 3.41e+08 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 239 VecNorm 3102 1.0 1.0225e+00 1.7 2.29e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 134 VecScale 3102 1.0 9.2647e-03 1.5 1.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 7409 VecCopy 3204 1.0 3.4087e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 1.1790e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 1.6315e-03 1.4 1.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5452 VecMAXPY 3102 1.0 9.3553e-02 1.1 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23274 VecAssemblyBegin 9 1.0 1.2650e-02 2.2 0.00e+00 0.0 1.8e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 4.8575e-022122.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 7.4042e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 5 0 0 0 33 5 0 0 0 33 0 VecNormalize 3102 1.0 1.0357e+00 1.7 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 199 MatMult 3105 1.0 1.2659e+02 1.1 5.07e+11 1.0 0.0e+00 0.0e+00 3.1e+03 90100 0 0 33 90100 0 0 33 23996 MatAssemblyBegin 2 1.0 2.0758e-01138.7 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 1.9758e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.5299e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 3.6064e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 8.6466e+00 5.1 6.82e+08 1.0 0.0e+00 0.0e+00 3.0e+03 4 0 0 0 31 4 0 0 0 31 473 KSPSetUp 3 1.0 1.5187e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 1.2925e+02 1.0 5.07e+11 1.0 0.0e+00 0.0e+00 9.2e+03 95100 0 0 96 95100 0 0 96 23515 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 78244 0. Vector 205 205 6265288 0. Vector Scatter 26 26 161064 0. Matrix 9 9 653332008 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 191044 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. ======================================================================================================================== Average time to get PetscTime(): 0. 
Average time for MPI_Barrier(): 0.000177193 Average time for zero size MPI_Send(): 3.89814e-05 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl ----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 
0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 4 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 6.052226s: _00001/00001_b=0.000100: calculate boundary condation use: 1.826128s KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 52.616490s, with residual norm 5.813354e-01 KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 52.413213s, with residual norm 5.397962e-01 KSP Object: 4 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: none linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 52.495871s, with residual norm 9.503432e-01 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn5 with 4 processors, by zhangji Tue Jun 13 17:42:35 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 1.675e+02 1.00001 1.675e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 7.605e+11 1.00000 7.605e+11 3.042e+12 Flops/sec: 4.541e+09 1.00001 4.541e+09 1.816e+10 MPI Messages: 3.000e+01 1.66667 2.700e+01 1.080e+02 MPI Message Lengths: 1.800e+02 1.66667 6.000e+00 6.480e+02 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.6749e+02 100.0% 3.0421e+12 100.0% 1.080e+02 100.0% 6.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 3.3006e+0156.7 5.11e+08 1.0 0.0e+00 0.0e+00 3.0e+03 9 0 0 0 31 9 0 0 0 31 62 VecNorm 3102 1.0 1.4346e+00 3.8 3.43e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 96 VecScale 3102 1.0 1.0346e-02 1.2 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6634 VecCopy 3204 1.0 3.8601e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 1.1497e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 1.7583e-03 1.1 2.22e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5059 VecMAXPY 3102 1.0 1.4580e-01 1.1 5.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14934 VecAssemblyBegin 9 1.0 9.8622e-03 2.9 0.00e+00 0.0 1.1e+02 6.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 3.1948e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 5.3085e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 3 0 0 0 33 3 0 0 0 33 0 VecNormalize 3102 1.0 1.4464e+00 3.7 5.15e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 142 MatMult 3105 1.0 1.5605e+02 1.3 7.59e+11 1.0 0.0e+00 0.0e+00 3.1e+03 84100 0 0 33 84100 0 0 33 19466 MatAssemblyBegin 2 1.0 9.4833e-0148.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 9.2912e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.7860e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 4.0367e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 3.3157e+0146.5 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 9 0 0 0 31 9 0 0 0 31 123 KSPSetUp 3 1.0 8.2684e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 1.5735e+02 1.0 7.60e+11 1.0 0.0e+00 0.0e+00 9.2e+03 94100 0 0 96 94100 0 0 96 19315 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 112628 0. Vector 205 205 8343112 0. Vector Scatter 26 26 229832 0. Matrix 9 9 979454472 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 225428 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. 
======================================================================================================================== Average time to get PetscTime(): 1.19209e-07 Average time for MPI_Barrier(): 7.45773e-05 Average time for zero size MPI_Send(): 6.04987e-05 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl 
----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 2 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 30.575036s: _00001/00001_b=0.000100: calculate boundary condation use: 3.463875s KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 84.212435s, with residual norm 6.719300e-01 KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 85.153371s, with residual norm 6.782443e-01 KSP Object: 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 2 MPI processes type: none linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 85.246724s, with residual norm 7.223828e-01 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn5 with 2 processors, by zhangji Tue Jun 13 17:39:46 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 2.908e+02 1.00002 2.908e+02 Objects: 4.130e+02 1.00000 4.130e+02 Flops: 1.521e+12 1.00000 1.521e+12 3.042e+12 Flops/sec: 5.231e+09 1.00002 5.231e+09 1.046e+10 MPI Messages: 1.200e+01 1.00000 1.200e+01 2.400e+01 MPI Message Lengths: 1.080e+02 1.00000 9.000e+00 2.160e+02 MPI Reductions: 9.541e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.9076e+02 100.0% 3.0421e+12 100.0% 2.400e+01 100.0% 9.000e+00 100.0% 9.540e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 7.5770e+01146.1 1.02e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13 0 0 0 31 13 0 0 0 31 27 VecNorm 3102 1.0 2.5821e+00 4.9 6.86e+07 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 53 VecScale 3102 1.0 1.5304e-02 1.1 3.43e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4485 VecCopy 3204 1.0 6.6598e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 123 1.0 1.7567e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 1.9052e-02 2.3 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 VecMAXPY 3102 1.0 3.7953e-01 1.4 1.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5737 VecAssemblyBegin 9 1.0 7.8998e-03 3.9 0.00e+00 0.0 2.4e+01 9.0e+00 2.7e+01 0 0100100 0 0 0100100 0 0 VecAssemblyEnd 9 1.0 2.5034e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3114 1.0 3.4727e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 0 VecNormalize 3102 1.0 2.6013e+00 4.8 1.03e+08 1.0 0.0e+00 0.0e+00 3.1e+03 1 0 0 0 33 1 0 0 0 33 79 MatMult 3105 1.0 2.5316e+02 1.4 1.52e+12 1.0 0.0e+00 0.0e+00 3.1e+03 74100 0 0 33 74100 0 0 33 11999 MatAssemblyBegin 2 1.0 2.0132e+01910.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatAssemblyEnd 2 1.0 5.2810e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 1.5538e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 6.7246e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 7.6143e+0197.9 2.04e+09 1.0 0.0e+00 0.0e+00 3.0e+03 13 0 0 0 31 13 0 0 0 31 54 KSPSetUp 3 1.0 7.3075e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 2.5436e+02 1.0 1.52e+12 1.0 0.0e+00 0.0e+00 9.2e+03 87100 0 0 96 87100 0 0 96 11949 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 51 51 39576 0. IS L to G Mapping 15 15 215892 0. Vector 205 205 14583280 0. Vector Scatter 26 26 436360 0. Matrix 9 9 1958884008 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 328692 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. ======================================================================================================================== Average time to get PetscTime(): 0. 
Average time for MPI_Barrier(): 3.65734e-05 Average time for zero size MPI_Send(): 6.1512e-05 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl ----------------------------------------- -------------- next part -------------- Case information: pipe length: 2.000000, pipe radius: 1.000000 delta length of pipe is 
0.050000, epsilon of pipe is 2.000000 threshold of seriers is 30 b: 1 numbers are evenly distributed within the range [0.000100, 0.900000] create matrix method: pf_stokesletsInPipe solve method: gmres, precondition method: none output file headle: force_pipe MPI size: 1 Stokeslets in pipe prepare, contain 7376 nodes create matrix use 80.827850s: _00001/00001_b=0.000100: calculate boundary condation use: 3.421076s KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqdense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u1: solve matrix equation use: 88.631310s, with residual norm 9.884635e-04 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqdense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u2: solve matrix equation use: 88.855811s, with residual norm 4.144572e-04 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqdense rows=22128, cols=22128 total: nonzeros=489648384, allocated nonzeros=489648384 total number of mallocs used during MatSetValues calls =0 _00001/00001_u3: solve matrix equation use: 88.673738s, with residual norm 4.864481e-03 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- force_pipe.py on a linux-mpich-opblas named cn5 with 1 processor, by zhangji Tue Jun 13 17:34:55 2017 Using Petsc Release Version 3.7.6, Apr, 24, 2017 Max Max/Min Avg Total Time (sec): 3.521e+02 1.00000 3.521e+02 Objects: 4.010e+02 1.00000 4.010e+02 Flops: 3.042e+12 1.00000 3.042e+12 3.042e+12 Flops/sec: 8.640e+09 1.00000 8.640e+09 8.640e+09 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.5212e+02 100.0% 3.0421e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecMDot 3000 1.0 7.4338e-01 1.0 2.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2750 VecNorm 3102 1.0 3.6990e-02 1.0 1.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3711 VecScale 3102 1.0 2.4405e-02 1.0 6.86e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2813 VecCopy 3204 1.0 1.1260e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 173 1.0 4.3569e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 201 1.0 1.3273e-02 1.0 8.90e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 670 VecMAXPY 3102 1.0 5.2372e-01 1.0 2.18e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4158 VecAssemblyBegin 9 1.0 4.3392e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 9 1.0 4.2915e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 9 1.0 2.2984e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 3102 1.0 6.4425e-02 1.0 2.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3196 MatMult 3105 1.0 2.6468e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 75100 0 0 0 75100 0 0 0 11477 MatAssemblyBegin 2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 3 1.0 6.1703e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 3102 1.0 1.1186e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 3000 1.0 1.2434e+00 1.0 4.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3289 KSPSetUp 3 1.0 5.0569e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 3 1.0 2.6590e+02 1.0 3.04e+12 1.0 0.0e+00 0.0e+00 0.0e+00 76100 0 0 0 76100 0 0 0 11430 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 0 0 0. Index Set 47 47 36472 0. IS L to G Mapping 15 15 422420 0. Vector 201 201 26873904 0. Vector Scatter 24 24 15744 0. Matrix 7 7 3917737368 0. Preconditioner 3 3 2448 0. Krylov Solver 3 3 55200 0. Distributed Mesh 25 25 535220 0. Star Forest Bipartite Graph 50 50 42176 0. Discrete System 25 25 21600 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 #PETSc Option Table entries: -b0 1e-4 -b1 0.9 -dp 0.05 -ep 2 -ksp_max_it 1000 -ksp_view -log_view -lp 2 -nb 1 -th 30 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-blas-lapack-lib=/public/software/OpenBLAS/lib/libopenblas.a --with-mpi-dir=/home/zhangji/python/mpich-3.2/build --with-hdf5-dir=/public/software/mathlib/hdf5/1.8.12/gnu/ PETSC_DIR=/home/zhangji/python/petsc-3.7.6 PETSC_ARCH=linux-mpich-opblas --download-metis=/public/sourcecode/petsc_gnu/metis-5.1.0.tar.gz --download-ptscotch=/home/zhangji/python/scotch_6.0.4.tar.gz --download-pastix=/home/zhangji/python/pastix_5.2.3.tar.bz2 --with-debugging=no --CFLAGS=-O3 --CXXFLAGS=-O3 --FFLAGS=-O3 ----------------------------------------- Libraries compiled on Sat Jun 10 00:26:59 2017 on cn11 Machine characteristics: Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final Using PETSc directory: /home/zhangji/python/petsc-3.7.6 Using PETSc arch: linux-mpich-opblas ----------------------------------------- Using C compiler: /home/zhangji/python/mpich-3.2/build/bin/mpicc -O3 -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /home/zhangji/python/mpich-3.2/build/bin/mpif90 -O3 -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/include -I/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/include -I/public/software/mathlib/hdf5/1.8.12/gnu/include -I/home/zhangji/python/mpich-3.2/build/include ----------------------------------------- Using C linker: /home/zhangji/python/mpich-3.2/build/bin/mpicc Using Fortran linker: /home/zhangji/python/mpich-3.2/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -lpetsc -Wl,-rpath,/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -L/home/zhangji/python/petsc-3.7.6/linux-mpich-opblas/lib -Wl,-rpath,/public/software/OpenBLAS/lib -L/public/software/OpenBLAS/lib -Wl,-rpath,/public/software/mathlib/hdf5/1.8.12/gnu/lib -L/public/software/mathlib/hdf5/1.8.12/gnu/lib -Wl,-rpath,/home/zhangji/python/mpich-3.2/build/lib -L/home/zhangji/python/mpich-3.2/build/lib -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/compiler/lib/intel64 -Wl,-rpath,/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -L/public/software/compiler/intel/composer_xe_2015.2.164/mkl/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/home/zhangji/python/petsc-3.7.6 -L/home/zhangji/python/petsc-3.7.6 -lmetis -lpastix -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -lX11 -lssl -lcrypto -lmpifort -lifport -lifcore -lpthread -lmpicxx -lrt -lm -lrt -lm -lpthread -lz -ldl -lmpi -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl ----------------------------------------- From G.Vaz at marin.nl Tue Jun 13 08:34:41 2017 From: G.Vaz 
at marin.nl (Vaz, Guilherme) Date: Tue, 13 Jun 2017 13:34:41 +0000 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: References: <1497343731587.65032@marin.nl>, Message-ID: <1497360881294.8614@marin.nl> Dear Matthew, Thanks. It went further, but now I get: TESTING: configureMPIEXEC from config.packages.MPI(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py:143) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Must give a default value for known-mpi-shared-libraries since executables cannot be run ******************************************************************************* Last lines from the log: File "./config/configure.py", line 405, in petsc_configure framework.configure(out = sys.stdout) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1090, in configure self.processChildren() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1079, in processChildren self.serialEvaluation(self.childGraph) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1060, in serialEvaluation child.configure() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/package.py", line 791, in configure self.executeTest(self.checkSharedLibrary) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 126, in executeTest ret = test(*args,**kargs) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py", line 135, in checkSharedLibrary self.shared = self.libraries.checkShared('#include \n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = self.checkPackageLink,libraries = self.lib, defaultArg = 'known-mpi-shared-libraries', ex ecutor = self.mpiexec) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/libraries.py", line 471, in checkShared if self.checkRun(defaultIncludes, body, defaultArg = defaultArg, executor = executor): File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 628, in checkRun (output, returnCode) = self.outputRun(includes, body, cleanup, defaultArg, executor) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 598, in outputRun raise ConfigureSetupError('Must give a default value for '+defaultOutputArg+' since executables cannot be run') ?Any ideas? Something related with --with-shared-libraries=0 \ --with-batch=1 \ ?The first I set because it was in the cray example, and the second because aprun (the mpiexec of Cray) is not available in the frontend. Thanks, Guilherme V. dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 ________________________________ From: Matthew Knepley Sent: Tuesday, June 13, 2017 2:34 PM To: Vaz, Guilherme Cc: PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme > wrote: Dear all, I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the usual Cray wrappers for Intel compilers, with some chosen external packages and MKL libraries. I read some threads in the mailing list about this, and I tried the petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. After trying this (please abstract from my own env vars), CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ --with-cc=cc \ --with-cxx=CC \ --with-fc=ftn \ --with-clib-autodetect=0 \ --with-cxxlib-autodetect=0 \ --with-fortranlib-autodetect=0 \ --COPTFLAGS=-fast -mp \ --CXXOPTFLAGS=-fast -mp \ --FOPTFLAGS=-fast -mp \ --with-shared-libraries=0 \ --with-batch=1 \ --with-x=0 \ --with-mpe=0 \ --with-debugging=0 \ --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ --with-blas-lapack-dir=$BLASDIR \ --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ --with-external-packages-dir=$INSTALL_DIR \ --with-ssl=0 " I get the following error: TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Fortran could not successfully link C++ objects ******************************************************************************* Does it ring a bell? Any tips? You turned off autodetection, so it will not find libstdc++. That either has to be put in LIBS, or I would recommend --with-cxx=0 since nothing you have there requires C++. Thanks, Matt Thanks, Guilherme V. dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: Maritime Safety seminar, September 12, Singapore -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageb0ee12.PNG Type: image/png Size: 293 bytes Desc: imageb0ee12.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1bd020.PNG Type: image/png Size: 253 bytes Desc: image1bd020.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imaged8f08c.PNG Type: image/png Size: 331 bytes Desc: imaged8f08c.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: imagedacd80.PNG Type: image/png Size: 333 bytes Desc: imagedacd80.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image628296.PNG Type: image/png Size: 293 bytes Desc: image628296.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagecdd6dd.PNG Type: image/png Size: 331 bytes Desc: imagecdd6dd.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageae7ad3.PNG Type: image/png Size: 333 bytes Desc: imageae7ad3.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image9aa6ed.PNG Type: image/png Size: 253 bytes Desc: image9aa6ed.PNG URL: From stefano.zampini at gmail.com Tue Jun 13 08:42:59 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 13 Jun 2017 15:42:59 +0200 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: <1497360881294.8614@marin.nl> References: <1497343731587.65032@marin.nl> <1497360881294.8614@marin.nl> Message-ID: Guilherme, here is my debug configuration (with shared libraries) in PETSc on a XC40 '--CFLAGS=-mkl=sequential -g -O0 ', '--CXXFLAGS=-mkl=sequential -g -O0 ', '--FFLAGS=-mkl=sequential -g -O0 -lstdc++', '--LDFLAGS=-dynamic', '--download-metis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-metis=1', '--download-parmetis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-parmetis=1', '--known-bits-per-byte=8', '--known-has-attribute-aligned=1', '--known-level1-dcache-assoc=8', '--known-level1-dcache-linesize=64', '--known-level1-dcache-size=32768', '--known-memcmp-ok=1', '--known-mpi-c-double-complex=1', '--known-mpi-int64_t=1', '--known-mpi-long-double=1', '--known-mpi-shared-libraries=0', '--known-sdot-returns-double=0', '--known-sizeof-MPI_Comm=4', '--known-sizeof-MPI_Fint=4', '--known-sizeof-char=1', '--known-sizeof-double=8', '--known-sizeof-float=4', '--known-sizeof-int=4', '--known-sizeof-long-long=8', '--known-sizeof-long=8', '--known-sizeof-short=2', '--known-sizeof-size_t=8', '--known-sizeof-void-p=8', '--known-snrm2-returns-double=0', '--with-ar=ar', '--with-batch=1', '--with-cc=/opt/cray/craype/2.4.2/bin/cc', '--with-clib-autodetect=0', '--with-cmake=/home/zampins/local/bin/cmake', '--with-cxx=/opt/cray/craype/2.4.2/bin/CC', '--with-cxxlib-autodetect=0', '--with-debugging=1', '--with-dependencies=0', '--with-etags=0', '--with-fc=/opt/cray/craype/2.4.2/bin/ftn', '--with-fortran-datatypes=0', '--with-fortran-interfaces=0', '--with-fortranlib-autodetect=0', '--with-pthread=0', '--with-ranlib=ranlib', '--with-scalar-type=real', '--with-shared-ld=ar', '--with-shared-libraries=1', 'PETSC_ARCH=arch-intel-debug', 2017-06-13 15:34 GMT+02:00 Vaz, Guilherme : > Dear Matthew, > > > Thanks. 
It went further, but now I get: > TESTING: configureMPIEXEC from config.packages.MPI(/zhome/ > academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/ > petsc-3.7.5/config/BuildSystem/config/packages/MPI.py:143) > ************************************************************ > ******************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > ------------------------------------------------------------ > ------------------- > Must give a default value for known-mpi-shared-libraries since executables > cannot be run > ************************************************************ > ******************* > > Last lines from the log: > File "./config/configure.py", line 405, in petsc_configure > framework.configure(out = sys.stdout) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line > 1090, in configure > self.processChildren() > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line > 1079, in processChildren > self.serialEvaluation(self.childGraph) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line > 1060, in serialEvaluation > child.configure() > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/package.py", line 791, > in configure > self.executeTest(self.checkSharedLibrary) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 126, in > executeTest > ret = test(*args,**kargs) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py", line > 135, in checkSharedLibrary > self.shared = self.libraries.checkShared('#include > \n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = > self.checkPackageLink,libraries = self.lib, defaultArg = > 'known-mpi-shared-libraries', ex > ecutor = self.mpiexec) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/libraries.py", line 471, > in checkShared > if self.checkRun(defaultIncludes, body, defaultArg = defaultArg, > executor = executor): > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 628, in > checkRun > (output, returnCode) = self.outputRun(includes, body, cleanup, > defaultArg, executor) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/ > Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 598, in > outputRun > raise ConfigureSetupError('Must give a default value for > '+defaultOutputArg+' since executables cannot be run') > > ?Any ideas? Something related with > --with-shared-libraries=0 \ > --with-batch=1 \ > ?The first I set because it was in the cray example, and the second > because aprun (the mpiexec of Cray) is not available in the frontend. > > Thanks, > > Guilherme V. > > dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | > Research & Development > MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | > www.marin.nl > > [image: LinkedIn] [image: > YouTube] [image: Twitter] > [image: Facebook] > > MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 > > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, June 13, 2017 2:34 PM > *To:* Vaz, Guilherme > *Cc:* PETSc > *Subject:* Re: [petsc-users] PETSC on Cray Hazelhen > > On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme wrote: > >> Dear all, >> >> I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the >> usual Cray wrappers for Intel compilers, with some chosen external packages >> and MKL libraries. >> >> I read some threads in the mailing list about this, and I tried the >> petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. >> After trying this (please abstract from my own env vars), >> CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ >> --with-cc=cc \ >> --with-cxx=CC \ >> --with-fc=ftn \ >> --with-clib-autodetect=0 \ >> --with-cxxlib-autodetect=0 \ >> --with-fortranlib-autodetect=0 \ >> --COPTFLAGS=-fast -mp \ >> --CXXOPTFLAGS=-fast -mp \ >> --FOPTFLAGS=-fast -mp \ >> --with-shared-libraries=0 \ >> --with-batch=1 \ >> --with-x=0 \ >> --with-mpe=0 \ >> --with-debugging=0 \ >> --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ >> --with-blas-lapack-dir=$BLASDIR \ >> --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ >> --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ >> --with-external-packages-dir=$INSTALL_DIR \ >> --with-ssl=0 " >> >> >> I get the following error: >> >> >> TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academ >> ic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3. >> 7.5/config/BuildSystem/config/compilers.py:1097) >> ************************************************************ >> ******************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >> details): >> ------------------------------------------------------------ >> ------------------- >> Fortran could not successfully link C++ objects >> ************************************************************ >> ******************* >> >> Does it ring a bell? Any tips? >> > > You turned off autodetection, so it will not find libstdc++. That either > has to be put in LIBS, or I would recommend > > --with-cxx=0 > > since nothing you have there requires C++. > > Thanks, > > Matt > > >> Thanks, >> Guilherme V. >> >> dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | >> Research & Development >> MARIN | T +31 317 49 33 25 <+31%20317%20493%20325> | M +31 621 13 11 97 >> | G.Vaz at marin.nl | www.marin.nl >> >> [image: LinkedIn] [image: >> YouTube] [image: Twitter] >> [image: Facebook] >> >> MARIN news: Maritime Safety seminar, September 12, Singapore >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageae7ad3.PNG Type: image/png Size: 333 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image628296.PNG Type: image/png Size: 293 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imaged8f08c.PNG Type: image/png Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image9aa6ed.PNG Type: image/png Size: 253 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagedacd80.PNG Type: image/png Size: 333 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1bd020.PNG Type: image/png Size: 253 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagecdd6dd.PNG Type: image/png Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageb0ee12.PNG Type: image/png Size: 293 bytes Desc: not available URL: From G.Vaz at marin.nl Tue Jun 13 09:01:36 2017 From: G.Vaz at marin.nl (Vaz, Guilherme) Date: Tue, 13 Jun 2017 14:01:36 +0000 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: References: <1497343731587.65032@marin.nl> <1497360881294.8614@marin.nl>, Message-ID: <1497362496576.40011@marin.nl> Stefano/Mathew, Do I need all this :-)? And I dont want a debug version, just an optimized release version. Thus I understand the -g -O0 flags for debug, but the rest I am not sure what is for debug and for a release version. Sorry... I am also kind of confused on the shared libraries issue, '--known-mpi-shared-libraries=0', '--with-shared-libraries=1', on the static vs dynamic linking (I thought in XC-40 we had to compile everything statically, --LDFLAGS=-dynamic and on the FFFLAG: --FFLAGS=-mkl=sequential -g -O0 -lstdc++ Is this to be used with Intel MKL libraries? Thanks for the help, you both. Guilherme V. dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 ________________________________ From: Stefano Zampini Sent: Tuesday, June 13, 2017 3:42 PM To: Vaz, Guilherme Cc: Matthew Knepley; PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen Guilherme, here is my debug configuration (with shared libraries) in PETSc on a XC40 '--CFLAGS=-mkl=sequential -g -O0 ', '--CXXFLAGS=-mkl=sequential -g -O0 ', '--FFLAGS=-mkl=sequential -g -O0 -lstdc++', '--LDFLAGS=-dynamic', '--download-metis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-metis=1', '--download-parmetis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-parmetis=1', '--known-bits-per-byte=8', '--known-has-attribute-aligned=1', '--known-level1-dcache-assoc=8', '--known-level1-dcache-linesize=64', '--known-level1-dcache-size=32768', '--known-memcmp-ok=1', '--known-mpi-c-double-complex=1', '--known-mpi-int64_t=1', '--known-mpi-long-double=1', '--known-mpi-shared-libraries=0', '--known-sdot-returns-double=0', '--known-sizeof-MPI_Comm=4', '--known-sizeof-MPI_Fint=4', '--known-sizeof-char=1', '--known-sizeof-double=8', '--known-sizeof-float=4', '--known-sizeof-int=4', '--known-sizeof-long-long=8', '--known-sizeof-long=8', '--known-sizeof-short=2', '--known-sizeof-size_t=8', '--known-sizeof-void-p=8', '--known-snrm2-returns-double=0', '--with-ar=ar', '--with-batch=1', '--with-cc=/opt/cray/craype/2.4.2/bin/cc', '--with-clib-autodetect=0', '--with-cmake=/home/zampins/local/bin/cmake', '--with-cxx=/opt/cray/craype/2.4.2/bin/CC', '--with-cxxlib-autodetect=0', '--with-debugging=1', '--with-dependencies=0', '--with-etags=0', '--with-fc=/opt/cray/craype/2.4.2/bin/ftn', '--with-fortran-datatypes=0', '--with-fortran-interfaces=0', '--with-fortranlib-autodetect=0', '--with-pthread=0', '--with-ranlib=ranlib', '--with-scalar-type=real', '--with-shared-ld=ar', '--with-shared-libraries=1', 'PETSC_ARCH=arch-intel-debug', 2017-06-13 15:34 GMT+02:00 Vaz, Guilherme >: Dear Matthew, Thanks. 
It went further, but now I get: TESTING: configureMPIEXEC from config.packages.MPI(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py:143) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Must give a default value for known-mpi-shared-libraries since executables cannot be run ******************************************************************************* Last lines from the log: File "./config/configure.py", line 405, in petsc_configure framework.configure(out = sys.stdout) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1090, in configure self.processChildren() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1079, in processChildren self.serialEvaluation(self.childGraph) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1060, in serialEvaluation child.configure() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/package.py", line 791, in configure self.executeTest(self.checkSharedLibrary) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 126, in executeTest ret = test(*args,**kargs) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py", line 135, in checkSharedLibrary self.shared = self.libraries.checkShared('#include \n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = self.checkPackageLink,libraries = self.lib, defaultArg = 'known-mpi-shared-libraries', ex ecutor = self.mpiexec) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/libraries.py", line 471, in checkShared if self.checkRun(defaultIncludes, body, defaultArg = defaultArg, executor = executor): File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 628, in checkRun (output, returnCode) = self.outputRun(includes, body, cleanup, defaultArg, executor) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 598, in outputRun raise ConfigureSetupError('Must give a default value for '+defaultOutputArg+' since executables cannot be run') ?Any ideas? Something related with --with-shared-libraries=0 \ --with-batch=1 \ ?The first I set because it was in the cray example, and the second because aprun (the mpiexec of Cray) is not available in the frontend. Thanks, Guilherme V. dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 ________________________________ From: Matthew Knepley > Sent: Tuesday, June 13, 2017 2:34 PM To: Vaz, Guilherme Cc: PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme > wrote: Dear all, I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the usual Cray wrappers for Intel compilers, with some chosen external packages and MKL libraries. I read some threads in the mailing list about this, and I tried the petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. After trying this (please abstract from my own env vars), CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ --with-cc=cc \ --with-cxx=CC \ --with-fc=ftn \ --with-clib-autodetect=0 \ --with-cxxlib-autodetect=0 \ --with-fortranlib-autodetect=0 \ --COPTFLAGS=-fast -mp \ --CXXOPTFLAGS=-fast -mp \ --FOPTFLAGS=-fast -mp \ --with-shared-libraries=0 \ --with-batch=1 \ --with-x=0 \ --with-mpe=0 \ --with-debugging=0 \ --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ --with-blas-lapack-dir=$BLASDIR \ --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ --with-external-packages-dir=$INSTALL_DIR \ --with-ssl=0 " I get the following error: TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Fortran could not successfully link C++ objects ******************************************************************************* Does it ring a bell? Any tips? You turned off autodetection, so it will not find libstdc++. That either has to be put in LIBS, or I would recommend --with-cxx=0 since nothing you have there requires C++. Thanks, Matt Thanks, Guilherme V. dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: Maritime Safety seminar, September 12, Singapore -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageae7ad3.PNG Type: image/png Size: 333 bytes Desc: imageae7ad3.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image628296.PNG Type: image/png Size: 293 bytes Desc: image628296.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imaged8f08c.PNG Type: image/png Size: 331 bytes Desc: imaged8f08c.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image9aa6ed.PNG Type: image/png Size: 253 bytes Desc: image9aa6ed.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagedacd80.PNG Type: image/png Size: 333 bytes Desc: imagedacd80.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image1bd020.PNG Type: image/png Size: 253 bytes Desc: image1bd020.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagecdd6dd.PNG Type: image/png Size: 331 bytes Desc: imagecdd6dd.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageb0ee12.PNG Type: image/png Size: 293 bytes Desc: imageb0ee12.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image067d96.PNG Type: image/png Size: 293 bytes Desc: image067d96.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image0c3f53.PNG Type: image/png Size: 331 bytes Desc: image0c3f53.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image887afd.PNG Type: image/png Size: 333 bytes Desc: image887afd.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image2a2a45.PNG Type: image/png Size: 253 bytes Desc: image2a2a45.PNG URL: From stefano.zampini at gmail.com Tue Jun 13 09:10:04 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 13 Jun 2017 16:10:04 +0200 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: <1497362496576.40011@marin.nl> References: <1497343731587.65032@marin.nl> <1497360881294.8614@marin.nl> <1497362496576.40011@marin.nl> Message-ID: Cray machines can be used with shared libraries, it?s not like the earlier versions of BG/Q Yes, you need almost all of this. If you run with configure with the option ?with-batch=1, will then generate something like the one I have sent you. ?with-shared-libraries is a PETSc configuration, i.e. you will create libpetsc.so ?LDFLAGS=-dynamic is used to link dynamically a PETSc executable ?known-mpi-shared-libraries=0 will use a statically linked MPI You can remove the first two options listed above if you would like to have a static version of PETSC. You may want to refine the options for optimized builds, i.e. add your favorite COPTFLAGS and remove ?with-debugging=1 Another thing you can do. Load any of the PETSc modules on the XC40, and then look at the file $PETSC_DIR/include/petscconfiginfo.h > On Jun 13, 2017, at 4:01 PM, Vaz, Guilherme wrote: > > Stefano/Mathew, > > Do I need all this :-)? > And I dont want a debug version, just an optimized release version. Thus I understand the -g -O0 flags for debug, but the rest I am not sure what is for debug and for a release version. Sorry... > I am also kind of confused on the shared libraries issue, > '--known-mpi-shared-libraries=0', > '--with-shared-libraries=1', > on the static vs dynamic linking (I thought in XC-40 we had to compile everything statically, > --LDFLAGS=-dynamic > and on the FFFLAG: > --FFLAGS=-mkl=sequential -g -O0 -lstdc++ > Is this to be used with Intel MKL libraries? > > Thanks for the help, you both. > > Guilherme V. > > dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development > MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl > > > MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 > From: Stefano Zampini > > Sent: Tuesday, June 13, 2017 3:42 PM > To: Vaz, Guilherme > Cc: Matthew Knepley; PETSc > Subject: Re: [petsc-users] PETSC on Cray Hazelhen > > Guilherme, > > here is my debug configuration (with shared libraries) in PETSc on a XC40 > > '--CFLAGS=-mkl=sequential -g -O0 ', > '--CXXFLAGS=-mkl=sequential -g -O0 ', > '--FFLAGS=-mkl=sequential -g -O0 -lstdc++', > '--LDFLAGS=-dynamic', > '--download-metis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', > '--download-metis=1', > '--download-parmetis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', > '--download-parmetis=1', > '--known-bits-per-byte=8', > '--known-has-attribute-aligned=1', > '--known-level1-dcache-assoc=8', > '--known-level1-dcache-linesize=64', > '--known-level1-dcache-size=32768', > '--known-memcmp-ok=1', > '--known-mpi-c-double-complex=1', > '--known-mpi-int64_t=1', > '--known-mpi-long-double=1', > '--known-mpi-shared-libraries=0', > '--known-sdot-returns-double=0', > '--known-sizeof-MPI_Comm=4', > '--known-sizeof-MPI_Fint=4', > '--known-sizeof-char=1', > '--known-sizeof-double=8', > '--known-sizeof-float=4', > '--known-sizeof-int=4', > '--known-sizeof-long-long=8', > '--known-sizeof-long=8', > '--known-sizeof-short=2', > '--known-sizeof-size_t=8', > '--known-sizeof-void-p=8', > '--known-snrm2-returns-double=0', > '--with-ar=ar', > '--with-batch=1', > '--with-cc=/opt/cray/craype/2.4.2/bin/cc', > '--with-clib-autodetect=0', > '--with-cmake=/home/zampins/local/bin/cmake', > '--with-cxx=/opt/cray/craype/2.4.2/bin/CC', > '--with-cxxlib-autodetect=0', > '--with-debugging=1', > '--with-dependencies=0', > '--with-etags=0', > '--with-fc=/opt/cray/craype/2.4.2/bin/ftn', > '--with-fortran-datatypes=0', > '--with-fortran-interfaces=0', > '--with-fortranlib-autodetect=0', > '--with-pthread=0', > '--with-ranlib=ranlib', > '--with-scalar-type=real', > '--with-shared-ld=ar', > '--with-shared-libraries=1', > 'PETSC_ARCH=arch-intel-debug', > > > 2017-06-13 15:34 GMT+02:00 Vaz, Guilherme >: > Dear Matthew, > > Thanks. 
It went further, but now I get: > TESTING: configureMPIEXEC from config.packages.MPI(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py:143) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Must give a default value for known-mpi-shared-libraries since executables cannot be run > ******************************************************************************* > > Last lines from the log: > File "./config/configure.py", line 405, in petsc_configure > framework.configure(out = sys.stdout) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1090, in configure > self.processChildren() > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1079, in processChildren > self.serialEvaluation(self.childGraph) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1060, in serialEvaluation > child.configure() > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/package.py", line 791, in configure > self.executeTest(self.checkSharedLibrary) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 126, in executeTest > ret = test(*args,**kargs) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py", line 135, in checkSharedLibrary > self.shared = self.libraries.checkShared('#include \n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = self.checkPackageLink,libraries = self.lib, defaultArg = 'known-mpi-shared-libraries', ex > ecutor = self.mpiexec) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/libraries.py", line 471, in checkShared > if self.checkRun(defaultIncludes, body, defaultArg = defaultArg, executor = executor): > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 628, in checkRun > (output, returnCode) = self.outputRun(includes, body, cleanup, defaultArg, executor) > File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 598, in outputRun > raise ConfigureSetupError('Must give a default value for '+defaultOutputArg+' since executables cannot be run') > > ?Any ideas? Something related with > --with-shared-libraries=0 \ > --with-batch=1 \ > ?The first I set because it was in the cray example, and the second because aprun (the mpiexec of Cray) is not available in the frontend. > > Thanks, > > Guilherme V. > > dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development > MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl > > > MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 > From: Matthew Knepley > > Sent: Tuesday, June 13, 2017 2:34 PM > To: Vaz, Guilherme > Cc: PETSc > Subject: Re: [petsc-users] PETSC on Cray Hazelhen > > On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme > wrote: > Dear all, > I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the usual Cray wrappers for Intel compilers, with some chosen external packages and MKL libraries. > I read some threads in the mailing list about this, and I tried the petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. After trying this (please abstract from my own env vars), > CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ > --with-cc=cc \ > --with-cxx=CC \ > --with-fc=ftn \ > --with-clib-autodetect=0 \ > --with-cxxlib-autodetect=0 \ > --with-fortranlib-autodetect=0 \ > --COPTFLAGS=-fast -mp \ > --CXXOPTFLAGS=-fast -mp \ > --FOPTFLAGS=-fast -mp \ > --with-shared-libraries=0 \ > --with-batch=1 \ > --with-x=0 \ > --with-mpe=0 \ > --with-debugging=0 \ > --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ > --with-blas-lapack-dir=$BLASDIR \ > --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ > --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ > --with-external-packages-dir=$INSTALL_DIR \ > --with-ssl=0 " > > I get the following error: > > TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Fortran could not successfully link C++ objects > ******************************************************************************* > > Does it ring a bell? Any tips? > > You turned off autodetection, so it will not find libstdc++. That either has to be put in LIBS, or I would recommend > > --with-cxx=0 > > since nothing you have there requires C++. > > Thanks, > > Matt > > Thanks, > Guilherme V. > > dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development > MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl > > > MARIN news: Maritime Safety seminar, September 12, Singapore > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 13 09:41:06 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 13 Jun 2017 09:41:06 -0500 Subject: [petsc-users] Why the convergence is much slower when I use two nodes In-Reply-To: References: Message-ID: <18A824A7-F8CD-4CA6-9444-A88195DEE5B3@mcs.anl.gov> Before we worry about time we need to figure out why the MPI parallel jobs have different final residual norms. Given that you have no preconditioner the residual histories for different number of processes should be very similar. Run on one and two MPI processes with the option -ksp_monitor_true_residual and send the output. 
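For example, something like the following, adapted from the command lines in your message below (keep or drop the -hostfile, -ksp_view and -log_view arguments as you like; the output file names are only placeholders):

  mpirun -n 1 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_monitor_true_residual > monitor_n1.txt
  mpirun -n 2 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_monitor_true_residual > monitor_n2.txt

With no preconditioner the two histories should track each other closely from the very first iteration.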
Perhaps there is a big in the matrix generation in parallel so it does not produce the same matrix as when run sequentially. Barry > On Jun 13, 2017, at 7:38 AM, Ji Zhang wrote: > > mpirun -n 1 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_1.txt > mpirun -n 2 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_2.txt > mpirun -n 4 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_3.txt > mpirun -n 6 -hostfile hostfile python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_4.txt > mpirun -n 2 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_5.txt > mpirun -n 4 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_6.txt > mpirun -n 6 python force_pipe.py -dp 0.05 -ep 2 -lp 2 -b0 1e-4 -b1 0.9 -nb 1 -th 30 -ksp_max_it 1000 -ksp_view -log_view > mpi_7.txt > > Dear Barry, > > The following tests are runing in our cluster using one, two or three nodes. Each node have 64GB memory and 24 cups (Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz). Basic information of each node are listed below. > > $ lstopo > Machine (64GB) > NUMANode L#0 (P#0 32GB) > Socket L#0 + L3 L#0 (30MB) > L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) > L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) > L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) > L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) > L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4) > L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5) > L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6) > L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7) > L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8) > L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9) > L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10) > L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11) > HostBridge L#0 > PCIBridge > PCI 1000:0097 > Block L#0 "sda" > PCIBridge > PCI 8086:1523 > Net L#1 "eth0" > PCI 8086:1523 > Net L#2 "eth1" > PCIBridge > PCIBridge > PCI 1a03:2000 > PCI 8086:8d02 > NUMANode L#1 (P#1 32GB) > Socket L#1 + L3 L#1 (30MB) > L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12) > L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13) > L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14) > L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15) > L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16) > L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17) > L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18) > L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19) > L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20) > L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21) > L2 L#22 (256KB) + 
L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22) > L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23) > HostBridge L#5 > PCIBridge > PCI 15b3:1003 > Net L#3 "ib0" > OpenFabrics L#4 "mlx4_0" > > I have tested seven different cases. Each case solving three different linear equation systems A*x1=b1, A*x2=b2, A*x3=b3. The matrix A is a mpidense matrix and b1, b2, b3 are different vectors. > I'm using GMRES method without precondition method . I have set -ksp_mat_it 1000 > process nodes eq1_residual_norms eq1_duration eq2_residual_norms eq2_duration eq3_residual_norms eq3_duration > mpi_1.txt: 1 1 9.884635e-04 88.631310s 4.144572e-04 88.855811s 4.864481e-03 88.673738s > mpi_2.txt: 2 2 6.719300e-01 84.212435s 6.782443e-01 85.153371s 7.223828e-01 85.246724s > mpi_3.txt: 4 4 5.813354e-01 52.616490s 5.397962e-01 52.413213s 9.503432e-01 52.495871s > mpi_4.txt: 6 6 4.621066e-01 42.929705s 4.661823e-01 43.367914s 1.047436e+00 43.108877s > mpi_5.txt: 2 1 6.719300e-01 141.490945s 6.782443e-01 142.746243s 7.223828e-01 142.042608s > mpi_6.txt: 3 1 5.813354e-01 165.061162s 5.397962e-01 196.539286s 9.503432e-01 180.240947s > mpi_7.txt: 6 1 4.621066e-01 213.683270s 4.661823e-01 208.180939s 1.047436e+00 194.251886s > I found that all residual norms are on the order of 1 except the first case, which one I only use one process at one node. > See the attach files for more details, please. > > > ?? > ?? > ????????? > ?????????? > ???????????10????9?? ?100193? > > Best, > Regards, > Zhang Ji, PhD student > Beijing Computational Science Research Center > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China > > On Tue, Jun 13, 2017 at 9:34 AM, Barry Smith wrote: > > You need to provide more information. What is the output of -ksp_view? and -log_view? for both cases > > > On Jun 12, 2017, at 7:11 PM, Ji Zhang wrote: > > > > Dear all, > > > > I'm a PETSc user. I'm using GMRES method to solve some linear equations. I'm using boundary element method, so the matrix type is dense (or mpidense). I'm using MPICH2, I found that the convergence is fast if I only use one computer node; and much more slower if I use two or more nodes. I'm interested in why this happen, and how can I improve the convergence performance when I use multi-nodes. > > > > Thanks a lot. > > > > ?? > > ?? > > ????????? > > ?????????? > > ???????????10????9?? ?100193? > > > > Best, > > Regards, > > Zhang Ji, PhD student > > Beijing Computational Science Research Center > > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China > > > From jed at jedbrown.org Tue Jun 13 10:06:24 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 13 Jun 2017 09:06:24 -0600 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> Message-ID: <87d1a86i6n.fsf@jedbrown.org> Adrian Croucher writes: > One way might be to form the whole Jacobian but somehow use a modified > KSP solve which would implement the reduction process, do a KSP solve on > the reduced system of size n, and finally back-substitute to find the > unknowns in the matrix rock cells. You can do this with PCFieldSplit type Schur, but it's a lot heavier than you might like. 
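If the monitored histories do turn out to differ, a follow-up check is to dump the system that KSP actually sees on one and on two processes and compare. This is only a sketch: it assumes the -ksp_view_mat and -ksp_view_rhs options with a binary viewer are available in your 3.7.6 build, the file names are placeholders, and for an MPIDENSE operator the files will be large, so use a small test case:

  mpirun -n 1 python force_pipe.py <same arguments as before> -ksp_view_mat binary:A_n1.bin -ksp_view_rhs binary:b_n1.bin
  mpirun -n 2 python force_pipe.py <same arguments as before> -ksp_view_mat binary:A_n2.bin -ksp_view_rhs binary:b_n2.bin

Loading the two pairs of files on a single process and taking the norm of the differences tells you directly whether the parallel assembly reproduces the sequential matrix and right hand side.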
> Another way might be to form only the reduced-size Jacobian and the > other block-diagonal matrices separately, use KSP to solve the reduced > system but first incorporate the reduction process into the Jacobian > calculation routine, and somewhere a post-solve step to back-substitute > for the unknowns in the matrix cells. However currently we are using > finite differences to compute these Jacobians and it seems to me it > would be messy to try to do that separately for each of the > sub-matrices. Doing it the first way above would avoid all that. If you choose this option, you would make your residual evaluation perform the local solve (i.e., eliminating the local variables). > Any suggestions for what might be a good approach? or any other ideas > that could be easier to implement with PETSc but have similar > efficiency? I didn't see anything currently in PETSc specifically for > solving block-tridiagonal systems. Any incomplete or complete factorization (optionally inside block Jacobi) is an O(N) direct solver for a tridiagonal system. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jun 13 13:58:26 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 13 Jun 2017 13:58:26 -0500 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <87d1a86i6n.fsf@jedbrown.org> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> Message-ID: <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> > On Jun 13, 2017, at 10:06 AM, Jed Brown wrote: > > Adrian Croucher writes: > >> One way might be to form the whole Jacobian but somehow use a modified >> KSP solve which would implement the reduction process, do a KSP solve on >> the reduced system of size n, and finally back-substitute to find the >> unknowns in the matrix rock cells. > > You can do this with PCFieldSplit type Schur, but it's a lot heavier > than you might like. Is it clear that it would produce much overhead compared to doing a custom "reduction to a smaller problem". Perhaps he should start with this and then profiling can show if there are any likely benefits to "specializing more"? Barry > >> Another way might be to form only the reduced-size Jacobian and the >> other block-diagonal matrices separately, use KSP to solve the reduced >> system but first incorporate the reduction process into the Jacobian >> calculation routine, and somewhere a post-solve step to back-substitute >> for the unknowns in the matrix cells. However currently we are using >> finite differences to compute these Jacobians and it seems to me it >> would be messy to try to do that separately for each of the >> sub-matrices. Doing it the first way above would avoid all that. > > If you choose this option, you would make your residual evaluation > perform the local solve (i.e., eliminating the local variables). > >> Any suggestions for what might be a good approach? or any other ideas >> that could be easier to implement with PETSc but have similar >> efficiency? I didn't see anything currently in PETSc specifically for >> solving block-tridiagonal systems. > > Any incomplete or complete factorization (optionally inside block > Jacobi) is an O(N) direct solver for a tridiagonal system. 
From jed at jedbrown.org Tue Jun 13 14:45:10 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 13 Jun 2017 13:45:10 -0600 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> Message-ID: <87y3sv4qpl.fsf@jedbrown.org> Barry Smith writes: >> On Jun 13, 2017, at 10:06 AM, Jed Brown wrote: >> >> Adrian Croucher writes: >> >>> One way might be to form the whole Jacobian but somehow use a modified >>> KSP solve which would implement the reduction process, do a KSP solve on >>> the reduced system of size n, and finally back-substitute to find the >>> unknowns in the matrix rock cells. >> >> You can do this with PCFieldSplit type Schur, but it's a lot heavier >> than you might like. > > Is it clear that it would produce much overhead compared to doing a custom "reduction to a smaller problem". Perhaps he should start with this and then profiling can show if there are any likely benefits to "specializing more"? Yeah, that would be reasonable. We don't have a concept of sparsity for preconditioners so don't have a clean way to produce the exact (sparse) Schur complement. Computing this matrix using coloring should be relatively inexpensive due to the independence in each cell and its tridiagonal structure. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From fabien.tholin at onera.fr Wed Jun 14 10:35:11 2017 From: fabien.tholin at onera.fr (Fabien Tholin) Date: Wed, 14 Jun 2017 17:35:11 +0200 Subject: [petsc-users] Beginner : question about fortran modules and makefile Message-ID: <594157AF.30104@onera.fr> Hello, I am a beginner with Petsc and i'm trying to compile a very simple fortran program "test" with a calling program in "test.F90" and a module "my_module.F90". Unfortunately, i do not know how to write properly the makefile to be able to compile the module with "#include petsc" statements inside: I can not find any example on how to do it. here is my non-working makefile: CFLAGS = FFLAGS = PETSC_DIR=/home/fab/Program/PETSC/petsc-3.7.6 PETSC_ARCH=arch-linux2-c-debug FLINKER=mpif90 CLINKER=mpicc include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules list=my_module.o test: test.o $(list) chkopts -${FLINKER} -o test test.o $(list) my_module.o my_module.mod: my_module.F90 -${FLINKER} -c my_module.F90 Thank you very much for helping me. Regards, Fabien THOLIN -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: makefile URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.F90 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: my_module.F90 URL: From balay at mcs.anl.gov Wed Jun 14 10:45:43 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 14 Jun 2017 10:45:43 -0500 Subject: [petsc-users] Beginner : question about fortran modules and makefile In-Reply-To: <594157AF.30104@onera.fr> References: <594157AF.30104@onera.fr> Message-ID: attaching fixed files. 
$ make test mpif90 -c -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -I/home/balay/tmp/petsc/include -I/home/balay/tmp/petsc/arch-linux2-c-debug/include -o my_module.o my_module.F90 mpif90 -c -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -I/home/balay/tmp/petsc/include -I/home/balay/tmp/petsc/arch-linux2-c-debug/include -o test.o test.F90 mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -o test test.o my_module.o -Wl,-rpath,/home/balay/tmp/petsc/arch-linux2-c-debug/lib -L/home/balay/tmp/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -L/home/balay/soft/mpich-3.3a2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7 -lpetsc -llapack -lblas -lX11 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lmpicxx -lstdc++ -lm -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -L/home/balay/soft/mpich-3.3a2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -lmpi -lgcc_s -ldl $ Satish On Wed, 14 Jun 2017, Fabien Tholin wrote: > Hello, > > I am a beginner with Petsc and i'm trying > > to compile a very simple fortran program "test" with > > a calling program in "test.F90" and a module "my_module.F90". > > Unfortunately, i do not know how to write properly the makefile > > to be able to compile the module with "#include petsc" statements inside: > > I can not find any example on how to do it. > > here is my non-working makefile: > > > CFLAGS = > FFLAGS = > PETSC_DIR=/home/fab/Program/PETSC/petsc-3.7.6 > PETSC_ARCH=arch-linux2-c-debug > FLINKER=mpif90 > CLINKER=mpicc > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > list=my_module.o > > test: test.o $(list) chkopts > -${FLINKER} -o test test.o $(list) > > my_module.o my_module.mod: my_module.F90 > -${FLINKER} -c my_module.F90 > > > Thank you very much for helping me. > > Regards, > > Fabien THOLIN > -------------- next part -------------- CFLAGS = FFLAGS = include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules list = test.o my_module.o test.o:my_module.o test: ${list} chkopts -${FLINKER} -o test ${list} ${PETSC_LIB} -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: my_module.F90 URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.F90 URL: From balay at mcs.anl.gov Wed Jun 14 10:48:16 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 14 Jun 2017 10:48:16 -0500 Subject: [petsc-users] Beginner : question about fortran modules and makefile In-Reply-To: References: <594157AF.30104@onera.fr> Message-ID: BTW: you might consider using 'master' branch from petsc git repo. The fortran module support is revamped in it. Satish On Wed, 14 Jun 2017, Satish Balay wrote: > attaching fixed files. 
> > $ make test > mpif90 -c -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -I/home/balay/tmp/petsc/include -I/home/balay/tmp/petsc/arch-linux2-c-debug/include -o my_module.o my_module.F90 > mpif90 -c -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -I/home/balay/tmp/petsc/include -I/home/balay/tmp/petsc/arch-linux2-c-debug/include -o test.o test.F90 > mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -o test test.o my_module.o -Wl,-rpath,/home/balay/tmp/petsc/arch-linux2-c-debug/lib -L/home/balay/tmp/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -L/home/balay/soft/mpich-3.3a2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7 -lpetsc -llapack -lblas -lX11 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lmpicxx -lstdc++ -lm -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -L/home/balay/soft/mpich-3.3a2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/7 -L/usr/lib/gcc/x86_64-redhat-linux/7 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.3a2/lib -lmpi -lgcc_s -ldl > $ > > Satish > > On Wed, 14 Jun 2017, Fabien Tholin wrote: > > > Hello, > > > > I am a beginner with Petsc and i'm trying > > > > to compile a very simple fortran program "test" with > > > > a calling program in "test.F90" and a module "my_module.F90". > > > > Unfortunately, i do not know how to write properly the makefile > > > > to be able to compile the module with "#include petsc" statements inside: > > > > I can not find any example on how to do it. > > > > here is my non-working makefile: > > > > > > CFLAGS = > > FFLAGS = > > PETSC_DIR=/home/fab/Program/PETSC/petsc-3.7.6 > > PETSC_ARCH=arch-linux2-c-debug > > FLINKER=mpif90 > > CLINKER=mpicc > > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > list=my_module.o > > > > test: test.o $(list) chkopts > > -${FLINKER} -o test test.o $(list) > > > > my_module.o my_module.mod: my_module.F90 > > -${FLINKER} -c my_module.F90 > > > > > > Thank you very much for helping me. > > > > Regards, > > > > Fabien THOLIN > > > From dnolte at dim.uchile.cl Wed Jun 14 12:41:55 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Wed, 14 Jun 2017 13:41:55 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: Message-ID: <2e961dbd-c795-0485-844b-e0f505e23df7@dim.uchile.cl> Dave, thanks a lot for your great answer and for sharing your experience. I have a much clearer picture now. :) The experiments 3/ give the desired results for examples of cavity flow. The (1/mu scaled) mass matrix seems OK. I followed your and Matt's recommendations, used a FULL Schur factorization, LU in the 0th split, and gradually relaxed the tolerance of GMRES/Jacobi in split 1 (observed the gradual increase in outer iterations). Then I replaced the split_0 LU with AMG (further increase of outer iterations and iterations on the Schur complement). Doing so I converged to using hypre boomeramg (smooth_type Euclid, strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur block, which gave the best time-to-solution in my particular setup and convergence to rtol=1e-8 within 60 outer iterations. In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2) instead of "preonly" did not help convergence (on the contrary). 
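For reference, the fixed makefile from the attachment above, reconstructed with line breaks (the archive flattens attachments onto one line, so treat the exact layout as approximate; the link command line has to start with a tab):

CFLAGS =
FFLAGS =

include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

list = test.o my_module.o

test.o: my_module.o

test: ${list} chkopts
	-${FLINKER} -o test ${list} ${PETSC_LIB}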
I also repeated the experiments with "-pc_fieldsplit_schur_precondition selfp", with hypre(ilu) in split 0 and hypre in split 1, just to check, and somewhat disappointingly ( ;-) ) the wall time is less than half than when using gmres/Jac and Sp = mass matrix. I am aware that this says nothing about scaling and robustness with respect to h-refinement... Would you agree that these configurations "make sense"? Furthermore, maybe anyone has a hint where to start tuning multigrid? So far hypre worked better than ML, but I have not experimented much with the parameters. Thanks again for your help! Best wishes, David On 06/12/2017 04:52 PM, Dave May wrote: > I've been following the discussion and have a couple of comments: > > 1/ For the preconditioners that you are using (Schur factorisation > LDU, or upper block triangular DU), the convergence properties (e.g. 1 > iterate for LDU and 2 iterates for DU) come from analysis involving > exact inverses of A_00 and S > > Once you switch from using exact inverses of A_00 and S, you have to > rely on spectral equivalence of operators. That is fine, but the > spectral equivalence does not tell you how many iterates LDU or DU > will require to converge. What it does inform you about is that if you > have a spectrally equivalent operator for A_00 and S (Schur > complement), then under mesh refinement, your iteration count > (whatever it was prior to refinement) will not increase. > > 2/ Looking at your first set of options, I see you have opted to use > -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice as > it creates a linear operator thus you don't need something like FGMRES > or GCR applied to the saddle point problem. > > Your choice for Schur is fine in the sense that the diagonal of M is > spectrally equivalent to M, and M is spectrally equivalent to S. > Whether it is "fine" in terms of the iteration count for Schur > systems, we cannot say apriori (since the spectral equivalence doesn't > give us direct info about the iterations we should expect). > > Your preconditioner for A_00 relies on AMG producing a spectrally > equivalent operator with bounds which are tight enough to ensure > convergence of the saddle point problem. I'll try explain this. > > In my experience, for many problems (unstructured FE with variable > coefficients, structured FE meshes with variable coefficients) AMG and > preonly is not a robust choice. To control the approximation (the > spectral equiv bounds), I typically run a stationary or Krylov method > on split 0 (e.g. -fieldsplit_0_ksp_type xxx -fieldsplit_0_kps_rtol > yyy). Since the AMG preconditioner generated is spectrally equivalent > (usually!), these solves will converge to a chosen rtol in a constant > number of iterates under h-refinement. In practice, if I don't enforce > that I hit something like rtol=1.0e-1 (or 1.0e-2) on the 0th split, > saddle point iterates will typically increase for "hard" problems > under mesh refinement (1e4-1e7 coefficient variation), and may not > even converge at all when just using -fieldsplit_0_ksp_type preonly. > Failure ultimately depends on how "strong" the preconditioner for A_00 > block is (consider re-discretized geometric multigrid versus AMG). > Running an iterative solve on the 0th split lets you control and > recover from weak/poor, but spectrally equivalent preconditioners for > A_00. Note that people hate this approach as it invariably nests > Krylov methods, and subsequently adds more global reductions. 
However, > it is scalable, optimal, tuneable and converges faster than the case > which didn't converge at all :D > > 3/ I agree with Matt's comments, but I'd do a couple of other things > first. > > * I'd first check the discretization is implemented correctly. Your > P2/P1 element is inf-sup stable - thus the condition number of S > (unpreconditioned) should be independent of the mesh resolution (h). > An easy way to verify this is to run either LDU (schur_fact_type full) > or DU (schur_fact_type upper) and monitor the iterations required for > those S solves. Use -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol > 1.0e-8 -fieldsplit_1_ksp_monitor_true_residual > -fieldsplit_1_ksp_pc_right -fieldsplit_1_ksp_type gmres > -fieldsplit_0_pc_type lu > > Then refine the mesh (ideally via sub-division) and repeat the experiment. > If the S iterates don't asymptote, but instead grow with each > refinement - you likely have a problem with the discretisation. > > * Do the same experiment, but this time use your mass matrix as the > preconditioner for S and use -fieldsplit_1_pc_type lu. If the > iterates, compared with the previous experiments (without a Schur PC) > have gone up your mass matrix is not defined correctly. If in the > previous experiment (without a Schur PC) iterates on the S solves were > bounded, but now when preconditioned with the mass matrix the iterates > go up, then your mass matrix is definitely not correct. > > 4/ Lastly, to finally get to your question regarding does +400 > iterates for the solving the Schur seem "reasonable" and what is > "normal behaviour"? > > It seems "high" to me. However the specifics of your discretisation, > mesh topology, element quality, boundary conditions render it almost > impossible to say what should be expected. When I use a Q2-P2* > discretisation on a structured mesh with a non-constant viscosity I'd > expect something like 50-60 for 1.0e-10 with a mass matrix scaled by > the inverse (local) viscosity. For constant viscosity maybe 30 > iterates. I think this kind of statement is not particularly useful or > helpful though. > > Given you use an unstructured tet mesh, it is possible that some > elements have very bad quality (high aspect ratio (AR), highly > skewed). I am certain that P2/P1 has an inf-sup constant which is > sensitive to the element aspect ratio (I don't recall the exact > scaling wrt AR). From experience I know that using the mass matrix as > a preconditioner for Schur is not robust as AR increases (e.g. > iterations for the S solve grow). Hence, with a couple of "bad" > element in your mesh, I could imagine that you could end up having to > perform +400 iterations > > 5/ Lastly, definitely don't impose one Dirichlet BC on pressure to > make the pressure unique. This really screws up all the nice > properties of your matrices. Just enforce the constant null space for > p. And as you noticed, GMRES magically just does it automatically if > the RHS of your original system was consistent. > > Thanks, > Dave > > > On 12 June 2017 at 20:20, David Nolte > wrote: > > Ok. With "-pc_fieldsplit_schur_fact_type full" the outer iteration > converges in 1 step. The problem remain the Schur iterations. > > I was not sure if the problem was maybe the singular pressure or > the pressure Dirichlet BC. I tested the solver with a standard > Stokes flow in a pipe with a constriction (zero Neumann BC for the > pressure at the outlet) and in a 3D cavity (enclosed flow, no > pressure BC or fixed at one point). 
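(Regarding point 5/ above, enforcing the constant pressure null space rather than pinning a pressure value: in PETSc this is typically done by attaching the null space to the operator so the Krylov method projects it out. A minimal Fortran fragment, assuming the assembled saddle-point matrix is called A and that a Vec array nullvecs(1) has already been filled elsewhere; both names are made up for this sketch, and the exact Fortran null-object constants differ between 3.7 and master:

        MatNullSpace   :: nsp
        PetscErrorCode :: ierr
        ! nullvecs(1) (assumed, set up elsewhere): 1 on the pressure dofs,
        ! 0 on the velocity dofs, then normalized
        call MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, nullvecs, nsp, ierr)
        call MatSetNullSpace(A, nsp, ierr)
        call MatNullSpaceDestroy(nsp, ierr)

)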
I am not sure if I need to > attach the constant pressure nullspace to the matrix for GMRES. > Not doing so does not alter the convergence of GMRES in the Schur > solver (nor the pressure solution), using a pressure Dirichlet BC > however slows down convergence (I suppose because of the scaling > of the matrix). > > I also checked the pressure mass matrix that I give PETSc, it > looks correct. > > In all these cases, the solver behaves just as before. With LU in > fieldsplit_0 and GMRES/LU with rtol 1e-10 in fieldsplit_1, it > converges after 1 outer iteration, but the inner Schur solver > converges slowly. > > How should the convergence of GMRES/LU of the Schur complement > *normally* behave? > > Thanks again! > David > > > > > On 06/12/2017 12:41 PM, Matthew Knepley wrote: >> On Mon, Jun 12, 2017 at 10:36 AM, David Nolte >> > wrote: >> >> >> On 06/12/2017 07:50 AM, Matthew Knepley wrote: >>> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte >>> > wrote: >>> >>> Thanks Matt, makes sense to me! >>> >>> I skipped direct solvers at first because for these >>> 'real' configurations LU (mumps/superlu_dist) usally >>> goes out of memory (got 32GB RAM). It would be >>> reasonable to take one more step back and play with >>> synthetic examples. >>> I managed to run one case though with 936k dofs using: >>> ("user" =pressure mass matrix) >>> >>> <...> >>> -pc_fieldsplit_schur_fact_type upper >>> -pc_fieldsplit_schur_precondition user >>> -fieldsplit_0_ksp_type preonly >>> -fieldsplit_0_pc_type lu >>> -fieldsplit_0_pc_factor_mat_solver_package mumps >>> >>> -fieldsplit_1_ksp_type gmres >>> -fieldsplit_1_ksp_monitor_true_residuals >>> -fieldsplit_1_ksp_rtol 1e-10 >>> -fieldsplit_1_pc_type lu >>> -fieldsplit_1_pc_factor_mat_solver_package mumps >>> >>> It takes 2 outer iterations, as expected. However the >>> fieldsplit_1 solve takes very long. >>> >>> >>> 1) It should take 1 outer iterate, not two. The problem is >>> that your Schur tolerance is way too high. Use >>> >>> -fieldsplit_1_ksp_rtol 1e-10 >>> >>> or something like that. Then it will take 1 iterate. >> >> Shouldn't it take 2 with a triangular Schur factorization and >> exact preconditioners, and 1 with a full factorization? (cf. >> Benzi et al 2005, p.66, >> http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf >> ) >> >> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and >> the Schur solver does drop below "rtol < 1e-10" >> >> >> Oh, yes. Take away the upper until things are worked out. >> >> Thanks, >> >> Matt >> >>> >>> 2) There is a problem with the Schur solve. Now from the >>> iterates >>> >>> 423 KSP preconditioned resid norm 2.638419658982e-02 true >>> resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >>> >>> it is clear that the preconditioner is really screwing stuff >>> up. For testing, you can use >>> >>> -pc_fieldsplit_schur_precondition full >>> >>> and your same setup here. It should take one iterate. I >>> think there is something wrong with your >>> mass matrix. >> >> I agree. I forgot to mention that I am considering an >> "enclosed flow" problem, with u=0 on all the boundary and a >> Dirichlet condition for the pressure in one point for fixing >> the constant pressure. Maybe the preconditioner is not >> consistent with this setup, need to check this.. >> >> Thanks a lot >> >> >>> >>> Thanks, >>> >>> Matt >>> >>> >>> 0 KSP unpreconditioned resid norm 4.038466809302e-03 >>> true resid norm 4.038466809302e-03 ||r(i)||/||b|| >>> 1.000000000000e+00 >>> Residual norms for fieldsplit_1_ solve. 
>>> 0 KSP preconditioned resid norm 0.000000000000e+00 >>> true resid norm 0.000000000000e+00 >>> ||r(i)||/||b|| -nan >>> Linear fieldsplit_1_ solve converged due to >>> CONVERGED_ATOL iterations 0 >>> 1 KSP unpreconditioned resid norm 4.860095964831e-06 >>> true resid norm 4.860095964831e-06 ||r(i)||/||b|| >>> 1.203450763452e-03 >>> Residual norms for fieldsplit_1_ solve. >>> 0 KSP preconditioned resid norm 2.965546249872e+08 >>> true resid norm 1.000000000000e+00 ||r(i)||/||b|| >>> 1.000000000000e+00 >>> 1 KSP preconditioned resid norm 1.347596594634e+08 >>> true resid norm 3.599678801575e-01 ||r(i)||/||b|| >>> 3.599678801575e-01 >>> 2 KSP preconditioned resid norm 5.913230136403e+07 >>> true resid norm 2.364916760834e-01 ||r(i)||/||b|| >>> 2.364916760834e-01 >>> 3 KSP preconditioned resid norm 4.629700028930e+07 >>> true resid norm 1.984444715595e-01 ||r(i)||/||b|| >>> 1.984444715595e-01 >>> 4 KSP preconditioned resid norm 3.804431276819e+07 >>> true resid norm 1.747224559120e-01 ||r(i)||/||b|| >>> 1.747224559120e-01 >>> 5 KSP preconditioned resid norm 3.178769422140e+07 >>> true resid norm 1.402254864444e-01 ||r(i)||/||b|| >>> 1.402254864444e-01 >>> 6 KSP preconditioned resid norm 2.648669043919e+07 >>> true resid norm 1.191164310866e-01 ||r(i)||/||b|| >>> 1.191164310866e-01 >>> 7 KSP preconditioned resid norm 2.203522108614e+07 >>> true resid norm 9.690500018007e-02 ||r(i)||/||b|| >>> 9.690500018007e-02 >>> <...> >>> 422 KSP preconditioned resid norm 2.984888715147e-02 >>> true resid norm 8.598401046494e-11 ||r(i)||/||b|| >>> 8.598401046494e-11 >>> 423 KSP preconditioned resid norm 2.638419658982e-02 >>> true resid norm 7.229653211635e-11 ||r(i)||/||b|| >>> 7.229653211635e-11 >>> Linear fieldsplit_1_ solve converged due to >>> CONVERGED_RTOL iterations 423 >>> 2 KSP unpreconditioned resid norm 3.539889585599e-16 >>> true resid norm 3.542279617063e-16 ||r(i)||/||b|| >>> 8.771347603759e-14 >>> Linear solve converged due to CONVERGED_RTOL iterations 2 >>> >>> >>> Does the slow convergence of the Schur block mean that >>> my preconditioning matrix Sp is a poor choice? >>> >>> Thanks, >>> David >>> >>> >>> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >>>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >>>> > wrote: >>>> >>>> Dear all, >>>> >>>> I am solving a Stokes problem in 3D aorta >>>> geometries, using a P2/P1 >>>> finite elements discretization on tetrahedral >>>> meshes resulting in >>>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted >>>> arbitrarily), and >>>> the right hand side is a function of noisy >>>> measurement data. 
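(For reference, the system being solved in this thread is the discrete Stokes saddle-point problem, and the reason the pressure mass matrix appears as the Schur preconditioner is its spectral equivalence to the (negative) Schur complement, a standard result for inf-sup stable discretisations, restated here only as a reminder:

  \begin{pmatrix} A & B^{T} \\ B & 0 \end{pmatrix}
  \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix},
  \qquad
  B A^{-1} B^{T} \;\sim\; \frac{1}{\mu}\, M_p ,

where M_p is the pressure mass matrix. This equivalence is what makes M_p, or just its diagonal (Jacobi), a sensible choice for Sp, as discussed above.)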
>>>> >>>> In other settings of "standard" Stokes flow >>>> problems I have obtained >>>> good convergence with an "upper" Schur complement >>>> preconditioner, using >>>> AMG (ML or Hypre) on the velocity block and >>>> approximating the Schur >>>> complement matrix by the diagonal of the pressure >>>> mass matrix: >>>> >>>> -ksp_converged_reason >>>> -ksp_monitor_true_residual >>>> -ksp_initial_guess_nonzero >>>> -ksp_diagonal_scale >>>> -ksp_diagonal_scale_fix >>>> -ksp_type fgmres >>>> -ksp_rtol 1.0e-8 >>>> >>>> -pc_type fieldsplit >>>> -pc_fieldsplit_type schur >>>> -pc_fieldsplit_detect_saddle_point >>>> -pc_fieldsplit_schur_fact_type upper >>>> -pc_fieldsplit_schur_precondition user # <-- >>>> pressure mass matrix >>>> >>>> -fieldsplit_0_ksp_type preonly >>>> -fieldsplit_0_pc_type ml >>>> >>>> -fieldsplit_1_ksp_type preonly >>>> -fieldsplit_1_pc_type jacobi >>>> >>>> >>>> 1) I always recommend starting from an exact solver and >>>> backing off in small steps for optimization. Thus >>>> I would start with LU on the upper block and >>>> GMRES/LU with toelrance 1e-10 on the Schur block. >>>> This should converge in 1 iterate. >>>> >>>> 2) I don't think you want preonly on the Schur system. >>>> You might want GMRES/Jacobi to invert the mass matrix. >>>> >>>> 3) You probably want to tighten the tolerance on the >>>> Schur solve, at least to start, and then slowly let it >>>> out. The >>>> tight tolerance will show you how effective the >>>> preconditioner is using that Schur operator. Then you >>>> can start >>>> to evaluate how effective the Schur linear sovler is. >>>> >>>> Does this make sense? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> In my present case this setup gives rather slow >>>> convergence (varies for >>>> different geometries between 200-500 or several >>>> thousands!). I obtain >>>> better convergence with >>>> "-pc_fieldsplit_schur_precondition selfp"and >>>> using multigrid on S, with "-fieldsplit_1_pc_type >>>> ml" (I don't think >>>> this is optimal, though). >>>> >>>> I don't understand why the pressure mass matrix >>>> approach performs so >>>> poorly and wonder what I could try to improve the >>>> convergence. Until now >>>> I have been using ML and Hypre BoomerAMG mostly >>>> with default parameters. >>>> Surely they can be improved by tuning some >>>> parameters. Which could be a >>>> good starting point? Are there other options I >>>> should consider? >>>> >>>> With the above setup (jacobi) for a case that works >>>> better than others, >>>> the KSP terminates with >>>> 467 KSP unpreconditioned resid norm >>>> 2.072014323515e-09 true resid norm >>>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >>>> >>>> You can find the output of -ksp_view below. Let me >>>> know if you need more >>>> details. >>>> >>>> Thanks in advance for your advice! >>>> Best wishes >>>> David >>>> >>>> >>>> KSP Object: 1 MPI processes >>>> type: fgmres >>>> GMRES: restart=30, using Classical (unmodified) >>>> Gram-Schmidt >>>> Orthogonalization with no iterative refinement >>>> GMRES: happy breakdown tolerance 1e-30 >>>> maximum iterations=10000 >>>> tolerances: relative=1e-08, absolute=1e-50, >>>> divergence=10000. 
>>>> right preconditioning >>>> diagonally scaled system >>>> using nonzero initial guess >>>> using UNPRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: fieldsplit >>>> FieldSplit with Schur preconditioner, >>>> factorization UPPER >>>> Preconditioner for the Schur complement formed >>>> from user provided matrix >>>> Split info: >>>> Split number 0 Defined by IS >>>> Split number 1 Defined by IS >>>> KSP solver for A00 block >>>> KSP Object: (fieldsplit_0_) 1 MPI >>>> processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_) 1 MPI >>>> processes >>>> type: ml >>>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>>> Cycles per PCApply=1 >>>> Using Galerkin computed coarse grid >>>> matrices >>>> Coarse grid solver -- level >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: preonly >>>> maximum iterations=10000, initial guess >>>> is zero >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: lu >>>> LU: out-of-place factorization >>>> tolerance for zero pivot 2.22045e-14 >>>> using diagonal shift on blocks to >>>> prevent zero pivot >>>> [INBLOCKS] >>>> matrix ordering: nd >>>> factor fill ratio given 5., needed 1. >>>> Factored matrix follows: >>>> Mat Object: 1 >>>> MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> package used to perform >>>> factorization: petsc >>>> total: nonzeros=3, allocated >>>> nonzeros=3 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> total: nonzeros=3, allocated nonzeros=3 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Down solver (pre-smoother) on level 1 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_1_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_1_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=15, cols=15 >>>> total: nonzeros=69, allocated nonzeros=69 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down >>>> solver (pre-smoother) >>>> Down solver (pre-smoother) on level 2 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_2_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. 
>>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_2_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=304, cols=304 >>>> total: nonzeros=7354, allocated >>>> nonzeros=7354 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down >>>> solver (pre-smoother) >>>> Down solver (pre-smoother) on level 3 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_3_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_3_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=30236, cols=30236 >>>> total: nonzeros=2730644, allocated >>>> nonzeros=2730644 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down >>>> solver (pre-smoother) >>>> Down solver (pre-smoother) on level 4 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_4_) 1 >>>> MPI processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_4_) 1 >>>> MPI processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_0_) >>>> 1 MPI >>>> processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated >>>> nonzeros=70684164 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as down >>>> solver (pre-smoother) >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_0_) >>>> 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, allocated >>>> nonzeros=70684164 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> KSP solver for S = A11 - A10 inv(A00) A01 >>>> KSP Object: (fieldsplit_1_) 1 MPI >>>> processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, divergence=10000. 
>>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_1_) 1 MPI >>>> processes >>>> type: jacobi >>>> linear system matrix followed by >>>> preconditioner matrix: >>>> Mat Object: (fieldsplit_1_) >>>> 1 MPI processes >>>> type: schurcomplement >>>> rows=42025, cols=42025 >>>> Schur complement A11 - A10 inv(A00) A01 >>>> A11 >>>> Mat Object: >>>> (fieldsplit_1_) 1 >>>> MPI processes >>>> type: seqaij >>>> rows=42025, cols=42025 >>>> total: nonzeros=554063, allocated >>>> nonzeros=554063 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> A10 >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=42025, cols=894132 >>>> total: nonzeros=6850107, allocated >>>> nonzeros=6850107 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> KSP of A00 >>>> KSP Object: >>>> (fieldsplit_0_) 1 >>>> MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial >>>> guess is zero >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_) 1 >>>> MPI processes >>>> type: ml >>>> MG: type is MULTIPLICATIVE, >>>> levels=5 cycles=v >>>> Cycles per PCApply=1 >>>> Using Galerkin computed coarse >>>> grid matrices >>>> Coarse grid solver -- level >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: preonly >>>> maximum iterations=10000, >>>> initial guess is zero >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>> processes >>>> type: lu >>>> LU: out-of-place factorization >>>> tolerance for zero pivot >>>> 2.22045e-14 >>>> using diagonal shift on >>>> blocks to prevent zero >>>> pivot [INBLOCKS] >>>> matrix ordering: nd >>>> factor fill ratio given 5., >>>> needed 1. >>>> Factored matrix follows: >>>> Mat Object: >>>> 1 MPI >>>> processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> package used to perform >>>> factorization: petsc >>>> total: nonzeros=3, >>>> allocated nonzeros=3 >>>> total number of mallocs >>>> used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> linear system matrix = precond >>>> matrix: >>>> Mat Object: >>>> 1 MPI processes >>>> type: seqaij >>>> rows=3, cols=3 >>>> total: nonzeros=3, allocated >>>> nonzeros=3 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Down solver (pre-smoother) on level 1 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_1_) 1 MPI >>>> processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_1_) 1 MPI >>>> processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. 
>>>> linear system matrix = precond >>>> matrix: >>>> Mat Object: >>>> 1 MPI processes >>>> type: seqaij >>>> rows=15, cols=15 >>>> total: nonzeros=69, allocated >>>> nonzeros=69 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as >>>> down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 2 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_2_) 1 MPI >>>> processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_2_) 1 MPI >>>> processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond >>>> matrix: >>>> Mat Object: >>>> 1 MPI processes >>>> type: seqaij >>>> rows=304, cols=304 >>>> total: nonzeros=7354, >>>> allocated nonzeros=7354 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as >>>> down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 3 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_3_) 1 MPI >>>> processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_3_) 1 MPI >>>> processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. >>>> linear system matrix = precond >>>> matrix: >>>> Mat Object: >>>> 1 MPI processes >>>> type: seqaij >>>> rows=30236, cols=30236 >>>> total: nonzeros=2730644, >>>> allocated nonzeros=2730644 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as >>>> down solver (pre-smoother) >>>> Down solver (pre-smoother) on level 4 >>>> ------------------------------- >>>> KSP Object: >>>> (fieldsplit_0_mg_levels_4_) 1 MPI >>>> processes >>>> type: richardson >>>> Richardson: damping factor=1. >>>> maximum iterations=2 >>>> tolerances: relative=1e-05, >>>> absolute=1e-50, >>>> divergence=10000. >>>> left preconditioning >>>> using nonzero initial guess >>>> using NONE norm type for >>>> convergence test >>>> PC Object: >>>> (fieldsplit_0_mg_levels_4_) 1 MPI >>>> processes >>>> type: sor >>>> SOR: type = local_symmetric, >>>> iterations = 1, local >>>> iterations = 1, omega = 1. 
>>>> linear system matrix = precond >>>> matrix: >>>> Mat Object: >>>> (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, >>>> allocated nonzeros=70684164 >>>> total number of mallocs used >>>> during MatSetValues >>>> calls =0 >>>> not using I-node routines >>>> Up solver (post-smoother) same as >>>> down solver (pre-smoother) >>>> linear system matrix = precond matrix: >>>> Mat Object: >>>> (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=894132 >>>> total: nonzeros=70684164, >>>> allocated nonzeros=70684164 >>>> total number of mallocs used >>>> during MatSetValues calls =0 >>>> not using I-node routines >>>> A01 >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=894132, cols=42025 >>>> total: nonzeros=6850107, allocated >>>> nonzeros=6850107 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=42025, cols=42025 >>>> total: nonzeros=554063, allocated >>>> nonzeros=554063 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=936157, cols=936157 >>>> total: nonzeros=84938441, allocated >>>> nonzeros=84938441 >>>> total number of mallocs used during >>>> MatSetValues calls =0 >>>> not using I-node routines >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they >>>> begin their experiments is infinitely more interesting >>>> than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin >>> their experiments is infinitely more interesting than any >>> results to which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From dave.mayhem23 at gmail.com Wed Jun 14 13:26:59 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 14 Jun 2017 18:26:59 +0000 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: <2e961dbd-c795-0485-844b-e0f505e23df7@dim.uchile.cl> References: <2e961dbd-c795-0485-844b-e0f505e23df7@dim.uchile.cl> Message-ID: On Wed, 14 Jun 2017 at 19:42, David Nolte wrote: > Dave, thanks a lot for your great answer and for sharing your experience. > I have a much clearer picture now. :) > > The experiments 3/ give the desired results for examples of cavity flow. > The (1/mu scaled) mass matrix seems OK. > > I followed your and Matt's recommendations, used a FULL Schur > factorization, LU in the 0th split, and gradually relaxed the tolerance of > GMRES/Jacobi in split 1 (observed the gradual increase in outer > iterations). Then I replaced the split_0 LU with AMG (further increase of > outer iterations and iterations on the Schur complement). 
> Doing so I converged to using hypre boomeramg (smooth_type Euclid, > strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the Schur block, > which gave the best time-to-solution in my particular setup and convergence > to rtol=1e-8 within 60 outer iterations. > In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2) instead > of "preonly" did not help convergence (on the contrary). > > I also repeated the experiments with "-pc_fieldsplit_schur_precondition > selfp", with hypre(ilu) in split 0 and hypre in split 1, just to check, and > somewhat disappointingly ( ;-) ) the wall time is less than half than when > using gmres/Jac and Sp = mass matrix. > I am aware that this says nothing about scaling and robustness with > respect to h-refinement... > - selfp defines the schur pc as A10 inv(diag(A00)) A01. This operator is not spectrally equivalent to S - For split 0 did you use preonly-hypre(ilu)? - For split 1 did you also use hypre(ilu) (you just wrote hypre)? - What was the iteration count for the saddle point problem with hypre and selfp? Iterates will increase if you refine the mesh and a cross over will occur at some (unknown) resolution and the mass matrix variant will be faster. > > Would you agree that these configurations "make sense"? > If you want to weak scale, the configuration with the mass matrix makes the most sense. If you are only interested in solving many problems on one mesh, then do what ever you can to make the solve time as fast as possible - including using preconditioners defined with non-spectrally equivalent operators :D Thanks, Dave > Furthermore, maybe anyone has a hint where to start tuning multigrid? So > far hypre worked better than ML, but I have not experimented much with the > parameters. > > > > Thanks again for your help! > > Best wishes, > David > > > > > On 06/12/2017 04:52 PM, Dave May wrote: > > I've been following the discussion and have a couple of comments: > > 1/ For the preconditioners that you are using (Schur factorisation LDU, or > upper block triangular DU), the convergence properties (e.g. 1 iterate for > LDU and 2 iterates for DU) come from analysis involving exact inverses of > A_00 and S > > Once you switch from using exact inverses of A_00 and S, you have to rely > on spectral equivalence of operators. That is fine, but the spectral > equivalence does not tell you how many iterates LDU or DU will require to > converge. What it does inform you about is that if you have a spectrally > equivalent operator for A_00 and S (Schur complement), then under mesh > refinement, your iteration count (whatever it was prior to refinement) will > not increase. > > 2/ Looking at your first set of options, I see you have opted to use > -fieldsplit_ksp_type preonly (for both split 0 and 1). That is nice as it > creates a linear operator thus you don't need something like FGMRES or GCR > applied to the saddle point problem. > > Your choice for Schur is fine in the sense that the diagonal of M is > spectrally equivalent to M, and M is spectrally equivalent to S. Whether it > is "fine" in terms of the iteration count for Schur systems, we cannot say > apriori (since the spectral equivalence doesn't give us direct info about > the iterations we should expect). > > Your preconditioner for A_00 relies on AMG producing a spectrally > equivalent operator with bounds which are tight enough to ensure > convergence of the saddle point problem. I'll try explain this. 
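(Written out, the factorisation referred to in 1/ is, with the blocks named as in the -ksp_view output earlier in the thread and added here only for reference:

  \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
    = \begin{pmatrix} I & 0 \\ A_{10} A_{00}^{-1} & I \end{pmatrix}
      \begin{pmatrix} A_{00} & 0 \\ 0 & S \end{pmatrix}
      \begin{pmatrix} I & A_{00}^{-1} A_{01} \\ 0 & I \end{pmatrix},
  \qquad S = A_{11} - A_{10} A_{00}^{-1} A_{01} .

The "upper" variant keeps only the last two factors, i.e. the block upper-triangular matrix, which is why, with exact inner solves, one expects 1 Krylov iterate for the full (LDU) factorisation and 2 for the upper (DU) one.)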
> > In my experience, for many problems (unstructured FE with variable > coefficients, structured FE meshes with variable coefficients) AMG and > preonly is not a robust choice. To control the approximation (the spectral > equiv bounds), I typically run a stationary or Krylov method on split 0 > (e.g. -fieldsplit_0_ksp_type xxx -fieldsplit_0_kps_rtol yyy). Since the AMG > preconditioner generated is spectrally equivalent (usually!), these solves > will converge to a chosen rtol in a constant number of iterates under > h-refinement. In practice, if I don't enforce that I hit something like > rtol=1.0e-1 (or 1.0e-2) on the 0th split, saddle point iterates will > typically increase for "hard" problems under mesh refinement (1e4-1e7 > coefficient variation), and may not even converge at all when just using > -fieldsplit_0_ksp_type preonly. Failure ultimately depends on how "strong" > the preconditioner for A_00 block is (consider re-discretized geometric > multigrid versus AMG). Running an iterative solve on the 0th split lets you > control and recover from weak/poor, but spectrally equivalent > preconditioners for A_00. Note that people hate this approach as it > invariably nests Krylov methods, and subsequently adds more global > reductions. However, it is scalable, optimal, tuneable and converges faster > than the case which didn't converge at all :D > > 3/ I agree with Matt's comments, but I'd do a couple of other things first. > > * I'd first check the discretization is implemented correctly. Your P2/P1 > element is inf-sup stable - thus the condition number of S > (unpreconditioned) should be independent of the mesh resolution (h). An > easy way to verify this is to run either LDU (schur_fact_type full) or DU > (schur_fact_type upper) and monitor the iterations required for those S > solves. Use -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol 1.0e-8 > -fieldsplit_1_ksp_monitor_true_residual -fieldsplit_1_ksp_pc_right > -fieldsplit_1_ksp_type gmres -fieldsplit_0_pc_type lu > > Then refine the mesh (ideally via sub-division) and repeat the experiment. > If the S iterates don't asymptote, but instead grow with each refinement - > you likely have a problem with the discretisation. > > * Do the same experiment, but this time use your mass matrix as the > preconditioner for S and use -fieldsplit_1_pc_type lu. If the iterates, > compared with the previous experiments (without a Schur PC) have gone up > your mass matrix is not defined correctly. If in the previous experiment > (without a Schur PC) iterates on the S solves were bounded, but now when > preconditioned with the mass matrix the iterates go up, then your mass > matrix is definitely not correct. > > 4/ Lastly, to finally get to your question regarding does +400 iterates > for the solving the Schur seem "reasonable" and what is "normal behaviour"? > > It seems "high" to me. However the specifics of your discretisation, mesh > topology, element quality, boundary conditions render it almost impossible > to say what should be expected. When I use a Q2-P2* discretisation on a > structured mesh with a non-constant viscosity I'd expect something like > 50-60 for 1.0e-10 with a mass matrix scaled by the inverse (local) > viscosity. For constant viscosity maybe 30 iterates. I think this kind of > statement is not particularly useful or helpful though. > > Given you use an unstructured tet mesh, it is possible that some elements > have very bad quality (high aspect ratio (AR), highly skewed). 
I am certain > that P2/P1 has an inf-sup constant which is sensitive to the element aspect > ratio (I don't recall the exact scaling wrt AR). From experience I know > that using the mass matrix as a preconditioner for Schur is not robust as > AR increases (e.g. iterations for the S solve grow). Hence, with a couple > of "bad" element in your mesh, I could imagine that you could end up having > to perform +400 iterations > > 5/ Lastly, definitely don't impose one Dirichlet BC on pressure to make > the pressure unique. This really screws up all the nice properties of your > matrices. Just enforce the constant null space for p. And as you noticed, > GMRES magically just does it automatically if the RHS of your original > system was consistent. > > Thanks, > Dave > > > On 12 June 2017 at 20:20, David Nolte wrote: > >> Ok. With "-pc_fieldsplit_schur_fact_type full" the outer iteration >> converges in 1 step. The problem remain the Schur iterations. >> >> I was not sure if the problem was maybe the singular pressure or the >> pressure Dirichlet BC. I tested the solver with a standard Stokes flow in a >> pipe with a constriction (zero Neumann BC for the pressure at the outlet) >> and in a 3D cavity (enclosed flow, no pressure BC or fixed at one point). I >> am not sure if I need to attach the constant pressure nullspace to the >> matrix for GMRES. Not doing so does not alter the convergence of GMRES in >> the Schur solver (nor the pressure solution), using a pressure Dirichlet BC >> however slows down convergence (I suppose because of the scaling of the >> matrix). >> >> I also checked the pressure mass matrix that I give PETSc, it looks >> correct. >> >> In all these cases, the solver behaves just as before. With LU in >> fieldsplit_0 and GMRES/LU with rtol 1e-10 in fieldsplit_1, it converges >> after 1 outer iteration, but the inner Schur solver converges slowly. >> >> How should the convergence of GMRES/LU of the Schur complement *normally* >> behave? >> >> Thanks again! >> David >> >> >> >> >> On 06/12/2017 12:41 PM, Matthew Knepley wrote: >> >> On Mon, Jun 12, 2017 at 10:36 AM, David Nolte >> wrote: >> >>> >>> On 06/12/2017 07:50 AM, Matthew Knepley wrote: >>> >>> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte >>> wrote: >>> >>>> Thanks Matt, makes sense to me! >>>> >>>> I skipped direct solvers at first because for these 'real' >>>> configurations LU (mumps/superlu_dist) usally goes out of memory (got 32GB >>>> RAM). It would be reasonable to take one more step back and play with >>>> synthetic examples. >>>> I managed to run one case though with 936k dofs using: ("user" >>>> =pressure mass matrix) >>>> >>>> <...> >>>> -pc_fieldsplit_schur_fact_type upper >>>> -pc_fieldsplit_schur_precondition user >>>> -fieldsplit_0_ksp_type preonly >>>> -fieldsplit_0_pc_type lu >>>> -fieldsplit_0_pc_factor_mat_solver_package mumps >>>> >>>> -fieldsplit_1_ksp_type gmres >>>> -fieldsplit_1_ksp_monitor_true_residuals >>>> -fieldsplit_1_ksp_rtol 1e-10 >>>> -fieldsplit_1_pc_type lu >>>> -fieldsplit_1_pc_factor_mat_solver_package mumps >>>> >>>> It takes 2 outer iterations, as expected. However the fieldsplit_1 >>>> solve takes very long. >>>> >>> >>> 1) It should take 1 outer iterate, not two. The problem is that your >>> Schur tolerance is way too high. Use >>> >>> -fieldsplit_1_ksp_rtol 1e-10 >>> >>> or something like that. Then it will take 1 iterate. >>> >>> >>> Shouldn't it take 2 with a triangular Schur factorization and exact >>> preconditioners, and 1 with a full factorization? (cf. 
Benzi et al 2005, >>> p.66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf) >>> >>> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 and the Schur >>> solver does drop below "rtol < 1e-10" >>> >> >> Oh, yes. Take away the upper until things are worked out. >> >> Thanks, >> >> Matt >> >>> >>> 2) There is a problem with the Schur solve. Now from the iterates >>> >>> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm >>> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >>> >>> it is clear that the preconditioner is really screwing stuff up. For >>> testing, you can use >>> >>> -pc_fieldsplit_schur_precondition full >>> >>> and your same setup here. It should take one iterate. I think there is >>> something wrong with your >>> mass matrix. >>> >>> >>> I agree. I forgot to mention that I am considering an "enclosed flow" >>> problem, with u=0 on all the boundary and a Dirichlet condition for the >>> pressure in one point for fixing the constant pressure. Maybe the >>> preconditioner is not consistent with this setup, need to check this.. >>> >>> Thanks a lot >>> >>> >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm >>>> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 >>>> Residual norms for fieldsplit_1_ solve. >>>> 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm >>>> 0.000000000000e+00 ||r(i)||/||b|| -nan >>>> Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations >>>> 0 >>>> 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm >>>> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 >>>> Residual norms for fieldsplit_1_ solve. >>>> 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm >>>> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm >>>> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 >>>> 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm >>>> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 >>>> 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm >>>> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 >>>> 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm >>>> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 >>>> 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm >>>> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 >>>> 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm >>>> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 >>>> 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm >>>> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 >>>> <...> >>>> 422 KSP preconditioned resid norm 2.984888715147e-02 true resid >>>> norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 >>>> 423 KSP preconditioned resid norm 2.638419658982e-02 true resid >>>> norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >>>> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations >>>> 423 >>>> 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm >>>> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 >>>> Linear solve converged due to CONVERGED_RTOL iterations 2 >>>> >>>> >>>> Does the slow convergence of the Schur block mean that my >>>> preconditioning matrix Sp is a poor choice? 
>>>> >>>> Thanks, >>>> David >>>> >>>> >>>> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >>>> wrote: >>>> >>>>> Dear all, >>>>> >>>>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 >>>>> finite elements discretization on tetrahedral meshes resulting in >>>>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and >>>>> the right hand side is a function of noisy measurement data. >>>>> >>>>> In other settings of "standard" Stokes flow problems I have obtained >>>>> good convergence with an "upper" Schur complement preconditioner, using >>>>> AMG (ML or Hypre) on the velocity block and approximating the Schur >>>>> complement matrix by the diagonal of the pressure mass matrix: >>>>> >>>>> -ksp_converged_reason >>>>> -ksp_monitor_true_residual >>>>> -ksp_initial_guess_nonzero >>>>> -ksp_diagonal_scale >>>>> -ksp_diagonal_scale_fix >>>>> -ksp_type fgmres >>>>> -ksp_rtol 1.0e-8 >>>>> >>>>> -pc_type fieldsplit >>>>> -pc_fieldsplit_type schur >>>>> -pc_fieldsplit_detect_saddle_point >>>>> -pc_fieldsplit_schur_fact_type upper >>>>> -pc_fieldsplit_schur_precondition user # <-- pressure mass >>>>> matrix >>>>> >>>>> -fieldsplit_0_ksp_type preonly >>>>> -fieldsplit_0_pc_type ml >>>>> >>>>> -fieldsplit_1_ksp_type preonly >>>>> -fieldsplit_1_pc_type jacobi >>>>> >>>> >>>> 1) I always recommend starting from an exact solver and backing off in >>>> small steps for optimization. Thus >>>> I would start with LU on the upper block and GMRES/LU with >>>> toelrance 1e-10 on the Schur block. >>>> This should converge in 1 iterate. >>>> >>>> 2) I don't think you want preonly on the Schur system. You might want >>>> GMRES/Jacobi to invert the mass matrix. >>>> >>>> 3) You probably want to tighten the tolerance on the Schur solve, at >>>> least to start, and then slowly let it out. The >>>> tight tolerance will show you how effective the preconditioner is >>>> using that Schur operator. Then you can start >>>> to evaluate how effective the Schur linear sovler is. >>>> >>>> Does this make sense? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> In my present case this setup gives rather slow convergence (varies for >>>>> different geometries between 200-500 or several thousands!). I obtain >>>>> better convergence with "-pc_fieldsplit_schur_precondition selfp"and >>>>> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think >>>>> this is optimal, though). >>>>> >>>>> I don't understand why the pressure mass matrix approach performs so >>>>> poorly and wonder what I could try to improve the convergence. Until >>>>> now >>>>> I have been using ML and Hypre BoomerAMG mostly with default >>>>> parameters. >>>>> Surely they can be improved by tuning some parameters. Which could be a >>>>> good starting point? Are there other options I should consider? >>>>> >>>>> With the above setup (jacobi) for a case that works better than others, >>>>> the KSP terminates with >>>>> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm >>>>> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >>>>> >>>>> You can find the output of -ksp_view below. Let me know if you need >>>>> more >>>>> details. >>>>> >>>>> Thanks in advance for your advice! 
>>>>> Best wishes >>>>> David >>>>> >>>>> >>>>> KSP Object: 1 MPI processes >>>>> type: fgmres >>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >>>>> Orthogonalization with no iterative refinement >>>>> GMRES: happy breakdown tolerance 1e-30 >>>>> maximum iterations=10000 >>>>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >>>>> right preconditioning >>>>> diagonally scaled system >>>>> using nonzero initial guess >>>>> using UNPRECONDITIONED norm type for convergence test >>>>> PC Object: 1 MPI processes >>>>> type: fieldsplit >>>>> FieldSplit with Schur preconditioner, factorization UPPER >>>>> Preconditioner for the Schur complement formed from user provided >>>>> matrix >>>>> Split info: >>>>> Split number 0 Defined by IS >>>>> Split number 1 Defined by IS >>>>> KSP solver for A00 block >>>>> KSP Object: (fieldsplit_0_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_) 1 MPI processes >>>>> type: ml >>>>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>>>> Cycles per PCApply=1 >>>>> Using Galerkin computed coarse grid matrices >>>>> Coarse grid solver -- level ------------------------------- >>>>> KSP Object: (fieldsplit_0_mg_coarse_) 1 >>>>> MPI >>>>> processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI >>>>> processes >>>>> type: lu >>>>> LU: out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> using diagonal shift on blocks to prevent zero pivot >>>>> [INBLOCKS] >>>>> matrix ordering: nd >>>>> factor fill ratio given 5., needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> package used to perform factorization: petsc >>>>> total: nonzeros=3, allocated nonzeros=3 >>>>> total number of mallocs used during MatSetValues >>>>> calls =0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> total: nonzeros=3, allocated nonzeros=3 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Down solver (pre-smoother) on level 1 >>>>> ------------------------------- >>>>> KSP Object: (fieldsplit_0_mg_levels_1_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_mg_levels_1_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, iterations = 1, local >>>>> iterations = 1, omega = 1. 
>>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=15, cols=15 >>>>> total: nonzeros=69, allocated nonzeros=69 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 2 >>>>> ------------------------------- >>>>> KSP Object: (fieldsplit_0_mg_levels_2_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_mg_levels_2_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=304, cols=304 >>>>> total: nonzeros=7354, allocated nonzeros=7354 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 3 >>>>> ------------------------------- >>>>> KSP Object: (fieldsplit_0_mg_levels_3_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_mg_levels_3_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=30236, cols=30236 >>>>> total: nonzeros=2730644, allocated nonzeros=2730644 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 4 >>>>> ------------------------------- >>>>> KSP Object: (fieldsplit_0_mg_levels_4_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_mg_levels_4_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, iterations = 1, local >>>>> iterations = 1, omega = 1. 
>>>>> linear system matrix = precond matrix: >>>>> Mat Object: (fieldsplit_0_) 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=894132, cols=894132 >>>>> total: nonzeros=70684164, allocated nonzeros=70684164 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down solver (pre-smoother) >>>>> linear system matrix = precond matrix: >>>>> Mat Object: (fieldsplit_0_) 1 MPI processes >>>>> type: seqaij >>>>> rows=894132, cols=894132 >>>>> total: nonzeros=70684164, allocated nonzeros=70684164 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> not using I-node routines >>>>> KSP solver for S = A11 - A10 inv(A00) A01 >>>>> KSP Object: (fieldsplit_1_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_1_) 1 MPI processes >>>>> type: jacobi >>>>> linear system matrix followed by preconditioner matrix: >>>>> Mat Object: (fieldsplit_1_) 1 MPI processes >>>>> type: schurcomplement >>>>> rows=42025, cols=42025 >>>>> Schur complement A11 - A10 inv(A00) A01 >>>>> A11 >>>>> Mat Object: (fieldsplit_1_) 1 >>>>> MPI processes >>>>> type: seqaij >>>>> rows=42025, cols=42025 >>>>> total: nonzeros=554063, allocated nonzeros=554063 >>>>> total number of mallocs used during MatSetValues calls >>>>> =0 >>>>> not using I-node routines >>>>> A10 >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=42025, cols=894132 >>>>> total: nonzeros=6850107, allocated nonzeros=6850107 >>>>> total number of mallocs used during MatSetValues calls >>>>> =0 >>>>> not using I-node routines >>>>> KSP of A00 >>>>> KSP Object: (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: ml >>>>> MG: type is MULTIPLICATIVE, levels=5 cycles=v >>>>> Cycles per PCApply=1 >>>>> Using Galerkin computed coarse grid matrices >>>>> Coarse grid solver -- level >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_coarse_) 1 MPI processes >>>>> type: lu >>>>> LU: out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> using diagonal shift on blocks to prevent zero >>>>> pivot [INBLOCKS] >>>>> matrix ordering: nd >>>>> factor fill ratio given 5., needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> package used to perform factorization: >>>>> petsc >>>>> total: nonzeros=3, allocated nonzeros=3 >>>>> total number of mallocs used during >>>>> MatSetValues calls =0 >>>>> not using I-node routines >>>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dnolte at dim.uchile.cl Wed Jun 14 14:04:53 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Wed, 14 Jun 2017 15:04:53 -0400 Subject: [petsc-users] Advice on improving Stokes Schur preconditioners In-Reply-To: References: <2e961dbd-c795-0485-844b-e0f505e23df7@dim.uchile.cl> Message-ID: On 06/14/2017 02:26 PM, Dave May wrote: > > On Wed, 14 Jun 2017 at 19:42, David Nolte > wrote: > > Dave, thanks a lot for your great answer and for sharing your > experience. I have a much clearer picture now. :) > > The experiments 3/ give the desired results for examples of cavity > flow. The (1/mu scaled) mass matrix seems OK. > > I followed your and Matt's recommendations, used a FULL Schur > factorization, LU in the 0th split, and gradually relaxed the > tolerance of GMRES/Jacobi in split 1 (observed the gradual > increase in outer iterations). Then I replaced the split_0 LU with > AMG (further increase of outer iterations and iterations on the > Schur complement). > Doing so I converged to using hypre boomeramg (smooth_type Euclid, > strong_threshold 0.75) and 3 iterations of GMRES/Jacobi on the > Schur block, which gave the best time-to-solution in my particular > setup and convergence to rtol=1e-8 within 60 outer iterations. > In my cases, using GMRES in the 0th split (with rtol 1e-1 or 1e-2) > instead of "preonly" did not help convergence (on the contrary). > > I also repeated the experiments with > "-pc_fieldsplit_schur_precondition selfp", with hypre(ilu) in > split 0 and hypre in split 1, just to check, and somewhat > disappointingly ( ;-) ) the wall time is less than half than when > using gmres/Jac and Sp = mass matrix. > I am aware that this says nothing about scaling and robustness > with respect to h-refinement... > > > - selfp defines the schur pc as A10 inv(diag(A00)) A01. This operator > is not spectrally equivalent to S > > - For split 0 did you use preonly-hypre(ilu)? > > - For split 1 did you also use hypre(ilu) (you just wrote hypre)? > > - What was the iteration count for the saddle point problem with hypre > and selfp? Iterates will increase if you refine the mesh and a cross > over will occur at some (unknown) resolution and the mass matrix > variant will be faster. Ok, this makes sense. split 1 has hypre with the default smoother (Schwarz-smoothers), the setup is: -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point -pc_fieldsplit_schur_fact_type full -pc_fieldsplit_schur_precondition selfp -fieldsplit_0_ksp_type richardson -fieldsplit_0_ksp_max_it 1 -fieldsplit_0_pc_type hypre -fieldsplit_0_pc_hypre_type boomeramg -fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.75 -fieldsplit_0_pc_hypre_boomeramg_smooth_type Euclid -fieldsplit_0_pc_hypre_boomeramg_eu_bj -fieldsplit_1_ksp_type richardson -fieldsplit_1_ksp_max_it 1 -fieldsplit_1_pc_type hypre -fieldsplit_1_pc_hypre_type boomeramg -fieldsplit_1_pc_hypre_boomeramg_strong_threshold 0.75 Iteration counts were in two different cases 90 and 113, while the mass matrix variant (gmres/jacobi iterations on the Schur complement) took 56 and 59. > Would you agree that these configurations "make sense"? > > > If you want to weak scale, the configuration with the mass matrix > makes the most sense. > > If you are only interested in solving many problems on one mesh, then > do what ever you can to make the solve time as fast as possible - > including using preconditioners defined with non-spectrally equivalent > operators :D > I see. 
That's exactly my case, many problems on one mesh (they are generated from medical images with fixed resolution). The hypre/selfp variant is 2-3x faster, so I'll just stick with that for the moment and try tuning the hypre parameters. Thanks again! > Thanks, > Dave > > > Furthermore, maybe anyone has a hint where to start tuning > multigrid? So far hypre worked better than ML, but I have not > experimented much with the parameters. > > > > > Thanks again for your help! > > Best wishes, > David > > > > > On 06/12/2017 04:52 PM, Dave May wrote: >> I've been following the discussion and have a couple of comments: >> >> 1/ For the preconditioners that you are using (Schur >> factorisation LDU, or upper block triangular DU), the convergence >> properties (e.g. 1 iterate for LDU and 2 iterates for DU) come >> from analysis involving exact inverses of A_00 and S >> >> Once you switch from using exact inverses of A_00 and S, you have >> to rely on spectral equivalence of operators. That is fine, but >> the spectral equivalence does not tell you how many iterates LDU >> or DU will require to converge. What it does inform you about is >> that if you have a spectrally equivalent operator for A_00 and S >> (Schur complement), then under mesh refinement, your iteration >> count (whatever it was prior to refinement) will not increase. >> >> 2/ Looking at your first set of options, I see you have opted to >> use -fieldsplit_ksp_type preonly (for both split 0 and 1). That >> is nice as it creates a linear operator thus you don't need >> something like FGMRES or GCR applied to the saddle point problem. >> >> Your choice for Schur is fine in the sense that the diagonal of M >> is spectrally equivalent to M, and M is spectrally equivalent to >> S. Whether it is "fine" in terms of the iteration count for Schur >> systems, we cannot say apriori (since the spectral equivalence >> doesn't give us direct info about the iterations we should expect). >> >> Your preconditioner for A_00 relies on AMG producing a spectrally >> equivalent operator with bounds which are tight enough to ensure >> convergence of the saddle point problem. I'll try explain this. >> >> In my experience, for many problems (unstructured FE with >> variable coefficients, structured FE meshes with variable >> coefficients) AMG and preonly is not a robust choice. To control >> the approximation (the spectral equiv bounds), I typically run a >> stationary or Krylov method on split 0 (e.g. >> -fieldsplit_0_ksp_type xxx -fieldsplit_0_kps_rtol yyy). Since the >> AMG preconditioner generated is spectrally equivalent (usually!), >> these solves will converge to a chosen rtol in a constant number >> of iterates under h-refinement. In practice, if I don't enforce >> that I hit something like rtol=1.0e-1 (or 1.0e-2) on the 0th >> split, saddle point iterates will typically increase for "hard" >> problems under mesh refinement (1e4-1e7 coefficient variation), >> and may not even converge at all when just using >> -fieldsplit_0_ksp_type preonly. Failure ultimately depends on how >> "strong" the preconditioner for A_00 block is (consider >> re-discretized geometric multigrid versus AMG). Running an >> iterative solve on the 0th split lets you control and recover >> from weak/poor, but spectrally equivalent preconditioners for >> A_00. Note that people hate this approach as it invariably nests >> Krylov methods, and subsequently adds more global reductions. 
>> However, it is scalable, optimal, tuneable and converges faster >> than the case which didn't converge at all :D >> >> 3/ I agree with Matt's comments, but I'd do a couple of other >> things first. >> >> * I'd first check the discretization is implemented correctly. >> Your P2/P1 element is inf-sup stable - thus the condition number >> of S (unpreconditioned) should be independent of the mesh >> resolution (h). An easy way to verify this is to run either LDU >> (schur_fact_type full) or DU (schur_fact_type upper) and monitor >> the iterations required for those S solves. Use >> -fieldsplit_1_pc_type none -fieldsplit_1_ksp_rtol 1.0e-8 >> -fieldsplit_1_ksp_monitor_true_residual >> -fieldsplit_1_ksp_pc_right -fieldsplit_1_ksp_type gmres >> -fieldsplit_0_pc_type lu >> >> Then refine the mesh (ideally via sub-division) and repeat the >> experiment. >> If the S iterates don't asymptote, but instead grow with each >> refinement - you likely have a problem with the discretisation. >> >> * Do the same experiment, but this time use your mass matrix as >> the preconditioner for S and use -fieldsplit_1_pc_type lu. If the >> iterates, compared with the previous experiments (without a Schur >> PC) have gone up your mass matrix is not defined correctly. If in >> the previous experiment (without a Schur PC) iterates on the S >> solves were bounded, but now when preconditioned with the mass >> matrix the iterates go up, then your mass matrix is definitely >> not correct. >> >> 4/ Lastly, to finally get to your question regarding does +400 >> iterates for the solving the Schur seem "reasonable" and what is >> "normal behaviour"? >> >> It seems "high" to me. However the specifics of your >> discretisation, mesh topology, element quality, boundary >> conditions render it almost impossible to say what should be >> expected. When I use a Q2-P2* discretisation on a structured mesh >> with a non-constant viscosity I'd expect something like 50-60 for >> 1.0e-10 with a mass matrix scaled by the inverse (local) >> viscosity. For constant viscosity maybe 30 iterates. I think this >> kind of statement is not particularly useful or helpful though. >> >> Given you use an unstructured tet mesh, it is possible that some >> elements have very bad quality (high aspect ratio (AR), highly >> skewed). I am certain that P2/P1 has an inf-sup constant which is >> sensitive to the element aspect ratio (I don't recall the exact >> scaling wrt AR). From experience I know that using the mass >> matrix as a preconditioner for Schur is not robust as AR >> increases (e.g. iterations for the S solve grow). Hence, with a >> couple of "bad" element in your mesh, I could imagine that you >> could end up having to perform +400 iterations >> >> 5/ Lastly, definitely don't impose one Dirichlet BC on pressure >> to make the pressure unique. This really screws up all the nice >> properties of your matrices. Just enforce the constant null space >> for p. And as you noticed, GMRES magically just does it >> automatically if the RHS of your original system was consistent. >> >> Thanks, >> Dave >> >> >> On 12 June 2017 at 20:20, David Nolte > > wrote: >> >> Ok. With "-pc_fieldsplit_schur_fact_type full" the outer >> iteration converges in 1 step. The problem remain the Schur >> iterations. >> >> I was not sure if the problem was maybe the singular pressure >> or the pressure Dirichlet BC. 
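For reference, a minimal sketch of the "enforce the constant null space for p" suggestion above, as an alternative to pinning the pressure at a single point. It assumes a vector (here called nullvec, a name not taken from the thread) has already been filled so that it is constant on the pressure dofs and zero on the velocity dofs:

    Vec          nullvec;   /* assumed: constant on pressure dofs, zero on velocity dofs */
    MatNullSpace nsp;
    /* ... fill nullvec ... */
    ierr = VecNormalize(nullvec, NULL); CHKERRQ(ierr);
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &nullvec, &nsp); CHKERRQ(ierr);
    ierr = MatSetNullSpace(A, nsp); CHKERRQ(ierr);   /* A: the assembled saddle point matrix */
    ierr = MatNullSpaceDestroy(&nsp); CHKERRQ(ierr);

With the null space attached and a consistent right hand side, KSP removes that component during the solve, so no pressure Dirichlet point is needed.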
I tested the solver with a >> standard Stokes flow in a pipe with a constriction (zero >> Neumann BC for the pressure at the outlet) and in a 3D cavity >> (enclosed flow, no pressure BC or fixed at one point). I am >> not sure if I need to attach the constant pressure nullspace >> to the matrix for GMRES. Not doing so does not alter the >> convergence of GMRES in the Schur solver (nor the pressure >> solution), using a pressure Dirichlet BC however slows down >> convergence (I suppose because of the scaling of the matrix). >> >> I also checked the pressure mass matrix that I give PETSc, it >> looks correct. >> >> In all these cases, the solver behaves just as before. With >> LU in fieldsplit_0 and GMRES/LU with rtol 1e-10 in >> fieldsplit_1, it converges after 1 outer iteration, but the >> inner Schur solver converges slowly. >> >> How should the convergence of GMRES/LU of the Schur >> complement *normally* behave? >> >> Thanks again! >> David >> >> >> >> >> On 06/12/2017 12:41 PM, Matthew Knepley wrote: >>> On Mon, Jun 12, 2017 at 10:36 AM, David Nolte >>> > wrote: >>> >>> >>> On 06/12/2017 07:50 AM, Matthew Knepley wrote: >>>> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte >>>> > wrote: >>>> >>>> Thanks Matt, makes sense to me! >>>> >>>> I skipped direct solvers at first because for these >>>> 'real' configurations LU (mumps/superlu_dist) >>>> usally goes out of memory (got 32GB RAM). It would >>>> be reasonable to take one more step back and play >>>> with synthetic examples. >>>> I managed to run one case though with 936k dofs >>>> using: ("user" =pressure mass matrix) >>>> >>>> <...> >>>> -pc_fieldsplit_schur_fact_type upper >>>> -pc_fieldsplit_schur_precondition user >>>> -fieldsplit_0_ksp_type preonly >>>> -fieldsplit_0_pc_type lu >>>> -fieldsplit_0_pc_factor_mat_solver_package mumps >>>> >>>> -fieldsplit_1_ksp_type gmres >>>> -fieldsplit_1_ksp_monitor_true_residuals >>>> -fieldsplit_1_ksp_rtol 1e-10 >>>> -fieldsplit_1_pc_type lu >>>> -fieldsplit_1_pc_factor_mat_solver_package mumps >>>> >>>> It takes 2 outer iterations, as expected. However >>>> the fieldsplit_1 solve takes very long. >>>> >>>> >>>> 1) It should take 1 outer iterate, not two. The problem >>>> is that your Schur tolerance is way too high. Use >>>> >>>> -fieldsplit_1_ksp_rtol 1e-10 >>>> >>>> or something like that. Then it will take 1 iterate. >>> >>> Shouldn't it take 2 with a triangular Schur >>> factorization and exact preconditioners, and 1 with a >>> full factorization? (cf. Benzi et al 2005, p.66, >>> http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf >>> ) >>> >>> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10 >>> and the Schur solver does drop below "rtol < 1e-10" >>> >>> >>> Oh, yes. Take away the upper until things are worked out. >>> >>> Thanks, >>> >>> Matt >>> >>>> >>>> 2) There is a problem with the Schur solve. Now from >>>> the iterates >>>> >>>> 423 KSP preconditioned resid norm 2.638419658982e-02 >>>> true resid norm 7.229653211635e-11 ||r(i)||/||b|| >>>> 7.229653211635e-11 >>>> >>>> it is clear that the preconditioner is really screwing >>>> stuff up. For testing, you can use >>>> >>>> -pc_fieldsplit_schur_precondition full >>>> >>>> and your same setup here. It should take one iterate. I >>>> think there is something wrong with your >>>> mass matrix. >>> >>> I agree. I forgot to mention that I am considering an >>> "enclosed flow" problem, with u=0 on all the boundary >>> and a Dirichlet condition for the pressure in one point >>> for fixing the constant pressure. 
Maybe the >>> preconditioner is not consistent with this setup, need >>> to check this.. >>> >>> Thanks a lot >>> >>> >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> 0 KSP unpreconditioned resid norm >>>> 4.038466809302e-03 true resid norm >>>> 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 >>>> Residual norms for fieldsplit_1_ solve. >>>> 0 KSP preconditioned resid norm >>>> 0.000000000000e+00 true resid norm >>>> 0.000000000000e+00 ||r(i)||/||b|| -nan >>>> Linear fieldsplit_1_ solve converged due to >>>> CONVERGED_ATOL iterations 0 >>>> 1 KSP unpreconditioned resid norm >>>> 4.860095964831e-06 true resid norm >>>> 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 >>>> Residual norms for fieldsplit_1_ solve. >>>> 0 KSP preconditioned resid norm >>>> 2.965546249872e+08 true resid norm >>>> 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 >>>> 1 KSP preconditioned resid norm >>>> 1.347596594634e+08 true resid norm >>>> 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 >>>> 2 KSP preconditioned resid norm >>>> 5.913230136403e+07 true resid norm >>>> 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 >>>> 3 KSP preconditioned resid norm >>>> 4.629700028930e+07 true resid norm >>>> 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 >>>> 4 KSP preconditioned resid norm >>>> 3.804431276819e+07 true resid norm >>>> 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 >>>> 5 KSP preconditioned resid norm >>>> 3.178769422140e+07 true resid norm >>>> 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 >>>> 6 KSP preconditioned resid norm >>>> 2.648669043919e+07 true resid norm >>>> 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 >>>> 7 KSP preconditioned resid norm >>>> 2.203522108614e+07 true resid norm >>>> 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 >>>> <...> >>>> 422 KSP preconditioned resid norm >>>> 2.984888715147e-02 true resid norm >>>> 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 >>>> 423 KSP preconditioned resid norm >>>> 2.638419658982e-02 true resid norm >>>> 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 >>>> Linear fieldsplit_1_ solve converged due to >>>> CONVERGED_RTOL iterations 423 >>>> 2 KSP unpreconditioned resid norm >>>> 3.539889585599e-16 true resid norm >>>> 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 >>>> Linear solve converged due to CONVERGED_RTOL >>>> iterations 2 >>>> >>>> >>>> Does the slow convergence of the Schur block mean >>>> that my preconditioning matrix Sp is a poor choice? >>>> >>>> Thanks, >>>> David >>>> >>>> >>>> On 06/11/2017 08:53 AM, Matthew Knepley wrote: >>>>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte >>>>> >>>> > wrote: >>>>> >>>>> Dear all, >>>>> >>>>> I am solving a Stokes problem in 3D aorta >>>>> geometries, using a P2/P1 >>>>> finite elements discretization on tetrahedral >>>>> meshes resulting in >>>>> ~1-1.5M DOFs. Viscosity is uniform (can be >>>>> adjusted arbitrarily), and >>>>> the right hand side is a function of noisy >>>>> measurement data. 
>>>>> >>>>> In other settings of "standard" Stokes flow >>>>> problems I have obtained >>>>> good convergence with an "upper" Schur >>>>> complement preconditioner, using >>>>> AMG (ML or Hypre) on the velocity block and >>>>> approximating the Schur >>>>> complement matrix by the diagonal of the >>>>> pressure mass matrix: >>>>> >>>>> -ksp_converged_reason >>>>> -ksp_monitor_true_residual >>>>> -ksp_initial_guess_nonzero >>>>> -ksp_diagonal_scale >>>>> -ksp_diagonal_scale_fix >>>>> -ksp_type fgmres >>>>> -ksp_rtol 1.0e-8 >>>>> >>>>> -pc_type fieldsplit >>>>> -pc_fieldsplit_type schur >>>>> -pc_fieldsplit_detect_saddle_point >>>>> -pc_fieldsplit_schur_fact_type upper >>>>> -pc_fieldsplit_schur_precondition user >>>>> # <-- pressure mass matrix >>>>> >>>>> -fieldsplit_0_ksp_type preonly >>>>> -fieldsplit_0_pc_type ml >>>>> >>>>> -fieldsplit_1_ksp_type preonly >>>>> -fieldsplit_1_pc_type jacobi >>>>> >>>>> >>>>> 1) I always recommend starting from an exact >>>>> solver and backing off in small steps for >>>>> optimization. Thus >>>>> I would start with LU on the upper block and >>>>> GMRES/LU with toelrance 1e-10 on the Schur block. >>>>> This should converge in 1 iterate. >>>>> >>>>> 2) I don't think you want preonly on the Schur >>>>> system. You might want GMRES/Jacobi to invert the >>>>> mass matrix. >>>>> >>>>> 3) You probably want to tighten the tolerance on >>>>> the Schur solve, at least to start, and then >>>>> slowly let it out. The >>>>> tight tolerance will show you how effective >>>>> the preconditioner is using that Schur operator. >>>>> Then you can start >>>>> to evaluate how effective the Schur linear >>>>> sovler is. >>>>> >>>>> Does this make sense? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>> In my present case this setup gives rather >>>>> slow convergence (varies for >>>>> different geometries between 200-500 or >>>>> several thousands!). I obtain >>>>> better convergence with >>>>> "-pc_fieldsplit_schur_precondition selfp"and >>>>> using multigrid on S, with >>>>> "-fieldsplit_1_pc_type ml" (I don't think >>>>> this is optimal, though). >>>>> >>>>> I don't understand why the pressure mass >>>>> matrix approach performs so >>>>> poorly and wonder what I could try to improve >>>>> the convergence. Until now >>>>> I have been using ML and Hypre BoomerAMG >>>>> mostly with default parameters. >>>>> Surely they can be improved by tuning some >>>>> parameters. Which could be a >>>>> good starting point? Are there other options I >>>>> should consider? >>>>> >>>>> With the above setup (jacobi) for a case that >>>>> works better than others, >>>>> the KSP terminates with >>>>> 467 KSP unpreconditioned resid norm >>>>> 2.072014323515e-09 true resid norm >>>>> 2.072014322600e-09 ||r(i)||/||b|| >>>>> 9.939098100674e-09 >>>>> >>>>> You can find the output of -ksp_view below. >>>>> Let me know if you need more >>>>> details. >>>>> >>>>> Thanks in advance for your advice! >>>>> Best wishes >>>>> David >>>>> >>>>> >>>>> KSP Object: 1 MPI processes >>>>> type: fgmres >>>>> GMRES: restart=30, using Classical >>>>> (unmodified) Gram-Schmidt >>>>> Orthogonalization with no iterative refinement >>>>> GMRES: happy breakdown tolerance 1e-30 >>>>> maximum iterations=10000 >>>>> tolerances: relative=1e-08, absolute=1e-50, >>>>> divergence=10000. 
>>>>> right preconditioning >>>>> diagonally scaled system >>>>> using nonzero initial guess >>>>> using UNPRECONDITIONED norm type for >>>>> convergence test >>>>> PC Object: 1 MPI processes >>>>> type: fieldsplit >>>>> FieldSplit with Schur preconditioner, >>>>> factorization UPPER >>>>> Preconditioner for the Schur complement >>>>> formed from user provided matrix >>>>> Split info: >>>>> Split number 0 Defined by IS >>>>> Split number 1 Defined by IS >>>>> KSP solver for A00 block >>>>> KSP Object: (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial >>>>> guess is zero >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: ml >>>>> MG: type is MULTIPLICATIVE, levels=5 >>>>> cycles=v >>>>> Cycles per PCApply=1 >>>>> Using Galerkin computed coarse >>>>> grid matrices >>>>> Coarse grid solver -- level >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>>> processes >>>>> type: preonly >>>>> maximum iterations=10000, initial >>>>> guess is zero >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_coarse_) 1 MPI >>>>> processes >>>>> type: lu >>>>> LU: out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> using diagonal shift on blocks >>>>> to prevent zero pivot >>>>> [INBLOCKS] >>>>> matrix ordering: nd >>>>> factor fill ratio given 5., >>>>> needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: >>>>> 1 MPI processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> package used to perform >>>>> factorization: petsc >>>>> total: nonzeros=3, >>>>> allocated nonzeros=3 >>>>> total number of mallocs >>>>> used during MatSetValues >>>>> calls =0 >>>>> not using I-node routines >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> total: nonzeros=3, allocated >>>>> nonzeros=3 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Down solver (pre-smoother) on level 1 >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_levels_1_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_levels_1_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, >>>>> iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=15, cols=15 >>>>> total: nonzeros=69, allocated >>>>> nonzeros=69 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down >>>>> solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 2 >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_levels_2_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. 
>>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_levels_2_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, >>>>> iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=304, cols=304 >>>>> total: nonzeros=7354, allocated >>>>> nonzeros=7354 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down >>>>> solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 3 >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_levels_3_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_levels_3_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, >>>>> iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=30236, cols=30236 >>>>> total: nonzeros=2730644, >>>>> allocated nonzeros=2730644 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down >>>>> solver (pre-smoother) >>>>> Down solver (pre-smoother) on level 4 >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_levels_4_) 1 >>>>> MPI processes >>>>> type: richardson >>>>> Richardson: damping factor=1. >>>>> maximum iterations=2 >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using nonzero initial guess >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_levels_4_) 1 >>>>> MPI processes >>>>> type: sor >>>>> SOR: type = local_symmetric, >>>>> iterations = 1, local >>>>> iterations = 1, omega = 1. >>>>> linear system matrix = precond matrix: >>>>> Mat Object: >>>>> (fieldsplit_0_) 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=894132, cols=894132 >>>>> total: nonzeros=70684164, >>>>> allocated nonzeros=70684164 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> Up solver (post-smoother) same as down >>>>> solver (pre-smoother) >>>>> linear system matrix = precond matrix: >>>>> Mat Object: (fieldsplit_0_) >>>>> 1 MPI processes >>>>> type: seqaij >>>>> rows=894132, cols=894132 >>>>> total: nonzeros=70684164, allocated >>>>> nonzeros=70684164 >>>>> total number of mallocs used during >>>>> MatSetValues calls =0 >>>>> not using I-node routines >>>>> KSP solver for S = A11 - A10 inv(A00) A01 >>>>> KSP Object: (fieldsplit_1_) 1 >>>>> MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial >>>>> guess is zero >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, divergence=10000. 
>>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (fieldsplit_1_) 1 >>>>> MPI processes >>>>> type: jacobi >>>>> linear system matrix followed by >>>>> preconditioner matrix: >>>>> Mat Object: (fieldsplit_1_) >>>>> 1 MPI processes >>>>> type: schurcomplement >>>>> rows=42025, cols=42025 >>>>> Schur complement A11 - A10 >>>>> inv(A00) A01 >>>>> A11 >>>>> Mat Object: >>>>> (fieldsplit_1_) 1 >>>>> MPI processes >>>>> type: seqaij >>>>> rows=42025, cols=42025 >>>>> total: nonzeros=554063, >>>>> allocated nonzeros=554063 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> A10 >>>>> Mat Object: 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=42025, cols=894132 >>>>> total: nonzeros=6850107, >>>>> allocated nonzeros=6850107 >>>>> total number of mallocs used >>>>> during MatSetValues calls =0 >>>>> not using I-node routines >>>>> KSP of A00 >>>>> KSP Object: >>>>> (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, >>>>> initial guess is zero >>>>> tolerances: relative=1e-05, >>>>> absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_) 1 >>>>> MPI processes >>>>> type: ml >>>>> MG: type is MULTIPLICATIVE, >>>>> levels=5 cycles=v >>>>> Cycles per PCApply=1 >>>>> Using Galerkin computed >>>>> coarse grid matrices >>>>> Coarse grid solver -- level >>>>> ------------------------------- >>>>> KSP Object: >>>>> (fieldsplit_0_mg_coarse_) 1 >>>>> MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, >>>>> initial guess is zero >>>>> tolerances: >>>>> relative=1e-05, absolute=1e-50, >>>>> divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for >>>>> convergence test >>>>> PC Object: >>>>> (fieldsplit_0_mg_coarse_) 1 >>>>> MPI processes >>>>> type: lu >>>>> LU: out-of-place >>>>> factorization >>>>> tolerance for zero pivot >>>>> 2.22045e-14 >>>>> using diagonal shift on >>>>> blocks to prevent zero >>>>> pivot [INBLOCKS] >>>>> matrix ordering: nd >>>>> factor fill ratio given >>>>> 5., needed 1. >>>>> Factored matrix follows: >>>>> Mat Object: >>>>> 1 MPI >>>>> processes >>>>> type: seqaij >>>>> rows=3, cols=3 >>>>> package used to >>>>> perform factorization: petsc >>>>> total: nonzeros=3, >>>>> allocated nonzeros=3 >>>>> total number of >>>>> mallocs used during >>>>> MatSetValues calls =0 >>>>> not using I-node >>>>> routines >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From kannanr at ornl.gov Wed Jun 14 14:58:48 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 19:58:48 +0000 Subject: [petsc-users] slepc NHEP error Message-ID: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> Hello, I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. The following is my slepc code for EPS. 
PetscInt nev; ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); PRINTROOT("calling epssolve"); ierr = EPSSolve(eps); // CHKERRQ(ierr); ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); // CHKERRQ(ierr); ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, " Number of requested eigenvalues: %D\n", nev); // CHKERRQ(ierr); I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. 2]PETSC ERROR: Argument out of range [2]PETSC ERROR: Argument 2 out of range [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c -- Regards, Ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.pbs.e613713 Type: application/octet-stream Size: 1419981 bytes Desc: test.pbs.e613713 URL: From kannanr at ornl.gov Wed Jun 14 15:12:28 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 20:12:28 +0000 Subject: [petsc-users] slow generation of petsc MPIAIJ matrix Message-ID: <643C345A-D6A1-4419-B3BD-A39E0C133047@ornl.gov> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has the 1D row distribution of the matrix with exactly 62500 rows and 1 million columns with 1million non-zeros as CSR/COO matrix. I am generating this graph as follows. It takes approximately 12 seconds to insert 25000 NNZ into petsc matrix with MatSetValues which means it is taking closer to 10 minutes to 1million NNZ?s in every processes. It takes 12 seconds for assembly. Is these times normal? Is there a faster way of doing it? 
I am unable to construct matrices of 1 billion global nnz?s in which each process has closer to 100 million entries. Generate_petsc_matrix(int n_rows, int n_cols, int n_nnz, PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val, const MPICommunicator& communicator) { int *start_row = new int[communicator.size()]; MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, MPI_COMM_WORLD); start_row[0] = 0; for (int i = 0; i < communicator.size(); i++) { if (i > 0) { start_row[i] = start_row[i - 1] + all_proc_rows[i]; } } MatCreate(PETSC_COMM_WORLD, &A); MatSetType(A, MATMPIAIJ); MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, n_cols); MatMPIAIJSetPreallocation(A, PETSC_DEFAULT, PETSC_NULL, PETSC_DEFAULT, PETSC_NULL); PetscInt local_row_idx; PetscInt local_col_idx; PetscScalar local_val; int my_start_row = start_row[MPI_RANK]; int my_start_col = 0; double petsc_insert_time=0.0; for (int i = 0; i < n_nnz; i++) { local_row_idx = my_start_row + row_idx[i]; local_col_idx = my_start_col + col_idx[i]; local_val = val[i]; tic(); ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, &local_val, INSERT_VALUES); petsc_insert_time += toc(); if (i % 25000 == 0){ PRINTROOT("25000 time::" << petsc_insert_time); petsc_insert_time=0; } CHKERRV(ierr); } PRINTROOT("100000 time::" << petsc_insert_time); tic(); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); petsc_insert_time = toc(); PRINTROOT("calling assembly to end::took::" << petsc_insert_time); } -- Regards, Ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jun 14 15:14:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 15:14:05 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> Message-ID: Why do you have the CHKERRQ(ierr); commented out in your code? Because of this you are getting mangled confusing error messages. Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. Then resend the new error message which will be much clearer. > On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: > > Hello, > > I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. > > The following is my slepc code for EPS. > > PetscInt nev; > ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); > ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); > ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); > EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); > EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); > PRINTROOT("calling epssolve"); > ierr = EPSSolve(eps); // CHKERRQ(ierr); > ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); > // CHKERRQ(ierr); > ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, > " Number of requested eigenvalues: %D\n", > nev); // CHKERRQ(ierr); > > I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. > > 2]PETSC ERROR: Argument out of range > [2]PETSC ERROR: Argument 2 out of range > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 > [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > > -- > Regards, > Ramki > > From jed at jedbrown.org Wed Jun 14 15:18:44 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 14 Jun 2017 14:18:44 -0600 Subject: [petsc-users] slow generation of petsc MPIAIJ matrix In-Reply-To: <643C345A-D6A1-4419-B3BD-A39E0C133047@ornl.gov> References: <643C345A-D6A1-4419-B3BD-A39E0C133047@ornl.gov> Message-ID: <87mv9a2uhn.fsf@jedbrown.org> "Kannan, Ramakrishnan" writes: > I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has the 1D row distribution of the matrix with exactly 62500 rows and 1 million columns with 1million non-zeros as CSR/COO matrix. > > I am generating this graph as follows. It takes approximately 12 seconds to insert 25000 NNZ into petsc matrix with MatSetValues which means it is taking closer to 10 minutes to 1million NNZ?s in every processes. It takes 12 seconds for assembly. Is these times normal? Is there a faster way of doing it? I am unable to construct matrices of 1 billion global nnz?s in which each process has closer to 100 million entries. > > Generate_petsc_matrix(int n_rows, int n_cols, int n_nnz, > PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val, > const MPICommunicator& communicator) { > int *start_row = new int[communicator.size()]; > MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, MPI_COMM_WORLD); > start_row[0] = 0; > for (int i = 0; i < communicator.size(); i++) { > if (i > 0) { > start_row[i] = start_row[i - 1] + all_proc_rows[i]; > } > } > MatCreate(PETSC_COMM_WORLD, &A); > MatSetType(A, MATMPIAIJ); > MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, n_cols); > MatMPIAIJSetPreallocation(A, PETSC_DEFAULT, PETSC_NULL, PETSC_DEFAULT, PETSC_NULL); This preallocation is not sufficient. Either put in the maximum number of entries any any row or provide the arrays. This will make your matrix assembly orders of magnitude faster. 
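A minimal sketch of the "provide the arrays" variant, reusing the variable names from the routine quoted above and assuming row_idx holds local row indices in [0, n_rows). Splitting the counts into diagonal and off-diagonal blocks would be tighter, but even passing the same exact per-row totals for both blocks (in place of the PETSC_DEFAULT call) removes the mallocs that dominate the insertion time:

    PetscInt *nnz_per_row;
    ierr = PetscCalloc1(n_rows, &nnz_per_row); CHKERRQ(ierr);        /* zero-initialized counts */
    for (PetscInt k = 0; k < n_nnz; k++) nnz_per_row[row_idx[k]]++;  /* exact count per local row */
    ierr = MatMPIAIJSetPreallocation(A, 0, nnz_per_row, 0, nnz_per_row); CHKERRQ(ierr);
    ierr = PetscFree(nnz_per_row); CHKERRQ(ierr);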
> PetscInt local_row_idx; > PetscInt local_col_idx; > PetscScalar local_val; > int my_start_row = start_row[MPI_RANK]; > int my_start_col = 0; > double petsc_insert_time=0.0; > for (int i = 0; i < n_nnz; i++) { > local_row_idx = my_start_row + row_idx[i]; > local_col_idx = my_start_col + col_idx[i]; > local_val = val[i]; > tic(); > ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, &local_val, INSERT_VALUES); > petsc_insert_time += toc(); > if (i % 25000 == 0){ > PRINTROOT("25000 time::" << petsc_insert_time); > petsc_insert_time=0; > } > CHKERRV(ierr); > } > PRINTROOT("100000 time::" << petsc_insert_time); > tic(); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > petsc_insert_time = toc(); > PRINTROOT("calling assembly to end::took::" << petsc_insert_time); > } > -- > Regards, > Ramki -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Jun 14 15:20:16 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 15:20:16 -0500 Subject: [petsc-users] slow generation of petsc MPIAIJ matrix In-Reply-To: <87mv9a2uhn.fsf@jedbrown.org> References: <643C345A-D6A1-4419-B3BD-A39E0C133047@ornl.gov> <87mv9a2uhn.fsf@jedbrown.org> Message-ID: <7425EF32-54E8-43CA-91FC-B6219415895A@mcs.anl.gov> http://www.mcs.anl.gov/petsc/documentation/faq.html#efficient-assembly > On Jun 14, 2017, at 3:18 PM, Jed Brown wrote: > > "Kannan, Ramakrishnan" writes: > >> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has the 1D row distribution of the matrix with exactly 62500 rows and 1 million columns with 1million non-zeros as CSR/COO matrix. >> >> I am generating this graph as follows. It takes approximately 12 seconds to insert 25000 NNZ into petsc matrix with MatSetValues which means it is taking closer to 10 minutes to 1million NNZ?s in every processes. It takes 12 seconds for assembly. Is these times normal? Is there a faster way of doing it? I am unable to construct matrices of 1 billion global nnz?s in which each process has closer to 100 million entries. >> >> Generate_petsc_matrix(int n_rows, int n_cols, int n_nnz, >> PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val, >> const MPICommunicator& communicator) { >> int *start_row = new int[communicator.size()]; >> MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, MPI_COMM_WORLD); >> start_row[0] = 0; >> for (int i = 0; i < communicator.size(); i++) { >> if (i > 0) { >> start_row[i] = start_row[i - 1] + all_proc_rows[i]; >> } >> } >> MatCreate(PETSC_COMM_WORLD, &A); >> MatSetType(A, MATMPIAIJ); >> MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, n_cols); >> MatMPIAIJSetPreallocation(A, PETSC_DEFAULT, PETSC_NULL, PETSC_DEFAULT, PETSC_NULL); > > This preallocation is not sufficient. Either put in the maximum number > of entries any any row or provide the arrays. This will make your > matrix assembly orders of magnitude faster. 
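On top of the FAQ entry linked above, a small optional check (a generic PETSc option, not something prescribed in this thread): once real per-row counts are supplied, inserting an entry that was not preallocated can be turned into an error instead of a slow reallocation, so any miscounted row shows up immediately:

    ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); CHKERRQ(ierr);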
> >> PetscInt local_row_idx; >> PetscInt local_col_idx; >> PetscScalar local_val; >> int my_start_row = start_row[MPI_RANK]; >> int my_start_col = 0; >> double petsc_insert_time=0.0; >> for (int i = 0; i < n_nnz; i++) { >> local_row_idx = my_start_row + row_idx[i]; >> local_col_idx = my_start_col + col_idx[i]; >> local_val = val[i]; >> tic(); >> ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, &local_val, INSERT_VALUES); >> petsc_insert_time += toc(); >> if (i % 25000 == 0){ >> PRINTROOT("25000 time::" << petsc_insert_time); >> petsc_insert_time=0; >> } >> CHKERRV(ierr); >> } >> PRINTROOT("100000 time::" << petsc_insert_time); >> tic(); >> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >> petsc_insert_time = toc(); >> PRINTROOT("calling assembly to end::took::" << petsc_insert_time); >> } >> -- >> Regards, >> Ramki From kannanr at ornl.gov Wed Jun 14 15:25:59 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 20:25:59 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> Message-ID: <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> I get the following compilation error when I have CHKERRQ. /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) -- Regards, Ramki On 6/14/17, 4:14 PM, "Barry Smith" wrote: Why do you have the CHKERRQ(ierr); commented out in your code? Because of this you are getting mangled confusing error messages. Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. Then resend the new error message which will be much clearer. > On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: > > Hello, > > I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. > > The following is my slepc code for EPS. > > PetscInt nev; > ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); > ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); > ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); > EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); > EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); > PRINTROOT("calling epssolve"); > ierr = EPSSolve(eps); // CHKERRQ(ierr); > ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); > // CHKERRQ(ierr); > ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, > " Number of requested eigenvalues: %D\n", > nev); // CHKERRQ(ierr); > > I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. > > 2]PETSC ERROR: Argument out of range > [2]PETSC ERROR: Argument 2 out of range > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 > [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > > -- > Regards, > Ramki > > From kannanr at ornl.gov Wed Jun 14 15:33:54 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 20:33:54 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> Message-ID: <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> Can I use CHKERRV instead of CHKERRQ? Will that help? -- Regards, Ramki On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: I get the following compilation error when I have CHKERRQ. /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) -- Regards, Ramki On 6/14/17, 4:14 PM, "Barry Smith" wrote: Why do you have the CHKERRQ(ierr); commented out in your code? Because of this you are getting mangled confusing error messages. Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. Then resend the new error message which will be much clearer. > On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: > > Hello, > > I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. > > The following is my slepc code for EPS. 
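For reference, a minimal sketch of the pattern this compile error points at: CHKERRQ issues a return carrying the error code, so the calling function must itself return PetscErrorCode rather than void (CHKERRV is the variant for functions that must stay void). Function and argument names here are illustrative only:

    PetscErrorCode solve_eigen(Mat A, EPS *eps)  /* returns an error code, so CHKERRQ compiles */
    {
      PetscErrorCode ierr;

      ierr = EPSCreate(PETSC_COMM_WORLD, eps); CHKERRQ(ierr);
      ierr = EPSSetOperators(*eps, A, NULL); CHKERRQ(ierr);
      ierr = EPSSetProblemType(*eps, EPS_NHEP); CHKERRQ(ierr);
      ierr = EPSSolve(*eps); CHKERRQ(ierr);
      return 0;
    }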
> > PetscInt nev; > ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); > ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); > ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); > EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); > EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); > PRINTROOT("calling epssolve"); > ierr = EPSSolve(eps); // CHKERRQ(ierr); > ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); > // CHKERRQ(ierr); > ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, > " Number of requested eigenvalues: %D\n", > nev); // CHKERRQ(ierr); > > I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. > > 2]PETSC ERROR: Argument out of range > [2]PETSC ERROR: Argument 2 out of range > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 > [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > > -- > Regards, > Ramki > > From bsmith at mcs.anl.gov Wed Jun 14 15:40:45 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 15:40:45 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> Message-ID: <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> > On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: > > Can I use CHKERRV instead of CHKERRQ? Will that help? You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. Barry > > -- > Regards, > Ramki > > > On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: > > I get the following compilation error when I have CHKERRQ. 
> > /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] > #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) > > > -- > Regards, > Ramki > > > On 6/14/17, 4:14 PM, "Barry Smith" wrote: > > > Why do you have the CHKERRQ(ierr); commented out in your code? > > Because of this you are getting mangled confusing error messages. > > Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. > > Then resend the new error message which will be much clearer. > > > >> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >> >> Hello, >> >> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >> >> The following is my slepc code for EPS. >> >> PetscInt nev; >> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >> PRINTROOT("calling epssolve"); >> ierr = EPSSolve(eps); // CHKERRQ(ierr); >> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >> // CHKERRQ(ierr); >> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD, >> " Number of requested eigenvalues: %D\n", >> nev); // CHKERRQ(ierr); >> >> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >> >> 2]PETSC ERROR: Argument out of range >> [2]PETSC ERROR: Argument 2 out of range >> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> >> -- >> Regards, >> Ramki >> >> > > > > > > From kannanr at ornl.gov Wed Jun 14 15:45:09 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 20:45:09 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> Message-ID: <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> Barry, All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. -- Regards, Ramki On 6/14/17, 4:40 PM, "Barry Smith" wrote: > On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: > > Can I use CHKERRV instead of CHKERRQ? Will that help? You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. Barry > > -- > Regards, > Ramki > > > On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: > > I get the following compilation error when I have CHKERRQ. > > /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] > #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) > > > -- > Regards, > Ramki > > > On 6/14/17, 4:14 PM, "Barry Smith" wrote: > > > Why do you have the CHKERRQ(ierr); commented out in your code? > > Because of this you are getting mangled confusing error messages. > > Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. > > Then resend the new error message which will be much clearer. 
> > > >> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >> >> Hello, >> >> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >> >> The following is my slepc code for EPS. >> >> PetscInt nev; >> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >> PRINTROOT("calling epssolve"); >> ierr = EPSSolve(eps); // CHKERRQ(ierr); >> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >> // CHKERRQ(ierr); >> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD, >> " Number of requested eigenvalues: %D\n", >> nev); // CHKERRQ(ierr); >> >> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >> >> 2]PETSC ERROR: Argument out of range >> [2]PETSC ERROR: Argument 2 out of range >> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> >> -- >> Regards, >> Ramki >> >> > > > > > > From bsmith at mcs.anl.gov Wed Jun 14 15:48:26 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 15:48:26 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> Message-ID: <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> > On Jun 14, 2017, at 3:45 PM, Kannan, Ramakrishnan wrote: > > Barry, > > All the functions here are standard 
SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. Barry > > -- > Regards, > Ramki > > > On 6/14/17, 4:40 PM, "Barry Smith" wrote: > > >> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >> >> Can I use CHKERRV instead of CHKERRQ? Will that help? > > You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. > > Barry > >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >> >> I get the following compilation error when I have CHKERRQ. >> >> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >> >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >> >> >> Why do you have the CHKERRQ(ierr); commented out in your code? >> >> Because of this you are getting mangled confusing error messages. >> >> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. >> >> Then resend the new error message which will be much clearer. >> >> >> >>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>> >>> Hello, >>> >>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>> >>> The following is my slepc code for EPS. >>> >>> PetscInt nev; >>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>> PRINTROOT("calling epssolve"); >>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>> // CHKERRQ(ierr); >>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>> " Number of requested eigenvalues: %D\n", >>> nev); // CHKERRQ(ierr); >>> >>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>> >>> 2]PETSC ERROR: Argument out of range >>> [2]PETSC ERROR: Argument 2 out of range >>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>> >>> -- >>> Regards, >>> Ramki >>> >>> >> >> >> >> >> >> > > > > From kannanr at ornl.gov Wed Jun 14 16:17:33 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 21:17:33 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> Message-ID: Barry, Appreciate your kind help. It compiles fine. I am still getting the following error. 
[0]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [0]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [0]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [0]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: [8]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [8]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [8]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [8]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [8]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [8]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [8]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [8]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [8]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c [8]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp [2]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [2]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [2]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [2]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [2]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [2]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [2]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [2]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [2]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c [2]PETSC ERROR: #10 
count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [0]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [0]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c [0]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp [7]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [7]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [7]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [7]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [7]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [7]PETSC ERROR: [15]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c [15]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c [15]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c [15]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [15]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [15]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [15]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [15]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [15]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c [15]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c [7]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c [7]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c [7]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c [7]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp -- Regards, Ramki On 6/14/17, 4:48 PM, "Barry Smith" wrote: > On Jun 14, 2017, at 3:45 PM, 
Kannan, Ramakrishnan wrote: > > Barry, > > All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. Barry > > -- > Regards, > Ramki > > > On 6/14/17, 4:40 PM, "Barry Smith" wrote: > > >> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >> >> Can I use CHKERRV instead of CHKERRQ? Will that help? > > You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. > > Barry > >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >> >> I get the following compilation error when I have CHKERRQ. >> >> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >> >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >> >> >> Why do you have the CHKERRQ(ierr); commented out in your code? >> >> Because of this you are getting mangled confusing error messages. >> >> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. >> >> Then resend the new error message which will be much clearer. >> >> >> >>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>> >>> Hello, >>> >>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>> >>> The following is my slepc code for EPS. >>> >>> PetscInt nev; >>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>> PRINTROOT("calling epssolve"); >>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>> // CHKERRQ(ierr); >>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>> " Number of requested eigenvalues: %D\n", >>> nev); // CHKERRQ(ierr); >>> >>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>> >>> 2]PETSC ERROR: Argument out of range >>> [2]PETSC ERROR: Argument 2 out of range >>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>> >>> -- >>> Regards, >>> Ramki >>> >>> >> >> >> >> >> >> > > > > From bsmith at mcs.anl.gov Wed Jun 14 16:21:04 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 16:21:04 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> Message-ID: <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> Send the file autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c as an attachment. Barry > On Jun 14, 2017, at 4:17 PM, Kannan, Ramakrishnan wrote: > > Barry, > > Appreciate your kind help. It compiles fine. I am still getting the following error. 
> > [0]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [0]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [0]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: [8]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [8]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [8]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [8]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [8]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [8]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [8]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > [2]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [2]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [2]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [2]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [2]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [2]PETSC ERROR: #9 EPSSolve() line 101 in 
/autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [2]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [0]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [0]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [0]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > [7]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [7]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [7]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [7]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: [15]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [15]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [15]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [15]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [15]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [15]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [15]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [7]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [7]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [7]PETSC ERROR: #10 count() line 266 in 
/lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > > -- > Regards, > Ramki > > > On 6/14/17, 4:48 PM, "Barry Smith" wrote: > > >> On Jun 14, 2017, at 3:45 PM, Kannan, Ramakrishnan wrote: >> >> Barry, >> >> All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. > > Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. > > Barry > >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:40 PM, "Barry Smith" wrote: >> >> >>> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >>> >>> Can I use CHKERRV instead of CHKERRQ? Will that help? >> >> You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. >> >> Barry >> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >>> >>> I get the following compilation error when I have CHKERRQ. >>> >>> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >>> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >>> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >>> >>> >>> Why do you have the CHKERRQ(ierr); commented out in your code? >>> >>> Because of this you are getting mangled confusing error messages. >>> >>> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. >>> >>> Then resend the new error message which will be much clearer. >>> >>> >>> >>>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Hello, >>>> >>>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>>> >>>> The following is my slepc code for EPS. >>>> >>>> PetscInt nev; >>>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>>> PRINTROOT("calling epssolve"); >>>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>>> // CHKERRQ(ierr); >>>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>>> " Number of requested eigenvalues: %D\n", >>>> nev); // CHKERRQ(ierr); >>>> >>>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>>> >>>> 2]PETSC ERROR: Argument out of range >>>> [2]PETSC ERROR: Argument 2 out of range >>>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>> >>> >>> >>> >>> >>> >> >> >> >> > > > > From kannanr at ornl.gov Wed Jun 14 16:24:48 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 14 Jun 2017 21:24:48 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> Message-ID: -- Regards, Ramki On 6/14/17, 5:21 PM, "Barry Smith" wrote: Send the file autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c as an attachment. Barry > On Jun 14, 2017, at 4:17 PM, Kannan, Ramakrishnan wrote: > > Barry, > > Appreciate your kind help. It compiles fine. I am still getting the following error. 
> > [0]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [0]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [0]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [0]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: [8]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [8]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [8]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [8]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [8]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [8]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [8]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [8]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > [2]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [2]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [2]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [2]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [2]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [2]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [2]PETSC ERROR: #9 EPSSolve() line 101 in 
/autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [2]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [0]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [0]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [0]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > [7]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [7]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [7]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [7]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: [15]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c > [15]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c > [15]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c > [15]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [15]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [15]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [15]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [15]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c > [7]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c > [7]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c > [7]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c > [7]PETSC ERROR: #10 count() line 266 in 
/lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp > > -- > Regards, > Ramki > > > On 6/14/17, 4:48 PM, "Barry Smith" wrote: > > >> On Jun 14, 2017, at 3:45 PM, Kannan, Ramakrishnan wrote: >> >> Barry, >> >> All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. > > Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. > > Barry > >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:40 PM, "Barry Smith" wrote: >> >> >>> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >>> >>> Can I use CHKERRV instead of CHKERRQ? Will that help? >> >> You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. >> >> Barry >> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >>> >>> I get the following compilation error when I have CHKERRQ. >>> >>> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >>> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >>> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >>> >>> >>> Why do you have the CHKERRQ(ierr); commented out in your code? >>> >>> Because of this you are getting mangled confusing error messages. >>> >>> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. >>> >>> Then resend the new error message which will be much clearer. >>> >>> >>> >>>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Hello, >>>> >>>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>>> >>>> The following is my slepc code for EPS. >>>> >>>> PetscInt nev; >>>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>>> PRINTROOT("calling epssolve"); >>>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>>> // CHKERRQ(ierr); >>>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>>> " Number of requested eigenvalues: %D\n", >>>> nev); // CHKERRQ(ierr); >>>> >>>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>>> >>>> 2]PETSC ERROR: Argument out of range >>>> [2]PETSC ERROR: Argument 2 out of range >>>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>> >>> >>> >>> >>> >>> >> >> >> >> > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: bvblas.c Type: application/octet-stream Size: 15975 bytes Desc: bvblas.c URL: From bsmith at mcs.anl.gov Wed Jun 14 16:31:37 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 16:31:37 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> Message-ID: <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> Here is the line that generates an error: ierr = MPI_Allreduce(bv->work,y,len,MPIU_SCALAR,MPIU_SUM,PetscObjectComm((PetscObject)bv));CHKERRQ(ierr); let's see what the MPI error is by running again with the additional command line option -on_error_abort hopefully MPI will say something useful. Barry > On Jun 14, 2017, at 4:24 PM, Kannan, Ramakrishnan wrote: > > > > -- > Regards, > Ramki > > > On 6/14/17, 5:21 PM, "Barry Smith" wrote: > > > Send the file autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c as an attachment. > > Barry > >> On Jun 14, 2017, at 4:17 PM, Kannan, Ramakrishnan wrote: >> >> Barry, >> >> Appreciate your kind help. It compiles fine. I am still getting the following error. 
>> >> [0]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [0]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [0]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [0]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: [8]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [8]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [8]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [8]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [8]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [8]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [8]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> [2]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [2]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [2]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [2]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [2]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [2]PETSC ERROR: #9 EPSSolve() line 101 in 
/autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [2]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [0]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [0]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [0]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> [7]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [7]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [7]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [7]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: [15]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [15]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [15]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [15]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [15]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [15]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [15]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [7]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [7]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [7]PETSC ERROR: #10 count() line 266 in 
/lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:48 PM, "Barry Smith" wrote: >> >> >>> On Jun 14, 2017, at 3:45 PM, Kannan, Ramakrishnan wrote: >>> >>> Barry, >>> >>> All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. >> >> Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. >> >> Barry >> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:40 PM, "Barry Smith" wrote: >>> >>> >>>> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Can I use CHKERRV instead of CHKERRQ? Will that help? >>> >>> You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. >>> >>> Barry >>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >>>> >>>> I get the following compilation error when I have CHKERRQ. >>>> >>>> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >>>> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >>>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >>>> >>>> >>>> Why do you have the CHKERRQ(ierr); commented out in your code? >>>> >>>> Because of this you are getting mangled confusing error messages. >>>> >>>> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. >>>> >>>> Then resend the new error message which will be much clearer. >>>> >>>> >>>> >>>>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>>>> >>>>> Hello, >>>>> >>>>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>>>> >>>>> The following is my slepc code for EPS. >>>>> >>>>> PetscInt nev; >>>>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>>>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>>>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>>>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>>>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>>>> PRINTROOT("calling epssolve"); >>>>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>>>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>>>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>>>> // CHKERRQ(ierr); >>>>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>>>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>>>> " Number of requested eigenvalues: %D\n", >>>>> nev); // CHKERRQ(ierr); >>>>> >>>>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>>>> >>>>> 2]PETSC ERROR: Argument out of range >>>>> [2]PETSC ERROR: Argument 2 out of range >>>>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>>>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>>>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>>>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>>>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>>>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>>>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>>>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>>>> >>>>> -- >>>>> Regards, >>>>> Ramki >>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >> >> >> >> > > > > > From bsmith at mcs.anl.gov Wed Jun 14 18:29:18 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 18:29:18 -0500 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> Message-ID: <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> You can't do this ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); use PETSC_DECIDE for the third argument Also this is wrong for (i = Istart; i < Iend; ++i) { ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); } you will get $ petscmpiexec -n 2 ./ex1 0: Istart = 0, Iend = 60 1: Istart = 60, Iend = 120 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: Row too large: row 120 max 119 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017 [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c [1]PETSC ERROR: PETSc Option Table entries: [1]PETSC ERROR: -malloc_test You need to get the example working so it ends with the error you reported previously not these other bugs. 
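For reference, a minimal self-contained sketch of a guarded tridiagonal assembly along the lines suggested above; the global size N = 120 and the 2/-1 stencil are assumptions taken from the quoted fragment, not the actual attached ex1.c.

   /* Sketch: let PETSc pick both local sizes, and never write a row or
      column outside 0..N-1, so the "Row too large" error above cannot occur. */
   #include <petscmat.h>

   int main(int argc, char **argv)
   {
     Mat            A;
     PetscInt       i, Istart, Iend, N = 120;   /* N is an assumed size */
     PetscErrorCode ierr;

     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
     ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
     ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr); /* both local sizes left to PETSc */
     ierr = MatSetFromOptions(A);CHKERRQ(ierr);
     ierr = MatSetUp(A);CHKERRQ(ierr);
     ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
     for (i = Istart; i < Iend; ++i) {
       ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
       if (i > 0)     {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
       if (i < N - 1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);} /* guards keep indices in range */
     }
     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatDestroy(&A);CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }

Setting only the locally owned row i (and guarding its two neighbours) also keeps every MatSetValue call local, so nothing needs to be stashed for other ranks during assembly.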
> On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui wrote: > > Dear Barry > > I made a small example with 2 process with one empty split in proc 0. But it gives another strange error > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Arguments are incompatible > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > The local size is always 60, so this is confusing. > > Giang > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > Could be, send us a simple example that demonstrates the problem and we'll track it down. > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > > > Hello > > > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > Giang > > > From bsmith at mcs.anl.gov Wed Jun 14 20:02:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Jun 2017 20:02:22 -0500 Subject: [petsc-users] questions on BAIJ matrix In-Reply-To: References: Message-ID: > On Jun 8, 2017, at 2:56 PM, Xiangdong wrote: > > > On Thu, Jun 8, 2017 at 3:17 PM, Hong wrote: > Xiangdong: > MatCreateMPIBAIJWithArrays() is obviously buggy, and not been tested. > > > 1) In the remark of the function MatCreateMPIBAIJWithArrays, it says " bs - the block size, only a block size of 1 is supported". Why must the block size be 1? Is this a typo? > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateMPIBAIJWithArrays.html > > It seems only bs=1 was implemented. I would not trust it without a test example. > > 2) In the Line 4040 of the implemention of MatCreateMPIBAIJWithArrays, would the matrix type be matmpibaij instead of matmpiSbaij? > > This is an error. It should be matmpibaij. > > http://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/baij/mpi/mpibaij.c.html#MatCreateMPIBAIJWithArrays > > 4031: PetscErrorCode MatCreateMPIBAIJWithArrays(MPI_Comm comm,PetscInt bs,PetscInt m,PetscInt n,PetscInt M,PetscInt N,const PetscInt i[],const PetscInt j[],const PetscScalar a[],Mat *mat) > 4032: { > > 4036: if (i[0]) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"i (row indices) must start with 0"); > 4037: if (m < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"local number of rows (m) cannot be PETSC_DECIDE, or negative"); > 4038: MatCreate(comm,mat); > 4039: MatSetSizes(*mat,m,n,M,N); > 4040: MatSetType(*mat,MATMPISBAIJ); > > It should be MATMPIBAIJ. > > 3) I want to create a petsc matrix M equivalent to the sum of two block csr matrix/array (M1csr, M2csr). What is the best way to achieve it? I am thinking of created two petsc baij matrix (M1baij and M2baij) by calling MatCreateMPIBAIJWithArrays twice and then call MATAXPY to get the sum M=M1baij + M2baij. Is there a better way to do it? > > This is an approach. However MatCreateMPIBAIJWithArrays() needs to be fixed, tested and implemented with requested bs. What bs do you need? > > Why does each bs need to be implemented separately? In the mean time, I modifed the implementation of MatCreateMPIBAIJWithArrays() a little bit to create a baij matrix with csr arrays. 
> > MatCreate(comm,mat); > MatSetSizes(*mat,m,n,M,N); > MatSetType(*mat,MATMPIBAIJ); > MatMPIBAIJSetPreallocationCSR(*mat,bs,i,j,a); > MatSetOption(*mat,MAT_ROW_ORIENTED,PETSC_FALSE); > MatAssemblyBegin(M,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(M,MAT_FINAL_ASSEMBLY); > > I just set the type to MATMPIBAIJ and delete the line MatSetOption before preallocation (otherwise I get error at runtime complaining using set options before preallocation) and it works fine. The only thing missing is that setting mat_row_oriented to be petsc_false has no effect on the final matrix, which I do not know how to fix. row_oriented is only used in MatSetValues[Blocked] for the array being passed in to set values. It has nothing to do with the actual storage of the matrix, for BAIJ the actual storage is always column oriented. Is this ok, why do you want to set row_oriented? Barry > > > > Why not use MatCreate(), MatSetValuses() (set a block values at time) to create two MPIBAIJ matrices, then call MATAXPY. Since petsc MPIBAIJ matrix has different internal data structure than csr, > "The i, j, and a arrays ARE copied by MatCreateMPIBAIJWithArrays() into the internal format used by PETSc;", so this approach would give similar performance. > > I will try this option as well. Thanks for your suggestions. > > Xiangdong > > Hong > > From m.rezasoltanian at gmail.com Wed Jun 14 22:33:50 2017 From: m.rezasoltanian at gmail.com (Mohamadreza Soltanian) Date: Wed, 14 Jun 2017 23:33:50 -0400 Subject: [petsc-users] PETSC Test Message-ID: Hello All, I am trying to install and test PETSC. When I get to the test part, it seems everything get stuck in the following line. I was wondering if anyone can help. Thank you C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI process -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jun 14 23:07:07 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 14 Jun 2017 23:07:07 -0500 Subject: [petsc-users] PETSC Test In-Reply-To: References: Message-ID: Likely its hanging in gethostbyname() call - which is caued by a mismatch in hostname. What OS are you using? If linux - can you do the following? sudo echo 127.0.0.1 `hostname` >> /etc/hosts And retry? Satish On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > Hello All, > > I am trying to install and test PETSC. When I get to the test part, it > seems everything get stuck in the following line. I was wondering if anyone > can help. Thank you > > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 MPI > process > From m.rezasoltanian at gmail.com Wed Jun 14 23:19:08 2017 From: m.rezasoltanian at gmail.com (Mohamadreza Soltanian) Date: Thu, 15 Jun 2017 00:19:08 -0400 Subject: [petsc-users] PETSC Test In-Reply-To: References: Message-ID: Hello Satish Thank you for your reply. I am using macOS Sierra 10.12.5. Thanks On Thu, Jun 15, 2017 at 12:07 AM, Satish Balay wrote: > Likely its hanging in gethostbyname() call - which is caued > by a mismatch in hostname. > > What OS are you using? If linux - can you do the following? > > sudo echo 127.0.0.1 `hostname` >> /etc/hosts > > And retry? > > Satish > > On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > > > Hello All, > > > > I am trying to install and test PETSC. When I get to the test part, it > > seems everything get stuck in the following line. I was wondering if > anyone > > can help. 
Thank you > > > > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 > MPI > > process > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jun 14 23:21:05 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 14 Jun 2017 23:21:05 -0500 Subject: [petsc-users] PETSC Test In-Reply-To: References: Message-ID: The command below should work on osx aswell.. Satish On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > Hello Satish > > Thank you for your reply. I am using macOS Sierra 10.12.5. > > Thanks > > > > On Thu, Jun 15, 2017 at 12:07 AM, Satish Balay wrote: > > > Likely its hanging in gethostbyname() call - which is caued > > by a mismatch in hostname. > > > > What OS are you using? If linux - can you do the following? > > > > sudo echo 127.0.0.1 `hostname` >> /etc/hosts > > > > And retry? > > > > Satish > > > > On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > > > > > Hello All, > > > > > > I am trying to install and test PETSC. When I get to the test part, it > > > seems everything get stuck in the following line. I was wondering if > > anyone > > > can help. Thank you > > > > > > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 > > MPI > > > process > > > > > > > > From a.croucher at auckland.ac.nz Wed Jun 14 23:48:12 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 15 Jun 2017 16:48:12 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <87y3sv4qpl.fsf@jedbrown.org> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> Message-ID: On 14/06/17 07:45, Jed Brown wrote: > Barry Smith writes: > >>> On Jun 13, 2017, at 10:06 AM, Jed Brown wrote: >>> >>> Adrian Croucher writes: >>> >>>> One way might be to form the whole Jacobian but somehow use a modified >>>> KSP solve which would implement the reduction process, do a KSP solve on >>>> the reduced system of size n, and finally back-substitute to find the >>>> unknowns in the matrix rock cells. >>> You can do this with PCFieldSplit type Schur, but it's a lot heavier >>> than you might like. >> Is it clear that it would produce much overhead compared to doing a custom "reduction to a smaller problem". Perhaps he should start with this and then profiling can show if there are any likely benefits to "specializing more"? > Yeah, that would be reasonable. We don't have a concept of sparsity for > preconditioners so don't have a clean way to produce the exact (sparse) > Schur complement. Computing this matrix using coloring should be > relatively inexpensive due to the independence in each cell and its > tridiagonal structure. Thanks for those ideas, very helpful. If I try this approach (forming whole Jacobian matrix and using PCFieldSplit Schur), I guess I will first need to set up a modified DMPlex for the whole fracture + matrix mesh- so I can use it to create vectors and the Jacobian matrix (with the right sparsity pattern), and also to work out the coloring for finite differencing. Would that be straight-forward to do? Currently my DM just comes from DMPlexCreateFromFile(). Presumably you can use DMPlexInsertCone() or similar to add points into it? 
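For reference, a minimal sketch of how the Schur-complement fieldsplit suggested above might be wired up once the two index sets exist; the function name, the split labels and isMatrix/isFracture are illustrative assumptions, not code from this thread.

   #include <petscksp.h>

   /* Sketch: put the matrix-rock unknowns in the first split so they are the
      block that gets eliminated, leaving the Schur complement on the
      fracture unknowns (the reduced, fracture-sized system discussed above). */
   PetscErrorCode SetupSchurSplit(KSP ksp, IS isMatrix, IS isFracture)
   {
     PC             pc;
     PetscErrorCode ierr;

     ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
     ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
     ierr = PCFieldSplitSetIS(pc, "matrixrock", isMatrix);CHKERRQ(ierr);   /* eliminated block */
     ierr = PCFieldSplitSetIS(pc, "fracture",   isFracture);CHKERRQ(ierr); /* Schur block */
     ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
     ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
     return 0;
   }

Roughly the same thing can be requested from the command line with -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full.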
- Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From balay at mcs.anl.gov Thu Jun 15 00:06:13 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 15 Jun 2017 00:06:13 -0500 Subject: [petsc-users] PETSC Test In-Reply-To: References: Message-ID: The redirection is a problem with sudo - so the following is the correction echo 127.0.0.1 `hostname` | sudo tee -a /etc/hosts Satish On Thu, 15 Jun 2017, Satish Balay wrote: > The command below should work on osx aswell.. > > Satish > > On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > > > Hello Satish > > > > Thank you for your reply. I am using macOS Sierra 10.12.5. > > > > Thanks > > > > > > > > On Thu, Jun 15, 2017 at 12:07 AM, Satish Balay wrote: > > > > > Likely its hanging in gethostbyname() call - which is caued > > > by a mismatch in hostname. > > > > > > What OS are you using? If linux - can you do the following? > > > > > > sudo echo 127.0.0.1 `hostname` >> /etc/hosts > > > > > > And retry? > > > > > > Satish > > > > > > On Thu, 15 Jun 2017, Mohamadreza Soltanian wrote: > > > > > > > Hello All, > > > > > > > > I am trying to install and test PETSC. When I get to the test part, it > > > > seems everything get stuck in the following line. I was wondering if > > > anyone > > > > can help. Thank you > > > > > > > > C/C++ example src/snes/examples/tutorials/ex19 run successfully with 1 > > > MPI > > > > process > > > > > > > > > > > > > > From hgbk2008 at gmail.com Thu Jun 15 07:56:46 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Thu, 15 Jun 2017 06:56:46 -0600 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> Message-ID: Hi Barry Thanks for pointing out the error. I think the problem coming from the zero fieldsplit in proc 0. In this modified example, I parameterized the matrix size and block size, so when you're executing mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 everything was fine. With method = 1, fieldsplit size of B is nonzero and is divided by the block size. With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 2, the fieldsplit B is zero on proc 0, and the error is thrown [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Arguments are incompatible [1]PETSC ERROR: Local size 11 not compatible with block size 2 This is somehow not logical, because 0 is divided by block_size. Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to my original problem. Probably the original one also hangs at ISSetBlockSize, which I may not realize at that time. 
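For reference, a minimal sketch of how a split that is empty on some ranks is normally described; the names and the split label "B" are assumptions, and this is not a workaround for the behaviour reported in this thread.

   #include <petscksp.h>

   /* Sketch: every rank creates the IS and makes the (logically collective)
      calls, simply passing a zero local length where it owns no entries of
      the split. */
   PetscErrorCode SetSplitB(PC pc, PetscInt nB_local, const PetscInt idxB[], PetscInt block_size)
   {
     IS             isB;
     PetscErrorCode ierr;

     ierr = ISCreateGeneral(PetscObjectComm((PetscObject)pc), nB_local, idxB, PETSC_COPY_VALUES, &isB);CHKERRQ(ierr);
     ierr = ISSetBlockSize(isB, block_size);CHKERRQ(ierr); /* nB_local may be 0 on this rank */
     ierr = PCFieldSplitSetIS(pc, "B", isB);CHKERRQ(ierr);
     ierr = ISDestroy(&isB);CHKERRQ(ierr);                 /* the PC keeps its own reference */
     return 0;
   }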
Giang On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > You can't do this > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > use PETSC_DECIDE for the third argument > > Also this is wrong > > for (i = Istart; i < Iend; ++i) > { > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > } > > you will get > > $ petscmpiexec -n 2 ./ex1 > 0: Istart = 0, Iend = 60 > 1: Istart = 60, Iend = 120 > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Row too large: row 120 max 119 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 > GIT Date: 2017-06-11 14:49:39 -0500 > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by > barrysmith Wed Jun 14 18:26:52 2017 > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in > /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1270 in > /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/te > st-dir/ex1.c > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -malloc_test > > You need to get the example working so it ends with the error you reported > previously not these other bugs. > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui > wrote: > > > > Dear Barry > > > > I made a small example with 2 process with one empty split in proc 0. > But it gives another strange error > > > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > The local size is always 60, so this is confusing. > > > > Giang > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > > Could be, send us a simple example that demonstrates the problem and > we'll track it down. > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui > wrote: > > > > > > Hello > > > > > > I noticed that my code stopped very long, possibly hang, at > PCFieldSplitSetIS. There are two splits and one split is empty in one > process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > > > Giang > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex.c Type: text/x-csrc Size: 4020 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 15 08:19:31 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 Jun 2017 08:19:31 -0500 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> Message-ID: On Wed, Jun 14, 2017 at 11:48 PM, Adrian Croucher wrote: > On 14/06/17 07:45, Jed Brown wrote: > >> Barry Smith writes: >> >> On Jun 13, 2017, at 10:06 AM, Jed Brown wrote: >>>> >>>> Adrian Croucher writes: >>>> >>>> One way might be to form the whole Jacobian but somehow use a modified >>>>> KSP solve which would implement the reduction process, do a KSP solve >>>>> on >>>>> the reduced system of size n, and finally back-substitute to find the >>>>> unknowns in the matrix rock cells. >>>>> >>>> You can do this with PCFieldSplit type Schur, but it's a lot heavier >>>> than you might like. >>>> >>> Is it clear that it would produce much overhead compared to doing a >>> custom "reduction to a smaller problem". Perhaps he should start with this >>> and then profiling can show if there are any likely benefits to >>> "specializing more"? >>> >> Yeah, that would be reasonable. We don't have a concept of sparsity for >> preconditioners so don't have a clean way to produce the exact (sparse) >> Schur complement. Computing this matrix using coloring should be >> relatively inexpensive due to the independence in each cell and its >> tridiagonal structure. >> > > Thanks for those ideas, very helpful. > > If I try this approach (forming whole Jacobian matrix and using > PCFieldSplit Schur), I guess I will first need to set up a modified DMPlex > for the whole fracture + matrix mesh- so I can use it to create vectors and > the Jacobian matrix (with the right sparsity pattern), and also to work out > the coloring for finite differencing. > > Would that be straight-forward to do? Currently my DM just comes from > DMPlexCreateFromFile(). Presumably you can use DMPlexInsertCone() or > similar to add points into it? You can certainly modify the mesh. I need to get a better idea what kind of modification, and then I can suggest a way to do it. What do you start with, and what exactly do you want to add? Thanks, Matt > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kannanr at ornl.gov Thu Jun 15 08:27:41 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 13:27:41 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> Message-ID: <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> Barry, Attached is the error with ?one_error_abort and ?on_error_mpiabort. Let me know if you are looking for more information. -- Regards, Ramki On 6/14/17, 5:31 PM, "Barry Smith" wrote: Here is the line that generates an error: ierr = MPI_Allreduce(bv->work,y,len,MPIU_SCALAR,MPIU_SUM,PetscObjectComm((PetscObject)bv));CHKERRQ(ierr); let's see what the MPI error is by running again with the additional command line option -on_error_abort hopefully MPI will say something useful. Barry > On Jun 14, 2017, at 4:24 PM, Kannan, Ramakrishnan wrote: > > > > -- > Regards, > Ramki > > > On 6/14/17, 5:21 PM, "Barry Smith" wrote: > > > Send the file autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c as an attachment. > > Barry > >> On Jun 14, 2017, at 4:17 PM, Kannan, Ramakrishnan wrote: >> >> Barry, >> >> Appreciate your kind help. It compiles fine. I am still getting the following error. >> >> [0]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [0]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [0]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [0]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [0]PETSC ERROR: [8]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [8]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [8]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [8]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [8]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [8]PETSC ERROR: #8 
EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [8]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [8]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> [2]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [2]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [2]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [2]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [2]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [2]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [2]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [2]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [0]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [0]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [0]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> [7]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [7]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [7]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [7]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: [15]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >> [15]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >> [15]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >> [15]PETSC 
ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [15]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [15]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [15]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [15]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >> [7]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >> [7]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >> [7]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >> [7]PETSC ERROR: #10 count() line 266 in /lustre/atlas/proj-shared/csc040/gryffin/gryffndor/miniapps/cmake/../algorithms/tricount.hpp >> >> -- >> Regards, >> Ramki >> >> >> On 6/14/17, 4:48 PM, "Barry Smith" wrote: >> >> >>> On Jun 14, 2017, at 3:45 PM, Kannan, Ramakrishnan wrote: >>> >>> Barry, >>> >>> All the functions here are standard SLEPC functions and there are no user-defined or custom code here. As you can see, when I uncomment the CHKERRQ macros in my code, I am getting the compilation error. >> >> Yes that is because YOUR function that calls the SLEPc functions is void and doesn't return an error code. It is that function I recommend changing to return error codes. >> >> Barry >> >>> >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/14/17, 4:40 PM, "Barry Smith" wrote: >>> >>> >>>> On Jun 14, 2017, at 3:33 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Can I use CHKERRV instead of CHKERRQ? Will that help? >>> >>> You can do that. But I question you having functions in your code that return void instead of an error code. Without error codes you are just hurting your own productivity. >>> >>> Barry >>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/14/17, 4:25 PM, "Kannan, Ramakrishnan" wrote: >>>> >>>> I get the following compilation error when I have CHKERRQ. >>>> >>>> /opt/cray/petsc/3.7.4.0/real/GNU64/5.1/sandybridge/include/petscerror.h:433:154: error: return-statement with a value, in function returning 'void' [-fpermissive] >>>> #define CHKERRQ(n) do {if (PetscUnlikely(n)) return PetscError(PETSC_COMM_SELF,__LINE__,PETSC_FUNCTION_NAME,__FILE__,n,PETSC_ERROR_REPEAT," ");} while (0) >>>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/14/17, 4:14 PM, "Barry Smith" wrote: >>>> >>>> >>>> Why do you have the CHKERRQ(ierr); commented out in your code? >>>> >>>> Because of this you are getting mangled confusing error messages. >>>> >>>> Put a ierr = in front of all calls and a CHKERRQ(ierr); after each call. 
>>>> >>>> Then resend the new error message which will be much clearer. >>>> >>>> >>>> >>>>> On Jun 14, 2017, at 2:58 PM, Kannan, Ramakrishnan wrote: >>>>> >>>>> Hello, >>>>> >>>>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of global size of 1,000,000x1,000,000 with approximately global 16,000,000 non-zeros. Each node has approximately 1million non-zeros. >>>>> >>>>> The following is my slepc code for EPS. >>>>> >>>>> PetscInt nev; >>>>> ierr = EPSCreate(PETSC_COMM_WORLD, &eps); // CHKERRQ(ierr); >>>>> ierr = EPSSetOperators(eps, A, NULL); // CHKERRQ(ierr); >>>>> ierr = EPSSetProblemType(eps, EPS_NHEP); // CHKERRQ(ierr); >>>>> EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL); >>>>> EPSSetDimensions(eps, 100, PETSC_DEFAULT, PETSC_DEFAULT); >>>>> PRINTROOT("calling epssolve"); >>>>> ierr = EPSSolve(eps); // CHKERRQ(ierr); >>>>> ierr = EPSGetType(eps, &type); // CHKERRQ(ierr); >>>>> ierr = PetscPrintf(PETSC_COMM_WORLD, " Solution method: %s\n\n", type); >>>>> // CHKERRQ(ierr); >>>>> ierr = EPSGetDimensions(eps, &nev, NULL, NULL); // CHKERRQ(ierr); >>>>> ierr = PetscPrintf(PETSC_COMM_WORLD, >>>>> " Number of requested eigenvalues: %D\n", >>>>> nev); // CHKERRQ(ierr); >>>>> >>>>> I am getting the following error. Attached is the entire error file for your reference. Please let me know what should I fix in this code. >>>>> >>>>> 2]PETSC ERROR: Argument out of range >>>>> [2]PETSC ERROR: Argument 2 out of range >>>>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [2]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >>>>> [2]PETSC ERROR: ./miniapps on a sandybridge named nid00300 by d3s Wed Jun 14 15:32:00 2017 >>>>> [13]PETSC ERROR: #1 BVDotVec_BLAS_Private() line 272 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvblas.c >>>>> [13]PETSC ERROR: #2 BVDotVec_Svec() line 150 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/impls/svec/svec.c >>>>> [13]PETSC ERROR: #3 BVDotVec() line 191 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvglobal.c >>>>> [13]PETSC ERROR: #4 BVOrthogonalizeCGS1() line 81 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #5 BVOrthogonalizeCGS() line 214 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #6 BVOrthogonalizeColumn() line 371 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/sys/classes/bv/interface/bvorthog.c >>>>> [13]PETSC ERROR: #7 EPSBasicArnoldi() line 59 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/epskrylov.c >>>>> [13]PETSC ERROR: #8 EPSSolve_KrylovSchur_Default() line 203 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/impls/krylov/krylovschur/krylovschur.c >>>>> [13]PETSC ERROR: #9 EPSSolve() line 101 in /autofs/nccs-svm1_home1/ramki/libraries/slepc-3.7.3/src/eps/interface/epssolve.c >>>>> >>>>> -- >>>>> Regards, >>>>> Ramki >>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >> >> >> >> > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: slepc.e614014 Type: application/octet-stream Size: 5160 bytes Desc: slepc.e614014 URL: From jroman at dsic.upv.es Thu Jun 15 08:40:47 2017 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 15 Jun 2017 15:40:47 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> Message-ID: <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> > El 15 jun 2017, a las 15:27, Kannan, Ramakrishnan escribi?: > > Barry, > > Attached is the error with ?one_error_abort and ?on_error_mpiabort. Let me know if you are looking for more information. Thanks for the information. Please try the following: In function BVDotVec_BLAS_Private, change the type of variable "len" to PetscMPIInt instead of PetscBLASInt. The same thing in function BVDot_BLAS_Private. [This change is in master but not in the release version.] Jose From kannanr at ornl.gov Thu Jun 15 09:18:15 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 14:18:15 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> Message-ID: <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. -- Regards, Ramki On 6/15/17, 9:40 AM, "Jose E. Roman" wrote: > El 15 jun 2017, a las 15:27, Kannan, Ramakrishnan escribi?: > > Barry, > > Attached is the error with ?one_error_abort and ?on_error_mpiabort. Let me know if you are looking for more information. Thanks for the information. Please try the following: In function BVDotVec_BLAS_Private, change the type of variable "len" to PetscMPIInt instead of PetscBLASInt. The same thing in function BVDot_BLAS_Private. [This change is in master but not in the release version.] Jose -------------- next part -------------- A non-text attachment was scrubbed... Name: bvblas.c Type: application/octet-stream Size: 16005 bytes Desc: bvblas.c URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slepc.e614030 Type: application/octet-stream Size: 2971 bytes Desc: slepc.e614030 URL: From jroman at dsic.upv.es Thu Jun 15 09:51:32 2017 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 15 Jun 2017 16:51:32 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> Message-ID: <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> > El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: > > I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) Jose From kannanr at ornl.gov Thu Jun 15 10:18:52 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 15:18:52 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: 1. I am using the pre-loaded 64-bit petsc in the EOS system at OLCF. 2. The BLAS and LAPACK are the cray wrappers and I don?t give any explicit BLAS/LAPACK. 3. I am association one MPI process for every core. Where and how do I turn threads off? Do you want me to build a local petsc with 64-bit using their lapack and use my custom built petsc for building the slepc? -- Regards, Ramki On 6/15/17, 10:51 AM, "Jose E. Roman" wrote: > El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: > > I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) Jose From jroman at dsic.upv.es Thu Jun 15 10:30:03 2017 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 15 Jun 2017 17:30:03 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: <6EB8A094-C11D-4739-8741-378B923E2889@dsic.upv.es> > El 15 jun 2017, a las 17:18, Kannan, Ramakrishnan escribi?: > > 1. I am using the pre-loaded 64-bit petsc in the EOS system at OLCF. > 2. The BLAS and LAPACK are the cray wrappers and I don?t give any explicit BLAS/LAPACK. > 3. I am association one MPI process for every core. Where and how do I turn threads off? I don't have access to this machine. You can try setting OMP_NUM_THREADS=1 before launching, not sure if this will make a difference. https://www.olcf.ornl.gov/kb_articles/compiling-threaded-codes-on-eos/ > > Do you want me to build a local petsc with 64-bit using their lapack and use my custom built petsc for building the slepc? What I was suggesting is to build PETSc locally in your account with --download-fblaslapack --with-64-bit-indices, and build SLEPc against this one. Jose > -- > Regards, > Ramki From kannanr at ornl.gov Thu Jun 15 10:56:48 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 15:56:48 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <6EB8A094-C11D-4739-8741-378B923E2889@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6EB8A094-C11D-4739-8741-378B923E2889@dsic.upv.es> Message-ID: <21FE34C4-7F20-48EF-BF56-F496B73B6CEF@ornl.gov> 1. I am seeing this error happening in titan as well 2. Export OMP_NUM_THREADS=1, didn?t alleviate and the error still remains the same. 3. I will try petsc-32 bit and check if the error remains the same. 4. I will build my own petsc 64 bit after my trial with 32 bit. -- Regards, Ramki On 6/15/17, 11:30 AM, "Jose E. Roman" wrote: > El 15 jun 2017, a las 17:18, Kannan, Ramakrishnan escribi?: > > 1. I am using the pre-loaded 64-bit petsc in the EOS system at OLCF. > 2. The BLAS and LAPACK are the cray wrappers and I don?t give any explicit BLAS/LAPACK. > 3. I am association one MPI process for every core. Where and how do I turn threads off? I don't have access to this machine. You can try setting OMP_NUM_THREADS=1 before launching, not sure if this will make a difference. https://www.olcf.ornl.gov/kb_articles/compiling-threaded-codes-on-eos/ > > Do you want me to build a local petsc with 64-bit using their lapack and use my custom built petsc for building the slepc? 
What I was suggesting is to build PETSc locally in your account with --download-fblaslapack --with-64-bit-indices, and build SLEPc against this one. Jose > -- > Regards, > Ramki From bsmith at mcs.anl.gov Thu Jun 15 12:13:51 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 12:13:51 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: > On Jun 15, 2017, at 9:51 AM, Jose E. Roman wrote: > > >> El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: >> >> I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. > > This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. Jose, Do you have a local calculation that generates the number of columns seperately on each process with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example take the maximum (or min or average) produced by all the processes. Barry > Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) > > Jose > From jroman at dsic.upv.es Thu Jun 15 12:31:42 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 15 Jun 2017 19:31:42 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: > El 15 jun 2017, a las 19:13, Barry Smith escribi?: > > >> On Jun 15, 2017, at 9:51 AM, Jose E. Roman wrote: >> >> >>> El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: >>> >>> I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. >> >> This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. 
The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. > > > Jose, > > Do you have a local calculation that generates the number of columns seperately on each process with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example take the maximum (or min or average) produced by all the processes. > > Barry > My comment is related to the convergence criterion, which is based on a call to LAPACK (it is buried in the DS object, no clear spot in the code). This is done this way in SLEPc for 15+ years, and no one has complained. So maybe it is not what is causing this problem. The thing is that I do not have access to these big machines, with Cray libraries etc., so I cannot reproduce the problem and am just suggesting things blindly. Jose > > > >> Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) >> >> Jose >> > From bsmith at mcs.anl.gov Thu Jun 15 12:35:41 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 12:35:41 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: > On Jun 15, 2017, at 12:31 PM, Jose E. Roman wrote: > >> >> El 15 jun 2017, a las 19:13, Barry Smith escribi?: >> >> >>> On Jun 15, 2017, at 9:51 AM, Jose E. Roman wrote: >>> >>> >>>> El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: >>>> >>>> I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. >>> >>> This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >> >> >> Jose, >> >> Do you have a local calculation that generates the number of columns seperately on each process with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example take the maximum (or min or average) produced by all the processes. >> >> Barry >> > > My comment is related to the convergence criterion, which is based on a call to LAPACK (it is buried in the DS object, no clear spot in the code). This is done this way in SLEPc for 15+ years, and no one has complained. So maybe it is not what is causing this problem. 
The thing is that I do not have access to these big machines, with Cray libraries etc., so I cannot reproduce the problem and am just suggesting things blindly. > > Jose > > >> >> >> >>> Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) >>> >>> Jose From kannanr at ornl.gov Thu Jun 15 12:41:25 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 17:41:25 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: Barry/Jose, It is uniform random synthetic matrix. All the processors have same number of columns. Here is the code snippet for matrix construction. Generate_petsc_mat(int n_rows, int n_cols, int n_nnz, PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val, const MPICommunicator& communicator): { //compute matrix statistics. Use for matsetvalues int *start_row = new int[communicator.size()]; int *all_proc_rows = new int[communicator.size()]; int global_rows = 0; MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, MPI_COMM_WORLD); start_row[0] = 0; for (int i = 0; i < communicator.size(); i++) { if (i > 0) { start_row[i] = start_row[i - 1] + all_proc_rows[i]; } global_rows += all_proc_rows[i]; } // find the max nnzs per row for preallocation. 
int *nnzs_per_rows = new int[n_rows]; for (int i = 0; i < n_rows; i++) { nnzs_per_rows[i] = 0; } for (int i = 0; i < n_nnz; i++) { nnzs_per_rows[row_idx[i]]++; } int max_nnz_per_row = -1; for (int i = 0; i < n_rows; i++) { if (nnzs_per_rows[i] > max_nnz_per_row) { max_nnz_per_row = nnzs_per_rows[i]; } } int max_cols; MPI_Allreduce(&n_cols,&max_cols,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD); DISTPRINTINFO("rows,cols,nnzs, max_nnz_per_row" << n_rows << "," << n_cols << "," << n_nnz << max_nnz_per_row); MatCreate(PETSC_COMM_WORLD, &A); MatSetType(A, MATMPIAIJ); MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, max_cols); MatMPIAIJSetPreallocation(A, max_nnz_per_row, PETSC_NULL, max_nnz_per_row, PETSC_NULL); PetscInt local_row_idx; PetscInt local_col_idx; PetscScalar local_val; int my_start_row = start_row[MPI_RANK]; int my_start_col = 0; double petsc_insert_time = 0.0; for (int i = 0; i < n_nnz; i++) { local_row_idx = my_start_row + row_idx[i]; local_col_idx = my_start_col + col_idx[i]; local_val = val[i]; #ifdef TRICOUNT_VERBOSE DISTPRINTINFO(local_row_idx << "," << local_col_idx << "," << local_val); #endif tic(); ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, &local_val, INSERT_VALUES); petsc_insert_time += toc(); if (i % 25000 == 0) { PRINTROOT("25000 time::" << petsc_insert_time); petsc_insert_time = 0; } CHKERRV(ierr); } PRINTROOT("100000 time::" << petsc_insert_time); tic(); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); petsc_insert_time = toc(); PRINTROOT("calling assembly to end::took::" << petsc_insert_time); PetscPrintf(PETSC_COMM_WORLD, "Matrix created\n"); -- Regards, Ramki On 6/15/17, 1:35 PM, "Barry Smith" wrote: > On Jun 15, 2017, at 12:31 PM, Jose E. Roman wrote: > >> >> El 15 jun 2017, a las 19:13, Barry Smith escribi?: >> >> >>> On Jun 15, 2017, at 9:51 AM, Jose E. Roman wrote: >>> >>> >>>> El 15 jun 2017, a las 16:18, Kannan, Ramakrishnan escribi?: >>>> >>>> I made the advised changes and rebuilt slepc. I ran and the error still exists. Attached are the error file and the modified source file bvblas.c. >>> >>> This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >> >> >> Jose, >> >> Do you have a local calculation that generates the number of columns seperately on each process with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example take the maximum (or min or average) produced by all the processes. >> >> Barry >> > > My comment is related to the convergence criterion, which is based on a call to LAPACK (it is buried in the DS object, no clear spot in the code). This is done this way in SLEPc for 15+ years, and no one has complained. So maybe it is not what is causing this problem. The thing is that I do not have access to these big machines, with Cray libraries etc., so I cannot reproduce the problem and am just suggesting things blindly. > > Jose > > >> >> >> >>> Which BLAS/LAPACK do you have? Can you run with threads turned off? 
An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack) >>> >>> Jose From jroman at dsic.upv.es Thu Jun 15 13:27:10 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 15 Jun 2017 20:27:10 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> Message-ID: <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> > El 15 jun 2017, a las 19:35, Barry Smith escribi?: > > So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? Jose From bsmith at mcs.anl.gov Thu Jun 15 13:38:11 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 13:38:11 -0500 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> Message-ID: Hong, Please build the attached code with master and run with petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 I think this is a bug in your new MatGetSubMatrix routines. You take the block size of the outer IS and pass it into the inner IS but that inner IS may not support the same block size hence the crash. Can you please debug this? Thanks Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 4025 bytes Desc: not available URL: -------------- next part -------------- > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui wrote: > > Hi Barry > > Thanks for pointing out the error. I think the problem coming from the zero fieldsplit in proc 0. In this modified example, I parameterized the matrix size and block size, so when you're executing > > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 > > everything was fine. With method = 1, fieldsplit size of B is nonzero and is divided by the block size. > > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 2, the fieldsplit B is zero on proc 0, and the error is thrown > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Arguments are incompatible > [1]PETSC ERROR: Local size 11 not compatible with block size 2 > > This is somehow not logical, because 0 is divided by block_size. > > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to my original problem. 
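A minimal sketch of the block-size incompatibility described above, assuming only that the local length of an index set is not divisible by the requested block size (the sizes are illustrative, not taken from the failing run; error checking omitted):

IS is;
/* a local length of 11 cannot be grouped into blocks of 2 */
ISCreateStride(PETSC_COMM_SELF, 11, 0, 1, &is);
ISSetBlockSize(is, 2);   /* expected to fail with an error like
                            "Local size 11 not compatible with block size 2" */
ISDestroy(&is);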
Probably the original one also hangs at ISSetBlockSize, which I may not realize at that time. > > Giang > > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > You can't do this > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > use PETSC_DECIDE for the third argument > > Also this is wrong > > for (i = Istart; i < Iend; ++i) > { > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > } > > you will get > > $ petscmpiexec -n 2 ./ex1 > 0: Istart = 0, Iend = 60 > 1: Istart = 60, Iend = 120 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Row too large: row 120 max 119 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017 > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -malloc_test > > You need to get the example working so it ends with the error you reported previously not these other bugs. > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui wrote: > > > > Dear Barry > > > > I made a small example with 2 process with one empty split in proc 0. But it gives another strange error > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > The local size is always 60, so this is confusing. > > > > Giang > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > > Could be, send us a simple example that demonstrates the problem and we'll track it down. > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > > > > > Hello > > > > > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? 
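For context, a minimal sketch of a field split that is empty on one process; it assumes an already assembled Mat A and an existing KSP ksp, the split names "A"/"B" and the local sizes are illustrative only, and error checking is omitted:

PetscMPIInt rank;
PetscInt    rstart, rend, nB;
IS          isA, isB;
PC          pc;

MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
MatGetOwnershipRange(A, &rstart, &rend);
nB = (rank == 0) ? 0 : (rend - rstart) / 2;            /* field B owns nothing on rank 0 */
ISCreateStride(PETSC_COMM_WORLD, (rend - rstart) - nB, rstart, 1, &isA);
ISCreateStride(PETSC_COMM_WORLD, nB, rend - nB, 1, &isB);

KSPGetPC(ksp, &pc);
PCSetType(pc, PCFIELDSPLIT);
PCFieldSplitSetIS(pc, "A", isA);   /* collective: every rank calls it, even with an empty local piece */
PCFieldSplitSetIS(pc, "B", isB);
ISDestroy(&isA);
ISDestroy(&isB);

Both PCFieldSplitSetIS calls are collective, so every rank must make them even when its local part of a field is empty.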
> > > > > > Giang > > > > > > > > > From kannanr at ornl.gov Thu Jun 15 13:45:23 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 18:45:23 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> Message-ID: <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. -- Regards, Ramki On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: > El 15 jun 2017, a las 19:35, Barry Smith escribi?: > > So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? Jose -------------- next part -------------- A non-text attachment was scrubbed... Name: slepc.e614138 Type: application/octet-stream Size: 3141 bytes Desc: slepc.e614138 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Arows.tar.gz Type: application/x-gzip Size: 2284164 bytes Desc: Arows.tar.gz URL: From bsmith at mcs.anl.gov Thu Jun 15 15:35:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 15:35:05 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> Message-ID: <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> > On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: > > Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. 
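As a reference point, a minimal sketch of such a driver, assuming the matrix has been converted to PETSc binary format under the placeholder name A.petsc (the Arows files above are not in that format) and using SLEPc's default Krylov-Schur NHEP solver; error checking is omitted for brevity:

#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat         A;
  EPS         eps;
  PetscViewer viewer;
  PetscInt    nconv;

  SlepcInitialize(&argc, &argv, NULL, NULL);

  /* load the matrix from a PETSc binary file */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.petsc", FILE_MODE_READ, &viewer);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATMPIAIJ);
  MatLoad(A, viewer);
  PetscViewerDestroy(&viewer);

  /* standard non-Hermitian eigenproblem with run-time options */
  EPSCreate(PETSC_COMM_WORLD, &eps);
  EPSSetOperators(eps, A, NULL);
  EPSSetProblemType(eps, EPS_NHEP);
  EPSSetFromOptions(eps);                  /* e.g. -eps_nev 4 -eps_tol 1e-8 */
  EPSSolve(eps);
  EPSGetConverged(eps, &nconv);
  PetscPrintf(PETSC_COMM_WORLD, "Converged eigenpairs: %D\n", nconv);

  EPSDestroy(&eps);
  MatDestroy(&A);
  SlepcFinalize();
  return 0;
}

With the data in this form, the same run-time options used in the failing jobs can be passed on the command line to reproduce the problem.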
Barry > > > > -- > Regards, > Ramki > > > On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: > > >> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >> >> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. > > After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. > > Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? > > Jose > > > > > From asmprog32 at hotmail.com Thu Jun 15 16:09:07 2017 From: asmprog32 at hotmail.com (Pietro Incardona) Date: Thu, 15 Jun 2017 21:09:07 +0000 Subject: [petsc-users] PETSC profiling on 1536 cores Message-ID: Dear All I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 unknown. The equation is Finite differences Poisson equation. I am using Conjugate gradient (the matrix is symmetric) with no preconditioner. Visualizing the solution is reasonable. Unfortunately the Conjugate-Gradient does not scale at all and I am extremely concerned about this problem in paticular about the profiling numbers. Looking at the profiler it seem that 1536 cores = 24 cores x 64 VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 I was expecting that this part was the most expensive and take around 4 second in total that sound reasonable Unfortunately on 1536 cores = 24 cores x 64 VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 Take over 228 seconds. 
Considering that doing some test on the cluster I do not see any problem with MPI_Reduce I do not understand how these numbers are possible ////////////////////////// I also attach to the profiling part the inversion on 48 cores ///////////////////////// VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 0.0e+00 0 0 97100 0 0 0 97100 0 0 VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 0.0e+00 13 52 97100 0 13 52 97100 0 40722 MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 1.0e+03 87100 97100 98 87100 97100 98 12398 PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// If you need more information or test please let me know. Thanks in advance Here the log of 1536 cores 345 KSP Residual norm 1.007085286893e-02 346 KSP Residual norm 1.010054402040e-02 347 KSP Residual norm 1.002139574355e-02 348 KSP Residual norm 9.775851299055e-03 Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, by incard Thu Jun 15 22:27:09 2017 Using Petsc Release Version 3.6.4, Apr, 12, 2016 Max Max/Min Avg Total Time (sec): 2.312e+02 1.00027 2.312e+02 Objects: 1.900e+01 1.00000 1.900e+01 Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 MPI Reductions: 1.066e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 100.0% 2.640e+04 100.0% 1.065e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 1.0e+03 93100 85 99 98 93100 85 99 98 10347 PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Vector 10 7 7599552 0 Vector Scatter 1 1 1088 0 Matrix 3 3 20858912 0 Krylov Solver 1 1 1216 0 Index Set 2 2 242288 0 Preconditioner 1 1 816 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 3.68118e-05 Average time for zero size MPI_Send(): 3.24349e-06 #PETSc Option Table entries: -ksp_atol 0.010000 -ksp_max_it 500 -ksp_monitor -ksp_type cg -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include --with-superlu_dist=yes --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC --with-debugging=0 ----------------------------------------- Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 Machine characteristics: Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago Using PETSc directory: /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include ----------------------------------------- Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc Using Fortran linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 Using libraries: 
-Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread -Wl,-rpath,/scratch/p_ppm/HYPRE/lib -L/scratch/p_ppm/HYPRE/lib -lHYPRE -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib 
-L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl ----------------------------------------- Regards Pietro Incardona -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jun 15 16:15:04 2017 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 15 Jun 2017 21:15:04 +0000 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: References: Message-ID: Using no preconditioner is a bad bad idea and anyone with the gall to do this deserves to be spanked. For the Poisson equation, why not use PETSc's native algebraic multigrid solver? -pc_type gamg On Thu, Jun 15, 2017 at 3:09 PM Pietro Incardona wrote: > Dear All > > > I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 > unknown. The equation is Finite differences Poisson equation. > > > I am using Conjugate gradient (the matrix is symmetric) with no > preconditioner. Visualizing the solution is reasonable. 
> > Unfortunately the Conjugate-Gradient does not scale at all and I am > extremely concerned about this problem in paticular about the profiling > numbers. > > Looking at the profiler it seem that > > > 1536 cores = 24 cores x 64 > > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 > 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 > 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > > I was expecting that this part was the most expensive and take around 4 > second in total that sound reasonable > > > Unfortunately > > > on 1536 cores = 24 cores x 64 > > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 > 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 > 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > > Take over 228 seconds. Considering that doing some test on the cluster I > do not see any problem with MPI_Reduce I do not understand how these > numbers are possible > > > > ////////////////////////// I also attach to the profiling part the > inversion on 48 cores ///////////////////////// > > > VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 > 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 > VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 > 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 > VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 > 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 > VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 > 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 > VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 > 0.0e+00 0 0 97100 0 0 0 97100 0 0 > VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 > 0.0e+00 13 52 97100 0 13 52 97100 0 40722 > MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 > 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 > KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 > 1.0e+03 87100 97100 98 87100 97100 98 12398 > PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > > > > ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// > > > If you need more information or test please let me know. 
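A minimal sketch of the CG + GAMG combination suggested above, assuming an existing KSP ksp attached to the assembled Poisson matrix; the same selection can be made purely on the command line with -ksp_type cg -pc_type gamg:

PC pc;
KSPSetType(ksp, KSPCG);        /* keep CG, the matrix is symmetric positive definite */
KSPGetPC(ksp, &pc);
PCSetType(pc, PCGAMG);         /* algebraic multigrid preconditioner */
KSPSetFromOptions(ksp);        /* still allow run-time overrides such as -ksp_rtol */

With a multigrid preconditioner the iteration count stays roughly constant as the problem is refined, which is what ultimately lets the solve scale.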
> > > Thanks in advance > > > Here the log of 1536 cores > > > 345 KSP Residual norm 1.007085286893e-02 > 346 KSP Residual norm 1.010054402040e-02 > 347 KSP Residual norm 1.002139574355e-02 > 348 KSP Residual norm 9.775851299055e-03 > Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, > by incard Thu Jun 15 22:27:09 2017 > Using Petsc Release Version 3.6.4, Apr, 12, 2016 > > Max Max/Min Avg Total > Time (sec): 2.312e+02 1.00027 2.312e+02 > Objects: 1.900e+01 1.00000 1.900e+01 > Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 > Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 > MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 > MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 > MPI Reductions: 1.066e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 > 100.0% 2.640e+04 100.0% 1.065e+03 99.9% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 > 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 > 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 > VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 > 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 > 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 > 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 > KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 > 1.0e+03 93100 85 99 98 93100 85 99 98 10347 > PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Vector 10 7 7599552 0 > Vector Scatter 1 1 1088 0 > Matrix 3 3 20858912 0 > Krylov Solver 1 1 1216 0 > Index Set 2 2 242288 0 > Preconditioner 1 1 816 0 > Viewer 1 0 0 0 > > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Average time for MPI_Barrier(): 3.68118e-05 > Average time for zero size MPI_Send(): 3.24349e-06 > #PETSc Option Table entries: > -ksp_atol 0.010000 > -ksp_max_it 500 > -ksp_monitor > -ksp_type cg > -log_summary > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --with-cxx-dialect=C++11 > --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes > --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes > --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes > --with-boost-dir=/scratch/p_ppm//BOOST > --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a > --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a > --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE > --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS > --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK > --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include > --with-superlu_dist=yes > --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a > --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ > --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE > --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a > /scratch/p_ppm//MUMPS/lib/libmumps_common.a > /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC > --with-debugging=0 > ----------------------------------------- > Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 > Machine characteristics: > Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago > Using PETSc directory: > /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 > Using PETSc arch: arch-linux2-c-opt > ----------------------------------------- > > Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O > ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: > /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall > -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O > ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include > -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include > -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include > -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include > -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include > -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include > ----------------------------------------- > > Using C 
linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > Using Fortran linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > Using libraries: > -Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib > -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib > -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib > -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil > -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh > -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro > -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra > -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos > -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra > -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx > -lstk_mesh_fixtures -lstk_search_util_base -lstk_search > -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology > -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env > -lstk_util_util -lstkclassic_search_util -lstkclassic_search > -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys > -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval > -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base > -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support > -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env > -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base > -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base > -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag > -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search > -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys > -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval > -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base > -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support > -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env > -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base > -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos > -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 > -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes > -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes > -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 > -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 > -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo > -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra > -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra > -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi > -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr > -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi > -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado > -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder > -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore > -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder > -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore > 
-lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread > -Wl,-rpath,/scratch/p_ppm/HYPRE/lib -L/scratch/p_ppm/HYPRE/lib -lHYPRE > -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib > -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ > -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib > -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 > -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib > -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd > -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib > -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord > -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib > -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib > -L/scratch/p_ppm//OPENBLAS/lib -lopenblas > -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib > -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib > -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh > -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ > -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib > -L/sw/global/compilers/gcc/5.3.0/lib -ldl > -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s > -lpthread -ldl > ----------------------------------------- > > Regards > > Pietro Incardona > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 15 16:16:50 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 16:16:50 -0500 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: References: Message-ID: <896F771F-C770-4003-B616-C45AD9A0EB68@mcs.anl.gov> Please send the complete -log_view files as attachments. Both 1536 and 48 cores. 
The mailers mess up the ASCII formatting in text so I can't make heads or tails out of the result. What is the MPI being used and what kind of interconnect does the network have? Also is the MPI specific to that interconnect or just something compiled off the web? Barry > On Jun 15, 2017, at 4:09 PM, Pietro Incardona wrote: > > Dear All > > I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 unknown. The equation is Finite differences Poisson equation. > > I am using Conjugate gradient (the matrix is symmetric) with no preconditioner. Visualizing the solution is reasonable. > Unfortunately the Conjugate-Gradient does not scale at all and I am extremely concerned about this problem in paticular about the profiling numbers. > Looking at the profiler it seem that > > 1536 cores = 24 cores x 64 > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > I was expecting that this part was the most expensive and take around 4 second in total that sound reasonable > > Unfortunately > > on 1536 cores = 24 cores x 64 > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > Take over 228 seconds. Considering that doing some test on the cluster I do not see any problem with MPI_Reduce I do not understand how these numbers are possible > > > ////////////////////////// I also attach to the profiling part the inversion on 48 cores ///////////////////////// > > VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 > VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 > VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 > VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 > VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 0.0e+00 0 0 97100 0 0 0 97100 0 0 > VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 0.0e+00 13 52 97100 0 13 52 97100 0 40722 > MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 > KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 1.0e+03 87100 97100 98 87100 97100 98 12398 > PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > > 
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// > > If you need more information or test please let me know. > > Thanks in advance > > Here the log of 1536 cores > > 345 KSP Residual norm 1.007085286893e-02 > 346 KSP Residual norm 1.010054402040e-02 > 347 KSP Residual norm 1.002139574355e-02 > 348 KSP Residual norm 9.775851299055e-03 > Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, by incard Thu Jun 15 22:27:09 2017 > Using Petsc Release Version 3.6.4, Apr, 12, 2016 > > Max Max/Min Avg Total > Time (sec): 2.312e+02 1.00027 2.312e+02 > Objects: 1.900e+01 1.00000 1.900e+01 > Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 > Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 > MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 > MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 > MPI Reductions: 1.066e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flops > and VecAXPY() for complex vectors of length N --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total Avg %Total counts %Total > 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 100.0% 2.640e+04 100.0% 1.065e+03 99.9% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 > VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 > KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 1.0e+03 93100 85 99 98 93100 85 99 98 10347 > PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Vector 10 7 7599552 0 > Vector Scatter 1 1 1088 0 > Matrix 3 3 20858912 0 > Krylov Solver 1 1 1216 0 > Index Set 2 2 242288 0 > Preconditioner 1 1 816 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Average time for MPI_Barrier(): 3.68118e-05 > Average time for zero size MPI_Send(): 3.24349e-06 > #PETSc Option Table entries: > -ksp_atol 0.010000 > -ksp_max_it 500 > -ksp_monitor > -ksp_type cg > -log_summary > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include --with-superlu_dist=yes --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC --with-debugging=0 > ----------------------------------------- > Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 > Machine characteristics: Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago > Using PETSc directory: /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 > Using PETSc arch: arch-linux2-c-opt > ----------------------------------------- > > Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include > ----------------------------------------- > > Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > Using Fortran 
linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > Using libraries: -Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread -Wl,-rpath,/scratch/p_ppm/HYPRE/lib 
-L/scratch/p_ppm/HYPRE/lib -lHYPRE -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl > ----------------------------------------- > > Regards > Pietro Incardona From kannanr at ornl.gov Thu Jun 15 18:34:16 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 23:34:16 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> 
<4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> Message-ID: Barry, Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. This requires armadillo from http://arma.sourceforge.net/download.html. Just extract and show the correct path of armadillo in the build.sh. I compiled, ran the code. The error and the output file are also in the tar.gz file. Appreciate your kind support and looking forward for early resolution. -- Regards, Ramki On 6/15/17, 4:35 PM, "Barry Smith" wrote: > On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: > > Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. Barry > > > > -- > Regards, > Ramki > > > On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: > > >> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >> >> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. > > After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. > > Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? > > Jose > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: testslepc.tar.gz Type: application/x-gzip Size: 37769 bytes Desc: testslepc.tar.gz URL: From a.croucher at auckland.ac.nz Thu Jun 15 18:37:55 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Fri, 16 Jun 2017 11:37:55 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> Message-ID: On 16/06/17 01:19, Matthew Knepley wrote: > > Thanks for those ideas, very helpful. > > If I try this approach (forming whole Jacobian matrix and using > PCFieldSplit Schur), I guess I will first need to set up a > modified DMPlex for the whole fracture + matrix mesh- so I can use > it to create vectors and the Jacobian matrix (with the right > sparsity pattern), and also to work out the coloring for finite > differencing. > > Would that be straight-forward to do? Currently my DM just comes > from DMPlexCreateFromFile(). Presumably you can use > DMPlexInsertCone() or similar to add points into it? > > > You can certainly modify the mesh. I need to get a better idea what > kind of modification, and then > I can suggest a way to do it. What do you start with, and what exactly > do you want to add? 
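A minimal sketch of the matrix-only coloring idea discussed in the reply below (no DM involved), assuming the full fracture+matrix Jacobian J has already been preallocated and assembled as an ordinary AIJ matrix with the correct sparsity pattern; snes, ierr, FormFunction and user_ctx are placeholders for things that would already exist in the application, not names taken from this thread:

    MatColoring   mc;
    ISColoring    iscoloring;
    MatFDColoring fdcoloring;

    /* color the columns of J using only its nonzero pattern */
    ierr = MatColoringCreate(J,&mc);CHKERRQ(ierr);
    ierr = MatColoringSetType(mc,MATCOLORINGSL);CHKERRQ(ierr);   /* or choose at run time with -mat_coloring_type */
    ierr = MatColoringSetFromOptions(mc);CHKERRQ(ierr);
    ierr = MatColoringApply(mc,&iscoloring);CHKERRQ(ierr);
    ierr = MatColoringDestroy(&mc);CHKERRQ(ierr);

    /* use the coloring to build J by finite differences of the residual */
    ierr = MatFDColoringCreate(J,iscoloring,&fdcoloring);CHKERRQ(ierr);
    ierr = MatFDColoringSetFunction(fdcoloring,(PetscErrorCode (*)(void))FormFunction,user_ctx);CHKERRQ(ierr);
    ierr = MatFDColoringSetFromOptions(fdcoloring);CHKERRQ(ierr);
    ierr = MatFDColoringSetUp(J,iscoloring,fdcoloring);CHKERRQ(ierr);
    ierr = ISColoringDestroy(&iscoloring);CHKERRQ(ierr);

    ierr = SNESSetJacobian(snes,J,J,SNESComputeJacobianDefaultColor,fdcoloring);CHKERRQ(ierr);

Here the coloring is derived purely from the nonzero pattern of J, so no dual-porosity DM is required; a MatNest Jacobian may first need to be assembled or converted to AIJ before the coloring can be computed.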
> > The way dual porosity is normally implemented in a finite volume context is to add an extra matrix rock cell 'inside' each of the original cells (which now represent the fractures, and have their volumes reduced accordingly), with a connection between the fracture cell and its corresponding matrix rock cell, so fluid can flow between them. More generally there can be multiple matrix rock cells for each fracture cell, in which case further matrix rock cells are nested inside the first one, again with connections between them. There are formulae for computing the appropriate effective matrix rock cell volumes and connection areas, typically based on a 'fracture spacing' parameter which determines how fractured the rock is. So in a DMPlex context it would mean somehow adding extra DAG points representing the internal cells and faces for each of the original cells. I'm not sure how that would be done. Another approach, which might be easier, would be not to construct a DM for the whole dual-porosity mesh, but create the Jacobian matrix as a MatNest, with the fracture cell part created using DMCreateMatrix() as now, with the original DM, and the other parts of the Jacobian (representing fracture-matrix and matrix-matrix interactions) created 'by hand'- because of the local one-dimensional nature of the matrix rock cell parts of the mesh, these would be just a bunch of block diagonal matrices anyway. I've just been looking at SNES example ex70.c where something a bit like this is done. I think once the Jacobian matrix is created, you can create a coloring for finite differencing on it just from the matrix itself, and don't actually need a corresponding DM. So this approach might work, without needing to construct a dual-porosity DM. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 ( -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 15 18:54:11 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 18:54:11 -0500 Subject: [petsc-users] slepc NHEP error In-Reply-To: References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> Message-ID: <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. Barry > On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: > > Barry, > > Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. > > This requires armadillo from http://arma.sourceforge.net/download.html. 
Just extract and show the correct path of armadillo in the build.sh. > > I compiled, ran the code. The error and the output file are also in the tar.gz file. > > Appreciate your kind support and looking forward for early resolution. > -- > Regards, > Ramki > > > On 6/15/17, 4:35 PM, "Barry Smith" wrote: > > >> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: >> >> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. > > Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. > > Barry > >> >> >> >> -- >> Regards, >> Ramki >> >> >> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: >> >> >>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >>> >>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >> >> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. >> >> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? >> >> Jose >> >> >> >> >> > > > > > From kannanr at ornl.gov Thu Jun 15 18:56:24 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Thu, 15 Jun 2017 23:56:24 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> , <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> Message-ID: 7:1252 You don't need to install. Just download and extract the tar file. There will be a folder of include files. Point this in build.sh. Regards, Ramki Android keyboard at work. Excuse typos and brevity ________________________________ From: Barry Smith Sent: Thursday, June 15, 2017 7:54 PM To: "Kannan, Ramakrishnan" CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov Subject: Re: [petsc-users] slepc NHEP error brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. Barry > On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: > > Barry, > > Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. > > This requires armadillo from http://arma.sourceforge.net/download.html. Just extract and show the correct path of armadillo in the build.sh. > > I compiled, ran the code. 
The error and the output file are also in the tar.gz file. > > Appreciate your kind support and looking forward for early resolution. > -- > Regards, > Ramki > > > On 6/15/17, 4:35 PM, "Barry Smith" wrote: > > >> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: >> >> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. > > Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. > > Barry > >> >> >> >> -- >> Regards, >> Ramki >> >> >> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: >> >> >>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >>> >>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >> >> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. >> >> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? >> >> Jose >> >> >> >> >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 15 19:03:18 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 19:03:18 -0500 Subject: [petsc-users] slepc NHEP error References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970B7F19@ornl.gov> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> Message-ID: <32692246-AB15-4605-BEB1-82CE12B43FCB@mcs.anl.gov> Ok, got it. > On Jun 15, 2017, at 6:56 PM, Kannan, Ramakrishnan wrote: > > You don't need to install. Just download and extract the tar file. There will be a folder of include files. Point this in build.sh. > > Regards, Ramki > Android keyboard at work. Excuse typos and brevity > From: Barry Smith > Sent: Thursday, June 15, 2017 7:54 PM > To: "Kannan, Ramakrishnan" > CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] slepc NHEP error > > > > brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. > > Barry > > > On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: > > > > Barry, > > > > Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. > > > > This requires armadillo from http://arma.sourceforge.net/download.html. Just extract and show the correct path of armadillo in the build.sh. > > > > I compiled, ran the code. 
The error and the output file are also in the tar.gz file. > > > > Appreciate your kind support and looking forward for early resolution. > > -- > > Regards, > > Ramki > > > > > > On 6/15/17, 4:35 PM, "Barry Smith" wrote: > > > > > >> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: > >> > >> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. > > > > Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. > > > > Barry > > > >> > >> > >> > >> -- > >> Regards, > >> Ramki > >> > >> > >> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: > >> > >> > >>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: > >>> > >>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. > >> > >> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. > >> > >> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? > >> > >> Jose > >> > >> > >> > >> > >> > > > > > > > > > > From liujuy at gmail.com Thu Jun 15 19:46:13 2017 From: liujuy at gmail.com (Ju Liu) Date: Thu, 15 Jun 2017 17:46:13 -0700 Subject: [petsc-users] a fieldsplit question Message-ID: Hi PETSc team: Suppose I provide three index sets {0}, {1,3}, {2} for PCFieldSplitSetFields to construct a field split PC. Conceputally, the matrix is split into 3x3 blocks. Is it true that the index set {0} always corresponds to the block with index 0? What is the block index for the sets {1,3}? Thanks! Ju -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 15 20:09:01 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 20:09:01 -0500 Subject: [petsc-users] a fieldsplit question In-Reply-To: References: Message-ID: > On Jun 15, 2017, at 7:46 PM, Ju Liu wrote: > > Hi PETSc team: > > Suppose I provide three index sets {0}, {1,3}, {2} for PCFieldSplitSetFields to construct a field split PC. Conceputally, the matrix is split into 3x3 blocks. Is it true that the index set {0} always corresponds to the block with index 0? What is the block index for the sets {1,3}? > Not sure what you mean by "block index" but the three fields are just defined in the order that you provide the values, so in this case the 0th field is {0}, the 1st is {1,3} and the 2nd is {2}. If you provided them in the order {2},{1,3},{0} then the three fields are first {2}, then {1,3} then {0} > Thanks! > > Ju From bsmith at mcs.anl.gov Thu Jun 15 20:15:09 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 15 Jun 2017 20:15:09 -0500 Subject: [petsc-users] a fieldsplit question In-Reply-To: References: Message-ID: <12A81711-51D5-4D26-B11F-97A0949CF70E@mcs.anl.gov> > On Jun 15, 2017, at 8:13 PM, Ju Liu wrote: > > I see. 
Suppose I have the following code: > > const PetscInt pfield[1] = {0}, vfields[] = {1,3}, tfield[1] = {2}; > PCFieldSplitSetBlockSize(upc,5); > PCFieldSplitSetFields(upc,"u",3,vfields,vfields); > PCFieldSplitSetFields(upc,"pp",1,pfield,pfield); > PCFieldSplitSetFields(upc,"t",1,tfield,tfield); > > Then the A_00 matrix corresponds to the index of vfields, A_11 with pfield, and A_22 with tfield. It depends on the order one calls the PCFieldSplitSetFields function. Is that right? Yes. Note that since you give them all names you can refer to them by name for setting solver options like -fieldsplit_u_ksp_type etc. > > Thanks! > > Ju > > 2017-06-15 18:09 GMT-07:00 Barry Smith : > > > On Jun 15, 2017, at 7:46 PM, Ju Liu wrote: > > > > Hi PETSc team: > > > > Suppose I provide three index sets {0}, {1,3}, {2} for PCFieldSplitSetFields to construct a field split PC. Conceputally, the matrix is split into 3x3 blocks. Is it true that the index set {0} always corresponds to the block with index 0? What is the block index for the sets {1,3}? > > > > Not sure what you mean by "block index" but the three fields are just defined in the order that you provide the values, so in this case the 0th field is {0}, the 1st is {1,3} and the 2nd is {2}. If you provided them in the order {2},{1,3},{0} then the three fields are first {2}, then {1,3} then {0} > > > > Thanks! > > > > Ju > > > > > -- > Ju Liu, Ph.D. > Postdoctoral Research Fellow > Stanford School of Medicine > http://ju-liu.github.io/ From asmprog32 at hotmail.com Fri Jun 16 03:03:39 2017 From: asmprog32 at hotmail.com (Pietro Incardona) Date: Fri, 16 Jun 2017 08:03:39 +0000 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: References: , Message-ID: Thanks for the answer. I am aware that Multi-grid overall improve the speed of the solver, on the other hand I do not understand why a multigrid preconditioner should help in understanding what I am doing wrong ________________________________ Da: Justin Chang Inviato: gioved? 15 giugno 2017 23:15:04 A: Pietro Incardona; petsc-users at mcs.anl.gov Oggetto: Re: [petsc-users] PETSC profiling on 1536 cores Using no preconditioner is a bad bad idea and anyone with the gall to do this deserves to be spanked. For the Poisson equation, why not use PETSc's native algebraic multigrid solver? -pc_type gamg On Thu, Jun 15, 2017 at 3:09 PM Pietro Incardona > wrote: Dear All I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 unknown. The equation is Finite differences Poisson equation. I am using Conjugate gradient (the matrix is symmetric) with no preconditioner. Visualizing the solution is reasonable. Unfortunately the Conjugate-Gradient does not scale at all and I am extremely concerned about this problem in paticular about the profiling numbers. 
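A minimal sketch of the CG + algebraic multigrid combination suggested above, assuming a KSP object ksp with the assembled Poisson matrix A and vectors b, x already exist; the same effect comes from the command-line options -ksp_type cg -pc_type gamg:

    PC pc;

    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCGAMG);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* called last so run-time options still override these choices */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);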
Looking at the profiler it seem that 1536 cores = 24 cores x 64 VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 I was expecting that this part was the most expensive and take around 4 second in total that sound reasonable Unfortunately on 1536 cores = 24 cores x 64 VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 Take over 228 seconds. Considering that doing some test on the cluster I do not see any problem with MPI_Reduce I do not understand how these numbers are possible ////////////////////////// I also attach to the profiling part the inversion on 48 cores ///////////////////////// VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 0.0e+00 0 0 97100 0 0 0 97100 0 0 VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 0.0e+00 13 52 97100 0 13 52 97100 0 40722 MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 1.0e+03 87100 97100 98 87100 97100 98 12398 PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// If you need more information or test please let me know. Thanks in advance Here the log of 1536 cores 345 KSP Residual norm 1.007085286893e-02 346 KSP Residual norm 1.010054402040e-02 347 KSP Residual norm 1.002139574355e-02 348 KSP Residual norm 9.775851299055e-03 Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, by incard Thu Jun 15 22:27:09 2017 Using Petsc Release Version 3.6.4, Apr, 12, 2016 Max Max/Min Avg Total Time (sec): 2.312e+02 1.00027 2.312e+02 Objects: 1.900e+01 1.00000 1.900e+01 Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 MPI Reductions: 1.066e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 100.0% 2.640e+04 100.0% 1.065e+03 99.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 1.0e+03 93100 85 99 98 93100 85 99 98 10347 PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. 
--- Event Stage 0: Main Stage Vector 10 7 7599552 0 Vector Scatter 1 1 1088 0 Matrix 3 3 20858912 0 Krylov Solver 1 1 1216 0 Index Set 2 2 242288 0 Preconditioner 1 1 816 0 Viewer 1 0 0 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 3.68118e-05 Average time for zero size MPI_Send(): 3.24349e-06 #PETSc Option Table entries: -ksp_atol 0.010000 -ksp_max_it 500 -ksp_monitor -ksp_type cg -log_summary #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include --with-superlu_dist=yes --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC --with-debugging=0 ----------------------------------------- Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 Machine characteristics: Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago Using PETSc directory: /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include ----------------------------------------- Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc Using Fortran linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 Using libraries: 
-Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread -Wl,-rpath,/scratch/p_ppm/HYPRE/lib -L/scratch/p_ppm/HYPRE/lib -lHYPRE -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib 
-L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl ----------------------------------------- Regards Pietro Incardona -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmprog32 at hotmail.com Fri Jun 16 03:29:14 2017 From: asmprog32 at hotmail.com (Pietro Incardona) Date: Fri, 16 Jun 2017 08:29:14 +0000 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: <896F771F-C770-4003-B616-C45AD9A0EB68@mcs.anl.gov> References: , <896F771F-C770-4003-B616-C45AD9A0EB68@mcs.anl.gov> Message-ID: Thanks for the fast reply here I attached the log_view from my program on 96 processors and 1536 processors vic_output_96 and vic_output_1536. I also attached the output of ex34 increased in a grid 1600x512x512. The good news is that in the example 34 it seem to scale at least looking at KSPSolve function time. What I do not understand are all the other numbers in the profiler. 
In exercise 34 on 96 processor VecScale and VecNormalize these call that take most of the time 148 and 153 seconds (that for me is already surprising) on 1536 these call suddenly drop to 0.1s and 2.2 seconds that are higher than a factor x16. I will try now other examples more in the direction of having only CG and no preconditioners to see what happen in scalability and understand what I am doing wrong. But in the meanwhile I have a second question does the fact that I compiled PETSC with debugging = 0 could affect the profiler numbers to be unreliable ? Thanks in advance Pietro Incardona ________________________________ Da: Barry Smith Inviato: gioved? 15 giugno 2017 23:16:50 A: Pietro Incardona Cc: petsc-users at mcs.anl.gov Oggetto: Re: [petsc-users] PETSC profiling on 1536 cores Please send the complete -log_view files as attachments. Both 1536 and 48 cores. The mailers mess up the ASCII formatting in text so I can't make heads or tails out of the result. What is the MPI being used and what kind of interconnect does the network have? Also is the MPI specific to that interconnect or just something compiled off the web? Barry > On Jun 15, 2017, at 4:09 PM, Pietro Incardona wrote: > > Dear All > > I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 unknown. The equation is Finite differences Poisson equation. > > I am using Conjugate gradient (the matrix is symmetric) with no preconditioner. Visualizing the solution is reasonable. > Unfortunately the Conjugate-Gradient does not scale at all and I am extremely concerned about this problem in paticular about the profiling numbers. > Looking at the profiler it seem that > > 1536 cores = 24 cores x 64 > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > I was expecting that this part was the most expensive and take around 4 second in total that sound reasonable > > Unfortunately > > on 1536 cores = 24 cores x 64 > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > Take over 228 seconds. 
Considering that doing some test on the cluster I do not see any problem with MPI_Reduce I do not understand how these numbers are possible > > > ////////////////////////// I also attach to the profiling part the inversion on 48 cores ///////////////////////// > > VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 > VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 > VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 > VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 > VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 0.0e+00 0 0 97100 0 0 0 97100 0 0 > VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 0.0e+00 13 52 97100 0 13 52 97100 0 40722 > MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 > KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 1.0e+03 87100 97100 98 87100 97100 98 12398 > PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > > ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// > > If you need more information or test please let me know. > > Thanks in advance > > Here the log of 1536 cores > > 345 KSP Residual norm 1.007085286893e-02 > 346 KSP Residual norm 1.010054402040e-02 > 347 KSP Residual norm 1.002139574355e-02 > 348 KSP Residual norm 9.775851299055e-03 > Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, by incard Thu Jun 15 22:27:09 2017 > Using Petsc Release Version 3.6.4, Apr, 12, 2016 > > Max Max/Min Avg Total > Time (sec): 2.312e+02 1.00027 2.312e+02 > Objects: 1.900e+01 1.00000 1.900e+01 > Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 > Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 > MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 > MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 > MPI Reductions: 1.066e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flops > and VecAXPY() for complex vectors of length N --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total Avg %Total counts %Total > 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 100.0% 2.640e+04 100.0% 1.065e+03 99.9% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 > VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 > KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 1.0e+03 93100 85 99 98 93100 85 99 98 10347 > PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Vector 10 7 7599552 0 > Vector Scatter 1 1 1088 0 > Matrix 3 3 20858912 0 > Krylov Solver 1 1 1216 0 > Index Set 2 2 242288 0 > Preconditioner 1 1 816 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Average time for MPI_Barrier(): 3.68118e-05 > Average time for zero size MPI_Send(): 3.24349e-06 > #PETSc Option Table entries: > -ksp_atol 0.010000 > -ksp_max_it 500 > -ksp_monitor > -ksp_type cg > -log_summary > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include --with-superlu_dist=yes --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC --with-debugging=0 > ----------------------------------------- > Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 > Machine characteristics: Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago > Using PETSc directory: /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 > Using PETSc arch: arch-linux2-c-opt > ----------------------------------------- > > Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include > ----------------------------------------- > > Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > Using Fortran 
linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > Using libraries: -Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread -Wl,-rpath,/scratch/p_ppm/HYPRE/lib 
-L/scratch/p_ppm/HYPRE/lib -lHYPRE -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl > ----------------------------------------- > > Regards > Pietro Incardona -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vic_output_96 Type: application/octet-stream Size: 32324 bytes Desc: vic_output_96 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vic_output_1536 Type: application/octet-stream Size: 58156 bytes Desc: vic_output_1536 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex34_1536 Type: application/octet-stream Size: 17126 bytes Desc: ex34_1536 URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex34_96 Type: application/octet-stream Size: 17111 bytes Desc: ex34_96 URL: From jroman at dsic.upv.es Fri Jun 16 07:44:47 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 16 Jun 2017 14:44:47 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: <32692246-AB15-4605-BEB1-82CE12B43FCB@mcs.anl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <2E0B0F78-3385-4167-A9A9-BEDA970! B7F19@ornl.gov> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> <32692246-AB15-4605-BEB1-82CE12B43FCB@mcs.anl.gov> Message-ID: <3F1CBD8E-315C-4A2F-B38D-E907C8790BC4@dsic.upv.es> I was able to reproduce the problem. I will try to track it down. Jose > El 16 jun 2017, a las 2:03, Barry Smith escribi?: > > > Ok, got it. > >> On Jun 15, 2017, at 6:56 PM, Kannan, Ramakrishnan wrote: >> >> You don't need to install. Just download and extract the tar file. There will be a folder of include files. Point this in build.sh. >> >> Regards, Ramki >> Android keyboard at work. Excuse typos and brevity >> From: Barry Smith >> Sent: Thursday, June 15, 2017 7:54 PM >> To: "Kannan, Ramakrishnan" >> CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] slepc NHEP error >> >> >> >> brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. >> >> Barry >> >>> On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: >>> >>> Barry, >>> >>> Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. >>> >>> This requires armadillo from http://arma.sourceforge.net/download.html. Just extract and show the correct path of armadillo in the build.sh. >>> >>> I compiled, ran the code. The error and the output file are also in the tar.gz file. >>> >>> Appreciate your kind support and looking forward for early resolution. >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/15/17, 4:35 PM, "Barry Smith" wrote: >>> >>> >>>> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. >>> >>> Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. >>> >>> Barry >>> >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: >>>> >>>> >>>>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >>>>> >>>>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. 
>>>> >>>> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. >>>> >>>> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? >>>> >>>> Jose >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> > From kannanr at ornl.gov Fri Jun 16 07:50:16 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Fri, 16 Jun 2017 12:50:16 +0000 Subject: [petsc-users] slepc NHEP error In-Reply-To: <3F1CBD8E-315C-4A2F-B38D-E907C8790BC4@dsic.upv.es> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <25798DA5-ECA6-40E5-995D-2BE90D6FDBAF@mcs.anl.gov> <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> <32692246-AB15-4605-BEB1-82CE12B43FCB@mcs.anl.gov> <3F1CBD8E-315C-4A2F-B38D-E907C8790BC4@dsic.upv.es> Message-ID: <0C7C7E28-DDD7-4867-A387-71CA499740CC@ornl.gov> Jose/Barry, Excellent. This is a good news. I have a deadline on this code next Wednesday and hope it is not a big one to address. Please keep me posted. -- Regards, Ramki On 6/16/17, 8:44 AM, "Jose E. Roman" wrote: I was able to reproduce the problem. I will try to track it down. Jose > El 16 jun 2017, a las 2:03, Barry Smith escribi?: > > > Ok, got it. > >> On Jun 15, 2017, at 6:56 PM, Kannan, Ramakrishnan wrote: >> >> You don't need to install. Just download and extract the tar file. There will be a folder of include files. Point this in build.sh. >> >> Regards, Ramki >> Android keyboard at work. Excuse typos and brevity >> From: Barry Smith >> Sent: Thursday, June 15, 2017 7:54 PM >> To: "Kannan, Ramakrishnan" >> CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] slepc NHEP error >> >> >> >> brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. >> >> Barry >> >>> On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: >>> >>> Barry, >>> >>> Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. >>> >>> This requires armadillo from http://arma.sourceforge.net/download.html. Just extract and show the correct path of armadillo in the build.sh. >>> >>> I compiled, ran the code. The error and the output file are also in the tar.gz file. >>> >>> Appreciate your kind support and looking forward for early resolution. >>> -- >>> Regards, >>> Ramki >>> >>> >>> On 6/15/17, 4:35 PM, "Barry Smith" wrote: >>> >>> >>>> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. 
>>> >>> Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. >>> >>> Barry >>> >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: >>>> >>>> >>>>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >>>>> >>>>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >>>> >>>> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. >>>> >>>> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? >>>> >>>> Jose >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> > From hzhang at mcs.anl.gov Fri Jun 16 07:55:45 2017 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 16 Jun 2017 12:55:45 +0000 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> , Message-ID: I'm in Boulder and will be back home this evening. Will test it this weekend. Hong ________________________________ From: Smith, Barry F. Sent: Thursday, June 15, 2017 1:38:11 PM To: Hoang Giang Bui; Zhang, Hong Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit Hong, Please build the attached code with master and run with petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 I think this is a bug in your new MatGetSubMatrix routines. You take the block size of the outer IS and pass it into the inner IS but that inner IS may not support the same block size hence the crash. Can you please debug this? Thanks Barry > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui wrote: > > Hi Barry > > Thanks for pointing out the error. I think the problem coming from the zero fieldsplit in proc 0. In this modified example, I parameterized the matrix size and block size, so when you're executing > > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 > > everything was fine. With method = 1, fieldsplit size of B is nonzero and is divided by the block size. > > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 2, the fieldsplit B is zero on proc 0, and the error is thrown > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Arguments are incompatible > [1]PETSC ERROR: Local size 11 not compatible with block size 2 > > This is somehow not logical, because 0 is divided by block_size. > > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to my original problem. Probably the original one also hangs at ISSetBlockSize, which I may not realize at that time. 
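To make the pattern under discussion concrete, here is a minimal sketch (in C) of a two-split fieldsplit setup in which one rank's local part of the second split is empty -- the situation that triggers the block-size error and the ISSetBlockSize hang reported above. This is not the attached ex1.c: the names, the way the two splits are chosen, and the hard-coded sizes are illustrative assumptions. It also already folds in the two corrections Barry gives in the older message quoted just below (PETSC_DECIDE for the local sizes in MatSetSizes, and a guard so the off-diagonal entries are never written past the last global row).

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  KSP            ksp;
  PC             pc;
  IS             isA, isB;
  PetscInt       N = 40, bs = 2;   /* play the role of -mat_size and -block_size */
  PetscInt       Istart, Iend, i, nA, nB, *idxB;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Tridiagonal test matrix; only the global size is given, the local sizes are
     PETSC_DECIDE (Barry's first correction). */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; ++i) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i + 1 < N) {  /* never write row/column N (Barry's second correction) */
      ierr = MatSetValue(A, i + 1, i, -1.0, INSERT_VALUES);CHKERRQ(ierr);
      ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Split "a" = rows below N/2, split "b" = the rest.  On 2 processes with N=40 the
     first rank then owns no rows of "b", so its IS for "b" has local length 0.
     Every rank still creates the (possibly empty) IS, sets its block size and calls
     PCFieldSplitSetIS; these calls are collective. */
  ierr = PetscMalloc1(Iend - Istart, &idxB);CHKERRQ(ierr);
  nB   = 0;
  for (i = Istart; i < Iend; ++i) if (i >= N / 2) idxB[nB++] = i;
  nA   = (Iend - Istart) - nB;
  ierr = ISCreateStride(PETSC_COMM_WORLD, nA, Istart, 1, &isA);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nB, idxB, PETSC_COPY_VALUES, &isB);CHKERRQ(ierr);
  ierr = ISSetBlockSize(isA, bs);CHKERRQ(ierr);  /* local length must be divisible by bs */
  ierr = ISSetBlockSize(isB, bs);CHKERRQ(ierr);  /* 0 is divisible by bs, so an empty part should pass */

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "a", isA);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "b", isB);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  /* KSPSetUp()/KSPSolve() would go here; the reported failure shows up when the
     submatrices for the splits are extracted. */

  ierr = PetscFree(idxB);CHKERRQ(ierr);
  ierr = ISDestroy(&isA);CHKERRQ(ierr);
  ierr = ISDestroy(&isB);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}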
> > Giang > > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > You can't do this > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > use PETSC_DECIDE for the third argument > > Also this is wrong > > for (i = Istart; i < Iend; ++i) > { > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > } > > you will get > > $ petscmpiexec -n 2 ./ex1 > 0: Istart = 0, Iend = 60 > 1: Istart = 60, Iend = 120 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Row too large: row 120 max 119 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017 > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -malloc_test > > You need to get the example working so it ends with the error you reported previously not these other bugs. > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui wrote: > > > > Dear Barry > > > > I made a small example with 2 process with one empty split in proc 0. But it gives another strange error > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > The local size is always 60, so this is confusing. > > > > Giang > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > > Could be, send us a simple example that demonstrates the problem and we'll track it down. > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > > > > > Hello > > > > > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > > > Giang > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at man.poznan.pl Fri Jun 16 07:57:10 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Fri, 16 Jun 2017 14:57:10 +0200 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs Message-ID: <1868632011.20170616145710@man.poznan.pl> Hi, For several days I've been trying to figure out what is going wrong with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. I downsized the problem to 1000x1000 A matrix and a single node and observed the following: I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), 23321 vs 23325 slurm task ids. Any help will be appreciated.... Best, Damian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: int_1.jpg Type: image/jpeg Size: 92539 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slurm-23321.out Type: application/octet-stream Size: 39057 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slurm-23325.out Type: application/octet-stream Size: 39054 bytes Desc: not available URL: From jroman at dsic.upv.es Fri Jun 16 10:36:03 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 16 Jun 2017 17:36:03 +0200 Subject: [petsc-users] slepc NHEP error In-Reply-To: <0C7C7E28-DDD7-4867-A387-71CA499740CC@ornl.gov> References: <5EBDC484-4AC6-4AF3-8D3C-FF999830604F@ornl.gov> <8B4ECCCC-86B1-4580-8D3A-97DF12F02D7E@ornl.gov> <708A0DB5-AE36-40EE-86A9-288A9282A8B9@ornl.gov> <248F9117-5F43-438A-9E79-10D1E9EF9795@mcs.anl.gov> <0BED9C76-8FC4-4E58-B12C-45E21EC183DE@ornl.gov> <92EA6381-8F1F-4C9F-88BD-C1357B8C1C42@mcs.anl.gov> <9C04F853-DEDF-4F28-B63E-316AB14E0A97@mcs.anl.gov> <34243810-1A49-499E-812E-0C0CCCC38565@mcs.anl.gov> <4D23FC40-1491-44AF-825E-C4C4160F1F1E@ornl.gov> <2A3DB53A-D92A-4FC6-8454-5C11039B0343@dsic.upv.es> <4650CE13-784F-4FBE-B5FC-45717BD30103@ornl.gov> <3A414042-4AC3-4B8F-8CE6-6C0A45509ECF@dsic.upv.es> <6C5B1E55-B678-4A54-881B-421E627932E5@dsic.upv.es> <25798DA5-ECA6-40E5-995D-2BE90D6! FDBAF@mcs.anl.gov> <2756491B-117D-4985-BB1A-BDF91A21D5BC@mcs.anl.gov> <32692246-AB15-4605-BEB1-82CE12B43FCB@mcs.anl.gov> <3F1CBD8E-315C-4A2F-B38D-E907C8790BC4@dsic.upv.es> <0C7C7E28-DDD7-4867-A387-71CA499740CC@ornl.gov> Message-ID: <2792468E-84AC-4F09-A347-A4AB2F68A3E5@dsic.upv.es> I still need to work on this, but in principle my previous comments are confirmed. In particular, in my tests it seems that the problem does not appear if PETSc has been configured with --download-fblaslapack If you have a deadline, I would suggest you to go this way, until I can find a more definitive solution. Jose > El 16 jun 2017, a las 14:50, Kannan, Ramakrishnan escribi?: > > Jose/Barry, > > Excellent. This is a good news. I have a deadline on this code next Wednesday and hope it is not a big one to address. Please keep me posted. > -- > Regards, > Ramki > > > On 6/16/17, 8:44 AM, "Jose E. Roman" wrote: > > I was able to reproduce the problem. I will try to track it down. > Jose > >> El 16 jun 2017, a las 2:03, Barry Smith escribi?: >> >> >> Ok, got it. >> >>> On Jun 15, 2017, at 6:56 PM, Kannan, Ramakrishnan wrote: >>> >>> You don't need to install. Just download and extract the tar file. There will be a folder of include files. Point this in build.sh. >>> >>> Regards, Ramki >>> Android keyboard at work. Excuse typos and brevity >>> From: Barry Smith >>> Sent: Thursday, June 15, 2017 7:54 PM >>> To: "Kannan, Ramakrishnan" >>> CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] slepc NHEP error >>> >>> >>> >>> brew install Armadillo fails for me on brew install hdf5 I have reported this to home-brew and hopefully they'll have a fix within a couple of days so I can try to run the test case. >>> >>> Barry >>> >>>> On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan wrote: >>>> >>>> Barry, >>>> >>>> Attached is the quick test program I extracted out of my existing code. This is not clean but you can still understand. I use slepc 3.7.3 and 32 bit real petsc 3.7.4. >>>> >>>> This requires armadillo from http://arma.sourceforge.net/download.html. 
Just extract and show the correct path of armadillo in the build.sh. >>>> >>>> I compiled, ran the code. The error and the output file are also in the tar.gz file. >>>> >>>> Appreciate your kind support and looking forward for early resolution. >>>> -- >>>> Regards, >>>> Ramki >>>> >>>> >>>> On 6/15/17, 4:35 PM, "Barry Smith" wrote: >>>> >>>> >>>>> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan wrote: >>>>> >>>>> Attached is the latest error w/ 32 bit petsc and the uniform random input matrix. Let me know if you are looking for more information. >>>> >>>> Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us. >>>> >>>> Barry >>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Regards, >>>>> Ramki >>>>> >>>>> >>>>> On 6/15/17, 2:27 PM, "Jose E. Roman" wrote: >>>>> >>>>> >>>>>> El 15 jun 2017, a las 19:35, Barry Smith escribi?: >>>>>> >>>>>> So where in the code is the decision on how many columns to use made? If we look at that it might help see why it could ever produce different results on different processes. >>>>> >>>>> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns was different in different processes, it would have failed before reaching that line of code. >>>>> >>>>> Ramki: could you send me the matrix somehow? I could try it in a machine here. Which options are you using for the solver? >>>>> >>>>> Jose >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >> > > > > From bsmith at mcs.anl.gov Fri Jun 16 13:24:45 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 Jun 2017 13:24:45 -0500 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: References: <896F771F-C770-4003-B616-C45AD9A0EB68@mcs.anl.gov> Message-ID: <407ED2A7-12E6-4D5D-8FA8-94E429FED678@mcs.anl.gov> > On Jun 16, 2017, at 3:29 AM, Pietro Incardona wrote: > > Thanks for the fast reply here I attached the log_view from my program on 96 processors and 1536 processors vic_output_96 and vic_output_1536. I also attached the output of ex34 increased in a grid 1600x512x512. The good news is that in the example 34 it seem to scale at least looking at KSPSolve function time. What I do not understand are all the other numbers in the profiler. In exercise 34 on 96 processor VecScale This is the hint that tells use the problem. The dang Open BLAS is running in parallel using threads. This is oversubscribing the number of cores and killing performance. You need to make sure that the BLAS is not ITSELF running in parallel (since PETSc/MPI is doing the parallelism). Set the environmental variable OPENBLAS_NUM_THREADS=1 before running the code (if running on a batch system make sure that the variable is set). Your runs should be much saner now and you should not see nonsense like VecScale taking more than a tiny amount of time. Barry Note to PETSc developers - we need to do something about this problem, perhaps configure detects openblas_set_num_threads and PetscInitialize() calls it with the value 1 by default to prevent this oversubscription? > and VecNormalize these call that take most of the time 148 and 153 seconds (that for me is already surprising) on 1536 these call suddenly drop to 0.1s and 2.2 seconds that are higher than a factor x16. > > I will try now other examples more in the direction of having only CG and no preconditioners to see what happen in scalability and understand what I am doing wrong. 
But in the meanwhile I have a second question does the fact that I compiled PETSC with debugging = 0 could affect the profiler numbers to be unreliable ? > > Thanks in advance > Pietro Incardona > > > Da: Barry Smith > Inviato: gioved? 15 giugno 2017 23:16:50 > A: Pietro Incardona > Cc: petsc-users at mcs.anl.gov > Oggetto: Re: [petsc-users] PETSC profiling on 1536 cores > > > Please send the complete -log_view files as attachments. Both 1536 and 48 cores. The mailers mess up the ASCII formatting in text so I can't make heads or tails out of the result. > > What is the MPI being used and what kind of interconnect does the network have? Also is the MPI specific to that interconnect or just something compiled off the web? > > > Barry > > > On Jun 15, 2017, at 4:09 PM, Pietro Incardona wrote: > > > > Dear All > > > > I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 unknown. The equation is Finite differences Poisson equation. > > > > I am using Conjugate gradient (the matrix is symmetric) with no preconditioner. Visualizing the solution is reasonable. > > Unfortunately the Conjugate-Gradient does not scale at all and I am extremely concerned about this problem in paticular about the profiling numbers. > > Looking at the profiler it seem that > > > > 1536 cores = 24 cores x 64 > > > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > > > I was expecting that this part was the most expensive and take around 4 second in total that sound reasonable > > > > Unfortunately > > > > on 1536 cores = 24 cores x 64 > > > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > > > Take over 228 seconds. 
Considering that doing some test on the cluster I do not see any problem with MPI_Reduce I do not understand how these numbers are possible > > > > > > ////////////////////////// I also attach to the profiling part the inversion on 48 cores ///////////////////////// > > > > VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 > > VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 > > VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 > > VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 > > VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > > VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 0.0e+00 0 0 97100 0 0 0 97100 0 0 > > VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 0.0e+00 13 52 97100 0 13 52 97100 0 40722 > > MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 > > KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 1.0e+03 87100 97100 98 87100 97100 98 12398 > > PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > > > > > ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////// > > > > If you need more information or test please let me know. > > > > Thanks in advance > > > > Here the log of 1536 cores > > > > 345 KSP Residual norm 1.007085286893e-02 > > 346 KSP Residual norm 1.010054402040e-02 > > 347 KSP Residual norm 1.002139574355e-02 > > 348 KSP Residual norm 9.775851299055e-03 > > Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 > > ************************************************************************************************************************ > > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** > > ************************************************************************************************************************ > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > > > ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 processors, by incard Thu Jun 15 22:27:09 2017 > > Using Petsc Release Version 3.6.4, Apr, 12, 2016 > > > > Max Max/Min Avg Total > > Time (sec): 2.312e+02 1.00027 2.312e+02 > > Objects: 1.900e+01 1.00000 1.900e+01 > > Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 > > Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 > > MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 > > MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 > > MPI Reductions: 1.066e+03 1.00000 > > > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > > e.g., VecAXPY() for real vectors of length N --> 2N flops > > and VecAXPY() for complex vectors of length N --> 8N flops > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- > > Avg %Total Avg %Total counts %Total Avg %Total counts %Total > > 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 100.0% 2.640e+04 100.0% 1.065e+03 99.9% > > > > ------------------------------------------------------------------------------------------------------------------------ > > See the 'Profiling' chapter of the users' manual for details on interpreting output. > > Phase summary info: > > Count: number of times phase was executed > > Time and Flops: Max - maximum over all processors > > Ratio - ratio of maximum to minimum over all processors > > Mess: number of messages sent > > Avg. len: average message length (bytes) > > Reduct: number of global reductions > > Global: entire computation > > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> > %T - percent time in this phase %F - percent flops in this phase > > %M - percent messages in this phase %L - percent message lengths in this phase > > %R - percent reductions in this phase > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > > VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 > > VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > > VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 > > KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 1.0e+03 93100 85 99 98 93100 85 99 98 10347 > > PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > ------------------------------------------------------------------------------------------------------------------------ > > > > Memory usage is given in bytes: > > > > Object Type Creations Destructions Memory Descendants' Mem. > > Reports information only for process 0. 
> > > > --- Event Stage 0: Main Stage > > > > Vector 10 7 7599552 0 > > Vector Scatter 1 1 1088 0 > > Matrix 3 3 20858912 0 > > Krylov Solver 1 1 1216 0 > > Index Set 2 2 242288 0 > > Preconditioner 1 1 816 0 > > Viewer 1 0 0 0 > > ======================================================================================================================== > > Average time to get PetscTime(): 9.53674e-08 > > Average time for MPI_Barrier(): 3.68118e-05 > > Average time for zero size MPI_Send(): 3.24349e-06 > > #PETSc Option Table entries: > > -ksp_atol 0.010000 > > -ksp_max_it 500 > > -ksp_monitor > > -ksp_type cg > > -log_summary > > #End of PETSc Option Table entries > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > > Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes --with-mumps-include=/scratch/p_ppm//MUMPS/include --with-superlu_dist=yes --with-superlu_dist-lib=/scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" --prefix=/scratch/p_ppm//PETSC --with-debugging=0 > > ----------------------------------------- > > Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 > > Machine characteristics: Linux-2.6.32-642.11.1.el6.Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago > > Using PETSc directory: /lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 > > Using PETSc arch: arch-linux2-c-opt > > ----------------------------------------- > > > > Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS} > > Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} > > ----------------------------------------- > > > > Using include paths: -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include > > ----------------------------------------- > > > > 
Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > > Using Fortran linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > > Using libraries: -Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers 
-lkokkoscore -ltpi -lgtest -lpthread -Wl,-rpath,/scratch/p_ppm/HYPRE/lib -L/scratch/p_ppm/HYPRE/lib -lHYPRE -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl > > ----------------------------------------- > > > > Regards > > Pietro Incardona > > From rtmills at anl.gov Fri Jun 16 15:03:05 2017 From: rtmills at anl.gov (Richard Tran Mills) Date: Fri, 16 Jun 2017 13:03:05 -0700 Subject: [petsc-users] PETSC profiling on 1536 cores In-Reply-To: References: <896F771F-C770-4003-B616-C45AD9A0EB68@mcs.anl.gov> Message-ID: Pietro, On Fri, Jun 16, 2017 at 1:29 AM, Pietro Incardona wrote: > [...] > > I will try now other examples more in the direction of having only CG and > no preconditioners to see what happen in scalability and understand what I > am doing wrong. But in the meanwhile I have a second question does the fact > that I compiled PETSC with debugging = 0 could affect the profiler numbers > to be unreliable ? 
> Building with debugging = 0 is the right thing to do if you want to do any performance studies. Building with debugging turned *on* is what might make the profile numbers unreliable. --Richard > > Thanks in advance > > Pietro Incardona > > > > ------------------------------ > *Da:* Barry Smith > *Inviato:* gioved? 15 giugno 2017 23:16:50 > *A:* Pietro Incardona > *Cc:* petsc-users at mcs.anl.gov > *Oggetto:* Re: [petsc-users] PETSC profiling on 1536 cores > > > Please send the complete -log_view files as attachments. Both 1536 and > 48 cores. The mailers mess up the ASCII formatting in text so I can't make > heads or tails out of the result. > > What is the MPI being used and what kind of interconnect does the > network have? Also is the MPI specific to that interconnect or just > something compiled off the web? > > > Barry > > > On Jun 15, 2017, at 4:09 PM, Pietro Incardona > wrote: > > > > Dear All > > > > I tried PETSC version 3.6.5 to solve a linear system with 256 000 000 > unknown. The equation is Finite differences Poisson equation. > > > > I am using Conjugate gradient (the matrix is symmetric) with no > preconditioner. Visualizing the solution is reasonable. > > Unfortunately the Conjugate-Gradient does not scale at all and I am > extremely concerned about this problem in paticular about the profiling > numbers. > > Looking at the profiler it seem that > > > > 1536 cores = 24 cores x 64 > > > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 > 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 > 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > > > I was expecting that this part was the most expensive and take around 4 > second in total that sound reasonable > > > > Unfortunately > > > > on 1536 cores = 24 cores x 64 > > > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 > 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 > 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > > > Take over 228 seconds. 
Considering that doing some test on the cluster I > do not see any problem with MPI_Reduce I do not understand how these > numbers are possible > > > > > > ////////////////////////// I also attach to the profiling part the > inversion on 48 cores ///////////////////////// > > > > VecTDot 696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00 > 7.0e+02 6 16 0 0 65 6 16 0 0 65 24269 > > VecNorm 349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00 > 3.5e+02 22 8 0 0 33 22 8 0 0 33 3602 > > VecCopy 351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00 > 0.0e+00 42 16 0 0 0 42 16 0 0 0 4024 > > VecAYPX 347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00 > 0.0e+00 2 8 0 0 0 2 8 0 0 0 37970 > > VecAssemblyBegin 2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > > VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05 > 0.0e+00 0 0 97100 0 0 0 97100 0 0 > > VecScatterEnd 348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05 > 0.0e+00 13 52 97100 0 13 52 97100 0 40722 > > MatAssemblyBegin 1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04 > 8.0e+00 0 0 1 0 1 0 0 1 0 1 0 > > KSPSetUp 1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05 > 1.0e+03 87100 97100 98 87100 97100 98 12398 > > PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > > > > > > //////////////////////////////////////////////////////////// > ///////////////////////////////////////////////////////// > > > > If you need more information or test please let me know. > > > > Thanks in advance > > > > Here the log of 1536 cores > > > > 345 KSP Residual norm 1.007085286893e-02 > > 346 KSP Residual norm 1.010054402040e-02 > > 347 KSP Residual norm 1.002139574355e-02 > > 348 KSP Residual norm 9.775851299055e-03 > > Max div for vorticity 1.84572e-05 Integral: 6.62466e-09 130.764 -132 > > ************************************************************ > ************************************************************ > > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************ > ************************************************************ > > > > ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > > > > ./vic_petsc on a arch-linux2-c-opt named taurusi6217 with 1536 > processors, by incard Thu Jun 15 22:27:09 2017 > > Using Petsc Release Version 3.6.4, Apr, 12, 2016 > > > > Max Max/Min Avg Total > > Time (sec): 2.312e+02 1.00027 2.312e+02 > > Objects: 1.900e+01 1.00000 1.900e+01 > > Flops: 1.573e+09 1.32212 1.450e+09 2.227e+12 > > Flops/sec: 6.804e+06 1.32242 6.271e+06 9.633e+09 > > MPI Messages: 8.202e+03 2.06821 5.871e+03 9.018e+06 > > MPI Message Lengths: 2.013e+08 1.86665 2.640e+04 2.381e+11 > > MPI Reductions: 1.066e+03 1.00000 > > > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > > e.g., VecAXPY() for real vectors of length N > --> 2N flops > > and VecAXPY() for complex vectors of length N > --> 8N flops > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > > 0: Main Stage: 2.3120e+02 100.0% 2.2272e+12 100.0% 9.018e+06 > 100.0% 2.640e+04 100.0% 1.065e+03 99.9% > > > > ------------------------------------------------------------ > ------------------------------------------------------------ > > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > > Phase summary info: > > Count: number of times phase was executed > > Time and Flops: Max - maximum over all processors > > Ratio - ratio of maximum to minimum over all processors > > Mess: number of messages sent > > Avg. len: average message length (bytes) > > Reduct: number of global reductions > > Global: entire computation > > Stage: stages of a computation. Set stages with PetscLogStagePush() > and PetscLogStagePop(). 
> > %T - percent time in this phase %F - percent flops in this > phase > > %M - percent messages in this phase %L - percent message > lengths in this phase > > %R - percent reductions in this phase > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------ > ------------------------------------------------------------ > > Event Count Time (sec) > Flops --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------ > ------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > VecTDot 696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00 > 7.0e+02 12 16 0 0 65 12 16 0 0 65 10346 > > VecNorm 349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00 > 3.5e+02 46 8 0 0 33 46 8 0 0 33 1610 > > VecCopy 351 1.0 2.7609e-01 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 2 1.0 3.8961e-0256.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 34 16 0 0 0 34 16 0 0 0 4286 > > VecAYPX 347 1.0 2.0852e-01 2.0 1.25e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 0 8 0 0 0 0 8 0 0 0 852044 > > VecAssemblyBegin 2 1.0 8.3237e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 > > VecAssemblyEnd 2 1.0 5.1022e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04 > 0.0e+00 0 0 85 99 0 0 0 85 99 0 0 > > VecScatterEnd 348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatMult 348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04 > 0.0e+00 2 52 85 99 0 2 52 85 99 0 281866 > > MatAssemblyBegin 1 1.0 9.1920e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 3.9093e+00 1.0 0.00e+00 0.0 4.4e+04 7.7e+03 > 8.0e+00 2 0 0 0 1 2 0 0 0 1 0 > > KSPSetUp 1 1.0 8.1890e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 2.1525e+02 1.0 1.57e+09 1.3 7.7e+06 3.1e+04 > 1.0e+03 93100 85 99 98 93100 85 99 98 10347 > > PCSetUp 1 1.0 5.5075e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 349 1.0 2.7485e-01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > ------------------------------------------------------------ > ------------------------------------------------------------ > > > > Memory usage is given in bytes: > > > > Object Type Creations Destructions Memory Descendants' > Mem. > > Reports information only for process 0. 
> > > > --- Event Stage 0: Main Stage > > > > Vector 10 7 7599552 0 > > Vector Scatter 1 1 1088 0 > > Matrix 3 3 20858912 0 > > Krylov Solver 1 1 1216 0 > > Index Set 2 2 242288 0 > > Preconditioner 1 1 816 0 > > Viewer 1 0 0 0 > > ============================================================ > ============================================================ > > Average time to get PetscTime(): 9.53674e-08 > > Average time for MPI_Barrier(): 3.68118e-05 > > Average time for zero size MPI_Send(): 3.24349e-06 > > #PETSc Option Table entries: > > -ksp_atol 0.010000 > > -ksp_max_it 500 > > -ksp_monitor > > -ksp_type cg > > -log_summary > > #End of PETSc Option Table entries > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > > Configure options: --with-cxx-dialect=C++11 --with-mpi-dir=/sw/taurus/libraries/openmpi/1.10.2-gnu > --with-parmetis=yes --with-parmetis-dir=/scratch/p_ppm//PARMETIS/ > --with-metis=yes --with-metis-dir=/scratch/p_ppm//METIS --with-boost=yes > --with-boost-dir=/scratch/p_ppm//BOOST --with-blas-lib=/scratch/p_ > ppm//OPENBLAS/lib/libopenblas.a --with-lapack-lib=/scratch/p_ > ppm//OPENBLAS/lib/libopenblas.a --with-suitesparse=yes > --with-suitesparse-dir=/scratch/p_ppm//SUITESPARSE --with-trilinos=yes > -with-trilinos-dir=/scratch/p_ppm//TRILINOS --with-scalapack=yes > -with-scalapack-dir=/scratch/p_ppm//SCALAPACK --with-mumps=yes > --with-mumps-include=/scratch/p_ppm//MUMPS/include > --with-superlu_dist=yes --with-superlu_dist-lib=/ > scratch/p_ppm//SUPERLU_DIST/lib/libsuperlu_dist_4.3.a > --with-superlu_dist-include=/scratch/p_ppm//SUPERLU_DIST/include/ > --with-hypre=yes -with-hypre-dir=/scratch/p_ppm//HYPRE > --with-mumps-lib=""/scratch/p_ppm//MUMPS/lib/libdmumps.a > /scratch/p_ppm//MUMPS/lib/libmumps_common.a /scratch/p_ppm//MUMPS/lib/libpord.a"" > --prefix=/scratch/p_ppm//PETSC --with-debugging=0 > > ----------------------------------------- > > Libraries compiled on Wed Feb 22 17:30:49 2017 on tauruslogin4 > > Machine characteristics: Linux-2.6.32-642.11.1.el6. 
> Bull.106.x86_64-x86_64-with-redhat-6.8-Santiago > > Using PETSc directory: /lustre/scratch2/p_ppm/ > jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4 > > Using PETSc arch: arch-linux2-c-opt > > ----------------------------------------- > > > > Using C compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O > ${COPTFLAGS} ${CFLAGS} > > Using Fortran compiler: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 > -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS} > > ----------------------------------------- > > > > Using include paths: -I/lustre/scratch2/p_ppm/ > jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_ > high_scal_tests/petsc-3.6.4/include -I/lustre/scratch2/p_ppm/ > jenkins2/workspace/OpenFPM_high_scal_tests/petsc-3.6.4/include > -I/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_ > high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/include > -I/scratch/p_ppm/TRILINOS/include -I/scratch/p_ppm/HYPRE/include > -I/scratch/p_ppm/SUPERLU_DIST/include -I/scratch/p_ppm/SUITESPARSE/include > -I/scratch/p_ppm/MUMPS/include -I/scratch/p_ppm/PARMETIS/include > -I/scratch/p_ppm/METIS/include -I/scratch/p_ppm/BOOST/include > -I/sw/taurus/libraries/openmpi/1.10.2-gnu/include > > ----------------------------------------- > > > > Using C linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpicc > > Using Fortran linker: /sw/taurus/libraries/openmpi/1.10.2-gnu/bin/mpif90 > > Using libraries: -Wl,-rpath,/lustre/scratch2/p_ppm/jenkins2/workspace/ > OpenFPM_high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib > -L/lustre/scratch2/p_ppm/jenkins2/workspace/OpenFPM_ > high_scal_tests/petsc-3.6.4/arch-linux2-c-opt/lib -lpetsc > -Wl,-rpath,/scratch/p_ppm/TRILINOS/lib -L/scratch/p_ppm/TRILINOS/lib > -lpike-blackbox -ltrilinoscouplings -lmsqutil -lmesquite -lctrilinos > -lsundancePdeopt -lsundanceStdFwk -lsundanceStdMesh -lsundanceCore > -lsundanceInterop -lsundanceUtils -lsundancePlaya -lpiro -lrol > -lstokhos_muelu -lstokhos_ifpack2 -lstokhos_amesos2 -lstokhos_tpetra > -lstokhos_sacado -lstokhos -lmoochothyra -lmoocho -lrythmos > -lmuelu-adapters -lmuelu-interface -lmuelu -lmoertel -llocathyra > -llocaepetra -llocalapack -lloca -lnoxepetra -lnoxlapack -lnox -lphalanx > -lstk_mesh_fixtures -lstk_search_util_base -lstk_search > -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base -lstk_topology > -lstk_util_use_cases -lstk_util_registry -lstk_util_diag -lstk_util_env > -lstk_util_util -lstkclassic_search_util -lstkclassic_search > -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys > -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval > -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base > -lstkclassic_util_use_cases -lstkclassic_util_unit_test_support > -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env > -lstkclassic_util_util -lstk_mesh_fixtures -lstk_search_util_base > -lstk_search -lstk_unit_test_utils -lstk_io_util -lstk_io -lstk_mesh_base > -lstk_topology -lstk_util_use_cases -lstk_util_registry -lstk_util_diag > -lstk_util_env -lstk_util_util -lstkclassic_search_util -lstkclassic_search > -lstkclassic_rebalance_utils -lstkclassic_rebalance -lstkclassic_linsys > -lstkclassic_io_util -lstkclassic_io -lstkclassic_expreval > -lstkclassic_algsup -lstkclassic_mesh_fem -lstkclassic_mesh_base > 
-lstkclassic_util_use_cases -lstkclassic_util_unit_test_support > -lstkclassic_util_parallel -lstkclassic_util_diag -lstkclassic_util_env > -lstkclassic_util_util -lintrepid -lteko -lfei_trilinos -lfei_base > -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos > -lstratimikosml -lstratimikosifpack -lifpack2-adapters -lifpack2 > -lanasazitpetra -lModeLaplace -lanasaziepetra -lanasazi -lkomplex -lsupes > -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lsupes > -laprepro_lib -lchaco -lIonit -lIotr -lIohb -lIogn -lIopg -lIoss -lamesos2 > -lshylu -lbelostpetra -lbelosepetra -lbelos -lml -lifpack -lzoltan2 > -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo > -ldpliris -lisorropia -loptipack -lxpetra-sup -lxpetra -lthyratpetra > -lthyraepetraext -lthyraepetra -lthyracore -lthyratpetra -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltpetraext -ltpetrainout -ltpetra > -lkokkostsqr -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi > -ltpetraclassic -ltpetraext -ltpetrainout -ltpetra -lkokkostsqr > -ltpetrakernels -ltpetraclassiclinalg -ltpetraclassicnodeapi > -ltpetraclassic -ltriutils -lglobipack -lshards -lzoltan -lepetra -lsacado > -lrtop -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder > -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore > -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder > -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchoscore > -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -ltpi -lgtest -lpthread > -Wl,-rpath,/scratch/p_ppm/HYPRE/lib -L/scratch/p_ppm/HYPRE/lib -lHYPRE > -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/ > compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/libexec/ > gcc/x86_64-unknown-linux-gnu/5.3.0 -L/sw/global/compilers/gcc/5. 
> 3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 -Wl,-rpath,/sw/global/ > compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib > -L/sw/global/compilers/gcc/5.3.0/lib -lmpi_cxx -lstdc++ > -Wl,-rpath,/scratch/p_ppm//SUPERLU_DIST/lib -L/scratch/p_ppm//SUPERLU_DIST/lib > -lsuperlu_dist_4.3 -Wl,-rpath,/scratch/p_ppm/SUITESPARSE/lib > -L/scratch/p_ppm/SUITESPARSE/lib -lumfpack -lklu -lcholmod -lbtf > -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -ldmumps > -Wl,-rpath,/scratch/p_ppm//MUMPS/lib -L/scratch/p_ppm//MUMPS/lib > -lmumps_common -lpord -Wl,-rpath,/scratch/p_ppm/SCALAPACK/lib > -L/scratch/p_ppm/SCALAPACK/lib -lscalapack -Wl,-rpath,/scratch/p_ppm//OPENBLAS/lib > -L/scratch/p_ppm//OPENBLAS/lib -lopenblas -Wl,-rpath,/scratch/p_ppm/PARMETIS/lib > -L/scratch/p_ppm/PARMETIS/lib -lparmetis -Wl,-rpath,/scratch/p_ppm/METIS/lib > -L/scratch/p_ppm/METIS/lib -lmetis -lX11 -lhwloc -lssl -lcrypto -lm > -lmpi_usempi -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm > -lmpi_cxx -lstdc++ -Wl,-rpath,/sw/taurus/libraries/openmpi/1.10.2-gnu/lib > -L/sw/taurus/libraries/openmpi/1.10.2-gnu/lib -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/ > compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib64 > -L/sw/global/compilers/gcc/5.3.0/lib64 -Wl,-rpath,/sw/global/ > compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/libexec/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -L/sw/global/compilers/gcc/5.3.0/lib/gcc/x86_64-unknown-linux-gnu/5.3.0 > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -L/sw/global/compilers/gcc/5.3.0/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/sw/global/compilers/gcc/5.3.0/lib > -L/sw/global/compilers/gcc/5.3.0/lib -ldl -Wl,-rpath,/sw/taurus/ > libraries/openmpi/1.10.2-gnu/lib -lmpi -lgcc_s -lpthread -ldl > > ----------------------------------------- > > > > Regards > > Pietro Incardona > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thl at gps.caltech.edu Fri Jun 16 18:31:08 2017 From: thl at gps.caltech.edu (TianHao Le) Date: Fri, 16 Jun 2017 16:31:08 -0700 Subject: [petsc-users] undefined symbol: petsc_null_function_ Message-ID: Dear all, I try to put a 3-D radiative transfer model in the python Large-Scale Simulations model. However, when I setup the LES model, I got the following error message: Traceback (most recent call last): File "main.py", line 33, in main() File "main.py", line 17, in main main3d(namelist) File "main.py", line 23, in main3d import Simulation3d File "Radiation.pxd", line 11, in init Simulation3d (Simulation3d.c:29383) cdef class RadiationBase: ImportError: /global/home/thl/pycles-test/Radiation.so: undefined symbol: petsc_null_function_ Seems cannot find the PETSC NULL FUNCTION. 
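An undefined PETSc symbol at import time generally means the compiled extension module was never linked against (or cannot locate at run time) the PETSc shared library that provides it; petsc_null_function_ is one of the Fortran stubs that live inside libpetsc. A minimal setup.py sketch of that linking step is below. The extension name, source file, and PETSC_DIR/PETSC_ARCH paths are placeholders based on the messages in this thread, not the actual pycles build:

import os
from setuptools import setup, Extension

# Hedged sketch, not the real pycles setup.py: link the Cython-generated
# extension against libpetsc so that Fortran-stub symbols such as
# petsc_null_function_ resolve when Radiation.so is imported.
PETSC_DIR  = os.environ.get('PETSC_DIR',  '/home/thl/petsc')      # placeholder
PETSC_ARCH = os.environ.get('PETSC_ARCH', 'fast_double')          # placeholder
petsc_lib  = os.path.join(PETSC_DIR, PETSC_ARCH, 'lib')

radiation = Extension(
    'Radiation',
    sources=['Radiation.c'],                  # C file produced by Cython
    include_dirs=[os.path.join(PETSC_DIR, 'include'),
                  os.path.join(PETSC_DIR, PETSC_ARCH, 'include')],
    library_dirs=[petsc_lib],
    runtime_library_dirs=[petsc_lib],         # rpath so libpetsc is found at import time
    libraries=['petsc'],                      # provides petsc_null_function_ and friends
)

setup(name='radiation', ext_modules=[radiation])

If the wrapped Fortran/C++ library (the tenstream build here) is a separate shared library, it has to be added to libraries/library_dirs in the same way.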
I add the following 5 include paths in the setup.py (because I am not sure that I should add which one ) include_path += ['/home/thl/petsc/include/petsc/finclude'] include_path += ['/home/thl/petsc/include/petsc/private'] include_path += ['/home/thl/petsc/fast_double/include'] include_path += ['/home/thl/petsc/include'] include_path += [?/home/thl/petsc/include/petsc'] and add the following 2 library directions. library_dirs.append('/home/thl/tenstream/build/lib') library_dirs.append(?/home/thl/petsc/fast_double/lib') Now I am not sure what may cause this, maybe didn?t install PETSC correctly? Or add wrong include path or in a wrong way? If anyone knows how to fix this, please let me know! Thanks, Tianhao -- -- From bsmith at mcs.anl.gov Fri Jun 16 19:12:01 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 Jun 2017 19:12:01 -0500 Subject: [petsc-users] undefined symbol: petsc_null_function_ In-Reply-To: References: Message-ID: <704841EC-DB69-4E0E-B932-29F5590ED137@mcs.anl.gov> Please send make.log and configure.log from building PETSc This is a fortran stub function, is there any fortran in your code? If there is no fortran you can reconfigure PETSc with the option -with-fc=0 Otherwise we need to track down why it is not being found. Barry > On Jun 16, 2017, at 6:31 PM, TianHao Le wrote: > > Dear all, > > I try to put a 3-D radiative transfer model in the python Large-Scale > Simulations model. However, when I setup the LES model, I got the > following error message: > > Traceback (most recent call last): > File "main.py", line 33, in > main() > File "main.py", line 17, in main > main3d(namelist) > File "main.py", line 23, in main3d > import Simulation3d > File "Radiation.pxd", line 11, in init Simulation3d (Simulation3d.c:29383) > cdef class RadiationBase: > ImportError: /global/home/thl/pycles-test/Radiation.so: undefined symbol: > petsc_null_function_ > > Seems cannot find the PETSC NULL FUNCTION. > > I add the following 5 include paths in the setup.py (because I am not sure > that I should add which one.) > > include_path += ['/home/thl/petsc/include/petsc/finclude'] > include_path += ['/home/thl/petsc/include/petsc/private'] > include_path += ['/home/thl/petsc/fast_double/include'] > include_path += ['/home/thl/petsc/include'] > include_path += ['/home/thl/petsc/include/petsc'] > > and add the following 2 library directions. > > library_dirs.append('/home/thl/tenstream/build/lib') > library_dirs.append('/home/thl/petsc/fast_double/lib') > > > Now I am not sure what may cause this, maybe didn't install PETSC > correctly? Or add wrong include path or in a wrong way? > > If anyone knows how to fix this, please let me know! > > Thanks, > Tianhao > > > -- > > > -- > From happysky19 at gmail.com Fri Jun 16 19:35:45 2017 From: happysky19 at gmail.com (Tianhao Le) Date: Sat, 17 Jun 2017 08:35:45 +0800 Subject: [petsc-users] undefined symbol: petsc_null_function_ In-Reply-To: <704841EC-DB69-4E0E-B932-29F5590ED137@mcs.anl.gov> References: <704841EC-DB69-4E0E-B932-29F5590ED137@mcs.anl.gov> Message-ID: <31BE5C15-AD78-42BD-90EE-07626CCD4945@gmail.com> Hi Barry, Attached are the make.log and configure.log. In fact, the 3-D R-T model is written in c++ and fortran. I wrote a wrapper.f90 to let the 3-D R-T model can be called by the python code. (See the Tenstream_combined.csh, also attached) Thanks, Tianhao > ? 2017?6?17????8:12?Barry Smith ??? > > > Please send make.log and configure.log from building PETSc > > This is a fortran stub function, is there any fortran in your code? 
> > If there is no fortran you can reconfigure PETSc with the option -with-fc=0 > > Otherwise we need to track down why it is not being found. > > Barry > >> On Jun 16, 2017, at 6:31 PM, TianHao Le wrote: >> >> Dear all, >> >> I try to put a 3-D radiative transfer model in the python Large-Scale >> Simulations model. However, when I setup the LES model, I got the >> following error message: >> >> Traceback (most recent call last): >> File "main.py", line 33, in >> main() >> File "main.py", line 17, in main >> main3d(namelist) >> File "main.py", line 23, in main3d >> import Simulation3d >> File "Radiation.pxd", line 11, in init Simulation3d (Simulation3d.c:29383) >> cdef class RadiationBase: >> ImportError: /global/home/thl/pycles-test/Radiation.so: undefined symbol: >> petsc_null_function_ >> >> Seems cannot find the PETSC NULL FUNCTION. >> >> I add the following 5 include paths in the setup.py (because I am not sure >> that I should add which one.) >> >> include_path += ['/home/thl/petsc/include/petsc/finclude'] >> include_path += ['/home/thl/petsc/include/petsc/private'] >> include_path += ['/home/thl/petsc/fast_double/include'] >> include_path += ['/home/thl/petsc/include'] >> include_path += ['/home/thl/petsc/include/petsc'] >> >> and add the following 2 library directions. >> >> library_dirs.append('/home/thl/tenstream/build/lib') >> library_dirs.append('/home/thl/petsc/fast_double/lib') >> >> >> Now I am not sure what may cause this, maybe didn't install PETSC >> correctly? Or add wrong include path or in a wrong way? >> >> If anyone knows how to fix this, please let me know! >> >> Thanks, >> Tianhao >> >> >> -- >> >> >> -- >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Tenstream_combined.csh Type: application/octet-stream Size: 2000 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 4199635 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 103336 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 16 20:05:34 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 Jun 2017 20:05:34 -0500 Subject: [petsc-users] undefined symbol: petsc_null_function_ In-Reply-To: <31BE5C15-AD78-42BD-90EE-07626CCD4945@gmail.com> References: <704841EC-DB69-4E0E-B932-29F5590ED137@mcs.anl.gov> <31BE5C15-AD78-42BD-90EE-07626CCD4945@gmail.com> Message-ID: <1DF30162-1A63-468E-AD7C-1B79C737697E@mcs.anl.gov> I don't know exactly how to fix this but you need to make sure the PETSc libraries are linked against any shared libraries you are creating so that PETSc symbols can be found. Barry > On Jun 16, 2017, at 7:35 PM, Tianhao Le wrote: > > Hi Barry, > > Attached are the make.log and configure.log. > > In fact, the 3-D R-T model is written in c++ and fortran. > > I wrote a wrapper.f90 to let the 3-D R-T model can be called by the python code. (See the Tenstream_combined.csh, also attached) > > Thanks, > Tianhao > >> ? 2017?6?17????8:12?Barry Smith ??? 
>> >> >> Please send make.log and configure.log from building PETSc >> >> This is a fortran stub function, is there any fortran in your code? >> >> If there is no fortran you can reconfigure PETSc with the option -with-fc=0 >> >> Otherwise we need to track down why it is not being found. >> >> Barry >> >>> On Jun 16, 2017, at 6:31 PM, TianHao Le wrote: >>> >>> Dear all, >>> >>> I try to put a 3-D radiative transfer model in the python Large-Scale >>> Simulations model. However, when I setup the LES model, I got the >>> following error message: >>> >>> Traceback (most recent call last): >>> File "main.py", line 33, in >>> main() >>> File "main.py", line 17, in main >>> main3d(namelist) >>> File "main.py", line 23, in main3d >>> import Simulation3d >>> File "Radiation.pxd", line 11, in init Simulation3d (Simulation3d.c:29383) >>> cdef class RadiationBase: >>> ImportError: /global/home/thl/pycles-test/Radiation.so: undefined symbol: >>> petsc_null_function_ >>> >>> Seems cannot find the PETSC NULL FUNCTION. >>> >>> I add the following 5 include paths in the setup.py (because I am not sure >>> that I should add which one.) >>> >>> include_path += ['/home/thl/petsc/include/petsc/finclude'] >>> include_path += ['/home/thl/petsc/include/petsc/private'] >>> include_path += ['/home/thl/petsc/fast_double/include'] >>> include_path += ['/home/thl/petsc/include'] >>> include_path += ['/home/thl/petsc/include/petsc'] >>> >>> and add the following 2 library directions. >>> >>> library_dirs.append('/home/thl/tenstream/build/lib') >>> library_dirs.append('/home/thl/petsc/fast_double/lib') >>> >>> >>> Now I am not sure what may cause this, maybe didn't install PETSC >>> correctly? Or add wrong include path or in a wrong way? >>> >>> If anyone knows how to fix this, please let me know! >>> >>> Thanks, >>> Tianhao >>> >>> >>> -- >>> >>> >>> -- >>> >> > > From zonexo at gmail.com Sat Jun 17 10:49:23 2017 From: zonexo at gmail.com (TAY wee-beng) Date: Sat, 17 Jun 2017 23:49:23 +0800 Subject: [petsc-users] Strange Segmentation Violation error In-Reply-To: <055B880D-7D0C-49C9-84B5-2C64B003FC0A@glasgow.ac.uk> References: <635a7754-72ec-1c96-2d0a-783615079b25@gmail.com> <055B880D-7D0C-49C9-84B5-2C64B003FC0A@glasgow.ac.uk> Message-ID: Hi Lukasz, Thanks for the tip. I tied using valgrind. However, I got a lot of errors at a few of locations. One complained of uninitialized value of : call PetscInitialize(PETSC_NULL_CHARACTER,ierr) But I already initialize "ierr". Are these errors valid or can I hide them? 
== ==17300== Conditional jump or move depends on uninitialised value(s) ==17300== at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so) ==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so) ==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so) ==17300== by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so) ==17300== by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1) ==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1) ==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0) ==17300== by 0xB29696: petscinitialize_ (zstart.c:316) ==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63) ==17300== Uninitialised value was created by a stack allocation ==17300== at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so) ==17300== ==17300== Conditional jump or move depends on uninitialised value(s) ==17300== at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so) ==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so) ==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so) ==17300== by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so) ==17300== by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) ==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1) ==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1) ==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0) ==17300== by 0xB29696: petscinitialize_ (zstart.c:316) ==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63) Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng (Zheng Weiming) ??? Personal research webpage:http://tayweebeng.wixsite.com/website Youtube research showcase:https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA linkedin:www.linkedin.com/in/tay-weebeng ================================================ On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote: > >> On 7 Jun 2017, at 07:57, TAY wee-beng > > wrote: >> >> Hi, >> >> I have been PETSc together with my CFD code. There seems to be a bug >> with the Intel compiler such that when I call some DM routines such >> as DMLocalToLocalBegin, a segmentation violation will occur if full >> optimization is used. I had posted this question a while back. So the >> current solution is to use -O1 -ip instead of -O3 -ipo -ip for >> certain source files which uses DMLocalToLocalBegin etc. >> >> Recently, I made some changes to the code, mainly adding some stuffs. >> However, depending on my options. some cases still go thru the same >> program path. 
>> >> Now when I tried to run those same cases, I got segmentation >> violation, which didn't happen before: >> >> / IIB_I_cell_no_uvw_total2 14 10 6 3// >> // 2 1/ >> >> /[0]PETSC ERROR: >> ------------------------------------------------------------------------// >> //[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation >> Violation, probably memory access out of range// >> //[0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger// >> //[0]PETSC ERROR: or see >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind// >> //[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >> Mac OS X to find memory corruption errors// >> //[0]PETSC ERROR: configure using --with-debugging=yes, recompile, >> link, and run // >> //[0]PETSC ERROR: to get more information on the crash.// >> //[0]PETSC ERROR: --------------------- Error Message >> --------------------------------------------------------------// >> //[0]PETSC ERROR: Signal received// >> //[0]PETSC ERROR: See >> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting.// >> //[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 // >> //[0]PETSC ERROR: ./a.out / >> >> I can't debug using VS since the codes have been optimized. I tried >> to print messages (if (myid == 0) print "1") to pinpoint the error. >> Strangely, after adding these print messages, the error disappears. >> >> / IIB_I_cell_no_uvw_total2 14 10 6 3// >> // 2 1// >> // 1// >> // 2// >> // 3// >> // 4// >> // 5// >> // 1 0.26873613 0.12620288 0.12949340 1.11422363 >> 0.43983516E-06 -0.59311066E-01 0.25546227E+04// >> // 2 0.22236892 0.14528589 0.16939270 1.10459102 >> 0.74556128E-02 -0.55168234E-01 0.25532419E+04// >> // 3 0.20764796 0.14832689 0.18780489 1.08039569 >> 0.80299767E-02 -0.46972411E-01 0.25523174E+04/ >> >> Can anyone give a logical explanation why this is happening? >> Moreover, if I removed printing 1 to 3, and only print 4 and 5, >> segmentation violation appears again. >> >> I am using Intel Fortran 2016.1.150. I wonder if it helps if I post >> in the Intel Fortran forum. >> >> I can provide more info if require. >> > You very likely write on the memory, for example when you exceed the > size of arrays. Depending on your compilation options, starting > parameters, etc. you write in an uncontrolled way on the part of > memory which belongs to your process or protected by operation system. > In the second case, you have a segmentation fault. You can have > correct results for some runs, but your bug is there hiding in the dark. > > To put light on it, you need Valgrind. Compile the code with debugging > on, no optimisation and start searching. You can run as well generate > core file and in gdb/ldb buck track error. > > Lukasz -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sat Jun 17 11:04:29 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sat, 17 Jun 2017 18:04:29 +0200 Subject: [petsc-users] Strange Segmentation Violation error In-Reply-To: References: <635a7754-72ec-1c96-2d0a-783615079b25@gmail.com> <055B880D-7D0C-49C9-84B5-2C64B003FC0A@glasgow.ac.uk> Message-ID: If you plan to use valgrind you may want to use mpich (--download-mpich configure option) since openmpi has a lot of false positives. Il 17 Giu 2017 17:49, "TAY wee-beng" ha scritto: > Hi Lukasz, > > Thanks for the tip. > > I tied using valgrind. However, I got a lot of errors at a few of > locations. 
One complained of uninitialized value of : > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > But I already initialize "ierr". Are these errors valid or can I hide > them? > > == > ==17300== Conditional jump or move depends on uninitialised value(s) > ==17300== at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/ > libc-2.12.so) > ==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so) > ==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so) > ==17300== by 0xA726083: mca_mpool_hugepage_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so) > ==17300== by 0x65A83A1: mca_base_framework_components_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x6614041: mca_mpool_base_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x65B1EC0: mca_base_framework_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi.so.20.10.1) > ==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi.so.20.10.1) > ==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi_mpifh.so.20.11.0) > ==17300== by 0xB29696: petscinitialize_ (zstart.c:316) > ==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63) > ==17300== Uninitialised value was created by a stack allocation > ==17300== at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so) > ==17300== > ==17300== Conditional jump or move depends on uninitialised value(s) > ==17300== at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/ > libc-2.12.so) > ==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so) > ==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so) > ==17300== by 0xA726083: mca_mpool_hugepage_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so) > ==17300== by 0x65A83A1: mca_base_framework_components_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x6614041: mca_mpool_base_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x65B1EC0: mca_base_framework_open (in > /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1) > ==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi.so.20.10.1) > ==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi.so.20.10.1) > ==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2. > 1.1/lib/libmpi_mpifh.so.20.11.0) > ==17300== by 0xB29696: petscinitialize_ (zstart.c:316) > ==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63) > > > > > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng (Zheng Weiming) ??? > Personal research webpage: http://tayweebeng.wixsite.com/website > Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA > linkedin: www.linkedin.com/in/tay-weebeng > ================================================ > > On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote: > > > On 7 Jun 2017, at 07:57, TAY wee-beng wrote: > > Hi, > > I have been PETSc together with my CFD code. There seems to be a bug with > the Intel compiler such that when I call some DM routines such as > DMLocalToLocalBegin, a segmentation violation will occur if full > optimization is used. I had posted this question a while back. 
So the > current solution is to use -O1 -ip instead of -O3 -ipo -ip for certain > source files which uses DMLocalToLocalBegin etc. > > Recently, I made some changes to the code, mainly adding some stuffs. > However, depending on my options. some cases still go thru the same program > path. > > Now when I tried to run those same cases, I got segmentation violation, > which didn't happen before: > > * IIB_I_cell_no_uvw_total2 14 10 6 3* > * 2 1* > > *[0]PETSC ERROR: > ------------------------------------------------------------------------* > *[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range* > *[0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger* > *[0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > * > *[0]PETSC ERROR: or try http://valgrind.org on > GNU/linux and Apple Mac OS X to find memory corruption errors* > *[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run * > *[0]PETSC ERROR: to get more information on the crash.* > *[0]PETSC ERROR: --------------------- Error Message > --------------------------------------------------------------* > *[0]PETSC ERROR: Signal received* > *[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting.* > *[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 * > *[0]PETSC ERROR: ./a.out * > > > I can't debug using VS since the codes have been optimized. I tried to > print messages (if (myid == 0) print "1") to pinpoint the error. Strangely, > after adding these print messages, the error disappears. > > * IIB_I_cell_no_uvw_total2 14 10 6 3* > * 2 1* > * 1* > * 2* > * 3* > * 4* > * 5* > * 1 0.26873613 0.12620288 0.12949340 1.11422363 > 0.43983516E-06 -0.59311066E-01 0.25546227E+04* > * 2 0.22236892 0.14528589 0.16939270 1.10459102 > 0.74556128E-02 -0.55168234E-01 0.25532419E+04* > * 3 0.20764796 0.14832689 0.18780489 1.08039569 > 0.80299767E-02 -0.46972411E-01 0.25523174E+04* > > Can anyone give a logical explanation why this is happening? Moreover, if > I removed printing 1 to 3, and only print 4 and 5, segmentation violation > appears again. > > I am using Intel Fortran 2016.1.150. I wonder if it helps if I post in the > Intel Fortran forum. > > I can provide more info if require. > > You very likely write on the memory, for example when you exceed the size > of arrays. Depending on your compilation options, starting parameters, > etc. you write in an uncontrolled way on the part of memory which belongs > to your process or protected by operation system. In the second case, you > have a segmentation fault. You can have correct results for some runs, but > your bug is there hiding in the dark. > > To put light on it, you need Valgrind. Compile the code with debugging on, > no optimisation and start searching. You can run as well generate core > file and in gdb/ldb buck track error. > > Lukasz > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sat Jun 17 15:56:35 2017 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 17 Jun 2017 20:56:35 +0000 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> , , Message-ID: Matrix A is a tridiagonal matrix with blocksize=1. Why do you set block_size=2 for A_IS and B_IS? 
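For reference, a rough petsc4py transcription of the setup under discussion is sketched below: two stride index sets over a 40x40 operator, both forced to block size 2, with the second split left empty on rank 0. A local length of 0 is divisible by any block size, which is why the "Local size 11 not compatible with block size 2" message pointed at a bug rather than at the splits themselves. This is a sketch only: the authoritative reproducer is the C test attached earlier in the thread, a diagonal matrix stands in for the tridiagonal one, it is meant for exactly 2 MPI ranks, and it assumes the petsc4py wrappers IS.setBlockSize and PC.setFieldSplitIS.

# Rough petsc4py sketch of the reproducer (run with: mpirun -n 2 python thisfile.py).
from petsc4py import PETSc

comm  = PETSc.COMM_WORLD
rank  = comm.getRank()
N, bs = 40, 2

# Diagonal stand-in for the tridiagonal A of the C test; only the blocked
# index sets and the empty split on rank 0 matter here.
A = PETSc.Mat().createAIJ([N, N], comm=comm)
A.setUp()
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
A.assemble()

# "method 2"-style split: rank 0 keeps all of its rows in field "a", so its
# local piece of field "b" has length 0 -- still divisible by bs.
nloc = rend - rstart
na   = nloc if rank == 0 else (nloc // 2 // bs) * bs
is_a = PETSc.IS().createStride(na, first=rstart, step=1, comm=comm)
is_b = PETSc.IS().createStride(nloc - na, first=rstart + na, step=1, comm=comm)
is_a.setBlockSize(bs)
is_b.setBlockSize(bs)

ksp = PETSc.KSP().create(comm=comm)
ksp.setOperators(A)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.FIELDSPLIT)
pc.setFieldSplitIS(("a", is_a), ("b", is_b))
ksp.setUp()   # exercises the blocked submatrix extraction that the fix addressed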
Hong ________________________________ From: Zhang, Hong Sent: Friday, June 16, 2017 7:55:45 AM To: Smith, Barry F.; Hoang Giang Bui Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit I'm in Boulder and will be back home this evening. Will test it this weekend. Hong ________________________________ From: Smith, Barry F. Sent: Thursday, June 15, 2017 1:38:11 PM To: Hoang Giang Bui; Zhang, Hong Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit Hong, Please build the attached code with master and run with petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 I think this is a bug in your new MatGetSubMatrix routines. You take the block size of the outer IS and pass it into the inner IS but that inner IS may not support the same block size hence the crash. Can you please debug this? Thanks Barry > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui wrote: > > Hi Barry > > Thanks for pointing out the error. I think the problem coming from the zero fieldsplit in proc 0. In this modified example, I parameterized the matrix size and block size, so when you're executing > > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 > > everything was fine. With method = 1, fieldsplit size of B is nonzero and is divided by the block size. > > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 2, the fieldsplit B is zero on proc 0, and the error is thrown > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Arguments are incompatible > [1]PETSC ERROR: Local size 11 not compatible with block size 2 > > This is somehow not logical, because 0 is divided by block_size. > > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to my original problem. Probably the original one also hangs at ISSetBlockSize, which I may not realize at that time. > > Giang > > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > You can't do this > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > use PETSC_DECIDE for the third argument > > Also this is wrong > > for (i = Istart; i < Iend; ++i) > { > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > } > > you will get > > $ petscmpiexec -n 2 ./ex1 > 0: Istart = 0, Iend = 60 > 1: Istart = 60, Iend = 120 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Row too large: row 120 max 119 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017 > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -malloc_test > > You need to get the example working so it ends with the error you reported previously not these other bugs. > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui wrote: > > > > Dear Barry > > > > I made a small example with 2 process with one empty split in proc 0. But it gives another strange error > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > The local size is always 60, so this is confusing. > > > > Giang > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > > Could be, send us a simple example that demonstrates the problem and we'll track it down. > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > > > > > Hello > > > > > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > > > Giang > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sat Jun 17 16:06:30 2017 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 17 Jun 2017 21:06:30 +0000 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> , , , Message-ID: never mind, I know it is ok to set blocksize=2 ________________________________ From: Zhang, Hong Sent: Saturday, June 17, 2017 3:56:35 PM To: Smith, Barry F.; Hoang Giang Bui Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit Matrix A is a tridiagonal matrix with blocksize=1. Why do you set block_size=2 for A_IS and B_IS? Hong ________________________________ From: Zhang, Hong Sent: Friday, June 16, 2017 7:55:45 AM To: Smith, Barry F.; Hoang Giang Bui Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit I'm in Boulder and will be back home this evening. Will test it this weekend. Hong ________________________________ From: Smith, Barry F. Sent: Thursday, June 15, 2017 1:38:11 PM To: Hoang Giang Bui; Zhang, Hong Cc: petsc-users Subject: Re: [petsc-users] empty split for fieldsplit Hong, Please build the attached code with master and run with petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 I think this is a bug in your new MatGetSubMatrix routines. You take the block size of the outer IS and pass it into the inner IS but that inner IS may not support the same block size hence the crash. Can you please debug this? Thanks Barry > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui wrote: > > Hi Barry > > Thanks for pointing out the error. I think the problem coming from the zero fieldsplit in proc 0. 
In this modified example, I parameterized the matrix size and block size, so when you're executing > > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 > > everything was fine. With method = 1, fieldsplit size of B is nonzero and is divided by the block size. > > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 2, the fieldsplit B is zero on proc 0, and the error is thrown > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Arguments are incompatible > [1]PETSC ERROR: Local size 11 not compatible with block size 2 > > This is somehow not logical, because 0 is divided by block_size. > > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to my original problem. Probably the original one also hangs at ISSetBlockSize, which I may not realize at that time. > > Giang > > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > You can't do this > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > use PETSC_DECIDE for the third argument > > Also this is wrong > > for (i = Istart; i < Iend; ++i) > { > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > } > > you will get > > $ petscmpiexec -n 2 ./ex1 > 0: Istart = 0, Iend = 60 > 1: Istart = 60, Iend = 120 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: Row too large: row 120 max 119 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017 > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -malloc_test > > You need to get the example working so it ends with the error you reported previously not these other bugs. > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui wrote: > > > > Dear Barry > > > > I made a small example with 2 process with one empty split in proc 0. But it gives another strange error > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > The local size is always 60, so this is confusing. > > > > Giang > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith wrote: > > Could be, send us a simple example that demonstrates the problem and we'll track it down. > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui wrote: > > > > > > Hello > > > > > > I noticed that my code stopped very long, possibly hang, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hang ? 
> > > > > > Giang > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sun Jun 18 17:45:59 2017 From: hzhang at mcs.anl.gov (Hong) Date: Sun, 18 Jun 2017 17:45:59 -0500 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> Message-ID: Hoang, I pushed a fix https://bitbucket.org/petsc/petsc/commits/d4e3277789d24018f0db1641a80db7be76600165 and added your test to petsc/src/ksp/ksp/examples/tests/ex53.c It is on the branch hzhang/fix-blockedIS-submat Let me know if it still does not fix your problem. Hong On Sat, Jun 17, 2017 at 4:06 PM, Zhang, Hong wrote: > never mind, I know it is ok to set blocksize=2 > ------------------------------ > *From:* Zhang, Hong > *Sent:* Saturday, June 17, 2017 3:56:35 PM > > *To:* Smith, Barry F.; Hoang Giang Bui > *Cc:* petsc-users > *Subject:* Re: [petsc-users] empty split for fieldsplit > > > Matrix A is a tridiagonal matrix with blocksize=1. > > Why do you set block_size=2 for A_IS and B_IS? > > > Hong > ------------------------------ > *From:* Zhang, Hong > *Sent:* Friday, June 16, 2017 7:55:45 AM > *To:* Smith, Barry F.; Hoang Giang Bui > *Cc:* petsc-users > *Subject:* Re: [petsc-users] empty split for fieldsplit > > > I'm in Boulder and will be back home this evening. > > Will test it this weekend. > > > Hong > ------------------------------ > *From:* Smith, Barry F. > *Sent:* Thursday, June 15, 2017 1:38:11 PM > *To:* Hoang Giang Bui; Zhang, Hong > *Cc:* petsc-users > *Subject:* Re: [petsc-users] empty split for fieldsplit > > > Hong, > > Please build the attached code with master and run with > > petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 > > I think this is a bug in your new MatGetSubMatrix routines. You take the > block size of the outer IS and pass it into the inner IS but that inner IS > may not support the same block size hence the crash. > > Can you please debug this? > > Thanks > > Barry > > > > > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui wrote: > > > > Hi Barry > > > > Thanks for pointing out the error. I think the problem coming from the > zero fieldsplit in proc 0. In this modified example, I parameterized the > matrix size and block size, so when you're executing > > > > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 > > > > everything was fine. With method = 1, fieldsplit size of B is nonzero > and is divided by the block size. > > > > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method > 2, the fieldsplit B is zero on proc 0, and the error is thrown > > > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Arguments are incompatible > > [1]PETSC ERROR: Local size 11 not compatible with block size 2 > > > > This is somehow not logical, because 0 is divided by block_size. > > > > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 -block_size > 2 -method 2", the code hangs at ISSetBlockSize, which is pretty similar to > my original problem. Probably the original one also hangs at > ISSetBlockSize, which I may not realize at that time. 
> > > > Giang > > > > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith wrote: > > > > You can't do this > > > > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); > > > > use PETSC_DECIDE for the third argument > > > > Also this is wrong > > > > for (i = Istart; i < Iend; ++i) > > { > > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); > > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); > > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); > > } > > > > you will get > > > > $ petscmpiexec -n 2 ./ex1 > > 0: Istart = 0, Iend = 60 > > 1: Istart = 60, Iend = 120 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Argument out of range > > [1]PETSC ERROR: Row too large: row 120 max 119 > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192 > GIT Date: 2017-06-11 14:49:39 -0500 > > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by > barrysmith Wed Jun 14 18:26:52 2017 > > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic > > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in > /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [1]PETSC ERROR: #2 MatSetValues() line 1270 in > /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c > > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/ > test-dir/ex1.c > > [1]PETSC ERROR: PETSc Option Table entries: > > [1]PETSC ERROR: -malloc_test > > > > You need to get the example working so it ends with the error you > reported previously not these other bugs. > > > > > > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui > wrote: > > > > > > Dear Barry > > > > > > I made a small example with 2 process with one empty split in proc 0. > But it gives another strange error > > > > > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [1]PETSC ERROR: Arguments are incompatible > > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 > > > > > > The local size is always 60, so this is confusing. > > > > > > Giang > > > > > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith > wrote: > > > Could be, send us a simple example that demonstrates the problem and > we'll track it down. > > > > > > > > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui > wrote: > > > > > > > > Hello > > > > > > > > I noticed that my code stopped very long, possibly hang, at > PCFieldSplitSetIS. There are two splits and one split is empty in one > process. May that be the possible reason that PCFieldSplitSetIS hang ? > > > > > > > > Giang > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jun 19 05:35:38 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 19 Jun 2017 04:35:38 -0600 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> Message-ID: Thanks Hong I confirmed that it's fixed by your changes. 
Giang On Sun, Jun 18, 2017 at 4:45 PM, Hong wrote: > Hoang, > I pushed a fix > https://bitbucket.org/petsc/petsc/commits/d4e3277789d24018f0db1641a80db7 > be76600165 > and added your test to > petsc/src/ksp/ksp/examples/tests/ex53.c > > It is on the branch hzhang/fix-blockedIS-submat > Let me know if it still does not fix your problem. > > Hong > > On Sat, Jun 17, 2017 at 4:06 PM, Zhang, Hong wrote: > >> never mind, I know it is ok to set blocksize=2 >> ------------------------------ >> *From:* Zhang, Hong >> *Sent:* Saturday, June 17, 2017 3:56:35 PM >> >> *To:* Smith, Barry F.; Hoang Giang Bui >> *Cc:* petsc-users >> *Subject:* Re: [petsc-users] empty split for fieldsplit >> >> >> Matrix A is a tridiagonal matrix with blocksize=1. >> >> Why do you set block_size=2 for A_IS and B_IS? >> >> >> Hong >> ------------------------------ >> *From:* Zhang, Hong >> *Sent:* Friday, June 16, 2017 7:55:45 AM >> *To:* Smith, Barry F.; Hoang Giang Bui >> *Cc:* petsc-users >> *Subject:* Re: [petsc-users] empty split for fieldsplit >> >> >> I'm in Boulder and will be back home this evening. >> >> Will test it this weekend. >> >> >> Hong >> ------------------------------ >> *From:* Smith, Barry F. >> *Sent:* Thursday, June 15, 2017 1:38:11 PM >> *To:* Hoang Giang Bui; Zhang, Hong >> *Cc:* petsc-users >> *Subject:* Re: [petsc-users] empty split for fieldsplit >> >> >> Hong, >> >> Please build the attached code with master and run with >> >> petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 >> >> I think this is a bug in your new MatGetSubMatrix routines. You take the >> block size of the outer IS and pass it into the inner IS but that inner IS >> may not support the same block size hence the crash. >> >> Can you please debug this? >> >> Thanks >> >> Barry >> >> >> >> > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui >> wrote: >> > >> > Hi Barry >> > >> > Thanks for pointing out the error. I think the problem coming from the >> zero fieldsplit in proc 0. In this modified example, I parameterized the >> matrix size and block size, so when you're executing >> > >> > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 >> > >> > everything was fine. With method = 1, fieldsplit size of B is nonzero >> and is divided by the block size. >> > >> > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method >> 2, the fieldsplit B is zero on proc 0, and the error is thrown >> > >> > [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [1]PETSC ERROR: Arguments are incompatible >> > [1]PETSC ERROR: Local size 11 not compatible with block size 2 >> > >> > This is somehow not logical, because 0 is divided by block_size. >> > >> > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 >> -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty >> similar to my original problem. Probably the original one also hangs at >> ISSetBlockSize, which I may not realize at that time. 
>> > >> > Giang >> > >> > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith >> wrote: >> > >> > You can't do this >> > >> > ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr); >> > >> > use PETSC_DECIDE for the third argument >> > >> > Also this is wrong >> > >> > for (i = Istart; i < Iend; ++i) >> > { >> > ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr); >> > ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr); >> > ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr); >> > } >> > >> > you will get >> > >> > $ petscmpiexec -n 2 ./ex1 >> > 0: Istart = 0, Iend = 60 >> > 1: Istart = 60, Iend = 120 >> > [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [1]PETSC ERROR: Argument out of range >> > [1]PETSC ERROR: Row too large: row 120 max 119 >> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> > [1]PETSC ERROR: Petsc Development GIT revision: >> v3.7.6-4103-g93161b8192 GIT Date: 2017-06-11 14:49:39 -0500 >> > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by >> barrysmith Wed Jun 14 18:26:52 2017 >> > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic >> > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in >> /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c >> > [1]PETSC ERROR: #2 MatSetValues() line 1270 in >> /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c >> > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/te >> st-dir/ex1.c >> > [1]PETSC ERROR: PETSc Option Table entries: >> > [1]PETSC ERROR: -malloc_test >> > >> > You need to get the example working so it ends with the error you >> reported previously not these other bugs. >> > >> > >> > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui >> wrote: >> > > >> > > Dear Barry >> > > >> > > I made a small example with 2 process with one empty split in proc 0. >> But it gives another strange error >> > > >> > > [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > > [1]PETSC ERROR: Arguments are incompatible >> > > [1]PETSC ERROR: Local size 31 not compatible with block size 2 >> > > >> > > The local size is always 60, so this is confusing. >> > > >> > > Giang >> > > >> > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith >> wrote: >> > > Could be, send us a simple example that demonstrates the problem >> and we'll track it down. >> > > >> > > >> > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui >> wrote: >> > > > >> > > > Hello >> > > > >> > > > I noticed that my code stopped very long, possibly hang, at >> PCFieldSplitSetIS. There are two splits and one split is empty in one >> process. May that be the possible reason that PCFieldSplitSetIS hang ? >> > > > >> > > > Giang >> > > >> > > >> > > >> > >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at man.poznan.pl Mon Jun 19 07:42:09 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Mon, 19 Jun 2017 14:42:09 +0200 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <1868632011.20170616145710@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> Message-ID: <686121773.20170619144209@man.poznan.pl> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: int_1.jpg Type: image/jpeg Size: 92539 bytes Desc: not available URL: -------------- next part -------------- An embedded message was scrubbed... From: Damian Kaliszan Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs Date: Fri, 16 Jun 2017 14:57:10 +0200 Size: 237115 URL: From knepley at gmail.com Mon Jun 19 07:50:25 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 19 Jun 2017 06:50:25 -0600 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <686121773.20170619144209@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> <686121773.20170619144209@man.poznan.pl> Message-ID: On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan wrote: > Hi, > > Regarding my previous post > I looked into both logs of 64MPI/1 OMP vs. 64MPI/2 OMP. > > > What attracted my attention is huge difference in MPI timings in the > following places: > > Average time to get PetscTime(): 2.14577e-07 > Average time for MPI_Barrier(): 3.9196e-05 > Average time for zero size MPI_Send(): 5.45382e-06 > > vs. > > Average time to get PetscTime(): 4.05312e-07 > Average time for MPI_Barrier(): 0.348399 > Average time for zero size MPI_Send(): 0.029937 > > Isn't something wrong with PETSc library itself?... > I don't think so. This is bad interaction of MPI and your threading mechanism. MPI_Barrier() and MPI_Send() are lower level than PETSc. What threading mode did you choose for MPI? This can have a performance impact. Also, the justifications for threading in this context are weak (or non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf Thanks, Matt > > Best, > Damian > > Wiadomo?? przekazana > Od: Damian Kaliszan > Do: PETSc users list > Data: 16 czerwca 2017, 14:57:10 > Temat: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > > ===8<===============Tre?? oryginalnej wiadomo?ci=============== > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. > > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > ===8<===========Koniec tre?ci oryginalnej wiadomo?ci=========== > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 <+48%2061%20858%2051%2009> > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > > ---------- Forwarded message ---------- > From: Damian Kaliszan > To: PETSc users list > Cc: > Bcc: > Date: Fri, 16 Jun 2017 14:57:10 +0200 > Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. 
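
Matt's question about the threading mode can be answered by asking MPI directly. A minimal standalone check, separate from the python application (if MPI is already initialized, as it is under petsc4py, MPI_Query_thread() returns the same value without reinitializing):

    /* Print the MPI threading level actually provided at runtime.
       Purely illustrative; not part of the application discussed here. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int provided, rank;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
        const char *name =
          provided == MPI_THREAD_SINGLE     ? "MPI_THREAD_SINGLE"     :
          provided == MPI_THREAD_FUNNELED   ? "MPI_THREAD_FUNNELED"   :
          provided == MPI_THREAD_SERIALIZED ? "MPI_THREAD_SERIALIZED" :
                                              "MPI_THREAD_MULTIPLE";
        printf("MPI provided threading level: %s\n", name);
      }
      MPI_Finalize();
      return 0;
    }
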
> > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: int_1.jpg Type: image/jpeg Size: 92539 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: int_1.jpg Type: image/jpeg Size: 92539 bytes Desc: not available URL: From G.Vaz at marin.nl Mon Jun 19 07:57:55 2017 From: G.Vaz at marin.nl (Vaz, Guilherme) Date: Mon, 19 Jun 2017 12:57:55 +0000 Subject: [petsc-users] PETSC on Cray Hazelhen In-Reply-To: References: <1497343731587.65032@marin.nl> <1497360881294.8614@marin.nl> <1497362496576.40011@marin.nl>, Message-ID: <1497877075364.57203@marin.nl> Guys, I made it working... After a lot of trial and error, based on Matthew, Stefano and Cray $PETSC_DIR/include/petscconfiginfo.h?, these are my "best" confopts which worked: CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 \ --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=4 \ --known-level1-dcache-linesize=64 --known-level1-dcache-size=16384 --known-memcmp-ok=1 \ --known-mpi-c-double-complex=1 --known-mpi-long-double=1 \ --known-mpi-shared-libraries=0 \ --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 \ --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 \ --with-ar=ar \ --with-batch=1 \ --with-cc=/opt/cray/craype/2.5.10/bin/cc \ --with-clib-autodetect=0 \ --with-cxx=0 \ --with-cxxlib-autodetect=0 \ --with-debugging=0 \ --with-dependencies=0 \ --with-fc=/opt/cray/craype/2.5.10/bin/ftn \ --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 \ --CFLAGS=-O3 \ --FFLAGS=-O3 -lstdc++ \ --LDFLAGS=-dynamic \ --FOPTFLAGS=-O3 \ --COPTFLAGS=-O3 \ --with-ranlib=ranlib \ --with-scalar-type=real \ --with-shared-ld=ar \ --with-etags=0 \ --with-x=0 \ --with-ssl=0 \ --with-shared-libraries=0 \ --with-mpi-lib="" --with-mpi-include="" \ --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ --with-external-packages-dir=$INSTALL_DIR " Thanks for the help. And I hope this may help some Cray newbies like me. Guilherme V. ________________________________ From: Stefano Zampini Sent: Tuesday, June 13, 2017 4:10 PM To: Vaz, Guilherme Cc: Matthew Knepley; PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen Cray machines can be used with shared libraries, it?s not like the earlier versions of BG/Q Yes, you need almost all of this. If you run with configure with the option ?with-batch=1, will then generate something like the one I have sent you. ?with-shared-libraries is a PETSc configuration, i.e. 
you will create libpetsc.so ?LDFLAGS=-dynamic is used to link dynamically a PETSc executable ?known-mpi-shared-libraries=0 will use a statically linked MPI You can remove the first two options listed above if you would like to have a static version of PETSC. You may want to refine the options for optimized builds, i.e. add your favorite COPTFLAGS and remove ?with-debugging=1 Another thing you can do. Load any of the PETSc modules on the XC40, and then look at the file $PETSC_DIR/include/petscconfiginfo.h On Jun 13, 2017, at 4:01 PM, Vaz, Guilherme > wrote: Stefano/Mathew, Do I need all this :-)? And I dont want a debug version, just an optimized release version. Thus I understand the -g -O0 flags for debug, but the rest I am not sure what is for debug and for a release version. Sorry... I am also kind of confused on the shared libraries issue, '--known-mpi-shared-libraries=0', '--with-shared-libraries=1', on the static vs dynamic linking (I thought in XC-40 we had to compile everything statically, --LDFLAGS=-dynamic and on the FFFLAG: --FFLAGS=-mkl=sequential -g -O0 -lstdc++ Is this to be used with Intel MKL libraries? Thanks for the help, you both. Guilherme V. dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 ________________________________ From: Stefano Zampini > Sent: Tuesday, June 13, 2017 3:42 PM To: Vaz, Guilherme Cc: Matthew Knepley; PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen Guilherme, here is my debug configuration (with shared libraries) in PETSc on a XC40 '--CFLAGS=-mkl=sequential -g -O0 ', '--CXXFLAGS=-mkl=sequential -g -O0 ', '--FFLAGS=-mkl=sequential -g -O0 -lstdc++', '--LDFLAGS=-dynamic', '--download-metis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-metis=1', '--download-parmetis-cmake-arguments=-DCMAKE_C_COMPILER_FORCED=1', '--download-parmetis=1', '--known-bits-per-byte=8', '--known-has-attribute-aligned=1', '--known-level1-dcache-assoc=8', '--known-level1-dcache-linesize=64', '--known-level1-dcache-size=32768', '--known-memcmp-ok=1', '--known-mpi-c-double-complex=1', '--known-mpi-int64_t=1', '--known-mpi-long-double=1', '--known-mpi-shared-libraries=0', '--known-sdot-returns-double=0', '--known-sizeof-MPI_Comm=4', '--known-sizeof-MPI_Fint=4', '--known-sizeof-char=1', '--known-sizeof-double=8', '--known-sizeof-float=4', '--known-sizeof-int=4', '--known-sizeof-long-long=8', '--known-sizeof-long=8', '--known-sizeof-short=2', '--known-sizeof-size_t=8', '--known-sizeof-void-p=8', '--known-snrm2-returns-double=0', '--with-ar=ar', '--with-batch=1', '--with-cc=/opt/cray/craype/2.4.2/bin/cc', '--with-clib-autodetect=0', '--with-cmake=/home/zampins/local/bin/cmake', '--with-cxx=/opt/cray/craype/2.4.2/bin/CC', '--with-cxxlib-autodetect=0', '--with-debugging=1', '--with-dependencies=0', '--with-etags=0', '--with-fc=/opt/cray/craype/2.4.2/bin/ftn', '--with-fortran-datatypes=0', '--with-fortran-interfaces=0', '--with-fortranlib-autodetect=0', '--with-pthread=0', '--with-ranlib=ranlib', '--with-scalar-type=real', '--with-shared-ld=ar', '--with-shared-libraries=1', 'PETSC_ARCH=arch-intel-debug', 2017-06-13 15:34 GMT+02:00 Vaz, Guilherme >: Dear Matthew, Thanks. 
It went further, but now I get: TESTING: configureMPIEXEC from config.packages.MPI(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py:143) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Must give a default value for known-mpi-shared-libraries since executables cannot be run ******************************************************************************* Last lines from the log: File "./config/configure.py", line 405, in petsc_configure framework.configure(out = sys.stdout) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1090, in configure self.processChildren() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1079, in processChildren self.serialEvaluation(self.childGraph) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/framework.py", line 1060, in serialEvaluation child.configure() File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/package.py", line 791, in configure self.executeTest(self.checkSharedLibrary) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 126, in executeTest ret = test(*args,**kargs) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/packages/MPI.py", line 135, in checkSharedLibrary self.shared = self.libraries.checkShared('#include \n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = self.checkPackageLink,libraries = self.lib, defaultArg = 'known-mpi-shared-libraries', ex ecutor = self.mpiexec) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/libraries.py", line 471, in checkShared if self.checkRun(defaultIncludes, body, defaultArg = defaultArg, executor = executor): File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 628, in checkRun (output, returnCode) = self.outputRun(includes, body, cleanup, defaultArg, executor) File "/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/base.py", line 598, in outputRun raise ConfigureSetupError('Must give a default value for '+defaultOutputArg+' since executables cannot be run') ?Any ideas? Something related with --with-shared-libraries=0 \ --with-batch=1 \ ?The first I set because it was in the cray example, and the second because aprun (the mpiexec of Cray) is not available in the frontend. Thanks, Guilherme V. dr. ir. 
Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl MARIN news: MARIN deelt onderzoek tijdens R&D Seminar, 21 juni 2017 ________________________________ From: Matthew Knepley > Sent: Tuesday, June 13, 2017 2:34 PM To: Vaz, Guilherme Cc: PETSc Subject: Re: [petsc-users] PETSC on Cray Hazelhen On Tue, Jun 13, 2017 at 3:48 AM, Vaz, Guilherme > wrote: Dear all, I am trying to install PETSC on a Cray XC40 system (Hazelhen) with the usual Cray wrappers for Intel compilers, with some chosen external packages and MKL libraries. I read some threads in the mailing list about this, and I tried the petsc-3.7.5/config/examples/arch-cray-xt6-pkgs-opt.py configuration options. After trying this (please abstract from my own env vars), CONFOPTS="--prefix=$PETSC_INSTALL_DIR \ --with-cc=cc \ --with-cxx=CC \ --with-fc=ftn \ --with-clib-autodetect=0 \ --with-cxxlib-autodetect=0 \ --with-fortranlib-autodetect=0 \ --COPTFLAGS=-fast -mp \ --CXXOPTFLAGS=-fast -mp \ --FOPTFLAGS=-fast -mp \ --with-shared-libraries=0 \ --with-batch=1 \ --with-x=0 \ --with-mpe=0 \ --with-debugging=0 \ --download-superlu_dist=$SOURCE_DIR/$SUPERLU_SOURCE_FILE \ --with-blas-lapack-dir=$BLASDIR \ --download-parmetis=$SOURCE_DIR/$PARMETIS_SOURCE_FILE \ --download-metis=$SOURCE_DIR/$METIS_SOURCE_FILE \ --with-external-packages-dir=$INSTALL_DIR \ --with-ssl=0 " I get the following error: TESTING: checkFortranLinkingCxx from config.compilers(/zhome/academic/HLRS/pri/iprguvaz/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/config/BuildSystem/config/compilers.py:1097) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Fortran could not successfully link C++ objects ******************************************************************************* Does it ring a bell? Any tips? You turned off autodetection, so it will not find libstdc++. That either has to be put in LIBS, or I would recommend --with-cxx=0 since nothing you have there requires C++. Thanks, Matt Thanks, Guilherme V. dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl MARIN news: Maritime Safety seminar, September 12, Singapore -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- Stefano dr. ir. Guilherme Vaz | CFD Research Coordinator / Senior CFD Researcher | Research & Development MARIN | T +31 317 49 33 25 | M +31 621 13 11 97 | G.Vaz at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: Floating cities: our future is on the water -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image9fc2ea.PNG Type: image/png Size: 293 bytes Desc: image9fc2ea.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image5fa2a7.PNG Type: image/png Size: 331 bytes Desc: image5fa2a7.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: imagefc5855.PNG Type: image/png Size: 333 bytes Desc: imagefc5855.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagef440c9.PNG Type: image/png Size: 253 bytes Desc: imagef440c9.PNG URL: From franck.houssen at inria.fr Mon Jun 19 08:20:06 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Mon, 19 Jun 2017 15:20:06 +0200 (CEST) Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> Message-ID: <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> Hi, I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally by sequential (Aloc) dense matrix. Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) caused a malloc. Is this a known error / limitation ? (not supposed to work with dense matrix ?) This (pseudo code) works fine: MatCreateIS(..., A) MatCreateSeqAIJ(..., Aloc) MatISSetLocalMat(pcA, pcALoc) MatISGetMPIXAIJ(A, ...) // OK ! When I try to replace MatCreateSeqAIJ(..., Aloc) with MatCreateSeqDense(..., Aloc), it does no more work. Franck PS: running debian/testing with gcc-6.3 + petsc-3.7.6 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Jun 19 08:25:35 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 19 Jun 2017 15:25:35 +0200 Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> Message-ID: Can you send a minimal working example so that I can fix the code? Thanks Stefano Il 19 Giu 2017 15:20, "Franck Houssen" ha scritto: > Hi, > > I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally by > sequential (Aloc) dense matrix. > Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) > caused a malloc. Is this a known error / limitation ? (not supposed to work > with dense matrix ?) > > This (pseudo code) works fine: > MatCreateIS(..., A) > MatCreateSeqAIJ(..., Aloc) > MatISSetLocalMat(pcA, pcALoc) > MatISGetMPIXAIJ(A, ...) // OK ! > > When I try to replace MatCreateSeqAIJ(..., Aloc) with > MatCreateSeqDense(..., Aloc), it does no more work. > > Franck > > PS: running debian/testing with gcc-6.3 + petsc-3.7.6 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at man.poznan.pl Mon Jun 19 08:32:43 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Mon, 19 Jun 2017 15:32:43 +0200 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: References: <1868632011.20170616145710@man.poznan.pl> <686121773.20170619144209@man.poznan.pl> Message-ID: <1708894683.20170619153243@man.poznan.pl> Hi, Thank you for the answer and the article. I use SLURM (srun) for job submission by running 'srun script.py script_parameters' command inside batch script so this is SPMD model. What I noticed is that the problems I'm having now didn't happened before on CPU E5-2697 v3 nodes (28 cores - the best perormance I had was using 14MPIs/2OMP per node). Problems started to appear when I moved to KNLs. The funny thing is that switching OMP on/off (by setting OMP_NUM_THREADS to 1) doesn't help for all #NODES/# MPI/ #OMP combinations. For example, for 2 nodes, 16 MPIs, for OMP=1 and 2 the timings are huge and for 4 is OK. 
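
Since the runs mix MPI ranks and OpenMP threads on 64-core KNL nodes, it can help to verify where the ranks and threads actually land before comparing timings. A rough, Linux-specific sketch (sched_getcpu() is a glibc extension; this is not part of the application), to be launched with the same srun options and environment as the real job:

    /* Each OpenMP thread of each MPI rank reports the core it runs on. */
    #define _GNU_SOURCE
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int  rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);

      #pragma omp parallel
      {
        printf("host %s rank %d thread %d/%d on core %d\n",
               host, rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
      }
      MPI_Finalize();
      return 0;
    }

If several ranks or threads report the same core, the slowdown is most likely a placement problem rather than anything inside the solver.
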
Playing with affinitty didn't help so far. In other words at first glance results look completely random (I can provide more such examples). Best, Damian W li?cie datowanym 19 czerwca 2017 (14:50:25) napisano: On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan wrote: Hi, Regarding my previous post I looked into both logs of 64MPI/1 OMP vs. 64MPI/2 OMP. What attracted my attention is huge difference in MPI timings in the following places: Average time to get PetscTime(): 2.14577e-07 Average time for MPI_Barrier(): 3.9196e-05 Average time for zero size MPI_Send(): 5.45382e-06 vs. Average time to get PetscTime(): 4.05312e-07 Average time for MPI_Barrier(): 0.348399 Average time for zero size MPI_Send(): 0.029937 Isn't something wrong with PETSc library itself?... I don't think so. This is bad interaction of MPI and your threading mechanism. MPI_Barrier() and MPI_Send() are lower level than PETSc. What threading mode did you choose for MPI? This can have a performance impact. Also, the justifications for threading in this context are weak (or non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf Thanks, Matt Best, Damian Wiadomo?? przekazana Od: Damian Kaliszan Do: PETSc users list Data: 16 czerwca 2017, 14:57:10 Temat: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs ===8<===============Tre?? oryginalnej wiadomo?ci=============== Hi, For several days I've been trying to figure out what is going wrong with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. I downsized the problem to 1000x1000 A matrix and a single node and observed the following: I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), 23321 vs 23325 slurm task ids. Any help will be appreciated.... Best, Damian ===8<===========Koniec tre?ci oryginalnej wiadomo?ci=========== ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. Jana Paw?a II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- ---------- Forwarded message ---------- From: Damian Kaliszan To: PETSc users list Cc: Bcc: Date: Fri, 16 Jun 2017 14:57:10 +0200 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs Hi, For several days I've been trying to figure out what is going wrong with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. I downsized the problem to 1000x1000 A matrix and a single node and observed the following: I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), 23321 vs 23325 slurm task ids. Any help will be appreciated.... Best, Damian -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. 
Jana Paw?a II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- From knepley at gmail.com Mon Jun 19 08:39:53 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 19 Jun 2017 07:39:53 -0600 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <1708894683.20170619153243@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> <686121773.20170619144209@man.poznan.pl> <1708894683.20170619153243@man.poznan.pl> Message-ID: On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan wrote: > Hi, > Thank you for the answer and the article. > I use SLURM (srun) for job submission by running > 'srun script.py script_parameters' command inside batch script so this is > SPMD model. > What I noticed is that the problems I'm having now didn't happened > before on CPU E5-2697 v3 nodes (28 cores - the best perormance I had > was using 14MPIs/2OMP per node). Problems started to appear when I moved > to KNLs. > The funny thing is that switching OMP on/off (by setting > OMP_NUM_THREADS to 1) doesn't help for all #NODES/# MPI/ #OMP > combinations. For example, for 2 nodes, 16 MPIs, for OMP=1 and 2 the > timings are huge and for 4 is OK. > Lets narrow this down to MPI_Barrier(). What memory mode is KNL in? Did you require KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send() numbers for the different configurations. This measures just latency. We could also look at VecScale() to look at memory bandwidth achieved. Thanks, Matt > Playing with affinitty didn't help so far. > In other words at first glance results look completely random (I can > provide more such examples). > > > > Best, > Damian > > W li?cie datowanym 19 czerwca 2017 (14:50:25) napisano: > > > On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan > wrote: > Hi, > > Regarding my previous post > I looked into both logs of 64MPI/1 OMP vs. 64MPI/2 OMP. > > > What attracted my attention is huge difference in MPI timings in the > following places: > > Average time to get PetscTime(): 2.14577e-07 > Average time for MPI_Barrier(): 3.9196e-05 > Average time for zero size MPI_Send(): 5.45382e-06 > > vs. > > Average time to get PetscTime(): 4.05312e-07 > Average time for MPI_Barrier(): 0.348399 > Average time for zero size MPI_Send(): 0.029937 > > Isn't something wrong with PETSc library itself?... > > I don't think so. This is bad interaction of MPI and your threading > mechanism. MPI_Barrier() and MPI_Send() are lower > level than PETSc. What threading mode did you choose for MPI? This can > have a performance impact. > > Also, the justifications for threading in this context are weak (or > non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_ > Computing_without_Threads-Barry_Smith.pdf > > Thanks, > > Matt > > > Best, > Damian > > Wiadomo?? przekazana > Od: Damian Kaliszan > Do: PETSc users list > Data: 16 czerwca 2017, 14:57:10 > Temat: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > > ===8<===============Tre?? oryginalnej wiadomo?ci=============== > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. 
> > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > ===8<===========Koniec tre?ci oryginalnej wiadomo?ci=========== > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > > ---------- Forwarded message ---------- > From: Damian Kaliszan > To: PETSc users list > Cc: > Bcc: > Date: Fri, 16 Jun 2017 14:57:10 +0200 > Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. > > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at man.poznan.pl Mon Jun 19 08:56:32 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Mon, 19 Jun 2017 15:56:32 +0200 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: References: <1868632011.20170616145710@man.poznan.pl> <686121773.20170619144209@man.poznan.pl> <1708894683.20170619153243@man.poznan.pl> Message-ID: <176540165.20170619155632@man.poznan.pl> Hi, Please find attached 2 output files from 64MPI/1 OMP vs 64/2 OMPs examples, 23321 vs 23325 slurm task ids. Best, Damian W li?cie datowanym 19 czerwca 2017 (15:39:53) napisano: On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan wrote: Hi, Thank you for the answer and the article. I use SLURM (srun) for job submission by running 'srun script.py script_parameters' command inside batch script so this is SPMD model. What I noticed is that the problems I'm having now didn't happened before on CPU E5-2697 v3 nodes (28 cores - the best perormance I had was using 14MPIs/2OMP per node). Problems started to appear when I moved to KNLs. 
The funny thing is that switching OMP on/off (by setting OMP_NUM_THREADS to 1) doesn't help for all #NODES/# MPI/ #OMP combinations. For example, for 2 nodes, 16 MPIs, for OMP=1 and 2 the timings are huge and for 4 is OK. Lets narrow this down to MPI_Barrier(). What memory mode is KNL in? Did you require KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send() numbers for the different configurations. This measures just latency. We could also look at VecScale() to look at memory bandwidth achieved. Thanks, Matt Playing with affinitty didn't help so far. In other words at first glance results look completely random (I can provide more such examples). Best, Damian W li?cie datowanym 19 czerwca 2017 (14:50:25) napisano: On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan wrote: Hi, Regarding my previous post I looked into both logs of 64MPI/1 OMP vs. 64MPI/2 OMP. What attracted my attention is huge difference in MPI timings in the following places: Average time to get PetscTime(): 2.14577e-07 Average time for MPI_Barrier(): 3.9196e-05 Average time for zero size MPI_Send(): 5.45382e-06 vs. Average time to get PetscTime(): 4.05312e-07 Average time for MPI_Barrier(): 0.348399 Average time for zero size MPI_Send(): 0.029937 Isn't something wrong with PETSc library itself?... I don't think so. This is bad interaction of MPI and your threading mechanism. MPI_Barrier() and MPI_Send() are lower level than PETSc. What threading mode did you choose for MPI? This can have a performance impact. Also, the justifications for threading in this context are weak (or non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf Thanks, Matt Best, Damian Wiadomo?? przekazana Od: Damian Kaliszan Do: PETSc users list Data: 16 czerwca 2017, 14:57:10 Temat: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs ===8<===============Tre?? oryginalnej wiadomo?ci=============== Hi, For several days I've been trying to figure out what is going wrong with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. I downsized the problem to 1000x1000 A matrix and a single node and observed the following: I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), 23321 vs 23325 slurm task ids. Any help will be appreciated.... Best, Damian ===8<===========Koniec tre?ci oryginalnej wiadomo?ci=========== ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. Jana Paw?a II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- ---------- Forwarded message ---------- From: Damian Kaliszan To: PETSc users list Cc: Bcc: Date: Fri, 16 Jun 2017 14:57:10 +0200 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs Hi, For several days I've been trying to figure out what is going wrong with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. I downsized the problem to 1000x1000 A matrix and a single node and observed the following: I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), 23321 vs 23325 slurm task ids. Any help will be appreciated.... 
Best, Damian -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. Jana Paw?a II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. Jana Paw?a II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: slurm-23321.out Type: application/octet-stream Size: 39057 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slurm-23325.out Type: application/octet-stream Size: 39054 bytes Desc: not available URL: From knepley at gmail.com Mon Jun 19 09:46:22 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 19 Jun 2017 08:46:22 -0600 Subject: [petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <176540165.20170619155632@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> <686121773.20170619144209@man.poznan.pl> <1708894683.20170619153243@man.poznan.pl> <176540165.20170619155632@man.poznan.pl> Message-ID: On Mon, Jun 19, 2017 at 7:56 AM, Damian Kaliszan wrote: > Hi, > > Please find attached 2 output files from 64MPI/1 OMP vs 64/2 OMPs examples, > 23321 vs 23325 slurm task ids. > This is on 1 KNL? Then aren't you oversubscribing using 2 threads? This produces horrible performance, like you see in this log. Matt > Best, > Damian > > > W li?cie datowanym 19 czerwca 2017 (15:39:53) napisano: > > > On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan > wrote: > Hi, > Thank you for the answer and the article. > I use SLURM (srun) for job submission by running > 'srun script.py script_parameters' command inside batch script so this is > SPMD model. > What I noticed is that the problems I'm having now didn't happened > before on CPU E5-2697 v3 nodes (28 cores - the best perormance I had > was using 14MPIs/2OMP per node). Problems started to appear when I moved > to KNLs. > The funny thing is that switching OMP on/off (by setting > OMP_NUM_THREADS to 1) doesn't help for all #NODES/# MPI/ #OMP > combinations. For example, for 2 nodes, 16 MPIs, for OMP=1 and 2 the > timings are huge and for 4 is OK. > > Lets narrow this down to MPI_Barrier(). What memory mode is KNL in? Did > you require > KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send() numbers > for the different configurations. > This measures just latency. We could also look at VecScale() to look at > memory bandwidth achieved. > > Thanks, > > Matt > > Playing with affinitty didn't help so far. > In other words at first glance results look completely random (I can > provide more such examples). 
> > > > Best, > Damian > > W li?cie datowanym 19 czerwca 2017 (14:50:25) napisano: > > > On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan > wrote: > Hi, > > Regarding my previous post > I looked into both logs of 64MPI/1 OMP vs. 64MPI/2 OMP. > > > What attracted my attention is huge difference in MPI timings in the > following places: > > Average time to get PetscTime(): 2.14577e-07 > Average time for MPI_Barrier(): 3.9196e-05 > Average time for zero size MPI_Send(): 5.45382e-06 > > vs. > > Average time to get PetscTime(): 4.05312e-07 > Average time for MPI_Barrier(): 0.348399 > Average time for zero size MPI_Send(): 0.029937 > > Isn't something wrong with PETSc library itself?... > > I don't think so. This is bad interaction of MPI and your threading > mechanism. MPI_Barrier() and MPI_Send() are lower > level than PETSc. What threading mode did you choose for MPI? This can > have a performance impact. > > Also, the justifications for threading in this context are weak (or > non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_ > Computing_without_Threads-Barry_Smith.pdf > > Thanks, > > Matt > > > Best, > Damian > > Wiadomo?? przekazana > Od: Damian Kaliszan > Do: PETSc users list > Data: 16 czerwca 2017, 14:57:10 > Temat: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > > ===8<===============Tre?? oryginalnej wiadomo?ci=============== > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. > > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > ===8<===========Koniec tre?ci oryginalnej wiadomo?ci=========== > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > > ---------- Forwarded message ---------- > From: Damian Kaliszan > To: PETSc users list > Cc: > Bcc: > Date: Fri, 16 Jun 2017 14:57:10 +0200 > Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP > configuration on KNLs > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when > trying to run on Intel's KNL 7210/7230. > > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP > thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. 
Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Paw?a II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Mon Jun 19 11:17:20 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Mon, 19 Jun 2017 18:17:20 +0200 (CEST) Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> Message-ID: <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> The problem was difficult to reduce as reducing make things disappear... Luckily, I believe I got it (or at least, it looks "like" the one I "really" have...). Seems that for square matrix, it works fine for csr and dense matrix. But, If I am not mistaken, it does not for dense rectangular matrix (still OK for csr). matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a 2x2 local matrix in the global matrix. matISCSRDenseRect.cpp: 2 procs, global 2 x3 matrix, each proc adds a 1 x2 local vector in the global matrix. reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 Franck >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 ./matISCSRDenseSquare.exe dense csr csr Mat Object: 2 MPI processes type: is Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) (1, 0.) row 1: (0, 0.) (1, 1.) Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) (1, 0.) row 1: (0, 0.) (1, 1.) dense dense Mat Object: 2 MPI processes type: is Mat Object: 1 MPI processes type: seqdense 1.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 1.0000000000000000e+00 Mat Object: 1 MPI processes type: seqdense 1.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 1.0000000000000000e+00 >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 ./matISCSRDenseRect.exe dense csr csr Mat Object: 2 MPI processes type: is Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) (1, 0.) Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) (1, 0.) 
dense dense [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: New nonzero at (0,1) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named yoda by fghoussen Mon Jun 19 18:08:58 2017 New nonzero at (0,1) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-suitesparse=yes [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named yoda by fghoussen Mon Jun 19 18:08:58 2017 [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-suitesparse=yes [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c Mat Object: 2 MPI processes type: is Mat Object: 1 MPI processes type: seqdense 1.0000000000000000e+00 0.0000000000000000e+00 Mat Object: 1 MPI processes type: seqdense 1.0000000000000000e+00 0.0000000000000000e+00 >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp 3c3 < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc -lm; mpirun -n 2 matISCSRDenseSquare.exe --- > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; mpirun -n 2 matISCSRDenseRect.exe 24c24 < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &rmap); --- > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap); 29c29 < 
MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, rmap, cmap, &A); --- > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, rmap, cmap, &A); 32,33c32,33 < if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 2, 2, NULL, &Aloc);} --- > if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 1, 2, NULL, &Aloc);} 35,36c35,36 < PetscScalar localVal[4] = {1., 0., 0., 1.}; < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); // Add local 2x2 matrix --- > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, ADD_VALUES); // Add local row ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "PETSc users list" > Envoy?: Lundi 19 Juin 2017 15:25:35 > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > Can you send a minimal working example so that I can fix the code? > Thanks > Stefano > Il 19 Giu 2017 15:20, "Franck Houssen" < franck.houssen at inria.fr > ha > scritto: > > Hi, > > > I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally by > > sequential (Aloc) dense matrix. > > > Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) > > caused a malloc. Is this a known error / limitation ? (not supposed to work > > with dense matrix ?) > > > This (pseudo code) works fine: > > > MatCreateIS(..., A) > > > MatCreateSeqAIJ(..., Aloc) > > > MatISSetLocalMat(pcA, pcALoc) > > > MatISGetMPIXAIJ(A, ...) // OK ! > > > When I try to replace MatCreateSeqAIJ(..., Aloc) with > > MatCreateSeqDense(..., > > Aloc), it does no more work. > > > Franck > > > PS: running debian/testing with gcc-6.3 + petsc-3.7.6 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseRect.cpp Type: text/x-c++src Size: 1882 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseSquare.cpp Type: text/x-c++src Size: 1876 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon Jun 19 12:29:28 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 19 Jun 2017 12:29:28 -0500 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <1868632011.20170616145710@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> Message-ID: <874FEDA4-E6E8-43DF-808C-FDE4E07912B5@mcs.anl.gov> 1000 by 1000 (sparse presumably) is way to small for scaling studies. With sparse matrices you want at least 10-20,000 unknowns per MPI process. What is the Number of steps in the table? Is this Krylov iterations? If so that is also problematic because you are comparing two things: parallel work but much more work because many more iterations. Barry > On Jun 16, 2017, at 7:57 AM, Damian Kaliszan wrote: > > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. 
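
Barry's question about whether "Number of steps" means Krylov iterations matters for the table: if two runs need different numbers of GMRES iterations, their wall-clock times are not directly comparable. A small sketch of the C calls that report this after a solve (the python application would use the petsc4py equivalents; error checking omitted):

    #include <petscksp.h>

    /* Report iteration count and convergence reason after KSPSolve(), so
       timings can be compared per iteration. */
    static void ReportKSP(KSP ksp)
    {
      PetscInt           its;
      PetscReal          rnorm;
      KSPConvergedReason reason;

      KSPGetIterationNumber(ksp, &its);
      KSPGetResidualNorm(ksp, &rnorm);
      KSPGetConvergedReason(ksp, &reason);
      PetscPrintf(PETSC_COMM_WORLD, "iterations %D  residual norm %g  reason %d\n",
                  its, (double)rnorm, (int)reason);
    }

Running with -ksp_converged_reason and -ksp_monitor gives the same information from the options database.
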
> > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > From balay at mcs.anl.gov Mon Jun 19 12:53:36 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 19 Jun 2017 12:53:36 -0500 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: <1868632011.20170616145710@man.poznan.pl> References: <1868632011.20170616145710@man.poznan.pl> Message-ID: MPI=16 OMP=1 time=45.62. This timing [without OpenMP] looks out of place. Perhaps something else [wierd MPI behavior?] is going on here.. Satish On Fri, 16 Jun 2017, Damian Kaliszan wrote: > Hi, > > For several days I've been trying to figure out what is going wrong > with my python app timings solving Ax=b with KSP (GMRES) solver when trying to run on Intel's KNL 7210/7230. > > I downsized the problem to 1000x1000 A matrix and a single node and > observed the following: > > > I'm attaching 2 extreme timings where configurations differ only by 1 OMP thread (64MPI/1 OMP vs 64/2 OMPs), > 23321 vs 23325 slurm task ids. > > Any help will be appreciated.... > > Best, > Damian > From damian at man.poznan.pl Mon Jun 19 13:31:03 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Mon, 19 Jun 2017 20:31:03 +0200 Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs In-Reply-To: References: <1868632011.20170616145710@man.poznan.pl> Message-ID: Yes, very strange. I tested it with Intel MPI and ParastationMPI, both available on the cluster. Output log I sent may show something interesting (?) Best, Damian W dniu 19 cze 2017, 19:53, o 19:53, u?ytkownik Satish Balay napisa?: >MPI=16 OMP=1 time=45.62. > >This timing [without OpenMP] looks out of place. Perhaps something >else [wierd MPI behavior?] is going on here.. > >Satish > >On Fri, 16 Jun 2017, Damian Kaliszan wrote: > >> Hi, >> >> For several days I've been trying to figure out what is going >wrong >> with my python app timings solving Ax=b with KSP (GMRES) solver when >trying to run on Intel's KNL 7210/7230. >> >> I downsized the problem to 1000x1000 A matrix and a single node >and >> observed the following: >> >> >> I'm attaching 2 extreme timings where configurations differ only by 1 >OMP thread (64MPI/1 OMP vs 64/2 OMPs), >> 23321 vs 23325 slurm task ids. >> >> Any help will be appreciated.... >> >> Best, >> Damian >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Jun 19 15:46:07 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 19 Jun 2017 22:46:07 +0200 Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> Message-ID: Franck, Thanks. I'll? get back soon with a fix. Stefano Il 19 Giu 2017 18:17, "Franck Houssen" ha scritto: > The problem was difficult to reduce as reducing make things disappear... > Luckily, I believe I got it (or at least, it looks "like" the one I > "really" have...). > > Seems that for square matrix, it works fine for csr and dense matrix. 
But, > If I am not mistaken, it does not for dense rectangular matrix (still OK > for csr). > > matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a 2x2 > local matrix in the global matrix. > matISCSRDenseRect.cpp: 2 procs, global *2*x3 matrix, each proc adds a *1*x2 > local *vector* in the global matrix. > > reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 > > Franck > > >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 > ./matISCSRDenseSquare.exe dense > csr > csr > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > dense > dense > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > 0.0000000000000000e+00 1.0000000000000000e+00 > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > 0.0000000000000000e+00 1.0000000000000000e+00 > > >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 > ./matISCSRDenseRect.exe dense > csr > csr > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > dense > dense > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: New nonzero at (0,1) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named > yoda by fghoussen Mon Jun 19 18:08:58 2017 > New nonzero at (0,1) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/ > Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 > --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-suitesparse=yes > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named > yoda by fghoussen Mon Jun 19 18:08:58 2017 > [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/ > Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 > --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-suitesparse=yes > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > > >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp > 3c3 > < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc > -lm; mpirun -n 2 matISCSRDenseSquare.exe > --- > > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; > mpirun -n 2 matISCSRDenseRect.exe > 24c24 > < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > PETSC_COPY_VALUES, &rmap); > --- > > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > PETSC_COPY_VALUES, &rmap); > 29c29 > < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, > rmap, cmap, &A); > --- > > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, > rmap, cmap, &A); > 32,33c32,33 > < if (matType == "csr") {cout << matType << endl; > MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} > < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 2, > 2, NULL, &Aloc);} > --- > > if (matType == "csr") {cout << matType << endl; > MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 1, > 2, NULL, &Aloc);} > 35,36c35,36 > < PetscScalar localVal[4] = {1., 0., 0., 1.}; > < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); // > Add local 2x2 matrix > --- > > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; > > 
MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, > ADD_VALUES); // Add local row > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Franck Houssen" > *Cc: *"PETSc users list" > *Envoy?: *Lundi 19 Juin 2017 15:25:35 > *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? > > Can you send a minimal working example so that I can fix the code? > > Thanks > Stefano > > Il 19 Giu 2017 15:20, "Franck Houssen" ha > scritto: > >> Hi, >> >> I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally >> by sequential (Aloc) dense matrix. >> Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) >> caused a malloc. Is this a known error / limitation ? (not supposed to work >> with dense matrix ?) >> >> This (pseudo code) works fine: >> MatCreateIS(..., A) >> MatCreateSeqAIJ(..., Aloc) >> MatISSetLocalMat(pcA, pcALoc) >> MatISGetMPIXAIJ(A, ...) // OK ! >> >> When I try to replace MatCreateSeqAIJ(..., Aloc) with >> MatCreateSeqDense(..., Aloc), it does no more work. >> >> Franck >> >> PS: running debian/testing with gcc-6.3 + petsc-3.7.6 >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Jun 19 17:23:24 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 20 Jun 2017 00:23:24 +0200 Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> Message-ID: It should be fixed now in maint and master https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d9916f61e0c445 Thanks for reporting the problem, Stefano > On Jun 19, 2017, at 10:46 PM, Stefano Zampini wrote: > > Franck, > > Thanks. I'll? get back soon with a fix. > > Stefano > > > Il 19 Giu 2017 18:17, "Franck Houssen" > ha scritto: > The problem was difficult to reduce as reducing make things disappear... Luckily, I believe I got it (or at least, it looks "like" the one I "really" have...). > > Seems that for square matrix, it works fine for csr and dense matrix. But, If I am not mistaken, it does not for dense rectangular matrix (still OK for csr). > > matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a 2x2 local matrix in the global matrix. > matISCSRDenseRect.cpp: 2 procs, global 2x3 matrix, each proc adds a 1x2 local vector in the global matrix. > > reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 > > Franck > > >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 ./matISCSRDenseSquare.exe dense > csr > csr > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > dense > dense > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > 0.0000000000000000e+00 1.0000000000000000e+00 > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > 0.0000000000000000e+00 1.0000000000000000e+00 > > >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 ./matISCSRDenseRect.exe dense > csr > csr > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) 
(1, 0.) > dense > dense > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: New nonzero at (0,1) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named yoda by fghoussen Mon Jun 19 18:08:58 2017 > New nonzero at (0,1) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-suitesparse=yes > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named yoda by fghoussen Mon Jun 19 18:08:58 2017 > [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-suitesparse=yes > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > Mat Object: 2 MPI processes > type: is > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > Mat Object: 1 MPI processes > type: seqdense > 1.0000000000000000e+00 0.0000000000000000e+00 > > >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp > 3c3 > < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc -lm; mpirun -n 2 matISCSRDenseSquare.exe > --- > > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; mpirun -n 2 matISCSRDenseRect.exe > 24c24 > < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &rmap); > --- > > 
ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap); > 29c29 > < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, rmap, cmap, &A); > --- > > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, rmap, cmap, &A); > 32,33c32,33 > < if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} > < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 2, 2, NULL, &Aloc);} > --- > > if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 1, 2, NULL, &Aloc);} > 35,36c35,36 > < PetscScalar localVal[4] = {1., 0., 0., 1.}; > < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); // Add local 2x2 matrix > --- > > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; > > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, ADD_VALUES); // Add local row > > De: "Stefano Zampini" > > ?: "Franck Houssen" > > Cc: "PETSc users list" > > Envoy?: Lundi 19 Juin 2017 15:25:35 > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > > Can you send a minimal working example so that I can fix the code? > > Thanks > Stefano > > Il 19 Giu 2017 15:20, "Franck Houssen" > ha scritto: > Hi, > > I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally by sequential (Aloc) dense matrix. > Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) caused a malloc. Is this a known error / limitation ? (not supposed to work with dense matrix ?) > > This (pseudo code) works fine: > MatCreateIS(..., A) > MatCreateSeqAIJ(..., Aloc) > MatISSetLocalMat(pcA, pcALoc) > MatISGetMPIXAIJ(A, ...) // OK ! > > When I try to replace MatCreateSeqAIJ(..., Aloc) with MatCreateSeqDense(..., Aloc), it does no more work. > > Franck > > PS: running debian/testing with gcc-6.3 + petsc-3.7.6 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue Jun 20 05:58:25 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 20 Jun 2017 12:58:25 +0200 (CEST) Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> Message-ID: <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> As I said, it is often difficult to reduce the "real" problem: it turns out that your fix solves the "matISCSRDenseSquare/Rect.cpp" dummy example I sent, but, it's still not working in "my real" situation. 
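For reference (the .cpp attachments are scrubbed from this archive), the rectangular test that the diffs in this thread modify has roughly the shape sketched below. This is a reconstruction from the quoted fragments, not the attachment verbatim: the build line, the error handling and the way the local 1x2 block is filled (with local column indices) are assumptions, and it keeps the thread's convention of passing an int rank where a PetscInt is expected, which assumes the default 32-bit PetscInt.

// matISCSRDenseRect-like sketch: 2 ranks, global 2x3 MATIS, each rank owning one
// 1x2 local block stored either as seqaij ("csr") or seqdense ("dense"); the
// MATIS is then converted with MatISGetMPIXAIJ (the call that fails for dense).
// Assumed build/run: g++ sketch.cpp -lpetsc -lm; mpirun -n 2 ./a.out dense
#include <petsc.h>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv) {
  string matType = (argc > 1) ? argv[1] : "dense";
  PetscInitialize(&argc, &argv, NULL, NULL);

  int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size);
  int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (size != 2) {cout << "error: run with mpirun -n 2" << endl; return 1;}

  // Global column indices touched by this rank: rank 0 -> {0,1}, rank 1 -> {1,2}
  // (column 1 is shared between the two ranks).
  PetscInt localIdx[2] = {0, 0};
  if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;}
  else           {localIdx[0] = 1; localIdx[1] = 2;}

  // Row map: each rank owns global row "rank"; column map: its two columns.
  ISLocalToGlobalMapping rmap, cmap;
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap);
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &cmap);

  Mat A; // global 2x3 unassembled matrix
  MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, rmap, cmap, &A);

  Mat Aloc; // local 1x2 block, CSR or dense
  if (matType == "csr") MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);
  else                  MatCreateSeqDense(PETSC_COMM_SELF, 1, 2, NULL, &Aloc);

  // Fill the local block using local indices; the mappings above translate
  // them to global indices when the MATIS is used.
  PetscInt    row = 0, cols[2] = {0, 1};
  PetscScalar vals[2] = {1., 0.};
  MatSetValues(Aloc, 1, &row, 2, cols, vals, ADD_VALUES);
  MatAssemblyBegin(Aloc, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(Aloc, MAT_FINAL_ASSEMBLY);

  MatISSetLocalMat(A, Aloc);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatView(A, PETSC_VIEWER_STDOUT_WORLD);

  Mat B; // assembled MPIAIJ copy: the step that breaks for seqdense local blocks
  MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);
  MatView(B, PETSC_VIEWER_STDOUT_WORLD);

  MatDestroy(&B); MatDestroy(&Aloc); MatDestroy(&A);
  PetscFinalize();
  return 0;
}

The square variant differs only in the sizes (3x3 global, 2x2 local block) and in using the same two-entry mapping for both rows and columns.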
I changed a bit the "matISCSRDenseSquare/Rect.cpp" dummy example (see git diff below - I just changed the point that overlaps) : the dummy example is still failing "mpirun -n 2 ./matISCSRDenseSquare.exe csr" and "mpirun -n 2 ./matISCSRDenseSquare.exe dense" : OK but "mpirun -n 2 ./matISCSRDenseRect.exe csr" and "mpirun -n 2 ./matISCSRDenseRect.exe dense": KO with error "Argument out of range - New nonzero at (0,2) caused a malloc" I would say, the problem (I am concerned with the "real" case) is around lines 360-380 of /src/mat/impls/is/matis.c (not around 181 : this fixes a valid problem, but, this problem is another one) Franck --- a/matISCSRDenseRect.cpp +++ b/matISCSRDenseRect.cpp @@ -18,7 +18,7 @@ int main(int argc,char **argv) { int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); PetscInt localIdx[2] = {0, 0}; - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} else {localIdx[0] = 1; localIdx[1] = 2;} ISLocalToGlobalMapping rmap; ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap); diff --git a/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp b/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp index 4bc6190..4a6ea41 100644 --- a/matISCSRDenseSquare.cpp +++ b/matISCSRDenseSquare.cpp @@ -18,7 +18,7 @@ int main(int argc,char **argv) { int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); PetscInt localIdx[2] = {0, 0}; - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} else {localIdx[0] = 1; localIdx[1] = 2;} ISLocalToGlobalMapping rmap; ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &rmap); ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "PETSc users list" > Envoy?: Mardi 20 Juin 2017 00:23:24 > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > It should be fixed now in maint and master > https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d9916f61e0c445 > Thanks for reporting the problem, > Stefano > > On Jun 19, 2017, at 10:46 PM, Stefano Zampini < stefano.zampini at gmail.com > > > wrote: > > > Franck, > > > Thanks. I'll? get back soon with a fix. > > > Stefano > > > Il 19 Giu 2017 18:17, "Franck Houssen" < franck.houssen at inria.fr > ha > > scritto: > > > > The problem was difficult to reduce as reducing make things disappear... > > > Luckily, I believe I got it (or at least, it looks "like" the one I > > > "really" > > > have...). > > > > > > Seems that for square matrix, it works fine for csr and dense matrix. > > > But, > > > If > > > I am not mistaken, it does not for dense rectangular matrix (still OK for > > > csr). > > > > > > matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a 2x2 > > > local matrix in the global matrix. > > > > > > matISCSRDenseRect.cpp: 2 procs, global 2 x3 matrix, each proc adds a 1 x2 > > > local vector in the global matrix. > > > > > > reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 > > > > > > Franck > > > > > > >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 > > > >> ./matISCSRDenseSquare.exe dense > > > > > > csr > > > > > > csr > > > > > > Mat Object: 2 MPI processes > > > > > > type: is > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqaij > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > row 1: (0, 0.) (1, 1.) > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqaij > > > > > > row 0: (0, 1.) (1, 0.) 
> > > > > > row 1: (0, 0.) (1, 1.) > > > > > > dense > > > > > > dense > > > > > > Mat Object: 2 MPI processes > > > > > > type: is > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqdense > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > 0.0000000000000000e+00 1.0000000000000000e+00 > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqdense > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > 0.0000000000000000e+00 1.0000000000000000e+00 > > > > > > >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 > > > >> ./matISCSRDenseRect.exe dense > > > > > > csr > > > > > > csr > > > > > > Mat Object: 2 MPI processes > > > > > > type: is > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqaij > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqaij > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > dense > > > > > > dense > > > > > > [0]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > > > > [0]PETSC ERROR: Argument out of range > > > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > > > > [1]PETSC ERROR: Argument out of range > > > > > > [1]PETSC ERROR: New nonzero at (0,1) caused a malloc > > > > > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > > > off > > > this check > > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > for > > > trouble shooting. > > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > > > > > > [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named > > > yoda > > > by fghoussen Mon Jun 19 18:08:58 2017 > > > > > > New nonzero at (0,1) caused a malloc > > > > > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > > > off > > > this check > > > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > for > > > trouble shooting. 
> > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > > > > > > [1]PETSC ERROR: Configure options > > > --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 > > > --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes > > > --download-scalapack=yes --download-superlu=yes > > > --download-suitesparse=yes > > > > > > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > > > > > > [1]PETSC ERROR: #2 MatSetValues() line 1190 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named > > > yoda > > > by fghoussen Mon Jun 19 18:08:58 2017 > > > > > > [0]PETSC ERROR: Configure options > > > --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local --with-mpi=1 > > > --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes > > > --download-scalapack=yes --download-superlu=yes > > > --download-suitesparse=yes > > > > > > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > > > > > > [0]PETSC ERROR: #2 MatSetValues() line 1190 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > Mat Object: 2 MPI processes > > > > > > type: is > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqdense > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > Mat Object: 1 MPI processes > > > > > > type: seqdense > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp > > > > > > 3c3 > > > > > > < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc > > > -lm; > > > mpirun -n 2 matISCSRDenseSquare.exe > > > > > > --- > > > > > > > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; > > > > mpirun -n 2 matISCSRDenseRect.exe > > > > > > 24c24 > > > > > > < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > > > PETSC_COPY_VALUES, &rmap); > > > > > > --- > > > > > > > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > > > > PETSC_COPY_VALUES, &rmap); > > > > > > 29c29 > > > > > > < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, > > > rmap, > > > cmap, &A); > > > > > > --- > > > > > > > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, > > > > rmap, > > > > cmap, &A); > > > > > > 32,33c32,33 > > > > > > < if (matType == "csr") {cout << matType << endl; > > > MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} > > > > > > < else {cout << matType << endl; 
MatCreateSeqDense(PETSC_COMM_SELF, 2, 2, > > > NULL, &Aloc);} > > > > > > --- > > > > > > > if (matType == "csr") {cout << matType << endl; > > > > MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > > > > > > > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 1, 2, > > > > NULL, &Aloc);} > > > > > > 35,36c35,36 > > > > > > < PetscScalar localVal[4] = {1., 0., 0., 1.}; > > > > > > < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); // > > > Add > > > local 2x2 matrix > > > > > > --- > > > > > > > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; > > > > > > > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, ADD_VALUES); > > > > // > > > > Add local row > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > > > > > Cc: "PETSc users list" < petsc-users at mcs.anl.gov > > > > > > > > > > > Envoy?: Lundi 19 Juin 2017 15:25:35 > > > > > > > > > > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > > > > > > > > > > Can you send a minimal working example so that I can fix the code? > > > > > > > > > > Thanks > > > > > > > > > > Stefano > > > > > > > > > > Il 19 Giu 2017 15:20, "Franck Houssen" < franck.houssen at inria.fr > ha > > > > scritto: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed > > > > > locally > > > > > by > > > > > sequential (Aloc) dense matrix. > > > > > > > > > > > > > > > Seems this ends up with this error: [0]PETSC ERROR: New nonzero at > > > > > (0,1) > > > > > caused a malloc. Is this a known error / limitation ? (not supposed > > > > > to > > > > > work > > > > > with dense matrix ?) > > > > > > > > > > > > > > > This (pseudo code) works fine: > > > > > > > > > > > > > > > MatCreateIS(..., A) > > > > > > > > > > > > > > > MatCreateSeqAIJ(..., Aloc) > > > > > > > > > > > > > > > MatISSetLocalMat(pcA, pcALoc) > > > > > > > > > > > > > > > MatISGetMPIXAIJ(A, ...) // OK ! > > > > > > > > > > > > > > > When I try to replace MatCreateSeqAIJ(..., Aloc) with > > > > > MatCreateSeqDense(..., > > > > > Aloc), it does no more work. > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > PS: running debian/testing with gcc-6.3 + petsc-3.7.6 > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseRect.cpp Type: text/x-c++src Size: 1882 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseSquare.cpp Type: text/x-c++src Size: 1876 bytes Desc: not available URL: From stefano.zampini at gmail.com Tue Jun 20 06:23:27 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 20 Jun 2017 13:23:27 +0200 Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> Message-ID: Franck I tested your new example with master and it works. However, It doesn't work with maint. I fixed the rectangular case a while ago in master and forgot to add the change to maint too. Sorry for that. 
This should fix the problem with maint: https://bitbucket.org/petsc/petsc/commits/0ea065fb06d751599c4157d36bfe1a1b41348e0b Test your real case and let me know. If you could, it would be good to test against master too. Thanks, Stefano 2017-06-20 12:58 GMT+02:00 Franck Houssen : > As I said, it is often difficult to reduce the "real" problem: it turns > out that your fix solves the "matISCSRDenseSquare/Rect.cpp" dummy example I > sent, but, it's still not working in "my real" situation. > > I changed a bit the "matISCSRDenseSquare/Rect.cpp" dummy example (see git > diff below - I just changed the point that overlaps) : the dummy example is > still failing > > "mpirun -n 2 ./matISCSRDenseSquare.exe csr" and "mpirun -n 2 > ./matISCSRDenseSquare.exe dense" : OK > but > "mpirun -n 2 ./matISCSRDenseRect.exe csr" and "mpirun -n 2 > ./matISCSRDenseRect.exe dense": KO with error "Argument out of range - New > nonzero at (0,2) caused a malloc" > > I would say, the problem (I am concerned with the "real" case) is around > lines 360-380 of /src/mat/impls/is/matis.c (not around 181 : this fixes a > valid problem, but, this problem is another one) > > Franck > > --- a/matISCSRDenseRect.cpp > +++ b/matISCSRDenseRect.cpp > @@ -18,7 +18,7 @@ int main(int argc,char **argv) { > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > PetscInt localIdx[2] = {0, 0}; > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} > else {localIdx[0] = 1; localIdx[1] = 2;} > ISLocalToGlobalMapping rmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > PETSC_COPY_VALUES, &rmap); > diff --git a/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp > b/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp > index 4bc6190..4a6ea41 100644 > --- a/matISCSRDenseSquare.cpp > +++ b/matISCSRDenseSquare.cpp > @@ -18,7 +18,7 @@ int main(int argc,char **argv) { > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > PetscInt localIdx[2] = {0, 0}; > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} > else {localIdx[0] = 1; localIdx[1] = 2;} > ISLocalToGlobalMapping rmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > PETSC_COPY_VALUES, &rmap); > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Franck Houssen" > *Cc: *"PETSc users list" > *Envoy?: *Mardi 20 Juin 2017 00:23:24 > > *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? > > It should be fixed now in maint and master > > https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d991 > 6f61e0c445 > > Thanks for reporting the problem, > Stefano > > On Jun 19, 2017, at 10:46 PM, Stefano Zampini > wrote: > > Franck, > > Thanks. I'll? get back soon with a fix. > > Stefano > > > Il 19 Giu 2017 18:17, "Franck Houssen" ha > scritto: > >> The problem was difficult to reduce as reducing make things disappear... >> Luckily, I believe I got it (or at least, it looks "like" the one I >> "really" have...). >> >> Seems that for square matrix, it works fine for csr and dense matrix. >> But, If I am not mistaken, it does not for dense rectangular matrix (still >> OK for csr). >> >> matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a 2x2 >> local matrix in the global matrix. >> matISCSRDenseRect.cpp: 2 procs, global *2*x3 matrix, each proc adds a *1*x2 >> local *vector* in the global matrix. 
>> >> reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 >> >> Franck >> >> >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 >> ./matISCSRDenseSquare.exe dense >> csr >> csr >> Mat Object: 2 MPI processes >> type: is >> Mat Object: 1 MPI processes >> type: seqaij >> row 0: (0, 1.) (1, 0.) >> row 1: (0, 0.) (1, 1.) >> Mat Object: 1 MPI processes >> type: seqaij >> row 0: (0, 1.) (1, 0.) >> row 1: (0, 0.) (1, 1.) >> dense >> dense >> Mat Object: 2 MPI processes >> type: is >> Mat Object: 1 MPI processes >> type: seqdense >> 1.0000000000000000e+00 0.0000000000000000e+00 >> 0.0000000000000000e+00 1.0000000000000000e+00 >> Mat Object: 1 MPI processes >> type: seqdense >> 1.0000000000000000e+00 0.0000000000000000e+00 >> 0.0000000000000000e+00 1.0000000000000000e+00 >> >> >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 >> ./matISCSRDenseRect.exe dense >> csr >> csr >> Mat Object: 2 MPI processes >> type: is >> Mat Object: 1 MPI processes >> type: seqaij >> row 0: (0, 1.) (1, 0.) >> Mat Object: 1 MPI processes >> type: seqaij >> row 0: (0, 1.) (1, 0.) >> dense >> dense >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [1]PETSC ERROR: Argument out of range >> [1]PETSC ERROR: New nonzero at (0,1) caused a malloc >> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >> off this check >> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >> [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >> yoda by fghoussen Mon Jun 19 18:08:58 2017 >> New nonzero at (0,1) caused a malloc >> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >> off this check >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >> [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >> --download-superlu=yes --download-suitesparse=yes >> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >> impls/aij/mpi/mpiaij.c >> [1]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >> yoda by fghoussen Mon Jun 19 18:08:58 2017 >> [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >> --download-superlu=yes --download-suitesparse=yes >> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >> impls/aij/mpi/mpiaij.c >> [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> Mat Object: 2 MPI processes >> type: is >> Mat Object: 1 MPI processes >> type: seqdense >> 1.0000000000000000e+00 0.0000000000000000e+00 >> Mat Object: 1 MPI processes >> type: seqdense >> 1.0000000000000000e+00 0.0000000000000000e+00 >> >> >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp >> 3c3 >> < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc >> -lm; mpirun -n 2 matISCSRDenseSquare.exe >> --- >> > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; >> mpirun -n 2 matISCSRDenseRect.exe >> 24c24 >> < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >> PETSC_COPY_VALUES, &rmap); >> --- >> > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >> PETSC_COPY_VALUES, &rmap); >> 29c29 >> < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, >> rmap, cmap, &A); >> --- >> > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, >> rmap, cmap, &A); >> 32,33c32,33 >> < if (matType == "csr") {cout << matType << endl; >> MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} >> < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 2, >> 2, NULL, &Aloc);} >> --- >> > if (matType == "csr") {cout << matType << endl; >> MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} >> > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, 1, >> 2, NULL, &Aloc);} >> 35,36c35,36 >> < PetscScalar localVal[4] = {1., 0., 0., 1.}; >> < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); >> // Add local 2x2 matrix >> --- >> 
> PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; >> > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, >> ADD_VALUES); // Add local row >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Franck Houssen" >> *Cc: *"PETSc users list" >> *Envoy?: *Lundi 19 Juin 2017 15:25:35 >> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >> >> Can you send a minimal working example so that I can fix the code? >> >> Thanks >> Stefano >> >> Il 19 Giu 2017 15:20, "Franck Houssen" ha >> scritto: >> >>> Hi, >>> >>> I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally >>> by sequential (Aloc) dense matrix. >>> Seems this ends up with this error: [0]PETSC ERROR: New nonzero at (0,1) >>> caused a malloc. Is this a known error / limitation ? (not supposed to work >>> with dense matrix ?) >>> >>> This (pseudo code) works fine: >>> MatCreateIS(..., A) >>> MatCreateSeqAIJ(..., Aloc) >>> MatISSetLocalMat(pcA, pcALoc) >>> MatISGetMPIXAIJ(A, ...) // OK ! >>> >>> When I try to replace MatCreateSeqAIJ(..., Aloc) with >>> MatCreateSeqDense(..., Aloc), it does no more work. >>> >>> Franck >>> >>> PS: running debian/testing with gcc-6.3 + petsc-3.7.6 >>> >> >> > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue Jun 20 09:30:58 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 20 Jun 2017 16:30:58 +0200 (CEST) Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> Message-ID: <720133125.8242944.1497969058541.JavaMail.zimbra@inria.fr> OK. I moved from petsc-3.7.6 to development version (git clone bitbucket). The very first version of the dummy example (= matISCSRDenseSquare/Rect.cpp) works with this fix: https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d9916f61e0c445 . The second version of the dummy example works too with the fix if one moves to petsc bitbucket (master). But, the code still breaks in "my" initial "real" case (using now master from petsc bitbucket)... With another error "SEGV under MatISSetMPIXAIJPreallocation_Private" (note: this is not a "new non zero" message, this seems to be another problem). Here is a third version of the dummy example that breaks with "SEGV under MatISSetMPIXAIJPreallocation_Private" (using master from petsc bitbucket) : the idea is the same but with N procs (not only 2) and a rectangular matrix of size N*(N+1). With 2 procs, it works (all cases). 
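Concretely, the git diff quoted further below boils down to the construction sketched here for the N-process rectangular case (same sketch-level assumptions as the 2-rank reconstruction above, not the scrubbed attachment itself): each of the "size" ranks owns global row "rank" of a size x (size+1) MATIS and touches global columns {rank, rank+1}, so neighbouring ranks share one column.

  int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size);
  int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  PetscInt localIdx[2] = {rank, rank + 1}; // column rank+1 is shared with the next rank
  ISLocalToGlobalMapping rmap, cmap;
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap);
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &cmap);

  Mat A; // global size x (size+1) MATIS
  MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size, size + 1, rmap, cmap, &A);
  // The local 1x2 block (seqaij or seqdense), MatISSetLocalMat and the
  // MatISGetMPIXAIJ conversion then follow exactly as in the 2-rank sketch.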
With 4 procs, new problems occur: >> mpirun -n 4 ./matISCSRDenseSquare.exe csr; mpirun -n 4 ./matISCSRDenseSquare.exe dense => OK >> mpirun -n 4 ./matISCSRDenseRect.exe csr => OK but >> mpirun -n 4 ./matISCSRDenseRect.exe dense; dense dense dense dense [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [3]PETSC ERROR: likely location of problem given in stack below [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [3]PETSC ERROR: INSTEAD the line number of the start of the function [3]PETSC ERROR: is given. [3]PETSC ERROR: [3] MatISSetMPIXAIJPreallocation_Private line 1055 /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c [3]PETSC ERROR: [3] MatISGetMPIXAIJ_IS line 1230 /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c [3]PETSC ERROR: [3] MatISGetMPIXAIJ line 1384 /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- I tried to go through this with caution... But I have to say I feel messy. Can you reproduce this problem at your side ? Franck >> git diff . --- a/matISCSRDenseRect.cpp +++ b/matISCSRDenseRect.cpp @@ -14,19 +14,17 @@ int main(int argc,char **argv) { if (matType != "csr" && matType != "dense") {cout << "error: need arg = csr or dense" << endl; return 1;} PetscInitialize(&argc, &argv, NULL, NULL); - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) {cout << "error: mpi != 2" << endl; return 1;} + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); - PetscInt localIdx[2] = {0, 0}; - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} - else {localIdx[0] = 1; localIdx[1] = 2;} + PetscInt localIdx[2] = {rank, rank+1}; ISLocalToGlobalMapping rmap; ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, PETSC_COPY_VALUES, &rmap); ISLocalToGlobalMapping cmap; ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &cmap); Mat A; - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, rmap, cmap, &A); + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size, size+1, rmap, cmap, &A); Mat Aloc; if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} --- a/matISCSRDenseSquare.cpp +++ b/matISCSRDenseSquare.cpp @@ -14,19 +14,17 @@ int main(int argc,char **argv) { if (matType != "csr" && matType != "dense") {cout << "error: need arg = csr or dense" << endl; return 1;} PetscInitialize(&argc, &argv, NULL, NULL); - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) {cout << "error: mpi != 2" << endl; return 1;} + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); - PetscInt localIdx[2] = {0, 0}; - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} - else {localIdx[0] = 1; localIdx[1] = 2;} + PetscInt localIdx[2] = {rank, rank+1}; ISLocalToGlobalMapping rmap; 
ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &rmap); ISLocalToGlobalMapping cmap; ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &cmap); Mat A; - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, rmap, cmap, &A); + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size+1, size+1, rmap, cmap, &A); Mat Aloc; if (matType == "csr") {cout << matType << endl; MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "PETSc users list" > Envoy?: Mardi 20 Juin 2017 13:23:27 > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > Franck > I tested your new example with master and it works. However, It doesn't work > with maint. I fixed the rectangular case a while ago in master and forgot to > add the change to maint too. Sorry for that. > This should fix the problem with maint: > https://bitbucket.org/petsc/petsc/commits/0ea065fb06d751599c4157d36bfe1a1b41348e0b > Test your real case and let me know. > If you could, it would be good to test against master too. > Thanks, > Stefano > 2017-06-20 12:58 GMT+02:00 Franck Houssen < franck.houssen at inria.fr > : > > As I said, it is often difficult to reduce the "real" problem: it turns out > > that your fix solves the "matISCSRDenseSquare/Rect.cpp" dummy example I > > sent, but, it's still not working in "my real" situation. > > > I changed a bit the "matISCSRDenseSquare/Rect.cpp" dummy example (see git > > diff below - I just changed the point that overlaps) : the dummy example is > > still failing > > > "mpirun -n 2 ./matISCSRDenseSquare.exe csr" and "mpirun -n 2 > > ./matISCSRDenseSquare.exe dense" : OK > > > but > > > "mpirun -n 2 ./matISCSRDenseRect.exe csr" and "mpirun -n 2 > > ./matISCSRDenseRect.exe dense": KO with error "Argument out of range - New > > nonzero at (0,2) caused a malloc" > > > I would say, the problem (I am concerned with the "real" case) is around > > lines 360-380 of /src/mat/impls/is/matis.c (not around 181 : this fixes a > > valid problem, but, this problem is another one) > > > Franck > > > --- a/matISCSRDenseRect.cpp > > > +++ b/matISCSRDenseRect.cpp > > > @@ -18,7 +18,7 @@ int main(int argc,char **argv) { > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > PetscInt localIdx[2] = {0, 0}; > > > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > ISLocalToGlobalMapping rmap; > > > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > > PETSC_COPY_VALUES, &rmap); > > > diff --git > > a/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp > > b/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp > > > index 4bc6190..4a6ea41 100644 > > > --- a/matISCSRDenseSquare.cpp > > > +++ b/matISCSRDenseSquare.cpp > > > @@ -18,7 +18,7 @@ int main(int argc,char **argv) { > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > PetscInt localIdx[2] = {0, 0}; > > > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > ISLocalToGlobalMapping rmap; > > > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > > PETSC_COPY_VALUES, &rmap); > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > 
Cc: "PETSc users list" < petsc-users at mcs.anl.gov > > > > > > > Envoy?: Mardi 20 Juin 2017 00:23:24 > > > > > > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > > > > > > It should be fixed now in maint and master > > > > > > https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d9916f61e0c445 > > > > > > Thanks for reporting the problem, > > > > > > Stefano > > > > > > > On Jun 19, 2017, at 10:46 PM, Stefano Zampini < > > > > stefano.zampini at gmail.com > > > > > > > > > wrote: > > > > > > > > > > Franck, > > > > > > > > > > Thanks. I'll? get back soon with a fix. > > > > > > > > > > Stefano > > > > > > > > > > Il 19 Giu 2017 18:17, "Franck Houssen" < franck.houssen at inria.fr > ha > > > > scritto: > > > > > > > > > > > The problem was difficult to reduce as reducing make things > > > > > disappear... > > > > > Luckily, I believe I got it (or at least, it looks "like" the one I > > > > > "really" > > > > > have...). > > > > > > > > > > > > > > > Seems that for square matrix, it works fine for csr and dense matrix. > > > > > But, > > > > > If > > > > > I am not mistaken, it does not for dense rectangular matrix (still OK > > > > > for > > > > > csr). > > > > > > > > > > > > > > > matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a > > > > > 2x2 > > > > > local matrix in the global matrix. > > > > > > > > > > > > > > > matISCSRDenseRect.cpp: 2 procs, global 2 x3 matrix, each proc adds a > > > > > 1 > > > > > x2 > > > > > local vector in the global matrix. > > > > > > > > > > > > > > > reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 > > > > > >> ./matISCSRDenseSquare.exe dense > > > > > > > > > > > > > > > csr > > > > > > > > > > > > > > > csr > > > > > > > > > > > > > > > Mat Object: 2 MPI processes > > > > > > > > > > > > > > > type: is > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqaij > > > > > > > > > > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > > > > > > > > > > row 1: (0, 0.) (1, 1.) > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqaij > > > > > > > > > > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > > > > > > > > > > row 1: (0, 0.) (1, 1.) > > > > > > > > > > > > > > > dense > > > > > > > > > > > > > > > dense > > > > > > > > > > > > > > > Mat Object: 2 MPI processes > > > > > > > > > > > > > > > type: is > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqdense > > > > > > > > > > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > > > > > > > > > > 0.0000000000000000e+00 1.0000000000000000e+00 > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqdense > > > > > > > > > > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > > > > > > > > > > 0.0000000000000000e+00 1.0000000000000000e+00 > > > > > > > > > > > > > > > >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 > > > > > >> ./matISCSRDenseRect.exe dense > > > > > > > > > > > > > > > csr > > > > > > > > > > > > > > > csr > > > > > > > > > > > > > > > Mat Object: 2 MPI processes > > > > > > > > > > > > > > > type: is > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqaij > > > > > > > > > > > > > > > row 0: (0, 1.) (1, 0.) 
> > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqaij > > > > > > > > > > > > > > > row 0: (0, 1.) (1, 0.) > > > > > > > > > > > > > > > dense > > > > > > > > > > > > > > > dense > > > > > > > > > > > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > > > > > > > > > > > [0]PETSC ERROR: Argument out of range > > > > > > > > > > > > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > > > > > > > > > > > [1]PETSC ERROR: Argument out of range > > > > > > > > > > > > > > > [1]PETSC ERROR: New nonzero at (0,1) caused a malloc > > > > > > > > > > > > > > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to > > > > > turn > > > > > off > > > > > this check > > > > > > > > > > > > > > > [1]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > for > > > > > trouble shooting. > > > > > > > > > > > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > > > > > > > > > > > > > > > [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug > > > > > named > > > > > yoda > > > > > by fghoussen Mon Jun 19 18:08:58 2017 > > > > > > > > > > > > > > > New nonzero at (0,1) caused a malloc > > > > > > > > > > > > > > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to > > > > > turn > > > > > off > > > > > this check > > > > > > > > > > > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > for > > > > > trouble shooting. > > > > > > > > > > > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 > > > > > > > > > > > > > > > [1]PETSC ERROR: Configure options > > > > > --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local > > > > > --with-mpi=1 > > > > > --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes > > > > > --download-scalapack=yes --download-superlu=yes > > > > > --download-suitesparse=yes > > > > > > > > > > > > > > > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > > > > > > > > > > > > > > > [1]PETSC ERROR: #2 MatSetValues() line 1190 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug > > > > > named > > > > > yoda > > > > > by fghoussen Mon Jun 19 18:08:58 2017 > > > > > > > > > > > > > > > [0]PETSC ERROR: Configure options > > > > > --prefix=/home/fghoussen/Documents/INRIA/petsc-3.7.6/local > > > > > --with-mpi=1 > > > > > --with-pthread=1 --download-f2cblaslapack=yes --download-mumps=yes > > > > > --download-scalapack=yes --download-superlu=yes > > > > > --download-suitesparse=yes > > > > > > > > > > > > > > > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in > > > > > 
/home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/aij/mpi/mpiaij.c > > > > > > > > > > > > > > > [0]PETSC ERROR: #2 MatSetValues() line 1190 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > Mat Object: 2 MPI processes > > > > > > > > > > > > > > > type: is > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqdense > > > > > > > > > > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > > > > > > > > > type: seqdense > > > > > > > > > > > > > > > 1.0000000000000000e+00 0.0000000000000000e+00 > > > > > > > > > > > > > > > >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp > > > > > > > > > > > > > > > 3c3 > > > > > > > > > > > > > > > < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp > > > > > -lpetsc > > > > > -lm; > > > > > mpirun -n 2 matISCSRDenseSquare.exe > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc > > > > > > -lm; > > > > > > mpirun -n 2 matISCSRDenseRect.exe > > > > > > > > > > > > > > > 24c24 > > > > > > > > > > > > > > > < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > > > > > PETSC_COPY_VALUES, &rmap); > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > > > > > > PETSC_COPY_VALUES, &rmap); > > > > > > > > > > > > > > > 29c29 > > > > > > > > > > > > > > > < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, > > > > > rmap, > > > > > cmap, &A); > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, > > > > > > rmap, > > > > > > cmap, &A); > > > > > > > > > > > > > > > 32,33c32,33 > > > > > > > > > > > > > > > < if (matType == "csr") {cout << matType << endl; > > > > > MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} > > > > > > > > > > > > > > > < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, > > > > > 2, > > > > > 2, > > > > > NULL, &Aloc);} > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > if (matType == "csr") {cout << matType << endl; > > > > > > MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > > > > > > > > > > > > > > > > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, > > > > > > 1, > > > > > > 2, > > > > > > NULL, &Aloc);} > > > > > > > > > > > > > > > 35,36c35,36 > > > > > > > > > > > > > > > < PetscScalar localVal[4] = {1., 0., 0., 1.}; > > > > > > > > > > > > > > > < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); > > > > > // > > > > > Add > > > > > local 2x2 matrix > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; > > > > > > > > > > > > > > > > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, > > > > > > ADD_VALUES); > > > > > > // > > > > > > 
Add local row > > > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > > > > > > > > > > > > > > > > Cc: "PETSc users list" < petsc-users at mcs.anl.gov > > > > > > > > > > > > > > > > > > > > > > Envoy?: Lundi 19 Juin 2017 15:25:35 > > > > > > > > > > > > > > > > > > > > > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > > > > > > > > > > > > > > > > > > > > > Can you send a minimal working example so that I can fix the code? > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > Stefano > > > > > > > > > > > > > > > > > > > > > Il 19 Giu 2017 15:20, "Franck Houssen" < franck.houssen at inria.fr > > > > > > > ha > > > > > > scritto: > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed > > > > > > > locally > > > > > > > by > > > > > > > sequential (Aloc) dense matrix. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Seems this ends up with this error: [0]PETSC ERROR: New nonzero > > > > > > > at > > > > > > > (0,1) > > > > > > > caused a malloc. Is this a known error / limitation ? (not > > > > > > > supposed > > > > > > > to > > > > > > > work > > > > > > > with dense matrix ?) > > > > > > > > > > > > > > > > > > > > > > > > > > > > This (pseudo code) works fine: > > > > > > > > > > > > > > > > > > > > > > > > > > > > MatCreateIS(..., A) > > > > > > > > > > > > > > > > > > > > > > > > > > > > MatCreateSeqAIJ(..., Aloc) > > > > > > > > > > > > > > > > > > > > > > > > > > > > MatISSetLocalMat(pcA, pcALoc) > > > > > > > > > > > > > > > > > > > > > > > > > > > > MatISGetMPIXAIJ(A, ...) // OK ! > > > > > > > > > > > > > > > > > > > > > > > > > > > > When I try to replace MatCreateSeqAIJ(..., Aloc) with > > > > > > > MatCreateSeqDense(..., > > > > > > > Aloc), it does no more work. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > > > > > > > > PS: running debian/testing with gcc-6.3 + petsc-3.7.6 > > > > > > > > > > > > > > > > > > > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseRect.cpp Type: text/x-c++src Size: 1730 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISCSRDenseSquare.cpp Type: text/x-c++src Size: 1726 bytes Desc: not available URL: From stefano.zampini at gmail.com Tue Jun 20 10:08:29 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 20 Jun 2017 17:08:29 +0200 Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: <720133125.8242944.1497969058541.JavaMail.zimbra@inria.fr> References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <347595938.7749535.1497878406127.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> <720133125.8242944.1497969058541.JavaMail.zimbra@inria.fr> Message-ID: It should be fixed right now, both in master and in maint. Again, sorry for this ping-pong of fixes, my brain it's not fully functional these days.... 
https://bitbucket.org/petsc/petsc/commits/c6f20c4fa7817632f09219574920bd3bd922f6f1 2017-06-20 16:30 GMT+02:00 Franck Houssen : > OK. I moved from petsc-3.7.6 to development version (git clone bitbucket). > The very first version of the dummy example (= > matISCSRDenseSquare/Rect.cpp) works with this fix: > https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d991 > 6f61e0c445. > The second version of the dummy example works too with the fix if one > moves to petsc bitbucket (master). > > But, the code still breaks in "my" initial "real" case (using now master > from petsc bitbucket)... With another error "SEGV under > MatISSetMPIXAIJPreallocation_Private" (note: this is not a "new non zero" > message, this seems to be another problem). > > Here is a third version of the dummy example that breaks with "SEGV under > MatISSetMPIXAIJPreallocation_Private" (using master from petsc bitbucket) > : the idea is the same but with N procs (not only 2) and a rectangular > matrix of size N*(N+1). > With 2 procs, it works (all cases). > With 4 procs, new problems occur: > >> mpirun -n 4 ./matISCSRDenseSquare.exe csr; mpirun -n 4 > ./matISCSRDenseSquare.exe dense => OK > >> mpirun -n 4 ./matISCSRDenseRect.exe csr => OK > but > >> mpirun -n 4 ./matISCSRDenseRect.exe dense; > dense > dense > dense > dense > [3]PETSC ERROR: ------------------------------ > ------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: [3] MatISSetMPIXAIJPreallocation_Private line 1055 > /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c > [3]PETSC ERROR: [3] MatISGetMPIXAIJ_IS line 1230 /home/fghoussen/Documents/ > INRIA/petsc/src/mat/impls/is/matis.c > [3]PETSC ERROR: [3] MatISGetMPIXAIJ line 1384 /home/fghoussen/Documents/ > INRIA/petsc/src/mat/impls/is/matis.c > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > I tried to go through this with caution... But I have to say I feel messy. > Can you reproduce this problem at your side ? > > Franck > > >> git diff . 
> --- a/matISCSRDenseRect.cpp > +++ b/matISCSRDenseRect.cpp > @@ -14,19 +14,17 @@ int main(int argc,char **argv) { > if (matType != "csr" && matType != "dense") {cout << "error: need arg = > csr or dense" << endl; return 1;} > > PetscInitialize(&argc, &argv, NULL, NULL); > - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > {cout << "error: mpi != 2" << endl; return 1;} > + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localIdx[2] = {0, 0}; > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > - else {localIdx[0] = 1; localIdx[1] = 2;} > + PetscInt localIdx[2] = {rank, rank+1}; > ISLocalToGlobalMapping rmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, > PETSC_COPY_VALUES, &rmap); > ISLocalToGlobalMapping cmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > PETSC_COPY_VALUES, &cmap); > > Mat A; > - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, > rmap, cmap, &A); > + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size, > size+1, rmap, cmap, &A); > > Mat Aloc; > if (matType == "csr") {cout << matType << endl; > MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} > > --- a/matISCSRDenseSquare.cpp > +++ b/matISCSRDenseSquare.cpp > @@ -14,19 +14,17 @@ int main(int argc,char **argv) { > if (matType != "csr" && matType != "dense") {cout << "error: need arg = > csr or dense" << endl; return 1;} > > PetscInitialize(&argc, &argv, NULL, NULL); > - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > {cout << "error: mpi != 2" << endl; return 1;} > + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localIdx[2] = {0, 0}; > - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > - else {localIdx[0] = 1; localIdx[1] = 2;} > + PetscInt localIdx[2] = {rank, rank+1}; > ISLocalToGlobalMapping rmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > PETSC_COPY_VALUES, &rmap); > ISLocalToGlobalMapping cmap; > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, > PETSC_COPY_VALUES, &cmap); > > Mat A; > - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, > rmap, cmap, &A); > + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size+1, > size+1, rmap, cmap, &A); > > Mat Aloc; > if (matType == "csr") {cout << matType << endl; > MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Franck Houssen" > *Cc: *"PETSc users list" > *Envoy?: *Mardi 20 Juin 2017 13:23:27 > > *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? > > Franck > > I tested your new example with master and it works. However, It doesn't > work with maint. I fixed the rectangular case a while ago in master and > forgot to add the change to maint too. Sorry for that. > > This should fix the problem with maint: https://bitbucket.org/petsc/ > petsc/commits/0ea065fb06d751599c4157d36bfe1a1b41348e0b > > Test your real case and let me know. > If you could, it would be good to test against master too. > > Thanks, > Stefano > > > 2017-06-20 12:58 GMT+02:00 Franck Houssen : > >> As I said, it is often difficult to reduce the "real" problem: it turns >> out that your fix solves the "matISCSRDenseSquare/Rect.cpp" dummy example I >> sent, but, it's still not working in "my real" situation. 
>> >> I changed a bit the "matISCSRDenseSquare/Rect.cpp" dummy example (see git >> diff below - I just changed the point that overlaps) : the dummy example is >> still failing >> >> "mpirun -n 2 ./matISCSRDenseSquare.exe csr" and "mpirun -n 2 >> ./matISCSRDenseSquare.exe dense" : OK >> but >> "mpirun -n 2 ./matISCSRDenseRect.exe csr" and "mpirun -n 2 >> ./matISCSRDenseRect.exe dense": KO with error "Argument out of range - New >> nonzero at (0,2) caused a malloc" >> >> I would say, the problem (I am concerned with the "real" case) is around >> lines 360-380 of /src/mat/impls/is/matis.c (not around 181 : this fixes a >> valid problem, but, this problem is another one) >> >> Franck >> >> --- a/matISCSRDenseRect.cpp >> +++ b/matISCSRDenseRect.cpp >> @@ -18,7 +18,7 @@ int main(int argc,char **argv) { >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> PetscInt localIdx[2] = {0, 0}; >> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} >> else {localIdx[0] = 1; localIdx[1] = 2;} >> ISLocalToGlobalMapping rmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >> PETSC_COPY_VALUES, &rmap); >> diff --git a/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp >> b/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp >> index 4bc6190..4a6ea41 100644 >> --- a/matISCSRDenseSquare.cpp >> +++ b/matISCSRDenseSquare.cpp >> @@ -18,7 +18,7 @@ int main(int argc,char **argv) { >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> PetscInt localIdx[2] = {0, 0}; >> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} >> else {localIdx[0] = 1; localIdx[1] = 2;} >> ISLocalToGlobalMapping rmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >> PETSC_COPY_VALUES, &rmap); >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Franck Houssen" >> *Cc: *"PETSc users list" >> *Envoy?: *Mardi 20 Juin 2017 00:23:24 >> >> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >> >> It should be fixed now in maint and master >> >> https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d991 >> 6f61e0c445 >> >> Thanks for reporting the problem, >> Stefano >> >> On Jun 19, 2017, at 10:46 PM, Stefano Zampini >> wrote: >> >> Franck, >> >> Thanks. I'll? get back soon with a fix. >> >> Stefano >> >> >> Il 19 Giu 2017 18:17, "Franck Houssen" ha >> scritto: >> >>> The problem was difficult to reduce as reducing make things disappear... >>> Luckily, I believe I got it (or at least, it looks "like" the one I >>> "really" have...). >>> >>> Seems that for square matrix, it works fine for csr and dense matrix. >>> But, If I am not mistaken, it does not for dense rectangular matrix (still >>> OK for csr). >>> >>> matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a >>> 2x2 local matrix in the global matrix. >>> matISCSRDenseRect.cpp: 2 procs, global *2*x3 matrix, each proc adds a >>> *1*x2 local *vector* in the global matrix. >>> >>> reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 >>> >>> Franck >>> >>> >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 >>> ./matISCSRDenseSquare.exe dense >>> csr >>> csr >>> Mat Object: 2 MPI processes >>> type: is >>> Mat Object: 1 MPI processes >>> type: seqaij >>> row 0: (0, 1.) (1, 0.) >>> row 1: (0, 0.) (1, 1.) >>> Mat Object: 1 MPI processes >>> type: seqaij >>> row 0: (0, 1.) (1, 0.) >>> row 1: (0, 0.) (1, 1.) 
>>> dense >>> dense >>> Mat Object: 2 MPI processes >>> type: is >>> Mat Object: 1 MPI processes >>> type: seqdense >>> 1.0000000000000000e+00 0.0000000000000000e+00 >>> 0.0000000000000000e+00 1.0000000000000000e+00 >>> Mat Object: 1 MPI processes >>> type: seqdense >>> 1.0000000000000000e+00 0.0000000000000000e+00 >>> 0.0000000000000000e+00 1.0000000000000000e+00 >>> >>> >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 >>> ./matISCSRDenseRect.exe dense >>> csr >>> csr >>> Mat Object: 2 MPI processes >>> type: is >>> Mat Object: 1 MPI processes >>> type: seqaij >>> row 0: (0, 1.) (1, 0.) >>> Mat Object: 1 MPI processes >>> type: seqaij >>> row 0: (0, 1.) (1, 0.) >>> dense >>> dense >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Argument out of range >>> [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [1]PETSC ERROR: Argument out of range >>> [1]PETSC ERROR: New nonzero at (0,1) caused a malloc >>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>> turn off this check >>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >>> [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >>> yoda by fghoussen Mon Jun 19 18:08:58 2017 >>> New nonzero at (0,1) caused a malloc >>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>> turn off this check >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >>> [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >>> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >>> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >>> --download-superlu=yes --download-suitesparse=yes >>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >>> impls/aij/mpi/mpiaij.c >>> [1]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >>> yoda by fghoussen Mon Jun 19 18:08:58 2017 >>> [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >>> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >>> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >>> --download-superlu=yes --download-suitesparse=yes >>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >>> impls/aij/mpi/mpiaij.c >>> [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >>> 
/home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> Mat Object: 2 MPI processes >>> type: is >>> Mat Object: 1 MPI processes >>> type: seqdense >>> 1.0000000000000000e+00 0.0000000000000000e+00 >>> Mat Object: 1 MPI processes >>> type: seqdense >>> 1.0000000000000000e+00 0.0000000000000000e+00 >>> >>> >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp >>> 3c3 >>> < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc >>> -lm; mpirun -n 2 matISCSRDenseSquare.exe >>> --- >>> > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; >>> mpirun -n 2 matISCSRDenseRect.exe >>> 24c24 >>> < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >>> PETSC_COPY_VALUES, &rmap); >>> --- >>> > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >>> PETSC_COPY_VALUES, &rmap); >>> 29c29 >>> < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, >>> rmap, cmap, &A); >>> --- >>> > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, >>> rmap, cmap, &A); >>> 32,33c32,33 >>> < if (matType == "csr") {cout << matType << endl; >>> MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} >>> < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, >>> 2, 2, NULL, &Aloc);} >>> --- >>> > if (matType == "csr") {cout << matType << endl; >>> MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} >>> > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, >>> 1, 2, NULL, &Aloc);} >>> 35,36c35,36 >>> < PetscScalar localVal[4] = {1., 0., 0., 1.}; >>> < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); >>> // Add local 2x2 matrix >>> --- >>> > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; >>> > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, >>> ADD_VALUES); // Add local row >>> >>> ------------------------------ >>> >>> *De: *"Stefano Zampini" >>> *?: *"Franck Houssen" >>> *Cc: *"PETSc users list" >>> *Envoy?: *Lundi 19 Juin 2017 15:25:35 >>> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >>> >>> Can you send a minimal working example so that I can fix the code? >>> >>> Thanks >>> Stefano >>> >>> Il 19 Giu 2017 15:20, "Franck Houssen" ha >>> scritto: >>> >>>> Hi, >>>> >>>> I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed locally >>>> by sequential (Aloc) dense matrix. >>>> Seems this ends up with this error: [0]PETSC ERROR: New nonzero at >>>> (0,1) caused a malloc. Is this a known error / limitation ? (not supposed >>>> to work with dense matrix ?) >>>> >>>> This (pseudo code) works fine: >>>> MatCreateIS(..., A) >>>> MatCreateSeqAIJ(..., Aloc) >>>> MatISSetLocalMat(pcA, pcALoc) >>>> MatISGetMPIXAIJ(A, ...) // OK ! >>>> >>>> When I try to replace MatCreateSeqAIJ(..., Aloc) with >>>> MatCreateSeqDense(..., Aloc), it does no more work. >>>> >>>> Franck >>>> >>>> PS: running debian/testing with gcc-6.3 + petsc-3.7.6 >>>> >>> >>> >> >> > > > -- > Stefano > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Jun 20 10:24:17 2017 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 20 Jun 2017 09:24:17 -0600 Subject: [petsc-users] empty split for fieldsplit In-Reply-To: References: <86908C35-EBFC-48C6-A002-B64BEC85A375@mcs.anl.gov> <5596225C-DB37-4040-B709-5E6F4B18041B@mcs.anl.gov> Message-ID: The fix is merged to master branch. 
Thanks for your report and contribution! Hong On Mon, Jun 19, 2017 at 4:35 AM, Hoang Giang Bui wrote: > Thanks Hong > > I confirmed that it's fixed by your changes. > > Giang > > On Sun, Jun 18, 2017 at 4:45 PM, Hong wrote: > >> Hoang, >> I pushed a fix >> https://bitbucket.org/petsc/petsc/commits/d4e3277789d24018f0 >> db1641a80db7be76600165 >> and added your test to >> petsc/src/ksp/ksp/examples/tests/ex53.c >> >> It is on the branch hzhang/fix-blockedIS-submat >> Let me know if it still does not fix your problem. >> >> Hong >> >> On Sat, Jun 17, 2017 at 4:06 PM, Zhang, Hong wrote: >> >>> never mind, I know it is ok to set blocksize=2 >>> ------------------------------ >>> *From:* Zhang, Hong >>> *Sent:* Saturday, June 17, 2017 3:56:35 PM >>> >>> *To:* Smith, Barry F.; Hoang Giang Bui >>> *Cc:* petsc-users >>> *Subject:* Re: [petsc-users] empty split for fieldsplit >>> >>> >>> Matrix A is a tridiagonal matrix with blocksize=1. >>> >>> Why do you set block_size=2 for A_IS and B_IS? >>> >>> >>> Hong >>> ------------------------------ >>> *From:* Zhang, Hong >>> *Sent:* Friday, June 16, 2017 7:55:45 AM >>> *To:* Smith, Barry F.; Hoang Giang Bui >>> *Cc:* petsc-users >>> *Subject:* Re: [petsc-users] empty split for fieldsplit >>> >>> >>> I'm in Boulder and will be back home this evening. >>> >>> Will test it this weekend. >>> >>> >>> Hong >>> ------------------------------ >>> *From:* Smith, Barry F. >>> *Sent:* Thursday, June 15, 2017 1:38:11 PM >>> *To:* Hoang Giang Bui; Zhang, Hong >>> *Cc:* petsc-users >>> *Subject:* Re: [petsc-users] empty split for fieldsplit >>> >>> >>> Hong, >>> >>> Please build the attached code with master and run with >>> >>> petscmpiexec -n 2 ./ex1 -mat_size 40 -block_size 2 -method 2 >>> >>> I think this is a bug in your new MatGetSubMatrix routines. You take the >>> block size of the outer IS and pass it into the inner IS but that inner IS >>> may not support the same block size hence the crash. >>> >>> Can you please debug this? >>> >>> Thanks >>> >>> Barry >>> >>> >>> >>> > On Jun 15, 2017, at 7:56 AM, Hoang Giang Bui >>> wrote: >>> > >>> > Hi Barry >>> > >>> > Thanks for pointing out the error. I think the problem coming from the >>> zero fieldsplit in proc 0. In this modified example, I parameterized the >>> matrix size and block size, so when you're executing >>> > >>> > mpirun -np 2 ./ex -mat_size 40 -block_size 2 -method 1 >>> > >>> > everything was fine. With method = 1, fieldsplit size of B is nonzero >>> and is divided by the block size. >>> > >>> > With method=2, i.e mpirun -np 2 ./ex -mat_size 40 -block_size 2 >>> -method 2, the fieldsplit B is zero on proc 0, and the error is thrown >>> > >>> > [1]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> > [1]PETSC ERROR: Arguments are incompatible >>> > [1]PETSC ERROR: Local size 11 not compatible with block size 2 >>> > >>> > This is somehow not logical, because 0 is divided by block_size. >>> > >>> > Furthermore, if you execute "mpirun -np 2 ./ex -mat_size 20 >>> -block_size 2 -method 2", the code hangs at ISSetBlockSize, which is pretty >>> similar to my original problem. Probably the original one also hangs at >>> ISSetBlockSize, which I may not realize at that time. 
>>> >
>>> > Giang
>>> >
>>> > On Wed, Jun 14, 2017 at 5:29 PM, Barry Smith  wrote:
>>> >
>>> >   You can't do this
>>> >
>>> >   ierr = MatSetSizes(A,PETSC_DECIDE,N,N,N);CHKERRQ(ierr);
>>> >
>>> >   use PETSC_DECIDE for the third argument
>>> >
>>> >   Also this is wrong
>>> >
>>> >   for (i = Istart; i < Iend; ++i)
>>> >   {
>>> >     ierr = MatSetValue(A,i,i,2,INSERT_VALUES);CHKERRQ(ierr);
>>> >     ierr = MatSetValue(A,i+1,i,-1,INSERT_VALUES);CHKERRQ(ierr);
>>> >     ierr = MatSetValue(A,i,i+1,-1,INSERT_VALUES);CHKERRQ(ierr);
>>> >   }
>>> >
>>> >   you will get
>>> >
>>> > $ petscmpiexec -n 2 ./ex1
>>> > 0: Istart = 0, Iend = 60
>>> > 1: Istart = 60, Iend = 120
>>> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > [1]PETSC ERROR: Argument out of range
>>> > [1]PETSC ERROR: Row too large: row 120 max 119
>>> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > [1]PETSC ERROR: Petsc Development GIT revision: v3.7.6-4103-g93161b8192  GIT Date: 2017-06-11 14:49:39 -0500
>>> > [1]PETSC ERROR: ./ex1 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Wed Jun 14 18:26:52 2017
>>> > [1]PETSC ERROR: Configure options PETSC_ARCH=arch-basic
>>> > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 550 in /Users/barrysmith/Src/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> > [1]PETSC ERROR: #2 MatSetValues() line 1270 in /Users/barrysmith/Src/petsc/src/mat/interface/matrix.c
>>> > [1]PETSC ERROR: #3 main() line 30 in /Users/barrysmith/Src/petsc/test-dir/ex1.c
>>> > [1]PETSC ERROR: PETSc Option Table entries:
>>> > [1]PETSC ERROR: -malloc_test
>>> >
>>> >   You need to get the example working so it ends with the error you reported previously, not these other bugs.
>>> >
>>> > > On Jun 12, 2017, at 10:19 AM, Hoang Giang Bui  wrote:
>>> > >
>>> > > Dear Barry
>>> > >
>>> > > I made a small example with 2 processes, with one empty split in proc 0. But it gives another strange error
>>> > >
>>> > > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > > [1]PETSC ERROR: Arguments are incompatible
>>> > > [1]PETSC ERROR: Local size 31 not compatible with block size 2
>>> > >
>>> > > The local size is always 60, so this is confusing.
>>> > >
>>> > > Giang
>>> > >
>>> > > On Sun, Jun 11, 2017 at 8:11 PM, Barry Smith  wrote:
>>> > >   Could be, send us a simple example that demonstrates the problem and we'll track it down.
>>> > >
>>> > > > On Jun 11, 2017, at 12:34 PM, Hoang Giang Bui  wrote:
>>> > > >
>>> > > > Hello
>>> > > >
>>> > > > I noticed that my code stopped for a very long time, possibly hung, at PCFieldSplitSetIS. There are two splits and one split is empty in one process. May that be the possible reason that PCFieldSplitSetIS hangs?
>>> > > >
>>> > > > Giang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
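For context, a hedged sketch of the kind of setup being debugged in the thread above: two index sets passed to PCFieldSplitSetIS, where the second split may be empty on some rank. The helper SetupTwoSplits and its arguments are invented for illustration (this is not the ex1.c test attached to the thread); note that ISSetBlockSize requires each local size to be divisible by the block size, and an empty local part satisfies that trivially.

  #include <petscksp.h>

  /* ksp already has its operators set; nA/idxA and nB/idxB describe the two
     fields on this rank (nB may be 0 on some ranks), bs is the block size. */
  PetscErrorCode SetupTwoSplits(KSP ksp, PetscInt nA, const PetscInt *idxA,
                                PetscInt nB, const PetscInt *idxB, PetscInt bs)
  {
    PC             pc;
    IS             isA, isB;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);

    /* every rank creates both index sets, even where its part of "B" is empty */
    ierr = ISCreateGeneral(PETSC_COMM_WORLD, nA, idxA, PETSC_COPY_VALUES, &isA);CHKERRQ(ierr);
    ierr = ISCreateGeneral(PETSC_COMM_WORLD, nB, idxB, PETSC_COPY_VALUES, &isB);CHKERRQ(ierr);
    ierr = ISSetBlockSize(isA, bs);CHKERRQ(ierr);  /* local size must be a multiple of bs */
    ierr = ISSetBlockSize(isB, bs);CHKERRQ(ierr);  /* 0 is a valid (empty) local size     */

    /* collective: all ranks must call PCFieldSplitSetIS for both splits */
    ierr = PCFieldSplitSetIS(pc, "A", isA);CHKERRQ(ierr);
    ierr = PCFieldSplitSetIS(pc, "B", isB);CHKERRQ(ierr);

    ierr = ISDestroy(&isA);CHKERRQ(ierr);
    ierr = ISDestroy(&isB);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }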
From franck.houssen at inria.fr  Tue Jun 20 10:31:19 2017
From: franck.houssen at inria.fr (Franck Houssen)
Date: Tue, 20 Jun 2017 17:31:19 +0200 (CEST)
Subject: [petsc-users] Building MatIS with dense local matrix ?
In-Reply-To: 
References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <2069876670.7862897.1497889040933.JavaMail.zimbra@inria.fr> <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> <720133125.8242944.1497969058541.JavaMail.zimbra@inria.fr>
Message-ID: <421697287.8283455.1497972679234.JavaMail.zimbra@inria.fr>

OK.

Franck

----- Original Message -----
> From: "Stefano Zampini" 
> To: "Franck Houssen" 
> Cc: "PETSc users list" 
> Sent: Tuesday, June 20, 2017 17:08:29
> Subject: Re: [petsc-users] Building MatIS with dense local matrix ?

> It should be fixed right now, both in master and in maint.
> [snip: full quoted copy of the earlier messages in this thread]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
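For readers of this thread, a compact sketch of the MatCreateIS / MatISSetLocalMat / MatISGetMPIXAIJ sequence under discussion, with a dense local block. The 2-rank layout, sizes and values are illustrative only (this is not the attached matISCSRDenseSquare.cpp), error checking is omitted, and the calls follow the PETSc 3.7-era signatures used in the messages above.

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat                    A, Aloc, B;
    ISLocalToGlobalMapping map;
    PetscMPIInt            rank;
    PetscInt               idx[2], loc[2] = {0, 1};
    PetscScalar            vals[4] = {1., 0., 0., 1.};

    PetscInitialize(&argc, &argv, NULL, NULL);
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    /* run with 2 ranks: rank 0 owns global {0,1}, rank 1 owns {1,2} (overlap on 1) */
    idx[0] = rank; idx[1] = rank + 1;
    ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, idx, PETSC_COPY_VALUES, &map);
    MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, map, map, &A);

    /* dense 2x2 local block, filled with *local* indices */
    MatCreateSeqDense(PETSC_COMM_SELF, 2, 2, NULL, &Aloc);
    MatSetValues(Aloc, 2, loc, 2, loc, vals, ADD_VALUES);
    MatAssemblyBegin(Aloc, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(Aloc, MAT_FINAL_ASSEMBLY);

    /* hand the local block to the MatIS, then build a global MPIAIJ copy */
    MatISSetLocalMat(A, Aloc);
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);
    MatView(B, PETSC_VIEWER_STDOUT_WORLD);

    MatDestroy(&B); MatDestroy(&Aloc); MatDestroy(&A);
    ISLocalToGlobalMappingDestroy(&map);
    PetscFinalize();
    return 0;
  }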
From cpraveen at gmail.com  Tue Jun 20 12:17:50 2017
From: cpraveen at gmail.com (Praveen C)
Date: Tue, 20 Jun 2017 22:47:50 +0530
Subject: [petsc-users] Matrix-free use of TS
Message-ID: 

Dear all

I am using TS in my code to solve the 3-d compressible Navier-Stokes equations [1]. To use an implicit scheme, I will use a matrix-free approach. I want to implement my own preconditioner, which is also matrix-free. Can you point out some existing examples that can help me with this?

Thanks
praveen

[1] https://bitbucket.org/cpraveen/ug3
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Tue Jun 20 12:30:37 2017
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Jun 2017 12:30:37 -0500
Subject: [petsc-users] Matrix-free use of TS
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jun 20, 2017 at 12:17 PM, Praveen C wrote:

> Dear all
>
> I am using TS in my code to solve the 3-d compressible Navier-Stokes
> equations [1]. To use an implicit scheme, I will use a matrix-free
> approach. I want to implement my own preconditioner, which is also
> matrix-free. Can you point out some existing examples that can help me
> with this?

I do not think we have any. You should just need to create a MATSHELL for
the Jacobian, and then stick the input vector in a context inside the
FormIJacobian routine. Note also you will need to handle the scaling for
the derivative with respect to \dot u.

You can do the PC with PCSHELL.

You can show us your code if it does not work.

  Thanks,

     Matt

> Thanks
> praveen
>
> [1] https://bitbucket.org/cpraveen/ug3

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
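A minimal sketch of the MATSHELL + PCSHELL setup described above, for reference. The names ShellCtx, MFJacobianMult, MFPCApply, MFIJacobian and FormIFunction are placeholders (they are not PETSc routines and not part of ug3), and the actual matrix-free product and preconditioner are only indicated by comments; this is an assumed outline, not a tested implementation.

  #include <petscts.h>

  typedef struct {
    Vec       U, Udot;   /* state at which the Jacobian action is frozen */
    PetscReal shift;     /* the "a" in dF/dU + a * dF/dUdot given by TS  */
  } ShellCtx;

  /* y ~ (dF/dU + shift * dF/dUdot) x, e.g. by finite-differencing the IFunction */
  static PetscErrorCode MFJacobianMult(Mat A, Vec x, Vec y)
  {
    ShellCtx       *ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
    /* ... user matrix-free product using ctx->U, ctx->Udot and ctx->shift ... */
    PetscFunctionReturn(0);
  }

  /* z = M^{-1} r: the user's matrix-free preconditioner */
  static PetscErrorCode MFPCApply(PC pc, Vec r, Vec z)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecCopy(r, z);CHKERRQ(ierr);  /* identity here, as a placeholder only */
    PetscFunctionReturn(0);
  }

  /* TS callback: only records the linearization point and the shift in the shell */
  static PetscErrorCode MFIJacobian(TS ts, PetscReal t, Vec U, Vec Udot, PetscReal a,
                                    Mat A, Mat P, void *user)
  {
    ShellCtx       *ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
    ierr = VecCopy(U, ctx->U);CHKERRQ(ierr);       /* ctx->U, ctx->Udot created at setup */
    ierr = VecCopy(Udot, ctx->Udot);CHKERRQ(ierr);
    ctx->shift = a;
    PetscFunctionReturn(0);
  }

  /* During setup (ts, r, appctx, shellctx, n, N assumed to exist; FormIFunction is
     the user's residual routine):

       MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &shellctx, &J);
       MatShellSetOperation(J, MATOP_MULT, (void (*)(void)) MFJacobianMult);
       TSSetIFunction(ts, r, FormIFunction, &appctx);
       TSSetIJacobian(ts, J, J, MFIJacobian, NULL);
       TSGetSNES(ts, &snes); SNESGetKSP(snes, &ksp); KSPGetPC(ksp, &pc);
       PCSetType(pc, PCSHELL);
       PCShellSetContext(pc, &shellctx);
       PCShellSetApply(pc, MFPCApply);                                              */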
> > https://bitbucket.org/petsc/petsc/commits/c6f20c4fa7817632f09219574920bd > 3bd922f6f1 > > 2017-06-20 16:30 GMT+02:00 Franck Houssen : > >> OK. I moved from petsc-3.7.6 to development version (git clone bitbucket). >> The very first version of the dummy example (= >> matISCSRDenseSquare/Rect.cpp) works with this fix: >> https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d991 >> 6f61e0c445. >> The second version of the dummy example works too with the fix if one >> moves to petsc bitbucket (master). >> >> But, the code still breaks in "my" initial "real" case (using now master >> from petsc bitbucket)... With another error "SEGV under >> MatISSetMPIXAIJPreallocation_Private" (note: this is not a "new non >> zero" message, this seems to be another problem). >> >> Here is a third version of the dummy example that breaks with "SEGV under >> MatISSetMPIXAIJPreallocation_Private" (using master from petsc >> bitbucket) : the idea is the same but with N procs (not only 2) and a >> rectangular matrix of size N*(N+1). >> With 2 procs, it works (all cases). >> With 4 procs, new problems occur: >> >> mpirun -n 4 ./matISCSRDenseSquare.exe csr; mpirun -n 4 >> ./matISCSRDenseSquare.exe dense => OK >> >> mpirun -n 4 ./matISCSRDenseRect.exe csr => OK >> but >> >> mpirun -n 4 ./matISCSRDenseRect.exe dense; >> dense >> dense >> dense >> dense >> [3]PETSC ERROR: ------------------------------ >> ------------------------------------------ >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ >> documentation/faq.html#valgrind >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS >> X to find memory corruption errors >> [3]PETSC ERROR: likely location of problem given in stack below >> [3]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [3]PETSC ERROR: INSTEAD the line number of the start of the function >> [3]PETSC ERROR: is given. >> [3]PETSC ERROR: [3] MatISSetMPIXAIJPreallocation_Private line 1055 >> /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c >> [3]PETSC ERROR: [3] MatISGetMPIXAIJ_IS line 1230 >> /home/fghoussen/Documents/INRIA/petsc/src/mat/impls/is/matis.c >> [3]PETSC ERROR: [3] MatISGetMPIXAIJ line 1384 /home/fghoussen/Documents/ >> INRIA/petsc/src/mat/impls/is/matis.c >> [3]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> >> I tried to go through this with caution... But I have to say I feel >> messy. Can you reproduce this problem at your side ? >> >> Franck >> >> >> git diff . 
>> --- a/matISCSRDenseRect.cpp >> +++ b/matISCSRDenseRect.cpp >> @@ -14,19 +14,17 @@ int main(int argc,char **argv) { >> if (matType != "csr" && matType != "dense") {cout << "error: need arg >> = csr or dense" << endl; return 1;} >> >> PetscInitialize(&argc, &argv, NULL, NULL); >> - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >> {cout << "error: mpi != 2" << endl; return 1;} >> + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> - PetscInt localIdx[2] = {0, 0}; >> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> - else {localIdx[0] = 1; localIdx[1] = 2;} >> + PetscInt localIdx[2] = {rank, rank+1}; >> ISLocalToGlobalMapping rmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >> PETSC_COPY_VALUES, &rmap); >> ISLocalToGlobalMapping cmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >> PETSC_COPY_VALUES, &cmap); >> >> Mat A; >> - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, >> rmap, cmap, &A); >> + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size, >> size+1, rmap, cmap, &A); >> >> Mat Aloc; >> if (matType == "csr") {cout << matType << endl; >> MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} >> >> --- a/matISCSRDenseSquare.cpp >> +++ b/matISCSRDenseSquare.cpp >> @@ -14,19 +14,17 @@ int main(int argc,char **argv) { >> if (matType != "csr" && matType != "dense") {cout << "error: need arg >> = csr or dense" << endl; return 1;} >> >> PetscInitialize(&argc, &argv, NULL, NULL); >> - int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >> {cout << "error: mpi != 2" << endl; return 1;} >> + int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> - PetscInt localIdx[2] = {0, 0}; >> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> - else {localIdx[0] = 1; localIdx[1] = 2;} >> + PetscInt localIdx[2] = {rank, rank+1}; >> ISLocalToGlobalMapping rmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >> PETSC_COPY_VALUES, &rmap); >> ISLocalToGlobalMapping cmap; >> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >> PETSC_COPY_VALUES, &cmap); >> >> Mat A; >> - MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, >> rmap, cmap, &A); >> + MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, size+1, >> size+1, rmap, cmap, &A); >> >> Mat Aloc; >> if (matType == "csr") {cout << matType << endl; >> MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Franck Houssen" >> *Cc: *"PETSc users list" >> *Envoy?: *Mardi 20 Juin 2017 13:23:27 >> >> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >> >> Franck >> >> I tested your new example with master and it works. However, It doesn't >> work with maint. I fixed the rectangular case a while ago in master and >> forgot to add the change to maint too. Sorry for that. >> >> This should fix the problem with maint: https://bitbucket.org/petsc/ >> petsc/commits/0ea065fb06d751599c4157d36bfe1a1b41348e0b >> >> Test your real case and let me know. >> If you could, it would be good to test against master too. 
>> >> Thanks, >> Stefano >> >> >> 2017-06-20 12:58 GMT+02:00 Franck Houssen : >> >>> As I said, it is often difficult to reduce the "real" problem: it turns >>> out that your fix solves the "matISCSRDenseSquare/Rect.cpp" dummy example I >>> sent, but, it's still not working in "my real" situation. >>> >>> I changed a bit the "matISCSRDenseSquare/Rect.cpp" dummy example (see >>> git diff below - I just changed the point that overlaps) : the dummy >>> example is still failing >>> >>> "mpirun -n 2 ./matISCSRDenseSquare.exe csr" and "mpirun -n 2 >>> ./matISCSRDenseSquare.exe dense" : OK >>> but >>> "mpirun -n 2 ./matISCSRDenseRect.exe csr" and "mpirun -n 2 >>> ./matISCSRDenseRect.exe dense": KO with error "Argument out of range - New >>> nonzero at (0,2) caused a malloc" >>> >>> I would say, the problem (I am concerned with the "real" case) is around >>> lines 360-380 of /src/mat/impls/is/matis.c (not around 181 : this fixes a >>> valid problem, but, this problem is another one) >>> >>> Franck >>> >>> --- a/matISCSRDenseRect.cpp >>> +++ b/matISCSRDenseRect.cpp >>> @@ -18,7 +18,7 @@ int main(int argc,char **argv) { >>> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >>> >>> PetscInt localIdx[2] = {0, 0}; >>> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >>> + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} >>> else {localIdx[0] = 1; localIdx[1] = 2;} >>> ISLocalToGlobalMapping rmap; >>> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >>> PETSC_COPY_VALUES, &rmap); >>> diff --git a/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp >>> b/Graphs/Franck/03.petscDDM/02.petscMailList/matISCSRDenseSquare.cpp >>> index 4bc6190..4a6ea41 100644 >>> --- a/matISCSRDenseSquare.cpp >>> +++ b/matISCSRDenseSquare.cpp >>> @@ -18,7 +18,7 @@ int main(int argc,char **argv) { >>> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >>> >>> PetscInt localIdx[2] = {0, 0}; >>> - if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >>> + if (rank == 0) {localIdx[0] = 0; localIdx[1] = 2;} >>> else {localIdx[0] = 1; localIdx[1] = 2;} >>> ISLocalToGlobalMapping rmap; >>> ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >>> PETSC_COPY_VALUES, &rmap); >>> >>> ------------------------------ >>> >>> *De: *"Stefano Zampini" >>> *?: *"Franck Houssen" >>> *Cc: *"PETSc users list" >>> *Envoy?: *Mardi 20 Juin 2017 00:23:24 >>> >>> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >>> >>> It should be fixed now in maint and master >>> >>> https://bitbucket.org/petsc/petsc/commits/4c8dd594d1988a0cbe282f8a37d991 >>> 6f61e0c445 >>> >>> Thanks for reporting the problem, >>> Stefano >>> >>> On Jun 19, 2017, at 10:46 PM, Stefano Zampini >>> wrote: >>> >>> Franck, >>> >>> Thanks. I'll? get back soon with a fix. >>> >>> Stefano >>> >>> >>> Il 19 Giu 2017 18:17, "Franck Houssen" ha >>> scritto: >>> >>>> The problem was difficult to reduce as reducing make things >>>> disappear... Luckily, I believe I got it (or at least, it looks "like" the >>>> one I "really" have...). >>>> >>>> Seems that for square matrix, it works fine for csr and dense matrix. >>>> But, If I am not mistaken, it does not for dense rectangular matrix (still >>>> OK for csr). >>>> >>>> matISCSRDenseSquare.cpp: 2 procs, global 3x3 matrix, each proc adds a >>>> 2x2 local matrix in the global matrix. >>>> matISCSRDenseRect.cpp: 2 procs, global *2*x3 matrix, each proc adds a >>>> *1*x2 local *vector* in the global matrix. 
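(For reference, stripped of the csr/dense switch and of error checking, the pattern that both test programs exercise boils down to the sketch below. Only the 2-process square case is shown, run as "mpirun -n 2 ..."; the sizes, index maps and values are simplified for illustration, not copied from the attached sources.)

#include <petscmat.h>
// Sketch: a global 3x3 MatIS over 2 ranks; each rank contributes a 2x2 *dense*
// local block, then the assembled distributed matrix is requested with
// MatISGetMPIXAIJ. Intended to be run with exactly 2 MPI processes.
int main(int argc, char **argv) {
  PetscInitialize(&argc, &argv, NULL, NULL);
  int rank = 0; MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  // Rank 0 owns global rows/cols {0,1}, rank 1 owns {1,2} (one shared point).
  PetscInt localIdx[2];
  if (rank == 0) { localIdx[0] = 0; localIdx[1] = 1; }
  else           { localIdx[0] = 1; localIdx[1] = 2; }
  ISLocalToGlobalMapping map;
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, PETSC_COPY_VALUES, &map);

  Mat A; // global 3x3 MatIS, same local-to-global map for rows and columns
  MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, map, map, &A);

  Mat Aloc; // local 2x2 dense block (the case under discussion in this thread)
  MatCreateSeqDense(PETSC_COMM_SELF, 2, 2, NULL, &Aloc);
  PetscInt    idx[2] = {0, 1};
  PetscScalar val[4] = {1., 0., 0., 1.};
  MatSetValues(Aloc, 2, idx, 2, idx, val, ADD_VALUES);
  MatAssemblyBegin(Aloc, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(Aloc, MAT_FINAL_ASSEMBLY);

  MatISSetLocalMat(A, Aloc);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  Mat B; // assembled, distributed (MPIAIJ) version of A
  MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);
  MatView(B, PETSC_VIEWER_STDOUT_WORLD);

  MatDestroy(&B); MatDestroy(&Aloc); MatDestroy(&A);
  ISLocalToGlobalMappingDestroy(&map);
  PetscFinalize();
  return 0;
}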
>>>> >>>> reminder: running debian/testing with gcc-6.3 + petsc-3.7.6 >>>> >>>> Franck >>>> >>>> >> mpirun -n 2 ./matISCSRDenseSquare.exe csr; mpirun -n 2 >>>> ./matISCSRDenseSquare.exe dense >>>> csr >>>> csr >>>> Mat Object: 2 MPI processes >>>> type: is >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> row 0: (0, 1.) (1, 0.) >>>> row 1: (0, 0.) (1, 1.) >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> row 0: (0, 1.) (1, 0.) >>>> row 1: (0, 0.) (1, 1.) >>>> dense >>>> dense >>>> Mat Object: 2 MPI processes >>>> type: is >>>> Mat Object: 1 MPI processes >>>> type: seqdense >>>> 1.0000000000000000e+00 0.0000000000000000e+00 >>>> 0.0000000000000000e+00 1.0000000000000000e+00 >>>> Mat Object: 1 MPI processes >>>> type: seqdense >>>> 1.0000000000000000e+00 0.0000000000000000e+00 >>>> 0.0000000000000000e+00 1.0000000000000000e+00 >>>> >>>> >> mpirun -n 2 ./matISCSRDenseRect.exe csr; mpirun -n 2 >>>> ./matISCSRDenseRect.exe dense >>>> csr >>>> csr >>>> Mat Object: 2 MPI processes >>>> type: is >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> row 0: (0, 1.) (1, 0.) >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> row 0: (0, 1.) (1, 0.) >>>> dense >>>> dense >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: Argument out of range >>>> [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [1]PETSC ERROR: Argument out of range >>>> [1]PETSC ERROR: New nonzero at (0,1) caused a malloc >>>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>>> turn off this check >>>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [1]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >>>> [1]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >>>> yoda by fghoussen Mon Jun 19 18:08:58 2017 >>>> New nonzero at (0,1) caused a malloc >>>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>>> turn off this check >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. 
>>>> [0]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 >>>> [1]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >>>> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >>>> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >>>> --download-superlu=yes --download-suitesparse=yes >>>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >>>> impls/aij/mpi/mpiaij.c >>>> [1]PETSC ERROR: #2 MatSetValues() line 1190 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [1]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [1]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [1]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: ./matISCSRDenseRect.exe on a arch-linux2-c-debug named >>>> yoda by fghoussen Mon Jun 19 18:08:58 2017 >>>> [0]PETSC ERROR: Configure options --prefix=/home/fghoussen/ >>>> Documents/INRIA/petsc-3.7.6/local --with-mpi=1 --with-pthread=1 >>>> --download-f2cblaslapack=yes --download-mumps=yes --download-scalapack=yes >>>> --download-superlu=yes --download-suitesparse=yes >>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 582 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/ >>>> impls/aij/mpi/mpiaij.c >>>> [0]PETSC ERROR: #2 MatSetValues() line 1190 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #3 MatSetValuesLocal() line 2053 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #4 MatISGetMPIXAIJ_IS() line 365 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: #5 MatISGetMPIXAIJ() line 437 in >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> Mat Object: 2 MPI processes >>>> type: is >>>> Mat Object: 1 MPI processes >>>> type: seqdense >>>> 1.0000000000000000e+00 0.0000000000000000e+00 >>>> Mat Object: 1 MPI processes >>>> type: seqdense >>>> 1.0000000000000000e+00 0.0000000000000000e+00 >>>> >>>> >> diff matISCSRDenseSquare.cpp matISCSRDenseRect.cpp >>>> 3c3 >>>> < // ~> g++ -o matISCSRDenseSquare.exe matISCSRDenseSquare.cpp -lpetsc >>>> -lm; mpirun -n 2 matISCSRDenseSquare.exe >>>> --- >>>> > // ~> g++ -o matISCSRDenseRect.exe matISCSRDenseRect.cpp -lpetsc -lm; >>>> mpirun -n 2 matISCSRDenseRect.exe >>>> 24c24 >>>> < ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, localIdx, >>>> PETSC_COPY_VALUES, &rmap); >>>> --- >>>> > ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 1, &rank, >>>> PETSC_COPY_VALUES, &rmap); >>>> 29c29 >>>> < MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, >>>> rmap, cmap, &A); >>>> --- >>>> > MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 2, 3, >>>> rmap, cmap, &A); >>>> 32,33c32,33 >>>> < if (matType == "csr") {cout << matType << endl; >>>> MatCreateSeqAIJ(PETSC_COMM_SELF, 2, 2, 2, NULL, &Aloc);} >>>> < else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, >>>> 2, 2, NULL, &Aloc);} >>>> --- >>>> > if (matType == "csr") {cout << matType << endl; >>>> MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 2, 2, NULL, &Aloc);} >>>> > else {cout << matType << endl; MatCreateSeqDense(PETSC_COMM_SELF, >>>> 1, 2, NULL, &Aloc);} >>>> 35,36c35,36 >>>> < 
PetscScalar localVal[4] = {1., 0., 0., 1.}; >>>> < MatSetValues(Aloc, 2, localIdx, 2, localIdx, localVal, ADD_VALUES); >>>> // Add local 2x2 matrix >>>> --- >>>> > PetscScalar localVal[2] = {1., 0.}; PetscInt oneLocalRow = 0; >>>> > MatSetValues(Aloc, 1, &oneLocalRow, 2, localIdx, localVal, >>>> ADD_VALUES); // Add local row >>>> >>>> ------------------------------ >>>> >>>> *De: *"Stefano Zampini" >>>> *?: *"Franck Houssen" >>>> *Cc: *"PETSc users list" >>>> *Envoy?: *Lundi 19 Juin 2017 15:25:35 >>>> *Objet: *Re: [petsc-users] Building MatIS with dense local matrix ? >>>> >>>> Can you send a minimal working example so that I can fix the code? >>>> >>>> Thanks >>>> Stefano >>>> >>>> Il 19 Giu 2017 15:20, "Franck Houssen" ha >>>> scritto: >>>> >>>>> Hi, >>>>> >>>>> I try to call MatISGetMPIXAIJ on a MatIS (A) that has been feed >>>>> locally by sequential (Aloc) dense matrix. >>>>> Seems this ends up with this error: [0]PETSC ERROR: New nonzero at >>>>> (0,1) caused a malloc. Is this a known error / limitation ? (not supposed >>>>> to work with dense matrix ?) >>>>> >>>>> This (pseudo code) works fine: >>>>> MatCreateIS(..., A) >>>>> MatCreateSeqAIJ(..., Aloc) >>>>> MatISSetLocalMat(pcA, pcALoc) >>>>> MatISGetMPIXAIJ(A, ...) // OK ! >>>>> >>>>> When I try to replace MatCreateSeqAIJ(..., Aloc) with >>>>> MatCreateSeqDense(..., Aloc), it does no more work. >>>>> >>>>> Franck >>>>> >>>>> PS: running debian/testing with gcc-6.3 + petsc-3.7.6 >>>>> >>>> >>>> >>> >>> >> >> >> -- >> Stefano >> >> >> > > > -- > Stefano > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Tue Jun 20 22:41:44 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Wed, 21 Jun 2017 15:41:44 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> Message-ID: <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> On 16/06/17 11:37, Adrian Croucher wrote: > On 16/06/17 01:19, Matthew Knepley wrote: >> >> Thanks for those ideas, very helpful. >> >> If I try this approach (forming whole Jacobian matrix and using >> PCFieldSplit Schur), I guess I will first need to set up a >> modified DMPlex for the whole fracture + matrix mesh- so I can >> use it to create vectors and the Jacobian matrix (with the right >> sparsity pattern), and also to work out the coloring for finite >> differencing. >> >> Would that be straight-forward to do? Currently my DM just comes >> from DMPlexCreateFromFile(). Presumably you can use >> DMPlexInsertCone() or similar to add points into it? >> >> >> You can certainly modify the mesh. I need to get a better idea what >> kind of modification, and then >> I can suggest a way to do it. What do you start with, and what >> exactly do you want to add? >> >> > The way dual porosity is normally implemented in a finite volume > context is to add an extra matrix rock cell 'inside' each of the > original cells (which now represent the fractures, and have their > volumes reduced accordingly), with a connection between the fracture > cell and its corresponding matrix rock cell, so fluid can flow between > them. > > More generally there can be multiple matrix rock cells for each > fracture cell, in which case further matrix rock cells are nested > inside the first one, again with connections between them. 
There are > formulae for computing the appropriate effective matrix rock cell > volumes and connection areas, typically based on a 'fracture spacing' > parameter which determines how fractured the rock is. > > So in a DMPlex context it would mean somehow adding extra DAG points > representing the internal cells and faces for each of the original > cells. I'm not sure how that would be done. I've been having a go at this- copying the cones and supports from the original DM into a new DM, which will represent the topology of the dual porosity mesh, but adding one new cell for each original cell, and a face connecting the two. I've based my attempt loosely on some of the stuff in SplitFaces() in TS ex11.c. It isn't working though, as yet- when I construct the new DM and view it, the numbers of cells, faces etc. don't make sense, and the depth label says it's got 7 strata, where it should still just have 4 like the original DM. I may have just messed it up somehow, but it occurred to me that maybe DMPlex won't like having bits of its DAG having different depths? With what I've been trying, the dual porosity parts of the mesh would just specify DAG points representing the new cells and the corresponding faces, but not for any edges or vertices of those faces. I hoped that would be sufficient for generating the nonzero structure of the Jacobian matrix. But I don't know if DMPlex will allow a DAG like that? - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed Jun 21 05:41:10 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 21 Jun 2017 12:41:10 +0200 (CEST) Subject: [petsc-users] Building MatIS with dense local matrix ? In-Reply-To: References: <59634672.7742377.1497877797395.JavaMail.zimbra@inria.fr> <2005369855.8101317.1497956305467.JavaMail.zimbra@inria.fr> <720133125.8242944.1497969058541.JavaMail.zimbra@inria.fr> <421697287.8283455.1497972679234.JavaMail.zimbra@inria.fr> Message-ID: <163999.8548533.1498041670913.JavaMail.zimbra@inria.fr> Yes... And no ! The MatISGetMPIXAIJ call seems to work and to return a MATMPI (correct ? The returned matrix is distributed ?) But, after that, I hit another problem when using this matrix. Still working on that... Franck ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "PETSc users list" > Envoy?: Mardi 20 Juin 2017 20:45:47 > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > Franck, > is it your "real" case working properly now? > Stefano > 2017-06-20 17:31 GMT+02:00 Franck Houssen < franck.houssen at inria.fr > : > > OK. > > > Franck > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Cc: "PETSc users list" < petsc-users at mcs.anl.gov > > > > > > > Envoy?: Mardi 20 Juin 2017 17:08:29 > > > > > > Objet: Re: [petsc-users] Building MatIS with dense local matrix ? > > > > > > It should be fixed right now, both in master and in maint. > > > > > > Again, sorry for this ping-pong of fixes, my brain it's not fully > > > functional > > > these days.... > > > > > > https://bitbucket.org/petsc/petsc/commits/c6f20c4fa7817632f09219574920bd3bd922f6f1 > > > > > > 2017-06-20 16:30 GMT+02:00 Franck Houssen < franck.houssen at inria.fr > : > > > > > > > OK. 
> > I moved from petsc-3.7.6 to development version (git clone bitbucket). [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From knepley at gmail.com Wed Jun 21 06:12:31 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 21 Jun 2017 06:12:31 -0500 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> Message-ID: On Tue, Jun 20, 2017 at 10:41 PM, Adrian Croucher wrote: > On 16/06/17 11:37, Adrian Croucher wrote: > > On 16/06/17 01:19, Matthew Knepley wrote: > > > Thanks for those ideas, very helpful. >> >> If I try this approach (forming whole Jacobian matrix and using >> PCFieldSplit Schur), I guess I will first need to set up a modified DMPlex >> for the whole fracture + matrix mesh- so I can use it to create vectors and >> the Jacobian matrix (with the right sparsity pattern), and also to work out >> the coloring for finite differencing. >> >> Would that be straight-forward to do? Currently my DM just comes from >> DMPlexCreateFromFile(). Presumably you can use DMPlexInsertCone() or >> similar to add points into it? > > > You can certainly modify the mesh. I need to get a better idea what kind > of modification, and then > I can suggest a way to do it. What do you start with, and what exactly do > you want to add? > > > The way dual porosity is normally implemented in a finite volume context > is to add an extra matrix rock cell 'inside' each of the original cells > (which now represent the fractures, and have their volumes reduced > accordingly), with a connection between the fracture cell and its > corresponding matrix rock cell, so fluid can flow between them. > > More generally there can be multiple matrix rock cells for each fracture > cell, in which case further matrix rock cells are nested inside the first > one, again with connections between them. There are formulae for computing > the appropriate effective matrix rock cell volumes and connection areas, > typically based on a 'fracture spacing' parameter which determines how > fractured the rock is. > > So in a DMPlex context it would mean somehow adding extra DAG points > representing the internal cells and faces for each of the original cells. > I'm not sure how that would be done. > > > I've been having a go at this- copying the cones and supports from the > original DM into a new DM, which will represent the topology of the dual > porosity mesh, but adding one new cell for each original cell, and a face > connecting the two. I've based my attempt loosely on some of the stuff in > SplitFaces() in TS ex11.c. > > It isn't working though, as yet- when I construct the new DM and view it, > the numbers of cells, faces etc. don't make sense, and the depth label says > it's got 7 strata, where it should still just have 4 like the original DM. > > I may have just messed it up somehow, but it occurred to me that maybe > DMPlex won't like having bits of its DAG having different depths? With what > I've been trying, the dual porosity parts of the mesh would just specify > DAG points representing the new cells and the corresponding faces, but not > for any edges or vertices of those faces. I hoped that would be sufficient > for generating the nonzero structure of the Jacobian matrix. But I don't > know if DMPlex will allow a DAG like that? > I was going to write you, but I am moving houses and its very hectic. 
>From the above description, its not clear to me that you want this topology. For example, in the case that each cell gets an internal cell, it seems like you could just handle this in the function space for each cell. Even for multiple internal cells, it seems like function space parameterization is a better option. Topology is supposed to indicate the support of basis functions, but here that job is done. I would just treat it as some sort of augmented approximation space. Thanks, Matt > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 <+64%209-923%204611> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed Jun 21 08:00:33 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 21 Jun 2017 15:00:33 +0200 (CEST) Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> Message-ID: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> How to compute RARt with A and R as distributed (MPI) matrices ? This works with sequential matrices. The doc say "currently only implemented for pairs of AIJ matrices and classes which inherit from AIJ": I supposed that MPIAIJ was someway inheriting from AIJ, seems that it doesn't. Is this kind of matrix product possible with distributed matrices in PETSc ? Or is this a known limitation ? Do I go the wrong way to do that (= should use another method) ? If yes, what is the correct one ? Franck PS: running debian/testing + gcc-6.3 + bitbucket petsc. >> mpirun -n 2 matRARt.exe seq Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) (1, 0.) row 1: (0, 0.) (1, 1.) >> mpirun -n 2 matRARt.exe mpi [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix of type does not support RARt -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matRARt.cpp Type: text/x-c++src Size: 1613 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Wed Jun 21 08:11:27 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 21 Jun 2017 13:11:27 +0000 Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> References: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> Message-ID: You can assemble R^t and then use MatPtAP which supports MPIAIJ On Wed, 21 Jun 2017 at 15:00, Franck Houssen wrote: > How to compute RARt with A and R as distributed (MPI) matrices ? > > This works with sequential matrices. > The doc say "currently only implemented for pairs of AIJ matrices and > classes which inherit from AIJ": I supposed that MPIAIJ was someway > inheriting from AIJ, seems that it doesn't. > > Is this kind of matrix product possible with distributed matrices in PETSc > ? Or is this a known limitation ? 
> Do I go the wrong way to do that (= should use another method) ? If yes, > what is the correct one ? > > Franck > > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > > >> mpirun -n 2 matRARt.exe seq > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > > >> mpirun -n 2 matRARt.exe mpi > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix of type does not support RARt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed Jun 21 10:45:06 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 21 Jun 2017 17:45:06 +0200 (CEST) Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: References: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> Message-ID: <495721115.8685207.1498059906185.JavaMail.zimbra@inria.fr> I read the doc, and googled this before to write a dummy example. So, matRARt is only for sequential matrices, and, matPtAP for distributed ones : correct ? (means, I "just" used the "wrong" method) If so, this is a major difference which does not seem (to me) to be emphasized enough in the doc, in particular for a new user (the matRARt and matPtAP pages are the same with R replaced with Pt). Where was it possible to find this information ? Franck ----- Mail original ----- > De: "Dave May" > ?: "Franck Houssen" , "PETSc users list" > , "petsc-dev" > Envoy?: Mercredi 21 Juin 2017 15:11:27 > Objet: Re: [petsc-users] How to compute RARt with A and R as distributed > (MPI) matrices ? > You can assemble R^t and then use MatPtAP which supports MPIAIJ > On Wed, 21 Jun 2017 at 15:00, Franck Houssen < franck.houssen at inria.fr > > wrote: > > How to compute RARt with A and R as distributed (MPI) matrices ? > > > This works with sequential matrices. > > > The doc say "currently only implemented for pairs of AIJ matrices and > > classes > > which inherit from AIJ": I supposed that MPIAIJ was someway inheriting from > > AIJ, seems that it doesn't. > > > Is this kind of matrix product possible with distributed matrices in PETSc > > ? > > Or is this a known limitation ? > > > Do I go the wrong way to do that (= should use another method) ? If yes, > > what > > is the correct one ? > > > Franck > > > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > > > >> mpirun -n 2 matRARt.exe seq > > > Mat Object: 1 MPI processes > > > type: seqaij > > > row 0: (0, 1.) (1, 0.) > > > row 1: (0, 0.) (1, 1.) > > > >> mpirun -n 2 matRARt.exe mpi > > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Matrix of type does not support RARt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 21 11:30:17 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 21 Jun 2017 10:30:17 -0600 Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? 
In-Reply-To: <495721115.8685207.1498059906185.JavaMail.zimbra@inria.fr> References: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <495721115.8685207.1498059906185.JavaMail.zimbra@inria.fr> Message-ID: <878tklxq06.fsf@jedbrown.org> It is only implemented for SeqAIJ, but it should arguably have a default implementation that performs an explicit transpose and then calls MatMatMatMult or MatPtAP. Franck Houssen writes: > I read the doc, and googled this before to write a dummy example. So, matRARt is only for sequential matrices, and, matPtAP for distributed ones : correct ? (means, I "just" used the "wrong" method) > If so, this is a major difference which does not seem (to me) to be emphasized enough in the doc, in particular for a new user (the matRARt and matPtAP pages are the same with R replaced with Pt). Where was it possible to find this information ? > > Franck > > ----- Mail original ----- > >> De: "Dave May" >> ?: "Franck Houssen" , "PETSc users list" >> , "petsc-dev" >> Envoy?: Mercredi 21 Juin 2017 15:11:27 >> Objet: Re: [petsc-users] How to compute RARt with A and R as distributed >> (MPI) matrices ? > >> You can assemble R^t and then use MatPtAP which supports MPIAIJ > >> On Wed, 21 Jun 2017 at 15:00, Franck Houssen < franck.houssen at inria.fr > >> wrote: > >> > How to compute RARt with A and R as distributed (MPI) matrices ? >> > >> > This works with sequential matrices. >> >> > The doc say "currently only implemented for pairs of AIJ matrices and >> > classes >> > which inherit from AIJ": I supposed that MPIAIJ was someway inheriting from >> > AIJ, seems that it doesn't. >> > >> > Is this kind of matrix product possible with distributed matrices in PETSc >> > ? >> > Or is this a known limitation ? >> >> > Do I go the wrong way to do that (= should use another method) ? If yes, >> > what >> > is the correct one ? >> > >> > Franck >> > >> > PS: running debian/testing + gcc-6.3 + bitbucket petsc. >> > >> > >> mpirun -n 2 matRARt.exe seq >> >> > Mat Object: 1 MPI processes >> >> > type: seqaij >> >> > row 0: (0, 1.) (1, 0.) >> >> > row 1: (0, 0.) (1, 1.) >> > >> > >> mpirun -n 2 matRARt.exe mpi >> >> > [0]PETSC ERROR: --------------------- Error Message >> > -------------------------------------------------------------- >> >> > [0]PETSC ERROR: No support for this operation for this object type >> >> > [0]PETSC ERROR: Matrix of type does not support RARt >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From hzhang at mcs.anl.gov Wed Jun 21 11:52:28 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 21 Jun 2017 11:52:28 -0500 Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <878tklxq06.fsf@jedbrown.org> References: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <495721115.8685207.1498059906185.JavaMail.zimbra@inria.fr> <878tklxq06.fsf@jedbrown.org> Message-ID: Jed : > It is only implemented for SeqAIJ, but it should arguably have a default > implementation that performs an explicit transpose and then calls > MatMatMatMult or MatPtAP. > We can implement P= R^T (we have this for mpi matrices, expensive though) C = R*A*P Hong > Franck Houssen writes: > > > I read the doc, and googled this before to write a dummy example. 
So, > matRARt is only for sequential matrices, and, matPtAP for distributed ones > : correct ? (means, I "just" used the "wrong" method) > > If so, this is a major difference which does not seem (to me) to be > emphasized enough in the doc, in particular for a new user (the matRARt and > matPtAP pages are the same with R replaced with Pt). Where was it possible > to find this information ? > > > > Franck > > > > ----- Mail original ----- > > > >> De: "Dave May" > >> ?: "Franck Houssen" , "PETSc users list" > >> , "petsc-dev" > >> Envoy?: Mercredi 21 Juin 2017 15:11:27 > >> Objet: Re: [petsc-users] How to compute RARt with A and R as distributed > >> (MPI) matrices ? > > > >> You can assemble R^t and then use MatPtAP which supports MPIAIJ > > > >> On Wed, 21 Jun 2017 at 15:00, Franck Houssen < franck.houssen at inria.fr > > > >> wrote: > > > >> > How to compute RARt with A and R as distributed (MPI) matrices ? > >> > > > >> > This works with sequential matrices. > >> > >> > The doc say "currently only implemented for pairs of AIJ matrices and > >> > classes > >> > which inherit from AIJ": I supposed that MPIAIJ was someway > inheriting from > >> > AIJ, seems that it doesn't. > >> > > > >> > Is this kind of matrix product possible with distributed matrices in > PETSc > >> > ? > >> > Or is this a known limitation ? > >> > >> > Do I go the wrong way to do that (= should use another method) ? If > yes, > >> > what > >> > is the correct one ? > >> > > > >> > Franck > >> > > > >> > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > >> > > > >> > >> mpirun -n 2 matRARt.exe seq > >> > >> > Mat Object: 1 MPI processes > >> > >> > type: seqaij > >> > >> > row 0: (0, 1.) (1, 0.) > >> > >> > row 1: (0, 0.) (1, 1.) > >> > > > >> > >> mpirun -n 2 matRARt.exe mpi > >> > >> > [0]PETSC ERROR: --------------------- Error Message > >> > -------------------------------------------------------------- > >> > >> > [0]PETSC ERROR: No support for this operation for this object type > >> > >> > [0]PETSC ERROR: Matrix of type does not support RARt > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 21 12:08:08 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 21 Jun 2017 11:08:08 -0600 Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: References: <1770269664.8592221.1498049231017.JavaMail.zimbra@inria.fr> <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <495721115.8685207.1498059906185.JavaMail.zimbra@inria.fr> <878tklxq06.fsf@jedbrown.org> Message-ID: <8760fpxo93.fsf@jedbrown.org> Hong writes: > Jed : > >> It is only implemented for SeqAIJ, but it should arguably have a default >> implementation that performs an explicit transpose and then calls >> MatMatMatMult or MatPtAP. >> > > We can implement > P= R^T (we have this for mpi matrices, expensive though) > C = R*A*P Exactly. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Jun 21 12:53:29 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 21 Jun 2017 12:53:29 -0500 Subject: [petsc-users] How to compute RARt with A and R as distributed (MPI) matrices ? 
In-Reply-To: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> Message-ID: > On Jun 21, 2017, at 8:00 AM, Franck Houssen wrote: > > How to compute RARt with A and R as distributed (MPI) matrices ? > > This works with sequential matrices. > The doc say "currently only implemented for pairs of AIJ matrices and classes which inherit from AIJ": I supposed that MPIAIJ was someway inheriting from AIJ, seems that it doesn't. Yes, when we say AIJ we mean both SeqAIJ and MPIAIJ. The manual page here is wrong, probably because it got copied from MatPtAP page > > Is this kind of matrix product possible with distributed matrices in PETSc ? Or is this a known limitation ? > Do I go the wrong way to do that (= should use another method) ? If yes, what is the correct one ? > > Franck > > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > > >> mpirun -n 2 matRARt.exe seq > Mat Object: 1 MPI processes > type: seqaij > row 0: (0, 1.) (1, 0.) > row 1: (0, 0.) (1, 1.) > > >> mpirun -n 2 matRARt.exe mpi > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix of type does not support RARt > > From hzhang at mcs.anl.gov Wed Jun 21 14:41:33 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 21 Jun 2017 14:41:33 -0500 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> Message-ID: I add support for MatRARt_MPIAIJ https://bitbucket.org/petsc/petsc/commits/a3c14138ce0daf4ee55c7c10f1b4631a8ed2f13e It is in branch hzhang/mpirart. Let me know your comments. Hong On Wed, Jun 21, 2017 at 12:53 PM, Barry Smith wrote: > > > On Jun 21, 2017, at 8:00 AM, Franck Houssen > wrote: > > > > How to compute RARt with A and R as distributed (MPI) matrices ? > > > > This works with sequential matrices. > > The doc say "currently only implemented for pairs of AIJ matrices and > classes which inherit from AIJ": I supposed that MPIAIJ was someway > inheriting from AIJ, seems that it doesn't. > > Yes, when we say AIJ we mean both SeqAIJ and MPIAIJ. The manual page > here is wrong, probably because it got copied from MatPtAP page > > > > > Is this kind of matrix product possible with distributed matrices in > PETSc ? Or is this a known limitation ? > > Do I go the wrong way to do that (= should use another method) ? If yes, > what is the correct one ? > > > > Franck > > > > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > > > > >> mpirun -n 2 matRARt.exe seq > > Mat Object: 1 MPI processes > > type: seqaij > > row 0: (0, 1.) (1, 0.) > > row 1: (0, 0.) (1, 1.) > > > > >> mpirun -n 2 matRARt.exe mpi > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Matrix of type does not support RARt > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 21 15:15:03 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 21 Jun 2017 14:15:03 -0600 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? 
In-Reply-To: References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> Message-ID: <87r2ydw114.fsf@jedbrown.org> Hong writes: > I add support for MatRARt_MPIAIJ > https://bitbucket.org/petsc/petsc/commits/a3c14138ce0daf4ee55c7c10f1b4631a8ed2f13e Is it better this way or as a fallback when !A->ops->rart? MatPtAP handles other combinations like MAIJ. > It is in branch hzhang/mpirart. > Let me know your comments. > > Hong > > On Wed, Jun 21, 2017 at 12:53 PM, Barry Smith wrote: > >> >> > On Jun 21, 2017, at 8:00 AM, Franck Houssen >> wrote: >> > >> > How to compute RARt with A and R as distributed (MPI) matrices ? >> > >> > This works with sequential matrices. >> > The doc say "currently only implemented for pairs of AIJ matrices and >> classes which inherit from AIJ": I supposed that MPIAIJ was someway >> inheriting from AIJ, seems that it doesn't. >> >> Yes, when we say AIJ we mean both SeqAIJ and MPIAIJ. The manual page >> here is wrong, probably because it got copied from MatPtAP page >> >> > >> > Is this kind of matrix product possible with distributed matrices in >> PETSc ? Or is this a known limitation ? >> > Do I go the wrong way to do that (= should use another method) ? If yes, >> what is the correct one ? >> > >> > Franck >> > >> > PS: running debian/testing + gcc-6.3 + bitbucket petsc. >> > >> > >> mpirun -n 2 matRARt.exe seq >> > Mat Object: 1 MPI processes >> > type: seqaij >> > row 0: (0, 1.) (1, 0.) >> > row 1: (0, 0.) (1, 1.) >> > >> > >> mpirun -n 2 matRARt.exe mpi >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: No support for this operation for this object type >> > [0]PETSC ERROR: Matrix of type does not support RARt >> > >> > >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From hzhang at mcs.anl.gov Wed Jun 21 18:26:05 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 21 Jun 2017 18:26:05 -0500 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <87r2ydw114.fsf@jedbrown.org> References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <87r2ydw114.fsf@jedbrown.org> Message-ID: Jed : > > Is it better this way or as a fallback when !A->ops->rart? MatPtAP > handles other combinations like MAIJ. > Do you mean if ( !A->ops->rart) { Mat Rt; ierr = MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);CHKERRQ(ierr); ierr = MatMatMatMult(R,A,Rt,scall,fill,C);CHKERRQ(ierr); ierr = MatDestroy(&Rt);CHKERRQ(ierr); } This does NOT work for most matrix formats because we do not have fallbacks for MatTranspose() and MatMatMult(). Hong > > > > > On Wed, Jun 21, 2017 at 12:53 PM, Barry Smith > wrote: > > > >> > >> > On Jun 21, 2017, at 8:00 AM, Franck Houssen > >> wrote: > >> > > >> > How to compute RARt with A and R as distributed (MPI) matrices ? > >> > > >> > This works with sequential matrices. > >> > The doc say "currently only implemented for pairs of AIJ matrices and > >> classes which inherit from AIJ": I supposed that MPIAIJ was someway > >> inheriting from AIJ, seems that it doesn't. > >> > >> Yes, when we say AIJ we mean both SeqAIJ and MPIAIJ. The manual > page > >> here is wrong, probably because it got copied from MatPtAP page > >> > >> > > >> > Is this kind of matrix product possible with distributed matrices in > >> PETSc ? Or is this a known limitation ? 
> >> > Do I go the wrong way to do that (= should use another method) ? If > yes, > >> what is the correct one ? > >> > > >> > Franck > >> > > >> > PS: running debian/testing + gcc-6.3 + bitbucket petsc. > >> > > >> > >> mpirun -n 2 matRARt.exe seq > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > row 0: (0, 1.) (1, 0.) > >> > row 1: (0, 0.) (1, 1.) > >> > > >> > >> mpirun -n 2 matRARt.exe mpi > >> > [0]PETSC ERROR: --------------------- Error Message > >> -------------------------------------------------------------- > >> > [0]PETSC ERROR: No support for this operation for this object type > >> > [0]PETSC ERROR: Matrix of type does not support RARt > >> > > >> > > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.lefley at aclectic.com Wed Jun 21 20:12:51 2017 From: jason.lefley at aclectic.com (Jason Lefley) Date: Wed, 21 Jun 2017 18:12:51 -0700 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem Message-ID: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> Hello, We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we?d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we?d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. 
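For reference, operator rows in this kind of cell-centered, ex34-style setup are normally set with MatSetValuesStencil() inside the usual DMDAGetCorners() loop; a hypothetical interior 7-point row is sketched below (illustration only, not the actual modified ComputeMatrix; i, j, k are the loop indices, Hx, Hy, Hz the cell spacings, B the preconditioner matrix, and boundary rows, where the Dirichlet data is folded into the right hand side, are omitted):

  MatStencil  row, col[7];
  PetscScalar v[7];
  row.i = i; row.j = j; row.k = k;
  col[0].i = i-1; col[0].j = j;   col[0].k = k;   v[0] = -Hy*Hz/Hx;
  col[1].i = i+1; col[1].j = j;   col[1].k = k;   v[1] = -Hy*Hz/Hx;
  col[2].i = i;   col[2].j = j-1; col[2].k = k;   v[2] = -Hx*Hz/Hy;
  col[3].i = i;   col[3].j = j+1; col[3].k = k;   v[3] = -Hx*Hz/Hy;
  col[4].i = i;   col[4].j = j;   col[4].k = k-1; v[4] = -Hx*Hy/Hz;
  col[5].i = i;   col[5].j = j;   col[5].k = k+1; v[5] = -Hx*Hy/Hz;
  col[6].i = i;   col[6].j = j;   col[6].k = k;   v[6] = 2.0*(Hy*Hz/Hx + Hx*Hz/Hy + Hx*Hy/Hz); /* diagonal = negative sum of the off-diagonals */
  ierr = MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES);CHKERRQ(ierr);

followed by MatAssemblyBegin()/MatAssemblyEnd() once the loop is done.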
Here's the tests we ran with ex45:

mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129
time in KSPSolve(): 7.0178e+00
solver iterations: 157
KSP final norm of residual: 3.16874e-05

mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none
time in KSPSolve(): 4.1072e+00
solver iterations: 213
KSP final norm of residual: 0.000138866

mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg
time in KSPSolve(): 3.3962e+00
solver iterations: 88
KSP final norm of residual: 6.46242e-05

mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
time in KSPSolve(): 1.3201e+00
solver iterations: 4
KSP final norm of residual: 8.13339e-05

mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
time in KSPSolve(): 1.3008e+00
solver iterations: 4
KSP final norm of residual: 2.21474e-05

We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations.

We ran our code (modified ex34.c as described above) with the same parameters:

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
time in KSPSolve(): 5.3729e+00
solver iterations: 123
KSP final norm of residual: 0.00595066

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none
time in KSPSolve(): 3.6154e+00
solver iterations: 188
KSP final norm of residual: 0.00505943

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg
time in KSPSolve(): 3.5661e+00
solver iterations: 98
KSP final norm of residual: 0.00967462

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
time in KSPSolve(): 4.5606e+00
solver iterations: 44
KSP final norm of residual: 949.553

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
time in KSPSolve(): 1.5481e+01
solver iterations: 198
KSP final norm of residual: 0.916558

We performed all tests with petsc-3.7.6.

The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn't seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok.

We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn't observe any significant improvements over the results above.
Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist. Thanks From jed at jedbrown.org Wed Jun 21 22:26:57 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 21 Jun 2017 21:26:57 -0600 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <87r2ydw114.fsf@jedbrown.org> Message-ID: <87d19wwvlq.fsf@jedbrown.org> Hong writes: > Jed : > >> >> Is it better this way or as a fallback when !A->ops->rart? MatPtAP >> handles other combinations like MAIJ. >> > > Do you mean > if ( !A->ops->rart) { > Mat Rt; > ierr = MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);CHKERRQ(ierr); > ierr = MatMatMatMult(R,A,Rt,scall,fill,C);CHKERRQ(ierr); > ierr = MatDestroy(&Rt);CHKERRQ(ierr); > } > This does NOT work for most matrix formats because we do not have fallbacks > for MatTranspose() and MatMatMult(). That's fine; they'll trigger an error and we'll be able to see from the stack that it can be made to work by either implementing the appropriate MatRARt or MatTranspose and MatMatMatMult. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Wed Jun 21 23:09:57 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 22 Jun 2017 04:09:57 +0000 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> Message-ID: Please send your modified version of ex34. It will be faster to examine the source and experiment with option choices locally rather than sending emails back and forth. Thanks, Dave On Thu, 22 Jun 2017 at 03:13, Jason Lefley wrote: > Hello, > > We are attempting to use the PETSc KSP solver framework in a fluid > dynamics simulation we developed. The solution is part of a pressure > projection and solves a Poisson problem. We use a cell-centered layout with > a regular grid in 3d. We started with ex34.c from the KSP tutorials since > it has the correct calls for the 3d DMDA, uses a cell-centered layout, and > states that it works with multi-grid. We modified the operator construction > function to match the coefficients and Dirichlet boundary conditions used > in our problem (we?d also like to support Neumann but left those out for > now to keep things simple). As a result of the modified boundary > conditions, our version does not perform a null space removal on the right > hand side or operator as the original did. We also modified the right hand > side to contain a sinusoidal pattern for testing. Other than these changes, > our code is the same as the original ex34.c > > With the default KSP options and using CG with the default pre-conditioner > and without a pre-conditioner, we see good convergence. However, we?d like > to accelerate the time to solution further and target larger problem sizes > (>= 1024^3) if possible. Given these objectives, multi-grid as a > pre-conditioner interests us. 
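On the question above about checking the coefficient matrices built for the different multi-grid levels: once KSPSetUp() has run, the per-level operators can be pulled out of the PCMG hierarchy and inspected directly. A rough sketch, assuming a KSP named ksp that has already been configured with -pc_type mg or gamg (the names here are illustrative):

  PC       pc;
  PetscInt nlevels, l;
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCMGGetLevels(pc, &nlevels);CHKERRQ(ierr);
  for (l = 0; l < nlevels; l++) {  /* level 0 is the coarsest */
    KSP       lksp;
    Mat       Amat, Pmat;
    PetscInt  m, n;
    PetscReal nrm;
    ierr = PCMGGetSmoother(pc, l, &lksp);CHKERRQ(ierr);
    ierr = KSPGetOperators(lksp, &Amat, &Pmat);CHKERRQ(ierr);
    ierr = MatGetSize(Amat, &m, &n);CHKERRQ(ierr);
    ierr = MatNorm(Amat, NORM_FROBENIUS, &nrm);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "level %D: %D x %D  ||A||_F = %g\n", l, m, n, (double)nrm);CHKERRQ(ierr);
  }

For small grids, MatView(Amat, PETSC_VIEWER_STDOUT_WORLD) on each level shows the entries themselves.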
To understand the improvement that multi-grid > provides, we ran ex45 from the KSP tutorials. ex34 with CG and no > pre-conditioner appears to converge in a single iteration and we wanted to > compare against a problem that has similar convergence patterns to our > problem. Here?s the tests we ran with ex45: > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 > time in KSPSolve(): 7.0178e+00 > solver iterations: 157 > KSP final norm of residual: 3.16874e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type > cg -pc_type none > time in KSPSolve(): 4.1072e+00 > solver iterations: 213 > KSP final norm of residual: 0.000138866 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type > cg > time in KSPSolve(): 3.3962e+00 > solver iterations: 88 > KSP final norm of residual: 6.46242e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type > mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 > -mg_levels_pc_type bjacobi > time in KSPSolve(): 1.3201e+00 > solver iterations: 4 > KSP final norm of residual: 8.13339e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type > mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 > -mg_levels_pc_type bjacobi -ksp_type cg > time in KSPSolve(): 1.3008e+00 > solver iterations: 4 > KSP final norm of residual: 2.21474e-05 > > We found the multi-grid pre-conditioner options in the KSP tutorials > makefile. These results make sense; both the default GMRES and CG solvers > converge and CG without a pre-conditioner takes more iterations. The > multi-grid pre-conditioned runs are pretty dramatically accelerated and > require only a handful of iterations. > > We ran our code (modified ex34.c as described above) with the same > parameters: > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > time in KSPSolve(): 5.3729e+00 > solver iterations: 123 > KSP final norm of residual: 0.00595066 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_type cg -pc_type none > time in KSPSolve(): 3.6154e+00 > solver iterations: 188 > KSP final norm of residual: 0.00505943 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_type cg > time in KSPSolve(): 3.5661e+00 > solver iterations: 98 > KSP final norm of residual: 0.00967462 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson > -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi > time in KSPSolve(): 4.5606e+00 > solver iterations: 44 > KSP final norm of residual: 949.553 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson > -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg > time in KSPSolve(): 1.5481e+01 > solver iterations: 198 > KSP final norm of residual: 0.916558 > > We performed all tests with petsc-3.7.6. > > The trends with CG and GMRES seem consistent with the results from ex45. > However, with multi-grid, something doesn?t seem right. Convergence seems > poor and the solves run for many more iterations than ex45 with multi-grid > as a pre-conditioner. I extensively validated the code that builds the > matrix and also confirmed that the solution produced by CG, when evaluated > with the system of equations elsewhere in our simulation, produces the same > residual as indicated by PETSc. 
Given that we only made minimal > modifications to the original example code, it seems likely that the > operators constructed for the multi-grid levels are ok. > > We also tried a variety of other suggested parameters for the multi-grid > pre-conditioner as suggested in related mailing list posts but we didn?t > observe any significant improvements over the results above. > > Is there anything we can do to check the validity of the coefficient > matrices built for the different multi-grid levels? Does it look like there > could be problems there? Or any other suggestions to achieve better results > with multi-grid? I have the -log_view, -ksp_view, and convergence monitor > output from the above tests and can post any of it if it would assist. > > Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Thu Jun 22 00:59:00 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 22 Jun 2017 17:59:00 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> Message-ID: <638c328a-ade6-dbb6-88f1-24c9372d5178@auckland.ac.nz> On 21/06/17 23:12, Matthew Knepley wrote: > > From the above description, its not clear to me that you want this > topology. For example, in the case that each cell gets an internal cell, > it seems like you could just handle this in the function space for > each cell. Even for multiple internal cells, it seems like function space > parameterization is a better option. Topology is supposed to indicate > the support of basis functions, but here that job is done. I would > just treat it as some sort of augmented approximation space. If I understand what you mean, I considered doing something like that- basically just defining extra degrees of freedom in the cells where dual porosity is to be applied. It seemed to me that if I then went ahead and created the Jacobian matrix using DMCreateMatrix(), it would give me extra nonzero entries that shouldn't be there - interactions between the dual porosity variables in neighbouring cells. Is there any way to avoid that? - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Thu Jun 22 03:35:16 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Thu, 22 Jun 2017 10:35:16 +0200 (CEST) Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <87d19wwvlq.fsf@jedbrown.org> References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <87r2ydw114.fsf@jedbrown.org> <87d19wwvlq.fsf@jedbrown.org> Message-ID: <803640737.8847135.1498120516474.JavaMail.zimbra@inria.fr> Thanks, I wouldn't have expected so much ! I have to say I am too far from the code to get all the details "behind" this (I understand these details may be numerous and tricky - no problem). If I got it well, in a near future, MatRARt could support both sequential and distributed matrices, and, MatPtAP will support only distributed ones, correct ? But, the web pages are still identical and the manual.pdf does not mention these functions. 
So most (rookies, not so experienced) users may still be lost, right ? A user can understand there are several methods available for different reasons, but, he needs to know why. I just expected adding a 1-line note in the doc (pdf or html) for MatRARt and/or MatPtAP of this kind "this method is meant to support this stuffs but not that stuffs" and/or "if you look for performance, prefer to use this method, not that one" if relevant (?). Franck ----- Mail original ----- > De: "Jed Brown" > ?: "Hong" > Cc: "petsc-dev" , "PETSc users list" > Envoy?: Jeudi 22 Juin 2017 05:26:57 > Objet: Re: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? > > Hong writes: > > > Jed : > > > >> > >> Is it better this way or as a fallback when !A->ops->rart? MatPtAP > >> handles other combinations like MAIJ. > >> > > > > Do you mean > > if ( !A->ops->rart) { > > Mat Rt; > > ierr = MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);CHKERRQ(ierr); > > ierr = MatMatMatMult(R,A,Rt,scall,fill,C);CHKERRQ(ierr); > > ierr = MatDestroy(&Rt);CHKERRQ(ierr); > > } > > This does NOT work for most matrix formats because we do not have fallbacks > > for MatTranspose() and MatMatMult(). > > That's fine; they'll trigger an error and we'll be able to see from the > stack that it can be made to work by either implementing the appropriate > MatRARt or MatTranspose and MatMatMatMult. > From knepley at gmail.com Thu Jun 22 07:48:47 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Jun 2017 07:48:47 -0500 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <638c328a-ade6-dbb6-88f1-24c9372d5178@auckland.ac.nz> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> <638c328a-ade6-dbb6-88f1-24c9372d5178@auckland.ac.nz> Message-ID: On Thu, Jun 22, 2017 at 12:59 AM, Adrian Croucher wrote: > On 21/06/17 23:12, Matthew Knepley wrote: > > > From the above description, its not clear to me that you want this > topology. For example, in the case that each cell gets an internal cell, > it seems like you could just handle this in the function space for each > cell. Even for multiple internal cells, it seems like function space > parameterization is a better option. Topology is supposed to indicate the > support of basis functions, but here that job is done. I would > just treat it as some sort of augmented approximation space. > > > If I understand what you mean, I considered doing something like that- > basically just defining extra degrees of freedom in the cells where dual > porosity is to be applied. > > It seemed to me that if I then went ahead and created the Jacobian matrix > using DMCreateMatrix(), it would give me extra nonzero entries that > shouldn't be there - interactions between the dual porosity variables in > neighbouring cells. Is there any way to avoid that? > Ah, this is a very good point. You would like sparse structure in the Jacobian blocks. Currently I do not have it, but DMNetwork does. I have been planning to unify the sparsity determination between the two. It is on the list. 
Thanks, Matt > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 <+64%209-923%204611> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jun 22 10:12:17 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 22 Jun 2017 10:12:17 -0500 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: <87d19wwvlq.fsf@jedbrown.org> References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <87r2ydw114.fsf@jedbrown.org> <87d19wwvlq.fsf@jedbrown.org> Message-ID: Jed: > > >> Is it better this way or as a fallback when !A->ops->rart? MatPtAP > >> handles other combinations like MAIJ. > >> > > > > Do you mean > > if ( !A->ops->rart) { > > Mat Rt; > > ierr = MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);CHKERRQ(ierr); > > ierr = MatMatMatMult(R,A,Rt,scall,fill,C);CHKERRQ(ierr); > > ierr = MatDestroy(&Rt);CHKERRQ(ierr); > > } > > This does NOT work for most matrix formats because we do not have > fallbacks > > for MatTranspose() and MatMatMult(). > > That's fine; they'll trigger an error and we'll be able to see from the > stack that it can be made to work by either implementing the appropriate > MatRARt or MatTranspose and MatMatMatMult. > You prefer adding this default, even though it gives error in either MatTranspose() or MatMatMatMult() depends on input matrix format? If so, we need add this type of 'default' to all mat operations -- currently, all routines do if (!mat->ops-> ) SETERRQ1(PetscObjectComm((PetscObject)mat),PETSC_ERR_SUP,"Mat type %s",((PetscObject)mat)->type_name); Hong -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jun 22 11:17:33 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 22 Jun 2017 10:17:33 -0600 Subject: [petsc-users] [petsc-dev] How to compute RARt with A and R as distributed (MPI) matrices ? In-Reply-To: References: <674421727.8597906.1498050033315.JavaMail.zimbra@inria.fr> <87r2ydw114.fsf@jedbrown.org> <87d19wwvlq.fsf@jedbrown.org> Message-ID: <87y3skuhcy.fsf@jedbrown.org> Hong writes: > Jed: >> >> >> Is it better this way or as a fallback when !A->ops->rart? MatPtAP >> >> handles other combinations like MAIJ. >> >> >> > >> > Do you mean >> > if ( !A->ops->rart) { >> > Mat Rt; >> > ierr = MatTranspose(R,MAT_INITIAL_MATRIX,&Rt);CHKERRQ(ierr); >> > ierr = MatMatMatMult(R,A,Rt,scall,fill,C);CHKERRQ(ierr); >> > ierr = MatDestroy(&Rt);CHKERRQ(ierr); >> > } >> > This does NOT work for most matrix formats because we do not have >> fallbacks >> > for MatTranspose() and MatMatMult(). >> >> That's fine; they'll trigger an error and we'll be able to see from the >> stack that it can be made to work by either implementing the appropriate >> MatRARt or MatTranspose and MatMatMatMult. >> > > You prefer adding this default, even though it gives error in either > MatTranspose() or MatMatMatMult() depends on input matrix format? Yeah, in the sense that it gives more opportunities to succeed. 
> If so, we need add this type of 'default' to all mat operations -- > currently, all routines do > if (!mat->ops-> ) > SETERRQ1(PetscObjectComm((PetscObject)mat),PETSC_ERR_SUP,"Mat type > %s",((PetscObject)mat)->type_name); Probably. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 22 11:23:18 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Jun 2017 11:23:18 -0500 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> Message-ID: On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley wrote: > Hello, > > We are attempting to use the PETSc KSP solver framework in a fluid > dynamics simulation we developed. The solution is part of a pressure > projection and solves a Poisson problem. We use a cell-centered layout with > a regular grid in 3d. We started with ex34.c from the KSP tutorials since > it has the correct calls for the 3d DMDA, uses a cell-centered layout, and > states that it works with multi-grid. We modified the operator construction > function to match the coefficients and Dirichlet boundary conditions used > in our problem (we?d also like to support Neumann but left those out for > now to keep things simple). As a result of the modified boundary > conditions, our version does not perform a null space removal on the right > hand side or operator as the original did. We also modified the right hand > side to contain a sinusoidal pattern for testing. Other than these changes, > our code is the same as the original ex34.c > > With the default KSP options and using CG with the default pre-conditioner > and without a pre-conditioner, we see good convergence. However, we?d like > to accelerate the time to solution further and target larger problem sizes > (>= 1024^3) if possible. Given these objectives, multi-grid as a > pre-conditioner interests us. To understand the improvement that multi-grid > provides, we ran ex45 from the KSP tutorials. ex34 with CG and no > pre-conditioner appears to converge in a single iteration and we wanted to > compare against a problem that has similar convergence patterns to our > problem. 
Here?s the tests we ran with ex45: > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 > time in KSPSolve(): 7.0178e+00 > solver iterations: 157 > KSP final norm of residual: 3.16874e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type > cg -pc_type none > time in KSPSolve(): 4.1072e+00 > solver iterations: 213 > KSP final norm of residual: 0.000138866 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type > cg > time in KSPSolve(): 3.3962e+00 > solver iterations: 88 > KSP final norm of residual: 6.46242e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type > mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 > -mg_levels_pc_type bjacobi > time in KSPSolve(): 1.3201e+00 > solver iterations: 4 > KSP final norm of residual: 8.13339e-05 > > mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type > mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 > -mg_levels_pc_type bjacobi -ksp_type cg > time in KSPSolve(): 1.3008e+00 > solver iterations: 4 > KSP final norm of residual: 2.21474e-05 > > We found the multi-grid pre-conditioner options in the KSP tutorials > makefile. These results make sense; both the default GMRES and CG solvers > converge and CG without a pre-conditioner takes more iterations. The > multi-grid pre-conditioned runs are pretty dramatically accelerated and > require only a handful of iterations. > > We ran our code (modified ex34.c as described above) with the same > parameters: > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > time in KSPSolve(): 5.3729e+00 > solver iterations: 123 > KSP final norm of residual: 0.00595066 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_type cg -pc_type none > time in KSPSolve(): 3.6154e+00 > solver iterations: 188 > KSP final norm of residual: 0.00505943 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_type cg > time in KSPSolve(): 3.5661e+00 > solver iterations: 98 > KSP final norm of residual: 0.00967462 > > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson > -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi > time in KSPSolve(): 4.5606e+00 > solver iterations: 44 > KSP final norm of residual: 949.553 > 1) Dave is right 2) In order to see how many iterates to expect, first try using algebraic multigrid -pc_type gamg This should work out of the box for Poisson 3) For questions like this, we really need to see -ksp_view -ksp_monitor_true_residual 4) It sounds like you smoother is not strong enough. You could try -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 or maybe GMRES until it works. Thanks, Matt > mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson > -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg > time in KSPSolve(): 1.5481e+01 > solver iterations: 198 > KSP final norm of residual: 0.916558 > > We performed all tests with petsc-3.7.6. > > The trends with CG and GMRES seem consistent with the results from ex45. > However, with multi-grid, something doesn?t seem right. Convergence seems > poor and the solves run for many more iterations than ex45 with multi-grid > as a pre-conditioner. 
I extensively validated the code that builds the > matrix and also confirmed that the solution produced by CG, when evaluated > with the system of equations elsewhere in our simulation, produces the same > residual as indicated by PETSc. Given that we only made minimal > modifications to the original example code, it seems likely that the > operators constructed for the multi-grid levels are ok. > > We also tried a variety of other suggested parameters for the multi-grid > pre-conditioner as suggested in related mailing list posts but we didn?t > observe any significant improvements over the results above. > > Is there anything we can do to check the validity of the coefficient > matrices built for the different multi-grid levels? Does it look like there > could be problems there? Or any other suggestions to achieve better results > with multi-grid? I have the -log_view, -ksp_view, and convergence monitor > output from the above tests and can post any of it if it would assist. > > Thanks -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.lefley at aclectic.com Thu Jun 22 15:20:33 2017 From: jason.lefley at aclectic.com (Jason Lefley) Date: Thu, 22 Jun 2017 13:20:33 -0700 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> Message-ID: <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement. The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow: $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg right hand side 2 norm: 512. right hand side infinity norm: 0.999097 building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 KSP Object: 16 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 16 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0. 
AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 16 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 16 MPI processes type: bjacobi block Jacobi: number of blocks = 16 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=13, cols=13 package used to perform factorization: petsc total: nonzeros=169, allocated nonzeros=169 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 3 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=13, cols=13 total: nonzeros=169, allocated nonzeros=169 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 3 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=13, cols=13 total: nonzeros=169, allocated nonzeros=169 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 3 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=467, cols=467 total: nonzeros=68689, allocated nonzeros=68689 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] KSP Object: (mg_levels_2_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=14893, cols=14893 total: nonzeros=1856839, allocated nonzeros=1856839 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=190701, cols=190701 total: nonzeros=6209261, allocated nonzeros=6209261 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_4_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Residual 2 norm 0.0230953 Residual infinity norm 0.000240174 $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 right hand side 2 norm: 512. right hand side infinity norm: 0.999097 building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 
||r(i)||/||b|| 1.302249390081e-01 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 48 KSP 
preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 77 KSP preconditioned resid norm 9.794744927328e-02 
true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 KSP Object: 16 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 16 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Not using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 16 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 16 MPI processes type: redundant Redundant preconditioner: First (color=0) of 16 PCs follows KSP Object: (mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 7.56438 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=24206, allocated nonzeros=24206 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=27136, allocated nonzeros=27136 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=223232, allocated nonzeros=223232 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=1810432, allocated nonzeros=1810432 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Residual 2 norm 0.290082 Residual infinity norm 0.00192869 solver_test.c: // modified version of ksp/ksp/examples/tutorials/ex34.c // related: ksp/ksp/examples/tutorials/ex29.c // ksp/ksp/examples/tutorials/ex32.c // ksp/ksp/examples/tutorials/ex50.c #include #include #include extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); extern PetscErrorCode ComputeRHS(KSP,Vec,void*); typedef enum { DIRICHLET, NEUMANN } BCType; #undef __FUNCT__ #define __FUNCT__ "main" int main(int argc,char **argv) { KSP ksp; DM da; PetscReal norm; PetscErrorCode ierr; PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; PetscScalar Hx,Hy,Hz; PetscScalar ***array; Vec x,b,r; Mat J; const char* bcTypes[2] = { "dirichlet", "neumann" }; PetscInt bcType = (PetscInt)DIRICHLET; PetscInitialize(&argc,&argv,(char*)0,0); ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); ierr = PetscOptionsEnd();CHKERRQ(ierr); ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr); ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); ierr = VecDuplicate(b,&r);CHKERRQ(ierr); ierr = MatMult(J,x,r);CHKERRQ(ierr); ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm %g\n",(double)norm);CHKERRQ(ierr); ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr); ierr = VecDestroy(&r);CHKERRQ(ierr); ierr = KSPDestroy(&ksp);CHKERRQ(ierr); ierr = DMDestroy(&da);CHKERRQ(ierr); ierr = PetscFinalize(); return 0; } #undef __FUNCT__ #define __FUNCT__ "ComputeRHS" PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) { PetscErrorCode ierr; PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; PetscScalar Hx,Hy,Hz; PetscScalar ***array; DM da; BCType bcType = *(BCType*)ctx; PetscFunctionBeginUser; ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); Hx = 1.0 / (PetscReal)(mx); Hy = 1.0 / (PetscReal)(my); Hz = 1.0 / (PetscReal)(mz); ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); for (k = zs; k < zs + zm; k++) { for (j = ys; j < ys + ym; j++) { for (i = xs; i < xs + xm; i++) { PetscReal x = ((PetscReal)i + 0.5) * Hx; PetscReal y = ((PetscReal)j + 0.5) * Hy; 
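/* cell-centered sample point: each coordinate is offset by half a grid spacing */
/* the right hand side assigned just below is sin(2*pi*x) * cos(2*pi*y) * sin(2*pi*z) */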
PetscReal z = ((PetscReal)k + 0.5) * Hz; array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); } } } ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); ierr = VecAssemblyBegin(b);CHKERRQ(ierr); ierr = VecAssemblyEnd(b);CHKERRQ(ierr); PetscReal norm; VecNorm(b, NORM_2, &norm); PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm); VecNorm(b, NORM_INFINITY, &norm); PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm); /* force right hand side to be consistent for singular matrix */ /* note this is really a hack, normally the model would provide you with a consistent right handside */ if (bcType == NEUMANN) { MatNullSpace nullspace; ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "ComputeMatrix" PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) { PetscErrorCode ierr; PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; PetscScalar v[7],Hx,Hy,Hz; MatStencil row, col[7]; DM da; BCType bcType = *(BCType*)ctx; PetscFunctionBeginUser; if (bcType == DIRICHLET) PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, "); else if (bcType == NEUMANN) PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, "); else SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n"); ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz); Hx = 1.0 / (PetscReal)(mx); Hy = 1.0 / (PetscReal)(my); Hz = 1.0 / (PetscReal)(mz); PetscReal Hx2 = Hx * Hx; PetscReal Hy2 = Hy * Hy; PetscReal Hz2 = Hz * Hz; PetscReal scaleX = 1.0 / Hx2; PetscReal scaleY = 1.0 / Hy2; PetscReal scaleZ = 1.0 / Hz2; ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); for (k = zs; k < zs + zm; k++) { for (j = ys; j < ys + ym; j++) { for (i = xs; i < xs + xm; i++) { row.i = i; row.j = j; row.k = k; if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) { num = 0; numi = 0; numj = 0; numk = 0; if (k != 0) { v[num] = -scaleZ; col[num].i = i; col[num].j = j; col[num].k = k - 1; num++; numk++; } if (j != 0) { v[num] = -scaleY; col[num].i = i; col[num].j = j - 1; col[num].k = k; num++; numj++; } if (i != 0) { v[num] = -scaleX; col[num].i = i - 1; col[num].j = j; col[num].k = k; num++; numi++; } if (i != mx - 1) { v[num] = -scaleX; col[num].i = i + 1; col[num].j = j; col[num].k = k; num++; numi++; } if (j != my - 1) { v[num] = -scaleY; col[num].i = i; col[num].j = j + 1; col[num].k = k; num++; numj++; } if (k != mz - 1) { v[num] = -scaleZ; col[num].i = i; col[num].j = j; col[num].k = k + 1; num++; numk++; } if (bcType == NEUMANN) { v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; } else if (bcType == DIRICHLET) { v[num] = 2.0 * (scaleX + scaleY + scaleZ); } col[num].i = i; col[num].j = j; col[num].k = k; num++; ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES); CHKERRQ(ierr); } else { v[0] = -scaleZ; col[0].i = i; col[0].j = j; col[0].k = k - 1; v[1] = -scaleY; col[1].i = i; col[1].j = j - 1; col[1].k = k; v[2] = -scaleX; col[2].i = i - 1; col[2].j = j; col[2].k = k; v[3] = 2.0 * (scaleX + 
scaleY + scaleZ);
col[3].i = i; col[3].j = j; col[3].k = k;
v[4] = -scaleX; col[4].i = i + 1; col[4].j = j; col[4].k = k;
v[5] = -scaleY; col[5].i = i; col[5].j = j + 1; col[5].k = k;
v[6] = -scaleZ; col[6].i = i; col[6].j = j; col[6].k = k + 1;
ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES);
CHKERRQ(ierr);
}
}
}
}
ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
if (bcType == NEUMANN)
{
MatNullSpace nullspace;
ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);
ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr);
ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
}
PetscFunctionReturn(0);
}

> On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote:
>
> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley
> wrote:
> Hello,
>
> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we'd also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c
>
> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we'd like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem.
> Here's the tests we ran with ex45:
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129
> time in KSPSolve(): 7.0178e+00
> solver iterations: 157
> KSP final norm of residual: 3.16874e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none
> time in KSPSolve(): 4.1072e+00
> solver iterations: 213
> KSP final norm of residual: 0.000138866
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg
> time in KSPSolve(): 3.3962e+00
> solver iterations: 88
> KSP final norm of residual: 6.46242e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
> time in KSPSolve(): 1.3201e+00
> solver iterations: 4
> KSP final norm of residual: 8.13339e-05
>
> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
> time in KSPSolve(): 1.3008e+00
> solver iterations: 4
> KSP final norm of residual: 2.21474e-05
>
> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations.
>
> We ran our code (modified ex34.c as described above) with the same parameters:
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
> time in KSPSolve(): 5.3729e+00
> solver iterations: 123
> KSP final norm of residual: 0.00595066
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none
> time in KSPSolve(): 3.6154e+00
> solver iterations: 188
> KSP final norm of residual: 0.00505943
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg
> time in KSPSolve(): 3.5661e+00
> solver iterations: 98
> KSP final norm of residual: 0.00967462
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
> time in KSPSolve(): 4.5606e+00
> solver iterations: 44
> KSP final norm of residual: 949.553
>
> 1) Dave is right
>
> 2) In order to see how many iterates to expect, first try using algebraic multigrid
>
> -pc_type gamg
>
> This should work out of the box for Poisson
>
> 3) For questions like this, we really need to see
>
> -ksp_view -ksp_monitor_true_residual
>
> 4) It sounds like your smoother is not strong enough. You could try
>
> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5
>
> or maybe GMRES until it works.
>
> Thanks,
>
> Matt
>
> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
> time in KSPSolve(): 1.5481e+01
> solver iterations: 198
> KSP final norm of residual: 0.916558
>
> We performed all tests with petsc-3.7.6.
>
> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn't seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner.
> I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok.
>
> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn't observe any significant improvements over the results above.
>
> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist.
>
> Thanks
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at mcs.anl.gov Thu Jun 22 17:52:40 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Thu, 22 Jun 2017 17:52:40 -0500
Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem
In-Reply-To: <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com>
References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com>
 <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com>
Message-ID: 

   Try running the -pc_type mg case with the additional argument -pc_mg_galerkin. Does that decrease the number of iterations?

> On Jun 22, 2017, at 3:20 PM, Jason Lefley wrote:
>
> Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement.

   Yes, the 5 iterations you get below is pretty much the best you can expect. No reasonable tuning of smoother options is likely to have much effect (it is very difficult to improve from 5).

   Barry

> The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow:
>
> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg
> right hand side 2 norm: 512.
> right hand side infinity norm: 0.999097
> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128
> 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00
> 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02
> 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03
> 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04
> 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05
> KSP Object: 16 MPI processes
> type: cg
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0. > AGG specific options > Symmetric graph false > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > package used to perform factorization: petsc > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 3 nodes, limit used is 5 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=467, cols=467 > total: nonzeros=68689, allocated nonzeros=68689 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=14893, cols=14893 > total: nonzeros=1856839, allocated nonzeros=1856839 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=190701, cols=190701 > total: nonzeros=6209261, allocated nonzeros=6209261 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.0230953 > Residual infinity norm 0.000240174 > > > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 > right hand side 2 norm: 512. > right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 > building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 > building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 > building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 > building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 > 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 > 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 > 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 > 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 > 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 > 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 > 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 > 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 > 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 > 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 > 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 > 12 
KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 > 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 > 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 > 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 > 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 > 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 > 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 > 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 > 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 > 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 > 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 > 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 > 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 > 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 > 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 > 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 > 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 > 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 > 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 > 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 > 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 > 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 > 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 > 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 > 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 > 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 > 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 > 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 > 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 
3.072534733461e-02 > 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 > 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 > 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 > 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 > 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 > 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 > 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 > 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 > 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 > 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 > 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 > 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 > 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 > 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 > 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 > 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 > 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 > 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 > 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 > 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 > 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 > 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 > 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 > 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 > 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 > 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 > 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 > 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 > 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 
||r(i)||/||b|| 1.977316065678e-03 > 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 > 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 > 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 > 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 > 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 > 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 > 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 > 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 > 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 > 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 > 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 > 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 > 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 > 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Not using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: redundant > Redundant preconditioner: First (color=0) of 16 PCs follows > KSP Object: (mg_coarse_redundant_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_redundant_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 7.56438 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=24206, allocated nonzeros=24206 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.290082 > Residual infinity norm 0.00192869 > > > > > > solver_test.c: > > // modified version of ksp/ksp/examples/tutorials/ex34.c > // related: ksp/ksp/examples/tutorials/ex29.c > // ksp/ksp/examples/tutorials/ex32.c > // ksp/ksp/examples/tutorials/ex50.c > > #include > #include > #include > > extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); > extern PetscErrorCode ComputeRHS(KSP,Vec,void*); > > typedef enum > { > DIRICHLET, > NEUMANN > } BCType; > > #undef __FUNCT__ > #define __FUNCT__ "main" > int main(int argc,char **argv) > { > KSP ksp; > DM da; > PetscReal norm; > PetscErrorCode ierr; > > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > Vec x,b,r; > Mat J; > const char* bcTypes[2] = { "dirichlet", "neumann" }; > PetscInt bcType = (PetscInt)DIRICHLET; > > PetscInitialize(&argc,&argv,(char*)0,0); > > ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); > ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); > ierr = PetscOptionsEnd();CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); > ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); > > ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); > > ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); > ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr); > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); > ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); > ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); > ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); > ierr = VecDuplicate(b,&r);CHKERRQ(ierr); > > ierr = MatMult(J,x,r);CHKERRQ(ierr); > ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm %g\n",(double)norm);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); > ierr = 
PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr); > > ierr = VecDestroy(&r);CHKERRQ(ierr); > ierr = KSPDestroy(&ksp);CHKERRQ(ierr); > ierr = DMDestroy(&da);CHKERRQ(ierr); > ierr = PetscFinalize(); > return 0; > } > > #undef __FUNCT__ > #define __FUNCT__ "ComputeRHS" > PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); > ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > PetscReal x = ((PetscReal)i + 0.5) * Hx; > PetscReal y = ((PetscReal)j + 0.5) * Hy; > PetscReal z = ((PetscReal)k + 0.5) * Hz; > array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); > } > } > } > ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); > ierr = VecAssemblyBegin(b);CHKERRQ(ierr); > ierr = VecAssemblyEnd(b);CHKERRQ(ierr); > > PetscReal norm; > VecNorm(b, NORM_2, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm); > VecNorm(b, NORM_INFINITY, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm); > > /* force right hand side to be consistent for singular matrix */ > /* note this is really a hack, normally the model would provide you with a consistent right handside */ > > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > #undef __FUNCT__ > #define __FUNCT__ "ComputeMatrix" > PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; > PetscScalar v[7],Hx,Hy,Hz; > MatStencil row, col[7]; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > > if (bcType == DIRICHLET) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, "); > else if (bcType == NEUMANN) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, "); > else > SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n"); > > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); > > PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz); > > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > > PetscReal Hx2 = Hx * Hx; > PetscReal Hy2 = Hy * Hy; > PetscReal Hz2 = Hz * Hz; > > PetscReal scaleX = 1.0 / Hx2; > PetscReal scaleY = 1.0 / Hy2; > PetscReal scaleZ = 1.0 / Hz2; > > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > row.i = i; > row.j = j; > row.k = k; > if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz 
- 1) > { > num = 0; > numi = 0; > numj = 0; > numk = 0; > if (k != 0) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k - 1; > num++; > numk++; > } > if (j != 0) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j - 1; > col[num].k = k; > num++; > numj++; > } > if (i != 0) > { > v[num] = -scaleX; > col[num].i = i - 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (i != mx - 1) > { > v[num] = -scaleX; > col[num].i = i + 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (j != my - 1) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j + 1; > col[num].k = k; > num++; > numj++; > } > if (k != mz - 1) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k + 1; > num++; > numk++; > } > > if (bcType == NEUMANN) > { > v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; > } > else if (bcType == DIRICHLET) > { > v[num] = 2.0 * (scaleX + scaleY + scaleZ); > } > > col[num].i = i; > col[num].j = j; > col[num].k = k; > num++; > ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES); > CHKERRQ(ierr); > } > else > { > v[0] = -scaleZ; > col[0].i = i; > col[0].j = j; > col[0].k = k - 1; > v[1] = -scaleY; > col[1].i = i; > col[1].j = j - 1; > col[1].k = k; > v[2] = -scaleX; > col[2].i = i - 1; > col[2].j = j; > col[2].k = k; > v[3] = 2.0 * (scaleX + scaleY + scaleZ); > col[3].i = i; > col[3].j = j; > col[3].k = k; > v[4] = -scaleX; > col[4].i = i + 1; > col[4].j = j; > col[4].k = k; > v[5] = -scaleY; > col[5].i = i; > col[5].j = j + 1; > col[5].k = k; > v[6] = -scaleZ; > col[6].i = i; > col[6].j = j; > col[6].k = k + 1; > ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES); > CHKERRQ(ierr); > } > } > } > } > ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); > ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > >> On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote: >> >> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley wrote: >> Hello, >> >> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we?d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c >> >> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we?d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. 
Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. Here?s the tests we ran with ex45: >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >> time in KSPSolve(): 7.0178e+00 >> solver iterations: 157 >> KSP final norm of residual: 3.16874e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none >> time in KSPSolve(): 4.1072e+00 >> solver iterations: 213 >> KSP final norm of residual: 0.000138866 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg >> time in KSPSolve(): 3.3962e+00 >> solver iterations: 88 >> KSP final norm of residual: 6.46242e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >> time in KSPSolve(): 1.3201e+00 >> solver iterations: 4 >> KSP final norm of residual: 8.13339e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.3008e+00 >> solver iterations: 4 >> KSP final norm of residual: 2.21474e-05 >> >> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations. >> >> We ran our code (modified ex34.c as described above) with the same parameters: >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> time in KSPSolve(): 5.3729e+00 >> solver iterations: 123 >> KSP final norm of residual: 0.00595066 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none >> time in KSPSolve(): 3.6154e+00 >> solver iterations: 188 >> KSP final norm of residual: 0.00505943 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg >> time in KSPSolve(): 3.5661e+00 >> solver iterations: 98 >> KSP final norm of residual: 0.00967462 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >> time in KSPSolve(): 4.5606e+00 >> solver iterations: 44 >> KSP final norm of residual: 949.553 >> >> 1) Dave is right >> >> 2) In order to see how many iterates to expect, first try using algebraic multigrid >> >> -pc_type gamg >> >> This should work out of the box for Poisson >> >> 3) For questions like this, we really need to see >> >> -ksp_view -ksp_monitor_true_residual >> >> 4) It sounds like you smoother is not strong enough. You could try >> >> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >> >> or maybe GMRES until it works. 
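For reference, a run that combines the diagnostics from 3) with the stronger smoother suggested in 4) could look roughly like the following sketch (same executable, process count, and grid size as the runs quoted in this thread; which flags to keep is of course up to the user):

mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 -ksp_view -ksp_monitor_true_residual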
>> >> Thanks, >> >> Matt >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.5481e+01 >> solver iterations: 198 >> KSP final norm of residual: 0.916558 >> >> We performed all tests with petsc-3.7.6. >> >> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn?t seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok. >> >> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn?t observe any significant improvements over the results above. >> >> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist. >> >> Thanks >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > From knepley at gmail.com Thu Jun 22 19:35:52 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Jun 2017 19:35:52 -0500 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID: On Thu, Jun 22, 2017 at 3:20 PM, Jason Lefley wrote: > Thanks for the prompt replies. I ran with gamg and the results look more > promising. I tried the suggested -mg_* options and did not see improvement. > The -ksp_view and -ksp_monitor_true_residual output from those tests and > the solver_test source (modified ex34.c) follow: > Okay, the second step is to replicate the smoother for the GMG, which will have a smaller and scalable setup time. The smoother could be weak, or the restriction could be bad. Thanks, Matt > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg > right hand side 2 norm: 512. 
> right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: > 128 x 128 x 128 > 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm > 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm > 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 > 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm > 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 > 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm > 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 > 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm > 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 > 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm > 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0. > AGG specific options > Symmetric graph false > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > package used to perform factorization: petsc > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 3 nodes, limit used > is 5 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 
1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=467, cols=467 > total: nonzeros=68689, allocated nonzeros=68689 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=14893, cols=14893 > total: nonzeros=1856839, allocated nonzeros=1856839 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=190701, cols=190701 > total: nonzeros=6209261, allocated nonzeros=6209261 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.0230953 > Residual infinity norm 0.000240174 > > > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels > 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale > -mg_levels_ksp_max_it 5 > right hand side 2 norm: 512. 
> right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: > 128 x 128 x 128 > building operator with Dirichlet boundary conditions, global grid size: 16 > x 16 x 16 > building operator with Dirichlet boundary conditions, global grid size: 32 > x 32 x 32 > building operator with Dirichlet boundary conditions, global grid size: 64 > x 64 x 64 > building operator with Dirichlet boundary conditions, global grid size: 8 > x 8 x 8 > 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm > 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm > 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 > 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm > 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 > 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm > 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 > 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm > 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 > 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm > 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 > 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm > 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 > 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm > 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 > 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm > 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 > 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm > 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 > 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm > 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 > 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm > 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 > 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm > 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 > 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm > 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 > 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm > 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 > 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm > 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 > 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm > 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 > 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm > 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 > 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm > 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 > 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm > 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 > 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm > 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 > 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm > 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 > 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm > 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 > 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm > 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 > 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm > 
4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 > 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm > 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 > 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm > 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 > 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm > 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 > 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm > 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 > 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm > 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 > 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm > 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 > 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm > 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 > 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm > 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 > 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm > 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 > 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm > 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 > 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm > 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 > 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm > 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 > 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm > 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 > 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm > 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 > 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm > 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 > 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm > 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 > 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm > 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 > 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm > 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 > 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm > 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 > 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm > 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 > 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm > 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 > 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm > 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 > 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm > 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 > 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm > 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 > 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm > 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 > 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm > 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 > 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm > 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 > 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm > 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 > 
53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm > 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 > 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm > 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 > 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm > 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 > 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm > 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 > 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm > 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 > 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm > 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 > 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm > 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 > 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm > 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 > 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm > 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 > 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm > 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 > 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm > 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 > 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm > 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 > 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm > 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 > 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm > 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 > 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm > 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 > 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm > 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 > 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm > 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 > 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm > 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 > 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm > 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 > 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm > 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 > 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm > 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 > 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm > 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 > 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm > 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 > 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm > 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 > 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm > 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 > 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm > 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 > 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm > 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 > 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm > 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 > 81 KSP preconditioned resid norm 2.759088686744e-02 
true resid norm > 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 > 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm > 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 > 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm > 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Not using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: redundant > Redundant preconditioner: First (color=0) of 16 PCs follows > KSP Object: (mg_coarse_redundant_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_redundant_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 7.56438 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=24206, allocated nonzeros=24206 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.290082 > Residual infinity norm 0.00192869 > > > > > > solver_test.c: > > // modified version of ksp/ksp/examples/tutorials/ex34.c > // related: ksp/ksp/examples/tutorials/ex29.c > // ksp/ksp/examples/tutorials/ex32.c > // ksp/ksp/examples/tutorials/ex50.c > > #include > #include > #include > > extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); > extern PetscErrorCode ComputeRHS(KSP,Vec,void*); > > typedef enum > { > DIRICHLET, > NEUMANN > } BCType; > > #undef __FUNCT__ > #define __FUNCT__ "main" > int main(int argc,char **argv) > { > KSP ksp; > DM da; > PetscReal norm; > PetscErrorCode ierr; > > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > Vec x,b,r; > Mat J; > const char* bcTypes[2] = { "dirichlet", "neumann" }; > PetscInt bcType = (PetscInt)DIRICHLET; > > PetscInitialize(&argc,&argv,(char*)0,0); > > ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); > ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", > bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); > ierr = PetscOptionsEnd();CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_ > NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12, > PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); > ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); > > ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); > > ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); > ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType); > CHKERRQ(ierr); > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); > ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); > ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); > ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); > ierr = VecDuplicate(b,&r);CHKERRQ(ierr); > > ierr = MatMult(J,x,r);CHKERRQ(ierr); > ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm > %g\n",(double)norm);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm > %g\n",(double)norm);CHKERRQ(ierr); > > ierr = VecDestroy(&r);CHKERRQ(ierr); > ierr = KSPDestroy(&ksp);CHKERRQ(ierr); > ierr = DMDestroy(&da);CHKERRQ(ierr); > ierr = PetscFinalize(); > return 0; > } > > #undef __FUNCT__ > #define __FUNCT__ "ComputeRHS" > PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ( > ierr); > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); 
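>     /* fill b with the sinusoidal test forcing sin(2*pi*x)*cos(2*pi*y)*sin(2*pi*z), evaluated at the cell centers x=(i+0.5)*Hx, y=(j+0.5)*Hy, z=(k+0.5)*Hz */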
> ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > PetscReal x = ((PetscReal)i + 0.5) * Hx; > PetscReal y = ((PetscReal)j + 0.5) * Hy; > PetscReal z = ((PetscReal)k + 0.5) * Hz; > array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * > PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); > } > } > } > ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); > ierr = VecAssemblyBegin(b);CHKERRQ(ierr); > ierr = VecAssemblyEnd(b);CHKERRQ(ierr); > > PetscReal norm; > VecNorm(b, NORM_2, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", > (double)norm); > VecNorm(b, NORM_INFINITY, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", > (double)norm); > > /* force right hand side to be consistent for singular matrix */ > /* note this is really a hack, normally the model would provide you > with a consistent right handside */ > > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,& > nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > #undef __FUNCT__ > #define __FUNCT__ "ComputeMatrix" > PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; > PetscScalar v[7],Hx,Hy,Hz; > MatStencil row, col[7]; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > > if (bcType == DIRICHLET) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet > boundary conditions, "); > else if (bcType == NEUMANN) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann > boundary conditions, "); > else > SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary > condition type\n"); > > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ( > ierr); > > PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, > my, mz); > > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > > PetscReal Hx2 = Hx * Hx; > PetscReal Hy2 = Hy * Hy; > PetscReal Hz2 = Hz * Hz; > > PetscReal scaleX = 1.0 / Hx2; > PetscReal scaleY = 1.0 / Hy2; > PetscReal scaleZ = 1.0 / Hz2; > > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > row.i = i; > row.j = j; > row.k = k; > if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - > 1 || k == mz - 1) > { > num = 0; > numi = 0; > numj = 0; > numk = 0; > if (k != 0) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k - 1; > num++; > numk++; > } > if (j != 0) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j - 1; > col[num].k = k; > num++; > numj++; > } > if (i != 0) > { > v[num] = -scaleX; > col[num].i = i - 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (i != mx - 1) > { > v[num] = -scaleX; > col[num].i = i + 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (j != my - 1) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j + 1; > col[num].k = k; > num++; > numj++; > } > if (k != mz - 1) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k + 1; > num++; > numk++; > } > > if (bcType 
== NEUMANN) > { > v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) > (numj) * scaleY + (PetscReal) (numi) * scaleX; > } > else if (bcType == DIRICHLET) > { > v[num] = 2.0 * (scaleX + scaleY + scaleZ); > } > > col[num].i = i; > col[num].j = j; > col[num].k = k; > num++; > ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, > INSERT_VALUES); > CHKERRQ(ierr); > } > else > { > v[0] = -scaleZ; > col[0].i = i; > col[0].j = j; > col[0].k = k - 1; > v[1] = -scaleY; > col[1].i = i; > col[1].j = j - 1; > col[1].k = k; > v[2] = -scaleX; > col[2].i = i - 1; > col[2].j = j; > col[2].k = k; > v[3] = 2.0 * (scaleX + scaleY + scaleZ); > col[3].i = i; > col[3].j = j; > col[3].k = k; > v[4] = -scaleX; > col[4].i = i + 1; > col[4].j = j; > col[4].k = k; > v[5] = -scaleY; > col[5].i = i; > col[5].j = j + 1; > col[5].k = k; > v[6] = -scaleZ; > col[6].i = i; > col[6].j = j; > col[6].k = k + 1; > ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, > INSERT_VALUES); > CHKERRQ(ierr); > } > } > } > } > ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,& > nullspace);CHKERRQ(ierr); > ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote: > > On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley > wrote: > >> Hello, >> >> We are attempting to use the PETSc KSP solver framework in a fluid >> dynamics simulation we developed. The solution is part of a pressure >> projection and solves a Poisson problem. We use a cell-centered layout with >> a regular grid in 3d. We started with ex34.c from the KSP tutorials since >> it has the correct calls for the 3d DMDA, uses a cell-centered layout, and >> states that it works with multi-grid. We modified the operator construction >> function to match the coefficients and Dirichlet boundary conditions used >> in our problem (we?d also like to support Neumann but left those out for >> now to keep things simple). As a result of the modified boundary >> conditions, our version does not perform a null space removal on the right >> hand side or operator as the original did. We also modified the right hand >> side to contain a sinusoidal pattern for testing. Other than these changes, >> our code is the same as the original ex34.c >> >> With the default KSP options and using CG with the default >> pre-conditioner and without a pre-conditioner, we see good convergence. >> However, we?d like to accelerate the time to solution further and target >> larger problem sizes (>= 1024^3) if possible. Given these objectives, >> multi-grid as a pre-conditioner interests us. To understand the improvement >> that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG >> and no pre-conditioner appears to converge in a single iteration and we >> wanted to compare against a problem that has similar convergence patterns >> to our problem. 
Here?s the tests we ran with ex45: >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >> time in KSPSolve(): 7.0178e+00 >> solver iterations: 157 >> KSP final norm of residual: 3.16874e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >> -ksp_type cg -pc_type none >> time in KSPSolve(): 4.1072e+00 >> solver iterations: 213 >> KSP final norm of residual: 0.000138866 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >> -ksp_type cg >> time in KSPSolve(): 3.3962e+00 >> solver iterations: 88 >> KSP final norm of residual: 6.46242e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type >> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 >> -mg_levels_pc_type bjacobi >> time in KSPSolve(): 1.3201e+00 >> solver iterations: 4 >> KSP final norm of residual: 8.13339e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type >> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 >> -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.3008e+00 >> solver iterations: 4 >> KSP final norm of residual: 2.21474e-05 >> >> We found the multi-grid pre-conditioner options in the KSP tutorials >> makefile. These results make sense; both the default GMRES and CG solvers >> converge and CG without a pre-conditioner takes more iterations. The >> multi-grid pre-conditioned runs are pretty dramatically accelerated and >> require only a handful of iterations. >> >> We ran our code (modified ex34.c as described above) with the same >> parameters: >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> time in KSPSolve(): 5.3729e+00 >> solver iterations: 123 >> KSP final norm of residual: 0.00595066 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -ksp_type cg -pc_type none >> time in KSPSolve(): 3.6154e+00 >> solver iterations: 188 >> KSP final norm of residual: 0.00505943 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -ksp_type cg >> time in KSPSolve(): 3.5661e+00 >> solver iterations: 98 >> KSP final norm of residual: 0.00967462 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >> time in KSPSolve(): 4.5606e+00 >> solver iterations: 44 >> KSP final norm of residual: 949.553 >> > > 1) Dave is right > > 2) In order to see how many iterates to expect, first try using algebraic > multigrid > > -pc_type gamg > > This should work out of the box for Poisson > > 3) For questions like this, we really need to see > > -ksp_view -ksp_monitor_true_residual > > 4) It sounds like you smoother is not strong enough. You could try > > -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale > -mg_levels_ksp_max_it 5 > > or maybe GMRES until it works. > > Thanks, > > Matt > > >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.5481e+01 >> solver iterations: 198 >> KSP final norm of residual: 0.916558 >> >> We performed all tests with petsc-3.7.6. >> >> The trends with CG and GMRES seem consistent with the results from ex45. >> However, with multi-grid, something doesn?t seem right. 
Convergence seems >> poor and the solves run for many more iterations than ex45 with multi-grid >> as a pre-conditioner. I extensively validated the code that builds the >> matrix and also confirmed that the solution produced by CG, when evaluated >> with the system of equations elsewhere in our simulation, produces the same >> residual as indicated by PETSc. Given that we only made minimal >> modifications to the original example code, it seems likely that the >> operators constructed for the multi-grid levels are ok. >> >> We also tried a variety of other suggested parameters for the multi-grid >> pre-conditioner as suggested in related mailing list posts but we didn?t >> observe any significant improvements over the results above. >> >> Is there anything we can do to check the validity of the coefficient >> matrices built for the different multi-grid levels? Does it look like there >> could be problems there? Or any other suggestions to achieve better results >> with multi-grid? I have the -log_view, -ksp_view, and convergence monitor >> output from the above tests and can post any of it if it would assist. >> >> Thanks > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.lefley at aclectic.com Fri Jun 23 00:52:12 2017 From: jason.lefley at aclectic.com (Jason Lefley) Date: Thu, 22 Jun 2017 22:52:12 -0700 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID: <36C36D64-B4B8-4D06-A72C-F61DE5E2ABCB@aclectic.com> > On Jun 22, 2017, at 3:52 PM, Barry Smith wrote: > > > Try running the -pc_type mg case with the additional argument -pc_mg_galerkin does that decrease the number of iterations? Yes, adding that option yields a considerable improvement: $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 -pc_mg_galerkin right hand side 2 norm: 512. 
right hand side infinity norm: 0.999097 building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 0 KSP preconditioned resid norm 9.822067063977e-01 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.212533971477e-01 true resid norm 3.294699127279e+02 ||r(i)||/||b|| 6.434959232968e-01 2 KSP preconditioned resid norm 9.110625065462e-02 true resid norm 1.674425012051e+02 ||r(i)||/||b|| 3.270361351662e-01 3 KSP preconditioned resid norm 3.418653611088e-02 true resid norm 7.052946665869e+01 ||r(i)||/||b|| 1.377528645678e-01 4 KSP preconditioned resid norm 1.120326622247e-02 true resid norm 2.513164585178e+01 ||r(i)||/||b|| 4.908524580425e-02 5 KSP preconditioned resid norm 3.171704458717e-03 true resid norm 7.887015942877e+00 ||r(i)||/||b|| 1.540432801343e-02 6 KSP preconditioned resid norm 7.376165339467e-04 true resid norm 2.126203120360e+00 ||r(i)||/||b|| 4.152740469453e-03 7 KSP preconditioned resid norm 1.599952443901e-04 true resid norm 4.455888786560e-01 ||r(i)||/||b|| 8.702907786250e-04 8 KSP preconditioned resid norm 5.280374315084e-05 true resid norm 9.902193817384e-02 ||r(i)||/||b|| 1.934022229958e-04 9 KSP preconditioned resid norm 3.284982207258e-05 true resid norm 3.046564806731e-02 ||r(i)||/||b|| 5.950321888146e-05 10 KSP preconditioned resid norm 1.630367968481e-05 true resid norm 3.157470827473e-02 ||r(i)||/||b|| 6.166935209908e-05 11 KSP preconditioned resid norm 4.546559816411e-06 true resid norm 9.115958407015e-03 ||r(i)||/||b|| 1.780460626370e-05 KSP Object: 16 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 16 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 16 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 16 MPI processes type: redundant Redundant preconditioner: First (color=0) of 16 PCs follows KSP Object: (mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 7.56438 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=24206, allocated nonzeros=24206 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=27136, allocated nonzeros=27136 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=223232, allocated nonzeros=223232 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=1810432, allocated nonzeros=1810432 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 16 MPI processes type: richardson Richardson: using self-scale best computed damping factor maximum iterations=5 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Residual 2 norm 0.00911596 Residual infinity norm 4.90052e-05 > >> On Jun 22, 2017, at 3:20 PM, Jason Lefley wrote: >> >> Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement. > > Yes, the 5 iterations you get below is pretty much the best you can expect. No reasonably tuning of smoother options is likely to have much affect (it is very difficult to improve from 5). > We are quite happy with 5 iterations. The suggested mg options did not improve the runs involving -pc_type mg. I?d like to know how to get the same convergence we observe with gamg when using mg, if possible. > Barry > >> The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow: >> >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 >> 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 >> 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 >> 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 >> 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 >> 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 >> KSP Object: 16 MPI processes > > Y >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: gamg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> GAMG specific options >> Threshold for dropping small values from graph 0. >> AGG specific options >> Symmetric graph false >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: bjacobi >> block Jacobi: number of blocks = 16 >> Local solve is same for all blocks, in the following KSP and PC objects: >> KSP Object: (mg_coarse_sub_) 1 MPI processes >> type: preonly >> maximum iterations=1, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_sub_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> package used to perform factorization: petsc >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 3 nodes, limit used is 5 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_1_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=467, cols=467 >> total: nonzeros=68689, allocated nonzeros=68689 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_2_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=14893, cols=14893 >> total: nonzeros=1856839, allocated nonzeros=1856839 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_3_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=190701, cols=190701 >> total: nonzeros=6209261, allocated nonzeros=6209261 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] >> KSP Object: (mg_levels_4_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.0230953 >> Residual infinity norm 0.000240174 >> >> >> >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 >> building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 >> building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 >> building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 >> building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 >> 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 >> 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 >> 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 >> 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 >> 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 >> 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 >> 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 >> 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 >> 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 >> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 >> 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 
8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 >> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 >> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 >> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 >> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 >> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 >> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 >> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 >> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 >> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 >> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 >> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 >> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 >> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 >> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 >> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 >> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 >> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 >> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 >> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 >> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 >> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 >> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 >> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 >> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 >> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 >> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 >> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 >> 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 >> 40 KSP preconditioned 
resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 >> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 >> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 >> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 >> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 >> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 >> 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 >> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 >> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 >> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 >> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 >> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 >> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 >> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 >> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 >> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 >> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 >> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 >> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 >> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 >> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 >> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 >> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 >> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 >> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 >> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 >> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 >> 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 >> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 
2.240335500054e-03 >> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 >> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 >> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 >> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 >> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 >> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 >> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 >> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 >> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 >> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 >> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 >> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 >> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 >> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 >> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 >> KSP Object: 16 MPI processes >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: mg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Not using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: redundant >> Redundant preconditioner: First (color=0) of 16 PCs follows >> KSP Object: (mg_coarse_redundant_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_redundant_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 7.56438 >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> package used to perform factorization: petsc >> total: nonzeros=24206, allocated nonzeros=24206 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=4096, cols=4096 >> total: nonzeros=27136, allocated nonzeros=27136 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=32768, cols=32768 >> total: nonzeros=223232, allocated nonzeros=223232 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=262144, cols=262144 >> total: nonzeros=1810432, allocated nonzeros=1810432 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.290082 >> Residual infinity norm 0.00192869 >> >> >> >> >> >> solver_test.c: >> >> // modified version of ksp/ksp/examples/tutorials/ex34.c >> // related: ksp/ksp/examples/tutorials/ex29.c >> // ksp/ksp/examples/tutorials/ex32.c >> // ksp/ksp/examples/tutorials/ex50.c >> >> #include >> #include >> #include >> >> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); >> extern PetscErrorCode ComputeRHS(KSP,Vec,void*); >> >> typedef enum >> { >> DIRICHLET, >> NEUMANN >> } BCType; >> >> #undef __FUNCT__ >> #define __FUNCT__ "main" >> int main(int argc,char **argv) >> { >> KSP ksp; >> DM da; >> PetscReal norm; >> PetscErrorCode ierr; >> >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> Vec x,b,r; >> Mat J; >> const char* bcTypes[2] = { "dirichlet", "neumann" }; >> PetscInt bcType = (PetscInt)DIRICHLET; >> >> PetscInitialize(&argc,&argv,(char*)0,0); >> >> ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); >> ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); >> ierr = PetscOptionsEnd();CHKERRQ(ierr); >> >> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >> ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); >> ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); >> >> ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); >> >> ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); >> ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr); >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); >> ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); >> ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); >> ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); >> ierr = VecDuplicate(b,&r);CHKERRQ(ierr); >> >> ierr = MatMult(J,x,r);CHKERRQ(ierr); >> ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm 
%g\n",(double)norm);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr); >> >> ierr = VecDestroy(&r);CHKERRQ(ierr); >> ierr = KSPDestroy(&ksp);CHKERRQ(ierr); >> ierr = DMDestroy(&da);CHKERRQ(ierr); >> ierr = PetscFinalize(); >> return 0; >> } >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeRHS" >> PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); >> for (k = zs; k < zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> PetscReal x = ((PetscReal)i + 0.5) * Hx; >> PetscReal y = ((PetscReal)j + 0.5) * Hy; >> PetscReal z = ((PetscReal)k + 0.5) * Hz; >> array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); >> } >> } >> } >> ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); >> ierr = VecAssemblyBegin(b);CHKERRQ(ierr); >> ierr = VecAssemblyEnd(b);CHKERRQ(ierr); >> >> PetscReal norm; >> VecNorm(b, NORM_2, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm); >> VecNorm(b, NORM_INFINITY, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm); >> >> /* force right hand side to be consistent for singular matrix */ >> /* note this is really a hack, normally the model would provide you with a consistent right handside */ >> >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeMatrix" >> PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; >> PetscScalar v[7],Hx,Hy,Hz; >> MatStencil row, col[7]; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> >> if (bcType == DIRICHLET) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, "); >> else if (bcType == NEUMANN) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, "); >> else >> SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n"); >> >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> >> PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz); >> >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> >> PetscReal Hx2 = Hx * Hx; >> PetscReal Hy2 = Hy * Hy; >> PetscReal Hz2 = Hz * Hz; >> >> PetscReal scaleX = 1.0 / Hx2; >> PetscReal scaleY = 1.0 / Hy2; >> PetscReal scaleZ = 1.0 / Hz2; >> >> ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> for (k = zs; k < 
zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> row.i = i; >> row.j = j; >> row.k = k; >> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) >> { >> num = 0; >> numi = 0; >> numj = 0; >> numk = 0; >> if (k != 0) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k - 1; >> num++; >> numk++; >> } >> if (j != 0) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j - 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (i != 0) >> { >> v[num] = -scaleX; >> col[num].i = i - 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (i != mx - 1) >> { >> v[num] = -scaleX; >> col[num].i = i + 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (j != my - 1) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j + 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (k != mz - 1) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k + 1; >> num++; >> numk++; >> } >> >> if (bcType == NEUMANN) >> { >> v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; >> } >> else if (bcType == DIRICHLET) >> { >> v[num] = 2.0 * (scaleX + scaleY + scaleZ); >> } >> >> col[num].i = i; >> col[num].j = j; >> col[num].k = k; >> num++; >> ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES); >> CHKERRQ(ierr); >> } >> else >> { >> v[0] = -scaleZ; >> col[0].i = i; >> col[0].j = j; >> col[0].k = k - 1; >> v[1] = -scaleY; >> col[1].i = i; >> col[1].j = j - 1; >> col[1].k = k; >> v[2] = -scaleX; >> col[2].i = i - 1; >> col[2].j = j; >> col[2].k = k; >> v[3] = 2.0 * (scaleX + scaleY + scaleZ); >> col[3].i = i; >> col[3].j = j; >> col[3].k = k; >> v[4] = -scaleX; >> col[4].i = i + 1; >> col[4].j = j; >> col[4].k = k; >> v[5] = -scaleY; >> col[5].i = i; >> col[5].j = j + 1; >> col[5].k = k; >> v[6] = -scaleZ; >> col[6].i = i; >> col[6].j = j; >> col[6].k = k + 1; >> ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES); >> CHKERRQ(ierr); >> } >> } >> } >> } >> ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >>> On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote: >>> >>> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley wrote: >>> Hello, >>> >>> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we?d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. 
Other than these changes, our code is the same as the original ex34.c >>> >>> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we?d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. Here?s the tests we ran with ex45: >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> time in KSPSolve(): 7.0178e+00 >>> solver iterations: 157 >>> KSP final norm of residual: 3.16874e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none >>> time in KSPSolve(): 4.1072e+00 >>> solver iterations: 213 >>> KSP final norm of residual: 0.000138866 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg >>> time in KSPSolve(): 3.3962e+00 >>> solver iterations: 88 >>> KSP final norm of residual: 6.46242e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 1.3201e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 8.13339e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.3008e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 2.21474e-05 >>> >>> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations. >>> >>> We ran our code (modified ex34.c as described above) with the same parameters: >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> time in KSPSolve(): 5.3729e+00 >>> solver iterations: 123 >>> KSP final norm of residual: 0.00595066 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none >>> time in KSPSolve(): 3.6154e+00 >>> solver iterations: 188 >>> KSP final norm of residual: 0.00505943 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg >>> time in KSPSolve(): 3.5661e+00 >>> solver iterations: 98 >>> KSP final norm of residual: 0.00967462 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 4.5606e+00 >>> solver iterations: 44 >>> KSP final norm of residual: 949.553 >>> >>> 1) Dave is right >>> >>> 2) In order to see how many iterates to expect, first try using algebraic multigrid >>> >>> -pc_type gamg >>> >>> This should work out of the box for Poisson >>> >>> 3) For questions like this, we really need to see >>> >>> -ksp_view -ksp_monitor_true_residual >>> >>> 4) It sounds like you smoother is not strong enough. 
You could try >>> >>> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >>> >>> or maybe GMRES until it works. >>> >>> Thanks, >>> >>> Matt >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.5481e+01 >>> solver iterations: 198 >>> KSP final norm of residual: 0.916558 >>> >>> We performed all tests with petsc-3.7.6. >>> >>> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn?t seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok. >>> >>> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn?t observe any significant improvements over the results above. >>> >>> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist. >>> >>> Thanks >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >> > From jason.lefley at aclectic.com Fri Jun 23 00:54:26 2017 From: jason.lefley at aclectic.com (Jason Lefley) Date: Thu, 22 Jun 2017 22:54:26 -0700 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID: > On Jun 22, 2017, at 5:35 PM, Matthew Knepley wrote: > > On Thu, Jun 22, 2017 at 3:20 PM, Jason Lefley > wrote: > Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement. The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow: > > Okay, the second step is to replicate the smoother for the GMG, which will have a smaller and scalable setup time. The > smoother could be weak, or the restriction could be bad. 
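For reference, the smoother that -ksp_view reports for the gamg run above -- Chebyshev iterations preconditioned by SOR on every level -- can be replicated for geometric multigrid either with the -mg_levels_* options used below or directly from code. The following is only a minimal, untested sketch against the petsc-3.7 API used in this thread; it would sit just before the existing KSPSetFromOptions() call in main() of solver_test.c, and nlevels, kspl and pcl are hypothetical local names:

    PC       pc, pcl;                                        /* outer PC and per-level smoother PC */
    KSP      kspl;                                           /* per-level smoother KSP */
    PetscInt l, nlevels = 5;

    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);                 /* same as -pc_type mg */
    ierr = PCMGSetLevels(pc,nlevels,NULL);CHKERRQ(ierr);     /* same as -pc_mg_levels 5 */
    for (l = 1; l < nlevels; l++) {                          /* level 0 is the coarse solve */
      ierr = PCMGGetSmoother(pc,l,&kspl);CHKERRQ(ierr);
      ierr = KSPSetType(kspl,KSPCHEBYSHEV);CHKERRQ(ierr);    /* -mg_levels_ksp_type chebyshev */
      ierr = KSPGetPC(kspl,&pcl);CHKERRQ(ierr);
      ierr = PCSetType(pcl,PCSOR);CHKERRQ(ierr);             /* -mg_levels_pc_type sor */
    }

Because KSPSetFromOptions() still runs afterwards, any -mg_coarse_* or -mg_levels_* options given on the command line continue to override these settings. Note also that the list archive stripped the angle-bracket header names from the #include lines of solver_test.c quoted above; ex34.c-style code would include petscdm.h, petscdmda.h and petscksp.h.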
I inspected the ksp_view output from the run with -pc_type gamg and ran the program again with -pc_type mg and the pre-conditioner options from the gamg run: $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres right hand side 2 norm: 512. right hand side infinity norm: 0.999097 building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 0 KSP preconditioned resid norm 9.806726045668e+02 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.621361277232e+02 true resid norm 1.429430352211e+03 ||r(i)||/||b|| 2.791856156662e+00 2 KSP preconditioned resid norm 2.362961522860e+01 true resid norm 1.549620746006e+03 ||r(i)||/||b|| 3.026603019544e+00 3 KSP preconditioned resid norm 7.695073339717e+01 true resid norm 1.542148820317e+03 ||r(i)||/||b|| 3.012009414681e+00 4 KSP preconditioned resid norm 3.765270470793e+01 true resid norm 1.536405551882e+03 ||r(i)||/||b|| 3.000792093520e+00 5 KSP preconditioned resid norm 6.761970780882e+01 true resid norm 1.502842623846e+03 ||r(i)||/||b|| 2.935239499698e+00 6 KSP preconditioned resid norm 5.995995646652e+01 true resid norm 1.447456652501e+03 ||r(i)||/||b|| 2.827063774415e+00 7 KSP preconditioned resid norm 4.388139142285e+01 true resid norm 1.413766393419e+03 ||r(i)||/||b|| 2.761262487146e+00 8 KSP preconditioned resid norm 2.295909410512e+01 true resid norm 1.371727148377e+03 ||r(i)||/||b|| 2.679154586673e+00 9 KSP preconditioned resid norm 1.961908891359e+01 true resid norm 1.339113282715e+03 ||r(i)||/||b|| 2.615455630302e+00 10 KSP preconditioned resid norm 6.893687291220e+01 true resid norm 1.229592829746e+03 ||r(i)||/||b|| 2.401548495598e+00 11 KSP preconditioned resid norm 3.833567365382e+01 true resid norm 1.118085982483e+03 ||r(i)||/||b|| 2.183761684536e+00 12 KSP preconditioned resid norm 1.939604089596e+01 true resid norm 9.852672187664e+02 ||r(i)||/||b|| 1.924350036653e+00 13 KSP preconditioned resid norm 2.252075208204e+01 true resid norm 8.159187018709e+02 ||r(i)||/||b|| 1.593591214592e+00 14 KSP preconditioned resid norm 2.642782719810e+01 true resid norm 7.253214735753e+02 ||r(i)||/||b|| 1.416643503077e+00 15 KSP preconditioned resid norm 2.548817259250e+01 true resid norm 6.070018478722e+02 ||r(i)||/||b|| 1.185550484125e+00 16 KSP preconditioned resid norm 5.281972692525e+01 true resid norm 4.815894400238e+02 ||r(i)||/||b|| 9.406043750466e-01 17 KSP preconditioned resid norm 2.402884696592e+01 true resid norm 4.144462871860e+02 ||r(i)||/||b|| 8.094654046603e-01 18 KSP preconditioned resid norm 1.043080941574e+01 true resid norm 3.729148183697e+02 ||r(i)||/||b|| 7.283492546283e-01 19 KSP preconditioned resid norm 1.490375076082e+01 true resid norm 3.122057027160e+02 ||r(i)||/||b|| 6.097767631172e-01 20 KSP preconditioned resid norm 3.249426166084e+00 true resid norm 2.704136970440e+02 
||r(i)||/||b|| 5.281517520390e-01 21 KSP preconditioned resid norm 4.898441103047e+00 true resid norm 2.346045017813e+02 ||r(i)||/||b|| 4.582119175416e-01 22 KSP preconditioned resid norm 6.674657659594e+00 true resid norm 1.870390126135e+02 ||r(i)||/||b|| 3.653105715107e-01 23 KSP preconditioned resid norm 5.475921158065e+00 true resid norm 1.732176093821e+02 ||r(i)||/||b|| 3.383156433245e-01 24 KSP preconditioned resid norm 2.776421930727e+00 true resid norm 1.562809743536e+02 ||r(i)||/||b|| 3.052362780343e-01 25 KSP preconditioned resid norm 3.424602247354e+00 true resid norm 1.375628929963e+02 ||r(i)||/||b|| 2.686775253835e-01 26 KSP preconditioned resid norm 2.212037280808e+00 true resid norm 1.221828497054e+02 ||r(i)||/||b|| 2.386383783309e-01 27 KSP preconditioned resid norm 1.365474968893e+00 true resid norm 1.082476112493e+02 ||r(i)||/||b|| 2.114211157213e-01 28 KSP preconditioned resid norm 2.638907538318e+00 true resid norm 8.864935716757e+01 ||r(i)||/||b|| 1.731432757179e-01 29 KSP preconditioned resid norm 1.719908158919e+00 true resid norm 7.632670876324e+01 ||r(i)||/||b|| 1.490756030532e-01 30 KSP preconditioned resid norm 7.985033219249e-01 true resid norm 6.949169231958e+01 ||r(i)||/||b|| 1.357259615617e-01 31 KSP preconditioned resid norm 3.811670663811e+00 true resid norm 6.151000812796e+01 ||r(i)||/||b|| 1.201367346249e-01 32 KSP preconditioned resid norm 7.888148376757e+00 true resid norm 5.694823999920e+01 ||r(i)||/||b|| 1.112270312484e-01 33 KSP preconditioned resid norm 7.545633821809e-01 true resid norm 4.589854278402e+01 ||r(i)||/||b|| 8.964559137503e-02 34 KSP preconditioned resid norm 2.271801800991e+00 true resid norm 3.728668301821e+01 ||r(i)||/||b|| 7.282555276994e-02 35 KSP preconditioned resid norm 3.961087334680e+00 true resid norm 3.169140910721e+01 ||r(i)||/||b|| 6.189728341253e-02 36 KSP preconditioned resid norm 9.139405064634e-01 true resid norm 2.825299509385e+01 ||r(i)||/||b|| 5.518163104268e-02 37 KSP preconditioned resid norm 3.403605053170e-01 true resid norm 2.102215336663e+01 ||r(i)||/||b|| 4.105889329421e-02 38 KSP preconditioned resid norm 4.614799224677e-01 true resid norm 1.651863757642e+01 ||r(i)||/||b|| 3.226296401644e-02 39 KSP preconditioned resid norm 1.996074237552e+00 true resid norm 1.439868559977e+01 ||r(i)||/||b|| 2.812243281205e-02 40 KSP preconditioned resid norm 1.106018322401e+00 true resid norm 1.313250681787e+01 ||r(i)||/||b|| 2.564942737865e-02 41 KSP preconditioned resid norm 2.639402464711e-01 true resid norm 1.164910167179e+01 ||r(i)||/||b|| 2.275215170271e-02 42 KSP preconditioned resid norm 1.749941228669e-01 true resid norm 1.053438524789e+01 ||r(i)||/||b|| 2.057497118729e-02 43 KSP preconditioned resid norm 6.464433193720e-01 true resid norm 9.105614545741e+00 ||r(i)||/||b|| 1.778440340965e-02 44 KSP preconditioned resid norm 5.990029838187e-01 true resid norm 8.803151647663e+00 ||r(i)||/||b|| 1.719365556184e-02 45 KSP preconditioned resid norm 1.871777684116e-01 true resid norm 8.140591972598e+00 ||r(i)||/||b|| 1.589959369648e-02 46 KSP preconditioned resid norm 4.316459571157e-01 true resid norm 7.640223567698e+00 ||r(i)||/||b|| 1.492231165566e-02 47 KSP preconditioned resid norm 9.563142801536e-02 true resid norm 7.094762567710e+00 ||r(i)||/||b|| 1.385695814006e-02 48 KSP preconditioned resid norm 2.380088757747e-01 true resid norm 6.064559746487e+00 ||r(i)||/||b|| 1.184484325486e-02 49 KSP preconditioned resid norm 2.230779501200e-01 true resid norm 4.923827478633e+00 ||r(i)||/||b|| 9.616850544205e-03 50 KSP 
preconditioned resid norm 2.905071000609e-01 true resid norm 4.426620956264e+00 ||r(i)||/||b|| 8.645744055203e-03 51 KSP preconditioned resid norm 3.430194707482e-02 true resid norm 3.873957688918e+00 ||r(i)||/||b|| 7.566323611167e-03 52 KSP preconditioned resid norm 4.329652082337e-02 true resid norm 3.430571122778e+00 ||r(i)||/||b|| 6.700334224177e-03 53 KSP preconditioned resid norm 1.610976212900e-01 true resid norm 3.052757228648e+00 ||r(i)||/||b|| 5.962416462203e-03 54 KSP preconditioned resid norm 6.113252183681e-02 true resid norm 2.876793151138e+00 ||r(i)||/||b|| 5.618736623317e-03 55 KSP preconditioned resid norm 2.731463237482e-02 true resid norm 2.441017091077e+00 ||r(i)||/||b|| 4.767611506010e-03 56 KSP preconditioned resid norm 5.193746161496e-02 true resid norm 2.114917193241e+00 ||r(i)||/||b|| 4.130697643049e-03 57 KSP preconditioned resid norm 2.959513516137e-01 true resid norm 1.903828747377e+00 ||r(i)||/||b|| 3.718415522220e-03 58 KSP preconditioned resid norm 8.093802579621e-02 true resid norm 1.759070727559e+00 ||r(i)||/||b|| 3.435685014763e-03 59 KSP preconditioned resid norm 3.558590388480e-02 true resid norm 1.356337866126e+00 ||r(i)||/||b|| 2.649097394777e-03 60 KSP preconditioned resid norm 6.506508837044e-02 true resid norm 1.214979249890e+00 ||r(i)||/||b|| 2.373006347441e-03 61 KSP preconditioned resid norm 3.120758675191e-02 true resid norm 9.993321163196e-01 ||r(i)||/||b|| 1.951820539687e-03 62 KSP preconditioned resid norm 1.034431089486e-01 true resid norm 9.193137244810e-01 ||r(i)||/||b|| 1.795534618127e-03 63 KSP preconditioned resid norm 2.763120051285e-02 true resid norm 8.479698661132e-01 ||r(i)||/||b|| 1.656191144752e-03 64 KSP preconditioned resid norm 1.937546528918e-02 true resid norm 7.431839535619e-01 ||r(i)||/||b|| 1.451531159301e-03 65 KSP preconditioned resid norm 2.133391792161e-02 true resid norm 7.089428437765e-01 ||r(i)||/||b|| 1.384653991751e-03 66 KSP preconditioned resid norm 8.676771000819e-03 true resid norm 6.511166875850e-01 ||r(i)||/||b|| 1.271712280439e-03 KSP Object: 16 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 16 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Not using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 16 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 16 MPI processes type: bjacobi block Jacobi: number of blocks = 16 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 2.3125 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=32, cols=32 package used to perform factorization: petsc total: nonzeros=370, allocated nonzeros=370 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=32, cols=32 total: nonzeros=160, allocated nonzeros=160 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=27136, allocated nonzeros=27136 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=223232, allocated nonzeros=223232 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=1810432, allocated nonzeros=1810432 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_4_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Residual 2 norm 0.651117 Residual infinity norm 0.00799571 I did a diff on the ksp_view from the above run from the output from the run with -pc_type gamg and the only differences include the needed factor fill ratio (gamg: 1, mg: 2.3125), the size and non-zero counts of the matrices used in the multi-grid levels, the Chebyshev eigenvalue estimates, and the usage of I-node routines (gamg: using I-node routines: found 3 nodes, limit used is 5, mg: not using I-node routines). 
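As a point of comparison outside the -ksp_view summaries above, here is a minimal sketch (the helper name is illustrative, not from this thread) of one way to pull out and view the operator actually used on each PCMG level, so the rediscretized coarse matrices from -pc_type mg can be compared directly with the Galerkin ones produced with -pc_mg_galerkin, which live on the same grids:

#include <petscksp.h>

/* Minimal sketch: dump the operator used on each multigrid level.
   Assumes the KSP has already been set up (KSPSetUp or KSPSolve called);
   works for both -pc_type mg and -pc_type gamg, since both use the PCMG
   level infrastructure. */
static PetscErrorCode ViewMGLevelOperators(KSP ksp)
{
  PC             pc;
  PetscInt       nlevels, l;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCMGGetLevels(pc, &nlevels);CHKERRQ(ierr);
  for (l = 0; l < nlevels; l++) {
    KSP smoother;
    Mat Amat;
    ierr = PCMGGetSmoother(pc, l, &smoother);CHKERRQ(ierr);
    ierr = KSPGetOperators(smoother, &Amat, NULL);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "operator on level %D:\n", l);CHKERRQ(ierr);
    ierr = MatView(Amat, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

Dumping these for both runs makes it easier to see whether the coarse operators themselves, and not just the smoother settings, are where the preconditioners differ.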
Adding -pc_mg_galerkin results in some improvement but still not as good as with gamg: $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres -pc_mg_galerkin right hand side 2 norm: 512. right hand side infinity norm: 0.999097 building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 0 KSP preconditioned resid norm 1.073621701581e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.316341151889e-01 true resid norm 1.169096072553e+03 ||r(i)||/||b|| 2.283390766706e+00 2 KSP preconditioned resid norm 1.054910990128e-01 true resid norm 4.444993518786e+02 ||r(i)||/||b|| 8.681627966378e-01 3 KSP preconditioned resid norm 3.671488511570e-02 true resid norm 1.169431518627e+02 ||r(i)||/||b|| 2.284045934818e-01 4 KSP preconditioned resid norm 1.055769111265e-02 true resid norm 3.161333456265e+01 ||r(i)||/||b|| 6.174479406767e-02 5 KSP preconditioned resid norm 2.557907008002e-03 true resid norm 9.319742572653e+00 ||r(i)||/||b|| 1.820262221221e-02 6 KSP preconditioned resid norm 5.039866236685e-04 true resid norm 2.418858575838e+00 ||r(i)||/||b|| 4.724333155934e-03 7 KSP preconditioned resid norm 1.132965683654e-04 true resid norm 4.979511177091e-01 ||r(i)||/||b|| 9.725607767757e-04 8 KSP preconditioned resid norm 5.458028025084e-05 true resid norm 1.150321233127e-01 ||r(i)||/||b|| 2.246721158452e-04 9 KSP preconditioned resid norm 3.742558792121e-05 true resid norm 8.485603638598e-02 ||r(i)||/||b|| 1.657344460664e-04 10 KSP preconditioned resid norm 1.121838737544e-05 true resid norm 4.699890661073e-02 ||r(i)||/||b|| 9.179473947407e-05 11 KSP preconditioned resid norm 4.452473763175e-06 true resid norm 1.071140093264e-02 ||r(i)||/||b|| 2.092070494657e-05 KSP Object: 16 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 16 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 16 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 16 MPI processes type: bjacobi block Jacobi: number of blocks = 16 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 2.3125 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=32, cols=32 package used to perform factorization: petsc total: nonzeros=370, allocated nonzeros=370 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=32, cols=32 total: nonzeros=160, allocated nonzeros=160 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=3200, allocated nonzeros=3200 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=27136, allocated nonzeros=27136 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=223232, allocated nonzeros=223232 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=1810432, allocated nonzeros=1810432 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 16 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_4_esteig_) 16 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 16 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 16 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=14581760, allocated nonzeros=14581760 total number of mallocs used during MatSetValues calls =0 Residual 2 norm 0.0107114 Residual infinity norm 6.84843e-05 What are the differences between gamg and mg with -pc_mg_galerkin option (apart from the default smoother/coarse grid solver options, which I identified by comparing the ksp_view output)? Perhaps there?s an issue with the restriction, as you suggest? Thanks! 
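For context on that question, a rough sketch of what -pc_mg_galerkin changes (the names A and P and the fill estimate below are placeholders, not taken from the thread): with the option set, PCMG forms each coarse operator algebraically as A_coarse = P^T A P from the fine-level operator A and the interpolation P, instead of re-running the user's ComputeMatrix on the coarser DMDA. GAMG always builds its coarse operators this way, but it also constructs its own smoothed-aggregation interpolation rather than using the DMDA's geometric DMDA_Q0 interpolation, so the two are still not expected to behave identically.

#include <petscmat.h>

/* Sketch of the Galerkin coarse-operator product that -pc_mg_galerkin requests:
   A_coarse = P^T A P.  A is the fine-level operator, P interpolates from the
   coarse level to the fine level; the fill estimate 2.0 is an arbitrary guess. */
static PetscErrorCode GalerkinCoarseOperator(Mat A, Mat P, Mat *Acoarse)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, Acoarse);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Without the option, the coarse operators come from rediscretizing on each coarser DMDA, so any mismatch between that rediscretization and the cell-centered DMDA_Q0 restriction/interpolation is one plausible source of the weaker convergence seen in the non-Galerkin run.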
> > Thanks, > > Matt > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg > right hand side 2 norm: 512. > right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 > 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 > 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 > 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 > 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 > 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0. > AGG specific options > Symmetric graph false > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > package used to perform factorization: petsc > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 3 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=13, cols=13 > total: nonzeros=169, allocated nonzeros=169 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 3 nodes, limit used is 5 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=467, cols=467 > total: nonzeros=68689, allocated nonzeros=68689 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=14893, cols=14893 > total: nonzeros=1856839, allocated nonzeros=1856839 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=190701, cols=190701 > total: nonzeros=6209261, allocated nonzeros=6209261 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.0230953 > Residual infinity norm 0.000240174 > > > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 > right hand side 2 norm: 512. > right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 > building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 > building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 > building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 > building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 > 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 > 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 > 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 > 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 > 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 > 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 > 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 > 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 > 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 > 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 > 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 > 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 > 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 > 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 > 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 > 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 > 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 
1.362960162388e-01 > 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 > 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 > 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 > 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 > 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 > 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 > 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 > 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 > 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 > 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 > 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 > 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 > 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 > 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 > 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 > 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 > 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 > 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 > 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 > 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 > 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 > 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 > 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 > 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 > 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 > 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 > 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 > 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 > 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 
||r(i)||/||b|| 1.664542507691e-02 > 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 > 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 > 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 > 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 > 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 > 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 > 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 > 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 > 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 > 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 > 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 > 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 > 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 > 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 > 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 > 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 > 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 > 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 > 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 > 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 > 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 > 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 > 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 > 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 > 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 > 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 > 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 > 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 > 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 
5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 > 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 > 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 > 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 > 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 > 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 > 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 > 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 > 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Not using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: redundant > Redundant preconditioner: First (color=0) of 16 PCs follows > KSP Object: (mg_coarse_redundant_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_redundant_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 7.56438 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=24206, allocated nonzeros=24206 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: richardson > Richardson: using self-scale best computed damping factor > maximum iterations=5 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.290082 > Residual infinity norm 0.00192869 > > > > > > solver_test.c: > > // modified version of ksp/ksp/examples/tutorials/ex34.c > // related: ksp/ksp/examples/tutorials/ex29.c > // ksp/ksp/examples/tutorials/ex32.c > // ksp/ksp/examples/tutorials/ex50.c > > #include > #include > #include > > extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); > extern PetscErrorCode ComputeRHS(KSP,Vec,void*); > > typedef enum > { > DIRICHLET, > NEUMANN > } BCType; > > #undef __FUNCT__ > #define __FUNCT__ "main" > int main(int argc,char **argv) > { > KSP ksp; > DM da; > PetscReal norm; > PetscErrorCode ierr; > > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > Vec x,b,r; > Mat J; > const char* bcTypes[2] = { "dirichlet", "neumann" }; > PetscInt bcType = (PetscInt)DIRICHLET; > > PetscInitialize(&argc,&argv,(char*)0,0); > > ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); > ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); > ierr = PetscOptionsEnd();CHKERRQ(ierr); > > ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); > ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); > ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); > > ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); > > ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); > ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr); > ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); > ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); > ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); > ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); > ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); > ierr = VecDuplicate(b,&r);CHKERRQ(ierr); > > ierr = MatMult(J,x,r);CHKERRQ(ierr); > ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm %g\n",(double)norm);CHKERRQ(ierr); > ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr); > > ierr = VecDestroy(&r);CHKERRQ(ierr); > ierr = KSPDestroy(&ksp);CHKERRQ(ierr); > ierr = DMDestroy(&da);CHKERRQ(ierr); > ierr = PetscFinalize(); > return 0; > } > > #undef __FUNCT__ > #define __FUNCT__ "ComputeRHS" > PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; > PetscScalar Hx,Hy,Hz; > PetscScalar ***array; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); > ierr = 
DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > PetscReal x = ((PetscReal)i + 0.5) * Hx; > PetscReal y = ((PetscReal)j + 0.5) * Hy; > PetscReal z = ((PetscReal)k + 0.5) * Hz; > array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); > } > } > } > ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); > ierr = VecAssemblyBegin(b);CHKERRQ(ierr); > ierr = VecAssemblyEnd(b);CHKERRQ(ierr); > > PetscReal norm; > VecNorm(b, NORM_2, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm); > VecNorm(b, NORM_INFINITY, &norm); > PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm); > > /* force right hand side to be consistent for singular matrix */ > /* note this is really a hack, normally the model would provide you with a consistent right handside */ > > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > #undef __FUNCT__ > #define __FUNCT__ "ComputeMatrix" > PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) > { > PetscErrorCode ierr; > PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; > PetscScalar v[7],Hx,Hy,Hz; > MatStencil row, col[7]; > DM da; > BCType bcType = *(BCType*)ctx; > > PetscFunctionBeginUser; > > if (bcType == DIRICHLET) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, "); > else if (bcType == NEUMANN) > PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, "); > else > SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n"); > > ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); > ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); > > PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz); > > Hx = 1.0 / (PetscReal)(mx); > Hy = 1.0 / (PetscReal)(my); > Hz = 1.0 / (PetscReal)(mz); > > PetscReal Hx2 = Hx * Hx; > PetscReal Hy2 = Hy * Hy; > PetscReal Hz2 = Hz * Hz; > > PetscReal scaleX = 1.0 / Hx2; > PetscReal scaleY = 1.0 / Hy2; > PetscReal scaleZ = 1.0 / Hz2; > > ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); > for (k = zs; k < zs + zm; k++) > { > for (j = ys; j < ys + ym; j++) > { > for (i = xs; i < xs + xm; i++) > { > row.i = i; > row.j = j; > row.k = k; > if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) > { > num = 0; > numi = 0; > numj = 0; > numk = 0; > if (k != 0) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k - 1; > num++; > numk++; > } > if (j != 0) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j - 1; > col[num].k = k; > num++; > numj++; > } > if (i != 0) > { > v[num] = -scaleX; > col[num].i = i - 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (i != mx - 1) > { > v[num] = -scaleX; > col[num].i = i + 1; > col[num].j = j; > col[num].k = k; > num++; > numi++; > } > if (j != my - 1) > { > v[num] = -scaleY; > col[num].i = i; > col[num].j = j + 1; > col[num].k = k; > num++; > numj++; > } > if (k != mz - 1) > { > v[num] = -scaleZ; > col[num].i = i; > col[num].j = j; > col[num].k = k + 1; > num++; > numk++; > } > > if (bcType == NEUMANN) > { > v[num] = 
(PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; > } > else if (bcType == DIRICHLET) > { > v[num] = 2.0 * (scaleX + scaleY + scaleZ); > } > > col[num].i = i; > col[num].j = j; > col[num].k = k; > num++; > ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES); > CHKERRQ(ierr); > } > else > { > v[0] = -scaleZ; > col[0].i = i; > col[0].j = j; > col[0].k = k - 1; > v[1] = -scaleY; > col[1].i = i; > col[1].j = j - 1; > col[1].k = k; > v[2] = -scaleX; > col[2].i = i - 1; > col[2].j = j; > col[2].k = k; > v[3] = 2.0 * (scaleX + scaleY + scaleZ); > col[3].i = i; > col[3].j = j; > col[3].k = k; > v[4] = -scaleX; > col[4].i = i + 1; > col[4].j = j; > col[4].k = k; > v[5] = -scaleY; > col[5].i = i; > col[5].j = j + 1; > col[5].k = k; > v[6] = -scaleZ; > col[6].i = i; > col[6].j = j; > col[6].k = k + 1; > ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES); > CHKERRQ(ierr); > } > } > } > } > ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > if (bcType == NEUMANN) > { > MatNullSpace nullspace; > ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); > ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); > ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > >> On Jun 22, 2017, at 9:23 AM, Matthew Knepley > wrote: >> >> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley > wrote: >> Hello, >> >> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we?d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c >> >> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we?d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. 
Here?s the tests we ran with ex45: >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >> time in KSPSolve(): 7.0178e+00 >> solver iterations: 157 >> KSP final norm of residual: 3.16874e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none >> time in KSPSolve(): 4.1072e+00 >> solver iterations: 213 >> KSP final norm of residual: 0.000138866 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg >> time in KSPSolve(): 3.3962e+00 >> solver iterations: 88 >> KSP final norm of residual: 6.46242e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >> time in KSPSolve(): 1.3201e+00 >> solver iterations: 4 >> KSP final norm of residual: 8.13339e-05 >> >> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.3008e+00 >> solver iterations: 4 >> KSP final norm of residual: 2.21474e-05 >> >> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations. >> >> We ran our code (modified ex34.c as described above) with the same parameters: >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> time in KSPSolve(): 5.3729e+00 >> solver iterations: 123 >> KSP final norm of residual: 0.00595066 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none >> time in KSPSolve(): 3.6154e+00 >> solver iterations: 188 >> KSP final norm of residual: 0.00505943 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg >> time in KSPSolve(): 3.5661e+00 >> solver iterations: 98 >> KSP final norm of residual: 0.00967462 >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >> time in KSPSolve(): 4.5606e+00 >> solver iterations: 44 >> KSP final norm of residual: 949.553 >> >> 1) Dave is right >> >> 2) In order to see how many iterates to expect, first try using algebraic multigrid >> >> -pc_type gamg >> >> This should work out of the box for Poisson >> >> 3) For questions like this, we really need to see >> >> -ksp_view -ksp_monitor_true_residual >> >> 4) It sounds like you smoother is not strong enough. You could try >> >> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >> >> or maybe GMRES until it works. >> >> Thanks, >> >> Matt >> >> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >> time in KSPSolve(): 1.5481e+01 >> solver iterations: 198 >> KSP final norm of residual: 0.916558 >> >> We performed all tests with petsc-3.7.6. >> >> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn?t seem right. 
Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok. >> >> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn?t observe any significant improvements over the results above. >> >> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist. >> >> Thanks >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Fri Jun 23 01:15:48 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Fri, 23 Jun 2017 18:15:48 +1200 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> <638c328a-ade6-dbb6-88f1-24c9372d5178@auckland.ac.nz> Message-ID: <493a8399-e321-9e55-03bf-5a298962d99f@auckland.ac.nz> On 23/06/17 00:48, Matthew Knepley wrote: > > If I understand what you mean, I considered doing something like that- > basically just defining extra degrees of freedom in the cells where > dual porosity is to be applied. > > > It seemed to me that if I then went ahead and created the Jacobian > matrix using DMCreateMatrix(), it would give me extra nonzero > entries that shouldn't be there - interactions between the dual > porosity variables in neighbouring cells. Is there any way to > avoid that? > > > Ah, this is a very good point. You would like sparse structure in the > Jacobian blocks. Currently I do not have it, > but DMNetwork does. I have been planning to unify the sparsity > determination between the two. It is on the list. I can see another possible problem with this approach, which is that the extra dual-porosity unknowns would presumably be interlaced with the others (the fracture unknowns) in the resulting vector. This might make it harder to apply PCFIELDSPLIT (I think?), where I want to partition the Jacobian into fracture and matrix parts to solve it more efficiently, as Jed and Barry suggested. I have played around a bit more with including the dual-porosity topology in the DMPlex. 
It seems to work OK if I just extend the DAG on the dual-porosity faces up to the maximum depth (so there is an unused edge and vertex associated with each dual-porosity face). I also had a closer look inside DMPlexStratify() and I can see that it won't work if different parts of the DAG have different heights. For my case I could possibly stratify it differently (starting from the cell end rather than the vertex end) and it might work, but I don't know if there are other DMPlex functions that might not like that kind of DAG either. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 23 05:30:34 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 23 Jun 2017 05:30:34 -0500 Subject: [petsc-users] Jacobian matrix for dual porosity model In-Reply-To: <493a8399-e321-9e55-03bf-5a298962d99f@auckland.ac.nz> References: <699ad4c0-6f79-be19-8239-ba2050ccb8de@auckland.ac.nz> <87d1a86i6n.fsf@jedbrown.org> <90E27510-2650-4B07-B37C-1C6D46250FC3@mcs.anl.gov> <87y3sv4qpl.fsf@jedbrown.org> <6e72e0e2-1e9c-8796-4b3b-d55421a3fd61@auckland.ac.nz> <638c328a-ade6-dbb6-88f1-24c9372d5178@auckland.ac.nz> <493a8399-e321-9e55-03bf-5a298962d99f@auckland.ac.nz> Message-ID: On Fri, Jun 23, 2017 at 1:15 AM, Adrian Croucher wrote: > On 23/06/17 00:48, Matthew Knepley wrote: > > > If I understand what you mean, I considered doing something like that- > basically just defining extra degrees of freedom in the cells where dual > porosity is to be applied. > >> >> It seemed to me that if I then went ahead and created the Jacobian matrix >> using DMCreateMatrix(), it would give me extra nonzero entries that >> shouldn't be there - interactions between the dual porosity variables in >> neighbouring cells. Is there any way to avoid that? >> > > Ah, this is a very good point. You would like sparse structure in the > Jacobian blocks. Currently I do not have it, > but DMNetwork does. I have been planning to unify the sparsity > determination between the two. It is on the list. > > > I can see another possible problem with this approach, which is that the > extra dual-porosity unknowns would presumably be interlaced with the others > (the fracture unknowns) in the resulting vector. This might make it harder > to apply PCFIELDSPLIT (I think?), where I want to partition the Jacobian > into fracture and matrix parts to solve it more efficiently, as Jed and > Barry suggested. > Actually that part will work fine. > I have played around a bit more with including the dual-porosity topology > in the DMPlex. It seems to work OK if I just extend the DAG on the > dual-porosity faces up to the maximum depth (so there is an unused edge and > vertex associated with each dual-porosity face). > > I also had a closer look inside DMPlexStratify() and I can see that it > won't work if different parts of the DAG have different heights. For my > case I could possibly stratify it differently (starting from the cell end > rather than the vertex end) and it might work, but I don't know if there > are other DMPlex functions that might not like that kind of DAG either. > Having different parts have different heights does throw a wrench in things. In your case, this would not happen if you put an unused edge and vertex on top of the dual porosity face. 
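To make the earlier point concrete that an interlaced layout does not prevent PCFIELDSPLIT from separating the fracture and dual-porosity unknowns, here is a minimal sketch; the helper name, the assumption of exactly two interlaced dofs per cell with the fracture unknown first, and the use of strided index sets are illustrative assumptions, not something prescribed in this thread.

#include <petscksp.h>

/* Minimal sketch: define the two PCFIELDSPLIT splits with strided index sets.
   Assumes two interlaced dofs per cell (fracture first, then dual-porosity
   matrix); nlocal is the local size of the global vector and rstart its first
   global index (e.g. from VecGetOwnershipRange). */
static PetscErrorCode SetupDualPorositySplit(KSP ksp, PetscInt nlocal, PetscInt rstart)
{
  PC             pc;
  IS             isfrac, ismatrix;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, nlocal/2, rstart,     2, &isfrac);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, nlocal/2, rstart + 1, 2, &ismatrix);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "fracture", isfrac);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "matrix", ismatrix);CHKERRQ(ierr);
  ierr = ISDestroy(&isfrac);CHKERRQ(ierr);
  ierr = ISDestroy(&ismatrix);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Each split can then be configured separately through the -fieldsplit_fracture_ and -fieldsplit_matrix_ option prefixes.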
Thanks, Matt > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 <+64%209-923%204611> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 23 05:36:45 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 23 Jun 2017 05:36:45 -0500 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID: On Fri, Jun 23, 2017 at 12:54 AM, Jason Lefley wrote: > > On Jun 22, 2017, at 5:35 PM, Matthew Knepley wrote: > > On Thu, Jun 22, 2017 at 3:20 PM, Jason Lefley > wrote: > >> Thanks for the prompt replies. I ran with gamg and the results look more >> promising. I tried the suggested -mg_* options and did not see improvement. >> The -ksp_view and -ksp_monitor_true_residual output from those tests and >> the solver_test source (modified ex34.c) follow: >> > > Okay, the second step is to replicate the smoother for the GMG, which will > have a smaller and scalable setup time. The > smoother could be weak, or the restriction could be bad. > > > I inspected the ksp_view output from the run with -pc_type gamg and ran > the program again with -pc_type mg and the pre-conditioner options from the > gamg run: > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels > 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi > -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu > -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type > chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres > right hand side 2 norm: 512. 
> right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: > 128 x 128 x 128 > building operator with Dirichlet boundary conditions, global grid size: 16 > x 16 x 16 > building operator with Dirichlet boundary conditions, global grid size: 32 > x 32 x 32 > building operator with Dirichlet boundary conditions, global grid size: 64 > x 64 x 64 > building operator with Dirichlet boundary conditions, global grid size: 8 > x 8 x 8 > 0 KSP preconditioned resid norm 9.806726045668e+02 true resid norm > 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 3.621361277232e+02 true resid norm > 1.429430352211e+03 ||r(i)||/||b|| 2.791856156662e+00 > 2 KSP preconditioned resid norm 2.362961522860e+01 true resid norm > 1.549620746006e+03 ||r(i)||/||b|| 3.026603019544e+00 > 3 KSP preconditioned resid norm 7.695073339717e+01 true resid norm > 1.542148820317e+03 ||r(i)||/||b|| 3.012009414681e+00 > 4 KSP preconditioned resid norm 3.765270470793e+01 true resid norm > 1.536405551882e+03 ||r(i)||/||b|| 3.000792093520e+00 > 5 KSP preconditioned resid norm 6.761970780882e+01 true resid norm > 1.502842623846e+03 ||r(i)||/||b|| 2.935239499698e+00 > 6 KSP preconditioned resid norm 5.995995646652e+01 true resid norm > 1.447456652501e+03 ||r(i)||/||b|| 2.827063774415e+00 > 7 KSP preconditioned resid norm 4.388139142285e+01 true resid norm > 1.413766393419e+03 ||r(i)||/||b|| 2.761262487146e+00 > 8 KSP preconditioned resid norm 2.295909410512e+01 true resid norm > 1.371727148377e+03 ||r(i)||/||b|| 2.679154586673e+00 > 9 KSP preconditioned resid norm 1.961908891359e+01 true resid norm > 1.339113282715e+03 ||r(i)||/||b|| 2.615455630302e+00 > 10 KSP preconditioned resid norm 6.893687291220e+01 true resid norm > 1.229592829746e+03 ||r(i)||/||b|| 2.401548495598e+00 > 11 KSP preconditioned resid norm 3.833567365382e+01 true resid norm > 1.118085982483e+03 ||r(i)||/||b|| 2.183761684536e+00 > 12 KSP preconditioned resid norm 1.939604089596e+01 true resid norm > 9.852672187664e+02 ||r(i)||/||b|| 1.924350036653e+00 > 13 KSP preconditioned resid norm 2.252075208204e+01 true resid norm > 8.159187018709e+02 ||r(i)||/||b|| 1.593591214592e+00 > 14 KSP preconditioned resid norm 2.642782719810e+01 true resid norm > 7.253214735753e+02 ||r(i)||/||b|| 1.416643503077e+00 > 15 KSP preconditioned resid norm 2.548817259250e+01 true resid norm > 6.070018478722e+02 ||r(i)||/||b|| 1.185550484125e+00 > 16 KSP preconditioned resid norm 5.281972692525e+01 true resid norm > 4.815894400238e+02 ||r(i)||/||b|| 9.406043750466e-01 > 17 KSP preconditioned resid norm 2.402884696592e+01 true resid norm > 4.144462871860e+02 ||r(i)||/||b|| 8.094654046603e-01 > 18 KSP preconditioned resid norm 1.043080941574e+01 true resid norm > 3.729148183697e+02 ||r(i)||/||b|| 7.283492546283e-01 > 19 KSP preconditioned resid norm 1.490375076082e+01 true resid norm > 3.122057027160e+02 ||r(i)||/||b|| 6.097767631172e-01 > 20 KSP preconditioned resid norm 3.249426166084e+00 true resid norm > 2.704136970440e+02 ||r(i)||/||b|| 5.281517520390e-01 > 21 KSP preconditioned resid norm 4.898441103047e+00 true resid norm > 2.346045017813e+02 ||r(i)||/||b|| 4.582119175416e-01 > 22 KSP preconditioned resid norm 6.674657659594e+00 true resid norm > 1.870390126135e+02 ||r(i)||/||b|| 3.653105715107e-01 > 23 KSP preconditioned resid norm 5.475921158065e+00 true resid norm > 1.732176093821e+02 ||r(i)||/||b|| 3.383156433245e-01 > 24 KSP preconditioned resid norm 2.776421930727e+00 true resid norm > 
1.562809743536e+02 ||r(i)||/||b|| 3.052362780343e-01 > 25 KSP preconditioned resid norm 3.424602247354e+00 true resid norm > 1.375628929963e+02 ||r(i)||/||b|| 2.686775253835e-01 > 26 KSP preconditioned resid norm 2.212037280808e+00 true resid norm > 1.221828497054e+02 ||r(i)||/||b|| 2.386383783309e-01 > 27 KSP preconditioned resid norm 1.365474968893e+00 true resid norm > 1.082476112493e+02 ||r(i)||/||b|| 2.114211157213e-01 > 28 KSP preconditioned resid norm 2.638907538318e+00 true resid norm > 8.864935716757e+01 ||r(i)||/||b|| 1.731432757179e-01 > 29 KSP preconditioned resid norm 1.719908158919e+00 true resid norm > 7.632670876324e+01 ||r(i)||/||b|| 1.490756030532e-01 > 30 KSP preconditioned resid norm 7.985033219249e-01 true resid norm > 6.949169231958e+01 ||r(i)||/||b|| 1.357259615617e-01 > 31 KSP preconditioned resid norm 3.811670663811e+00 true resid norm > 6.151000812796e+01 ||r(i)||/||b|| 1.201367346249e-01 > 32 KSP preconditioned resid norm 7.888148376757e+00 true resid norm > 5.694823999920e+01 ||r(i)||/||b|| 1.112270312484e-01 > 33 KSP preconditioned resid norm 7.545633821809e-01 true resid norm > 4.589854278402e+01 ||r(i)||/||b|| 8.964559137503e-02 > 34 KSP preconditioned resid norm 2.271801800991e+00 true resid norm > 3.728668301821e+01 ||r(i)||/||b|| 7.282555276994e-02 > 35 KSP preconditioned resid norm 3.961087334680e+00 true resid norm > 3.169140910721e+01 ||r(i)||/||b|| 6.189728341253e-02 > 36 KSP preconditioned resid norm 9.139405064634e-01 true resid norm > 2.825299509385e+01 ||r(i)||/||b|| 5.518163104268e-02 > 37 KSP preconditioned resid norm 3.403605053170e-01 true resid norm > 2.102215336663e+01 ||r(i)||/||b|| 4.105889329421e-02 > 38 KSP preconditioned resid norm 4.614799224677e-01 true resid norm > 1.651863757642e+01 ||r(i)||/||b|| 3.226296401644e-02 > 39 KSP preconditioned resid norm 1.996074237552e+00 true resid norm > 1.439868559977e+01 ||r(i)||/||b|| 2.812243281205e-02 > 40 KSP preconditioned resid norm 1.106018322401e+00 true resid norm > 1.313250681787e+01 ||r(i)||/||b|| 2.564942737865e-02 > 41 KSP preconditioned resid norm 2.639402464711e-01 true resid norm > 1.164910167179e+01 ||r(i)||/||b|| 2.275215170271e-02 > 42 KSP preconditioned resid norm 1.749941228669e-01 true resid norm > 1.053438524789e+01 ||r(i)||/||b|| 2.057497118729e-02 > 43 KSP preconditioned resid norm 6.464433193720e-01 true resid norm > 9.105614545741e+00 ||r(i)||/||b|| 1.778440340965e-02 > 44 KSP preconditioned resid norm 5.990029838187e-01 true resid norm > 8.803151647663e+00 ||r(i)||/||b|| 1.719365556184e-02 > 45 KSP preconditioned resid norm 1.871777684116e-01 true resid norm > 8.140591972598e+00 ||r(i)||/||b|| 1.589959369648e-02 > 46 KSP preconditioned resid norm 4.316459571157e-01 true resid norm > 7.640223567698e+00 ||r(i)||/||b|| 1.492231165566e-02 > 47 KSP preconditioned resid norm 9.563142801536e-02 true resid norm > 7.094762567710e+00 ||r(i)||/||b|| 1.385695814006e-02 > 48 KSP preconditioned resid norm 2.380088757747e-01 true resid norm > 6.064559746487e+00 ||r(i)||/||b|| 1.184484325486e-02 > 49 KSP preconditioned resid norm 2.230779501200e-01 true resid norm > 4.923827478633e+00 ||r(i)||/||b|| 9.616850544205e-03 > 50 KSP preconditioned resid norm 2.905071000609e-01 true resid norm > 4.426620956264e+00 ||r(i)||/||b|| 8.645744055203e-03 > 51 KSP preconditioned resid norm 3.430194707482e-02 true resid norm > 3.873957688918e+00 ||r(i)||/||b|| 7.566323611167e-03 > 52 KSP preconditioned resid norm 4.329652082337e-02 true resid norm > 3.430571122778e+00 ||r(i)||/||b|| 6.700334224177e-03 > 
53 KSP preconditioned resid norm 1.610976212900e-01 true resid norm > 3.052757228648e+00 ||r(i)||/||b|| 5.962416462203e-03 > 54 KSP preconditioned resid norm 6.113252183681e-02 true resid norm > 2.876793151138e+00 ||r(i)||/||b|| 5.618736623317e-03 > 55 KSP preconditioned resid norm 2.731463237482e-02 true resid norm > 2.441017091077e+00 ||r(i)||/||b|| 4.767611506010e-03 > 56 KSP preconditioned resid norm 5.193746161496e-02 true resid norm > 2.114917193241e+00 ||r(i)||/||b|| 4.130697643049e-03 > 57 KSP preconditioned resid norm 2.959513516137e-01 true resid norm > 1.903828747377e+00 ||r(i)||/||b|| 3.718415522220e-03 > 58 KSP preconditioned resid norm 8.093802579621e-02 true resid norm > 1.759070727559e+00 ||r(i)||/||b|| 3.435685014763e-03 > 59 KSP preconditioned resid norm 3.558590388480e-02 true resid norm > 1.356337866126e+00 ||r(i)||/||b|| 2.649097394777e-03 > 60 KSP preconditioned resid norm 6.506508837044e-02 true resid norm > 1.214979249890e+00 ||r(i)||/||b|| 2.373006347441e-03 > 61 KSP preconditioned resid norm 3.120758675191e-02 true resid norm > 9.993321163196e-01 ||r(i)||/||b|| 1.951820539687e-03 > 62 KSP preconditioned resid norm 1.034431089486e-01 true resid norm > 9.193137244810e-01 ||r(i)||/||b|| 1.795534618127e-03 > 63 KSP preconditioned resid norm 2.763120051285e-02 true resid norm > 8.479698661132e-01 ||r(i)||/||b|| 1.656191144752e-03 > 64 KSP preconditioned resid norm 1.937546528918e-02 true resid norm > 7.431839535619e-01 ||r(i)||/||b|| 1.451531159301e-03 > 65 KSP preconditioned resid norm 2.133391792161e-02 true resid norm > 7.089428437765e-01 ||r(i)||/||b|| 1.384653991751e-03 > 66 KSP preconditioned resid norm 8.676771000819e-03 true resid norm > 6.511166875850e-01 ||r(i)||/||b|| 1.271712280439e-03 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Not using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 2.3125 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > package used to perform factorization: petsc > total: nonzeros=370, allocated nonzeros=370 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > total: nonzeros=160, allocated nonzeros=160 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.651117 > Residual infinity norm 0.00799571 > > > I did a diff on the ksp_view from the above run from the output from the > run with -pc_type gamg and the only differences include the needed factor > fill ratio (gamg: 1, mg: 2.3125), the size and non-zero counts of the > matrices used in the multi-grid levels, the Chebyshev eigenvalue estimates, > and the usage of I-node routines (gamg: using I-node routines: found 3 > nodes, limit used is 5, mg: not using I-node routines). > > Adding -pc_mg_galerkin results in some improvement but still not as good > as with gamg: > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels > 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi > -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu > -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type > chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres > -pc_mg_galerkin > right hand side 2 norm: 512. > right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: > 128 x 128 x 128 > 0 KSP preconditioned resid norm 1.073621701581e+00 true resid norm > 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.316341151889e-01 true resid norm > 1.169096072553e+03 ||r(i)||/||b|| 2.283390766706e+00 > 2 KSP preconditioned resid norm 1.054910990128e-01 true resid norm > 4.444993518786e+02 ||r(i)||/||b|| 8.681627966378e-01 > 3 KSP preconditioned resid norm 3.671488511570e-02 true resid norm > 1.169431518627e+02 ||r(i)||/||b|| 2.284045934818e-01 > 4 KSP preconditioned resid norm 1.055769111265e-02 true resid norm > 3.161333456265e+01 ||r(i)||/||b|| 6.174479406767e-02 > 5 KSP preconditioned resid norm 2.557907008002e-03 true resid norm > 9.319742572653e+00 ||r(i)||/||b|| 1.820262221221e-02 > 6 KSP preconditioned resid norm 5.039866236685e-04 true resid norm > 2.418858575838e+00 ||r(i)||/||b|| 4.724333155934e-03 > 7 KSP preconditioned resid norm 1.132965683654e-04 true resid norm > 4.979511177091e-01 ||r(i)||/||b|| 9.725607767757e-04 > 8 KSP preconditioned resid norm 5.458028025084e-05 true resid norm > 1.150321233127e-01 ||r(i)||/||b|| 2.246721158452e-04 > 9 KSP preconditioned resid norm 3.742558792121e-05 true resid norm > 8.485603638598e-02 ||r(i)||/||b|| 1.657344460664e-04 > 10 KSP preconditioned resid norm 1.121838737544e-05 true resid norm > 4.699890661073e-02 ||r(i)||/||b|| 9.179473947407e-05 > 11 KSP preconditioned resid norm 4.452473763175e-06 true resid norm > 1.071140093264e-02 ||r(i)||/||b|| 2.092070494657e-05 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC > objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 2.3125 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > package used to perform factorization: petsc > total: nonzeros=370, allocated nonzeros=370 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > total: nonzeros=160, allocated nonzeros=160 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations > [0. 0.1; 0. 
1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.0107114 > Residual infinity norm 6.84843e-05 > > What are the differences between gamg and mg with -pc_mg_galerkin option > (apart from the default smoother/coarse grid solver options, which I > identified by comparing the ksp_view output)? Perhaps there?s an issue with > the restriction, as you suggest? > Okay, when you say a Poisson problem, I assumed you meant div grad phi = f However, now it appears that you have div D grad phi = f Is this true? It would explain your results. Your coarse operator is inaccurate. AMG makes the coarse operator directly from the matrix, so it incorporates coefficient variation. Galerkin projection makes the coarse operator using R A P from your original operator A, and this is accurate enough to get good convergence. So your coefficient representation on the coarse levels is really bad. If you want to use GMG, you need to figure out how to represent the coefficient on coarser levels, which is sometimes called "renormalization". Matt > Thanks! > > > Thanks, > > Matt > > >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: >> 128 x 128 x 128 >> 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm >> 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm >> 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 >> 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm >> 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 >> 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm >> 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 >> 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm >> 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 >> 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm >> 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 >> KSP Object: 16 MPI processes >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: gamg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> GAMG specific options >> Threshold for dropping small values from graph 0. >> AGG specific options >> Symmetric graph false >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: bjacobi >> block Jacobi: number of blocks = 16 >> Local solve is same for all blocks, in the following KSP and PC >> objects: >> KSP Object: (mg_coarse_sub_) 1 MPI processes >> type: preonly >> maximum iterations=1, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_sub_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> package used to perform factorization: petsc >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 3 nodes, limit >> used is 5 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_1_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=467, cols=467 >> total: nonzeros=68689, allocated nonzeros=68689 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_2_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=14893, cols=14893 >> total: nonzeros=1856839, allocated nonzeros=1856839 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_3_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=190701, cols=190701 >> total: nonzeros=6209261, allocated nonzeros=6209261 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 >> Chebyshev: eigenvalues estimated using gmres with translations >> [0. 0.1; 0. 
1.1] >> KSP Object: (mg_levels_4_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.0230953 >> Residual infinity norm 0.000240174 >> >> >> >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >> -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels >> 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale >> -mg_levels_ksp_max_it 5 >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: >> 128 x 128 x 128 >> building operator with Dirichlet boundary conditions, global grid size: >> 16 x 16 x 16 >> building operator with Dirichlet boundary conditions, global grid size: >> 32 x 32 x 32 >> building operator with Dirichlet boundary conditions, global grid size: >> 64 x 64 x 64 >> building operator with Dirichlet boundary conditions, global grid size: 8 >> x 8 x 8 >> 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm >> 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm >> 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 >> 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm >> 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 >> 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm >> 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 >> 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm >> 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 >> 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm >> 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 >> 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm >> 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 >> 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm >> 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 >> 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm >> 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 >> 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm >> 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 >> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm >> 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 >> 11 KSP 
preconditioned resid norm 1.360843954226e+01 true resid norm >> 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 >> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm >> 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 >> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm >> 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 >> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm >> 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 >> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm >> 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 >> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm >> 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 >> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm >> 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 >> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm >> 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 >> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm >> 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 >> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm >> 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 >> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm >> 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 >> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm >> 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 >> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm >> 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 >> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm >> 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 >> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm >> 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 >> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm >> 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 >> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm >> 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 >> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm >> 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 >> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm >> 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 >> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm >> 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 >> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm >> 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 >> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm >> 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 >> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm >> 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 >> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm >> 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 >> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm >> 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 >> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm >> 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 >> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm >> 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 >> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm >> 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 >> 39 
KSP preconditioned resid norm 3.726976470079e+00 true resid norm >> 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 >> 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm >> 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 >> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm >> 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 >> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm >> 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 >> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm >> 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 >> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm >> 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 >> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm >> 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 >> 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm >> 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 >> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm >> 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 >> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm >> 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 >> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm >> 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 >> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm >> 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 >> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm >> 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 >> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm >> 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 >> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm >> 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 >> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm >> 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 >> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm >> 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 >> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm >> 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 >> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm >> 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 >> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm >> 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 >> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm >> 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 >> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm >> 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 >> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm >> 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 >> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm >> 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 >> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm >> 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 >> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm >> 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 >> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm >> 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 >> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm >> 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 >> 
67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm >> 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 >> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm >> 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 >> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm >> 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 >> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm >> 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 >> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm >> 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 >> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm >> 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 >> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm >> 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 >> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm >> 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03 >> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm >> 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 >> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm >> 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 >> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm >> 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 >> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm >> 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 >> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm >> 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 >> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm >> 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 >> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm >> 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 >> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm >> 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 >> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm >> 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 >> KSP Object: 16 MPI processes >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: mg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Not using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: redundant >> Redundant preconditioner: First (color=0) of 16 PCs follows >> KSP Object: (mg_coarse_redundant_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_redundant_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot >> [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 7.56438 >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> package used to perform factorization: petsc >> total: nonzeros=24206, allocated nonzeros=24206 >> total number of mallocs used during MatSetValues calls >> =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=4096, cols=4096 >> total: nonzeros=27136, allocated nonzeros=27136 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=32768, cols=32768 >> total: nonzeros=223232, allocated nonzeros=223232 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=262144, cols=262144 >> total: nonzeros=1810432, allocated nonzeros=1810432 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = >> 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.290082 >> Residual infinity norm 0.00192869 >> >> >> >> >> >> solver_test.c: >> >> // modified version of ksp/ksp/examples/tutorials/ex34.c >> // related: ksp/ksp/examples/tutorials/ex29.c >> // ksp/ksp/examples/tutorials/ex32.c >> // ksp/ksp/examples/tutorials/ex50.c >> >> #include >> #include >> #include >> >> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); >> extern PetscErrorCode ComputeRHS(KSP,Vec,void*); >> >> typedef enum >> { >> DIRICHLET, >> NEUMANN >> } BCType; >> >> #undef __FUNCT__ >> #define __FUNCT__ "main" >> int main(int argc,char **argv) >> { >> KSP ksp; >> DM da; >> PetscReal norm; >> PetscErrorCode ierr; >> >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> Vec x,b,r; >> Mat J; >> const char* bcTypes[2] = { "dirichlet", "neumann" }; >> PetscInt bcType = (PetscInt)DIRICHLET; >> >> PetscInitialize(&argc,&argv,(char*)0,0); >> >> ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); >> ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", >> "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); >> ierr = PetscOptionsEnd();CHKERRQ(ierr); >> >> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >> ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_N >> ONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_ >> DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); >> ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); >> >> ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); >> >> ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); >> ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ( >> ierr); >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); >> ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); >> ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); >> ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); >> ierr = VecDuplicate(b,&r);CHKERRQ(ierr); >> >> ierr = MatMult(J,x,r);CHKERRQ(ierr); >> ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm >> 
%g\n",(double)norm);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm >> %g\n",(double)norm);CHKERRQ(ierr); >> >> ierr = VecDestroy(&r);CHKERRQ(ierr); >> ierr = KSPDestroy(&ksp);CHKERRQ(ierr); >> ierr = DMDestroy(&da);CHKERRQ(ierr); >> ierr = PetscFinalize(); >> return 0; >> } >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeRHS" >> PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, >> 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); >> for (k = zs; k < zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> PetscReal x = ((PetscReal)i + 0.5) * Hx; >> PetscReal y = ((PetscReal)j + 0.5) * Hy; >> PetscReal z = ((PetscReal)k + 0.5) * Hz; >> array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * >> PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); >> } >> } >> } >> ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); >> ierr = VecAssemblyBegin(b);CHKERRQ(ierr); >> ierr = VecAssemblyEnd(b);CHKERRQ(ierr); >> >> PetscReal norm; >> VecNorm(b, NORM_2, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", >> (double)norm); >> VecNorm(b, NORM_INFINITY, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", >> (double)norm); >> >> /* force right hand side to be consistent for singular matrix */ >> /* note this is really a hack, normally the model would provide you >> with a consistent right handside */ >> >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_ >> WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeMatrix" >> PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, >> numk; >> PetscScalar v[7],Hx,Hy,Hz; >> MatStencil row, col[7]; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> >> if (bcType == DIRICHLET) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet >> boundary conditions, "); >> else if (bcType == NEUMANN) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann >> boundary conditions, "); >> else >> SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary >> condition type\n"); >> >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0 >> ,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> >> PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", >> mx, my, mz); >> >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> >> PetscReal Hx2 = Hx * Hx; >> PetscReal Hy2 = Hy * Hy; >> PetscReal Hz2 = Hz * Hz; >> >> PetscReal scaleX = 1.0 / Hx2; >> PetscReal scaleY = 1.0 / Hy2; >> PetscReal scaleZ = 1.0 / Hz2; >> >> ierr = 
DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> for (k = zs; k < zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> row.i = i; >> row.j = j; >> row.k = k; >> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my >> - 1 || k == mz - 1) >> { >> num = 0; >> numi = 0; >> numj = 0; >> numk = 0; >> if (k != 0) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k - 1; >> num++; >> numk++; >> } >> if (j != 0) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j - 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (i != 0) >> { >> v[num] = -scaleX; >> col[num].i = i - 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (i != mx - 1) >> { >> v[num] = -scaleX; >> col[num].i = i + 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (j != my - 1) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j + 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (k != mz - 1) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k + 1; >> num++; >> numk++; >> } >> >> if (bcType == NEUMANN) >> { >> v[num] = (PetscReal) (numk) * scaleZ + >> (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; >> } >> else if (bcType == DIRICHLET) >> { >> v[num] = 2.0 * (scaleX + scaleY + scaleZ); >> } >> >> col[num].i = i; >> col[num].j = j; >> col[num].k = k; >> num++; >> ierr = MatSetValuesStencil(jac, 1, &row, num, col, >> v, INSERT_VALUES); >> CHKERRQ(ierr); >> } >> else >> { >> v[0] = -scaleZ; >> col[0].i = i; >> col[0].j = j; >> col[0].k = k - 1; >> v[1] = -scaleY; >> col[1].i = i; >> col[1].j = j - 1; >> col[1].k = k; >> v[2] = -scaleX; >> col[2].i = i - 1; >> col[2].j = j; >> col[2].k = k; >> v[3] = 2.0 * (scaleX + scaleY + scaleZ); >> col[3].i = i; >> col[3].j = j; >> col[3].k = k; >> v[4] = -scaleX; >> col[4].i = i + 1; >> col[4].j = j; >> col[4].k = k; >> v[5] = -scaleY; >> col[5].i = i; >> col[5].j = j + 1; >> col[5].k = k; >> v[6] = -scaleZ; >> col[6].i = i; >> col[6].j = j; >> col[6].k = k + 1; >> ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, >> INSERT_VALUES); >> CHKERRQ(ierr); >> } >> } >> } >> } >> ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_ >> WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >> On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote: >> >> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley >> wrote: >> >>> Hello, >>> >>> We are attempting to use the PETSc KSP solver framework in a fluid >>> dynamics simulation we developed. The solution is part of a pressure >>> projection and solves a Poisson problem. We use a cell-centered layout with >>> a regular grid in 3d. We started with ex34.c from the KSP tutorials since >>> it has the correct calls for the 3d DMDA, uses a cell-centered layout, and >>> states that it works with multi-grid. We modified the operator construction >>> function to match the coefficients and Dirichlet boundary conditions used >>> in our problem (we?d also like to support Neumann but left those out for >>> now to keep things simple). 
As a result of the modified boundary >>> conditions, our version does not perform a null space removal on the right >>> hand side or operator as the original did. We also modified the right hand >>> side to contain a sinusoidal pattern for testing. Other than these changes, >>> our code is the same as the original ex34.c >>> >>> With the default KSP options and using CG with the default >>> pre-conditioner and without a pre-conditioner, we see good convergence. >>> However, we?d like to accelerate the time to solution further and target >>> larger problem sizes (>= 1024^3) if possible. Given these objectives, >>> multi-grid as a pre-conditioner interests us. To understand the improvement >>> that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG >>> and no pre-conditioner appears to converge in a single iteration and we >>> wanted to compare against a problem that has similar convergence patterns >>> to our problem. Here?s the tests we ran with ex45: >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> time in KSPSolve(): 7.0178e+00 >>> solver iterations: 157 >>> KSP final norm of residual: 3.16874e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> -ksp_type cg -pc_type none >>> time in KSPSolve(): 4.1072e+00 >>> solver iterations: 213 >>> KSP final norm of residual: 0.000138866 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> -ksp_type cg >>> time in KSPSolve(): 3.3962e+00 >>> solver iterations: 88 >>> KSP final norm of residual: 6.46242e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 1.3201e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 8.13339e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.3008e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 2.21474e-05 >>> >>> We found the multi-grid pre-conditioner options in the KSP tutorials >>> makefile. These results make sense; both the default GMRES and CG solvers >>> converge and CG without a pre-conditioner takes more iterations. The >>> multi-grid pre-conditioned runs are pretty dramatically accelerated and >>> require only a handful of iterations. 
>>> >>> We ran our code (modified ex34.c as described above) with the same >>> parameters: >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> time in KSPSolve(): 5.3729e+00 >>> solver iterations: 123 >>> KSP final norm of residual: 0.00595066 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> -ksp_type cg -pc_type none >>> time in KSPSolve(): 3.6154e+00 >>> solver iterations: 188 >>> KSP final norm of residual: 0.00505943 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> -ksp_type cg >>> time in KSPSolve(): 3.5661e+00 >>> solver iterations: 98 >>> KSP final norm of residual: 0.00967462 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 4.5606e+00 >>> solver iterations: 44 >>> KSP final norm of residual: 949.553 >>> >> >> 1) Dave is right >> >> 2) In order to see how many iterates to expect, first try using algebraic >> multigrid >> >> -pc_type gamg >> >> This should work out of the box for Poisson >> >> 3) For questions like this, we really need to see >> >> -ksp_view -ksp_monitor_true_residual >> >> 4) It sounds like you smoother is not strong enough. You could try >> >> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale >> -mg_levels_ksp_max_it 5 >> >> or maybe GMRES until it works. >> >> Thanks, >> >> Matt >> >> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson >>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.5481e+01 >>> solver iterations: 198 >>> KSP final norm of residual: 0.916558 >>> >>> We performed all tests with petsc-3.7.6. >>> >>> The trends with CG and GMRES seem consistent with the results from ex45. >>> However, with multi-grid, something doesn?t seem right. Convergence seems >>> poor and the solves run for many more iterations than ex45 with multi-grid >>> as a pre-conditioner. I extensively validated the code that builds the >>> matrix and also confirmed that the solution produced by CG, when evaluated >>> with the system of equations elsewhere in our simulation, produces the same >>> residual as indicated by PETSc. Given that we only made minimal >>> modifications to the original example code, it seems likely that the >>> operators constructed for the multi-grid levels are ok. >>> >>> We also tried a variety of other suggested parameters for the multi-grid >>> pre-conditioner as suggested in related mailing list posts but we didn?t >>> observe any significant improvements over the results above. >>> >>> Is there anything we can do to check the validity of the coefficient >>> matrices built for the different multi-grid levels? Does it look like there >>> could be problems there? Or any other suggestions to achieve better results >>> with multi-grid? I have the -log_view, -ksp_view, and convergence monitor >>> output from the above tests and can post any of it if it would assist. >>> >>> Thanks >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From mfadams at lbl.gov Fri Jun 23 07:43:59 2017 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 23 Jun 2017 08:43:59 -0400 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID:
> > > coarser levels, which is sometimes called "renormalization". > > Of course. BTW, whatever happened to "fiber bundle"? I miss FB :)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From ibarletta at inogs.it Fri Jun 23 07:46:03 2017 From: ibarletta at inogs.it (Barletta, Ivano) Date: Fri, 23 Jun 2017 14:46:03 +0200 Subject: [petsc-users] info about MatPreallocateInitialize/Finalize Message-ID:
Dear all, How does MatPreallocateInitialize/Finalize work? I don't see any reference for examples in the online manual, so I've browsed the source code to see whether the routine is used and how. As far as I understood, it works this way: call MatPreallocateInitialize(comm, rows, cols, d_nz,o_nz,ierr) ! do something to calculate d_nz and o_nz call MatXXXSetPreallocation call MatPreallocateFinalize It is not that clear how d_nz and o_nz are computed. Do I need to provide them myself? (in that case what would be the point of using MatPreallocateInitialize/Finalize?) Thanks Ivano
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From bsmith at mcs.anl.gov Fri Jun 23 11:05:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 23 Jun 2017 11:05:22 -0500 Subject: [petsc-users] info about MatPreallocateInitialize/Finalize In-Reply-To: References: Message-ID:
You are correct, there are no uses of it in the examples. The best way to understand its use is to look at where it is used in PETSc, for example the function DMCreateMatrix_DA_2d_MPIAIJ() in src/dm/impls/da/fdda.c. You will see that it is your job to call MatPreallocateSetLocal() (or one of the other functions in that family) to provide the information about the nonzero columns of each row. Note that while this tool is very useful for some preallocations, like finite differences, it may not be useful for other discretizations, and it may be better to compute the values for dnz and onz yourself without using MatPreallocateInitialize. Barry
> On Jun 23, 2017, at 7:46 AM, Barletta, Ivano wrote: > > Dear all > > How does MatPreallocateInitialize/Finalize work? > > I don't see any reference for examples in the > online manual, so I've browsed the source code > to see whether the routine is used and how. > > As far as I understood, it works this way: > > call MatPreallocateInitialize(comm, rows, cols, d_nz,o_nz,ierr) > > ! do something to calculate d_nz and o_nz > > call MatXXXSetPreallocation > call MatPreallocateFinalize > > It is not that clear how d_nz and o_nz are > computed. Do I need to provide them myself?
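A minimal C sketch of the division of labor described above, assuming a petsc-3.7-era build: MatPreallocateInitialize() allocates and zeroes the dnz/onz work arrays, the loop you write in between fills them row by row via MatPreallocateSet() (or MatPreallocateSetLocal()), and MatPreallocateFinalize() frees them after they have been handed to the preallocation routines. The helper name PreallocateSketch, its arguments, and the 1D 3-point stencil are made up for illustration only.

#include <petscmat.h>

/* Illustrative helper only: A is assumed to have been created with
   MatCreate()/MatSetSizes()/MatSetType() as an AIJ matrix whose local row
   range is [rstart,rend) and whose global size is N x N.  The 1D 3-point
   stencil below stands in for whatever connectivity the application has. */
static PetscErrorCode PreallocateSketch(Mat A,PetscInt rstart,PetscInt rend,PetscInt N)
{
  PetscErrorCode ierr;
  MPI_Comm       comm;
  PetscInt       *dnz,*onz,row,cols[3],nc;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
  /* allocates and zeroes dnz[] and onz[], one entry per local row */
  ierr = MatPreallocateInitialize(comm,rend-rstart,rend-rstart,dnz,onz);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    nc = 0;
    if (row > 0)   cols[nc++] = row - 1;
    cols[nc++] = row;
    if (row < N-1) cols[nc++] = row + 1;
    /* record the global columns this row couples to; the counts are split
       into diagonal-block (dnz) and off-diagonal-block (onz) entries */
    ierr = MatPreallocateSet(row,nc,cols,dnz,onz);CHKERRQ(ierr);
  }
  /* hand the counts to the matrix before Finalize frees the work arrays */
  ierr = MatSeqAIJSetPreallocation(A,0,dnz);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,0,dnz,0,onz);CHKERRQ(ierr);
  ierr = MatPreallocateFinalize(dnz,onz);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

So the counts are indeed provided by the caller; what the Initialize/Finalize pair buys you is the allocation of the two work arrays and the bookkeeping that decides whether each column belongs to the diagonal block (dnz) or the off-diagonal block (onz) of the parallel matrix.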
> (in that case what would be the point of using > MatPreallocateInitialize/Finalize?) > > Thanks > Ivano > > >
From bsmith at mcs.anl.gov Fri Jun 23 11:34:39 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 23 Jun 2017 11:34:39 -0500 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID:
> On Jun 23, 2017, at 12:54 AM, Jason Lefley wrote: > > >> On Jun 22, 2017, at 5:35 PM, Matthew Knepley wrote: >> >> On Thu, Jun 22, 2017 at 3:20 PM, Jason Lefley wrote: >> Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement. The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow: >> >> Okay, the second step is to replicate the smoother for the GMG, which will have a smaller and scalable setup time. The >> smoother could be weak, or the restriction could be bad. > > I inspected the ksp_view output from the run with -pc_type gamg and ran the program again with -pc_type mg and the pre-conditioner options from the gamg run:
Yeah this definitely won't work. You won't be able to get the same number of iterations with a "tweaked" geometric MG as you get with GAMG on this type of problem; or if you did, you would be a great mathematician. This is kind of an unsolved problem in numerical analysis. Thus you are really stuck with two choices: GAMG, or geometric MG with the Galerkin generation of coarse operators. GAMG will result in fewer iterations but maybe a longer time to set up the preconditioner. Do you need to solve a single system with the SAME given operator, or many systems (for example at each timestep)? If you need to solve many such systems then the setup time is irrelevant and you should use GAMG. If you need to solve only a single system then you need to try both preconditioners and determine which one is faster on your machine for your system. If each system you solve is only "slightly" different from the previous one, you might be able to lag the preconditioner; this means building the preconditioner and then using it for several new systems before creating a new one. You can do this by using KSPSetReusePreconditioner() appropriately. Barry
> > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres > right hand side 2 norm: 512.
> right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 > building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 > building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 > building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 > building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 > 0 KSP preconditioned resid norm 9.806726045668e+02 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 3.621361277232e+02 true resid norm 1.429430352211e+03 ||r(i)||/||b|| 2.791856156662e+00 > 2 KSP preconditioned resid norm 2.362961522860e+01 true resid norm 1.549620746006e+03 ||r(i)||/||b|| 3.026603019544e+00 > 3 KSP preconditioned resid norm 7.695073339717e+01 true resid norm 1.542148820317e+03 ||r(i)||/||b|| 3.012009414681e+00 > 4 KSP preconditioned resid norm 3.765270470793e+01 true resid norm 1.536405551882e+03 ||r(i)||/||b|| 3.000792093520e+00 > 5 KSP preconditioned resid norm 6.761970780882e+01 true resid norm 1.502842623846e+03 ||r(i)||/||b|| 2.935239499698e+00 > 6 KSP preconditioned resid norm 5.995995646652e+01 true resid norm 1.447456652501e+03 ||r(i)||/||b|| 2.827063774415e+00 > 7 KSP preconditioned resid norm 4.388139142285e+01 true resid norm 1.413766393419e+03 ||r(i)||/||b|| 2.761262487146e+00 > 8 KSP preconditioned resid norm 2.295909410512e+01 true resid norm 1.371727148377e+03 ||r(i)||/||b|| 2.679154586673e+00 > 9 KSP preconditioned resid norm 1.961908891359e+01 true resid norm 1.339113282715e+03 ||r(i)||/||b|| 2.615455630302e+00 > 10 KSP preconditioned resid norm 6.893687291220e+01 true resid norm 1.229592829746e+03 ||r(i)||/||b|| 2.401548495598e+00 > 11 KSP preconditioned resid norm 3.833567365382e+01 true resid norm 1.118085982483e+03 ||r(i)||/||b|| 2.183761684536e+00 > 12 KSP preconditioned resid norm 1.939604089596e+01 true resid norm 9.852672187664e+02 ||r(i)||/||b|| 1.924350036653e+00 > 13 KSP preconditioned resid norm 2.252075208204e+01 true resid norm 8.159187018709e+02 ||r(i)||/||b|| 1.593591214592e+00 > 14 KSP preconditioned resid norm 2.642782719810e+01 true resid norm 7.253214735753e+02 ||r(i)||/||b|| 1.416643503077e+00 > 15 KSP preconditioned resid norm 2.548817259250e+01 true resid norm 6.070018478722e+02 ||r(i)||/||b|| 1.185550484125e+00 > 16 KSP preconditioned resid norm 5.281972692525e+01 true resid norm 4.815894400238e+02 ||r(i)||/||b|| 9.406043750466e-01 > 17 KSP preconditioned resid norm 2.402884696592e+01 true resid norm 4.144462871860e+02 ||r(i)||/||b|| 8.094654046603e-01 > 18 KSP preconditioned resid norm 1.043080941574e+01 true resid norm 3.729148183697e+02 ||r(i)||/||b|| 7.283492546283e-01 > 19 KSP preconditioned resid norm 1.490375076082e+01 true resid norm 3.122057027160e+02 ||r(i)||/||b|| 6.097767631172e-01 > 20 KSP preconditioned resid norm 3.249426166084e+00 true resid norm 2.704136970440e+02 ||r(i)||/||b|| 5.281517520390e-01 > 21 KSP preconditioned resid norm 4.898441103047e+00 true resid norm 2.346045017813e+02 ||r(i)||/||b|| 4.582119175416e-01 > 22 KSP preconditioned resid norm 6.674657659594e+00 true resid norm 1.870390126135e+02 ||r(i)||/||b|| 3.653105715107e-01 > 23 KSP preconditioned resid norm 5.475921158065e+00 true resid norm 1.732176093821e+02 ||r(i)||/||b|| 3.383156433245e-01 > 24 KSP preconditioned resid norm 2.776421930727e+00 true resid norm 1.562809743536e+02 ||r(i)||/||b|| 3.052362780343e-01 > 25 KSP 
preconditioned resid norm 3.424602247354e+00 true resid norm 1.375628929963e+02 ||r(i)||/||b|| 2.686775253835e-01 > 26 KSP preconditioned resid norm 2.212037280808e+00 true resid norm 1.221828497054e+02 ||r(i)||/||b|| 2.386383783309e-01 > 27 KSP preconditioned resid norm 1.365474968893e+00 true resid norm 1.082476112493e+02 ||r(i)||/||b|| 2.114211157213e-01 > 28 KSP preconditioned resid norm 2.638907538318e+00 true resid norm 8.864935716757e+01 ||r(i)||/||b|| 1.731432757179e-01 > 29 KSP preconditioned resid norm 1.719908158919e+00 true resid norm 7.632670876324e+01 ||r(i)||/||b|| 1.490756030532e-01 > 30 KSP preconditioned resid norm 7.985033219249e-01 true resid norm 6.949169231958e+01 ||r(i)||/||b|| 1.357259615617e-01 > 31 KSP preconditioned resid norm 3.811670663811e+00 true resid norm 6.151000812796e+01 ||r(i)||/||b|| 1.201367346249e-01 > 32 KSP preconditioned resid norm 7.888148376757e+00 true resid norm 5.694823999920e+01 ||r(i)||/||b|| 1.112270312484e-01 > 33 KSP preconditioned resid norm 7.545633821809e-01 true resid norm 4.589854278402e+01 ||r(i)||/||b|| 8.964559137503e-02 > 34 KSP preconditioned resid norm 2.271801800991e+00 true resid norm 3.728668301821e+01 ||r(i)||/||b|| 7.282555276994e-02 > 35 KSP preconditioned resid norm 3.961087334680e+00 true resid norm 3.169140910721e+01 ||r(i)||/||b|| 6.189728341253e-02 > 36 KSP preconditioned resid norm 9.139405064634e-01 true resid norm 2.825299509385e+01 ||r(i)||/||b|| 5.518163104268e-02 > 37 KSP preconditioned resid norm 3.403605053170e-01 true resid norm 2.102215336663e+01 ||r(i)||/||b|| 4.105889329421e-02 > 38 KSP preconditioned resid norm 4.614799224677e-01 true resid norm 1.651863757642e+01 ||r(i)||/||b|| 3.226296401644e-02 > 39 KSP preconditioned resid norm 1.996074237552e+00 true resid norm 1.439868559977e+01 ||r(i)||/||b|| 2.812243281205e-02 > 40 KSP preconditioned resid norm 1.106018322401e+00 true resid norm 1.313250681787e+01 ||r(i)||/||b|| 2.564942737865e-02 > 41 KSP preconditioned resid norm 2.639402464711e-01 true resid norm 1.164910167179e+01 ||r(i)||/||b|| 2.275215170271e-02 > 42 KSP preconditioned resid norm 1.749941228669e-01 true resid norm 1.053438524789e+01 ||r(i)||/||b|| 2.057497118729e-02 > 43 KSP preconditioned resid norm 6.464433193720e-01 true resid norm 9.105614545741e+00 ||r(i)||/||b|| 1.778440340965e-02 > 44 KSP preconditioned resid norm 5.990029838187e-01 true resid norm 8.803151647663e+00 ||r(i)||/||b|| 1.719365556184e-02 > 45 KSP preconditioned resid norm 1.871777684116e-01 true resid norm 8.140591972598e+00 ||r(i)||/||b|| 1.589959369648e-02 > 46 KSP preconditioned resid norm 4.316459571157e-01 true resid norm 7.640223567698e+00 ||r(i)||/||b|| 1.492231165566e-02 > 47 KSP preconditioned resid norm 9.563142801536e-02 true resid norm 7.094762567710e+00 ||r(i)||/||b|| 1.385695814006e-02 > 48 KSP preconditioned resid norm 2.380088757747e-01 true resid norm 6.064559746487e+00 ||r(i)||/||b|| 1.184484325486e-02 > 49 KSP preconditioned resid norm 2.230779501200e-01 true resid norm 4.923827478633e+00 ||r(i)||/||b|| 9.616850544205e-03 > 50 KSP preconditioned resid norm 2.905071000609e-01 true resid norm 4.426620956264e+00 ||r(i)||/||b|| 8.645744055203e-03 > 51 KSP preconditioned resid norm 3.430194707482e-02 true resid norm 3.873957688918e+00 ||r(i)||/||b|| 7.566323611167e-03 > 52 KSP preconditioned resid norm 4.329652082337e-02 true resid norm 3.430571122778e+00 ||r(i)||/||b|| 6.700334224177e-03 > 53 KSP preconditioned resid norm 1.610976212900e-01 true resid norm 3.052757228648e+00 ||r(i)||/||b|| 
5.962416462203e-03 > 54 KSP preconditioned resid norm 6.113252183681e-02 true resid norm 2.876793151138e+00 ||r(i)||/||b|| 5.618736623317e-03 > 55 KSP preconditioned resid norm 2.731463237482e-02 true resid norm 2.441017091077e+00 ||r(i)||/||b|| 4.767611506010e-03 > 56 KSP preconditioned resid norm 5.193746161496e-02 true resid norm 2.114917193241e+00 ||r(i)||/||b|| 4.130697643049e-03 > 57 KSP preconditioned resid norm 2.959513516137e-01 true resid norm 1.903828747377e+00 ||r(i)||/||b|| 3.718415522220e-03 > 58 KSP preconditioned resid norm 8.093802579621e-02 true resid norm 1.759070727559e+00 ||r(i)||/||b|| 3.435685014763e-03 > 59 KSP preconditioned resid norm 3.558590388480e-02 true resid norm 1.356337866126e+00 ||r(i)||/||b|| 2.649097394777e-03 > 60 KSP preconditioned resid norm 6.506508837044e-02 true resid norm 1.214979249890e+00 ||r(i)||/||b|| 2.373006347441e-03 > 61 KSP preconditioned resid norm 3.120758675191e-02 true resid norm 9.993321163196e-01 ||r(i)||/||b|| 1.951820539687e-03 > 62 KSP preconditioned resid norm 1.034431089486e-01 true resid norm 9.193137244810e-01 ||r(i)||/||b|| 1.795534618127e-03 > 63 KSP preconditioned resid norm 2.763120051285e-02 true resid norm 8.479698661132e-01 ||r(i)||/||b|| 1.656191144752e-03 > 64 KSP preconditioned resid norm 1.937546528918e-02 true resid norm 7.431839535619e-01 ||r(i)||/||b|| 1.451531159301e-03 > 65 KSP preconditioned resid norm 2.133391792161e-02 true resid norm 7.089428437765e-01 ||r(i)||/||b|| 1.384653991751e-03 > 66 KSP preconditioned resid norm 8.676771000819e-03 true resid norm 6.511166875850e-01 ||r(i)||/||b|| 1.271712280439e-03 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Not using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 2.3125 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > package used to perform factorization: petsc > total: nonzeros=370, allocated nonzeros=370 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > total: nonzeros=160, allocated nonzeros=160 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.651117 > Residual infinity norm 0.00799571 > > > I did a diff on the ksp_view from the above run from the output from the run with -pc_type gamg and the only differences include the needed factor fill ratio (gamg: 1, mg: 2.3125), the size and non-zero counts of the matrices used in the multi-grid levels, the Chebyshev eigenvalue estimates, and the usage of I-node routines (gamg: using I-node routines: found 3 nodes, limit used is 5, mg: not using I-node routines). > > Adding -pc_mg_galerkin results in some improvement but still not as good as with gamg: > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -ksp_type cg -pc_type mg -pc_mg_levels 5 -mg_coarse_ksp_type preonly -mg_coarse_pc_type bjacobi -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type lu -mg_coarse_sub_pc_factor_shift_type INBLOCKS -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_esteig_ksp_type gmres -pc_mg_galerkin > right hand side 2 norm: 512. > right hand side infinity norm: 0.999097 > building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 > 0 KSP preconditioned resid norm 1.073621701581e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.316341151889e-01 true resid norm 1.169096072553e+03 ||r(i)||/||b|| 2.283390766706e+00 > 2 KSP preconditioned resid norm 1.054910990128e-01 true resid norm 4.444993518786e+02 ||r(i)||/||b|| 8.681627966378e-01 > 3 KSP preconditioned resid norm 3.671488511570e-02 true resid norm 1.169431518627e+02 ||r(i)||/||b|| 2.284045934818e-01 > 4 KSP preconditioned resid norm 1.055769111265e-02 true resid norm 3.161333456265e+01 ||r(i)||/||b|| 6.174479406767e-02 > 5 KSP preconditioned resid norm 2.557907008002e-03 true resid norm 9.319742572653e+00 ||r(i)||/||b|| 1.820262221221e-02 > 6 KSP preconditioned resid norm 5.039866236685e-04 true resid norm 2.418858575838e+00 ||r(i)||/||b|| 4.724333155934e-03 > 7 KSP preconditioned resid norm 1.132965683654e-04 true resid norm 4.979511177091e-01 ||r(i)||/||b|| 9.725607767757e-04 > 8 KSP preconditioned resid norm 5.458028025084e-05 true resid norm 1.150321233127e-01 ||r(i)||/||b|| 2.246721158452e-04 > 9 KSP preconditioned resid norm 3.742558792121e-05 true resid norm 8.485603638598e-02 ||r(i)||/||b|| 1.657344460664e-04 > 10 KSP preconditioned resid norm 1.121838737544e-05 true resid norm 4.699890661073e-02 ||r(i)||/||b|| 9.179473947407e-05 > 11 KSP preconditioned resid norm 4.452473763175e-06 true resid norm 1.071140093264e-02 ||r(i)||/||b|| 2.092070494657e-05 > KSP Object: 16 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 16 MPI processes > type: mg > MG: type is MULTIPLICATIVE, levels=5 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 16 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 16 MPI processes > type: bjacobi > block Jacobi: number of blocks = 16 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 2.3125 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > package used to perform factorization: petsc > total: nonzeros=370, allocated nonzeros=370 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=32, cols=32 > total: nonzeros=160, allocated nonzeros=160 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=512, cols=512 > total: nonzeros=3200, allocated nonzeros=3200 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.153005, max = 1.68306 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=4096, cols=4096 > total: nonzeros=27136, allocated nonzeros=27136 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.152793, max = 1.68072 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=32768, cols=32768 > total: nonzeros=223232, allocated nonzeros=223232 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.144705, max = 1.59176 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=262144, cols=262144 > total: nonzeros=1810432, allocated nonzeros=1810432 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 4 ------------------------------- > KSP Object: (mg_levels_4_) 16 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] > KSP Object: (mg_levels_4_esteig_) 16 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_4_) 16 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 16 MPI processes > type: mpiaij > rows=2097152, cols=2097152 > total: nonzeros=14581760, allocated nonzeros=14581760 > total number of mallocs used during MatSetValues calls =0 > Residual 2 norm 0.0107114 > Residual infinity norm 6.84843e-05 > > What are the differences between gamg and mg with -pc_mg_galerkin option (apart from the default smoother/coarse grid solver options, which I identified by comparing the ksp_view output)? Perhaps there?s an issue with the restriction, as you suggest? > > Thanks! > >> >> Thanks, >> >> Matt >> >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 >> 0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00 >> 2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02 >> 3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03 >> 4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04 >> 5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05 >> KSP Object: 16 MPI processes >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: gamg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> GAMG specific options >> Threshold for dropping small values from graph 0. >> AGG specific options >> Symmetric graph false >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: bjacobi >> block Jacobi: number of blocks = 16 >> Local solve is same for all blocks, in the following KSP and PC objects: >> KSP Object: (mg_coarse_sub_) 1 MPI processes >> type: preonly >> maximum iterations=1, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_sub_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> package used to perform factorization: petsc >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 3 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=13, cols=13 >> total: nonzeros=169, allocated nonzeros=169 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 3 nodes, limit used is 5 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.136516, max = 1.50168 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_1_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=467, cols=467 >> total: nonzeros=68689, allocated nonzeros=68689 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.148872, max = 1.63759 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] >> KSP Object: (mg_levels_2_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=14893, cols=14893 >> total: nonzeros=1856839, allocated nonzeros=1856839 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.135736, max = 1.49309 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_3_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=190701, cols=190701 >> total: nonzeros=6209261, allocated nonzeros=6209261 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: chebyshev >> Chebyshev: eigenvalue estimates: min = 0.140039, max = 1.54043 >> Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] >> KSP Object: (mg_levels_4_esteig_) 16 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10, initial guess is zero >> tolerances: relative=1e-12, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.0230953 >> Residual infinity norm 0.000240174 >> >> >> >> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >> right hand side 2 norm: 512. >> right hand side infinity norm: 0.999097 >> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128 >> building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16 >> building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32 >> building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64 >> building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8 >> 0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01 >> 2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01 >> 3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01 >> 4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01 >> 5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01 >> 6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01 >> 7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01 >> 8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01 >> 9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01 >> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01 >> 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01 >> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01 >> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01 >> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01 >> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01 >> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01 >> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 
6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01 >> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01 >> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01 >> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01 >> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01 >> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01 >> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01 >> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02 >> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02 >> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02 >> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02 >> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02 >> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02 >> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02 >> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02 >> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02 >> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02 >> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02 >> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02 >> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02 >> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02 >> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02 >> 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02 >> 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02 >> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02 >> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02 >> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02 >> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02 >> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02 >> 46 KSP preconditioned 
resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02 >> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02 >> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02 >> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02 >> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02 >> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02 >> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03 >> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03 >> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03 >> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03 >> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03 >> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03 >> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03 >> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03 >> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03 >> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03 >> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03 >> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03 >> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03 >> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03 >> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03 >> 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03 >> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03 >> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03 >> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03 >> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03 >> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03 >> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03 >> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 
1.173149957372e-03 >> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03 >> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03 >> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04 >> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04 >> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04 >> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04 >> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04 >> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04 >> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04 >> KSP Object: 16 MPI processes >> type: cg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 16 MPI processes >> type: mg >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Not using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 16 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 16 MPI processes >> type: redundant >> Redundant preconditioner: First (color=0) of 16 PCs follows >> KSP Object: (mg_coarse_redundant_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_redundant_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 7.56438 >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> package used to perform factorization: petsc >> total: nonzeros=24206, allocated nonzeros=24206 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=512, cols=512 >> total: nonzeros=3200, allocated nonzeros=3200 >> total number of mallocs used during MatSetValues calls =0 >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=4096, cols=4096 >> total: nonzeros=27136, allocated nonzeros=27136 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=32768, cols=32768 >> total: nonzeros=223232, allocated nonzeros=223232 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 ------------------------------- >> KSP Object: (mg_levels_3_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_3_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=262144, cols=262144 >> total: nonzeros=1810432, allocated nonzeros=1810432 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 ------------------------------- >> KSP Object: (mg_levels_4_) 16 MPI processes >> type: richardson >> Richardson: using self-scale best computed damping factor >> maximum iterations=5 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (mg_levels_4_) 16 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 16 MPI processes >> type: mpiaij >> rows=2097152, cols=2097152 >> total: nonzeros=14581760, allocated nonzeros=14581760 >> total number of mallocs used during MatSetValues calls =0 >> Residual 2 norm 0.290082 >> Residual infinity norm 0.00192869 >> >> >> >> >> >> solver_test.c: >> >> // modified version of ksp/ksp/examples/tutorials/ex34.c >> // related: ksp/ksp/examples/tutorials/ex29.c >> // ksp/ksp/examples/tutorials/ex32.c >> // ksp/ksp/examples/tutorials/ex50.c >> >> #include >> #include >> #include >> >> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); >> extern PetscErrorCode ComputeRHS(KSP,Vec,void*); >> >> typedef enum >> { >> DIRICHLET, >> NEUMANN >> } BCType; >> >> #undef __FUNCT__ >> #define __FUNCT__ "main" >> int main(int argc,char **argv) >> { >> KSP ksp; >> DM da; >> PetscReal norm; >> PetscErrorCode ierr; >> >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> Vec x,b,r; >> Mat J; >> const char* bcTypes[2] = { "dirichlet", "neumann" }; >> PetscInt bcType = (PetscInt)DIRICHLET; >> >> PetscInitialize(&argc,&argv,(char*)0,0); >> >> ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr); >> ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr); >> ierr = PetscOptionsEnd();CHKERRQ(ierr); >> >> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >> ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr); >> ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr); >> >> ierr = KSPSetDM(ksp,da);CHKERRQ(ierr); >> >> ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr); >> ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr); >> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >> ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr); >> ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr); >> ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr); >> ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr); >> ierr = VecDuplicate(b,&r);CHKERRQ(ierr); >> >> ierr = MatMult(J,x,r);CHKERRQ(ierr); >> ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm 
%g\n",(double)norm);CHKERRQ(ierr); >> ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr); >> ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr); >> >> ierr = VecDestroy(&r);CHKERRQ(ierr); >> ierr = KSPDestroy(&ksp);CHKERRQ(ierr); >> ierr = DMDestroy(&da);CHKERRQ(ierr); >> ierr = PetscFinalize(); >> return 0; >> } >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeRHS" >> PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs; >> PetscScalar Hx,Hy,Hz; >> PetscScalar ***array; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr); >> for (k = zs; k < zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> PetscReal x = ((PetscReal)i + 0.5) * Hx; >> PetscReal y = ((PetscReal)j + 0.5) * Hy; >> PetscReal z = ((PetscReal)k + 0.5) * Hz; >> array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI); >> } >> } >> } >> ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr); >> ierr = VecAssemblyBegin(b);CHKERRQ(ierr); >> ierr = VecAssemblyEnd(b);CHKERRQ(ierr); >> >> PetscReal norm; >> VecNorm(b, NORM_2, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm); >> VecNorm(b, NORM_INFINITY, &norm); >> PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm); >> >> /* force right hand side to be consistent for singular matrix */ >> /* note this is really a hack, normally the model would provide you with a consistent right handside */ >> >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >> #undef __FUNCT__ >> #define __FUNCT__ "ComputeMatrix" >> PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx) >> { >> PetscErrorCode ierr; >> PetscInt i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk; >> PetscScalar v[7],Hx,Hy,Hz; >> MatStencil row, col[7]; >> DM da; >> BCType bcType = *(BCType*)ctx; >> >> PetscFunctionBeginUser; >> >> if (bcType == DIRICHLET) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, "); >> else if (bcType == NEUMANN) >> PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, "); >> else >> SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n"); >> >> ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr); >> ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr); >> >> PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz); >> >> Hx = 1.0 / (PetscReal)(mx); >> Hy = 1.0 / (PetscReal)(my); >> Hz = 1.0 / (PetscReal)(mz); >> >> PetscReal Hx2 = Hx * Hx; >> PetscReal Hy2 = Hy * Hy; >> PetscReal Hz2 = Hz * Hz; >> >> PetscReal scaleX = 1.0 / Hx2; >> PetscReal scaleY = 1.0 / Hy2; >> PetscReal scaleZ = 1.0 / Hz2; >> >> ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr); >> for (k = zs; k < 
zs + zm; k++) >> { >> for (j = ys; j < ys + ym; j++) >> { >> for (i = xs; i < xs + xm; i++) >> { >> row.i = i; >> row.j = j; >> row.k = k; >> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) >> { >> num = 0; >> numi = 0; >> numj = 0; >> numk = 0; >> if (k != 0) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k - 1; >> num++; >> numk++; >> } >> if (j != 0) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j - 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (i != 0) >> { >> v[num] = -scaleX; >> col[num].i = i - 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (i != mx - 1) >> { >> v[num] = -scaleX; >> col[num].i = i + 1; >> col[num].j = j; >> col[num].k = k; >> num++; >> numi++; >> } >> if (j != my - 1) >> { >> v[num] = -scaleY; >> col[num].i = i; >> col[num].j = j + 1; >> col[num].k = k; >> num++; >> numj++; >> } >> if (k != mz - 1) >> { >> v[num] = -scaleZ; >> col[num].i = i; >> col[num].j = j; >> col[num].k = k + 1; >> num++; >> numk++; >> } >> >> if (bcType == NEUMANN) >> { >> v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX; >> } >> else if (bcType == DIRICHLET) >> { >> v[num] = 2.0 * (scaleX + scaleY + scaleZ); >> } >> >> col[num].i = i; >> col[num].j = j; >> col[num].k = k; >> num++; >> ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES); >> CHKERRQ(ierr); >> } >> else >> { >> v[0] = -scaleZ; >> col[0].i = i; >> col[0].j = j; >> col[0].k = k - 1; >> v[1] = -scaleY; >> col[1].i = i; >> col[1].j = j - 1; >> col[1].k = k; >> v[2] = -scaleX; >> col[2].i = i - 1; >> col[2].j = j; >> col[2].k = k; >> v[3] = 2.0 * (scaleX + scaleY + scaleZ); >> col[3].i = i; >> col[3].j = j; >> col[3].k = k; >> v[4] = -scaleX; >> col[4].i = i + 1; >> col[4].j = j; >> col[4].k = k; >> v[5] = -scaleY; >> col[5].i = i; >> col[5].j = j + 1; >> col[5].k = k; >> v[6] = -scaleZ; >> col[6].i = i; >> col[6].j = j; >> col[6].k = k + 1; >> ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES); >> CHKERRQ(ierr); >> } >> } >> } >> } >> ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> if (bcType == NEUMANN) >> { >> MatNullSpace nullspace; >> ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr); >> ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr); >> ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >>> On Jun 22, 2017, at 9:23 AM, Matthew Knepley wrote: >>> >>> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley wrote: >>> Hello, >>> >>> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we?d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. 
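For reference, the discrete system assembled by the ComputeMatrix/ComputeRHS listings above is the standard cell-centered 7-point Laplacian (a sketch inferred from the code; h denotes the uniform cell width, so Hx = Hy = Hz = h on a cubic grid):

\[
\frac{1}{h^{2}}\Bigl(6\,\phi_{i,j,k}
  - \phi_{i-1,j,k} - \phi_{i+1,j,k}
  - \phi_{i,j-1,k} - \phi_{i,j+1,k}
  - \phi_{i,j,k-1} - \phi_{i,j,k+1}\Bigr) = f_{i,j,k},
\qquad
f_{i,j,k} = \sin(2\pi x_i)\,\cos(2\pi y_j)\,\sin(2\pi z_k),
\quad x_i = (i+\tfrac12)\,h,\ \text{etc.}
\]

On boundary cells the missing neighbour terms are simply dropped while the diagonal keeps the full 6/h^2, i.e. the ghost values outside the domain are taken to be zero; that is how the listing imposes the Dirichlet condition.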
Other than these changes, our code is the same as the original ex34.c >>> >>> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we?d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. Here?s the tests we ran with ex45: >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 >>> time in KSPSolve(): 7.0178e+00 >>> solver iterations: 157 >>> KSP final norm of residual: 3.16874e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none >>> time in KSPSolve(): 4.1072e+00 >>> solver iterations: 213 >>> KSP final norm of residual: 0.000138866 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg >>> time in KSPSolve(): 3.3962e+00 >>> solver iterations: 88 >>> KSP final norm of residual: 6.46242e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 1.3201e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 8.13339e-05 >>> >>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.3008e+00 >>> solver iterations: 4 >>> KSP final norm of residual: 2.21474e-05 >>> >>> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations. >>> >>> We ran our code (modified ex34.c as described above) with the same parameters: >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 >>> time in KSPSolve(): 5.3729e+00 >>> solver iterations: 123 >>> KSP final norm of residual: 0.00595066 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none >>> time in KSPSolve(): 3.6154e+00 >>> solver iterations: 188 >>> KSP final norm of residual: 0.00505943 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg >>> time in KSPSolve(): 3.5661e+00 >>> solver iterations: 98 >>> KSP final norm of residual: 0.00967462 >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi >>> time in KSPSolve(): 4.5606e+00 >>> solver iterations: 44 >>> KSP final norm of residual: 949.553 >>> >>> 1) Dave is right >>> >>> 2) In order to see how many iterates to expect, first try using algebraic multigrid >>> >>> -pc_type gamg >>> >>> This should work out of the box for Poisson >>> >>> 3) For questions like this, we really need to see >>> >>> -ksp_view -ksp_monitor_true_residual >>> >>> 4) It sounds like you smoother is not strong enough. 
You could try >>> >>> -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 >>> >>> or maybe GMRES until it works. >>> >>> Thanks, >>> >>> Matt >>> >>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg >>> time in KSPSolve(): 1.5481e+01 >>> solver iterations: 198 >>> KSP final norm of residual: 0.916558 >>> >>> We performed all tests with petsc-3.7.6. >>> >>> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn?t seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok. >>> >>> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn?t observe any significant improvements over the results above. >>> >>> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist. >>> >>> Thanks >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > From hgbk2008 at gmail.com Fri Jun 23 11:48:18 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 23 Jun 2017 18:48:18 +0200 Subject: [petsc-users] -ksp_pc_side for -pc_fieldsplit_schur_fact_type upper Message-ID: Hello I just want to make sure that I understand the right thing, in the manual page of PCFieldSplitSetSchurFactType: http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html The "upper" option uses DU from the full factorization. If I understand correctly, the ksp must be used with right preconditioning, because A P^-1 = (LDU) * (DU)^(-1) = L. If that was the case, should the manual put this information to be more clear? Thanks Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 23 14:14:58 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 23 Jun 2017 14:14:58 -0500 Subject: [petsc-users] -ksp_pc_side for -pc_fieldsplit_schur_fact_type upper In-Reply-To: References: Message-ID: <8710FAC1-0A5B-48EB-93FB-37C0A3053017@mcs.anl.gov> Based on my reading of the attached paper, in particular the last line on page 1050 I don't think it matters. and P?1A and AP?1 have the minimal polynomial (? ? 1)2. 
since the application on either side matches the same polynomial. Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: s1064827500377435.pdf Type: application/pdf Size: 57146 bytes Desc: not available URL: -------------- next part -------------- > On Jun 23, 2017, at 11:48 AM, Hoang Giang Bui wrote: > > Hello > > I just want to make sure that I understand the right thing, in the manual page of PCFieldSplitSetSchurFactType: http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html > > The "upper" option uses DU from the full factorization. If I understand correctly, the ksp must be used with right preconditioning, because A P^-1 = (LDU) * (DU)^(-1) = L. If that was the case, should the manual put this information to be more clear? > > Thanks > Giang > From knepley at gmail.com Sun Jun 25 12:03:38 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 25 Jun 2017 12:03:38 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. > we only pass the global vertex numbers using ?const int cells[]? to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > Yes. > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > Nope. Thanks, Matt > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jun 26 17:03:02 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 27 Jun 2017 00:03:02 +0200 Subject: [petsc-users] -ksp_pc_side for -pc_fieldsplit_schur_fact_type upper In-Reply-To: <8710FAC1-0A5B-48EB-93FB-37C0A3053017@mcs.anl.gov> References: <8710FAC1-0A5B-48EB-93FB-37C0A3053017@mcs.anl.gov> Message-ID: Thanks Barry for the paper. I also noticed that the block lower preconditioner also has the same minimal polynomial (the proof is similar), hence left and right preconditioning shall apply the same. Giang On Fri, Jun 23, 2017 at 9:14 PM, Barry Smith wrote: > > Based on my reading of the attached paper, in particular the last line > on page 1050 I don't think it matters. > > and P?1A and AP?1 have the minimal polynomial (? ? 1)2. > > since the application on either side matches the same polynomial. > > > Barry > > > > > > On Jun 23, 2017, at 11:48 AM, Hoang Giang Bui > wrote: > > > > Hello > > > > I just want to make sure that I understand the right thing, in the > manual page of PCFieldSplitSetSchurFactType: > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/ > PCFieldSplitSetSchurFactType.html > > > > The "upper" option uses DU from the full factorization. 
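Written out, the block factorization being discussed is (a sketch in standard 2x2 block notation, with S the Schur complement):

\[
A = \begin{pmatrix} A_{00} & A_{01}\\ A_{10} & A_{11} \end{pmatrix}
  = \underbrace{\begin{pmatrix} I & 0\\ A_{10}A_{00}^{-1} & I \end{pmatrix}}_{L}
    \underbrace{\begin{pmatrix} A_{00} & 0\\ 0 & S \end{pmatrix}}_{D}
    \underbrace{\begin{pmatrix} I & A_{00}^{-1}A_{01}\\ 0 & I \end{pmatrix}}_{U},
\qquad S = A_{11} - A_{10}A_{00}^{-1}A_{01}.
\]

With the "upper" choice P = DU one gets A P^{-1} = LDU(DU)^{-1} = L and P^{-1}A = U^{-1}D^{-1}LDU. Both operators have all eigenvalues equal to 1 and satisfy (T - I)^2 = 0, so both have minimal polynomial (\lambda - 1)^2, which is the point Barry quotes from the attached paper: with exact blocks it should not matter whether the preconditioner is applied on the left or on the right.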
If I understand > correctly, the ksp must be used with right preconditioning, because A P^-1 > = (LDU) * (DU)^(-1) = L. If that was the case, should the manual put this > information to be more clear? > > > > Thanks > > Giang > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.lefley at aclectic.com Mon Jun 26 20:37:56 2017 From: jason.lefley at aclectic.com (Jason Lefley) Date: Mon, 26 Jun 2017 18:37:56 -0700 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> Message-ID: <063A8337-A2DC-4869-8E55-451F0F6F592C@aclectic.com> > Okay, when you say a Poisson problem, I assumed you meant > > div grad phi = f > > However, now it appears that you have > > div D grad phi = f > > Is this true? It would explain your results. Your coarse operator is inaccurate. AMG makes the coarse operator directly > from the matrix, so it incorporates coefficient variation. Galerkin projection makes the coarse operator using R A P > from your original operator A, and this is accurate enough to get good convergence. So your coefficient representation > on the coarse levels is really bad. If you want to use GMG, you need to figure out how to represent the coefficient on > coarser levels, which is sometimes called "renormalization". > > Matt I believe we are solving the first one. The discretized form we are using is equation 13 in this document: https://www.rsmas.miami.edu/users/miskandarani/Courses/MSC321/Projects/prjpoisson.pdf Would you clarify why you think we are solving the second equation? I looked at some other code that uses geometric multi-grid to solve the same problem and the authors assumed a uniform cell width and subsequently factored out the cell width and moved it to the right hand side. I did that in our code (using scale = 1 rather than 1/h^2) and we see better convergence with the mg pre-conditioner: $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 right hand side 2 norm: 512. 
right hand side infinity norm: 0.999097 0 KSP preconditioned resid norm 3.434682678336e+05 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.305460801248e+04 true resid norm 1.633308409885e+02 ||r(i)||/||b|| 3.190055488056e-01 2 KSP preconditioned resid norm 6.656705346262e+03 true resid norm 7.942064132554e+01 ||r(i)||/||b|| 1.551184400890e-01 3 KSP preconditioned resid norm 3.523570204496e+03 true resid norm 4.046777759638e+01 ||r(i)||/||b|| 7.903862811794e-02 4 KSP preconditioned resid norm 9.123967354323e+03 true resid norm 2.445432977350e+01 ||r(i)||/||b|| 4.776236283887e-02 5 KSP preconditioned resid norm 9.811672802436e+02 true resid norm 1.556881108574e+01 ||r(i)||/||b|| 3.040783415183e-02 6 KSP preconditioned resid norm 1.106193887270e+03 true resid norm 8.752912569969e+00 ||r(i)||/||b|| 1.709553236322e-02 7 KSP preconditioned resid norm 3.411263151853e+02 true resid norm 4.817172861959e+00 ||r(i)||/||b|| 9.408540746014e-03 8 KSP preconditioned resid norm 1.129663122476e+02 true resid norm 2.051711481120e+00 ||r(i)||/||b|| 4.007248986563e-03 9 KSP preconditioned resid norm 7.776030229135e+01 true resid norm 1.092336734730e+00 ||r(i)||/||b|| 2.133470185019e-03 10 KSP preconditioned resid norm 3.900236414632e+01 true resid norm 4.662658178376e-01 ||r(i)||/||b|| 9.106754254641e-04 11 KSP preconditioned resid norm 2.884248061867e+01 true resid norm 2.584775749590e-01 ||r(i)||/||b|| 5.048390135919e-04 12 KSP preconditioned resid norm 1.275086146987e+01 true resid norm 1.183721340034e-01 ||r(i)||/||b|| 2.311955742253e-04 13 KSP preconditioned resid norm 3.378721119782e+00 true resid norm 5.841425568821e-02 ||r(i)||/||b|| 1.140903431410e-04 Linear solve converged due to CONVERGED_RTOL iterations 13 KSP final norm of residual 0.0584143 Residual 2 norm 0.0584143 Residual infinity norm 0.000458905 While this looks much better than our previous attempts, I think we might end up using the algebraic approach for generating the intermediate operators (either gamg or mg with -pc_mg_galerkin) since we want to support specification of boundary conditions other than just at the extents of the domain and it?s not clear how to handle the coarsening when boundary cells and non-boundary cells are combined into a single cell on the coarser grid. I?d still like to understand the renormalization you mentioned. Do you know of any resources that discuss it? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Mon Jun 26 21:24:22 2017 From: Eric.Chamberland at giref.ulaval.ca (=?UTF-8?Q?=c3=89ric_Chamberland?=) Date: Mon, 26 Jun 2017 22:24:22 -0400 Subject: [petsc-users] Is still GPU feature recommended on master branch only? Message-ID: <5306b0bc-58ca-5122-c39f-6ba27bfe3661@giref.ulaval.ca> Hi, as it is stated here (https://www.mcs.anl.gov/petsc/features/gpus.html) since a long time: We recommend working withpetsc master (git branch) if you wish to work witht his feature. Does this recommendation remains true? Thanks, Eric -------------- next part -------------- An HTML attachment was scrubbed... 
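In equation form, the rescaling described above (a sketch; h is the uniform cell width) simply multiplies every row of the system by h^2:

\[
\frac{1}{h^{2}}\Bigl(6\,\phi_{i,j,k} - \sum_{\text{nb}} \phi_{\text{nb}}\Bigr) = f_{i,j,k}
\quad\Longleftrightarrow\quad
6\,\phi_{i,j,k} - \sum_{\text{nb}} \phi_{\text{nb}} = h^{2} f_{i,j,k},
\]

so the solution is unchanged. What changes is how the separately assembled level operators (each built with its own h on its own grid) relate to one another; that is the coarse-operator accuracy issue raised earlier in the thread, which -pc_mg_galerkin or gamg sidestep by building the coarse matrices algebraically as R A P.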
URL: From knepley at gmail.com Mon Jun 26 21:52:28 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 26 Jun 2017 21:52:28 -0500 Subject: [petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem In-Reply-To: <063A8337-A2DC-4869-8E55-451F0F6F592C@aclectic.com> References: <89C033B5-529D-4B36-B4AF-2EC35CA2CCAB@aclectic.com> <4D5B3921-810B-49AD-97E9-8BF1DECBF655@aclectic.com> <063A8337-A2DC-4869-8E55-451F0F6F592C@aclectic.com> Message-ID: On Mon, Jun 26, 2017 at 8:37 PM, Jason Lefley wrote: > Okay, when you say a Poisson problem, I assumed you meant > > div grad phi = f > > However, now it appears that you have > > div D grad phi = f > > Is this true? It would explain your results. Your coarse operator is > inaccurate. AMG makes the coarse operator directly > from the matrix, so it incorporates coefficient variation. Galerkin > projection makes the coarse operator using R A P > from your original operator A, and this is accurate enough to get good > convergence. So your coefficient representation > on the coarse levels is really bad. If you want to use GMG, you need to > figure out how to represent the coefficient on > coarser levels, which is sometimes called "renormalization". > > Matt > > > I believe we are solving the first one. The discretized form we are using > is equation 13 in this document: https://www.rsmas. > miami.edu/users/miskandarani/Courses/MSC321/Projects/prjpoisson.pdf Would > you clarify why you think we are solving the second equation? > Something is wrong. The writeup is just showing the FD Laplacian. Can you take a look at SNES ex5, and let me know how your problem differs from that one? There were use GMG and can converge is a few (5-6) iterates, and if you use FMG you converge in 1 iterate. In fact, that is in my class notes on the CAAM 519 website. Its possible that you have badly scaled boundary values, which can cause convergence to deteriorate. Thanks, Matt > I looked at some other code that uses geometric multi-grid to solve the > same problem and the authors assumed a uniform cell width and subsequently > factored out the cell width and moved it to the right hand side. I did that > in our code (using scale = 1 rather than 1/h^2) and we see better > convergence with the mg pre-conditioner: > > $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 > -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 > -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale > -mg_levels_ksp_max_it 5 > right hand side 2 norm: 512. 
> right hand side infinity norm: 0.999097 > 0 KSP preconditioned resid norm 3.434682678336e+05 true resid norm > 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 7.305460801248e+04 true resid norm > 1.633308409885e+02 ||r(i)||/||b|| 3.190055488056e-01 > 2 KSP preconditioned resid norm 6.656705346262e+03 true resid norm > 7.942064132554e+01 ||r(i)||/||b|| 1.551184400890e-01 > 3 KSP preconditioned resid norm 3.523570204496e+03 true resid norm > 4.046777759638e+01 ||r(i)||/||b|| 7.903862811794e-02 > 4 KSP preconditioned resid norm 9.123967354323e+03 true resid norm > 2.445432977350e+01 ||r(i)||/||b|| 4.776236283887e-02 > 5 KSP preconditioned resid norm 9.811672802436e+02 true resid norm > 1.556881108574e+01 ||r(i)||/||b|| 3.040783415183e-02 > 6 KSP preconditioned resid norm 1.106193887270e+03 true resid norm > 8.752912569969e+00 ||r(i)||/||b|| 1.709553236322e-02 > 7 KSP preconditioned resid norm 3.411263151853e+02 true resid norm > 4.817172861959e+00 ||r(i)||/||b|| 9.408540746014e-03 > 8 KSP preconditioned resid norm 1.129663122476e+02 true resid norm > 2.051711481120e+00 ||r(i)||/||b|| 4.007248986563e-03 > 9 KSP preconditioned resid norm 7.776030229135e+01 true resid norm > 1.092336734730e+00 ||r(i)||/||b|| 2.133470185019e-03 > 10 KSP preconditioned resid norm 3.900236414632e+01 true resid norm > 4.662658178376e-01 ||r(i)||/||b|| 9.106754254641e-04 > 11 KSP preconditioned resid norm 2.884248061867e+01 true resid norm > 2.584775749590e-01 ||r(i)||/||b|| 5.048390135919e-04 > 12 KSP preconditioned resid norm 1.275086146987e+01 true resid norm > 1.183721340034e-01 ||r(i)||/||b|| 2.311955742253e-04 > 13 KSP preconditioned resid norm 3.378721119782e+00 true resid norm > 5.841425568821e-02 ||r(i)||/||b|| 1.140903431410e-04 > Linear solve converged due to CONVERGED_RTOL iterations 13 > KSP final norm of residual 0.0584143 > Residual 2 norm 0.0584143 > Residual infinity norm 0.000458905 > > While this looks much better than our previous attempts, I think we might > end up using the algebraic approach for generating the intermediate > operators (either gamg or mg with -pc_mg_galerkin) since we want to support > specification of boundary conditions other than just at the extents of the > domain and it?s not clear how to handle the coarsening when boundary cells > and non-boundary cells are combined into a single cell on the coarser grid. > > I?d still like to understand the renormalization you mentioned. Do you > know of any resources that discuss it? > > Thanks > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 26 21:53:22 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 26 Jun 2017 21:53:22 -0500 Subject: [petsc-users] Is still GPU feature recommended on master branch only? In-Reply-To: <5306b0bc-58ca-5122-c39f-6ba27bfe3661@giref.ulaval.ca> References: <5306b0bc-58ca-5122-c39f-6ba27bfe3661@giref.ulaval.ca> Message-ID: On Mon, Jun 26, 2017 at 9:24 PM, ?ric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > as it is stated here (https://www.mcs.anl.gov/petsc/features/gpus.html) > since a long time: > > We recommend working with petsc master (git branch) > if you wish to work > witht his feature. > Does this recommendation remains true? > Yes. 
GPU vendors are unable to arrive at stable software platforms. Matt > Thanks, > > Eric > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Hassan.Raiesi at aero.bombardier.com Tue Jun 27 09:12:33 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Tue, 27 Jun 2017 14:12:33 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dmplex_mesh.png Type: image/png Size: 6224 bytes Desc: dmplex_mesh.png URL: From knepley at gmail.com Tue Jun 27 09:16:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Jun 2017 09:16:46 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? 
> > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. Thanks, Matt > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. > > > > we only pass the global vertex numbers using ?const int cells[]? to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rupp at iue.tuwien.ac.at Tue Jun 27 09:51:43 2017 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Tue, 27 Jun 2017 16:51:43 +0200 Subject: [petsc-users] Is still GPU feature recommended on master branch only? In-Reply-To: References: <5306b0bc-58ca-5122-c39f-6ba27bfe3661@giref.ulaval.ca> Message-ID: Hey, > as it is stated here > (https://www.mcs.anl.gov/petsc/features/gpus.html > ) since a long time: > > We recommend working withpetsc master (git branch) > if you wish to > work witht his feature. > > Does this recommendation remains true? > > > Yes. GPU vendors are unable to arrive at stable software platforms. Things have gotten better in recent years, though. I hope to have GPU support in a better state in the future, so that GPU support is also available in the 'official' releases. 
Best regards, Karli From Hassan.Raiesi at aero.bombardier.com Tue Jun 27 10:08:41 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Tue, 27 Jun 2017 15:08:41 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: Great, It?s clear now ?, One more question, any plan to support other element shapes (prism and pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs in 3D, can prisms and pyramids be used as degenerate hexahedrons? Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 10:17 AM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi > wrote: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. Thanks, Matt thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
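To make the ordering rule above concrete, here is a small sketch of the cells[] and vertexCoords[] arrays each rank would hand to DMPlexCreateFromCellListParallel for a hypothetical 2-rank, 2x2 quadrilateral mesh of the unit square (4 cells, 9 vertices). The mesh, the numbering, and the variable names are illustrative only, not the mesh from the attached figure:

/* Hypothetical 2-rank layout for DMPlexCreateFromCellListParallel:
 * 2x2 quad mesh of the unit square, 4 cells, 9 vertices,
 * global vertex v = j*3 + i located at (0.5*i, 0.5*j). */
#include <stdio.h>

int main(void)
{
  /* ---- rank 0: global cells 0..1, global vertices 0..4 ---- */
  const int    cells0[2*4]  = {0,1,4,3,  1,2,5,4};         /* global vertex numbers */
  const double coords0[5*2] = {0.0,0.0, 0.5,0.0, 1.0,0.0,  /* vertices 0,1,2 */
                               0.0,0.5, 0.5,0.5};          /* vertices 3,4   */

  /* ---- rank 1: global cells 2..3, global vertices 5..8 ---- */
  const int    cells1[2*4]  = {3,4,7,6,  4,5,8,7};
  const double coords1[4*2] = {1.0,0.5,                    /* vertex 5       */
                               0.0,1.0, 0.5,1.0, 1.0,1.0}; /* vertices 6,7,8 */

  /* Each rank supplies the next contiguous block of GLOBAL vertex numbers,
   * in increasing order, independent of which cells it happens to own. */
  printf("rank 0: 2 cells, 5 vertices; rank 1: 2 cells, 4 vertices\n");
  (void)cells0; (void)coords0; (void)cells1; (void)coords1;
  return 0;
}

These are the cells[] and vertexCoords[] arguments of DMPlexCreateFromCellListParallel (here with numCorners = 4 and spaceDim = 2). Note that rank 1's cells reference global vertices 3 and 4 even though rank 0 supplies their coordinates; as Matt says, the two chunks are independent and the library moves the coordinates to where they are needed.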
URL: From knepley at gmail.com Tue Jun 27 10:52:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Jun 2017 10:52:44 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > Great, It?s clear now J, > > One more question, any plan to support other element shapes (prism and > pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs > in 3D, can prisms and pyramids be used as degenerate hexahedrons? > It depends on what you mean "support". Right now, we can represent these shapes in Plex. However, if you want mesh interpolation to work, then yes you need to extend GetRawFaces() to understand that shape. If you want them read out of a file format, other than Gmsh, we would likely have to extend that as well. These are straightforward once I understand what exactly you want to do. Thanks, Matt > Thank you > > -Hassan > > > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 10:17 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi bombardier.com> wrote: > > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? > > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > > > > I think there is a misunderstanding here. > > > > There is NO connection between the cell order and the vertex order. Each > process gets a contiguous > > set of cells (in the global numbering) and a contiguous set of vertices > (in the global numbering). These > > two are NOT related. We then move the vertices to the correct processes. > In this way, we can load > > completely in parallel, without requiring any setup in the mesh file. > > > > If you are worried, you can always arrange the order of vertices to > "match" the order of cells. > > > > Thanks, > > > > Matt > > > > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. > > > > we only pass the global vertex numbers using ?const int cells[]? 
to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.annewandter at opengosim.com Tue Jun 27 11:24:41 2017 From: robert.annewandter at opengosim.com (Robert Annewandter) Date: Tue, 27 Jun 2017 17:24:41 +0100 Subject: [petsc-users] PCCOMPOSITE with PCBJACOBI Message-ID: Dear PETSc folks, I want a Block Jacobi PC to be the second PC in a two-stage preconditioning scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an FGMRES. However, PCBJacobiGetSubKSP (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP) requires to call KSPSetUp (or PCSetUp) first on its parent KSP, which I struggle in succeeding. I wonder which KSP (or if so PC) that is. This is how I attempt to do it (using PCKSP to provide a parent KSP for PCBJacobiGetSubKSP): call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr) call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr) call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); CHKERRQ(ierr) ! 1st Stage call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr) call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr) ! KSPPREONLY-PCNONE for testing call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr) call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr) call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr) ! 2nd Stage call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr) call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr) call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr) call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr) call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr) call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr) ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr) ! call PCSetUp(T2, ierr); CHKERRQ(ierr) ! 
call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr) call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, ierr); CHKERRQ(ierr) allocate(sub_ksps(nsub_ksp)) call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); CHKERRQ(ierr) do i = 1, nsub_ksp call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr) call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); CHKERRQ(ierr) call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, ierr); CHKERRQ(ierr) end do deallocate(sub_ksps) nullify(sub_ksps) Is using PCKSP a good idea at all? With KSPSetUp(solver%ksp) -> FGMRES [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:55:14 2017 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c [0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [0]PETSC ERROR: #4 KSPSetUp() line 338 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion With KSPSetUp(BJac_ksp) -> KSPPREONLY [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Arguments are incompatible [0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE likely a call to VecSetSizes() or MatSetSizes() is wrong. See http://www.mcs.anl.gov/petsc/documentation/faq.html#split [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:52:57 2017 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc [0]PETSC ERROR: #1 PetscSplitOwnership() line 77 in /home/pujjad/Repositories/petsc/src/sys/utils/psplit.c [0]PETSC ERROR: #2 PetscLayoutSetUp() line 137 in /home/pujjad/Repositories/petsc/src/vec/is/utils/pmap.c [0]PETSC ERROR: #3 VecCreate_Seq_Private() line 847 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c [0]PETSC ERROR: #4 VecCreateSeqWithArray() line 899 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c [0]PETSC ERROR: #5 PCSetUp_BJacobi_Singleblock() line 786 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c [0]PETSC ERROR: #6 PCSetUp_BJacobi() line 136 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c [0]PETSC ERROR: #7 PCSetUp() line 924 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #8 KSPSetUp() line 379 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion With PCSetUp(T2) -> PCKSP [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix must be set first [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:51:23 2017 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion With PCSetUp(BJac_pc) -> PCBJACOBI [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix must be set first [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:42:10 2017 [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion Grateful for any help! Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From Hassan.Raiesi at aero.bombardier.com Tue Jun 27 11:30:29 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Tue, 27 Jun 2017 16:30:29 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: I meant the interpolation, DMPlex supports those element shapes, however, there are two problems, one, ?DMPlexBuildFromCellList_Parallel? 
does not take mixed elements (numCorners must be constant for all elements in current implementation), that was easy to fix, I already extended DMPlexBuildFromCellList_Parallel_Private to take elements with different shapes, then I realized the interpolation does not work when the mesh has elements other than tets and hex. Regarding the interpolation, I also noticed that the memory requirement is huge if I load the whole mesh on one core and interpolate (I cannot interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran out of memory, I?ll try to run with petsc memory logs and send). is there anyways to interpolate the mesh after DMPlexdistribute? The code crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I guess what is needed is a DMPlexInterpolate on already distributed meshes. Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 11:53 AM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi > wrote: Great, It?s clear now ?, One more question, any plan to support other element shapes (prism and pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs in 3D, can prisms and pyramids be used as degenerate hexahedrons? It depends on what you mean "support". Right now, we can represent these shapes in Plex. However, if you want mesh interpolation to work, then yes you need to extend GetRawFaces() to understand that shape. If you want them read out of a file format, other than Gmsh, we would likely have to extend that as well. These are straightforward once I understand what exactly you want to do. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 10:17 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi > wrote: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. 
Thanks, Matt thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 27 12:53:34 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Jun 2017 12:53:34 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > I meant the interpolation, > > DMPlex supports those element shapes, however, there are two problems, > one, ?DMPlexBuildFromCellList_Parallel? does not take mixed elements > (numCorners must be constant for all elements in current implementation), > that was easy to fix, > > I already extended DMPlexBuildFromCellList_Parallel_Private to take > elements with different shapes, then I realized the interpolation does not > work when the mesh has elements other than tets and hex. > Okay, here is what is needed. You need to a) prescribe an order for the vertices in a prism/pyramid (all input cells must have this order) b) report all faces as sets of ordered vertices - You have to order them matching the Plex vertex order for the lower dimensional shape - They should be oriented as to have outward facing normal > Regarding the interpolation, I also noticed that the memory requirement is > huge if I load the whole mesh on one core and interpolate (I cannot > interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran > out of memory, I?ll try to run with petsc memory logs and send). > I am not sure what the upper limit is, but in a 3D mesh you could have many times the number of cells in faces and edges. Note that you would need --with-64-bit-indices to go beyond 4GB. > is there anyways to interpolate the mesh after DMPlexdistribute? 
The code > crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I > guess what is needed is a DMPlexInterpolate on already distributed meshes. > This should work. Maybe try a small example? If that crashes, just send it. Thanks, Matt > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 11:53 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi bombardier.com> wrote: > > Great, It?s clear now J, > > One more question, any plan to support other element shapes (prism and > pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs > in 3D, can prisms and pyramids be used as degenerate hexahedrons? > > > > It depends on what you mean "support". Right now, we can represent these > shapes in Plex. However, if you > > want mesh interpolation to work, then yes you need to extend GetRawFaces() > to understand that shape. If > > you want them read out of a file format, other than Gmsh, we would likely > have to extend that as well. These > > are straightforward once I understand what exactly you want to do. > > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 10:17 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi bombardier.com> wrote: > > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? > > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > > > > I think there is a misunderstanding here. > > > > There is NO connection between the cell order and the vertex order. Each > process gets a contiguous > > set of cells (in the global numbering) and a contiguous set of vertices > (in the global numbering). These > > two are NOT related. We then move the vertices to the correct processes. > In this way, we can load > > completely in parallel, without requiring any setup in the mesh file. > > > > If you are worried, you can always arrange the order of vertices to > "match" the order of cells. > > > > Thanks, > > > > Matt > > > > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. 
Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. > > > > we only pass the global vertex numbers using ?const int cells[]? to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jun 27 17:45:32 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 Jun 2017 17:45:32 -0500 Subject: [petsc-users] PCCOMPOSITE with PCBJACOBI In-Reply-To: References: Message-ID: It is difficult, if not impossible at times to get all the options where you want them to be using the function call interface. On the other hand it is generally easy (if there are no inner PCSHELLS) to do this via the options database -pc_type composite -pc_composite_type multiplicative -pc_composite_pcs galerkin,bjacobi -sub_0_galerkin_ksp_type preonly -sub_0_galerkin_pc_type none -sub_1_sub_pc_factor_shift_type inblocks -sub_1_sub_pc_factor_zero_pivot zpiv > On Jun 27, 2017, at 11:24 AM, Robert Annewandter wrote: > > Dear PETSc folks, > > > I want a Block Jacobi PC to be the second PC in a two-stage preconditioning scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an FGMRES. > > > However, PCBJacobiGetSubKSP (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP) requires to call KSPSetUp (or PCSetUp) first on its parent KSP, which I struggle in succeeding. I wonder which KSP (or if so PC) that is. > > > This is how I attempt to do it (using PCKSP to provide a parent KSP for PCBJacobiGetSubKSP): > > > call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr) > call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr) > call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); CHKERRQ(ierr) > > > ! 1st Stage > call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr) > call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr) > > > ! 
KSPPREONLY-PCNONE for testing > call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr) > call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) > call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr) > call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr) > > > ! 2nd Stage > call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr) > call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr) > call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr) > call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) > call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr) > call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr) > > > call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr) > ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr) > ! call PCSetUp(T2, ierr); CHKERRQ(ierr) > ! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr) > > > call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, ierr); CHKERRQ(ierr) > allocate(sub_ksps(nsub_ksp)) > call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); CHKERRQ(ierr) > do i = 1, nsub_ksp > call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr) > call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); CHKERRQ(ierr) > call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, ierr); CHKERRQ(ierr) > end do > deallocate(sub_ksps) > nullify(sub_ksps) > > > Is using PCKSP a good idea at all? > > > With KSPSetUp(solver%ksp) -> FGMRES > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 > [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:55:14 2017 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc > [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #4 KSPSetUp() line 338 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c > application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 > [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes > [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command > [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event > [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion > > > > With KSPSetUp(BJac_ksp) -> KSPPREONLY > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- 
> [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE > likely a call to VecSetSizes() or MatSetSizes() is wrong. > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 > [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:52:57 2017 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc > [0]PETSC ERROR: #1 PetscSplitOwnership() line 77 in /home/pujjad/Repositories/petsc/src/sys/utils/psplit.c > [0]PETSC ERROR: #2 PetscLayoutSetUp() line 137 in /home/pujjad/Repositories/petsc/src/vec/is/utils/pmap.c > [0]PETSC ERROR: #3 VecCreate_Seq_Private() line 847 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c > [0]PETSC ERROR: #4 VecCreateSeqWithArray() line 899 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c > [0]PETSC ERROR: #5 PCSetUp_BJacobi_Singleblock() line 786 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #6 PCSetUp_BJacobi() line 136 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #7 PCSetUp() line 924 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #8 KSPSetUp() line 379 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c > [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes > [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command > [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event > [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion > [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes > [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command > [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event > [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion > > > > With PCSetUp(T2) -> PCKSP > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix must be set first > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 > [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:51:23 2017 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc > [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c > application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 > [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes > [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command > [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event > [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion > > > > With PCSetUp(BJac_pc) -> PCBJACOBI > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix must be set first > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 > [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:42:10 2017 > [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc > [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c > [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes > [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command > [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status > [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event > [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion > > > > Grateful for any help! > Robert > > From robert.annewandter at opengosim.com Wed Jun 28 02:07:18 2017 From: robert.annewandter at opengosim.com (Robert Annewandter) Date: Wed, 28 Jun 2017 08:07:18 +0100 Subject: [petsc-users] PCCOMPOSITE with PCBJACOBI In-Reply-To: References: Message-ID: Thank you Barry! We like to hard wire it into PFLOTRAN with CPR-AMG Block Jacobi Two-Stage Preconditioning potentially becoming the standard solver strategy. Using the options database is a great start to reverse engineer the issue! Thanks! Robert On 27/06/17 23:45, Barry Smith wrote: > It is difficult, if not impossible at times to get all the options where you want them to be using the function call interface. 
On the other hand it is generally easy (if there are no inner PCSHELLS) to do this via the options database > > -pc_type composite > -pc_composite_type multiplicative > -pc_composite_pcs galerkin,bjacobi > > -sub_0_galerkin_ksp_type preonly > -sub_0_galerkin_pc_type none > > -sub_1_sub_pc_factor_shift_type inblocks > -sub_1_sub_pc_factor_zero_pivot zpiv > > > >> On Jun 27, 2017, at 11:24 AM, Robert Annewandter wrote: >> >> Dear PETSc folks, >> >> >> I want a Block Jacobi PC to be the second PC in a two-stage preconditioning scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an FGMRES. >> >> >> However, PCBJacobiGetSubKSP (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP) requires to call KSPSetUp (or PCSetUp) first on its parent KSP, which I struggle in succeeding. I wonder which KSP (or if so PC) that is. >> >> >> This is how I attempt to do it (using PCKSP to provide a parent KSP for PCBJacobiGetSubKSP): >> >> >> call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr) >> call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr) >> call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); CHKERRQ(ierr) >> >> >> ! 1st Stage >> call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr) >> call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr) >> >> >> ! KSPPREONLY-PCNONE for testing >> call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr) >> call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >> call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr) >> call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr) >> >> >> ! 2nd Stage >> call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr) >> call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr) >> call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr) >> call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >> call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr) >> call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr) >> >> >> call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr) >> ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr) >> ! call PCSetUp(T2, ierr); CHKERRQ(ierr) >> ! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr) >> >> >> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, ierr); CHKERRQ(ierr) >> allocate(sub_ksps(nsub_ksp)) >> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); CHKERRQ(ierr) >> do i = 1, nsub_ksp >> call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr) >> call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); CHKERRQ(ierr) >> call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, ierr); CHKERRQ(ierr) >> end do >> deallocate(sub_ksps) >> nullify(sub_ksps) >> >> >> Is using PCKSP a good idea at all? >> >> >> With KSPSetUp(solver%ksp) -> FGMRES >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:55:14 2017 >> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >> [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c >> [0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c >> [0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [0]PETSC ERROR: #4 KSPSetUp() line 338 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >> >> >> >> With KSPSetUp(BJac_ksp) -> KSPPREONLY >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Arguments are incompatible >> [0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE >> likely a call to VecSetSizes() or MatSetSizes() is wrong. >> See http://www.mcs.anl.gov/petsc/documentation/faq.html#split >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:52:57 2017 >> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >> [0]PETSC ERROR: #1 PetscSplitOwnership() line 77 in /home/pujjad/Repositories/petsc/src/sys/utils/psplit.c >> [0]PETSC ERROR: #2 PetscLayoutSetUp() line 137 in /home/pujjad/Repositories/petsc/src/vec/is/utils/pmap.c >> [0]PETSC ERROR: #3 VecCreate_Seq_Private() line 847 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >> [0]PETSC ERROR: #4 VecCreateSeqWithArray() line 899 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >> [0]PETSC ERROR: #5 PCSetUp_BJacobi_Singleblock() line 786 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >> [0]PETSC ERROR: #6 PCSetUp_BJacobi() line 136 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >> [0]PETSC ERROR: #7 PCSetUp() line 924 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #8 KSPSetUp() line 379 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >> >> >> >> With PCSetUp(T2) -> PCKSP >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Matrix must be set first >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:51:23 2017 >> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >> >> >> >> With PCSetUp(BJac_pc) -> PCBJACOBI >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Matrix must be set first >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:42:10 2017 >> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >> >> >> >> Grateful for any help! >> Robert >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From giuntoli1991 at gmail.com Wed Jun 28 08:05:56 2017 From: giuntoli1991 at gmail.com (Guido Giuntoli) Date: Wed, 28 Jun 2017 15:05:56 +0200 Subject: [petsc-users] Node renumbering and ghost nodes determination Message-ID: Hi, I am using ParMETIS to get a partition of my mesh (a group of elements for each process) and now I want to renumber the nodes contiguous for each process according to their rank. 
Can I take advantage of the IS functions (or others like AO) directly from this point, or is it crucial to do the partition with ParMETIS through the PETSc interface and apply ISPartitioningToNumbering like in the manual's example? Another question: what is the best way to determine the ownership of the nodes? For example: node 101 belongs to an element in process 0 and to an element in process 1, so how do I make the decision? In the past I did the partition with METIS and it returned an ownership array for the nodes just like for the elements, so that was easy, but now my only idea is to get the new renumbering and check whether each renumbered node falls inside my range. Thank you, Guido. -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Wed Jun 28 10:28:53 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Wed, 28 Jun 2017 17:28:53 +0200 Subject: [petsc-users] Print local parts of MATMPIAIJ Message-ID: Dear PETSc team, I am building a distributed MATMPIAIJ matrix with the petsc4py interface of PETSc. Then I would like to print, say in separate files, the local entries of the matrix owned by the different processes. However, when defining a viewer, it prints the whole matrix, whether to stdout or to a specified file. Is there a way of doing this with petsc4py or with PETSc? Thanks, Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 28 10:52:58 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Jun 2017 10:52:58 -0500 Subject: [petsc-users] Print local parts of MATMPIAIJ In-Reply-To: References: Message-ID: On Wed, Jun 28, 2017 at 10:28 AM, Karin&NiKo wrote: > Dear PETSc team, > > I am building a distributed MATMPIAIJ matrix with the petsc4py interface > of PETSc. Then I would like to print, say in separate files, the local > entries of the matrix owned by the different processes. However, when > defining a viewer, it prints the whole matrix, whether to stdout or to a > specified file. > The rows are partitioned, so seeing what is owned by a process is easy. > Is there a way of doing this with petsc4py or with PETSc? > If you really want separate files, I think you could use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html to make a serial local matrix on every proc, which you can then view. Matt > Thanks, > Nicolas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jun 28 11:31:34 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 Jun 2017 11:31:34 -0500 Subject: [petsc-users] PCCOMPOSITE with PCBJACOBI In-Reply-To: References: Message-ID: <3669D977-1FA4-46AC-9DFD-C69E3C7320EF@mcs.anl.gov> > On Jun 28, 2017, at 2:07 AM, Robert Annewandter wrote: > > Thank you Barry! > > We like to hard wire it into PFLOTRAN with CPR-AMG Block Jacobi Two-Stage Preconditioning potentially becoming the standard solver strategy. Understood. Note that you can embed the options into the program with PetscOptionsSetValue() so they don't need to be on the command line. Barry > Using the options database is a great start to reverse engineer the issue! > > Thanks!
> Robert > > > > > On 27/06/17 23:45, Barry Smith wrote: >> It is difficult, if not impossible at times to get all the options where you want them to be using the function call interface. On the other hand it is generally easy (if there are no inner PCSHELLS) to do this via the options database >> >> -pc_type composite >> -pc_composite_type multiplicative >> -pc_composite_pcs galerkin,bjacobi >> >> -sub_0_galerkin_ksp_type preonly >> -sub_0_galerkin_pc_type none >> >> -sub_1_sub_pc_factor_shift_type inblocks >> -sub_1_sub_pc_factor_zero_pivot zpiv >> >> >> >> >>> On Jun 27, 2017, at 11:24 AM, Robert Annewandter >>> wrote: >>> >>> Dear PETSc folks, >>> >>> >>> I want a Block Jacobi PC to be the second PC in a two-stage preconditioning scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an FGMRES. >>> >>> >>> However, PCBJacobiGetSubKSP ( >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP >>> ) requires to call KSPSetUp (or PCSetUp) first on its parent KSP, which I struggle in succeeding. I wonder which KSP (or if so PC) that is. >>> >>> >>> This is how I attempt to do it (using PCKSP to provide a parent KSP for PCBJacobiGetSubKSP): >>> >>> >>> call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr) >>> call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr) >>> call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); CHKERRQ(ierr) >>> >>> >>> ! 1st Stage >>> call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr) >>> call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr) >>> >>> >>> ! KSPPREONLY-PCNONE for testing >>> call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr) >>> call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >>> call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr) >>> call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr) >>> >>> >>> ! 2nd Stage >>> call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr) >>> call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr) >>> call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr) >>> call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >>> call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr) >>> call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr) >>> >>> >>> call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr) >>> ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr) >>> ! call PCSetUp(T2, ierr); CHKERRQ(ierr) >>> ! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr) >>> >>> >>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, ierr); CHKERRQ(ierr) >>> allocate(sub_ksps(nsub_ksp)) >>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); CHKERRQ(ierr) >>> do i = 1, nsub_ksp >>> call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr) >>> call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); CHKERRQ(ierr) >>> call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, ierr); CHKERRQ(ierr) >>> end do >>> deallocate(sub_ksps) >>> nullify(sub_ksps) >>> >>> >>> Is using PCKSP a good idea at all? >>> >>> >>> With KSPSetUp(solver%ksp) -> FGMRES >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Object is in wrong state >>> [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one >>> [0]PETSC ERROR: See >>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:55:14 2017 >>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>> [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c >>> [0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c >>> [0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [0]PETSC ERROR: #4 KSPSetUp() line 338 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >>> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>> >>> >>> >>> With KSPSetUp(BJac_ksp) -> KSPPREONLY >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Arguments are incompatible >>> [0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE >>> likely a call to VecSetSizes() or MatSetSizes() is wrong. >>> See >>> http://www.mcs.anl.gov/petsc/documentation/faq.html#split >>> >>> [0]PETSC ERROR: See >>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:52:57 2017 >>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>> [0]PETSC ERROR: #1 PetscSplitOwnership() line 77 in /home/pujjad/Repositories/petsc/src/sys/utils/psplit.c >>> [0]PETSC ERROR: #2 PetscLayoutSetUp() line 137 in /home/pujjad/Repositories/petsc/src/vec/is/utils/pmap.c >>> [0]PETSC ERROR: #3 VecCreate_Seq_Private() line 847 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >>> [0]PETSC ERROR: #4 VecCreateSeqWithArray() line 899 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >>> [0]PETSC ERROR: #5 PCSetUp_BJacobi_Singleblock() line 786 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >>> [0]PETSC ERROR: #6 PCSetUp_BJacobi() line 136 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >>> [0]PETSC ERROR: #7 PCSetUp() line 924 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>> [0]PETSC ERROR: #8 KSPSetUp() line 379 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>> >>> >>> >>> With PCSetUp(T2) -> PCKSP >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Object is in wrong state >>> [0]PETSC ERROR: Matrix must be set first >>> [0]PETSC ERROR: See >>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:51:23 2017 >>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>> >>> >>> >>> With PCSetUp(BJac_pc) -> PCBJACOBI >>> >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Object is in wrong state >>> [0]PETSC ERROR: Matrix must be set first >>> [0]PETSC ERROR: See >>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:42:10 2017 >>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>> >>> >>> >>> Grateful for any help! >>> Robert >>> >>> >>> > From robert.annewandter at opengosim.com Wed Jun 28 12:50:18 2017 From: robert.annewandter at opengosim.com (Robert Annewandter) Date: Wed, 28 Jun 2017 18:50:18 +0100 Subject: [petsc-users] PCCOMPOSITE with PCBJACOBI In-Reply-To: <3669D977-1FA4-46AC-9DFD-C69E3C7320EF@mcs.anl.gov> References: <3669D977-1FA4-46AC-9DFD-C69E3C7320EF@mcs.anl.gov> Message-ID: <663ab4e6-bc97-2983-09d5-0010bfddd442@opengosim.com> Interesting! And would fit into configuring PFLOTRAN via its input decks (ie we could also provide ASM instead of Block Jacobi) Thanks a lot! 
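For reference, a minimal sketch in C of hard-wiring the option set Barry lists below with PetscOptionsSetValue(). This assumes the PETSc >= 3.7 calling sequence, where the first argument selects the options database (NULL meaning the global one); the function name and the zero-pivot value are made up for the example, and the calls must be made before the outer KSP/PC reaches its SetFromOptions():

/* Sketch only: embed the CPR two-stage options so no command line is needed. */
#include <petscsys.h>

PetscErrorCode SetTwoStageOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscOptionsSetValue(NULL,"-pc_type","composite");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_composite_type","multiplicative");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_composite_pcs","galerkin,bjacobi");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-sub_0_galerkin_ksp_type","preonly");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-sub_0_galerkin_pc_type","none");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-sub_1_sub_pc_factor_shift_type","inblocks");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-sub_1_sub_pc_factor_zero_pivot","1e-12");CHKERRQ(ierr); /* placeholder tolerance */
  PetscFunctionReturn(0);
}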
On 28/06/17 17:31, Barry Smith wrote: >> On Jun 28, 2017, at 2:07 AM, Robert Annewandter wrote: >> >> Thank you Barry! >> >> We like to hard wire it into PFLOTRAN with CPR-AMG Block Jacobi Two-Stage Preconditioning potentially becoming the standard solver strategy. > Understood. Note that you can embed the options into the program with PetscOptionsSetValue() so they don't need to be on the command line. > > Barry > >> Using the options database is a great start to reverse engineer the issue! >> >> Thanks! >> Robert >> >> >> >> >> On 27/06/17 23:45, Barry Smith wrote: >>> It is difficult, if not impossible at times to get all the options where you want them to be using the function call interface. On the other hand it is generally easy (if there are no inner PCSHELLS) to do this via the options database >>> >>> -pc_type composite >>> -pc_composite_type multiplicative >>> -pc_composite_pcs galerkin,bjacobi >>> >>> -sub_0_galerkin_ksp_type preonly >>> -sub_0_galerkin_pc_type none >>> >>> -sub_1_sub_pc_factor_shift_type inblocks >>> -sub_1_sub_pc_factor_zero_pivot zpiv >>> >>> >>> >>> >>>> On Jun 27, 2017, at 11:24 AM, Robert Annewandter >>>> wrote: >>>> >>>> Dear PETSc folks, >>>> >>>> >>>> I want a Block Jacobi PC to be the second PC in a two-stage preconditioning scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an FGMRES. >>>> >>>> >>>> However, PCBJacobiGetSubKSP ( >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP >>>> ) requires to call KSPSetUp (or PCSetUp) first on its parent KSP, which I struggle in succeeding. I wonder which KSP (or if so PC) that is. >>>> >>>> >>>> This is how I attempt to do it (using PCKSP to provide a parent KSP for PCBJacobiGetSubKSP): >>>> >>>> >>>> call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr) >>>> call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr) >>>> call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); CHKERRQ(ierr) >>>> >>>> >>>> ! 1st Stage >>>> call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr) >>>> call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr) >>>> >>>> >>>> ! KSPPREONLY-PCNONE for testing >>>> call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr) >>>> call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >>>> call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr) >>>> call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr) >>>> >>>> >>>> ! 2nd Stage >>>> call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr) >>>> call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr) >>>> call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr) >>>> call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr) >>>> call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr) >>>> call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr) >>>> >>>> >>>> call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr) >>>> ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr) >>>> ! call PCSetUp(T2, ierr); CHKERRQ(ierr) >>>> ! 
call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr) >>>> >>>> >>>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, ierr); CHKERRQ(ierr) >>>> allocate(sub_ksps(nsub_ksp)) >>>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); CHKERRQ(ierr) >>>> do i = 1, nsub_ksp >>>> call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr) >>>> call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); CHKERRQ(ierr) >>>> call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, ierr); CHKERRQ(ierr) >>>> end do >>>> deallocate(sub_ksps) >>>> nullify(sub_ksps) >>>> >>>> >>>> Is using PCKSP a good idea at all? >>>> >>>> >>>> With KSPSetUp(solver%ksp) -> FGMRES >>>> >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Object is in wrong state >>>> [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one >>>> [0]PETSC ERROR: See >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:55:14 2017 >>>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>>> [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c >>>> [0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c >>>> [0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in /home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>> [0]PETSC ERROR: #4 KSPSetUp() line 338 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >>>> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >>>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>>> >>>> >>>> >>>> With KSPSetUp(BJac_ksp) -> KSPPREONLY >>>> >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Arguments are incompatible >>>> [0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE >>>> likely a call to VecSetSizes() or MatSetSizes() is wrong. >>>> See >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#split >>>> >>>> [0]PETSC ERROR: See >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. 
>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:52:57 2017 >>>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>>> [0]PETSC ERROR: #1 PetscSplitOwnership() line 77 in /home/pujjad/Repositories/petsc/src/sys/utils/psplit.c >>>> [0]PETSC ERROR: #2 PetscLayoutSetUp() line 137 in /home/pujjad/Repositories/petsc/src/vec/is/utils/pmap.c >>>> [0]PETSC ERROR: #3 VecCreate_Seq_Private() line 847 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >>>> [0]PETSC ERROR: #4 VecCreateSeqWithArray() line 899 in /home/pujjad/Repositories/petsc/src/vec/vec/impls/seq/bvec2.c >>>> [0]PETSC ERROR: #5 PCSetUp_BJacobi_Singleblock() line 786 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >>>> [0]PETSC ERROR: #6 PCSetUp_BJacobi() line 136 in /home/pujjad/Repositories/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c >>>> [0]PETSC ERROR: #7 PCSetUp() line 924 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>>> [0]PETSC ERROR: #8 KSPSetUp() line 379 in /home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c >>>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>>> >>>> >>>> >>>> With PCSetUp(T2) -> PCKSP >>>> >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Object is in wrong state >>>> [0]PETSC ERROR: Matrix must be set first >>>> [0]PETSC ERROR: See >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. 
>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:51:23 2017 >>>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>>> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>>> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0 >>>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>>> >>>> >>>> >>>> With PCSetUp(BJac_pc) -> PCBJACOBI >>>> >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Object is in wrong state >>>> [0]PETSC ERROR: Matrix must be set first >>>> [0]PETSC ERROR: See >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad GIT Date: 2017-03-30 14:27:53 -0500 >>>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 16:42:10 2017 >>>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc >>>> [0]PETSC ERROR: #1 PCSetUp() line 888 in /home/pujjad/Repositories/petsc/src/ksp/pc/interface/precon.c >>>> [mpiexec at mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52): Unrecognized PMI command: abort | cleaning up processes >>>> [mpiexec at mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command >>>> [mpiexec at mother] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status >>>> [mpiexec at mother] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event >>>> [mpiexec at mother] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion >>>> >>>> >>>> >>>> Grateful for any help! >>>> Robert >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 28 14:06:03 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 28 Jun 2017 13:06:03 -0600 Subject: [petsc-users] Node renumbering and ghost nodes determination In-Reply-To: References: Message-ID: <871sq4ortw.fsf@jedbrown.org> There is no single correct way to do this so you can do whatever makes the most sense for your application. 
That ranges from calling ParMETIS directly and creating a numbering using any scheme you like to using PETSc functions for everything. Note that assembled linear algebra (matrices and vectors) cares about the dof connectivity graph which is different from the element graph. Some people (usually those using lowest order methods) don't pay attention to elements at all (even integrating elements redundantly instead of communicating again) while others focus entirely on elements and make arbitrary decisions about vertex ownership. For every choice, you can find someone that did it one way and someone else that did it the other way -- both will be convinced that their choice was the only correct choice. Guido Giuntoli writes: > Hi, > > I am using ParMETIS to get a partition of my mesh (a group of elements for > each process) and now I want to renumber the nodes contiguous for each > process according to their rank. Can I take advantage of the IS functions > (or others like AO) directly from this point or is crucial to do the > partition with ParMETIS using PETSc functions and apply > ISPartitionToNumbering like in the manual's example ? > > Another question, which is the best way of determine the ownership of the > nodes ? for example : node 101 belongs to an element in process 0 and to an > element in process 1, how do I take the decision ? In the past I did a > partition with METIS and he returned an array of nodes belonging like the > elements so that was easy but now my only idea is to get the new > renumbering and check if the new renumbering fall inside my range. > > Thank you, Guido. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From hbcbh1999 at gmail.com Wed Jun 28 19:50:04 2017 From: hbcbh1999 at gmail.com (Hao Zhang) Date: Wed, 28 Jun 2017 20:50:04 -0400 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double Message-ID: hi, all: I'm developing a CFD project, which all the matrix solver is PETSc based. It works fine with double precision until I need some more, like quadruple precision. Ax = b is the system input are double precision shall I use PetscScalar instead of double? Or is there a better way to pass double type to PETSc solver? Or shall I convert double type to quadruple type before passing into the solver? I'm using PETSc 3.5.4. GCC 5.2.0 comiple with float128. I think there should be no compiling error. I can provide more code part if necessary. Thanks! -- Hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From giuntoli1991 at gmail.com Wed Jun 28 19:52:28 2017 From: giuntoli1991 at gmail.com (Guido Giuntoli) Date: Thu, 29 Jun 2017 02:52:28 +0200 Subject: [petsc-users] Node renumbering and ghost nodes determination In-Reply-To: <871sq4ortw.fsf@jedbrown.org> References: <871sq4ortw.fsf@jedbrown.org> Message-ID: Thank you very much Jed, now I have a clearer explanation and another point of view of the situation. This is like take the red or blue pill for me... 2017-06-28 21:06 GMT+02:00 Jed Brown : > There is no single correct way to do this so you can do whatever makes > the most sense for your application. That ranges from calling ParMETIS > directly and creating a numbering using any scheme you like to using > PETSc functions for everything. Note that assembled linear algebra > (matrices and vectors) cares about the dof connectivity graph which is > different from the element graph. 
Some people (usually those using > lowest order methods) don't pay attention to elements at all (even > integrating elements redundantly instead of communicating again) while > others focus entirely on elements and make arbitrary decisions about > vertex ownership. For every choice, you can find someone that did it > one way and someone else that did it the other way -- both will be > convinced that their choice was the only correct choice. > > Guido Giuntoli writes: > > > Hi, > > > > I am using ParMETIS to get a partition of my mesh (a group of elements > for > > each process) and now I want to renumber the nodes contiguous for each > > process according to their rank. Can I take advantage of the IS functions > > (or others like AO) directly from this point or is crucial to do the > > partition with ParMETIS using PETSc functions and apply > > ISPartitionToNumbering like in the manual's example ? > > > > Another question, which is the best way of determine the ownership of the > > nodes ? for example : node 101 belongs to an element in process 0 and to > an > > element in process 1, how do I take the decision ? In the past I did a > > partition with METIS and he returned an array of nodes belonging like the > > elements so that was easy but now my only idea is to get the new > > renumbering and check if the new renumbering fall inside my range. > > > > Thank you, Guido. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 28 20:00:22 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 28 Jun 2017 19:00:22 -0600 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: Message-ID: <87wp7vmwux.fsf@jedbrown.org> Hao Zhang writes: > hi, all: > > I'm developing a CFD project, which all the matrix solver is PETSc based. > It works fine with double precision until I need some more, like quadruple > precision. What sort of CFD problem do you need quad precision for? > Ax = b is the system > input are double precision > > shall I use PetscScalar instead of double? Yes. > Or is there a better way to pass double type to PETSc solver? No, use PetscScalar in your code. > Or shall I convert double type to quadruple type before passing into the > solver? That tends to be uglier and slower. It also loses accuracy in the input, which is important for extremely ill-conditioned problems. > I'm using PETSc 3.5.4. GCC 5.2.0 comiple with float128. I think there > should be no compiling error. > I can provide more code part if necessary. Thanks! > > -- > Hao -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Jun 28 20:05:56 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 28 Jun 2017 20:05:56 -0500 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: <87wp7vmwux.fsf@jedbrown.org> References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: Also you should use petsc-3.7 - the current release. Satish On Thu, 29 Jun 2017, Jed Brown wrote: > Hao Zhang writes: > > > hi, all: > > > > I'm developing a CFD project, which all the matrix solver is PETSc based. > > It works fine with double precision until I need some more, like quadruple > > precision. > > What sort of CFD problem do you need quad precision for? > > > Ax = b is the system > > input are double precision > > > > shall I use PetscScalar instead of double? > > Yes. 
> > > Or is there a better way to pass double type to PETSc solver? > > No, use PetscScalar in your code. > > > Or shall I convert double type to quadruple type before passing into the > > solver? > > That tends to be uglier and slower. It also loses accuracy in the > input, which is important for extremely ill-conditioned problems. > > > I'm using PETSc 3.5.4. GCC 5.2.0 comiple with float128. I think there > > should be no compiling error. > > I can provide more code part if necessary. Thanks! > > > > -- > > Hao > From hbcbh1999 at gmail.com Wed Jun 28 20:22:34 2017 From: hbcbh1999 at gmail.com (Hao Zhang) Date: Wed, 28 Jun 2017 21:22:34 -0400 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: It's 3d incompressible RT simulation. My pressure between serial and parallel calculation is off by 10^(-14) in relative error. it eventually build up at later time. I want to rule out the possibilities that PETSc give me bad solution. pressure scale is 10^(-2). I use PetscScalar. thanks @Jed Brown for confirming that but I have Segmentation Violation when retrieving x. I allocated memory for the array x (PetscScalar type). if not for quadruple precision, there is no error. thanks @Satish Balay. I will update code petsc-3.7 later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 28 20:27:19 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Jun 2017 20:27:19 -0500 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: On Wed, Jun 28, 2017 at 8:22 PM, Hao Zhang wrote: > It's 3d incompressible RT simulation. My pressure between serial and > parallel calculation is off by 10^(-14) in relative error. > This could just be reordering of the calculation. > it eventually build up at later time. I want to rule out the possibilities > that PETSc give me bad solution. pressure scale is 10^(-2). > > I use PetscScalar. thanks @Jed Brown for confirming that but I have > Segmentation Violation when retrieving x. I allocated memory for the array > x (PetscScalar type). if not for quadruple precision, there is no error. > It sounds like maybe you are passing a double where a PetscScalar is expected, or vice versa. Run under valgrind. Matt > thanks @Satish Balay. I will update code petsc-3.7 later. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbcbh1999 at gmail.com Wed Jun 28 21:00:37 2017 From: hbcbh1999 at gmail.com (Hao Zhang) Date: Wed, 28 Jun 2017 22:00:37 -0400 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: thanks, @Matthew I was worried about this. is there a way to convert double to PetscScalar? incompressible code are double type everywhere else except for PETSc. could this be the problem? it was double type for the entire code including PETSc before this quadruple test. On Wed, Jun 28, 2017 at 9:27 PM, Matthew Knepley wrote: > On Wed, Jun 28, 2017 at 8:22 PM, Hao Zhang wrote: > >> It's 3d incompressible RT simulation. My pressure between serial and >> parallel calculation is off by 10^(-14) in relative error. 
>> > > This could just be reordering of the calculation. > > >> it eventually build up at later time. I want to rule out the >> possibilities that PETSc give me bad solution. pressure scale is 10^(-2). >> >> I use PetscScalar. thanks @Jed Brown for confirming that but I have >> Segmentation Violation when retrieving x. I allocated memory for the array >> x (PetscScalar type). if not for quadruple precision, there is no error. >> > > It sounds like maybe you are passing a double where a PetscScalar is > expected, or vice versa. Run under valgrind. > > Matt > > >> thanks @Satish Balay. I will update code petsc-3.7 later. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- hao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 28 21:02:35 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Jun 2017 21:02:35 -0500 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: On Wed, Jun 28, 2017 at 9:00 PM, Hao Zhang wrote: > thanks, @Matthew > > I was worried about this. is there a way to convert double to PetscScalar? > C cast. > incompressible code are double type everywhere else except for PETSc. > could this be the problem? > I doubt using high precision only in the solve will make much difference, certainly not 10^{-14}. Matt > it was double type for the entire code including PETSc before this > quadruple test. > > > > > > > > > On Wed, Jun 28, 2017 at 9:27 PM, Matthew Knepley > wrote: > >> On Wed, Jun 28, 2017 at 8:22 PM, Hao Zhang wrote: >> >>> It's 3d incompressible RT simulation. My pressure between serial and >>> parallel calculation is off by 10^(-14) in relative error. >>> >> >> This could just be reordering of the calculation. >> >> >>> it eventually build up at later time. I want to rule out the >>> possibilities that PETSc give me bad solution. pressure scale is 10^(-2). >>> >>> I use PetscScalar. thanks @Jed Brown for confirming that but I have >>> Segmentation Violation when retrieving x. I allocated memory for the array >>> x (PetscScalar type). if not for quadruple precision, there is no error. >>> >> >> It sounds like maybe you are passing a double where a PetscScalar is >> expected, or vice versa. Run under valgrind. >> >> Matt >> >> >>> thanks @Satish Balay. I will update code petsc-3.7 later. >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > hao > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Wed Jun 28 21:13:27 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 Jun 2017 21:13:27 -0500 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: Better to do an update of all the code to use PetscScalar (just replace double with PetscScalar in some editor or sed script), trying to translate back and forth in the code will be a debugging nightmare. Barry > On Jun 28, 2017, at 9:02 PM, Matthew Knepley wrote: > > On Wed, Jun 28, 2017 at 9:00 PM, Hao Zhang wrote: > thanks, @Matthew > > I was worried about this. is there a way to convert double to PetscScalar? > > C cast. > > incompressible code are double type everywhere else except for PETSc. could this be the problem? > > I doubt using high precision only in the solve will make much difference, certainly not 10^{-14}. > > Matt > > it was double type for the entire code including PETSc before this quadruple test. > > > > > > > > > On Wed, Jun 28, 2017 at 9:27 PM, Matthew Knepley wrote: > On Wed, Jun 28, 2017 at 8:22 PM, Hao Zhang wrote: > It's 3d incompressible RT simulation. My pressure between serial and parallel calculation is off by 10^(-14) in relative error. > > This could just be reordering of the calculation. > > it eventually build up at later time. I want to rule out the possibilities that PETSc give me bad solution. pressure scale is 10^(-2). > > I use PetscScalar. thanks @Jed Brown for confirming that but I have Segmentation Violation when retrieving x. I allocated memory for the array x (PetscScalar type). if not for quadruple precision, there is no error. > > It sounds like maybe you are passing a double where a PetscScalar is expected, or vice versa. Run under valgrind. > > Matt > > thanks @Satish Balay. I will update code petsc-3.7 later. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > -- > hao > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ From Hassan.Raiesi at aero.bombardier.com Thu Jun 29 08:59:48 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Thu, 29 Jun 2017 13:59:48 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: Matthew, Not sure if I understood completely, I have the ordering according to the CGNS standard for all cells in the mesh, or I can change it if needed, but I don?t understand what you mean by reporting all faces as ordered vertices? Do you mean I just define the face connectivity?Could you explain a bit more? Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 1:54 PM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi > wrote: I meant the interpolation, DMPlex supports those element shapes, however, there are two problems, one, ?DMPlexBuildFromCellList_Parallel? 
does not take mixed elements (numCorners must be constant for all elements in current implementation), that was easy to fix, I already extended DMPlexBuildFromCellList_Parallel_Private to take elements with different shapes, then I realized the interpolation does not work when the mesh has elements other than tets and hex. Okay, here is what is needed. You need to a) prescribe an order for the vertices in a prism/pyramid (all input cells must have this order) b) report all faces as sets of ordered vertices - You have to order them matching the Plex vertex order for the lower dimensional shape - They should be oriented as to have outward facing normal Regarding the interpolation, I also noticed that the memory requirement is huge if I load the whole mesh on one core and interpolate (I cannot interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran out of memory, I?ll try to run with petsc memory logs and send). I am not sure what the upper limit is, but in a 3D mesh you could have many times the number of cells in faces and edges. Note that you would need --with-64-bit-indices to go beyond 4GB. is there anyways to interpolate the mesh after DMPlexdistribute? The code crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I guess what is needed is a DMPlexInterpolate on already distributed meshes. This should work. Maybe try a small example? If that crashes, just send it. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 11:53 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi > wrote: Great, It?s clear now ?, One more question, any plan to support other element shapes (prism and pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs in 3D, can prisms and pyramids be used as degenerate hexahedrons? It depends on what you mean "support". Right now, we can represent these shapes in Plex. However, if you want mesh interpolation to work, then yes you need to extend GetRawFaces() to understand that shape. If you want them read out of a file format, other than Gmsh, we would likely have to extend that as well. These are straightforward once I understand what exactly you want to do. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 10:17 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi > wrote: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). 
These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. Thanks, Matt thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 29 09:12:12 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Jun 2017 09:12:12 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Thu, Jun 29, 2017 at 8:59 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > Matthew, > > > > Not sure if I understood completely, I have the ordering according to the > CGNS standard for all cells in the mesh, or I can change it if needed, but > I don?t understand what you mean by reporting all faces as ordered > vertices? Do you mean I just define the face connectivity?Could you explain > a bit more? > For a triangular prism, you would add the case in GetRawFaces() for dim 3, coneSize 6. Lets look at a triangle (dim 2, coneSize 3) https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-63 You see that it returns 3 groups of 2 vertices, which are the edges. 
For a tetrahedron https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-101 it returns 4 groups of 3 vertices, which are the faces. Note that all faces should be oriented such that the normal is outward. Thanks, Matt > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 1:54 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi bombardier.com> wrote: > > I meant the interpolation, > > DMPlex supports those element shapes, however, there are two problems, > one, ?DMPlexBuildFromCellList_Parallel? does not take mixed elements > (numCorners must be constant for all elements in current implementation), > that was easy to fix, > > I already extended DMPlexBuildFromCellList_Parallel_Private to take > elements with different shapes, then I realized the interpolation does not > work when the mesh has elements other than tets and hex. > > > > Okay, here is what is needed. You need to > > > > a) prescribe an order for the vertices in a prism/pyramid (all input > cells must have this order) > > > > b) report all faces as sets of ordered vertices > > - You have to order them matching the Plex vertex order for the lower > dimensional shape > > - They should be oriented as to have outward facing normal > > > > Regarding the interpolation, I also noticed that the memory requirement is > huge if I load the whole mesh on one core and interpolate (I cannot > interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran > out of memory, I?ll try to run with petsc memory logs and send). > > > > I am not sure what the upper limit is, but in a 3D mesh you could have > many times the number of cells in faces and edges. Note that > > you would need --with-64-bit-indices to go beyond 4GB. > > > > is there anyways to interpolate the mesh after DMPlexdistribute? The code > crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I > guess what is needed is a DMPlexInterpolate on already distributed meshes. > > > > This should work. Maybe try a small example? If that crashes, just send it. > > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 11:53 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi bombardier.com> wrote: > > Great, It?s clear now J, > > One more question, any plan to support other element shapes (prism and > pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs > in 3D, can prisms and pyramids be used as degenerate hexahedrons? > > > > It depends on what you mean "support". Right now, we can represent these > shapes in Plex. However, if you > > want mesh interpolation to work, then yes you need to extend GetRawFaces() > to understand that shape. If > > you want them read out of a file format, other than Gmsh, we would likely > have to extend that as well. These > > are straightforward once I understand what exactly you want to do. 
> > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 10:17 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi bombardier.com> wrote: > > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? > > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > > > > I think there is a misunderstanding here. > > > > There is NO connection between the cell order and the vertex order. Each > process gets a contiguous > > set of cells (in the global numbering) and a contiguous set of vertices > (in the global numbering). These > > two are NOT related. We then move the vertices to the correct processes. > In this way, we can load > > completely in parallel, without requiring any setup in the mesh file. > > > > If you are worried, you can always arrange the order of vertices to > "match" the order of cells. > > > > Thanks, > > > > Matt > > > > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. > > > > we only pass the global vertex numbers using ?const int cells[]? to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Hassan.Raiesi at aero.bombardier.com Thu Jun 29 09:32:18 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Thu, 29 Jun 2017 14:32:18 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: Ok, The problem is that the faceSizes varies for pyramids and prisms, they have faces with variable number of nodes (i.e for pyramids, four triangular faces and one quad face, this I don?t see how to specify? What if I report all faces as quad faces, for triangular faces, there would be a repeated vertex, would it make any problem? Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Thursday, June 29, 2017 10:12 AM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Thu, Jun 29, 2017 at 8:59 AM, Hassan Raiesi > wrote: Matthew, Not sure if I understood completely, I have the ordering according to the CGNS standard for all cells in the mesh, or I can change it if needed, but I don?t understand what you mean by reporting all faces as ordered vertices? Do you mean I just define the face connectivity?Could you explain a bit more? For a triangular prism, you would add the case in GetRawFaces() for dim 3, coneSize 6. Lets look at a triangle (dim 2, coneSize 3) https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-63 You see that it returns 3 groups of 2 vertices, which are the edges. For a tetrahedron https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-101 it returns 4 groups of 3 vertices, which are the faces. Note that all faces should be oriented such that the normal is outward. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 1:54 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi > wrote: I meant the interpolation, DMPlex supports those element shapes, however, there are two problems, one, ?DMPlexBuildFromCellList_Parallel? 
does not take mixed elements (numCorners must be constant for all elements in current implementation), that was easy to fix, I already extended DMPlexBuildFromCellList_Parallel_Private to take elements with different shapes, then I realized the interpolation does not work when the mesh has elements other than tets and hex. Okay, here is what is needed. You need to a) prescribe an order for the vertices in a prism/pyramid (all input cells must have this order) b) report all faces as sets of ordered vertices - You have to order them matching the Plex vertex order for the lower dimensional shape - They should be oriented as to have outward facing normal Regarding the interpolation, I also noticed that the memory requirement is huge if I load the whole mesh on one core and interpolate (I cannot interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran out of memory, I?ll try to run with petsc memory logs and send). I am not sure what the upper limit is, but in a 3D mesh you could have many times the number of cells in faces and edges. Note that you would need --with-64-bit-indices to go beyond 4GB. is there anyways to interpolate the mesh after DMPlexdistribute? The code crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I guess what is needed is a DMPlexInterpolate on already distributed meshes. This should work. Maybe try a small example? If that crashes, just send it. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 11:53 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi > wrote: Great, It?s clear now ?, One more question, any plan to support other element shapes (prism and pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs in 3D, can prisms and pyramids be used as degenerate hexahedrons? It depends on what you mean "support". Right now, we can represent these shapes in Plex. However, if you want mesh interpolation to work, then yes you need to extend GetRawFaces() to understand that shape. If you want them read out of a file format, other than Gmsh, we would likely have to extend that as well. These are straightforward once I understand what exactly you want to do. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 10:17 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi > wrote: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). 
These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. Thanks, Matt thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 29 09:36:05 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Jun 2017 09:36:05 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Thu, Jun 29, 2017 at 9:32 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > Ok, > > > > The problem is that the faceSizes varies for pyramids and prisms, they > have faces with variable number of nodes (i.e for pyramids, four triangular > faces and one quad face, this I don?t see how to specify? > > What if I report all faces as quad faces, for triangular faces, there > would be a repeated vertex, would it make any problem? > No, you are right, that will not work. The whole interpolation code would have to be rewritten to allow faces with variable size. Its not hard, but it is time-consuming. Since I am moving this summer, this will not be done in a timely manner unfortunately. 
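(To make the variable-size point concrete -- an illustration of the cell shapes only, not code from DMPlex, with a vertex ordering chosen just for the example and no attempt at outward orientation: a 6-vertex prism has 2 triangular plus 3 quadrilateral faces, and a 5-vertex pyramid has 1 quadrilateral plus 4 triangular faces, so no single faceSize covers either cell.)

/* Illustrative face lists; indices are positions in the cell's cone. */
static const int prismTriFaces[2][3]   = { {0,1,2}, {3,4,5} };                    /* 2 triangles */
static const int prismQuadFaces[3][4]  = { {0,1,4,3}, {1,2,5,4}, {2,0,3,5} };     /* 3 quads     */
static const int pyramidQuadFace[1][4] = { {0,1,2,3} };                           /* base quad   */
static const int pyramidTriFaces[4][3] = { {0,1,4}, {1,2,4}, {2,3,4}, {3,0,4} };  /* 4 triangles */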
Thanks, Matt Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Thursday, June 29, 2017 10:12 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Thu, Jun 29, 2017 at 8:59 AM, Hassan Raiesi bombardier.com> wrote: > > Matthew, > > > > Not sure if I understood completely, I have the ordering according to the > CGNS standard for all cells in the mesh, or I can change it if needed, but > I don?t understand what you mean by reporting all faces as ordered > vertices? Do you mean I just define the face connectivity?Could you explain > a bit more? > > > > For a triangular prism, you would add the case in GetRawFaces() for dim 3, > coneSize 6. Lets look at a triangle (dim 2, coneSize 3) > > > > https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f867 > 95d413b548/src/dm/impls/plex/plexinterpolate.c?at=master& > fileviewer=file-view-default#plexinterpolate.c-63 > > > > You see that it returns 3 groups of 2 vertices, which are the edges. For a > tetrahedron > > > > https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f867 > 95d413b548/src/dm/impls/plex/plexinterpolate.c?at=master& > fileviewer=file-view-default#plexinterpolate.c-101 > > > > it returns 4 groups of 3 vertices, which are the faces. Note that all > faces should be oriented such that the normal is outward. > > > > Thanks, > > > > Matt > > > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 1:54 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi bombardier.com> wrote: > > I meant the interpolation, > > DMPlex supports those element shapes, however, there are two problems, > one, ?DMPlexBuildFromCellList_Parallel? does not take mixed elements > (numCorners must be constant for all elements in current implementation), > that was easy to fix, > > I already extended DMPlexBuildFromCellList_Parallel_Private to take > elements with different shapes, then I realized the interpolation does not > work when the mesh has elements other than tets and hex. > > > > Okay, here is what is needed. You need to > > > > a) prescribe an order for the vertices in a prism/pyramid (all input > cells must have this order) > > > > b) report all faces as sets of ordered vertices > > - You have to order them matching the Plex vertex order for the lower > dimensional shape > > - They should be oriented as to have outward facing normal > > > > Regarding the interpolation, I also noticed that the memory requirement is > huge if I load the whole mesh on one core and interpolate (I cannot > interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran > out of memory, I?ll try to run with petsc memory logs and send). > > > > I am not sure what the upper limit is, but in a 3D mesh you could have > many times the number of cells in faces and edges. Note that > > you would need --with-64-bit-indices to go beyond 4GB. > > > > is there anyways to interpolate the mesh after DMPlexdistribute? The code > crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I > guess what is needed is a DMPlexInterpolate on already distributed meshes. > > > > This should work. Maybe try a small example? If that crashes, just send it. 
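[A minimal sketch of such a small example (hypothetical function name; it assumes the DM comes uninterpolated from DMPlexCreateFromCellListParallel and takes ownership of it). If this exact sequence crashes on a given build, that is the kind of small reproducer being asked for.]

    #include <petscdmplex.h>

    /* Distribute an uninterpolated DMPlex, then interpolate the distributed
       mesh.  Destroys the input DM and returns the final mesh in *dmOut. */
    static PetscErrorCode DistributeThenInterpolate(DM dm, DM *dmOut)
    {
      DM             dmDist = NULL, dmInt = NULL;
      PetscSF        sf     = NULL;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = DMPlexDistribute(dm, 0, &sf, &dmDist);CHKERRQ(ierr);
      if (dmDist) {                      /* dmDist is NULL on one process */
        ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
        ierr = DMDestroy(&dm);CHKERRQ(ierr);
        dm   = dmDist;
      }
      ierr = DMPlexInterpolate(dm, &dmInt);CHKERRQ(ierr);
      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      *dmOut = dmInt;
      PetscFunctionReturn(0);
    }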
> > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 11:53 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi bombardier.com> wrote: > > Great, It?s clear now J, > > One more question, any plan to support other element shapes (prism and > pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs > in 3D, can prisms and pyramids be used as degenerate hexahedrons? > > > > It depends on what you mean "support". Right now, we can represent these > shapes in Plex. However, if you > > want mesh interpolation to work, then yes you need to extend GetRawFaces() > to understand that shape. If > > you want them read out of a file format, other than Gmsh, we would likely > have to extend that as well. These > > are straightforward once I understand what exactly you want to do. > > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 10:17 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi bombardier.com> wrote: > > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? > > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > > > > I think there is a misunderstanding here. > > > > There is NO connection between the cell order and the vertex order. Each > process gets a contiguous > > set of cells (in the global numbering) and a contiguous set of vertices > (in the global numbering). These > > two are NOT related. We then move the vertices to the correct processes. > In this way, we can load > > completely in parallel, without requiring any setup in the mesh file. > > > > If you are worried, you can always arrange the order of vertices to > "match" the order of cells. > > > > Thanks, > > > > Matt > > > > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. > > > > we only pass the global vertex numbers using ?const int cells[]? 
to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Hassan.Raiesi at aero.bombardier.com Thu Jun 29 09:52:57 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Thu, 29 Jun 2017 14:52:57 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: That would great if it allows elements with mixed faces, Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Thursday, June 29, 2017 10:36 AM To: Hassan Raiesi Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Thu, Jun 29, 2017 at 9:32 AM, Hassan Raiesi > wrote: Ok, The problem is that the faceSizes varies for pyramids and prisms, they have faces with variable number of nodes (i.e for pyramids, four triangular faces and one quad face, this I don?t see how to specify? What if I report all faces as quad faces, for triangular faces, there would be a repeated vertex, would it make any problem? No, you are right, that will not work. The whole interpolation code would have to be rewritten to allow faces with variable size. Its not hard, but it is time-consuming. Since I am moving this summer, this will not be done in a timely manner unfortunately. 
Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Thursday, June 29, 2017 10:12 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Thu, Jun 29, 2017 at 8:59 AM, Hassan Raiesi > wrote: Matthew, Not sure if I understood completely, I have the ordering according to the CGNS standard for all cells in the mesh, or I can change it if needed, but I don?t understand what you mean by reporting all faces as ordered vertices? Do you mean I just define the face connectivity?Could you explain a bit more? For a triangular prism, you would add the case in GetRawFaces() for dim 3, coneSize 6. Lets look at a triangle (dim 2, coneSize 3) https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-63 You see that it returns 3 groups of 2 vertices, which are the edges. For a tetrahedron https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f86795d413b548/src/dm/impls/plex/plexinterpolate.c?at=master&fileviewer=file-view-default#plexinterpolate.c-101 it returns 4 groups of 3 vertices, which are the faces. Note that all faces should be oriented such that the normal is outward. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 1:54 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi > wrote: I meant the interpolation, DMPlex supports those element shapes, however, there are two problems, one, ?DMPlexBuildFromCellList_Parallel? does not take mixed elements (numCorners must be constant for all elements in current implementation), that was easy to fix, I already extended DMPlexBuildFromCellList_Parallel_Private to take elements with different shapes, then I realized the interpolation does not work when the mesh has elements other than tets and hex. Okay, here is what is needed. You need to a) prescribe an order for the vertices in a prism/pyramid (all input cells must have this order) b) report all faces as sets of ordered vertices - You have to order them matching the Plex vertex order for the lower dimensional shape - They should be oriented as to have outward facing normal Regarding the interpolation, I also noticed that the memory requirement is huge if I load the whole mesh on one core and interpolate (I cannot interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran out of memory, I?ll try to run with petsc memory logs and send). I am not sure what the upper limit is, but in a 3D mesh you could have many times the number of cells in faces and edges. Note that you would need --with-64-bit-indices to go beyond 4GB. is there anyways to interpolate the mesh after DMPlexdistribute? The code crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I guess what is needed is a DMPlexInterpolate on already distributed meshes. This should work. Maybe try a small example? If that crashes, just send it. 
Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 11:53 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi > wrote: Great, It?s clear now ?, One more question, any plan to support other element shapes (prism and pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs in 3D, can prisms and pyramids be used as degenerate hexahedrons? It depends on what you mean "support". Right now, we can represent these shapes in Plex. However, if you want mesh interpolation to work, then yes you need to extend GetRawFaces() to understand that shape. If you want them read out of a file format, other than Gmsh, we would likely have to extend that as well. These are straightforward once I understand what exactly you want to do. Thanks, Matt Thank you -Hassan From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Tuesday, June 27, 2017 10:17 AM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi > wrote: Thanks for your reply, Is there any example where each rank owns more than 1 element, i.e for the simple mesh here(attached png file), how should I pack and pass the coordinates of the vertices owned by rank0, rank1 Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] nodes = 1245 vertex coords: in what node order ? [coords_n1 coords_n2 coords_n4 coords_n5] or [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] vertexcoords [how to pack the nodes coords here?] should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? I think there is a misunderstanding here. There is NO connection between the cell order and the vertex order. Each process gets a contiguous set of cells (in the global numbering) and a contiguous set of vertices (in the global numbering). These two are NOT related. We then move the vertices to the correct processes. In this way, we can load completely in parallel, without requiring any setup in the mesh file. If you are worried, you can always arrange the order of vertices to "match" the order of cells. Thanks, Matt thanks From: Matthew Knepley [mailto:knepley at gmail.com] Sent: Sunday, June 25, 2017 1:04 PM To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi > wrote: Hello, I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh, It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? Global order. Here is the idea. You must read the file in chunks so that each proc can read its own chunk in parallel without talking to anyone else. we only pass the global vertex numbers using ?const int cells[]? to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn?t make sense? Yes. If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they wont be continuous). Nope. 
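[To make the chunking concrete, a sketch for the 6-vertex, 4-triangle example discussed above, with rank 0 owning cells {1,5,4},{2,4,5} and rank 1 owning {5,3,2},{6,3,5} (1-based as in the thread, 0-based in the code). Each rank supplies the coordinates of its contiguous block of global vertices (0-2 on rank 0, 3-5 on rank 1) regardless of which cells reference them. The function name and coordinates are invented just to show the data layout, the sketch assumes exactly two ranks, and the call shown is roughly the 3.7-era signature, which may differ in other releases.]

    #include <petscdmplex.h>

    /* Assumes exactly 2 MPI ranks; coordinates are placeholders only. */
    static PetscErrorCode CreateExampleMesh(MPI_Comm comm, DM *dm)
    {
      const PetscInt dim = 2, numCorners = 3;
      PetscMPIInt    rank;
      PetscSF        sfVert;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
      if (!rank) {
        const int       cells[]  = {0,4,3,  1,3,4};        /* cells {1,5,4},{2,4,5} */
        const PetscReal coords[] = {0.,0., 1.,0., 2.,0.};  /* global vertices 0,1,2 */
        ierr = DMPlexCreateFromCellListParallel(comm, dim, 2, 3, numCorners, PETSC_FALSE,
                                                cells, dim, coords, &sfVert, dm);CHKERRQ(ierr);
      } else {
        const int       cells[]  = {4,2,1,  5,2,4};        /* cells {5,3,2},{6,3,5} */
        const PetscReal coords[] = {0.,1., 1.,1., 2.,1.};  /* global vertices 3,4,5 */
        ierr = DMPlexCreateFromCellListParallel(comm, dim, 2, 3, numCorners, PETSC_FALSE,
                                                cells, dim, coords, &sfVert, dm);CHKERRQ(ierr);
      }
      ierr = PetscSFDestroy(&sfVert);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }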
Thanks, Matt Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 29 09:57:37 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Jun 2017 09:57:37 -0500 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: On Thu, Jun 29, 2017 at 9:52 AM, Hassan Raiesi < Hassan.Raiesi at aero.bombardier.com> wrote: > That would great if it allows elements with mixed faces, > Note that if you really needed this, you can always input the information by hand using DMPlexCreateDAG(), but its laborious. Matt > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Thursday, June 29, 2017 10:36 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Thu, Jun 29, 2017 at 9:32 AM, Hassan Raiesi bombardier.com> wrote: > > Ok, > > > > The problem is that the faceSizes varies for pyramids and prisms, they > have faces with variable number of nodes (i.e for pyramids, four triangular > faces and one quad face, this I don?t see how to specify? > > What if I report all faces as quad faces, for triangular faces, there > would be a repeated vertex, would it make any problem? > > > > No, you are right, that will not work. The whole interpolation code would > have to be rewritten to allow faces with > > variable size. Its not hard, but it is time-consuming. Since I am moving > this summer, this will not be done in a timely > > manner unfortunately. 
> > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Thursday, June 29, 2017 10:12 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Thu, Jun 29, 2017 at 8:59 AM, Hassan Raiesi bombardier.com> wrote: > > Matthew, > > > > Not sure if I understood completely, I have the ordering according to the > CGNS standard for all cells in the mesh, or I can change it if needed, but > I don?t understand what you mean by reporting all faces as ordered > vertices? Do you mean I just define the face connectivity?Could you explain > a bit more? > > > > For a triangular prism, you would add the case in GetRawFaces() for dim 3, > coneSize 6. Lets look at a triangle (dim 2, coneSize 3) > > > > https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f867 > 95d413b548/src/dm/impls/plex/plexinterpolate.c?at=master& > fileviewer=file-view-default#plexinterpolate.c-63 > > > > You see that it returns 3 groups of 2 vertices, which are the edges. For a > tetrahedron > > > > https://bitbucket.org/petsc/petsc/src/5ebd2cd705b0ffa5a27aee8a08f867 > 95d413b548/src/dm/impls/plex/plexinterpolate.c?at=master& > fileviewer=file-view-default#plexinterpolate.c-101 > > > > it returns 4 groups of 3 vertices, which are the faces. Note that all > faces should be oriented such that the normal is outward. > > > > Thanks, > > > > Matt > > > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 1:54 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 11:30 AM, Hassan Raiesi bombardier.com> wrote: > > I meant the interpolation, > > DMPlex supports those element shapes, however, there are two problems, > one, ?DMPlexBuildFromCellList_Parallel? does not take mixed elements > (numCorners must be constant for all elements in current implementation), > that was easy to fix, > > I already extended DMPlexBuildFromCellList_Parallel_Private to take > elements with different shapes, then I realized the interpolation does not > work when the mesh has elements other than tets and hex. > > > > Okay, here is what is needed. You need to > > > > a) prescribe an order for the vertices in a prism/pyramid (all input > cells must have this order) > > > > b) report all faces as sets of ordered vertices > > - You have to order them matching the Plex vertex order for the lower > dimensional shape > > - They should be oriented as to have outward facing normal > > > > Regarding the interpolation, I also noticed that the memory requirement is > huge if I load the whole mesh on one core and interpolate (I cannot > interpolate a mesh with 9M tets on a machine with 128GB of memory, it ran > out of memory, I?ll try to run with petsc memory logs and send). > > > > I am not sure what the upper limit is, but in a 3D mesh you could have > many times the number of cells in faces and edges. Note that > > you would need --with-64-bit-indices to go beyond 4GB. > > > > is there anyways to interpolate the mesh after DMPlexdistribute? The code > crashes if I move DMPlexInterpolate to after calling DMPlexdistribute., I > guess what is needed is a DMPlexInterpolate on already distributed meshes. > > > > This should work. Maybe try a small example? 
If that crashes, just send it. > > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 11:53 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 10:08 AM, Hassan Raiesi bombardier.com> wrote: > > Great, It?s clear now J, > > One more question, any plan to support other element shapes (prism and > pyramid) in 3D?, DMPlexGetRawFaces_Internal only supports tets and hexs > in 3D, can prisms and pyramids be used as degenerate hexahedrons? > > > > It depends on what you mean "support". Right now, we can represent these > shapes in Plex. However, if you > > want mesh interpolation to work, then yes you need to extend GetRawFaces() > to understand that shape. If > > you want them read out of a file format, other than Gmsh, we would likely > have to extend that as well. These > > are straightforward once I understand what exactly you want to do. > > > > Thanks, > > > > Matt > > > > Thank you > > -Hassan > > > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Tuesday, June 27, 2017 10:17 AM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Jun 27, 2017 at 9:12 AM, Hassan Raiesi bombardier.com> wrote: > > Thanks for your reply, > > > > Is there any example where each rank owns more than 1 element, i.e for the > simple mesh here(attached png file), how should I pack and pass the > coordinates of the vertices owned by rank0, rank1 > > > > Rank0:numcells = 2; num nodes=4, cells=[154 245] , nodes=[1 5 2 4] > > nodes = 1245 > > vertex coords: in what node order ? > > [coords_n1 coords_n2 coords_n4 coords_n5] or > > [coords_n2 coords_n4 coords_n1 coords_n5] or ?..? > > > > > > rank1: numcells = 2; num nodes=2, cells=[532 635], nodes=[6 3] > > vertexcoords [how to pack the nodes coords here?] > > should it be [x6y6 x3y3] or [x3y3 x6y6]? In what order? > > > > I think there is a misunderstanding here. > > > > There is NO connection between the cell order and the vertex order. Each > process gets a contiguous > > set of cells (in the global numbering) and a contiguous set of vertices > (in the global numbering). These > > two are NOT related. We then move the vertices to the correct processes. > In this way, we can load > > completely in parallel, without requiring any setup in the mesh file. > > > > If you are worried, you can always arrange the order of vertices to > "match" the order of cells. > > > > Thanks, > > > > Matt > > > > thanks > > > > > > > > *From:* Matthew Knepley [mailto:knepley at gmail.com] > *Sent:* Sunday, June 25, 2017 1:04 PM > *To:* Hassan Raiesi > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] specifying vertex coordinates using > DMPlexCreateFromCellListParallel > > > > On Tue, Apr 11, 2017 at 9:21 AM, Hassan Raiesi bombardier.com> wrote: > > Hello, > > > > I?m trying to use DMPlexCreateFromCellListParallel to create a DM from an > already partitioned mesh, > > It requires an array of numVertices*spaceDim numbers, but how should one > order the coordinates of the vertices? > > > > Global order. Here is the idea. You must read the file in chunks so that > each proc can read its own chunk in parallel > > without talking to anyone else. 
> > > > we only pass the global vertex numbers using ?const int cells[]? to define > the cell-connectivity, so passing the vertex coordinates in local ordering > wouldn?t make sense? > > > > Yes. > > > > If it needs to be in global ordering, should I sort the global index of > the node numbers owned by each rank (as they wont be continuous). > > > > Nope. > > > > Thanks, > > > > Matt > > > > > > Thank you > > > > Hassan Raiesi, > > Bombardier Aerospace > > www.bombardier.com > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbcbh1999 at gmail.com Thu Jun 29 12:39:04 2017 From: hbcbh1999 at gmail.com (Hao Zhang) Date: Thu, 29 Jun 2017 13:39:04 -0400 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: thanks, Barry. I have params like coordinate, particle position, density, viscosity etc, which definitely prefer not to be PetscScalar type. I try to localize PetscScalar. it works with double precision. quadruple precision really gives me running problem. i haven't configured why. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 29 12:50:26 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 Jun 2017 12:50:26 -0500 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: <44B3563E-4FFC-44ED-996D-7E551B3C65EF@mcs.anl.gov> There really is no way to have half the code in double and half in quad, plus if you have any code in double you'll only get double results anyways. Barry > On Jun 29, 2017, at 12:39 PM, Hao Zhang wrote: > > thanks, Barry. > > I have params like coordinate, particle position, density, viscosity etc, which definitely prefer not to be PetscScalar type. 
I try to localize PetscScalar. > it works with double precision. > quadruple precision really gives me running problem. i haven't configured why. > > > > From lvella at gmail.com Thu Jun 29 15:38:19 2017 From: lvella at gmail.com (Lucas Clemente Vella) Date: Thu, 29 Jun 2017 17:38:19 -0300 Subject: [petsc-users] Is linear solver performance is worse in parallel? Message-ID: Hi, I have a problem that is easily solvable with 8 processes, (by easily I mean with few iterations). Using PCFIELDSPLIT, I get 2 outer iterations and 6 inner iterations, reaching residual norm of 1e-8. The system have 786432 unknowns in total, and the solver setting is given by: PetscOptionsInsertString(NULL, "-ksp_type fgmres " "-pc_type fieldsplit " "-pc_fieldsplit_detect_saddle_point " "-pc_fieldsplit_type schur " "-pc_fieldsplit_schur_fact_type full " "-pc_fieldsplit_schur_precondition self " "-fieldsplit_0_ksp_type bcgs " "-fieldsplit_0_pc_type hypre " "-fieldsplit_1_ksp_type gmres " "-fieldsplit_1_pc_type lsc " "-fieldsplit_1_lsc_pc_type hypre " "-fieldsplit_1_lsc_pc_hypre_boomeramg_cycle_type w"); Problem is, it is slow, (compared to less complex systems, solvable simply with bcgs+hypre), and to try to speed things up, I've ran with 64 processes, which gives only 12288 unknowns per process. In this setting, inner iteration reaches the maximum of 15 iterations I set, and the outer iteration couldn't lower the residual norm from 1e2 after 20 iterations. Is this supposed to happen? Increasing the number of parallel processes is supposed to worsen the solver performance? I just want to clear this issue from Petsc and Hypre side if possible, so if I ever experience such behavior again, I can be sure my code is wrong... -- Lucas Clemente Vella lvella at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 29 15:58:57 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Jun 2017 15:58:57 -0500 Subject: [petsc-users] Is linear solver performance is worse in parallel? In-Reply-To: References: Message-ID: On Thu, Jun 29, 2017 at 3:38 PM, Lucas Clemente Vella wrote: > Hi, I have a problem that is easily solvable with 8 processes, (by easily > I mean with few iterations). Using PCFIELDSPLIT, I get 2 outer iterations > and 6 inner iterations, reaching residual norm of 1e-8. The system have > 786432 unknowns in total, and the solver setting is given by: > > PetscOptionsInsertString(NULL, > "-ksp_type fgmres " > "-pc_type fieldsplit " > "-pc_fieldsplit_detect_saddle_point " > "-pc_fieldsplit_type schur " > "-pc_fieldsplit_schur_fact_type full " > "-pc_fieldsplit_schur_precondition self " > "-fieldsplit_0_ksp_type bcgs " > "-fieldsplit_0_pc_type hypre " > "-fieldsplit_1_ksp_type gmres " > "-fieldsplit_1_pc_type lsc " > "-fieldsplit_1_lsc_pc_type hypre " > "-fieldsplit_1_lsc_pc_hypre_boomeramg_cycle_type w"); > > Problem is, it is slow, (compared to less complex systems, solvable simply > with bcgs+hypre), and to try to speed things up, I've ran with 64 > processes, which gives only 12288 unknowns per process. In this setting, > inner iteration reaches the maximum of 15 iterations I set, and the outer > iteration couldn't lower the residual norm from 1e2 after 20 iterations. > > Is this supposed to happen? Increasing the number of parallel processes is > supposed to worsen the solver performance? I just want to clear this issue > from Petsc and Hypre side if possible, so if I ever experience such > behavior again, I can be sure my code is wrong... 
> 1) For figuring out convergence issues, I would start with a smaller problem, so you can run lots of them 2) For any questions about convergence, we need to see the output of -ksp_view -ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual -ksp_converged_reason 3) Please start with -fieldsplit_0_pc_type lu so we can just look at the Schur complement system 4) It sounds like the strength of your Schur complement preconditioner is not uniform in the size of the problem. Why do you think LSC would be a good idea? Also, 'self' preconditioning for many equations, like Stokes, is not uniform in problem size. 5) What are your equations? 6) I would start with -fieldsplit_1_pc_type lu, which will test your PC matrix, and after that works, change things one at a time. Thanks, Matt > -- > Lucas Clemente Vella > lvella at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lvella at gmail.com Thu Jun 29 17:17:05 2017 From: lvella at gmail.com (Lucas Clemente Vella) Date: Thu, 29 Jun 2017 19:17:05 -0300 Subject: [petsc-users] Is linear solver performance is worse in parallel? In-Reply-To: References: Message-ID: 2017-06-29 17:58 GMT-03:00 Matthew Knepley : > 1) For figuring out convergence issues, I would start with a smaller > problem, so you can run lots of them > Hi! Yes, I did start, but since they are smaller, I run with fewer processes, and with few processes they converge (as well as the big problem). It won't converge with many processes, but that isn't needed for many small problems. I can still try it and report back, anyway. 2) For any questions about convergence, we need to see the output of > > -ksp_view -ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual > -ksp_converged_reason > Can't provide it now, but I'll report back, too. > 4) It sounds like the strength of your Schur complement preconditioner is > not uniform in the size of the problem. Why > do you think LSC would be a good idea? Also, 'self' preconditioning > for many equations, like Stokes, is not uniform > in problem size. > I don't know what else I could solve the problem, but the size of the problem is not the issue here (yet!), otherwise, the same problem would not converge with 8 processes, but it does! The issue arises when I increase the number of processes, maintaining the global problem size. So the real question is, are the proconditiones not uniform w.r.t. the number of process, as they aren't w.r.t the size of the problem? By the way, if I can't use self+LSC, how should I build the matrix to be used in the preconditioner? From my tests, it seems "a11" is a bad choice (doesn't converge). > 5) What are your equations? > The matrix is the jacobian for discrete form of the non-linear incompressible Navier-Stokes equations, given in http://mathb.in/147865 About > 3) Please start with -fieldsplit_0_pc_type lu so we can just look at the > Schur complement system > and > 6) I would start with -fieldsplit_1_pc_type lu, which will test your PC > matrix, and after that works, change things one at a time. > It already works with low number of processes (same problem size)! If I manage to install and run a LU provider package that works in parallel, what new information can be obtained? 
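[Side note: if PETSc was configured with a parallel direct solver such as MUMPS or SuperLU_DIST, it can be selected for that split purely from the options, for example

    -fieldsplit_0_pc_type lu -fieldsplit_0_pc_factor_mat_solver_package mumps

so no code changes are needed to try the LU suggestion in parallel. The option spelling above is the pre-3.9 one and assumes such a package is actually installed.]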
> Thanks, > > Matt > > >> -- >> Lucas Clemente Vella >> lvella at gmail.com >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- Lucas Clemente Vella lvella at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 29 17:22:19 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Jun 2017 17:22:19 -0500 Subject: [petsc-users] Is linear solver performance is worse in parallel? In-Reply-To: References: Message-ID: On Thu, Jun 29, 2017 at 5:17 PM, Lucas Clemente Vella wrote: > > > 2017-06-29 17:58 GMT-03:00 Matthew Knepley : > >> 1) For figuring out convergence issues, I would start with a smaller >> problem, so you can run lots of them >> > Hi! Yes, I did start, but since they are smaller, I run with fewer > processes, and with few processes they converge (as well as the big > problem). It won't converge with many processes, but that isn't needed for > many small problems. I can still try it and report back, anyway. > > 2) For any questions about convergence, we need to see the output of >> >> -ksp_view -ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual >> -ksp_converged_reason >> > Can't provide it now, but I'll report back, too. > > >> 4) It sounds like the strength of your Schur complement preconditioner is >> not uniform in the size of the problem. Why >> do you think LSC would be a good idea? Also, 'self' preconditioning >> for many equations, like Stokes, is not uniform >> in problem size. >> > I don't know what else I could solve the problem, but the size of the > problem is not the issue here (yet!), otherwise, the same problem would not > converge with 8 processes, but it does! The issue arises when I increase > the number of processes, maintaining the global problem size. So the real > question is, are the proconditiones not uniform w.r.t. the number of > process, as they aren't w.r.t the size of the problem? > By the way, if I can't use self+LSC, how should I build the matrix to be > used in the preconditioner? From my tests, it seems "a11" is a bad choice > (doesn't converge). > > >> 5) What are your equations? >> > The matrix is the jacobian for discrete form of the non-linear > incompressible Navier-Stokes equations, given in http://mathb.in/147865 > > About > >> 3) Please start with -fieldsplit_0_pc_type lu so we can just look at the >> Schur complement system >> > and > >> 6) I would start with -fieldsplit_1_pc_type lu, which will test your PC >> matrix, and after that works, change things one at a time. >> > It already works with low number of processes (same problem size)! If I > manage to install and run a LU provider package that works in parallel, > what new information can be obtained? > Right, but we do not understand what is failing (it appears to be Hypre since everything else should be uniform in p) so we want to remove all other variables and see if we can figure it out by changing one thing at a time. Matt > >> Thanks, >> >> Matt >> >> >>> -- >>> Lucas Clemente Vella >>> lvella at gmail.com >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > > > -- > Lucas Clemente Vella > lvella at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jun 29 17:46:28 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 29 Jun 2017 16:46:28 -0600 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> Message-ID: <87efu2l8e3.fsf@jedbrown.org> Hao Zhang writes: > It's 3d incompressible RT simulation. My pressure between serial and > parallel calculation is off by 10^(-14) in relative error. it eventually > build up at later time. I doubt that is why your solution develops artifacts after many time steps. If you're using __float128, you'll take a big hit on performance so you might as well make everything __float128 (PetscScalar). Mixing precisions is a maintenance burden and exactly the wrong thing when you're trying to rule out potential sources of error. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From a.croucher at auckland.ac.nz Thu Jun 29 18:33:22 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Fri, 30 Jun 2017 11:33:22 +1200 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel In-Reply-To: References: Message-ID: <4df4c4ff-3709-5c46-e4bf-c7bc9cd6b89e@auckland.ac.nz> hi > That would great if it allows elements with mixed faces, Yes, as I've already mentioned to Matt, I need this too for 6-node wedge elements (probably the same thing as your 'prisms'). - Adrian > > Thank you > -Hassan > > From: Matthew Knepley [mailto:knepley at gmail.com] > Sent: Thursday, June 29, 2017 10:36 AM > To: Hassan Raiesi > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel > > On Thu, Jun 29, 2017 at 9:32 AM, Hassan Raiesi > wrote: > Ok, > > The problem is that the faceSizes varies for pyramids and prisms, they have faces with variable number of nodes (i.e for pyramids, four triangular faces and one quad face, this I don?t see how to specify? > What if I report all faces as quad faces, for triangular faces, there would be a repeated vertex, would it make any problem? > > No, you are right, that will not work. The whole interpolation code would have to be rewritten to allow faces with > variable size. Its not hard, but it is time-consuming. Since I am moving this summer, this will not be done in a timely > manner unfortunately. From hbcbh1999 at gmail.com Thu Jun 29 18:58:30 2017 From: hbcbh1999 at gmail.com (Hao Zhang) Date: Thu, 29 Jun 2017 19:58:30 -0400 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: <87efu2l8e3.fsf@jedbrown.org> References: <87wp7vmwux.fsf@jedbrown.org> <87efu2l8e3.fsf@jedbrown.org> Message-ID: Thanks for the input. I was doing a few tests. I will put quadruple precision choice on hold for now. Since the condition number is big at the beginning, scale of 10^2, I will try to smooth the input of the matrix by introducing heaviside function. I do have another question. 
the incompressible simulation I'm simulating is using all reflection(free slip neumann) boundary condition, what will be a good preconditioner? Shall I start a new thread? so, when other users search, they could use it as a reference -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jun 29 18:58:42 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 29 Jun 2017 18:58:42 -0500 Subject: [petsc-users] Is linear solver performance is worse in parallel? In-Reply-To: References: Message-ID: <4D5E5AFD-75E4-4E5A-A742-2F3CD27BD57D@mcs.anl.gov> > On Jun 29, 2017, at 3:38 PM, Lucas Clemente Vella wrote: > > Hi, I have a problem that is easily solvable with 8 processes, (by easily I mean with few iterations). Using PCFIELDSPLIT, I get 2 outer iterations and 6 inner iterations, reaching residual norm of 1e-8. The system have 786432 unknowns in total, and the solver setting is given by: > > PetscOptionsInsertString(NULL, > "-ksp_type fgmres " > "-pc_type fieldsplit " > "-pc_fieldsplit_detect_saddle_point " > "-pc_fieldsplit_type schur " > "-pc_fieldsplit_schur_fact_type full " > "-pc_fieldsplit_schur_precondition self " > "-fieldsplit_0_ksp_type bcgs " > "-fieldsplit_0_pc_type hypre " > "-fieldsplit_1_ksp_type gmres " > "-fieldsplit_1_pc_type lsc " > "-fieldsplit_1_lsc_pc_type hypre " > "-fieldsplit_1_lsc_pc_hypre_boomeramg_cycle_type w"); > > Problem is, it is slow, (compared to less complex systems, solvable simply with bcgs+hypre), and to try to speed things up, I've ran with 64 processes, which gives only 12288 unknowns per process. In this setting, inner iteration reaches the maximum of 15 iterations I set, and the outer iteration couldn't lower the residual norm from 1e2 after 20 iterations. > > Is this supposed to happen? Increasing the number of parallel processes is supposed to worsen the solver performance? For many solvers yes increasing the number of processors but leaving everything else fixed does increase the number of iterations; the reason is simple, the more parallelism the more use of "older information" in the iteration since every process doesn't know the most recent values from the other processes. This Jacobi versus Gauss-Seidel iteration, GS uses newer information so almost always converges faster than Jacobi. Now appropriate algebraic multigrid does not suffer very much from this problem, so if the number of iterations increases dramatically with AMG this usually means that AMG is not appropriate for the problem or something is wrong. Barry > I just want to clear this issue from Petsc and Hypre side if possible, so if I ever experience such behavior again, I can be sure my code is wrong... > > -- > Lucas Clemente Vella > lvella at gmail.com From jed at jedbrown.org Thu Jun 29 20:29:50 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 29 Jun 2017 19:29:50 -0600 Subject: [petsc-users] petsc-3.5 quadruple precision mixed with double In-Reply-To: References: <87wp7vmwux.fsf@jedbrown.org> <87efu2l8e3.fsf@jedbrown.org> Message-ID: <87wp7ujm9d.fsf@jedbrown.org> Hao Zhang writes: > Thanks for the input. I was doing a few tests. I will put quadruple > precision choice on hold for now. Since the condition number is big at the > beginning, scale of 10^2, That is not large and you don't need quad precision to solve it to better accuracy than almost any measured physical quantity. > I will try to smooth the input of the matrix by introducing heaviside > function. Not sure how that is smoothing. 
> I do have another question. the incompressible simulation I'm > simulating is using all reflection(free slip neumann) boundary > condition, what will be a good preconditioner? Shall I start a new > thread? so, when other users search, they could use it as a reference The boundary condition has almost no effect on choice of preconditioner. There are lots of methods for incompressible flow, many of which can be implemented at run-time using PCFIELDSPLIT. Unfortunately, there are not good black-box solvers, so you'll have to do some reading. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From imilian.hartig at gmail.com Fri Jun 30 03:45:42 2017 From: imilian.hartig at gmail.com (Maximilian Hartig) Date: Fri, 30 Jun 2017 10:45:42 +0200 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? Message-ID: Hello, I?m trying to implement plasticity and have problems getting the Petsc SNES to converge. To check if my residual formulation is correct I tried running with -snes_fd for an easy example as the Petsc FAQ suggest. I cannot seem to get the solver to converge at any cost. I already tried to impose bounds on the solution and moved to vinewtonrsls as a nonlinear solver. I checked and rechecked my residuals but I do not find an error there. I now have the suspicion that the -snes_fd option is not made for handling residuals who?s first derivatives are not continuous (e.g. have an ?if? condition in them for the plasticity/ flow-condition). Can you confirm my suspicion? And is there another way to test my residual formulation separate from my hand-coded jacobian? Thanks, Max From luvsharma11 at gmail.com Fri Jun 30 04:09:51 2017 From: luvsharma11 at gmail.com (Luv Sharma) Date: Fri, 30 Jun 2017 11:09:51 +0200 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? In-Reply-To: References: Message-ID: <87978294-9B3A-46C3-9986-6CA2B90891FD@gmail.com> Hi Max, Is your field solver not converging or the material point solver ;)? Best regards, Luv > On 30 Jun 2017, at 10:45, Maximilian Hartig wrote: > > Hello, > > I?m trying to implement plasticity and have problems getting the Petsc SNES to converge. To check if my residual formulation is correct I tried running with -snes_fd for an easy example as the Petsc FAQ suggest. I cannot seem to get the solver to converge at any cost. > I already tried to impose bounds on the solution and moved to vinewtonrsls as a nonlinear solver. I checked and rechecked my residuals but I do not find an error there. I now have the suspicion that the -snes_fd option is not made for handling residuals who?s first derivatives are not continuous (e.g. have an ?if? condition in them for the plasticity/ flow-condition). Can you confirm my suspicion? And is there another way to test my residual formulation separate from my hand-coded jacobian? > > > Thanks, > Max From imilian.hartig at gmail.com Fri Jun 30 04:52:22 2017 From: imilian.hartig at gmail.com (Maximilian Hartig) Date: Fri, 30 Jun 2017 11:52:22 +0200 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? In-Reply-To: <87978294-9B3A-46C3-9986-6CA2B90891FD@gmail.com> References: <87978294-9B3A-46C3-9986-6CA2B90891FD@gmail.com> Message-ID: Hi Luv, I?m modelling linear hardening(sigma_y = sigma_y0 + K_iso*epsilon_plast_eqiv) with isotropic plasticity only. 
So I should not need to use an iterative method to find the point on the yield surface. I have three fields and 7 unknowns in total: Field 0: 3 displacement components Field 1: 3 velocity components Field 2: 1 equivalent plastic strain It is the solver for these three fields that is not converging. I am using PetscFE. As residuals for the plastic case (sigma_vM > sigma_yield) I have: Field 0 (displacement): f0[i] = rho*u_t[u_Off[1]+i] f1[i*dim+j] = sigma_tr[i*dim+j] - 2*mu*sqrt(3/2)*u_t[uOff[2]]*N[i*dim+j] where sigma_tr is the trial stress, mu is the shear modulus and N is the unit deviator tensor normal to the yield surface. Field 1 (velocity): f0[i] = u[uOff[1]+i]-u_t[i] f1[i*dim+j] = 0 Field 2 (effective plastic strain): f0[0] = ||s_tr|| -2*mu*sqrt(3/2)*u_t[uOff[2]]-sqrt(2/3)*sigma_y f1[i] = 0 where ||s_tr|| is the norm of the deviator stress tensor. Field 0 residual is essentially newton?s second law of motion and Field 2 residual should be the yield criterion. I might have just fundamentally misunderstood the equations of plasticity but I cannot seem to find my mistake. Thanks, Max > On 30. Jun 2017, at 11:09, Luv Sharma wrote: > > Hi Max, > > Is your field solver not converging or the material point solver ;)? > > Best regards, > Luv >> On 30 Jun 2017, at 10:45, Maximilian Hartig wrote: >> >> Hello, >> >> I?m trying to implement plasticity and have problems getting the Petsc SNES to converge. To check if my residual formulation is correct I tried running with -snes_fd for an easy example as the Petsc FAQ suggest. I cannot seem to get the solver to converge at any cost. >> I already tried to impose bounds on the solution and moved to vinewtonrsls as a nonlinear solver. I checked and rechecked my residuals but I do not find an error there. I now have the suspicion that the -snes_fd option is not made for handling residuals who?s first derivatives are not continuous (e.g. have an ?if? condition in them for the plasticity/ flow-condition). Can you confirm my suspicion? And is there another way to test my residual formulation separate from my hand-coded jacobian? >> >> >> Thanks, >> Max > From knepley at gmail.com Fri Jun 30 09:51:16 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 30 Jun 2017 09:51:16 -0500 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? In-Reply-To: References: Message-ID: On Fri, Jun 30, 2017 at 3:45 AM, Maximilian Hartig wrote: > Hello, > > I?m trying to implement plasticity and have problems getting the Petsc > SNES to converge. To check if my residual formulation is correct I tried > running with -snes_fd for an easy example as the Petsc FAQ suggest. I > cannot seem to get the solver to converge at any cost. > I already tried to impose bounds on the solution and moved to vinewtonrsls > as a nonlinear solver. I checked and rechecked my residuals but I do not > find an error there. I now have the suspicion that the -snes_fd option is > not made for handling residuals who?s first derivatives are not continuous > (e.g. have an ?if? condition in them for the plasticity/ flow-condition). > Can you confirm my suspicion? And is there another way to test my residual > formulation separate from my hand-coded jacobian? > -snes_fd does a finite difference approximation, so if the two samples are on different sides of your conditional, the answer can be crap. 
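[A toy illustration of that point, nothing PETSc-specific: take a residual that is continuous but has a kink at x = 1. A forward difference whose two samples straddle the kink reports a slope near 2 even though the derivative on the branch actually being evaluated is 1, so the resulting matrix is inconsistent with the residual.]

    #include <stdio.h>

    /* Continuous function with a discontinuous first derivative at x = 1:
       slope 1 to the left of the kink, slope 2 to the right. */
    static double f(double x) { return (x < 1.0) ? x : 1.0 + 2.0*(x - 1.0); }

    int main(void)
    {
      double x = 1.0 - 1e-9;   /* just left of the kink          */
      double h = 1e-6;         /* the FD step crosses the kink   */
      /* Prints a slope of about 2, while the analytic slope at x is 1. */
      printf("FD slope %.3f vs analytic slope 1 at x = %.9f\n",
             (f(x + h) - f(x))/h, x);
      return 0;
    }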
You can take a look at this: https://scicomp.stackexchange.com/questions/30/why-is-newtons-method-not-converging The SNESVI that we have only deals with bound constraints, but I think you have a complementarity constraint. We should implement a nice solver for this, like Mihai Anitescu and Dan Negrut have, but we don't have time right now. Thanks, Matt > Thanks, > Max -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jun 30 10:14:26 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 30 Jun 2017 10:14:26 -0500 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? In-Reply-To: References: Message-ID: What is the output from -snes_monitor -ksp_monitor_true_residual -pc_type lu -ksp_converged_reason -snes_converged_reason > On Jun 30, 2017, at 3:45 AM, Maximilian Hartig wrote: > > Hello, > > I?m trying to implement plasticity and have problems getting the Petsc SNES to converge. To check if my residual formulation is correct I tried running with -snes_fd for an easy example as the Petsc FAQ suggest. I cannot seem to get the solver to converge at any cost. > I already tried to impose bounds on the solution and moved to vinewtonrsls as a nonlinear solver. I checked and rechecked my residuals but I do not find an error there. I now have the suspicion that the -snes_fd option is not made for handling residuals who?s first derivatives are not continuous (e.g. have an ?if? condition in them for the plasticity/ flow-condition). Can you confirm my suspicion? And is there another way to test my residual formulation separate from my hand-coded jacobian? > > > Thanks, > Max From luvsharma11 at gmail.com Fri Jun 30 13:49:11 2017 From: luvsharma11 at gmail.com (Luv Sharma) Date: Fri, 30 Jun 2017 20:49:11 +0200 Subject: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative? In-Reply-To: References: <87978294-9B3A-46C3-9986-6CA2B90891FD@gmail.com> Message-ID: Hi Max, I do not understand the equations that you write very clearly. Are you looking to implement a ?local? and ?if? type of isotropic hardening plasticity? If that is the case, then in my understanding you need to solve only 1 field equation for the displacement components or for the strain components. You can look at the following code: https://github.com/tdegeus/GooseFFT/blob/master/small-strain/laminate/elasto-plasticity.py If you are looking for a PETSc based implementation for plasticity (isotropic/anisotropic) you can look at https://damask.mpie.de/ I had presented a talk about the same at the PETSc User Meeting last year. As I understand it, additional field equations will only be necessary if the plasticity or elasticity were ?nonlocal?. You may want to look at: On the role of moving elastic?plastic boundaries in strain gradient plasticity, R H J Peerlings Best regards, Luv > On 30 Jun 2017, at 11:52, Maximilian Hartig wrote: > > Hi Luv, > > I?m modelling linear hardening(sigma_y = sigma_y0 + K_iso*epsilon_plast_eqiv) with isotropic plasticity only. So I should not need to use an iterative method to find the point on the yield surface. 
> I have three fields and 7 unknowns in total:
> Field 0: 3 displacement components
> Field 1: 3 velocity components
> Field 2: 1 equivalent plastic strain
>
> It is the solver for these three fields that is not converging. I am using PetscFE. As residuals for the plastic case (sigma_vM > sigma_yield) I have:
>
> Field 0 (displacement):
> f0[i] = rho*u_t[uOff[1]+i]
> f1[i*dim+j] = sigma_tr[i*dim+j] - 2*mu*sqrt(3/2)*u_t[uOff[2]]*N[i*dim+j]
>
> where sigma_tr is the trial stress, mu is the shear modulus and N is the unit deviator tensor normal to the yield surface.
>
> Field 1 (velocity):
> f0[i] = u[uOff[1]+i]-u_t[i]
> f1[i*dim+j] = 0
>
> Field 2 (effective plastic strain):
> f0[0] = ||s_tr|| - 2*mu*sqrt(3/2)*u_t[uOff[2]] - sqrt(2/3)*sigma_y
> f1[i] = 0
> where ||s_tr|| is the norm of the deviator stress tensor.
>
> The Field 0 residual is essentially Newton's second law of motion and the Field 2 residual should be the yield criterion. I might have just fundamentally misunderstood the equations of plasticity, but I cannot seem to find my mistake.
>
> Thanks,
> Max
>
>
>> On 30. Jun 2017, at 11:09, Luv Sharma wrote:
>>
>> Hi Max,
>>
>> Is your field solver not converging or the material point solver ;)?
>>
>> Best regards,
>> Luv
>>> On 30 Jun 2017, at 10:45, Maximilian Hartig wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to implement plasticity and have problems getting the PETSc SNES to converge. To check if my residual formulation is correct I tried running with -snes_fd for an easy example as the PETSc FAQ suggests. I cannot seem to get the solver to converge at any cost.
>>> I already tried to impose bounds on the solution and moved to vinewtonrsls as a nonlinear solver. I checked and rechecked my residuals but I do not find an error there. I now have the suspicion that the -snes_fd option is not made for handling residuals whose first derivatives are not continuous (e.g. have an "if" condition in them for the plasticity/flow condition). Can you confirm my suspicion? And is there another way to test my residual formulation separate from my hand-coded Jacobian?
>>>
>>>
>>> Thanks,
>>> Max
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dnolte at dim.uchile.cl Fri Jun 30 16:22:14 2017
From: dnolte at dim.uchile.cl (David Nolte)
Date: Fri, 30 Jun 2017 17:22:14 -0400
Subject: [petsc-users] Algebraic multigrid with fieldsplit smoother
Message-ID:

Dear all,

In the context of Navier-Stokes problems using FEM, I am interested in trying a "coupled AMG" method, where the smoother on each level (except the coarse level) uses a Schur complement fieldsplit PC (I am thinking of some variant of SIMPLE). Is this possible with gamg/ml/hypre?

I would set, for instance,

    -mg_levels_pc_type fieldsplit
    -mg_levels_pc_fieldsplit_type schur
    -mg_levels_pc_fieldsplit_schur_precondition selfp

and set the IS for the Fieldsplit in the code. I only have the IS for the fine level, though. Would I have to specify the correct IS for the fieldsplit of each sublevel (I don't have that information)? How can this be done?

Thanks a lot in advance!
David


From danyang.su at gmail.com Fri Jun 30 17:40:58 2017
From: danyang.su at gmail.com (Danyang Su)
Date: Fri, 30 Jun 2017 15:40:58 -0700
Subject: [petsc-users] Is OpenMP still available for PETSc?
Message-ID:

Dear All,

I recall that OpenMP support was available in an older development version of PETSc. Googling "petsc hybrid mpi openmp" returns some papers about this feature.
My code was first parallelized using OpenMP and then redeveloped using PETSc, with OpenMP kept but not used together with MPI. Before retesting the code using hybrid MPI-OpenMP, I picked one PETSc example, ex10, and added "omp_set_num_threads(max_threads);" right after PetscInitialize. The PETSc build is the current development version, configured as follows:

--with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 --CFLAGS=-fopenmp --CXXFLAGS=-fopenmp --FFLAGS=-fopenmp COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-large-file-io=1 --download-cmake=yes --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-openmp --with-threadcomm --with-pthreadclasses --with-openmpclasses

The code compiles successfully. However, when I run it with OpenMP, the threading does not take effect: the timings show no change in performance whether 1 or 2 threads per processor are used, and the CPU/Threads usage indicates that no thread is used. I just wonder if OpenMP is still available in the latest version, even though it is not recommended.

mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 1 -threadcomm_type openmp -threadcomm_nthreads 1

KSPSolve        1 1.0 8.9934e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89  6 76  89 97 98 98 96  2290
PCSetUp         2 1.0 8.9590e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   648
PCSetUpOnBlocks 2 1.0 8.9465e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   649
PCApply        40 1.0 3.1993e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25  0  0  0  32 25  0  0  0  1686

mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 2 -threadcomm_type openmp -threadcomm_nthreads 2

KSPSolve        1 1.0 8.9701e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89  6 76  89 97 98 98 96  2296
PCSetUp         2 1.0 8.7635e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   663
PCSetUpOnBlocks 2 1.0 8.7511e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   664
PCApply        40 1.0 3.1878e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25  0  0  0  32 25  0  0  0  1692

Thanks and regards,

Danyang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex10.c Type: text/x-csrc Size: 19687 bytes Desc: not available URL: -------------- next part -------------- CFLAGS = FFLAGS = CPPFLAGS = FPPFLAGS = LOCDIR = src/ksp/ksp/examples/tutorials/ EXAMPLESC = ex1.c ex2.c ex3.c ex4.c ex5.c ex6.c ex7.c ex8.c ex9.c \ ex10.c ex11.c ex12.c ex13.c ex15.c ex16.c ex18.c ex23.c \ ex25.c ex27.c ex28.c ex29.c ex30.c ex31.c ex32.c ex34.c \ ex41.c ex42.c ex43.c \ ex45.c ex46.c ex49.c ex50.c ex51.c ex52.c ex53.c \ ex54.c ex55.c ex56.c ex58.c ex62.c ex63.cxx ex64.c ex65.c EXAMPLESF = ex1f.F ex2f.F ex6f.F ex11f.F ex13f90.F ex14f.F ex15f.F ex21f.F ex22f.F ex44f.F90 ex45f.F \ ex52f.F ex54f.F ex61f.F90 MANSEC = KSP CLEANFILES = rhs.vtk solution.vtk NP = 1 include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules ex1: ex1.o chkopts -${CLINKER} -o ex1 ex1.o ${PETSC_KSP_LIB} ${RM} ex1.o ex1f: ex1f.o chkopts -${FLINKER} -o ex1f ex1f.o ${PETSC_KSP_LIB} ${RM} ex1f.o ex2: ex2.o chkopts -${CLINKER} -o ex2 ex2.o ${PETSC_KSP_LIB} ${RM} ex2.o ex2a: ex2a.o chkopts -${CLINKER} -o ex2a ex2a.o ${PETSC_KSP_LIB} ${RM} ex2a.o ex2f: ex2f.o chkopts -${FLINKER} -o ex2f ex2f.o ${PETSC_KSP_LIB} ${RM} ex2f.o ex3: ex3.o chkopts -${CLINKER} -o ex3 ex3.o ${PETSC_KSP_LIB} ${RM} ex3.o ex4: ex4.o chkopts -${CLINKER} -o ex4 ex4.o ${PETSC_KSP_LIB} ${RM} ex4.o ex5: ex5.o chkopts -${CLINKER} -o ex5 ex5.o ${PETSC_KSP_LIB} ${RM} ex5.o ex6: ex6.o chkopts -${CLINKER} -o ex6 ex6.o ${PETSC_KSP_LIB} ${RM} ex6.o ex6f: ex6f.o chkopts -${FLINKER} -o ex6f ex6f.o ${PETSC_KSP_LIB} ${RM} ex6f.o ex7: ex7.o chkopts -${CLINKER} -o ex7 ex7.o ${PETSC_KSP_LIB} ${RM} ex7.o ex8: ex8.o chkopts -${CLINKER} -o ex8 ex8.o ${PETSC_KSP_LIB} ${RM} ex8.o ex9: ex9.o chkopts -${CLINKER} -o ex9 ex9.o ${PETSC_KSP_LIB} ${RM} ex9.o ex10: ex10.o chkopts -${CLINKER} -o ex10 ex10.o ${PETSC_KSP_LIB} ${RM} ex10.o ex11: ex11.o chkopts -${CLINKER} -o ex11 ex11.o ${PETSC_KSP_LIB} ${RM} ex11.o ex11f: ex11f.o chkopts -${FLINKER} -o ex11f ex11f.o ${PETSC_KSP_LIB} ${RM} ex11f.o ex12: ex12.o chkopts -${CLINKER} -o ex12 ex12.o ${PETSC_KSP_LIB} ${RM} ex12.o ex13: ex13.o chkopts -${CLINKER} -o ex13 ex13.o ${PETSC_KSP_LIB} ${RM} ex13.o ex13f90: ex13f90.o chkopts -${FLINKER} -o ex13f90 ex13f90.o ${PETSC_KSP_LIB} ${RM} ex13f90.o ex14: ex14.o chkopts -${CLINKER} -o ex14 ex14.o ${PETSC_KSP_LIB} ${RM} ex14.o ex14f: ex14f.o chkopts -${FLINKER} -o ex14f ex14f.o ${PETSC_KSP_LIB} ${RM} ex14f.o ex15: ex15.o chkopts -${CLINKER} -o ex15 ex15.o ${PETSC_KSP_LIB} ${RM} ex15.o ex15f: ex15f.o chkopts -${FLINKER} -o ex15f ex15f.o ${PETSC_KSP_LIB} ${RM} ex15f.o ex16: ex16.o chkopts -${CLINKER} -o ex16 ex16.o ${PETSC_KSP_LIB} ${RM} ex16.o ex17: ex17.o chkopts -${CLINKER} -o ex17 ex17.o ${PETSC_KSP_LIB} ${RM} ex17.o ex18: ex18.o chkopts -${CLINKER} -o ex18 ex18.o ${PETSC_KSP_LIB} ${RM} ex18.o ex20: ex20.o chkopts -${CLINKER} -o ex20 ex20.o ${PETSC_KSP_LIB} ${RM} ex20.o ex21f: ex21f.o chkopts -${FLINKER} -o ex21f ex21f.o ${PETSC_KSP_LIB} ${RM} ex21f.o ex22: ex22.o chkopts -${CLINKER} -o ex22 ex22.o ${PETSC_SNES_LIB} ${RM} ex22.o ex22f: ex22f.o chkopts -${FLINKER} -o ex22f ex22f.o ${PETSC_SNES_LIB} ${RM} ex22f.o ex23: ex23.o chkopts -${CLINKER} -o ex23 ex23.o ${PETSC_KSP_LIB} ${RM} ex23.o ex25: ex25.o chkopts -${CLINKER} -o ex25 ex25.o ${PETSC_SNES_LIB} ${RM} ex25.o ex26: ex26.o chkopts -${CLINKER} -o ex26 ex26.o ${PETSC_KSP_LIB} ${RM} ex26.o ex27: ex27.o chkopts -${CLINKER} -o ex27 ex27.o ${PETSC_KSP_LIB} ${RM} ex27.o ex28: ex28.o chkopts -${CLINKER} -o ex28 ex28.o ${PETSC_SNES_LIB} ${RM} ex28.o ex29: ex29.o chkopts -${CLINKER} -o ex29 
ex29.o ${PETSC_SNES_LIB} ${RM} ex29.o ex30: ex30.o chkopts -${CLINKER} -o ex30 ex30.o ${PETSC_KSP_LIB} ${RM} ex30.o ex31: ex31.o chkopts -${CLINKER} -o ex31 ex31.o ${PETSC_SNES_LIB} ${RM} ex31.o ex32: ex32.o chkopts -${CLINKER} -o ex32 ex32.o ${PETSC_SNES_LIB} ${RM} ex32.o ex33: ex33.o chkopts -${CLINKER} -o ex33 ex33.o ${PETSC_SNES_LIB} ${RM} ex33.o ex34: ex34.o chkopts -${CLINKER} -o ex34 ex34.o ${PETSC_SNES_LIB} ${RM} ex34.o ex35: ex35.o chkopts -${CLINKER} -o ex35 ex35.o ${PETSC_SNES_LIB} ${RM} ex35.o ex36: ex36.o chkopts -${CLINKER} -o ex36 ex36.o ${PETSC_SNES_LIB} ${RM} ex36.o ex37: ex37.o chkopts -${CLINKER} -o ex37 ex37.o ${PETSC_SNES_LIB} ${RM} ex37.o ex38: ex38.o chkopts -${CLINKER} -o ex38 ex38.o ${PETSC_SNES_LIB} ${RM} ex38.o ex39: ex39.o chkopts -${CLINKER} -o ex39 ex39.o ${PETSC_SNES_LIB} ${RM} ex39.o ex40: ex40.o chkopts -${CLINKER} -o ex40 ex40.o ${PETSC_SNES_LIB} ${RM} ex40.o ex41: ex41.o chkopts -${CLINKER} -o ex41 ex41.o ${PETSC_SNES_LIB} ${RM} ex41.o ex42: ex42.o chkopts -${CLINKER} -o ex42 ex42.o ${PETSC_KSP_LIB} ${RM} ex42.o ex43: ex43.o chkopts -${CLINKER} -o ex43 ex43.o ${PETSC_KSP_LIB} ${RM} ex43.o # not tested in nightly builds because requires F90 compiler that handles long lines ex44f: ex44f.o chkopts -${FLINKER} -o ex44f ex44f.o ${PETSC_KSP_LIB} ${RM} ex44f.o ex45f: ex45f.o chkopts -${FLINKER} -o ex45f ex45f.o ${PETSC_KSP_LIB} ${RM} ex45f.o ex45: ex45.o chkopts -${CLINKER} -o ex45 ex45.o ${PETSC_KSP_LIB} ${RM} ex45.o ex46: ex46.o chkopts -${CLINKER} -o ex46 ex46.o ${PETSC_KSP_LIB} ${RM} ex46.o ex47: ex47.o chkopts -${CLINKER} -o ex47 ex47.o ${PETSC_KSP_LIB} ${RM} ex47.o ex49: ex49.o chkopts -${CLINKER} -o ex49 ex49.o ${PETSC_KSP_LIB} ${RM} ex49.o ex50: ex50.o chkopts -${CLINKER} -o ex50 ex50.o ${PETSC_KSP_LIB} ${RM} ex50.o ex51: ex51.o chkopts -${CLINKER} -o ex51 ex51.o ${PETSC_KSP_LIB} ${RM} ex51.o ex52: ex52.o chkopts -${CLINKER} -o ex52 ex52.o ${PETSC_KSP_LIB} ${RM} ex52.o ex52f: ex52f.o chkopts -${FLINKER} -o ex52f ex52f.o ${PETSC_KSP_LIB} ${RM} ex52f.o ex53: ex53.o chkopts -${CLINKER} -o ex53 ex53.o ${PETSC_KSP_LIB} ${RM} ex53.o ex54: ex54.o chkopts -${CLINKER} -o ex54 ex54.o ${PETSC_KSP_LIB} ${RM} ex54.o ex54f: ex54f.o chkopts -${FLINKER} -o ex54f ex54f.o ${PETSC_KSP_LIB} ${RM} ex54f.o ex55: ex55.o chkopts -${CLINKER} -o ex55 ex55.o ${PETSC_KSP_LIB} ${RM} ex55.o ex56: ex56.o chkopts -${CLINKER} -o ex56 ex56.o ${PETSC_KSP_LIB} ${RM} ex56.o ex57f: ex57f.o chkopts -${FLINKER} -o ex57f ex57f.o ${PETSC_KSP_LIB} ${RM} ex57f.o ex58: ex58.o chkopts -${CLINKER} -o ex58 ex58.o ${PETSC_KSP_LIB} ${RM} ex58.o ex59: ex59.o chkopts -${CLINKER} -o ex59 ex59.o ${PETSC_KSP_LIB} ${RM} ex59.o ex60: ex60.o chkopts -${CLINKER} -o ex60 ex60.o ${PETSC_KSP_LIB} ${RM} ex60.o ex61f: ex61f.o chkopts -${FLINKER} -o ex61f ex61f.o ${PETSC_KSP_LIB} ${RM} ex61f.o ex62: ex62.o chkopts -${CLINKER} -o ex62 ex62.o ${PETSC_KSP_LIB} ${RM} ex62.o ex63: ex63.o chkopts -${CLINKER} -o ex63 ex63.o ${PETSC_KSP_LIB} ${RM} ex63.o ex64: ex64.o chkopts -${CLINKER} -o ex64 ex64.o ${PETSC_KSP_LIB} ${RM} ex64. 
ex65: ex65.o chkopts -${CLINKER} -o ex65 ex65.o ${PETSC_KSP_LIB} ${RM} ex65.o #---------------------------------------------------------------------------- runex1: -@${MPIEXEC} -n 1 ./ex1 -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex1_1.tmp 2>&1; \ if (${DIFF} output/ex1_1.out ex1_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex1_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex1_1.tmp runex1_2: -@${MPIEXEC} -n 1 ./ex1 -pc_type sor -pc_sor_symmetric -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >\ ex1_2.tmp 2>&1; \ if (${DIFF} output/ex1_2.out ex1_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex1_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex1_2.tmp runex1_3: -@${MPIEXEC} -n 1 ./ex1 -pc_type eisenstat -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >\ ex1_3.tmp 2>&1; \ if (${DIFF} output/ex1_3.out ex1_3.tmp) then true; \ else printf "${PWD}\nPossible problem with ex1_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex1_3.tmp runex1f: -@${MPIEXEC} -n 1 ./ex1f -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex1f_1.tmp 2>&1; \ if (${DIFF} output/ex1f_1.out ex1f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex1f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex1f_1.tmp runex2: -@${MPIEXEC} -n 1 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > ex2_1.tmp 2>&1; \ if (${DIFF} output/ex2_1.out ex2_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_1.tmp runex2_2: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -ksp_gmres_cgs_refinement_type refine_always > ex2_2.tmp 2>&1; \ if (${DIFF} output/ex2_2.out ex2_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_2.tmp runex2_3: -@${MPIEXEC} -n 1 ./ex2 -pc_type sor -pc_sor_symmetric -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > \ ex2_3.tmp 2>&1; \ if (${DIFF} output/ex2_3.out ex2_3.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_3.tmp runex2_4: -@${MPIEXEC} -n 1 ./ex2 -pc_type eisenstat -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >\ ex2_4.tmp 2>&1; \ if (${DIFF} output/ex2_4.out ex2_4.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_4.tmp runex2_5: -@${MPIEXEC} -n 2 ./ex2 -ksp_monitor_short -m 5 -n 5 -mat_view draw -ksp_gmres_cgs_refinement_type refine_always -nox > ex2_5.tmp 2>&1; \ if (${DIFF} output/ex2_2.out ex2_5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_5, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_5.tmp runex2_bjacobi: -@${MPIEXEC} -n 4 ./ex2 -pc_type bjacobi -pc_bjacobi_blocks 1 -ksp_monitor_short -sub_pc_type jacobi -sub_ksp_type gmres > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_bjacobi.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_bjacobi, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_bjacobi_2: -@${MPIEXEC} -n 4 ./ex2 -pc_type bjacobi -pc_bjacobi_blocks 2 -ksp_monitor_short -sub_pc_type jacobi -sub_ksp_type gmres 
-ksp_view > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_bjacobi_2.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_bjacobi_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_bjacobi_3: -@${MPIEXEC} -n 4 ./ex2 -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_monitor_short -sub_pc_type jacobi -sub_ksp_type gmres > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_bjacobi_3.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_bjacobi_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_chebyest_1: -@${MPIEXEC} -n 1 ./ex2 -m 80 -n 80 -ksp_pc_side right -pc_type ksp -ksp_ksp_type chebyshev -ksp_ksp_max_it 5 -ksp_ksp_chebyshev_esteig 0.9,0,0,1.1 -ksp_monitor_short > ex2.tmp 2>&1; \ ${DIFF} output/ex2_chebyest_1.out ex2.tmp || printf "${PWD}\nPossible problem with ex2_chebyest_1, diffs above\n=========================================\n"; \ ${RM} -f ex2.tmp runex2_chebyest_2: -@${MPIEXEC} -n 1 ./ex2 -m 80 -n 80 -ksp_pc_side right -pc_type ksp -ksp_ksp_type chebyshev -ksp_ksp_max_it 5 -ksp_ksp_chebyshev_esteig 0.9,0,0,1.1 -ksp_esteig_ksp_type cg -ksp_monitor_short > ex2.tmp 2>&1; \ ${DIFF} output/ex2_chebyest_2.out ex2.tmp || printf "${PWD}\nPossible problem with ex2_chebyest_2, diffs above\n=========================================\n"; \ ${RM} -f ex2.tmp runex2_umfpack: -@${MPIEXEC} -n 1 ./ex2 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package umfpack > ex2_umfpack.tmp 2>&1; \ if (${DIFF} output/ex2_umfpack.out ex2_umfpack.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_umfpack, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_umfpack.tmp runex2_mkl_pardiso_lu: -@${MPIEXEC} -n 1 ./ex2 -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mkl_pardiso > ex2_mkl_pardiso.tmp 2>&1; \ if (${DIFF} output/ex2_mkl_pardiso_lu.out ex2_mkl_pardiso.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_mkl_pardiso_lu, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_mkl_pardiso.tmp runex2_mkl_pardiso_cholesky: -@${MPIEXEC} -n 1 ./ex2 -ksp_type preonly -pc_type cholesky -mat_type sbaij -pc_factor_mat_solver_package mkl_pardiso > ex2_mkl_pardiso.tmp 2>&1; \ if (${DIFF} output/ex2_mkl_pardiso_cholesky.out ex2_mkl_pardiso.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_mkl_pardiso_cholesky, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2_mkl_pardiso.tmp runex2_fbcgs: -@${MPIEXEC} -n 1 ./ex2 -ksp_type fbcgs -pc_type ilu > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_fbcgs.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_fbcgs, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_fbcgs_2: -@${MPIEXEC} -n 3 ./ex2 -ksp_type fbcgsr -pc_type bjacobi > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_fbcgs_2.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_fbcgs_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_telescope: -@${MPIEXEC} -n 4 ./ex2 -m 100 -n 100 -ksp_converged_reason -pc_type telescope -pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi > ex2.tmp 2>&1; \ if (${DIFF} output/ex2_telescope.out ex2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2_telescope, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2.tmp runex2_pipecg: -@${MPIEXEC} -n 1 ./ex2 -ksp_monitor_short -ksp_type pipecg -m 9 -n 9 > 
ex2_pipecg.tmp 2>&1; \ ${DIFF} output/ex2_pipecg.out ex2_pipecg.tmp || printf "${PWD}\nPossible problem with ex2_pipecg, diffs above\n=========================================\n"; \ ${RM} -f ex2_pipecg.tmp runex2_pipecr: -@${MPIEXEC} -n 1 ./ex2 -ksp_monitor_short -ksp_type pipecr -m 9 -n 9 > ex2_pipecr.tmp 2>&1; \ ${DIFF} output/ex2_pipecr.out ex2_pipecr.tmp || printf "${PWD}\nPossible problem with ex2_pipecr, diffs above\n=========================================\n"; \ ${RM} -f ex2_pipecr.tmp runex2_groppcg: -@${MPIEXEC} -n 1 ./ex2 -ksp_monitor_short -ksp_type groppcg -m 9 -n 9 > ex2_groppcg.tmp 2>&1; \ ${DIFF} output/ex2_groppcg.out ex2_groppcg.tmp || printf "${PWD}\nPossible problem with ex2_groppcg, diffs above\n=========================================\n"; \ ${RM} -f ex2_groppcg.tmp runex2_pipecgrr: -@${MPIEXEC} -n 1 ./ex2 -ksp_monitor_short -ksp_type pipecgrr -m 9 -n 9 > ex2_pipecgrr.tmp 2>&1; \ ${DIFF} output/ex2_pipecgrr.out ex2_pipecgrr.tmp || printf "${PWD}\nPossible problem with ex2_pipecgrr, diffs above\n=========================================\n"; \ ${RM} -f ex2_pipecgrr.tmp runex2f: -@${MPIEXEC} -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex2f_1.tmp 2>&1; \ if (${DIFF} output/ex2f_1.out ex2f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2f_1.tmp runex2f_2: -@${MPIEXEC} -n 2 ./ex2f -pc_type jacobi -my_ksp_monitor -ksp_gmres_cgs_refinement_type refine_always > ex2f_2.tmp 2>&1; \ if (${DIFF} output/ex2f_2.out ex2f_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex2f_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex2f_2.tmp runex5: -@${MPIEXEC} -n 1 ./ex5 -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex5_1.tmp 2>&1; \ if (${DIFF} output/ex5_1.out ex5_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5_1.tmp runex5_2: -@${MPIEXEC} -n 2 ./ex5 -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always \ -ksp_rtol .000001 > ex5_2.tmp 2>&1; \ if (${DIFF} output/ex5_2.out ex5_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5_2.tmp runex5_5: -@${MPIEXEC} -n 2 ./ex5 -ksp_gmres_cgs_refinement_type refine_always > ex5_5.tmp 2>&1; \ if (${DIFF} output/ex5_5.out ex5_5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_5, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5_5.tmp runex5_redundant_0: -@${MPIEXEC} -n 1 ./ex5 -m 1000 -pc_type redundant -pc_redundant_number 1 -redundant_ksp_type gmres -redundant_pc_type jacobi > ex5.tmp 2>&1; \ if (${DIFF} output/ex5_redundant_0.out ex5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_redundant, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5.tmp runex5_redundant_1: -@${MPIEXEC} -n 5 ./ex5 -pc_type redundant -pc_redundant_number 1 -redundant_ksp_type gmres -redundant_pc_type jacobi > ex5.tmp 2>&1; \ if (${DIFF} output/ex5_redundant_1.out ex5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_redundant_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5.tmp runex5_redundant_2: -@${MPIEXEC} -n 5 ./ex5 -pc_type redundant -pc_redundant_number 3 -redundant_ksp_type gmres 
-redundant_pc_type jacobi > ex5.tmp 2>&1; \ if (${DIFF} output/ex5_redundant_2.out ex5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_redundant_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5.tmp runex5_redundant_3: -@${MPIEXEC} -n 5 ./ex5 -pc_type redundant -pc_redundant_number 5 -redundant_ksp_type gmres -redundant_pc_type jacobi > ex5.tmp 2>&1; \ if (${DIFF} output/ex5_redundant_3.out ex5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_redundant_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5.tmp runex5_redundant_4: -@${MPIEXEC} -n 5 ./ex5 -pc_type redundant -pc_redundant_number 3 -redundant_ksp_type gmres -redundant_pc_type jacobi -psubcomm_type interlaced > ex5.tmp 2>&1; \ if (${DIFF} output/ex5_redundant_4.out ex5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex5_redundant_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex5.tmp runex6: -@${MPIEXEC} -n 1 ./ex6 -ksp_view > ex6_0.tmp 2>&1; \ if (${DIFF} output/ex6_0.out ex6_0.tmp) then true; \ else printf "${PWD}\nPossible problem with ex6_0, diffs above\n=========================================\n"; fi; \ ${RM} -f ex6_0.tmp runex6_1: -@${MPIEXEC} -n 4 ./ex6 -ksp_view > ex6_1.tmp 2>&1; \ if (${DIFF} output/ex6_1.out ex6_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex6_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex6_1.tmp runex6_2: -@${MPIEXEC} -n 4 ./ex6 -user_subdomains -ksp_view ${ARGS} > ex6_2.tmp 2>&1; \ if (${DIFF} output/ex6_2.out ex6_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex6_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex6_2.tmp runex6f: -@${MPIEXEC} -n 1 ./ex6f -pc_type jacobi -mat_view -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex6f_1.tmp 2>&1; \ if (${DIFF} output/ex6f_1.out ex6f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex6f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex6f_1.tmp runex7: -@${MPIEXEC} -n 2 ./ex7 -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always> ex7_1.tmp 2>&1; \ if (${DIFF} output/ex7_1.out ex7_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex7_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_1.tmp runex7_2: -@${MPIEXEC} -n 2 ./ex7 -ksp_view > ex7_2.tmp 2>&1; \ if (${DIFF} output/ex7_2.out ex7_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex7_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_2.tmp runex7_mpiaijcusp: -@${MPIEXEC} -n 1 ./ex7 -ksp_monitor_short -mat_type mpiaijcusp -sub_pc_factor_mat_solver_package cusparse -vec_type mpicusp > ex7_mpiaijcusp.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusp.out ex7_mpiaijcusp.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusp, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusp.tmp runex7_mpiaijcusp_2: -@${MPIEXEC} -n 2 ./ex7 -ksp_monitor_short -mat_type mpiaijcusp -sub_pc_factor_mat_solver_package cusparse -vec_type mpicusp > ex7_mpiaijcusp_2.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusp_2.out ex7_mpiaijcusp_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusp_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusp_2.tmp runex7_mpiaijcusp_simple: -@${MPIEXEC} -n 1 ./ex7 -ksp_monitor_short -mat_type 
mpiaijcusp -sub_pc_factor_mat_solver_package cusparse -vec_type mpicusp -sub_ksp_type preonly -sub_pc_type ilu > ex7_mpiaijcusp_simple.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusp_simple.out ex7_mpiaijcusp_simple.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusp_simple, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusp_simple.tmp runex7_mpiaijcusp_simple_2: -@${MPIEXEC} -n 2 ./ex7 -ksp_monitor_short -mat_type mpiaijcusp -sub_pc_factor_mat_solver_package cusparse -vec_type mpicusp -sub_ksp_type preonly -sub_pc_type ilu > ex7_mpiaijcusp_simple_2.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusp_simple_2.out ex7_mpiaijcusp_simple_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusp_simple_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusp_simple_2.tmp runex7_mpiaijcusparse: -@${MPIEXEC} -n 1 ./ex7 -ksp_monitor_short -mat_type mpiaijcusparse -sub_pc_factor_mat_solver_package cusparse -vec_type mpicuda > ex7_mpiaijcusparse.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusparse.out ex7_mpiaijcusparse.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusparse, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusparse.tmp runex7_mpiaijcusparse_2: -@${MPIEXEC} -n 2 ./ex7 -ksp_monitor_short -mat_type mpiaijcusparse -sub_pc_factor_mat_solver_package cusparse -vec_type mpicuda > ex7_mpiaijcusparse_2.tmp 2>&1; \ if (${DIFF} output/ex7_mpiaijcusparse_2.out ex7_mpiaijcusparse_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex7_mpiaijcusparse_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex7_mpiaijcusparse_2.tmp NP = 1 M = 4 N = 5 MDOMAINS = 2 NDOMAINS = 1 OVERLAP=1 runex8: -@${MPIEXEC} -n ${NP} ./ex8 -m $M -n $N -user_set_subdomains -Mdomains ${MDOMAINS} -Ndomains ${NDOMAINS} -overlap ${OVERLAP} -print_error ${ARGS} runex8_1: -@${MPIEXEC} -n 1 ./ex8 -print_error > ex8_1.tmp 2>&1; \ if (${DIFF} output/ex8_1.out ex8_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex8_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex8_1.tmp runex9: -@${MPIEXEC} -n 1 ./ex9 -t 2 -pc_type jacobi -ksp_monitor_short -ksp_type gmres -ksp_gmres_cgs_refinement_type refine_always \ -s2_ksp_type bcgs -s2_pc_type jacobi -s2_ksp_monitor_short \ > ex9_1.tmp 2>&1; \ if (${DIFF} output/ex9_1.out ex9_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex9_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex9_1.tmp runex10: -@${MPIEXEC} -n 2 ./ex10 -f0 ${PETSC_DIR}/share/petsc/datafiles/matrices/spd-real-int${PETSC_INDEX_SIZE}-float${PETSC_SCALAR_SIZE} > ex10_1.tmp 2>&1; \ if (${DIFF} output/ex10_1.out ex10_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_1.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below runex10_2: -@${MPIEXEC} -n 2 ./ex10 -ksp_type bicg \ -f0 ${DATAFILESPATH}/matrices/medium > ex10_2.tmp 2>&1; \ if (${DIFF} output/ex10_2.out ex10_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_2.tmp runex10_3: -@${MPIEXEC} -n 2 ./ex10 -ksp_type bicg -pc_type asm \ -f0 ${DATAFILESPATH}/matrices/medium > ex10_3.tmp 2>&1; \ if (${DIFF} output/ex10_3.out 
ex10_3.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_3.tmp runex10_4: -@${MPIEXEC} -n 1 ./ex10 -ksp_type bicg -pc_type lu \ -f0 ${DATAFILESPATH}/matrices/medium > ex10_4.tmp 2>&1; \ if (${DIFF} output/ex10_4.out ex10_4.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_4.tmp runex10_5: -@${MPIEXEC} -n 1 ./ex10 -ksp_type bicg \ -f0 ${DATAFILESPATH}/matrices/medium > ex10_5.tmp 2>&1; \ if (${DIFF} output/ex10_5.out ex10_5.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_5, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_5.tmp runex10_6: -@${MPIEXEC} -n 1 ./ex10 -pc_factor_levels 2 -pc_factor_fill 1.73 -ksp_gmres_cgs_refinement_type refine_always \ -f0 ${DATAFILESPATH}/matrices/fem1 > ex10_6.tmp 2>&1; \ if (${DIFF} output/ex10_6.out ex10_6.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_6, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_6.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below BS = 2 3 4 5 6 7 8 runex10_7: - at touch ex10_7.tmp - at for bs in ${BS}; do \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_max_it 100 -ksp_gmres_cgs_refinement_type refine_always -ksp_rtol \ 1.0e-15 -ksp_monitor_short >> ex10_7.tmp 2>&1 ; \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_max_it 100 -ksp_gmres_cgs_refinement_type refine_always -ksp_rtol \ 1.0e-15 -ksp_monitor_short -pc_factor_mat_ordering_type nd >> ex10_7.tmp 2>&1 ; \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_max_it 100 -ksp_gmres_cgs_refinement_type refine_always -ksp_rtol \ 1.0e-15 -ksp_monitor_short -pc_factor_levels 1 >> ex10_7.tmp 2>&1 ; \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_type preonly \ -pc_type lu >> ex10_7.tmp 2>&1 ; \ done; - at if (${DIFF} output/ex10_7.out ex10_7.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_7, diffs above\n=========================================\n"; fi; -@${RM} -f ex10_7.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below runex10_8: -@${MPIEXEC} -n 1 ./ex10 -ksp_diagonal_scale -pc_type eisenstat -ksp_monitor_short -ksp_diagonal_scale_fix \ -f0 ${DATAFILESPATH}/matrices/medium -ksp_gmres_cgs_refinement_type refine_always -mat_no_inode > ex10_8.tmp 2>&1; \ if (${DIFF} output/ex10_8.out ex10_8.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_8, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_8.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below runex10_9: - at touch ex10_9.tmp - at for type in gmres; do \ for bs in 1 2 3 4 5 6 7; do \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_max_it 100 -ksp_gmres_cgs_refinement_type refine_always -ksp_rtol \ 1.0e-15 -ksp_monitor_short >> ex10_9.tmp 2>&1 ; \ ${MPIEXEC} -n 1 ./ex10 
-f0 ${DATAFILESPATH}/matrices/medium -ksp_gmres_cgs_refinement_type refine_always -viewer_binary_skip_info \ -mat_type seqbaij -matload_block_size $$bs -ksp_max_it 100 -ksp_rtol \ 1.0e-15 -ksp_monitor_short -trans >> ex10_9.tmp 2>&1 ; \ for np in 2 3; do \ ${MPIEXEC} -n $$np ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -viewer_binary_skip_info \ -mat_type mpibaij -matload_block_size $$bs -ksp_max_it 100 -ksp_gmres_cgs_refinement_type refine_always -ksp_rtol \ 1.0e-15 -ksp_monitor_short >> ex10_9.tmp 2>&1 ; \ ${MPIEXEC} -n $$np ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -ksp_gmres_cgs_refinement_type refine_always -viewer_binary_skip_info \ -mat_type mpibaij -matload_block_size $$bs -ksp_max_it 100 -ksp_rtol \ 1.0e-15 -ksp_monitor_short -trans >> ex10_9.tmp 2>&1 ; \ done; done; done; - at if (${DIFF} output/ex10_9.out ex10_9.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_9, diffs above\n=========================================\n"; fi; -@${RM} -f ex10_9.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below runex10_10: -@${MPIEXEC} -n 2 ./ex10 -ksp_type fgmres -pc_type ksp \ -f0 ${DATAFILESPATH}/matrices/medium -ksp_fgmres_modifypcksp -ksp_monitor_short> ex10_10.tmp 2>&1; \ if (${DIFF} output/ex10_10.out ex10_10.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_10, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_10.tmp runex10_11: -@${MPIEXEC} -n 2 ./ex10 -f0 http://ftp.mcs.anl.gov/pub/petsc/matrices/testmatrix.gz > ex10_11.tmp 2>&1;\ if (${DIFF} output/ex10_11.out ex10_11.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_11, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_11.tmp runex10_12: -@${MPIEXEC} -n 1 ./ex10 -pc_type lu -pc_factor_mat_solver_package matlab -f0 ${DATAFILESPATH}/matrices/arco1 > ex10_12.tmp 2>&1;\ if (${DIFF} output/ex10_12.out ex10_12.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_12, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_12.tmp runex10_13: -@${MPIEXEC} -n 1 ./ex10 -mat_type lusol -pc_type lu -f0 ${DATAFILESPATH}/matrices/arco1 > ex10_13.tmp 2>&1;\ if (${DIFF} output/ex10_13.out ex10_13.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_13, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_13.tmp runex10_14: -@${MPIEXEC} -n 3 ./ex10 -pc_type spai -f0 ${DATAFILESPATH}/matrices/medium > ex10_14.tmp 2>&1; \ ${DIFF} output/ex10_14.out ex10_14.tmp || printf "${PWD}\nPossible problem with ex10_14, diffs above\n=========================================\n"; \ ${RM} -f ex10_14.tmp runex10_15: -@${MPIEXEC} -n 3 ./ex10 -pc_type hypre -pc_hypre_type pilut -f0 ${DATAFILESPATH}/matrices/medium > ex10_15.tmp 2>&1;\ if (${DIFF} output/ex10_15.out ex10_15.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_15, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_15.tmp runex10_16: -@${MPIEXEC} -n 3 ./ex10 -pc_type hypre -pc_hypre_type parasails -f0 ${DATAFILESPATH}/matrices/medium > ex10_16.tmp 2>&1;\ if (${DIFF} output/ex10_16.out ex10_16.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_16, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_16.tmp runex10_17: -@${MPIEXEC} -n 3 ./ex10 -pc_type hypre -pc_hypre_type boomeramg -f0 ${DATAFILESPATH}/matrices/medium > ex10_17.tmp 2>&1;\ if (${DIFF} output/ex10_17.out ex10_17.tmp) then 
true; \ else printf "${PWD}\nPossible problem with ex10_17, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_17.tmp runex10_boomeramg_schwarz: -@${MPIEXEC} -n 2 ./ex10 -ksp_monitor_short -ksp_rtol 1.E-9 -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_smooth_type Schwarz-smoothers -f0 ${DATAFILESPATH}/matrices/poisson2.gz > ex10_boomeramg_schwarz.tmp 2>&1;\ if (${DIFF} output/ex10_boomeramg_schwarz.out ex10_boomeramg_schwarz.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_boomeramg_schwarz, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_boomeramg_schwarz.tmp runex10_boomeramg_pilut: -@${MPIEXEC} -n 2 ./ex10 -ksp_monitor_short -ksp_rtol 1.E-9 -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_smooth_type Pilut -pc_hypre_boomeramg_smooth_num_levels 2 -f0 ${DATAFILESPATH}/matrices/poisson2.gz > ex10_boomeramg_pilut.tmp 2>&1;\ if (${DIFF} output/ex10_boomeramg_pilut.out ex10_boomeramg_pilut.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_boomeramg_pilut, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_boomeramg_pilut.tmp runex10_boomeramg_parasails: -@${MPIEXEC} -n 2 ./ex10 -ksp_monitor_short -ksp_rtol 1.E-9 -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_smooth_type ParaSails -pc_hypre_boomeramg_smooth_num_levels 2 -f0 ${DATAFILESPATH}/matrices/poisson2.gz > ex10_boomeramg_parasails.tmp 2>&1;\ if (${DIFF} output/ex10_boomeramg_parasails.out ex10_boomeramg_parasails.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_boomeramg_parasails, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_boomeramg_parasails.tmp # Euclid has a bug in its handling of MPI communicators resulting in some memory not being freed at conclusion of the run runex10_boomeramg_euclid: -@${MPIEXEC} -n 2 ./ex10 -ksp_monitor_short -ksp_rtol 1.E-9 -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_smooth_type Euclid -pc_hypre_boomeramg_smooth_num_levels 2 -pc_hypre_boomeramg_eu_level 1 -pc_hypre_boomeramg_eu_droptolerance 0.01 -f0 ${DATAFILESPATH}/matrices/poisson2.gz > ex10_boomeramg_euclid.tmp 2>&1;\ if (${DIFF} output/ex10_boomeramg_euclid.out ex10_boomeramg_euclid.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_boomeramg_euclid, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_boomeramg_euclid.tmp runex10_boomeramg_euclid_bj: -@${MPIEXEC} -n 2 ./ex10 -ksp_monitor_short -ksp_rtol 1.E-9 -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_smooth_type Euclid -pc_hypre_boomeramg_smooth_num_levels 2 -pc_hypre_boomeramg_eu_level 1 -pc_hypre_boomeramg_eu_droptolerance 0.01 -pc_hypre_boomeramg_eu_bj -f0 ${DATAFILESPATH}/matrices/poisson2.gz > ex10_boomeramg_euclid_bj.tmp 2>&1;\ if (${DIFF} output/ex10_boomeramg_euclid_bj.out ex10_boomeramg_euclid_bj.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_boomeramg_euclid_bj, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_boomeramg_euclid_bj.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below LEVELS = 0 2 4 runex10_19: - at touch ex10_19aij.tmp - at touch ex10_19sbaij.tmp - at for levels in ${LEVELS}; do \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/poisson1 -ksp_type cg -pc_type icc -pc_factor_levels $$levels >> ex10_19aij.tmp 2>&1; \ ${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/poisson1 -ksp_type 
cg -pc_type icc -pc_factor_levels $$levels -mat_type seqsbaij >> ex10_19sbaij.tmp 2>&1; \ done; - at if (${DIFF} ex10_19aij.tmp ex10_19sbaij.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_19, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_19aij.tmp ex10_19sbaij.tmp # See http://www.mcs.anl.gov/petsc/documentation/faq.html#datafiles for how to obtain the datafiles used below runex10_superlu_lu_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu -num_numfac 2 -num_rhs 2 > ex10_superlu_lu_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_superlu_lu_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_superlu_lu_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_superlu_lu_1.tmp runex10_superlu_dist_lu_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -num_numfac 2 -num_rhs 2 > ex10_superlu_lu_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_superlu_lu_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_superlu_lu_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_superlu_lu_2.tmp runex10_superlu_dist_lu_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -num_numfac 2 -num_rhs 2 > ex10_superlu_lu_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_superlu_lu_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_superlu_lu_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_superlu_lu_2.tmp runex10_umfpack: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type seqaij -pc_factor_mat_solver_package umfpack -num_numfac 2 -num_rhs 2 > ex10_umfpack.tmp 2>&1; \ if (${DIFF} output/ex10_umfpack.out ex10_umfpack.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_umfpack, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_umfpack.tmp runex10_mumps_lu_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type seqaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_lu_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_1.tmp runex10_mumps_lu_metis: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type aij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -mat_mumps_icntl_7 5 > ex10_mumps_lu_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_metis, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_1.tmp runex10_mumps_lu_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type mpiaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_lu_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_2.tmp 
runex10_mumps_lu_parmetis: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type mpiaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 > ex10_mumps_lu_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_parmetis, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_2.tmp runex10_mumps_lu_3: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type seqbaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -matload_block_size 2 > ex10_mumps_lu_3.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_3.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_3.tmp runex10_mumps_lu_4: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type mpibaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -matload_block_size 2 > ex10_mumps_lu_4.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_lu_4.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_lu_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_lu_4.tmp runex10_mumps_cholesky_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type sbaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -mat_ignore_lower_triangular > ex10_mumps_cholesky_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_1.tmp runex10_mumps_cholesky_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type sbaij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 -mat_ignore_lower_triangular > ex10_mumps_cholesky_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_2.tmp runex10_mumps_cholesky_3: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type aij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_cholesky_3.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_3.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_3.tmp runex10_mumps_cholesky_4: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type aij -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_cholesky_4.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_4.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_4.tmp runex10_mumps_cholesky_spd_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type aij -matload_spd 
-pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_cholesky_spd_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_spd_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_spd_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_spd_1.tmp runex10_mumps_cholesky_spd_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type aij -matload_spd -pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 > ex10_mumps_cholesky_spd_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_mumps_cholesky_spd_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_cholesky_spd_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_cholesky_spd_2.tmp NSUBCOMM = 8 7 6 5 4 3 2 1 runex10_mumps_redundant: - at touch ex10_mumps_redundant.tmp - at for nsubcomm in ${NSUBCOMM}; do \ ${MPIEXEC} -n 8 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -ksp_type preonly -pc_type redundant -pc_redundant_number $$nsubcomm -redundant_pc_factor_mat_solver_package mumps -num_numfac 2 -num_rhs 2 >> ex10_mumps_redundant.tmp 2>&1; \ done; - at if (${DIFF} output/ex10_mumps_redundant.out ex10_mumps_redundant.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mumps_redundant, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mumps_redundant.tmp; runex10_pastix_lu_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type seqaij -pc_factor_mat_solver_package pastix -num_numfac 2 -num_rhs 2 > ex10_pastix_lu_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_pastix_lu_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_pastix_lu_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_pastix_lu_1.tmp runex10_pastix_lu_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type lu -mat_type mpiaij -pc_factor_mat_solver_package pastix -num_numfac 2 -num_rhs 2 > ex10_pastix_lu_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_pastix_lu_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_pastix_lu_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_pastix_lu_2.tmp runex10_pastix_cholesky_1: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type sbaij -pc_factor_mat_solver_package pastix -num_numfac 2 -num_rhs 2 -mat_ignore_lower_triangular > ex10_pastix_cholesky_1.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_pastix_cholesky_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_pastix_cholesky_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_pastix_cholesky_1.tmp runex10_pastix_cholesky_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_type preonly -pc_type cholesky -mat_type sbaij -pc_factor_mat_solver_package pastix -num_numfac 2 -num_rhs 2 -mat_ignore_lower_triangular > ex10_pastix_cholesky_2.tmp 2>&1; \ if (${DIFF} output/ex10_mumps.out ex10_pastix_cholesky_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_pastix_cholesky_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_pastix_cholesky_2.tmp NSUBCOMM = 8 7 6 5 4 3 2 1 runex10_pastix_redundant: - at touch ex10_pastix_redundant.tmp - at for nsubcomm in ${NSUBCOMM}; do \ ${MPIEXEC} -n 8 ./ex10 -f0 
${DATAFILESPATH}/matrices/medium -ksp_type preonly -pc_type redundant -pc_redundant_number $$nsubcomm -redundant_pc_factor_mat_solver_package pastix -num_numfac 2 -num_rhs 2 >> ex10_pastix_redundant.tmp 2>&1; \ done; - at if (${DIFF} output/ex10_mumps_redundant.out ex10_pastix_redundant.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_pastix_redundant, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_pastix_redundant.tmp; runex10_superlu_dist_redundant: - at touch ex10_superlu_dist_redundant.tmp - at for nsubcomm in ${NSUBCOMM}; do \ ${MPIEXEC} -n 8 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -ksp_type preonly -pc_type redundant -pc_redundant_number $$nsubcomm -redundant_pc_factor_mat_solver_package superlu_dist -num_numfac 2 -num_rhs 2 >> ex10_superlu_dist_redundant.tmp 2>&1; \ done; - at if (${DIFF} output/ex10_mumps_redundant.out ex10_superlu_dist_redundant.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_superlu_dist_redundant, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_superlu_dist_redundant.tmp; runex10_ILU: # test ilu fill greater than zero -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -pc_factor_levels 1 > ex10_20.tmp 2>&1; \ if (${DIFF} output/ex10_ILU.out ex10_20.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_ILU, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_20.tmp runex10_ILUBAIJ: # test ilu fill greater than zero -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -pc_factor_levels 1 -mat_type baij > ex10_20.tmp 2>&1; \ if (${DIFF} output/ex10_ILU.out ex10_20.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_ILU, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_20.tmp runex10_cg: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -mat_type mpisbaij -ksp_type cg -pc_type eisenstat -ksp_monitor_short -ksp_converged_reason > ex10_20.tmp 2>&1; \ if (${DIFF} output/ex10_cg_singlereduction.out ex10_20.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_cg, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_20.tmp runex10_cg_singlereduction: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -mat_type mpisbaij -ksp_type cg -pc_type eisenstat -ksp_monitor_short -ksp_converged_reason -ksp_cg_single_reduction > ex10_20.tmp 2>&1; \ if (${DIFF} output/ex10_cg_singlereduction.out ex10_20.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_cg_singlereduction, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_20.tmp runex10_seqaijcrl: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type seqaijcrl > ex10_seqaijcrl.tmp 2>&1; \ if (${DIFF} output/ex10_seqcrl.out ex10_seqaijcrl.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_seqaijcrl, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_seqaijcrl.tmp runex10_mpiaijcrl: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_monitor_short -ksp_view -mat_type mpiaijcrl > ex10_mpiaijcrl.tmp 2>&1; \ if (${DIFF} output/ex10_mpiaij.out ex10_mpiaijcrl.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mpiaijcrl, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mpiaijcrl.tmp runex10_seqaijperm: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/small 
-ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type seqaijperm > ex10_seqaijperm.tmp 2>&1; \ if (${DIFF} output/ex10_seqcsrperm.out ex10_seqaijperm.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_seqaijperm, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_seqaijperm.tmp runex10_mpiaijperm: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -ksp_monitor_short -ksp_view -mat_type mpiaijperm > ex10_mpiaijperm.tmp 2>&1; \ if (${DIFF} output/ex10_mpicsrperm.out ex10_mpiaijperm.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_mpiaijperm, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_mpiaijperm.tmp runex10_aijcusparse: -@${MPIEXEC} -n 1 ./ex10 -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_package cusparse -pc_type ilu > ex10_aijcusparse.tmp 2>&1; \ if (${DIFF} output/ex10_aijcusparse.out ex10_aijcusparse.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_aijcusparse, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10_aijcusparse.tmp runex10_zeropivot: -@${MPIEXEC} -n 3 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -test_zeropivot -ksp_converged_reason -ksp_type fgmres -pc_type ksp -ksp_pc_type bjacobi > ex10.tmp 2>&1; \ if (${DIFF} output/ex10_zeropivot.out ex10.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_zeropivot, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10.tmp runex10_zeropivot_2: -@${MPIEXEC} -n 2 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -test_zeropivot -ksp_converged_reason -ksp_type fgmres -pc_type ksp -ksp_ksp_type cg -ksp_pc_type bjacobi -ksp_pc_bjacobi_blocks 1 > ex10.tmp 2>&1; \ if (${DIFF} output/ex10_zeropivot.out ex10.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_zeropivot_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10.tmp runex10_zeropivot_3: -@${MPIEXEC} -n 3 ./ex10 -f0 ${DATAFILESPATH}/matrices/small -test_zeropivot -ksp_converged_reason -ksp_type fgmres -pc_type ksp -ksp_ksp_converged_reason -ksp_pc_type bjacobi -ksp_sub_ksp_converged_reason > ex10.tmp 2>&1; \ if (${DIFF} output/ex10_zeropivot_3.out ex10.tmp) then true; \ else printf "${PWD}\nPossible problem with ex10_zeropivot_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex10.tmp runex11: -@${MPIEXEC} -n 1 ./ex11 -n 6 -norandom -pc_type none -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex11_1.tmp 2>&1; \ if (${DIFF} output/ex11_1.out ex11_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex11_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex11_1.tmp runex11f: -@${MPIEXEC} -n 1 ./ex11f -n 6 -norandom -pc_type none -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex11f_1.tmp 2>&1; \ if (${DIFF} output/ex11f_1.out ex11f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex11f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex11f_1.tmp runex12: -@${MPIEXEC} -n 1 ./ex12 -ksp_gmres_cgs_refinement_type refine_always > ex12_1.tmp 2>&1; \ if (${DIFF} output/ex12_1.out ex12_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex12_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex12_1.tmp runex13: -@${MPIEXEC} -n 1 ./ex13 -m 19 -n 20 -ksp_gmres_cgs_refinement_type refine_always > 
ex13_1.tmp 2>&1; \ if (${DIFF} output/ex13_1.out ex13_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex13_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex13_1.tmp runex13f90: -@${MPIEXEC} -n 1 ./ex13f90 -m 19 -n 20 -ksp_gmres_cgs_refinement_type refine_always > ex13f90_1.tmp 2>&1; \ if (${DIFF} output/ex13f90_1.out ex13f90_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex13f90_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex13f90_1.tmp runex14f: -@${MPIEXEC} -n 1 ./ex14f -no_output -ksp_gmres_cgs_refinement_type refine_always > ex14_1.tmp 2>&1; \ if (${DIFF} output/ex14_1.out ex14_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex14f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex14_1.tmp runex15: -@${MPIEXEC} -n 2 ./ex15 -ksp_view -user_defined_pc -ksp_gmres_cgs_refinement_type refine_always > ex15_1.tmp 2>&1; \ if (${DIFF} output/ex15_1.out ex15_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex15_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex15_1.tmp runex15f: -@${MPIEXEC} -n 2 ./ex15f -ksp_view -user_defined_pc -ksp_gmres_cgs_refinement_type refine_always > ex15f_1.tmp 2>&1; \ if (${DIFF} output/ex15f_1.out ex15f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex15f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex15f_1.tmp runex16: -@${MPIEXEC} -n 2 ./ex16 -ntimes 4 -ksp_gmres_cgs_refinement_type refine_always > ex16_1.tmp 2>&1; \ if (${DIFF} output/ex16_1.out ex16_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex16_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex16_1.tmp runex18: -@${MPIEXEC} -n 3 ./ex18 -m 39 -n 18 -ksp_monitor_short -permute nd > ex18_1.tmp 2>&1; \ ${DIFF} output/ex18_1.out ex18_1.tmp || printf "${PWD}\nPossible problem with ex18_1, diffs above\n=========================================\n"; \ ${RM} -f ex18_1.tmp runex18_2: -@${MPIEXEC} -n 3 ./ex18 -m 39 -n 18 -ksp_monitor_short -permute rcm > ex18_2.tmp 2>&1; \ ${DIFF} output/ex18_2.out ex18_2.tmp || printf "${PWD}\nPossible problem with ex18_2, diffs above\n=========================================\n"; \ ${RM} -f ex18_2.tmp runex18_3: -@${MPIEXEC} -n 3 ./ex18 -m 13 -n 17 -ksp_monitor_short -ksp_type cg -ksp_cg_single_reduction > ex18_3.tmp 2>&1; \ ${DIFF} output/ex18_3.out ex18_3.tmp || printf "${PWD}\nPossible problem with ex18_3, diffs above\n=========================================\n"; \ ${RM} -f ex18_3.tmp runex21f: -@${MPIEXEC} -n 1 ./ex21f > ex21f_1.tmp 2>&1; \ if (${DIFF} output/ex21f_1.out ex21f_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex21f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex21f_1.tmp runex22f: -@${MPIEXEC} -n 1 ./ex22f -pc_mg_type full -ksp_monitor_short -mg_levels_ksp_monitor_short -mg_levels_ksp_norm_type preconditioned -pc_type mg -da_refine 2 -ksp_type fgmres > ex22_1.tmp 2>&1; \ if (${DIFF} output/ex22_1.out ex22_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex22f_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex22_1.tmp runex23: -@${MPIEXEC} -n 1 ./ex23 -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex23_1.tmp 2>&1; \ if (${DIFF} output/ex23_1.out ex23_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex23_1, diffs above\n=========================================\n"; fi; \ ${RM} 
-f ex23_1.tmp runex23_2: -@${MPIEXEC} -n 3 ./ex23 -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > ex23_2.tmp 2>&1; \ if (${DIFF} output/ex23_2.out ex23_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex23_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex23_2.tmp runex23_3 : -@${MPIEXEC} -n 2 ./ex23 -ksp_monitor_short -ksp_rtol 1e-6 -ksp_type pipefgmres > ex23_3.tmp 2>&1; \ if (${DIFF} output/ex23_3.out ex23_3.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex23_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex23_3.tmp runex25: -@${MPIEXEC} -n 1 ./ex25 -pc_type mg -ksp_type fgmres -da_refine 2 -ksp_monitor_short -mg_levels_ksp_monitor_short -mg_levels_ksp_norm_type unpreconditioned -ksp_view -pc_mg_type full > ex25_1.tmp 2>&1; \ if (${DIFF} output/ex25_1.out ex25_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex25_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex25_1.tmp runex25_2: -@${MPIEXEC} -n 2 ./ex25 -pc_type mg -ksp_type fgmres -da_refine 2 -ksp_monitor_short -mg_levels_ksp_monitor_short -mg_levels_ksp_norm_type unpreconditioned -ksp_view -pc_mg_type full > ex25_2.tmp 2>&1; \ if (${DIFF} output/ex25_2.out ex25_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex25_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex25_2.tmp runex27: -@${MPIEXEC} -n 1 ./ex27 -f ${DATAFILESPATH}/matrices/medium -ksp_view -ksp_monitor_short -ksp_max_it 100 > ex27_1.tmp 2>&1; \ if (${DIFF} output/ex27_1.out ex27_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex27_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex27_1.tmp runex28: -@${MPIEXEC} -n 1 ./ex28 -ksp_monitor_short -pc_type mg -pc_mg_type full -ksp_type fgmres -da_refine 2 -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 1 -mg_levels_pc_type ilu > ex28_1.tmp 2>&1; \ if (${DIFF} output/ex28_1.out ex28_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex28_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex28_1.tmp runex29: -@${MPIEXEC} -n 1 ./ex29 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -da_refine 8 > ex29_1.tmp 2>&1; \ if (${DIFF} output/ex29_1.out ex29_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex29_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex29_1.tmp runex29_2: -@${MPIEXEC} -n 1 ./ex29 -bc_type neumann -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -da_refine 8 -mg_coarse_pc_factor_shift_type nonzero > ex29_2.tmp 2>&1; \ if (${DIFF} output/ex29_2.out ex29_2.tmp) then true; \ else printf "${PWD}\nPossible problem with ex29_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex29_2.tmp runex29_telescope: -@${MPIEXEC} -n 4 ./ex29 -ksp_monitor_short -da_grid_x 257 -da_grid_y 257 -pc_type mg -pc_mg_galerkin -pc_mg_levels 4 -ksp_type richardson -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_ignore_kspcomputeoperators -mg_coarse_telescope_pc_type mg -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_mg_levels_ksp_type chebyshev -mg_coarse_telescope_mg_levels_pc_type jacobi -mg_coarse_pc_telescope_reduction_factor 4 > ex29_telescope.tmp 2>&1; \ if (${DIFF} output/ex29_telescope.out ex29_telescope.tmp) then true; \ else printf "${PWD}\nPossible problem 
with ex29_telescope, diffs above\n=========================================\n"; fi; \ ${RM} -f ex29_telescope.tmp runex30: -@${MPIEXEC} -n 1 ./ex30 > ex30_1.tmp 2>&1; \ if (${DIFF} output/ex30_1.out ex30_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex30_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex30_1.tmp runex32: -@${MPIEXEC} -n 1 ./ex32 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero > ex32_1.tmp 2>&1; \ if (${DIFF} output/ex32_1.out ex32_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex32_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex32_1.tmp runex34: -@${MPIEXEC} -n 1 ./ex34 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -pc_mg_levels 3 -mg_coarse_pc_factor_shift_type nonzero -ksp_view > ex34_1.tmp 2>&1; \ if (${DIFF} output/ex34_1.out ex34_1.tmp) then true; \ else printf "${PWD}\nPossible problem with ex34_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex34_1.tmp runex34_2 : -@${MPIEXEC} -n 2 ./ex34 -ksp_monitor_short -da_grid_x 50 -da_grid_y 50 -pc_type ksp -ksp_ksp_type cg -ksp_pc_type bjacobi -ksp_ksp_rtol 1e-1 -ksp_ksp_monitor -ksp_type pipefgmres -ksp_gmres_restart 5 > ex34_2.tmp 2>&1; \ if (${DIFF} output/ex34_2.out ex34_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex34_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex34_2.tmp runex35: -@${MPIEXEC} -n 1 ./ex35 -bc_type dirichlet -nu .01 -n 10 -ksp_type cg -pc_type sor -ksp_converged_reason > ex35_1.tmp 2>&1;\ if (${DIFF} output/ex35_1.out ex35_1.tmp) then true ; \ else echo ${PWD} ; echo "Possible problem with runex35, diffs above \n========================================="; fi ;\ ${RM} -f ex35_1.tmp runex35_2: -@${MPIEXEC} -n 2 ./ex35 -bc_type dirichlet -nu .01 -n 20 -ksp_type cg -pc_type sor -ksp_converged_reason > ex35_2.tmp 2>&1;\ if (${DIFF} output/ex35_2.out ex35_2.tmp) then true ; \ else echo ${PWD} ; echo "Possible problem with runex35_2, diffs above \n========================================="; fi ;\ ${RM} -f ex35_2.tmp runex43: -@${MPIEXEC} -n 1 ./ex43 -stokes_ksp_type fgmres -stokes_pc_type fieldsplit -stokes_pc_fieldsplit_block_size 3 -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE -stokes_pc_fieldsplit_0_fields 0,1 -stokes_pc_fieldsplit_1_fields 2 -stokes_fieldsplit_0_ksp_type preonly -stokes_fieldsplit_0_pc_type lu -stokes_fieldsplit_1_ksp_type preonly -stokes_fieldsplit_1_pc_type jacobi -c_str 0 -solcx_eta0 1.0 -solcx_eta1 1.0e6 -solcx_xc 0.5 -solcx_nz 2 -mx 20 -my 20 -stokes_ksp_monitor_short > ex43_1.tmp 2>&1; \ ${DIFF} output/ex43_1.out ex43_1.tmp || printf "${PWD}\nPossible problem with ex43_1, diffs above\n=========================================\n"; \ ${RM} -f ex43_1.tmp runex43_2: -@${MPIEXEC} -n 1 ./ex43 -stokes_ksp_type fgmres -stokes_pc_type fieldsplit -stokes_pc_fieldsplit_block_size 3 -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE -stokes_fieldsplit_u_ksp_type preonly -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_ksp_type preonly -stokes_fieldsplit_p_pc_type jacobi -c_str 0 -solcx_eta0 1.0 -solcx_eta1 1.0e6 -solcx_xc 0.5 -solcx_nz 2 -mx 20 -my 20 -stokes_ksp_monitor_short > ex43_2.tmp 2>&1; \ ${DIFF} output/ex43_1.out ex43_2.tmp || printf "${PWD}\nPossible problem with ex43_2, diffs above\n=========================================\n"; \ ${RM} -f ex43_2.tmp runex43_3: -@${MPIEXEC} -n 4 ./ex43 -stokes_ksp_type gcr 
-stokes_ksp_gcr_restart 60 -stokes_ksp_norm_type unpreconditioned -stokes_ksp_rtol 1.e-2 -c_str 3 -sinker_eta0 1.0 -sinker_eta1 100 -sinker_dx 0.4 -sinker_dy 0.3 -mx 128 -my 128 -stokes_ksp_monitor_short -stokes_pc_type mg -stokes_mg_levels_pc_type fieldsplit -stokes_pc_mg_galerkin -stokes_mg_levels_pc_fieldsplit_block_size 3 -stokes_mg_levels_pc_fieldsplit_0_fields 0,1 -stokes_mg_levels_pc_fieldsplit_1_fields 2 -stokes_mg_levels_fieldsplit_0_pc_type sor -stokes_mg_levels_fieldsplit_1_pc_type sor -stokes_mg_levels_ksp_type chebyshev -stokes_mg_levels_ksp_max_it 1 -stokes_mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.1 -stokes_pc_mg_levels 4 -stokes_ksp_view > ex43_3.tmp 2>&1; \ ${DIFF} output/ex43_3.out ex43_3.tmp || printf "${PWD}\nPossible problem with ex43_3, diffs above\n=========================================\n"; \ ${RM} -f ex43_3.tmp runex43_bjacobi: -@${MPIEXEC} -n 4 ./ex43 -stokes_ksp_rtol 1.e-4 -stokes_pc_type bjacobi -stokes_pc_bjacobi_blocks 2 -mat_type aij -stokes_ksp_converged_reason > ex43.tmp 2>&1; \ ${DIFF} output/ex43_bjacobi.out ex43.tmp || printf "${PWD}\nPossible problem with ex43_bjacobi, diffs above\n=========================================\n"; \ ${RM} -f ex43.tmp runex43_bjacobi_baij: -@${MPIEXEC} -n 4 ./ex43 -stokes_ksp_rtol 1.e-4 -stokes_pc_type bjacobi -stokes_pc_bjacobi_blocks 2 -mat_type baij -stokes_ksp_converged_reason > ex43.tmp 2>&1; \ ${DIFF} output/ex43_bjacobi.out ex43.tmp || printf "${PWD}\nPossible problem with ex43_bjacobi_baij, diffs above\n=========================================\n"; \ ${RM} -f ex43.tmp runex43_nested_gmg: -@${MPIEXEC} -n 4 ./ex43 -mx 16 -my 16 -stokes_ksp_type fgmres -stokes_pc_type fieldsplit -stokes_fieldsplit_u_pc_type mg -stokes_fieldsplit_u_pc_mg_levels 5 -stokes_fieldsplit_u_pc_mg_galerkin -stokes_fieldsplit_u_ksp_type cg -stokes_fieldsplit_u_ksp_rtol 1.0e-4 -stokes_fieldsplit_u_mg_levels_pc_type jacobi -solcx_eta0 1.0e4 -stokes_fieldsplit_u_ksp_converged_reason -stokes_ksp_converged_reason -stokes_fieldsplit_p_sub_pc_factor_zeropivot 1.e-8 > ex43.tmp 2>&1; \ ${DIFF} output/ex43_nested_gmg.out ex43.tmp || printf "${PWD}\nPossible problem with ex43_nested_gmg, diffs above\n=========================================\n"; \ ${RM} -f ex43.tmp runex43_4: -@${MPIEXEC} -n 4 ./ex43 -stokes_ksp_type pipegcr -stokes_ksp_pipegcr_mmax 60 -stokes_ksp_pipegcr_unroll_w 1 -stokes_ksp_norm_type natural -c_str 3 -sinker_eta0 1.0 -sinker_eta1 100 -sinker_dx 0.4 -sinker_dy 0.3 -mx 128 -my 128 -stokes_ksp_monitor_short -stokes_pc_type mg -stokes_mg_levels_pc_type fieldsplit -stokes_pc_mg_galerkin -stokes_mg_levels_pc_fieldsplit_block_size 3 -stokes_mg_levels_pc_fieldsplit_0_fields 0,1 -stokes_mg_levels_pc_fieldsplit_1_fields 2 -stokes_mg_levels_fieldsplit_0_pc_type sor -stokes_mg_levels_fieldsplit_1_pc_type sor -stokes_mg_levels_ksp_type chebyshev -stokes_mg_levels_ksp_max_it 1 -stokes_mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.1 -stokes_pc_mg_levels 4 -stokes_ksp_view > ex43_4.tmp 2>&1; \ ${DIFF} output/ex43_4.out ex43_4.tmp || printf "${PWD}\nPossible problem with ex43_4, diffs above\n=========================================\n"; \ ${RM} -f ex43_4.tmp runex43_5: -@${MPIEXEC} -n 4 ./ex43 -stokes_ksp_type pipegcr -stokes_pc_type fieldsplit -stokes_pc_fieldsplit_block_size 3 -stokes_pc_fieldsplit_type SYMMETRIC_MULTIPLICATIVE -stokes_pc_fieldsplit_0_fields 0,1 -stokes_pc_fieldsplit_1_fields 2 -stokes_fieldsplit_0_ksp_type preonly -stokes_fieldsplit_0_pc_type bjacobi -stokes_fieldsplit_1_ksp_type preonly -stokes_fieldsplit_1_pc_type bjacobi -c_str 0 
-solcx_eta0 1.0 -solcx_eta1 1.0e6 -solcx_xc 0.5 -solcx_nz 2 -mx 20 -my 20 -stokes_ksp_monitor_short -stokes_ksp_view > ex43_5.tmp 2>&1; \ ${DIFF} output/ex43_5.out ex43_5.tmp || printf "${PWD}\nPossible problem with ex43_5, diffs above\n=========================================\n"; \ ${RM} -f ex43_5.tmp runex45: -@${MPIEXEC} -n 4 ./ex45 -pc_type exotic -ksp_monitor_short -ksp_type fgmres -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi > ex45_1.tmp 2>&1; \ ${DIFF} output/ex45_1.out ex45_1.tmp || printf "${PWD}\nPossible problem with ex45_1, diffs above\n=========================================\n"; \ ${RM} -f ex45_1.tmp runex45_2: -@${MPIEXEC} -n 4 ./ex45 -ksp_monitor_short -da_grid_x 21 -da_grid_y 21 -da_grid_z 21 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi > ex45_2.tmp 2>&1; \ ${DIFF} output/ex45_2.out ex45_2.tmp || printf "${PWD}\nPossible problem with ex45_2, diffs above\n=========================================\n"; \ ${RM} -f ex45_2.tmp runex45_telescope: -@${MPIEXEC} -n 4 ./ex45 -ksp_type fgmres -ksp_monitor_short -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -pc_mg_levels 2 -da_grid_x 65 -da_grid_y 65 -da_grid_z 65 -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_ignore_kspcomputeoperators -mg_coarse_pc_telescope_reduction_factor 4 -mg_coarse_telescope_pc_type mg -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_mg_levels_pc_type jacobi -mg_levels_ksp_type richardson -mg_coarse_telescope_mg_levels_ksp_type richardson -ksp_rtol 1.0e-4 > ex45_telescope.tmp 2>&1; \ ${DIFF} output/ex45_telescope.out ex45_telescope.tmp || printf "${PWD}\nPossible problem with ex45_telescope, diffs above\n=========================================\n"; \ ${RM} -f ex45_telescope.tmp runex45_telescope_2: -@${MPIEXEC} -n 4 ./ex45 -ksp_type fgmres -ksp_monitor_short -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -pc_mg_levels 2 -da_grid_x 65 -da_grid_y 65 -da_grid_z 65 -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_reduction_factor 2 -mg_coarse_telescope_pc_type mg -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_mg_levels_pc_type jacobi -mg_levels_ksp_type richardson -mg_coarse_telescope_mg_levels_ksp_type richardson -ksp_rtol 1.0e-4 > ex45_telescope_2.tmp 2>&1; \ ${DIFF} output/ex45_telescope_2.out ex45_telescope_2.tmp || printf "${PWD}\nPossible problem with ex45_telescope_2, diffs above\n=========================================\n"; \ ${RM} -f ex45_telescope_2.tmp runex45f: -@${MPIEXEC} -n 4 ./ex45f -ksp_monitor_short -da_refine 5 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type chebyshev -mg_levels_ksp_max_it 2 -mg_levels_pc_type jacobi -ksp_pc_side right > ex45f_1.tmp 2>&1; \ ${DIFF} output/ex45f_1.out ex45f_1.tmp || printf "${PWD}\nPossible problem with ex45f_1, diffs above\n=========================================\n"; \ ${RM} -f ex45f_1.tmp runex46_aijcusp: -@${MPIEXEC} -n 1 ./ex46 -mat_type aijcusp -dm_vec_type cusp -random_exact_sol > ex46_aijcusp.tmp 2>& 1; \ ${DIFF} output/ex46_aijcusp.out ex46_aijcusp.tmp || printf "${PWD}\nPossible problem with ex46_aijcusp, diffs above\n=========================================\n"; \ ${RM} ex46_aijcusp.tmp runex46_aijcusparse: -@${MPIEXEC} -n 1 ./ex46 -mat_type aijcusparse -dm_vec_type cuda -random_exact_sol > ex46_aijcusparse.tmp 
2>& 1; \ ${DIFF} output/ex46_aijcusparse.out ex46_aijcusparse.tmp || printf "${PWD}\nPossible problem with ex46_aijcusparse, diffs above\n=========================================\n"; \ ${RM} ex46_aijcusparse.tmp runex49: -@${MPIEXEC} -n 1 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 3 -sponge_E0 1 -sponge_E1 1000 -sponge_nu0 0.4 -sponge_nu1 0.2 -sponge_t 1 -sponge_w 8 -elas_ksp_rtol 5e-3 -elas_ksp_view > ex49_1.tmp 2>&1; \ ${DIFF} output/ex49_1.out ex49_1.tmp || printf "${PWD}\nPossible problem with ex49_1, diffs above\n=========================================\n"; \ ${RM} -f ex49_1.tmp runex49_2: -@${MPIEXEC} -n 4 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 3 -sponge_E0 1 -sponge_E1 1000 -sponge_nu0 0.4 -sponge_nu1 0.2 -sponge_t 1 -sponge_w 8 -elas_ksp_type gcr -elas_pc_type asm -elas_sub_pc_type lu -elas_ksp_rtol 5e-3 > ex49_2.tmp 2>&1; \ ${DIFF} output/ex49_2.out ex49_2.tmp || printf "${PWD}\nPossible problem with ex49_2, diffs above\n=========================================\n"; \ ${RM} -f ex49_2.tmp runex49_3: -@${MPIEXEC} -n 4 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 2 -brick_E 1,10,1000,100 -brick_nu 0.4,0.2,0.3,0.1 -brick_span 3 -elas_pc_type asm -elas_sub_pc_type lu -elas_ksp_rtol 5e-3 > ex49_3.tmp 2>&1; \ ${DIFF} output/ex49_3.out ex49_3.tmp || printf "${PWD}\nPossible problem with ex49_3, diffs above\n=========================================\n"; \ ${RM} -f ex49_3.tmp runex49_4: -@${MPIEXEC} -n 4 ./ex49 -elas_ksp_monitor_short -elas_ksp_converged_reason -elas_ksp_type cg -elas_ksp_norm_type unpreconditioned -mx 40 -my 40 -c_str 2 -brick_E 1,1e-6,1e-2 -brick_nu .3,.2,.4 -brick_span 8 -elas_mg_levels_ksp_type chebyshev -elas_pc_type ml -elas_mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.1 -elas_mg_levels_pc_type pbjacobi -elas_mg_levels_ksp_max_it 2 -use_nonsymbc -elas_pc_ml_nullspace user > ex49_4.tmp 2>&1; \ ${DIFF} output/ex49_4.out ex49_4.tmp || printf "${PWD}\nPossible problem with ex49_4, diffs above\n=========================================\n"; \ ${RM} -f ex49_4.tmp runex49_5: -@${MPIEXEC} -n 3 ./ex49 -elas_ksp_monitor_short -elas_ksp_converged_reason -elas_ksp_type cg -elas_ksp_norm_type natural -mx 22 -my 22 -c_str 2 -brick_E 1,1e-6,1e-2 -brick_nu .3,.2,.4 -brick_span 8 -elas_pc_type gamg -elas_mg_levels_ksp_type chebyshev -elas_mg_levels_ksp_max_it 1 -elas_mg_levels_ksp_chebyshev_esteig 0.2,1.1 -elas_mg_levels_pc_type jacobi -elas_pc_gamg_random_no_imaginary_part -elas_pc_gamg_random_no_imaginary_part > ex49_5.tmp 2>&1; \ ${DIFF} output/ex49_5.out ex49_5.tmp || printf "${PWD}\nPossible problem with ex49_5, diffs above\n=========================================\n"; \ ${RM} -f ex49_5.tmp # hyper has some valgrind serious bus in it for vec_interp_variant hence this crashes on some systems, waiting for hypre team to fix runex49_hypre_nullspace: -@${MPIEXEC} -n 1 ./ex49 -elas_ksp_monitor_short -elas_ksp_converged_reason -elas_ksp_type cg -elas_ksp_norm_type natural -mx 22 -my 22 -c_str 2 -brick_E 1,1e-6,1e-2 -brick_nu .3,.2,.4 -brick_span 8 -elas_pc_type hypre -elas_pc_hypre_boomeramg_nodal_coarsen 6 -elas_pc_hypre_boomeramg_vec_interp_variant 3 -elas_ksp_view > ex49_hypre_nullspace.tmp 2>&1; \ ${DIFF} output/ex49_hypre_nullspace.out ex49_hypre_nullspace.tmp || printf "${PWD}\nPossible problem with ex49_hypre_nullspace, diffs above\n=========================================\n"; \ ${RM} -f ex49_hypre_nullspace.tmp runex49_6: -@${MPIEXEC} -n 4 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 3 -sponge_E0 1 -sponge_E1 
1000 -sponge_nu0 0.4 -sponge_nu1 0.2 -sponge_t 1 -sponge_w 8 -elas_ksp_type pipegcr -elas_pc_type asm -elas_sub_pc_type lu > ex49_6.tmp 2>&1; \ ${DIFF} output/ex49_6.out ex49_6.tmp || printf "${PWD}\nPossible problem with ex49_6, diffs above\n=========================================\n"; \ ${RM} -f ex49_6.tmp runex49_7: -@${MPIEXEC} -n 4 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 3 -sponge_E0 1 -sponge_E1 1000 -sponge_nu0 0.4 -sponge_nu1 0.2 -sponge_t 1 -sponge_w 8 -elas_ksp_type pipegcr -elas_pc_type asm -elas_sub_pc_type ksp -elas_sub_ksp_ksp_type cg -elas_sub_ksp_ksp_max_it 15 > ex49_7.tmp 2>&1; \ ${DIFF} output/ex49_7.out ex49_7.tmp || printf "${PWD}\nPossible problem with ex49_7, diffs above\n=========================================\n"; \ ${RM} -f ex49_7.tmp runex49_8: -@${MPIEXEC} -n 4 ./ex49 -mx 20 -my 30 -elas_ksp_monitor_short -no_view -c_str 3 -sponge_E0 1 -sponge_E1 1000 -sponge_nu0 0.4 -sponge_nu1 0.2 -sponge_t 1 -sponge_w 8 -elas_ksp_type pipefgmres -elas_pc_type asm -elas_sub_pc_type ksp -elas_sub_ksp_ksp_type cg -elas_sub_ksp_ksp_max_it 15 > ex49_8.tmp 2>&1; \ ${DIFF} output/ex49_8.out ex49_8.tmp || printf "${PWD}\nPossible problem with ex49_8, diffs above\n=========================================\n"; \ ${RM} -f ex49_8.tmp runex50: -@${MPIEXEC} -n 1 ./ex50 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -da_refine 1 -mg_levels_pc_factor_shift_type nonzero -mg_coarse_pc_factor_shift_type nonzero -ksp_view > ex50.tmp 2>&1; \ ${DIFF} output/ex50.out ex50.tmp || printf "${PWD}\nPossible problem with ex50, diffs above\n=========================================\n"; \ ${RM} -f ex50.tmp runex50_2: -@${MPIEXEC} -n 2 ./ex50 -pc_type mg -pc_mg_type full -ksp_type fgmres -ksp_monitor_short -da_refine 1 -mg_levels_sub_pc_factor_shift_type nonzero -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type svd -ksp_view > ex50_2.tmp 2>&1; \ ${DIFF} output/ex50_2.out ex50_2.tmp || printf "${PWD}\nPossible problem with ex50_2, diffs above\n=========================================\n"; \ ${RM} -f ex50_2.tmp runex50_3 : -@${MPIEXEC} -n 2 ./ex50 -pc_type mg -pc_mg_type full -ksp_monitor_short -da_refine 5 -mg_coarse_ksp_type cg -mg_coarse_ksp_converged_reason -mg_coarse_ksp_rtol 1e-2 -mg_coarse_ksp_max_it 5 -mg_coarse_pc_type none -pc_mg_levels 2 -ksp_type pipefgmres -ksp_pipefgmres_shift 1.5 > ex50_3.tmp 2>&1; \ ${DIFF} output/ex50_3.out ex50_3.tmp || printf "${PWD}\nPossible problem with ex50_3, diffs above\n=========================================\n"; \ ${RM} -f ex50_3.tmp runex51: -@${MPIEXEC} -n 2 ./ex51 -ksp_monitor_short > ex51.tmp 2>&1; \ ${DIFF} output/ex51_1.out ex51.tmp || printf "${PWD}\nPossible problem with ex51, diffs above\n=========================================\n"; \ ${RM} -f ex51.tmp runex52: -@${MPIEXEC} -n 1 ./ex52 -use_petsc_lu > ex52.tmp 2>&1; \ ${DIFF} output/ex52_2.out ex52.tmp || printf "${PWD}\nPossible problem with ex52, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_mumps: -@${MPIEXEC} -n 3 ./ex52 -use_mumps_lu > ex52.tmp 2>&1; \ ${DIFF} output/ex52_1.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_mumps, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_mumps_2: -@${MPIEXEC} -n 3 ./ex52 -use_mumps_ch > ex52.tmp 2>&1; \ ${DIFF} output/ex52_1.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_mumps_2, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_mumps_3: -@${MPIEXEC} -n 3 ./ex52 -use_mumps_ch 
-mat_type sbaij > ex52.tmp 2>&1; \ ${DIFF} output/ex52_1.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_mumps_3, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_superlu_ilu: -@${MPIEXEC} -n 1 ./ex52 -use_superlu_ilu > ex52.tmp 2>&1; \ ${DIFF} output/ex52_2.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_superlu_ilu, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_superlu: -@${MPIEXEC} -n 1 ./ex52 -use_superlu_lu > ex52.tmp 2>&1; \ ${DIFF} output/ex52_2.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_superlu, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52_superlu_dist: -@${MPIEXEC} -n 2 ./ex52 -use_superlu_lu > ex52.tmp 2>&1; \ ${DIFF} output/ex52_2.out ex52.tmp || printf "${PWD}\nPossible problem with ex52_superlu_dist, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex52f_mumps: -@${MPIEXEC} -n 3 ./ex52f > ex52.tmp 2>&1; \ ${DIFF} output/ex52f_1.out ex52.tmp || printf "${PWD}\nPossible problem with ex52f_mumps, diffs above\n=========================================\n"; \ ${RM} -f ex52.tmp runex53: -@${MPIEXEC} -n 1 ./ex53 > ex53.tmp 2>&1; \ ${DIFF} output/ex53.out ex53.tmp || printf "${PWD}\nPossible problem with ex53, diffs above\n=========================================\n"; \ ${RM} -f ex53.tmp runex53_2: -@${MPIEXEC} -n 2 ./ex53 > ex53.tmp 2>&1; \ ${DIFF} output/ex53.out ex53.tmp || printf "${PWD}\nPossible problem with ex53_2, diffs above\n=========================================\n"; \ ${RM} -f ex53.tmp runex54_geo: -@${MPIEXEC} -n 4 ./ex54 -ne 49 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type geo -pc_gamg_coarse_eq_limit 200 -mg_levels_pc_type jacobi -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -ksp_monitor_short -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex54_0.out ex.tmp || printf "${PWD}\nPossible problem with ex54_0.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex54: -@${MPIEXEC} -n 4 ./ex54 -ne 49 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -ksp_converged_reason -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex54_1.out ex.tmp || printf "${PWD}\nPossible problem with ex54_1.out, diffs above\n======================================\n"; \ ${RM} -f ex.tmp runex54_Classical: -@${MPIEXEC} -n 4 ./ex54 -ne 49 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type classical -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -ksp_converged_reason -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex54_classical.out ex.tmp || printf "${PWD}\nPossible problem with ex54_classical.out, diffs above\n======================================\n"; \ ${RM} -f ex.tmp runex54f: -@${MPIEXEC} -n 4 ./ex54f -ne 39 -theta 30.0 -epsilon 1.e-1 -blob_center 0.,0. 
-ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mat_coarsen_type hem -pc_gamg_square_graph 0 -ksp_monitor_short -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex54f.out ex.tmp || printf "${PWD}\nPossible problem with ex54f.out, diffs above\n======================================\n"; \ ${RM} -f ex.tmp runex55_geo: -@${MPIEXEC} -n 4 ./ex55 -ne 29 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type geo -use_coordinates -ksp_monitor_short -mg_levels_esteig_ksp_type cg -ksp_type cg -ksp_norm_type unpreconditioned > ex.tmp 2>&1; \ ${DIFF} output/ex55_0.out ex.tmp || printf "${PWD}\nPossible problem with ex55_0.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex55_hypre: -@${MPIEXEC} -n 4 ./ex55 -ne 29 -alpha 1.e-3 -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor_short > ex.tmp 2>&1; \ ${DIFF} output/ex55_hypre.out ex.tmp || printf "${PWD}\nPossible problem with ex55_hypre, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex55: -@${MPIEXEC} -n 4 ./ex55 -ne 29 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -use_coordinates -ksp_converged_reason -mg_levels_esteig_ksp_type cg -ksp_rtol 1.e-3 -ksp_monitor > ex.tmp 2>&1; \ ${DIFF} output/ex55_sa.out ex.tmp || printf "${PWD}\nPossible problem with ex55_sa.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex55_Classical: -@${MPIEXEC} -n 4 ./ex55 -ne 29 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type classical -mg_levels_ksp_max_it 5 -ksp_converged_reason -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex55_classical.out ex.tmp || printf "${PWD}\nPossible problem with ex55_classical.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex55_NC: -@${MPIEXEC} -n 4 ./ex55 -ne 29 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -ksp_converged_reason -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex55_NC.out ex.tmp || printf "${PWD}\nPossible problem with ex55_NC.out, diffs above\n======================================\n"; \ ${RM} -f ex.tmp runex56: -@${MPIEXEC} -n 8 ./ex56 -ne 11 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -two_solves -ksp_converged_reason -use_mat_nearnullspace -mg_levels_esteig_ksp_type gmres -pc_gamg_square_graph 1 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type sor > ex.tmp 2>&1; \ ${DIFF} output/ex56_0.out ex.tmp || printf "${PWD}\nPossible problem with ex56_0.out, diffs above \n=========================================\n"; \ ${RM} -f ex.tmp runex56_ml: -@${MPIEXEC} -n 8 ./ex56 -ne 9 -alpha 1.e-3 -ksp_type cg -pc_type ml -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_pc_type sor -ksp_monitor_short -mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex56_ml.out ex.tmp || printf "${PWD}\nPossible problem with ex56_ml.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex56_nns: -@${MPIEXEC} -n 1 ./ex56 -ne 9 -alpha 1.e-3 -ksp_converged_reason -ksp_type cg -ksp_max_it 50 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 1000 -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -pc_gamg_reuse_interpolation true -two_solves -use_mat_nearnullspace 
-mg_levels_esteig_ksp_type cg > ex.tmp 2>&1; \ ${DIFF} output/ex56_nns.out ex.tmp || printf "${PWD}\nPossible problem with ex56_nns.out, diffs above\n=========================================\n"; \ ${RM} -f ex.tmp runex58: -@${MPIEXEC} -n 1 ./ex58 -mat_type aij > ex58.tmp 2>&1; \ ${DIFF} output/ex58.out ex58.tmp || printf "${PWD}\nPossible problem with ex58, diffs above\n=========================================\n"; \ ${RM} -f ex58.tmp runex58_baij: -@${MPIEXEC} -n 1 ./ex58 -mat_type baij > ex58.tmp 2>&1; \ ${DIFF} output/ex58.out ex58.tmp || printf "${PWD}\nPossible problem with ex58_baij, diffs above\n=========================================\n"; \ ${RM} -f ex58.tmp runex58_sbaij: -@${MPIEXEC} -n 1 ./ex58 -mat_type sbaij > ex58.tmp 2>&1; \ ${DIFF} output/ex58.out ex58.tmp || printf "${PWD}\nPossible problem with ex58_sbaij, diffs above\n=========================================\n"; \ ${RM} -f ex58.tmp runex59: -@${MPIEXEC} -n 4 ./ex59 -nex 7 > ex59.tmp 2>&1; \ ${DIFF} output/ex59_1.out ex59.tmp || printf "${PWD}\nPossible problem with ex59, diffs above\n=========================================\n"; \ ${RM} -f ex59.tmp runex59_2: -@${MPIEXEC} -n 4 ./ex59 -npx 2 -npy 2 -nex 6 -ney 6 -ksp_max_it 3 > ex59_2.tmp 2>&1; \ ${DIFF} output/ex59_2.out ex59_2.tmp || printf "${PWD}\nPossible problem with ex59_2, diffs above\n=========================================\n"; \ ${RM} -f ex59_2.tmp runex59_3: -@${MPIEXEC} -n 4 ./ex59 -npx 2 -npy 2 -npz 1 -nex 6 -ney 6 -nez 1 -ksp_max_it 4 > ex59_3.tmp 2>&1; \ ${DIFF} output/ex59_3.out ex59_3.tmp || printf "${PWD}\nPossible problem with ex59_3, diffs above\n=========================================\n"; \ ${RM} -f ex59_3.tmp runex60: -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 1 -ksp_type fcg -ksp_fcg_mmax 1 -eta 0.1 > ex60_1.tmp 2>&1; \ if (${DIFF} output/ex60_1.out ex60_1.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_1.tmp runex60_2: -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -diagfunc 3 -ksp_type fcg -ksp_fcg_mmax 10000 -eta 0.3333 > ex60_2.tmp 2>&1; \ if (${DIFF} output/ex60_2.out ex60_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_2.tmp runex60_3: -@${MPIEXEC} -n 3 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 2 -ksp_type fgmres -eta 0.1 > ex60_3.tmp 2>&1; \ if (${DIFF} output/ex60_3.out ex60_3.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_3.tmp runex60_4: -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 1 -ksp_type pipefcg -ksp_pipefcg_mmax 1 -eta 0.1 > ex60_4.tmp 2>&1; \ if (${DIFF} output/ex60_4.out ex60_4.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_4.tmp runex60_5: -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 3 -ksp_type pipefcg -ksp_pipefcg_mmax 10000 -eta 0.1 > ex60_5.tmp 2>&1; \ if (${DIFF} output/ex60_5.out ex60_5.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_5, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_5.tmp runex60_6 : -@${MPIEXEC} -n 4 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 3 -ksp_type fcg -ksp_fcg_mmax 10000 -eta 0 -pc_type ksp -ksp_ksp_type cg -ksp_pc_type none 
-ksp_ksp_rtol 1e-1 -ksp_ksp_max_it 5 -ksp_ksp_converged_reason > ex60_6.tmp 2>&1; \ if (${DIFF} output/ex60_6.out ex60_6.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_6, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_6.tmp runex60_7 : -@${MPIEXEC} -n 4 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 3 -ksp_type pipefcg -ksp_pipefcg_mmax 10000 -eta 0 -pc_type ksp -ksp_ksp_type cg -ksp_pc_type none -ksp_ksp_rtol 1e-1 -ksp_ksp_max_it 5 -ksp_ksp_converged_reason > ex60_7.tmp 2>&1; \ if (${DIFF} output/ex60_7.out ex60_7.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_7, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_7.tmp runex60_8 : -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 1 -ksp_type pipefgmres -pc_type ksp -ksp_ksp_type cg -ksp_pc_type none -ksp_ksp_rtol 1e-2 -ksp_ksp_converged_reason > ex60_8.tmp 2>&1; \ if (${DIFF} output/ex60_8.out ex60_8.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_8, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_8.tmp runex60_9 : -@${MPIEXEC} -n 2 ./ex60 -ksp_monitor_short -ksp_rtol 1e-6 -diagfunc 1 -ksp_type pipefgmres -pc_type ksp -ksp_ksp_type cg -ksp_pc_type none -ksp_ksp_rtol 1e-2 -ksp_ksp_converged_reason > ex60_9.tmp 2>&1; \ if (${DIFF} output/ex60_9.out ex60_9.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex60_9, diffs above\n=========================================\n"; fi; \ ${RM} -f ex60_9.tmp NP = 1 M = 4 N = 5 MDOMAINS = 2 NDOMAINS = 1 OVERLAP=1 TSUBDOMAINS = 1 runex62_valgrind: -@${MPIEXEC} -n ${NP} valgrind ./ex62 -M $M -N $N -print_error ${ARGS} runex62: -@${MPIEXEC} -n ${NP} ./ex62 -M $M -N $N -print_error ${ARGS} runex62_hp: -@${MPIEXEC} -n 4 ./ex62 -M 7 -N 9 -pc_gasm_overlap 1 -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist -ksp_monitor -print_error -pc_gasm_total_subdomains 2 -pc_gasm_use_hierachical_partitioning 1 > ex62.tmp 2>&1; \ if (${DIFF} output/ex62.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex62, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_2D: -@${MPIEXEC} -n ${NP} ./ex62 -M $M -N $N -user_set_subdomains -Mdomains ${MDOMAINS} -Ndomains ${NDOMAINS} -overlap ${OVERLAP} -print_error ${ARGS} TSUBDOMAINS=1 runex62_superlu_dist: -@${MPIEXEC} -n ${NP} ./ex62 -M $M -N $N -print_error -pc_gasm_total_subdomains ${TSUBDOMAINS} -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist ${ARGS} runex62_2D_1: -@${MPIEXEC} -n 1 ./ex62 -M 7 -N 9 -user_set_subdomains -Mdomains 1 -Ndomains 3 -overlap 1 -print_error -pc_gasm_print_subdomains > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_2D_1.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_2D_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_2D_2: -@${MPIEXEC} -n 2 ./ex62 -M 7 -N 9 -user_set_subdomains -Mdomains 1 -Ndomains 3 -overlap 1 -print_error -pc_gasm_print_subdomains > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_2D_2.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_2D_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_2D_3: -@${MPIEXEC} -n 3 ./ex62 -M 7 -N 9 -user_set_subdomains -Mdomains 1 -Ndomains 3 -overlap 1 -print_error -pc_gasm_print_subdomains > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_2D_3.out ex62.tmp) then true; \ else printf "${PWD}\nPossible 
problem with ex62_2D_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_superlu_dist_1: -@${MPIEXEC} -n 1 ./ex62 -M 7 -N 9 -print_error -pc_gasm_total_subdomains 1 -pc_gasm_print_subdomains -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_superlu_dist_1.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_superlu_dist_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_superlu_dist_2: -@${MPIEXEC} -n 2 ./ex62 -M 7 -N 9 -print_error -pc_gasm_total_subdomains 1 -pc_gasm_print_subdomains -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_superlu_dist_2.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_superlu_dist_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_superlu_dist_3: -@${MPIEXEC} -n 3 ./ex62 -M 7 -N 9 -print_error -pc_gasm_total_subdomains 2 -pc_gasm_print_subdomains -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_superlu_dist_3.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_superlu_dist_3, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex62_superlu_dist_4: -@${MPIEXEC} -n 4 ./ex62 -M 7 -N 9 -print_error -pc_gasm_total_subdomains 2 -pc_gasm_print_subdomains -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist > ex62.tmp 2>&1; \ if (${DIFF} output/ex62_superlu_dist_4.out ex62.tmp) then true; \ else printf "${PWD}\nPossible problem with ex62_superlu_dist_4, diffs above\n=========================================\n"; fi; \ ${RM} -f ex62.tmp runex63: -@${MPIEXEC} -n 1 ./ex63 --filedir=${PETSC_DIR}/share/petsc/datafiles/matrices/ --filename=amesos2_test_mat0.mtx --solver=SuperLU --print-residual=true -ksp_monitor -pc_type lu -pc_factor_mat_solver_package superlu -ksp_view -ksp_converged_reason > ex63_1.tmp 2>&1; \ if (${DIFF} output/ex63_1.out ex63_1.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex63_1, diffs above\n=========================================\n"; fi; \ ${RM} -f ex63_1.tmp runex63_2: -@${MPIEXEC} -n 1 ./ex63 --filedir=${PETSC_DIR}/share/petsc/datafiles/matrices/ --filename=amesos2_test_mat0.mtx --solver=SuperLUDist --print-residual=true -ksp_monitor -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_view -ksp_converged_reason > ex63_2.tmp 2>&1; \ if (${DIFF} output/ex63_2.out ex63_2.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex63_2, diffs above\n=========================================\n"; fi; \ ${RM} -f ex63_2.tmp runex64: -@${MPIEXEC} -n 4 ./ex64 -ksp_monitor -pc_gasm_overlap 1 -sub_pc_type lu -sub_pc_factor_mat_solver_package superlu_dist -pc_gasm_total_subdomains 2 > ex64.tmp 2>&1; \ if (${DIFF} output/ex64.out ex64.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex64, diffs above\n=========================================\n"; fi; \ ${RM} -f ex64.tmp runex65: -@${MPIEXEC} -n 4 ./ex65 -ksp_monitor -pc_type mg -da_refine 3 > ex65.tmp 2>&1; \ if (${DIFF} output/ex65.out ex65.tmp) then true; \ else printf "${PWD}\nPossible problem with with ex65, diffs above\n=========================================\n"; fi; \ ${RM} -f ex65.tmp TESTEXAMPLES_C = ex1.PETSc runex1 runex1_2 runex1_3 ex1.rm ex2.PETSc runex2 runex2_2 runex2_3 \ runex2_4 runex2_bjacobi runex2_bjacobi_2 runex2_bjacobi_3 \ 
runex2_chebyest_1 runex2_chebyest_2 runex2_fbcgs runex2_fbcgs_2 runex2_telescope runex2_pipecg runex2_pipecr runex2_groppcg runex2_pipecgrr ex2.rm \ ex7.PETSc runex7 runex7_2 ex7.rm ex4.PETSc ex4.rm ex5.PETSc runex5 runex5_2 \ runex5_redundant_0 runex5_redundant_1 runex5_redundant_2 runex5_redundant_3 runex5_redundant_4 ex5.rm \ ex6.PETSc runex6 runex6_1 runex6_2 ex6.rm \ ex9.PETSc runex9 ex9.rm ex12.PETSc runex12 ex12.rm ex13.PETSc runex13 ex13.rm \ ex15.PETSc runex15 ex15.rm ex16.PETSc runex16 ex16.rm \ ex23.PETSc runex23 runex23_2 ex23.rm ex25.PETSc runex25 runex25_2 ex25.rm \ ex18.PETSc runex18_3 ex18.rm \ ex27.PETSc ex27.rm ex28.PETSc ex28.rm ex29.PETSc runex29_telescope ex29.rm \ ex31.PETSc ex31.rm ex32.PETSc runex32 ex32.rm ex34.PETSc runex34 ex34.rm ex42.PETSc ex42.rm \ ex43.PETSc runex43 runex43_2 runex43_bjacobi runex43_bjacobi_baij runex43_nested_gmg runex43_3 ex43.rm \ ex45.PETSc runex45 runex45_2 runex45_telescope runex45_telescope_2 ex45.rm \ ex49.PETSc runex49 runex49_2 runex49_3 runex49_5 ex49.rm \ ex51.PETSc runex51 ex51.rm \ ex53.PETSc runex53 ex53.rm \ ex54.PETSc runex54 runex54_Classical ex54.rm ex55.PETSc runex55 runex55_Classical runex55_NC ex55.rm\ ex56.PETSc runex56_nns runex56 ex56.rm ex59.PETSc runex59 runex59_2 runex59_3 ex59.rm \ ex58.PETSc runex58 runex58_baij runex58_sbaij ex58.rm \ ex60.PETSc runex60 runex60_2 runex60_3 ex60.rm \ ex62.PETSc runex62_2D_1 runex62_2D_2 runex62_2D_3 ex62.rm ex65.PETSc runex65 ex65.rm TESTEXAMPLES_C_NOTSINGLE = ex43.PETSc runex43_3 ex43.rm TESTEXAMPLES_C_NOCOMPLEX = ex54.PETSc ex54.rm ex10.PETSc runex10 ex10.rm TESTEXAMPLES_C_NOCOMPLEX_NOTSINGLE = ex23.PETSc runex23_3 ex23.rm \ ex34.PETSc runex34 runex34_2 ex34.rm \ ex43.PETSc runex43_4 runex43_5 ex43.rm \ ex49.PETSc runex49_6 runex49_7 runex49_8 ex49.rm \ ex50.PETSc runex50_3 ex50.rm \ ex60.PETSc runex60_4 runex60_5 runex60_6 runex60_7 runex60_8 runex60_9 ex60.rm TESTEXAMPLES_C_X = ex2.PETSc runex2_5 ex2.rm ex5.PETSc runex5_5 ex5.rm ex8.PETSc ex8.rm ex28.PETSc runex28 ex28.rm TESTEXAMPLES_FORTRAN = ex1f.PETSc runex1f ex1f.rm ex2f.PETSc runex2f runex2f_2 ex2f.rm ex6f.PETSc ex6f.rm \ ex15f.PETSc runex15f ex15f.rm \ ex45f.PETSc runex45f ex45f.rm TESTEXAMPLES_FORTRAN_NOTSINGLE = ex14f.PETSc runex14f ex14f.rm ex22f.PETSc runex22f ex22f.rm ex21f.PETSc runex21f ex21f.rm TESTEXAMPLES_FORTRAN_MPIUNI = ex1f.PETSc runex1f ex1f.rm ex6f.PETSc runex6f ex6f.rm TESTEXAMPLES_C_X_MPIUNI = ex1.PETSc runex1 runex1_2 runex1_3 ex1.rm ex2.PETSc runex2 runex2_3 ex2.rm \ ex7.PETSc ex7.rm ex5.PETSc ex5.rm ex9.PETSc runex9 ex9.rm \ ex23.PETSc runex23 ex23.rm TESTEXAMPLES_C_COMPLEX = ex10.PETSc ex10.rm ex11.PETSc runex11 ex11.rm TESTEXAMPLES_DATAFILESPATH = ex10.PETSc runex10_2 runex10_3 runex10_4 runex10_5 runex10_6 runex10_7 runex10_8 \ runex10_9 runex10_10 runex10_19 runex10_ILU runex10_ILUBAIJ runex10_cg \ runex10_cg_singlereduction runex10_seqaijcrl runex10_mpiaijcrl runex10_seqaijperm runex10_mpiaijperm ex10.rm \ ex27.PETSc runex27 ex27.rm # even though ex10.c is -pc_mg_smoothdown na C example to run with -mat_type lusol requires a Fortran compiler, hence # we list it with the fortran examples TESTEXAMPLES_FORTRAN_NOCOMPLEX = TESTEXAMPLES_FORTRAN_COMPLEX = ex11f.PETSc runex11f ex11f.rm TESTEXAMPLES_F90 = ex13f90.PETSc runex13f90 ex13f90.rm TESTEXAMPLES_13 = ex3.PETSc ex3.rm ex14f.PETSc ex14f.rm TESTEXAMPLES_MATLAB_ENGINE = ex10.PETSc runex10_12 ex10.rm TESTEXAMPLES_17 = ex10.PETSc runex10_11 ex10.rm TESTEXAMPLES_18 = ex2.PETSc runex2_6 ex2.rm TESTEXAMPLES_SPAI = ex10.PETSc runex10_14 ex10.rm 
TESTEXAMPLES_HYPRE = ex49.PETSc runex49_hypre_nullspace ex49.rm TESTEXAMPLES_HYPRE_DATAFILESPATH = ex10.PETSc runex10_15 runex10_16 runex10_17 runex10_boomeramg_schwarz runex10_boomeramg_parasails runex10_boomeramg_pilut ex10.rm TESTEXAMPLES_LUSOL = ex10.PETSc runex10_13 ex10.rm TESTEXAMPLES_MUMPS = ex53.PETSc runex53 runex53_2 ex53.rm \ ex52.PETSc runex52 runex52_mumps runex52_mumps_2 runex52_mumps_3 ex52.rm ex52f.PETSc runex52f_mumps ex52f.rm TESTEXAMPLES_MUMPS_DATAFILESPATH = ex10.PETSc runex10_mumps_lu_1 runex10_mumps_lu_2 runex10_mumps_lu_3 runex10_mumps_lu_4 \ runex10_mumps_cholesky_1 runex10_mumps_cholesky_2 runex10_mumps_cholesky_3 runex10_mumps_cholesky_4 \ runex10_mumps_redundant runex10_mumps_cholesky_spd_1 runex10_mumps_cholesky_spd_2 \ runex10_zeropivot runex10_zeropivot_2 ex10.rm TESTEXAMPLES_PASTIX_DATAFILESPATH = ex10.PETSc runex10_pastix_lu_1 runex10_pastix_lu_2 \ runex10_pastix_cholesky_1 runex10_pastix_cholesky_2 runex10_pastix_redundant \ ex10.rm TESTEXAMPLES_SUPERLU = ex52.PETSc runex52_superlu_ilu runex52_superlu ex52.rm TESTEXAMPLES_SUPERLU_DATAFILESPATH = ex10.PETSc runex10_superlu_lu_1 ex10.rm TESTEXAMPLES_SUPERLU_DIST = ex52.PETSc runex52_superlu_dist ex52.rm ex64.PETSc runex64 ex64.rm \ ex62.PETSc runex62_superlu_dist_1 runex62_superlu_dist_2 runex62_superlu_dist_3 runex62_superlu_dist_4 ex62.rm TESTEXAMPLES_SUPERLU_DIST_DATAFILESPATH = ex10.PETSc runex10_superlu_dist_lu_1 runex10_superlu_dist_lu_2 runex10_superlu_dist_redundant ex10.rm TESTEXAMPLES_SUITESPARSE_DATAFILESPATH = ex10.PETSc runex10_umfpack ex10.rm TESTEXAMPLES_SUITESPARSE = ex2.PETSc runex2_umfpack ex2.rm TESTEXAMPLES_MKL_PARDISO = ex2.PETSc runex2_mkl_pardiso_lu runex2_mkl_pardiso_cholesky ex2.rm TESTEXAMPLES_CUDA_DATAFILESPATH = ex10.PETSc runex10_aijcusparse ex10.rm TESTEXAMPLES_CUDA = ex7.PETSc runex7_mpiaijcusparse runex7_mpiaijcusparse_2 ex7.rm \ ex46.PETSc runex46_aijcusp runex46_aijcusparse ex46.rm TESTEXAMPLES_MOAB = ex35.PETSc runex35 ex35.rm TESTEXAMPLES_MOAB_HDF5 = ex35.PETSc runex35_2 ex35.rm TESTEXAMPLES_TRILINOS = ex63.PETSc runex63 runex63_2 ex63.rm include ${PETSC_DIR}/lib/petsc/conf/test From bsmith at mcs.anl.gov Fri Jun 30 17:49:56 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 30 Jun 2017 17:49:56 -0500 Subject: [petsc-users] Is OpenMP still available for PETSc? In-Reply-To: References: Message-ID: <2AEC6AB1-D4BD-4615-A99A-C0CFB140A5AF@mcs.anl.gov> The current version of PETSc does not use OpenMP, you are free to use OpenMP in your portions of the code of course. If you want PETSc using OpenMP you have to use the old, unsupported version of PETSc. We never found any benefit to using OpenMP. Barry > On Jun 30, 2017, at 5:40 PM, Danyang Su wrote: > > Dear All, > > I recalled there was OpenMP available for PETSc for the old development version. When google "petsc hybrid mpi openmp", there returned some papers about this feature. My code was first parallelized using OpenMP and then redeveloped using PETSc, with OpenMP kept but not used together with MPI. Before retesting the code using hybrid mpi-openmp, I picked one PETSc example ex10 by adding "omp_set_num_threads(max_threads);" under PetscInitialize. 
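
For concreteness, a minimal sketch of the modification described above: the thread count is assumed to come from the -max_threads option that appears in the run lines further down, and the _OPENMP guard is an illustration choice, not part of the stock ex10.c.

  #include <petscsys.h>
  #if defined(_OPENMP)
  #include <omp.h>
  #endif

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;
    PetscInt       max_threads = 1;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    /* read the -max_threads option (default 1) */
    ierr = PetscOptionsGetInt(NULL,NULL,"-max_threads",&max_threads,NULL);CHKERRQ(ierr);
  #if defined(_OPENMP)
    omp_set_num_threads((int)max_threads);  /* the line added "under PetscInitialize" */
  #endif
    /* ... the rest of the example (matrix load, KSPSolve, -log_view) is unchanged ... */
    ierr = PetscFinalize();
    return ierr;
  }

Note that omp_set_num_threads() only sets the team size of subsequent OpenMP parallel regions in the application code; it has no effect on PETSc's own kernels unless the library itself was built to use threads.
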
> > The PETSc is the current development version configured as follows > > --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 --CFLAGS=-fopenmp --CXXFLAGS=-fopenmp --FFLAGS=-fopenmp COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-large-file-io=1 --download-cmake=yes --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-openmp --with-threadcomm --with-pthreadclasses --with-openmpclasses > > The code can be successfully compiled. However, when I run the code with OpenMP, it does not work, the time shows no change in performance if 1 or 2 threads per processor is used. Also, the CPU/Threads usage indicates that no thread is used. > > I just wonder if OpenMP is still available in the latest version, though it is not recommended to use. > > mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 1 -threadcomm_type openmp -threadcomm_nthreads 1 > > KSPSolve 1 1.0 8.9934e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89 6 76 89 97 98 98 96 2290 > PCSetUp 2 1.0 8.9590e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 648 > PCSetUpOnBlocks 2 1.0 8.9465e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 649 > PCApply 40 1.0 3.1993e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25 0 0 0 32 25 0 0 0 1686 > > mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 2 -threadcomm_type openmp -threadcomm_nthreads 2 > > KSPSolve 1 1.0 8.9701e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89 6 76 89 97 98 98 96 2296 > PCSetUp 2 1.0 8.7635e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 663 > PCSetUpOnBlocks 2 1.0 8.7511e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 664 > PCApply 40 1.0 3.1878e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25 0 0 0 32 25 0 0 0 1692 > > Thanks and regards, > > Danyang > > > From danyang.su at gmail.com Fri Jun 30 17:59:56 2017 From: danyang.su at gmail.com (Danyang Su) Date: Fri, 30 Jun 2017 15:59:56 -0700 Subject: [petsc-users] Is OpenMP still available for PETSc? In-Reply-To: <2AEC6AB1-D4BD-4615-A99A-C0CFB140A5AF@mcs.anl.gov> References: <2AEC6AB1-D4BD-4615-A99A-C0CFB140A5AF@mcs.anl.gov> Message-ID: <1a8eeeed-4b74-5321-6c0c-8d287196d70a@gmail.com> Hi Barry, Thanks for the quick response. What I want to test is to check if OpenMP has any benefit when total degrees of freedoms per processor drops below 5k. When using pure MPI my code shows good speedup if total degrees of freedoms per processor is above 10k. But below this value, the parallel efficiency decreases. The petsc 3.6 change log indicates * Removed all threadcomm support including --with-pthreadclasses and --with-openmpclasses configure arguments I guess petsc 3.5 version is the last version I can test, right? Thanks, Danyang On 17-06-30 03:49 PM, Barry Smith wrote: > The current version of PETSc does not use OpenMP, you are free to use OpenMP in your portions of the code of course. 
If you want PETSc using OpenMP you have to use the old, unsupported version of PETSc. We never found any benefit to using OpenMP. > > Barry > >> On Jun 30, 2017, at 5:40 PM, Danyang Su wrote: >> >> Dear All, >> >> I recalled there was OpenMP available for PETSc for the old development version. When google "petsc hybrid mpi openmp", there returned some papers about this feature. My code was first parallelized using OpenMP and then redeveloped using PETSc, with OpenMP kept but not used together with MPI. Before retesting the code using hybrid mpi-openmp, I picked one PETSc example ex10 by adding "omp_set_num_threads(max_threads);" under PetscInitialize. >> >> The PETSc is the current development version configured as follows >> >> --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 --CFLAGS=-fopenmp --CXXFLAGS=-fopenmp --FFLAGS=-fopenmp COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-large-file-io=1 --download-cmake=yes --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes --with-openmp --with-threadcomm --with-pthreadclasses --with-openmpclasses >> >> The code can be successfully compiled. However, when I run the code with OpenMP, it does not work, the time shows no change in performance if 1 or 2 threads per processor is used. Also, the CPU/Threads usage indicates that no thread is used. >> >> I just wonder if OpenMP is still available in the latest version, though it is not recommended to use. >> >> mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 1 -threadcomm_type openmp -threadcomm_nthreads 1 >> >> KSPSolve 1 1.0 8.9934e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89 6 76 89 97 98 98 96 2290 >> PCSetUp 2 1.0 8.9590e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 648 >> PCSetUpOnBlocks 2 1.0 8.9465e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 649 >> PCApply 40 1.0 3.1993e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25 0 0 0 32 25 0 0 0 1686 >> >> mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info -log_view -max_threads 2 -threadcomm_type openmp -threadcomm_nthreads 2 >> >> KSPSolve 1 1.0 8.9701e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89 6 76 89 97 98 98 96 2296 >> PCSetUp 2 1.0 8.7635e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 663 >> PCSetUpOnBlocks 2 1.0 8.7511e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 9 3 0 0 0 664 >> PCApply 40 1.0 3.1878e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25 0 0 0 32 25 0 0 0 1692 >> >> Thanks and regards, >> >> Danyang >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL:
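
A quick way to confirm whether any OpenMP threads are actually being spawned on each rank is to query the team size from a user-side parallel region. A minimal sketch (a hypothetical test program, compiled with -fopenmp and linked against PETSc; not part of any PETSc example) is:

  #include <stdio.h>
  #include <petscsys.h>
  #include <omp.h>

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;
    PetscMPIInt    rank;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);

    /* one thread per rank reports how many OpenMP threads its team actually has */
    #pragma omp parallel
    {
      #pragma omp master
      printf("[rank %d] OpenMP team size: %d (omp_get_max_threads = %d)\n",
             (int)rank,omp_get_num_threads(),omp_get_max_threads());
    }

    ierr = PetscFinalize();
    return ierr;
  }

If this reports a team size of 1 even after omp_set_num_threads() or OMP_NUM_THREADS is set, no threads are being created, which is consistent with the flat timings above. In any case, only user-code regions like this one would benefit: per the discussion in this thread, the current PETSc library itself does not use OpenMP, and the -threadcomm_* options only exist up to petsc 3.5 since the changelog quoted above notes that threadcomm support was removed in 3.6.
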