From marco at kit.ac.jp Thu Aug 1 03:34:45 2024
From: marco at kit.ac.jp (Marco Seiz)
Date: Thu, 1 Aug 2024 17:34:45 +0900
Subject: [petsc-users] Right DM for a particle network
In-Reply-To: References: <54a9b9e5-691c-4535-bc49-5c00bc19a0df@kit.ac.jp>
Message-ID:

An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex_graphlapl.c Type: text/x-csrc Size: 11457 bytes Desc: not available

From Eric.Chamberland at giref.ulaval.ca Thu Aug 1 07:23:28 2024
From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland)
Date: Thu, 1 Aug 2024 08:23:28 -0400
Subject: [petsc-users] How to combine different element types into a single DMPlex?
In-Reply-To: References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca>
Message-ID: <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>

Hi Matthew,

we have our own format that uses MPI I/O for the initial read; then we would like to do almost exactly what we do in ex47.c (https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html), except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel:

1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?)
2- Call partitioning (DMPlexDistribute)
3- Compute overlap (DMPlexDistributeOverlap)
4- Also compute the corresponding mapping between original element numbers and partitioned+overlapped elements (DMPlexNaturalToGlobalBegin/End)

The main point here is overlap computation. And the big challenge is that we must always rely on the fact that no node ever reads the whole mesh: each node holds only a small part of it at the beginning, and from there we want parallel partitioning and overlap computation...

It is now working fine for a mesh with a single type of element, but if we could turn ex47.c into an example with mixed element types, that would achieve what we would like to do!

Thanks,

Eric

On 2024-07-31 22:09, Matthew Knepley wrote:
> On Wed, Jul 31, 2024 at 4:16 PM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> Hi Vaclav,
> Okay, I am coming back with this question after some time... ;)
> I am just wondering if it is now possible to call DMPlexBuildFromCellListParallel, or something else, to build a mesh that combines different element types into a single DMPlex (in parallel of course)?
> 1) Meshes with different cell types are fully functional, and some applications have been using them for a while now.
> 2) The Firedrake I/O methods support these hybrid meshes.
> 3) You can, for example, read in a GMsh or ExodusII file with different cell types.
> However, there is no direct interface like DMPlexBuildFromCellListParallel(). If you plan on creating meshes by hand, I can build that for you. No one so far has wanted that. Rather they want to read in a mesh in some format, or alter a base mesh by inserting other cell types.
> So, what is the motivating use case?
> Thanks,
> Matt
> Thanks,
> Eric
> On 2021-09-23 11:30, Hapla Vaclav wrote:
>> Note there will soon be a generalization of DMPlexBuildFromCellListParallel() around, as a side product of our current collaborative efforts with Firedrake guys.
>> It will take a PetscSection instead of relying on the blocksize [which is indeed always constant for the given dataset]. Stay tuned.
>> https://gitlab.com/petsc/petsc/-/merge_requests/4350
>> Thanks,
>> Vaclav
>>> On 23 Sep 2021, at 16:53, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>> Hi,
>>> oh, that's great news!
>>> In our case we have our home-made file format, invariant to the number of processes (thanks to MPI_File_set_view), that uses collective, asynchronous native MPI I/O calls for unstructured hybrid meshes and fields.
>>> So our needs are not for reading meshes but only for filling a hybrid DMPlex with DMPlexBuildFromCellListParallel (or something else to come?)... to exploit PETSc partitioners and parallel overlap computation...
>>> Thanks for the follow-up! :)
>>> Eric
>>> On 2021-09-22 7:20 a.m., Matthew Knepley wrote:
>>>> On Wed, Sep 22, 2021 at 3:04 AM Karin&NiKo wrote:
>>>> Dear Matthew,
>>>> This is great news!
>>>> For my part, I would be mostly interested in the parallel input interface. Sorry for that...
>>>> Indeed, in our application, we already have a parallel mesh data structure that supports hybrid meshes with parallel I/O and distribution (based on the MED format). We would like to use a DMPlex to do parallel mesh adaptation.
>>>> As a matter of fact, all our meshes are in the MED format. We could also contribute to extending the interface of DMPlex with MED (if you consider it could be useful).
>>>> An MED interface does exist. I stopped using it for two reasons:
>>>> 1) The code was not portable and the build was failing on different architectures. I had to manually fix it.
>>>> 2) The boundary markers did not provide global information, so that parallel reading was much harder.
>>>> Feel free to update my MED reader to a better design.
>>>> Thanks,
>>>> Matt
>>>> Best regards,
>>>> Nicolas
>>>> On Tue, Sep 21, 2021 at 21:56, Matthew Knepley wrote:
>>>> On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo wrote:
>>>> Dear Eric, dear Matthew,
>>>> I share Eric's desire to be able to manipulate meshes composed of different types of elements in a PETSc DMPlex.
>>>> Since this discussion, is there anything new on this feature for the DMPlex object, or am I missing something?
>>>> Thanks for finding this!
>>>> Okay, I did a rewrite of the Plex internals this summer. It should now be possible to interpolate a mesh with any number of cell types, partition it, redistribute it, and do many other manipulations.
>>>> You can read in some formats that support hybrid meshes. If you let me know how you plan to read it in, we can make it work. Right now, I don't want to make input interfaces that no one will ever use. We have a project, joint with Firedrake, to finalize parallel I/O. This will make parallel reading and writing for checkpointing possible, supporting topology, geometry, fields and layouts, for many meshes in one HDF5 file. I think we will finish in November.
>>>> Thanks,
>>>> Matt
>>>> Thanks,
>>>> Nicolas
>>>> On Wed, Jul 21, 2021 at 04:25, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>> Hi,
>>>>> On 2021-07-14 3:14 p.m., Matthew Knepley wrote:
>>>>>> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>> Hi,
>>>>>> while playing with DMPlexBuildFromCellListParallel, I noticed we have to specify "numCorners", which is a fixed value that gives a fixed number of nodes for a series of elements.
>>>>>> How can I then add, for example, triangles and quadrangles into a DMPlex?
>>>>>> You can't with that function. It would be much, much more complicated if you could, and I am not sure it is worth it for that function. The reason is that you would need index information to offset into the connectivity list, and that would need to be replicated to some extent so that all processes know what the others are doing. Possible, but complicated.
>>>>>> Maybe I can help suggest something for what you are trying to do?
>>>>> Yes: we are trying to partition our parallel mesh with PETSc functions. The mesh has been read in parallel, so each process owns a part of it, but we have to manage mixed element types.
>>>>> When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to describe the elements, which allows mixed elements.
>>>>> So, how would I read my mixed mesh in parallel and give it to a PETSc DMPlex so I can use a PetscPartitioner with DMPlexDistribute?
>>>>> A second goal we have is to use PETSc to compute the overlap, which is something I can't find in ParMETIS (or any other partitioning library?)
>>>>> Thanks,
>>>>> Eric
>>>>>> Thanks,
>>>>>> Matt
>>>>>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> -- Eric Chamberland, ing., M. Ing, Professionnel de recherche, GIREF/Université Laval, (418) 656-2131 poste 41 22 42

-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
-------------- next part --------------
An HTML attachment was scrubbed...
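A minimal C sketch of the four steps Eric lists above, written against today's single-cell-type entry point (DMPlexCreateFromCellListParallelPetsc with one fixed numCorners); the mixed-element variant of step 1 is exactly what is being requested in this thread. The call signatures follow the current manual pages but should be checked against your PETSc version; error handling and cleanup are abbreviated.

    #include <petscdmplex.h>

    /* Each rank passes only the cells and vertices it read itself;
       no rank ever sees the whole mesh. */
    static PetscErrorCode BuildDistributeOverlap(MPI_Comm comm, PetscInt dim, PetscInt numCells, PetscInt numVertices, PetscInt numCorners, const PetscInt cells[], const PetscReal coords[], DM *dmFinal)
    {
      DM dm, dmDist, dmOverlap;

      PetscFunctionBeginUser;
      /* 1- Build a DMPlex from the locally read connectivity
         (today: one fixed numCorners, hence a single element type). */
      PetscCall(DMPlexCreateFromCellListParallelPetsc(comm, dim, numCells, numVertices, PETSC_DECIDE, numCorners, PETSC_TRUE, cells, dim, coords, NULL, NULL, &dm));
      /* Record the original (natural) numbering before redistribution,
         so that step 4 is possible afterwards. */
      PetscCall(DMSetUseNatural(dm, PETSC_TRUE));
      /* 2- Partition and distribute, with no overlap yet. */
      PetscCall(DMPlexDistribute(dm, 0, NULL, &dmDist));
      if (dmDist) {
        PetscCall(DMDestroy(&dm));
        dm = dmDist;
      }
      /* 3- Grow a one-cell overlap on the distributed mesh. */
      PetscCall(DMPlexDistributeOverlap(dm, 1, NULL, &dmOverlap));
      if (dmOverlap) {
        PetscCall(DMDestroy(&dm));
        dm = dmOverlap;
      }
      /* 4- Vectors can now be mapped between the original file ordering and
         the PETSc ordering with DMPlexNaturalToGlobalBegin/End (and their
         GlobalToNatural inverses). */
      *dmFinal = dm;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Because DMSetUseNatural() is called before distribution, the natural-to-global mapping of step 4 is built during DMPlexDistribute() and remains usable on the final DM.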
From sebastian.blauth at itwm.fraunhofer.de Thu Aug 1 08:19:52 2024
From: sebastian.blauth at itwm.fraunhofer.de (Blauth, Sebastian)
Date: Thu, 1 Aug 2024 13:19:52 +0000
Subject: [petsc-users] Question regarding naming of fieldsplit splits
In-Reply-To: References:
Message-ID:

Hello everyone,

I have a follow-up on my question. I noticed the following behavior. Let's assume I have 5 fields which I want to group with the following options:

-ksp_type fgmres
-ksp_max_it 1
-ksp_monitor_true_residual
-ksp_view
-pc_type fieldsplit
-pc_fieldsplit_type multiplicative
-pc_fieldsplit_0_fields 0,1
-pc_fieldsplit_1_fields 2
-pc_fieldsplit_2_fields 3,4
-fieldsplit_0_ksp_type preonly
-fieldsplit_0_pc_type jacobi
-fieldsplit_2_ksp_type preonly
-fieldsplit_2_pc_type jacobi

Then the first split is fine, but the second and third splits both get the same prefix, "fieldsplit_2", as shown in the -ksp_view output I attach below. The second split gets this prefix because it contains only a single field, so that field's index (2) is used as the split name; the third split groups two fields, so the "outer" split index (2) is used as its name instead. Is there any way to circumvent this, other than using custom names for the splits which are unique?

Thanks for your time and best regards,
Sebastian Blauth

The output of "-ksp_view" is the following:

KSP Object: 1 MPI process type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: fieldsplit FieldSplit with MULTIPLICATIVE composition: total splits = 3 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI process type: seqaij rows=243, cols=243 total: nonzeros=4473, allocated nonzeros=4473 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 86 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_2_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_2_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_2_) 1 MPI process type: seqaij rows=81, cols=81 total: nonzeros=497, allocated nonzeros=497 total number of mallocs used during MatSetValues calls=0 not using I-node routines Split number 2 Defined by IS KSP Object: (fieldsplit_2_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_2_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_2_) 1 MPI process type: seqaij rows=243, cols=243 total: nonzeros=4473, allocated nonzeros=4473 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 85 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=567, cols=567 total: nonzeros=24353, allocated nonzeros=24353 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 173 nodes, limit used is 5

-- 
Dr. Sebastian Blauth
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Abteilung Transportvorgänge
Fraunhofer-Platz 1, 67663 Kaiserslautern
Telefon: +49 631 31600-4968
sebastian.blauth at itwm.fraunhofer.de
https://www.itwm.fraunhofer.de

From: petsc-users On Behalf Of Blauth, Sebastian
Sent: Tuesday, July 2, 2024 11:47 AM
To: Matthew Knepley
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

Hi Matt,

thanks for the answer and clarification. Then I'll work around this issue in Python, where I set the options.

Best,
Sebastian

From: Matthew Knepley
Sent: Monday, July 1, 2024 4:30 PM
To: Blauth, Sebastian
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

On Mon, Jul 1, 2024 at 9:48 AM Blauth, Sebastian wrote:

Dear Matt,

thanks a lot for your help. Unfortunately, for me these extra options do not have any effect; I still get the "u" and "p" fieldnames. Also, this would not help me get rid of the "c" fieldname; on that level of the fieldsplit I am basically using your approach already, and it still shows up. The output of -ksp_view is unchanged, so I do not attach it here again. Maybe I misunderstood you?

Oh, we make an exception for single fields, since we think you would want to use the name. I have to make an extra option to shut off naming.

Thanks,

Matt

Thanks for the help and best regards,
Sebastian

From: Matthew Knepley
Sent: Monday, July 1, 2024 2:27 PM
To: Blauth, Sebastian
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

On Fri, Jun 28, 2024 at 4:05 AM Blauth, Sebastian wrote:

Hello everyone,

I have a question regarding the naming convention using PETSc's PCFieldSplit.
I have been following https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html to create a DMShell with FEniCS in order to customize PCFieldSplit for my application. I am using the following options, which work nicely for me:

-ksp_type fgmres
-pc_type fieldsplit
-pc_fieldsplit_0_fields 0, 1
-pc_fieldsplit_1_fields 2
-pc_fieldsplit_type additive
-fieldsplit_0_ksp_type fgmres
-fieldsplit_0_pc_type fieldsplit
-fieldsplit_0_pc_fieldsplit_type schur
-fieldsplit_0_pc_fieldsplit_schur_fact_type full
-fieldsplit_0_pc_fieldsplit_schur_precondition selfp
-fieldsplit_0_fieldsplit_u_ksp_type preonly
-fieldsplit_0_fieldsplit_u_pc_type lu
-fieldsplit_0_fieldsplit_p_ksp_type cg
-fieldsplit_0_fieldsplit_p_ksp_rtol 1e-14
-fieldsplit_0_fieldsplit_p_ksp_atol 1e-30
-fieldsplit_0_fieldsplit_p_pc_type icc
-fieldsplit_0_ksp_rtol 1e-14
-fieldsplit_0_ksp_atol 1e-30
-fieldsplit_0_ksp_monitor_true_residual
-fieldsplit_c_ksp_type preonly
-fieldsplit_c_pc_type lu
-ksp_view

By default, we use the field names, but you can prevent this by specifying the fields by hand, so

-fieldsplit_0_pc_fieldsplit_0_fields 0
-fieldsplit_0_pc_fieldsplit_1_fields 1

should remove the 'u' and 'p' fieldnames. It is somewhat hacky, but I think easier to remember than some extra option.

Thanks,

Matt

Note that this is just an academic example (sorry for the low solver tolerances) to test the approach, consisting of a Stokes equation and a concentration equation (which is not even coupled to Stokes, just for testing). Completely analogous to https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html, I translate my ISs to a PetscSection, which is then supplied to a DMShell and assigned to a KSP. I am not so familiar with the code or how/why this works, but it seems to do so perfectly. I name my sections with petsc4py using

section.setFieldName(0, "u")
section.setFieldName(1, "p")
section.setFieldName(2, "c")

However, this is also reflected in the way I can access the fieldsplit options from the command line. My question is: Is there any way of not using the field names specified in Python but instead the index of the field as defined with "-pc_fieldsplit_0_fields 0, 1" and "-pc_fieldsplit_1_fields 2"? That is, instead of the prefix "fieldsplit_0_fieldsplit_u" I want to write "fieldsplit_0_fieldsplit_0", instead of "fieldsplit_0_fieldsplit_p" I want to use "fieldsplit_0_fieldsplit_1", and instead of "fieldsplit_c" I want to use "fieldsplit_1". Just changing the names of the fields to

section.setFieldName(0, "0")
section.setFieldName(1, "1")
section.setFieldName(2, "2")

obviously does not work as expected: it works for velocity and pressure, but not for the concentration; the prefix there is then "fieldsplit_2" and not "fieldsplit_1". In the docs, I have found https://petsc.org/main/manualpages/PC/PCFieldSplitSetFields/ which seems to suggest that the field name can potentially be supplied, but I don't see how to do so from the command line. Also, for the sake of completeness, I attach the output of the solve with "-ksp_view" below.

Thanks a lot in advance and best regards,
Sebastian

The output of ksp_view is the following:

KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-11, divergence=10000.
right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with ADDITIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-14, absolute=1e-30, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, factorization FULL Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 3.92639 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=4290, cols=4290 package used to perform factorization: petsc total: nonzeros=375944, allocated nonzeros=375944 using I-node routines: found 2548 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: seqaij rows=4290, cols=4290 total: nonzeros=95748, allocated nonzeros=95748 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-14, absolute=1e-30, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: icc out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using Manteuffel shift [POSITIVE_DEFINITE] matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqsbaij rows=561, cols=561 package used to perform factorization: petsc total: nonzeros=5120, allocated nonzeros=5120 block size is 1 linear system matrix followed by preconditioner matrix: Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: schurcomplement rows=561, cols=561 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=3729, allocated nonzeros=3729 total number of mallocs used during MatSetValues calls=0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=561, cols=4290 total: nonzeros=19938, allocated nonzeros=19938 total number of mallocs used during MatSetValues calls=0 not using I-node routines KSP of A00 KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 3.92639 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=4290, cols=4290 package used to perform factorization: petsc total: nonzeros=375944, allocated nonzeros=375944 using I-node routines: found 2548 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: seqaij rows=4290, cols=4290 total: nonzeros=95748, allocated nonzeros=95748 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 A01 Mat Object: 1 MPI processes type: seqaij rows=4290, cols=561 total: nonzeros=19938, allocated nonzeros=19938 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 Mat Object: 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=9679, allocated nonzeros=9679 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=4851, cols=4851 total: nonzeros=139353, allocated nonzeros=139353 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3830 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_c_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_c_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 4.24323 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=561, cols=561 package used to perform factorization: petsc total: nonzeros=15823, allocated nonzeros=15823 not using I-node routines linear system matrix = precond matrix: Mat Object: (fieldsplit_c_) 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=3729, allocated nonzeros=3729 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=5412, cols=5412 total: nonzeros=190416, allocated nonzeros=190416 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3833 nodes, limit used is 5

-- 
Dr. Sebastian Blauth
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Abteilung Transportvorgänge
Fraunhofer-Platz 1, 67663 Kaiserslautern
Telefon: +49 631 31600-4968
sebastian.blauth at itwm.fraunhofer.de
https://www.itwm.fraunhofer.de

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 7943 bytes Desc: not available
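Until the naming logic discussed by Barry further down this digest is reworked, the duplicated "fieldsplit_2" prefix above can be avoided by giving every split an explicit, unique name through PCFieldSplitSetIS() instead of the -pc_fieldsplit_i_fields options. A minimal C sketch; the split names "vel", "pres", "temp" and the index sets is0, is1, is2 are hypothetical placeholders, and in petsc4py the same names could be passed through PC.setFieldSplitIS(), if I recall the binding correctly.

    #include <petscksp.h>

    /* Name each split explicitly so the option prefixes cannot collide:
       they become -fieldsplit_vel_*, -fieldsplit_pres_*, -fieldsplit_temp_*.
       is0, is1, is2 are assumed to select the rows of each field group. */
    static PetscErrorCode SetNamedSplits(PC pc, IS is0, IS is1, IS is2)
    {
      PetscFunctionBeginUser;
      PetscCall(PCSetType(pc, PCFIELDSPLIT));
      PetscCall(PCFieldSplitSetIS(pc, "vel", is0));  /* fields 0,1 merged into one IS */
      PetscCall(PCFieldSplitSetIS(pc, "pres", is1)); /* field 2 */
      PetscCall(PCFieldSplitSetIS(pc, "temp", is2)); /* fields 3,4 merged into one IS */
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Since the names are chosen by the user, no two splits can end up with the same prefix.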
From knepley at gmail.com Thu Aug 1 08:24:47 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 1 Aug 2024 09:24:47 -0400
Subject: [petsc-users] How to combine different element types into a single DMPlex?
In-Reply-To: <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>
References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca> <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>
Message-ID:

On Thu, Aug 1, 2024 at 8:23 AM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> Hi Matthew,
> we have our own format that uses MPI I/O for the initial read; then we would like to do almost exactly what we do in ex47.c (https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html), except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel:
> 1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?)
> 2- Call partitioning (DMPlexDistribute)
> 3- Compute overlap (DMPlexDistributeOverlap)
> 4- Also compute the corresponding mapping between original element numbers and partitioned+overlapped elements (DMPlexNaturalToGlobalBegin/End)
> The main point here is overlap computation. And the big challenge is that we must always rely on the fact that no node ever reads the whole mesh: each node holds only a small part of it at the beginning, and from there we want parallel partitioning and overlap computation...
> It is now working fine for a mesh with a single type of element, but if we could turn ex47.c into an example with mixed element types, that would achieve what we would like to do!

We can do that. We only need to change step 1. I will put it on my TODO list. My thinking is the same as Vaclav, namely to replace numCorners with a PetscSection describing the cells[] array. Will that work for you?

Thanks,

Matt

> Thanks,
> Eric
> On 2024-07-31 22:09, Matthew Knepley wrote:
> [snip: the 2024-07-31 reply and the 2021 thread, quoted in full earlier in this digest]

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
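For illustration, a sketch of the caller's side of Matthew's proposal above: a PetscSection over the rank-local cells whose per-cell dof count is the cell's number of corners, playing the same role as the eptr offset array that ParMETIS_V3_PartMeshKway takes in the 2021 discussion. The PetscSection calls are existing API; the idea that a future DMPlexBuildFromCellListParallel() variant would consume such a section, and the function name below, are assumptions about the not-yet-written interface.

    #include <petscsection.h>

    /* Describe a mixed-element cells[] array: dof(c) = number of corners of
       cell c (e.g. 3 for a triangle, 4 for a quadrangle), and the section
       offsets say where each cell's vertex list starts inside cells[]. */
    static PetscErrorCode DescribeMixedCells(MPI_Comm comm, PetscInt numCells, const PetscInt numCorners[], PetscSection *cellSection)
    {
      PetscSection s;

      PetscFunctionBeginUser;
      PetscCall(PetscSectionCreate(comm, &s));
      PetscCall(PetscSectionSetChart(s, 0, numCells));
      for (PetscInt c = 0; c < numCells; ++c) PetscCall(PetscSectionSetDof(s, c, numCorners[c]));
      PetscCall(PetscSectionSetUp(s));
      /* Cell c's vertices then live in cells[off] .. cells[off+dof-1], with
         PetscSectionGetOffset(s, c, &off) and PetscSectionGetDof(s, c, &dof). */
      *cellSection = s;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Triangles and quadrangles can then be mixed freely in one flat cells[] array, since each cell's extent is given by the section rather than by a single global numCorners.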
From bsmith at petsc.dev Thu Aug 1 11:19:20 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 1 Aug 2024 12:19:20 -0400
Subject: [petsc-users] Question regarding naming of fieldsplit splits
In-Reply-To: References:
Message-ID: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev>

The current code is nonsensical. We can "fix" it in a patch to the release branch (but the fix may break some current usage) by changing

    if (nfields == 1) {
      PetscCall(PCFieldSplitSetIS(pc, fieldNames[ifields[0]], compField));
    } else {
      PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
      PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
    }

to

    PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
    PetscCall(PCFieldSplitSetIS(pc, splitname, compField));

but a "correct" fix will take some thought. The current model, which combines "inner" integer field names with outer field names (which can be anything, including integers), doesn't make sense.

> On Aug 1, 2024, at 9:19 AM, Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote:
> [snip: Sebastian's message of Aug 1, 08:19, quoted in full earlier in this digest]

-------------- next part --------------
An HTML attachment was scrubbed...
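For reference: under Barry's patched branch above, every split is named by its split index, so the three splits in Sebastian's example would get distinct prefixes and the collision disappears. The per-split options would then presumably read as follows (inferred from the patched code path, not taken from a released PETSc):

    -fieldsplit_0_ksp_type preonly
    -fieldsplit_0_pc_type jacobi
    -fieldsplit_1_ksp_type preonly
    -fieldsplit_1_pc_type jacobi
    -fieldsplit_2_ksp_type preonly
    -fieldsplit_2_pc_type jacobi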
> -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cAFFX-2D5mPl2LyzZdzgpGK1EsZCSss_e1OpkYmPPKSWI9R6M4qPL0ghruqbMv6bIKAYbSdHtCXmL68KAT1VwKQ$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cAFFX-2D5mPl2LyzZdzgpGK1EsZCSss_e1OpkYmPPKSWI9R6M4qPL0ghruqbMv6bIKAYbSdHtCXmL68KAT1VwKQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Thu Aug 1 15:40:08 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Thu, 1 Aug 2024 20:40:08 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) Message-ID: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Thu Aug 1 16:33:24 2024 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Thu, 1 Aug 2024 17:33:24 -0400 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca> <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca> Message-ID: <4fc58cb6-10c8-40be-9c6f-2470e630c7b6@giref.ulaval.ca> On 2024-08-01 09:24, Matthew Knepley wrote: > On Thu, Aug 1, 2024 at 8:23 AM Eric Chamberland > wrote: > > Hi Matthew, > > we have our own format that uses MPI I/O for the initial read, > then we would like to do almost exactly what we do in ex47.c > (https://urldefense.us/v3/__https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html__;!!G_uCfscf7eWS!Yl2BQr5WaJV41Sq7-i2xoMTi_ZGsBeThe3GPDdLjQmRtNXOdQJKpIg1Ec8-av5NcnywNIyr2D9ew6B-O8jC5ICPpWzcZ0mNNE3n3bYIy$ ) > except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel: > > 1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?) > > ... > > We can do that. We only need to change step 1. I will put it on my > TODO list. My thinking is the same as Vaclav, namely to replace > numCorners with a PetscSection describing the cells[] array. Will that > work for you? > Hi Matthew, That sounds fine for me! I can create a mixed mesh partition description so we add it to ex47.c... I'll ping @you in a MR for that... thanks a lot! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL:
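For reference, a rough sketch of the PetscSection idea from the exchange above: one section point per cell, with the dof recording how many corners that cell has. The PetscSection calls are standard PETSc API; the final build call is hypothetical, since the generalized interface is exactly what the MR is meant to design:

    PetscSection s;
    PetscInt     numCells   = 2;
    PetscInt     ncorners[] = {3, 4};                 /* a triangle and a quadrilateral */
    PetscInt     cells[]    = {0, 1, 2,  1, 3, 4, 2}; /* concatenated connectivity */

    PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, &s));
    PetscCall(PetscSectionSetChart(s, 0, numCells));
    for (PetscInt c = 0; c < numCells; ++c) PetscCall(PetscSectionSetDof(s, c, ncorners[c]));
    PetscCall(PetscSectionSetUp(s));
    /* Hypothetical generalized builder replacing the numCorners argument;
       the actual name and signature are still to be settled in the MR: */
    /* PetscCall(DMPlexBuildFromCellListParallel(dm, numCells, numVertices, NVertices, s, cells, &sfVert, NULL)); */
    PetscCall(PetscSectionDestroy(&s));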
From knepley at gmail.com Fri Aug 2 07:41:47 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 08:41:47 -0400 Subject: [petsc-users] Question regarding naming of fieldsplit splits In-Reply-To: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev> References: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev> Message-ID: On Thu, Aug 1, 2024 at 12:19 PM Barry Smith wrote: > > The current code is nonsensical. We can "fix" it in a patch to the release branch (but the fix may break some current usage) by changing
>
>     if (nfields == 1) {
>       PetscCall(PCFieldSplitSetIS(pc, fieldNames[ifields[0]], compField));
>     } else {
>       PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
>       PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
>     }
>
> to
>
>     PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
>     PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
>
> but a "correct" fix will take some thought. The current model, using a combination of some "inner" integer fieldnames and some outer fieldnames (which are whatever they are, including possible integers), doesn't make any sense. > My fix was going to be a flag that turns off names altogether. I think this will fix it for Sebastian, and is the only consistent fix I can think of. Thanks, Matt > > On Aug 1, 2024, at 9:19 AM, Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > Hello everyone, > I have a follow-up on my question. I noticed the following behavior. Let's assume I have 5 fields which I want to group with the following options: > -ksp_type fgmres > -ksp_max_it 1 > -ksp_monitor_true_residual > -ksp_view > -pc_type fieldsplit > -pc_fieldsplit_type multiplicative > -pc_fieldsplit_0_fields 0,1 > -pc_fieldsplit_1_fields 2 > -pc_fieldsplit_2_fields 3,4 > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type jacobi > -fieldsplit_2_ksp_type preonly > -fieldsplit_2_pc_type jacobi > Then, the first split is fine, but both the second and third splits get the same prefix, i.e., "fieldsplit_2". This is shown in the output of the ksp_view, which I attach below. > The first split gets the prefix because there is only a single field in it (and I choose as name the index), and the third split gets the name because it groups two other fields, so the "outer" name is taken. Is there any way to circumvent this, other than using custom names for the splits which are unique? > Thanks for your time and best regards, > Sebastian Blauth > > The output of "ksp_view" is the following: > KSP Object: 1 MPI process > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI process > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 3 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI process > type: seqaij > rows=243, cols=243 > total: nonzeros=4473, allocated nonzeros=4473 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 86 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_2_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_2_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_2_) 1 MPI process > type: seqaij > rows=81, cols=81 > total: nonzeros=497, allocated nonzeros=497 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Split number 2 Defined by IS > KSP Object: (fieldsplit_2_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_2_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_2_) 1 MPI process > type: seqaij > rows=243, cols=243 > total: nonzeros=4473, allocated nonzeros=4473 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 85 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: seqaij > rows=567, cols=567 > total: nonzeros=24353, allocated nonzeros=24353 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 173 nodes, limit used is 5 > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* petsc-users *On Behalf Of *Blauth, Sebastian > *Sent:* Tuesday, July 2, 2024 11:47 AM > *To:* Matthew Knepley > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > Hi Matt, > > thanks for the answer and clarification. Then I'll work around this issue in python, where I set the options. > > Best, > Sebastian > > -- > Dr.
Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* Matthew Knepley > *Sent:* Monday, July 1, 2024 4:30 PM > *To:* Blauth, Sebastian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > On Mon, Jul 1, 2024 at 9:48 AM Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > > Dear Matt, > > thanks a lot for your help. Unfortunately, for me these extra options do not have any effect, I still get the "u" and "p" fieldnames. Also, this would not help me to get rid of the "c" fieldname; on that level of the fieldsplit I am basically using your approach already, and still it does show up. The output of the -ksp_view is unchanged, so that I do not attach it here again. Maybe I misunderstood you? > > > Oh, we make an exception for single fields, since we think you would want to use the name. I have to make an extra option to shut off naming. > > Thanks, > > Matt > > > Thanks for the help and best regards, > Sebastian > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* Matthew Knepley > *Sent:* Monday, July 1, 2024 2:27 PM > *To:* Blauth, Sebastian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > On Fri, Jun 28, 2024 at 4:05 AM Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > > Hello everyone, > > I have a question regarding the naming convention using PETSc's PCFieldSplit. I have been following https://urldefense.us/v3/__https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0Qyn5DYX$ to create a DMShell with FEniCS in order to customize PCFieldSplit for my application. > I am using the following options, which work nicely for me: > > -ksp_type fgmres > -pc_type fieldsplit > -pc_fieldsplit_0_fields 0, 1 > -pc_fieldsplit_1_fields 2 > -pc_fieldsplit_type additive > -fieldsplit_0_ksp_type fgmres > -fieldsplit_0_pc_type fieldsplit > -fieldsplit_0_pc_fieldsplit_type schur > -fieldsplit_0_pc_fieldsplit_schur_fact_type full > -fieldsplit_0_pc_fieldsplit_schur_precondition selfp > -fieldsplit_0_fieldsplit_u_ksp_type preonly > -fieldsplit_0_fieldsplit_u_pc_type lu > -fieldsplit_0_fieldsplit_p_ksp_type cg > -fieldsplit_0_fieldsplit_p_ksp_rtol 1e-14 > -fieldsplit_0_fieldsplit_p_ksp_atol 1e-30 > -fieldsplit_0_fieldsplit_p_pc_type icc > -fieldsplit_0_ksp_rtol 1e-14 > -fieldsplit_0_ksp_atol 1e-30 > -fieldsplit_0_ksp_monitor_true_residual > -fieldsplit_c_ksp_type preonly > -fieldsplit_c_pc_type lu > -ksp_view > > > By default, we use the field names, but you can prevent this by specifying the fields by hand, so > > -fieldsplit_0_pc_fieldsplit_0_fields 0 > -fieldsplit_0_pc_fieldsplit_1_fields 1 > > should remove the 'u' and 'p' fieldnames. It is somewhat hacky, but I think easier to remember than some extra option. > > Thanks, > > Matt > > > Note that this is just an academic example (sorry for the low solver tolerances) to test the approach, consisting of a Stokes equation and some concentration equation (which is not even coupled to Stokes, just for testing). > Completely analogous to https://urldefense.us/v3/__https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0Qyn5DYX$ , I translate my ISs to a PETSc Section, which is then supplied to a DMShell and assigned to a KSP. I am not so familiar with the code or how / why this works, but it seems to do so perfectly. I name my sections with petsc4py using > > section.setFieldName(0, "u") > section.setFieldName(1, "p") > section.setFieldName(2, "c") > > However, this is also reflected in the way I can access the fieldsplit options from the command line. My question is: Is there any way of not using the FieldNames specified in python but use the index of the field as defined with "-pc_fieldsplit_0_fields 0, 1" and "-pc_fieldsplit_1_fields 2", i.e., instead of the prefix "fieldsplit_0_fieldsplit_u"
I want to write "fieldsplit_0_fieldsplit_0", instead of "fieldsplit_0_fieldsplit_p" I want to use "fieldsplit_0_fieldsplit_1", and instead of "fieldsplit_c" I want to use "fieldsplit_1". Just changing the names of the fields to > > section.setFieldName(0, "0") > section.setFieldName(1, "1") > section.setFieldName(2, "2") > > does obviously not work as expected, as it works for velocity and pressure, but not for the concentration: the prefix there is then "fieldsplit_2" and not "fieldsplit_1". In the docs, I have found https://urldefense.us/v3/__https://petsc.org/main/manualpages/PC/PCFieldSplitSetFields/__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0X9GdD2a$ which seems to suggest that the fieldname can potentially be supplied, but I don't see how to do so from the command line. Also, for the sake of completeness, I attach the output of the solve with -ksp_view below. > > Thanks a lot in advance and best regards, > Sebastian > > > The output of ksp_view is the following: > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-11, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with ADDITIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-14, absolute=1e-30, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 3.92639 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=4290 > package used to perform factorization: petsc > total: nonzeros=375944, allocated nonzeros=375944 > using I-node routines: found 2548 nodes, limit used is > 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: seqaij > rows=4290, cols=4290 > total: nonzeros=95748, allocated nonzeros=95748 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3287 nodes, limit used is 5 > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-14, absolute=1e-30, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: icc > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using Manteuffel shift [POSITIVE_DEFINITE] > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqsbaij > rows=561, cols=561 > package used to perform factorization: petsc > total: nonzeros=5120, allocated nonzeros=5120 > block size is 1 > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: schurcomplement > rows=561, cols=561 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=3729, allocated nonzeros=3729 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=4290 > total: nonzeros=19938, allocated nonzeros=19938 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP of A00 > KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 3.92639 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=4290 > package used to perform factorization: petsc > total: nonzeros=375944, allocated nonzeros=375944 > using I-node routines: found 2548 nodes, limit > used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: seqaij > rows=4290, cols=4290 > total: nonzeros=95748, allocated nonzeros=95748 > total number of mallocs used during MatSetValues > calls=0 > using I-node routines: found 3287 nodes, limit used > is 5 > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=561 > total: nonzeros=19938, allocated nonzeros=19938 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3287 nodes, limit used is > 5 > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=9679, allocated nonzeros=9679 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=4851, cols=4851 > total: nonzeros=139353, allocated nonzeros=139353 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3830 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_c_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_c_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 4.24323 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=561 > package used to perform factorization: petsc > total: nonzeros=15823, allocated nonzeros=15823 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: (fieldsplit_c_) 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=3729, allocated nonzeros=3729 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=5412, cols=5412 > total: nonzeros=190416, allocated nonzeros=190416 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3833 nodes, limit used is 5 > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut f?r > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorg?nge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 2 08:58:10 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 09:58:10 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: > > Dear all, > > I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. > > This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? > > I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? > > This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now.
We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt > Thank you, > Miguel > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fTP6CcczHauSge4FV5cI88RqYPhXISVNPhCpwU5IjmOea9z2VEtIlwEoPSlg5aJbEQzO0IQ8CIvAywPYjOAG$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 10:15:40 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 15:15:40 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? Also, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector (a sketch of this pattern follows below). 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. 4) Do the necessary computations using the vectors created with VecCreateGhost. Thanks, Miguel
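For concreteness, the ghost-vector pattern in step 2) is standard PETSc usage; a minimal sketch (the local size and ghost indices below are invented for illustration):

    Vec      xg;
    PetscInt n = 128, nghost = 2;
    PetscInt ghosts[2] = {300, 301}; /* hypothetical global indices owned by another rank */

    PetscCall(VecCreateGhost(PETSC_COMM_WORLD, n, PETSC_DETERMINE, nghost, ghosts, &xg));
    /* ... fill the owned entries ... */
    PetscCall(VecGhostUpdateBegin(xg, INSERT_VALUES, SCATTER_FORWARD));
    PetscCall(VecGhostUpdateEnd(xg, INSERT_VALUES, SCATTER_FORWARD));
    /* The local form now sees the owned entries followed by the ghost entries. */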
On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bwMUVbfsEDURwiD6tV7_-3EXq7Aogacpt43DZLysMRG2mTWcoK-ax5Ad2xtFGWdBZWNR_QnyvEOYuHqbu4PhgA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 2 17:33:02 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 18:33:02 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: > Thank you Matt for your time, > > What you describe seems to me the ideal approach. > > 1) Add a particle field 'ghost' that identifies ghost vs owned particles. > I think it needs options OWNED, OVERLAP, and GHOST > > This means, locally, I need to allocate Nlocal + ghost particles > (duplicated) for my model? > I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` (a code sketch of this recipe follows below). > If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? > Also, as an alternative, what about: > 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. > 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector.
> 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. > 4) Do the necessary computations using the vectors created with VecCreateGhost. > This is essentially what Migrate() does. I was trying to reuse the code. > > Thanks, > > Matt > > >> Thanks, >> Miguel >> >> On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: >> >> On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: >> >>> Dear all, >>> >>> I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. >>> >>> This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? >>> >>> I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? >>> >>> >> This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles >> >> a) Mark chosen particles as OVERLAP >> >> b) Change rank to process we are sending to >> >> 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag >> >> 4) Mark OVERLAP particles as GHOST when they arrive >> >> There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.
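A minimal sketch of the owner-rank recipe above. It assumes the swarm carries a registered "owner rank" integer field, and isHalo() is a user-supplied test, since halo detection is application specific as noted; the DMSwarm calls are standard API:

    PetscInt *rank, *orank, Np;

    PetscCall(DMSwarmGetLocalSize(sw, &Np));
    PetscCall(DMSwarmGetField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rank));
    PetscCall(DMSwarmGetField(sw, "owner rank", NULL, NULL, (void **)&orank));
    for (PetscInt p = 0; p < Np; ++p) {
      PetscInt nbr;
      /* isHalo() decides whether particle p lies in the halo of neighbor nbr */
      if (isHalo(sw, p, &nbr)) rank[p] = nbr; /* ship a ghost copy to nbr */
      /* orank[p] keeps the owning rank, so received copies stay identifiable */
    }
    PetscCall(DMSwarmRestoreField(sw, "owner rank", NULL, NULL, (void **)&orank));
    PetscCall(DMSwarmRestoreField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rank));
    PetscCall(DMSwarmMigrate(sw, PETSC_FALSE)); /* PETSC_FALSE keeps the originals */

After migration, the ghost particles on the receiving side are exactly those whose "owner rank" differs from the local rank.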
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bV_q4tHUc2Lno7u6JeojubaRmzQjKJDlVFxnATMOtT6Soncx1isPiFXZBhekxMOgHSdyaz_fLrVfbGZhZdDQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 18:15:01 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 23:15:01 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying: 1. create a field define a field called "owner rank" for each particle. 2. Identify the phantom particles and modify the internal variable defined by the DMSwarmField_rank variable. 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. 4. Then, once the calculations are done, rename the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. Thank you, Miguel On Aug 2, 2024, at 5:33?PM, Matthew Knepley wrote: On Fri, Aug 2, 2024 at 11:15?AM MIGUEL MOLINOS PEREZ > wrote: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` If that so, how to do the communication between the ghost particles living in the rank i and their ?real? counterpart in the rank j. Algo, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global index of the neighbors particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector. 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transference of data between a vector obtained with DMSwarmCreateGlobalVectorFromField 4) Do necessary computations using the vectors created with VecCreateGhost. This is essentially what Migrate() does. I was trying to reuse the code. Thanks, Matt Thanks, Miguel On Aug 2, 2024, at 8:58?AM, Matthew Knepley > wrote: On Thu, Aug 1, 2024 at 4:40?PM MIGUEL MOLINOS PEREZ > wrote: This Message Is From an External Sender This message came from outside your organization. Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. 
This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YDFG5MS95W4ljXEG0E1Yev9-PFcRmx0YhN98aKOS9oNQtJv4IZo87H1hNoJmE6kU1F2wiGvHxReC2jqH5qCJNQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YDFG5MS95W4ljXEG0E1Yev9-PFcRmx0YhN98aKOS9oNQtJv4IZo87H1hNoJmE6kU1F2wiGvHxReC2jqH5qCJNQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 18:47:30 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 23:47:30 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Message-ID: <63A1940C-EE12-48F1-8196-A3CE7C81DCA1@us.es> Sorry, I forgot to ask this: Is this procedure safe with overlapping ghost particles? Like, for instance, shared corners between 4 ranks in 2D. Thanks, Miguel On Aug 2, 2024, at 6:14 PM, MIGUEL MOLINOS PEREZ wrote: Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying: 1. Create a field called "owner rank" for each particle. 2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable. 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. 4.
Then, once the calculations are done, reset the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. Thank you, Miguel On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote: On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? Also, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector. 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. 4) Do the necessary computations using the vectors created with VecCreateGhost. This is essentially what Migrate() does. I was trying to reuse the code. Thanks, Matt Thanks, Miguel On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts.
I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Aug 4 03:30:05 2024 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 4 Aug 2024 04:30:05 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Message-ID: On Fri, Aug 2, 2024 at 7:15 PM MIGUEL MOLINOS PEREZ wrote: > Thanks again Matt, that makes a lot more sense !! > > Just to check that we are on the same page. You are saying: > > 1. Create a field called "owner rank" for each particle. > > 2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable. > > 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. > > 4. Then, once the calculations are done, reset the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. > I don't think we need this last step. We can just remove those ghost particles for the next step I think (a sketch of that cleanup follows below).
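For instance, the cleanup could look like this sketch, reusing the "owner rank" field from above (DMSwarmRemovePointAtIndex is standard API; removing from the back keeps the remaining indices valid):

    PetscInt   *orank, Np, *toRemove, nRemove = 0;
    PetscMPIInt myrank;

    PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)sw), &myrank));
    PetscCall(DMSwarmGetLocalSize(sw, &Np));
    PetscCall(PetscMalloc1(Np, &toRemove));
    PetscCall(DMSwarmGetField(sw, "owner rank", NULL, NULL, (void **)&orank));
    /* Collect the indices of received ghost copies (owner rank != local rank) */
    for (PetscInt p = 0; p < Np; ++p) if (orank[p] != (PetscInt)myrank) toRemove[nRemove++] = p;
    PetscCall(DMSwarmRestoreField(sw, "owner rank", NULL, NULL, (void **)&orank));
    /* Remove from the back so earlier indices are not invalidated */
    for (PetscInt i = nRemove - 1; i >= 0; --i) PetscCall(DMSwarmRemovePointAtIndex(sw, toRemove[i]));
    PetscCall(PetscFree(toRemove));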
Thanks, Matt > Thank you, > Miguel > > On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote: > > On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: >> Thank you Matt for your time, >> >> What you describe seems to me the ideal approach. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? >> > I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. > So then we > > 1) Setup the non-overlapping particles > > 2) Identify the halo particles > > 3) Change the `rank`, but not the `owner rank` > > 4) Call DMSwarmMigrate() > > Now we can identify ghost particles by the `owner rank` > > >> If that is so, how to do the communication between the ghost particles >> living in rank i and their "real" counterpart in rank j? >> >> Also, as an alternative, what about: >> 1) Use an IS tag which contains, for each rank, a list of the global >> indices of the neighbor particles outside of the rank. >> 2) Use VecCreateGhost to create a new vector which contains extra local >> space for the ghost components of the vector. >> 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the >> transfer of data between vectors obtained with >> DMSwarmCreateGlobalVectorFromField >> 4) Do the necessary computations using the vectors created with >> VecCreateGhost. >> > This is essentially what Migrate() does. I was trying to reuse the code. > > Thanks, > > Matt > >> Thanks, >> Miguel >> >> On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: >> >> On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: >> >>> Dear all, >>> >>> I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. >>> >>> This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? >>> >>> I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? >>> >> This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles >> >> a) Mark chosen particles as OVERLAP >> >> b) Change rank to process we are sending to >> >> 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag >> >> 4) Mark OVERLAP particles as GHOST when they arrive >> >> There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.
>> Thanks, >> Matt >>> Thank you, >>> Miguel >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.murusidze at math.msu.ru Mon Aug 5 03:43:09 2024 From: konstantin.murusidze at math.msu.ru (Konstantin Murusidze) Date: Mon, 05 Aug 2024 11:43:09 +0300 Subject: [petsc-users] (no subject) In-Reply-To: References: <441311721646454@mail.yandex.ru> Message-ID: <234711722846959@mail.yandex.ru> An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 5 10:50:46 2024 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 5 Aug 2024 11:50:46 -0400 Subject: [petsc-users] (no subject) In-Reply-To: <234711722846959@mail.yandex.ru> References: <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru> Message-ID: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> PCFactorSetShiftType() only works for certain preconditioners, such as ILU. If the matrix is not symmetric, it also doesn't help. Are you sure the matrix is symmetric positive definite? > On Aug 5, 2024, at 4:43 AM, Konstantin Murusidze wrote: > After this procedure I saw "Linear solve did not converge due to DIVERGED_INDEFINITE_MAT". Then, as shown on https://urldefense.us/v3/__https://petsc.org/main/manualpages/KSP/KSP_DIVERGED_INDEFINITE_PC/__;!!G_uCfscf7eWS!dUAb2JB1g_tVNhPsjMe6qIzyNarGnl8HzZh2cOn_IkzWnHD5QJPY06dL8iwmKAJH2A_CYwssOq3IcWT2ButJOns$ , I added the line PetscCall(PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE)); but nothing changed and I still have divergence. > > 22.07.2024, 17:22, "Barry Smith": > Run with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view to see why it is stopping at 38 iterations. > Barry > > On Jul 22, 2024, at 7:37 AM, Konstantin Murusidze wrote: > Good afternoon. I am a student at the Faculty of Mathematics and for my course work I need to solve an SLAE to a relative accuracy of 1e-8 or better. To do this, I called PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));. But in the end, only 38 iterations were made and the relative norm ||Ax-b||/||b|| turns out to be 4.54011. If you reply to my email, I can give you more information about the solver settings. > -------------- next part -------------- An HTML attachment was scrubbed... URL:
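Following up on the debugging advice in this thread, a minimal way to interrogate the solver after the solve, in code rather than via command-line options (variable names are hypothetical; the KSP calls are standard PETSc API):

    KSPConvergedReason reason;
    PetscInt           its;

    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPGetConvergedReason(ksp, &reason));
    PetscCall(KSPGetIterationNumber(ksp, &its));
    if (reason < 0) PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Solve diverged (%s) after %" PetscInt_FMT " iterations\n", KSPConvergedReasons[reason], its));

A negative reason such as KSP_DIVERGED_INDEFINITE_MAT means the 38 iterations above ended in divergence rather than convergence, which is consistent with the large residual norm reported.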
From srvenkat at utexas.edu Mon Aug 5 12:10:26 2024 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Mon, 5 Aug 2024 12:10:26 -0500 Subject: [petsc-users] Read/Write large dense matrix Message-ID: I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32 bit integer value, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use? The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them to an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory). Thanks, Sreeram -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 5 12:25:27 2024 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 Aug 2024 13:25:27 -0400 Subject: [petsc-users] Read/Write large dense matrix In-Reply-To: References: Message-ID: On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote: > I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32 bit integer value, I get an error (MPI buffer size too big) when
From srvenkat at utexas.edu  Mon Aug 5 12:26:45 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Mon, 5 Aug 2024 12:26:45 -0500
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID:

I do have 64-bit indices turned on. The problem, I think, is that PetscMPIInt is always a 32-bit int, and that's what's overflowing.

On Mon, Aug 5, 2024 at 12:25 PM Matthew Knepley wrote:

> On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote:
>
>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>
> I think you need to reconfigure with --with-64-bit-indices.
>
> Thanks,
>
> Matt
>
>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>
>> Thanks,
>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Mon Aug 5 12:40:43 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 5 Aug 2024 13:40:43 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID:

On Mon, Aug 5, 2024 at 1:26 PM Sreeram R Venkat wrote:

> I do have 64-bit indices turned on. The problem, I think, is that PetscMPIInt is always a 32-bit int, and that's what's overflowing.

We should be using the large count support from MPI. However, it appears we forgot somewhere. Would it be possible to construct a simple example that I can run and find the error? You should be able to just create a dense matrix of zeros with the correct size.

  Thanks,

     Matt
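A reproducer along these lines should do it (an untested sketch; n is chosen so the entry count passes 2^31, which needs roughly 20 GB of aggregate memory for real scalars):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    n = 50000; /* 5e4 x 5e4 dense: 2.5e9 entries > 2^31 - 1 */
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, NULL, &A));
  PetscCall(MatZeroEntries(A));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.dat", FILE_MODE_WRITE, &viewer));
  PetscCall(PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE)); /* store dense on disk */
  PetscCall(MatView(A, viewer));
  PetscCall(PetscViewerPopFormat(viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}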
> On Mon, Aug 5, 2024 at 12:25 PM Matthew Knepley wrote:
>
>> On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote:
>>
>>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>
>> I think you need to reconfigure with --with-64-bit-indices.
>>
>> Thanks,
>>
>> Matt
>>
>>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>>
>>> Thanks,
>>> Sreeram

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a-sxRcKHh_nd4gLTjiXZxx0nYU4_lvIBL8xVFhNVrOwEBeVFcnTWMFNkyHuJ15bZDhKacKWF1t8swumsFxgH$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Aug 5 13:19:37 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 5 Aug 2024 14:19:37 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>

   By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.

   To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:

PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
MatView(mat, viewer);
PetscViewerPopFormat(viewer);
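   Reading it back in later works the same way (an untested sketch; it assumes the file dense.dat was written in the native dense format as above):

Mat         B;
PetscViewer v;

PetscCall(MatCreate(PETSC_COMM_WORLD, &B));
PetscCall(MatSetType(B, MATDENSE)); /* tell MatLoad() to expect/keep a dense matrix */
PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.dat", FILE_MODE_READ, &v));
PetscCall(MatLoad(B, v));
PetscCall(PetscViewerDestroy(&v));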
> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>
> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>
> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>
> Thanks,
> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From konstantin.murusidze at math.msu.ru  Mon Aug 5 13:27:51 2024
From: konstantin.murusidze at math.msu.ru (Konstantin Murusidze)
Date: Mon, 05 Aug 2024 21:27:51 +0300
Subject: [petsc-users] (no subject)
In-Reply-To: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev>
References: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru>
Message-ID: <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>

An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Aug 5 14:33:44 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 5 Aug 2024 15:33:44 -0400
Subject: [petsc-users] (no subject)
In-Reply-To: <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>
References: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru> <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>
Message-ID:

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPCR/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1tbaF0nc$
The preconditioner must be POSITIVE-DEFINITE and the operator POSITIVE-SEMIDEFINITE.

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPMINRES/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1rmWgN0k$
The operator and the preconditioner must be symmetric and the preconditioner must be positive definite for this method.

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPSYMMLQ/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1uMjjdeU$
The preconditioner must be POSITIVE-DEFINITE.

   Of course, you can always use KSPGMRES or KSPBCGS.
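   For a symmetric indefinite matrix, MINRES is usually the first thing to try; a minimal, untested sketch (A, b, and x stand for your operator, right-hand side, and solution):

KSP ksp;
PC  pc;

PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
PetscCall(KSPSetOperators(ksp, A, A));
PetscCall(KSPSetType(ksp, KSPMINRES));
PetscCall(KSPGetPC(ksp, &pc));
PetscCall(PCSetType(pc, PCJACOBI)); /* preconditioner must stay positive definite; Jacobi qualifies only if the diagonal is positive */
PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));
PetscCall(KSPSetFromOptions(ksp));
PetscCall(KSPSolve(ksp, b, x));

   The same thing can be selected from the command line with -ksp_type minres -pc_type jacobi.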
> On Aug 5, 2024, at 2:27 PM, Konstantin Murusidze wrote:
>
> I know that the matrix is symmetric, but it isn't positive definite. Is it possible to solve such a problem, maybe with another solver or preconditioner?
>
> --
> Sent from mobile Yandex Mail
>
> 05.08.2024, 18:51, "Barry Smith" :
>
> PCFactorSetShiftType() only works for certain preconditioners, such as ILU.
>
> If the matrix is not symmetric it also doesn't help.
>
> Are you sure the matrix is symmetric positive definite?
>
> > On Aug 5, 2024, at 4:43 AM, Konstantin Murusidze wrote:
> >
> > After this procedure I saw "Linear solution did not converge due to DIVERGED_INDEFINITE_MAT". Then, as shown on https://urldefense.us/v3/__https://petsc.org/main/manualpages/KSP/KSP_DIVERGED_INDEFINITE_PC/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1OqyHDu0$ , I wrote the line PetscCall(PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE)); but nothing changed and I still have divergence.
> >
> > 22.07.2024, 17:22, "Barry Smith" :
> >
> > Run with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view to see why it is stopping at 38 iterations.
> >
> > Barry
> >
> > On Jul 22, 2024, at 7:37 AM, Konstantin Murusidze wrote:
> >
> > Good afternoon. I am a student at the Faculty of Mathematics, and for my course work I need to solve an SLAE to a relative accuracy of 1e-8 or better. To do this, I added the call PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));. But in the end, only 38 iterations were made and the relative norm ||Ax-b||/||b|| turns out to be 4.54011. If you reply to my email, I can give you more information about the solver settings.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_favicon.png
Type: image/png
Size: 1172 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_favicon.png
Type: image/png
Size: 1172 bytes
Desc: not available
URL: 

From srvenkat at utexas.edu  Mon Aug 5 20:19:51 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Mon, 5 Aug 2024 20:19:51 -0500
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
References: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
Message-ID:

Here's an example code that should replicate the error: https://urldefense.us/v3/__https://github.com/s769/petsc-test/tree/master__;!!G_uCfscf7eWS!dWU2gJCvykWqg3TTfkkQOsW3q32Sny3r399zmyr6MCiJQh6_dH-T3IktQLg9fbvc4okbbHP2koQZkzL0fCjOTrC90w$ .

I tried using PETSC_VIEWER_NATIVE, but I still get the error. I have a situation where the matrix is created on PETSC_COMM_WORLD but only has entries on the first process due to some layout constraints elsewhere in the program. The nodes I'm running on should have more than enough memory to hold the entire matrix on one process, and the error I get is not an out-of-memory error anyway.

Let me know if you aren't able to build the example.

I noticed that if I fully distribute the matrix over all processes, then the save works fine. Is there some way to do that after I create the matrix but before saving it?

On Mon, Aug 5, 2024 at 1:19 PM Barry Smith wrote:

> By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.
>
> To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:
>
> PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
> MatView(mat, viewer);
> PetscViewerPopFormat(viewer);
>
>> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>>
>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>
>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>
>> Thanks,
>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mmolinos at us.es  Tue Aug 6 21:22:29 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Wed, 7 Aug 2024 02:22:29 +0000
Subject: [petsc-users] Ghost particles for DMSWARM (or similar)
In-Reply-To:
References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es>
Message-ID:

Thanks Matt, I think I'll start by making a small program as a proof of concept. Then, if it works, I'll implement it in my code and I'll be happy to share it too :-)

Miguel

On Aug 4, 2024, at 3:30 AM, Matthew Knepley wrote:

On Fri, Aug 2, 2024 at 7:15 PM MIGUEL MOLINOS PEREZ wrote:

Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying:

1. Define a field called "owner rank" for each particle.
2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable.
3. Call DMSwarmMigrate(*,PETSC_FALSE), and do the calculations using the new local vector, including the ghost particles.
4. Then, once the calculations are done, restore the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again.

I don't think we need this last step. We can just remove those ghost particles for the next step, I think.
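The marking step could look roughly like this (an untested sketch; "owner_rank" is a field you would register yourself with DMSwarmRegisterPetscDatatypeField(), and IsHaloParticle()/NeighborRank() stand in for your application-specific halo test):

PetscInt   *rankfield, *owner, npoints, p;
PetscMPIInt myrank;

PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)sw), &myrank));
PetscCall(DMSwarmGetLocalSize(sw, &npoints));
PetscCall(DMSwarmGetField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rankfield));
PetscCall(DMSwarmGetField(sw, "owner_rank", NULL, NULL, (void **)&owner));
for (p = 0; p < npoints; ++p) {
  owner[p] = myrank;                                             /* remember who owns each particle */
  if (IsHaloParticle(sw, p)) rankfield[p] = NeighborRank(sw, p); /* send a copy there */
}
PetscCall(DMSwarmRestoreField(sw, "owner_rank", NULL, NULL, (void **)&owner));
PetscCall(DMSwarmRestoreField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rankfield));
PetscCall(DMSwarmMigrate(sw, PETSC_FALSE)); /* PETSC_FALSE keeps the originals */
/* after migration, particles with owner[p] != myrank are the received ghosts */

Then the ghosts can be dropped before the next step by deleting every particle whose owner rank differs from the local rank.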
  Thanks,

     Matt

Thank you,
Miguel

On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote:

On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote:

Thank you Matt for your time,

What you describe seems to me the ideal approach.

1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST

Does this mean that, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model?

I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles.

Shoot! That marking will not work, since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we

1) Set up the non-overlapping particles
2) Identify the halo particles
3) Change the `rank`, but not the `owner rank`
4) Call DMSwarmMigrate()

Now we can identify ghost particles by the `owner rank`.

If that is so, how do we do the communication between the ghost particles living on rank i and their "real" counterparts on rank j? Also, as an alternative, what about:

1) Using an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside the rank.
2) Using VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector.
3) Using VecScatterCreate, VecScatterBegin, and VecScatterEnd to transfer data between vectors obtained with DMSwarmCreateGlobalVectorFromField.
4) Doing the necessary computations using the vectors created with VecCreateGhost.

This is essentially what Migrate() does. I was trying to reuse the code.

  Thanks,

     Matt

Thanks,
Miguel

On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote:

On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote:

Dear all,

I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In MD simulations we evaluate on each particle (atom) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures, like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something?

I think this could be done by applying a buffer region, exploiting the background DMDA mesh that I already use to do domain decomposition, then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions.
This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want.

1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST

2) At some interval, identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles:

  a) Mark chosen particles as OVERLAP

  b) Change rank to the process we are sending to

3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag

4) Mark OVERLAP particles as GHOST when they arrive

There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.

  Thanks,

     Matt

Thank you,
Miguel

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zaa9esHaq00hIlNsNZjjG-L4CQ2QxKeoyoqvAF4909vtCxKveI1Fh83DKxZnH24E5ToHwzs69i5yzVZlQGO6fA$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Tue Aug 6 22:23:30 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 6 Aug 2024 23:23:30 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
Message-ID: <2BE6C49A-B4CB-48A8-AA78-056753B8539F@petsc.dev>

   I have removed an unnecessary PetscMPIIntCast() on MPI rank zero that was causing your test code to fail. See https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7747__;!!G_uCfscf7eWS!fyVpTZyH7SS1nVHiZBR-6MSWa6uJ0mSExyg1aNmU4PyWfukw1682_dX9rwUKstiGP6Z8i22L4pmElEi9qsfHA6o$

   Thanks for reporting the problem.

   Barry

   BTW: I don't think we have code to distribute a dense matrix that has values only on one rank to all the ranks. The code would essentially be a combination of MatView_Dense_Binary/MatLoad_Dense_Binary with PetscViewerBinaryWriteReadAll, without the saving and reading from disk. It is likely relatively easy to fix the dense matrix view/load with the native format so that it does not need 64-bit indices to work with your test code.
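   In the meantime, something like this could spread a rank-0 MATSEQDENSE across a communicator (a rough, untested sketch; it inserts the rows in chunks so the value stash stays bounded):

static PetscErrorCode DistributeDenseFromRankZero(Mat Aseq, MPI_Comm comm, Mat *Adist)
{
  PetscMPIInt        rank;
  PetscInt           M = 0, N = 0, lda = 0, start, i, j;
  const PetscScalar *a = NULL;
  PetscScalar       *rowvals = NULL;
  PetscInt          *cols = NULL;
  const PetscInt     chunk = 1024; /* rows inserted between flushes */

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  if (rank == 0) PetscCall(MatGetSize(Aseq, &M, &N));
  PetscCallMPI(MPI_Bcast(&M, 1, MPIU_INT, 0, comm));
  PetscCallMPI(MPI_Bcast(&N, 1, MPIU_INT, 0, comm));
  PetscCall(MatCreateDense(comm, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, Adist));
  if (rank == 0) {
    PetscCall(MatDenseGetLDA(Aseq, &lda));
    PetscCall(MatDenseGetArrayRead(Aseq, &a));
    PetscCall(PetscMalloc2(N, &rowvals, N, &cols));
    for (j = 0; j < N; j++) cols[j] = j;
  }
  for (start = 0; start < M; start += chunk) { /* every rank runs the same number of iterations */
    if (rank == 0) {
      PetscInt end = PetscMin(start + chunk, M);
      for (i = start; i < end; i++) {
        for (j = 0; j < N; j++) rowvals[j] = a[i + j * lda]; /* dense storage is column-major */
        PetscCall(MatSetValues(*Adist, 1, &i, N, cols, rowvals, INSERT_VALUES));
      }
    }
    PetscCall(MatAssemblyBegin(*Adist, MAT_FLUSH_ASSEMBLY)); /* collective: ships the stashed rows */
    PetscCall(MatAssemblyEnd(*Adist, MAT_FLUSH_ASSEMBLY));
  }
  if (rank == 0) {
    PetscCall(MatDenseRestoreArrayRead(Aseq, &a));
    PetscCall(PetscFree2(rowvals, cols));
  }
  PetscCall(MatAssemblyBegin(*Adist, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*Adist, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}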
> On Aug 5, 2024, at 9:19 PM, Sreeram R Venkat wrote:
>
> Here's an example code that should replicate the error: https://urldefense.us/v3/__https://github.com/s769/petsc-test/tree/master__;!!G_uCfscf7eWS!fyVpTZyH7SS1nVHiZBR-6MSWa6uJ0mSExyg1aNmU4PyWfukw1682_dX9rwUKstiGP6Z8i22L4pmElEi9y0M6fi0$ .
>
> I tried using PETSC_VIEWER_NATIVE, but I still get the error. I have a situation where the matrix is created on PETSC_COMM_WORLD but only has entries on the first process due to some layout constraints elsewhere in the program. The nodes I'm running on should have more than enough memory to hold the entire matrix on one process, and the error I get is not an out-of-memory error anyway.
>
> Let me know if you aren't able to build the example.
>
> I noticed that if I fully distribute the matrix over all processes, then the save works fine. Is there some way to do that after I create the matrix but before saving it?
>
> On Mon, Aug 5, 2024 at 1:19 PM Barry Smith wrote:
>>
>> By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.
>>
>> To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:
>>
>> PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
>> MatView(mat, viewer);
>> PetscViewerPopFormat(viewer);
>>
>>> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>>>
>>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>>
>>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>>
>>> Thanks,
>>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pentaerythrotol at gmail.com  Thu Aug 8 09:46:19 2024
From: pentaerythrotol at gmail.com (ZHOU Yingjie)
Date: Thu, 8 Aug 2024 22:46:19 +0800
Subject: [petsc-users] A question about loading data from .h5 file
Message-ID:

Dear Sir or Madam,

I ran into a problem loading data from an HDF5 file. Attached is a simple test program.

An introduction to the program is as follows:
I create a 3D DMDA object (50*50*50), and create 2 global vectors x and y with the DMDA, then I write the vector x into the file test.h5.
When I try to load the vector y from test.h5, I get the error "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000, which I've checked in Python.

Thank you for your help!

Best regards
ZHOU Yingjie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-c-code
Size: 1766 bytes
Desc: not available
URL: 

From bsmith at petsc.dev  Thu Aug 8 13:28:33 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 8 Aug 2024 14:28:33 -0400
Subject: [petsc-users] A question about loading data from .h5 file
In-Reply-To:
References:
Message-ID: <2C7817FD-EF02-455C-B6F9-17CEB8D72485@petsc.dev>

   Try reading the vector back in with a vector created from the DM.

> On Aug 8, 2024, at 10:46 AM, ZHOU Yingjie wrote:
>
> Dear Sir or Madam,
>
> I ran into a problem loading data from an HDF5 file. Attached is a simple test program.
>
> An introduction to the program is as follows:
> I create a 3D DMDA object (50*50*50), and create 2 global vectors x and y with the DMDA, then I write the vector x into the file test.h5.
> When I try to load the vector y from test.h5, I get the error "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000, which I've checked in Python.
>
> Thank you for your help!
>
> Best regards
> ZHOU Yingjie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Fri Aug 9 09:37:05 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 9 Aug 2024 10:37:05 -0400
Subject: [petsc-users] A question about loading data from .h5 file
In-Reply-To:
References: <2C7817FD-EF02-455C-B6F9-17CEB8D72485@petsc.dev>
Message-ID: <2A0602EC-91E6-4D5B-9D76-C1458BA23535@petsc.dev>

   Vectors created from a DMDA have multi-dimensional information attached to them. When the vector is saved with an HDF5 viewer, this multi-dimensional information is saved in the HDF5 file. When a vector created with a DMDA loads the HDF5 file, the multi-dimensional information is used for a successful load. When a plain vector (obtained with VecCreate) loads from the HDF5 file, it does not have the concept of multi-dimensional information and thus, for some reason, works incorrectly. I would consider the current error output to be a bug; it should provide a more useful error message.

   Note that for a DMDA in parallel, the DMDA vector data is transformed so that it is stored in the natural multi-dimensional ordering on disk. When it is read back in with a DMDA vector, the transformation is reversed to get the data back into PETSc's parallel ordering. So, in parallel, it would never make sense to store a DMDA vector with HDF5 and then read it back in with a "plain" vector, since the plain vector does not know how to transform the data.

   Barry
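   In code, the working round trip looks roughly like this (an untested sketch; da stands for the 50x50x50 DMDA from the test program, and PETSc must be configured with HDF5):

#include <petscviewerhdf5.h>

Vec         x, y;
PetscViewer viewer;

PetscCall(DMCreateGlobalVector(da, &x));
PetscCall(PetscObjectSetName((PetscObject)x, "field")); /* the object name selects the HDF5 dataset */
/* ... fill x ... */
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "test.h5", FILE_MODE_WRITE, &viewer));
PetscCall(VecView(x, viewer));
PetscCall(PetscViewerDestroy(&viewer));

PetscCall(DMCreateGlobalVector(da, &y)); /* created from the DM, as suggested above */
PetscCall(PetscObjectSetName((PetscObject)y, "field"));
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "test.h5", FILE_MODE_READ, &viewer));
PetscCall(VecLoad(y, viewer));
PetscCall(PetscViewerDestroy(&viewer));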
> On Aug 9, 2024, at 1:11 AM, ZHOU Yingjie wrote:
>
> Dear Barry,
>
> Thank you for your quick reply; the problem has been solved.
>
> I'm just still curious why, when I use PETSC_COMM_SELF for the viewer to load a 50*50*50 vector, the error says "Global size of array in file is 2500 (50*50), not 125000 (50*50*50) as expected".
>
> Best regards,
> Yingjie ZHOU
>
> Barry Smith wrote on Fri, Aug 9, 2024 at 02:28:
>>
>> Try reading the vector back in with a vector created from the DM.
>>
>>> On Aug 8, 2024, at 10:46 AM, ZHOU Yingjie wrote:
>>>
>>> Dear Sir or Madam,
>>>
>>> I ran into a problem loading data from an HDF5 file. Attached is a simple test program.
>>>
>>> An introduction to the program is as follows:
>>> I create a 3d DMDA object(50*50*50), and create 2 global vectors x and y by DMDA, then I write the vector x into file test.h5.
>>> When I try to load the vector y from test.h5, the error occurs that "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000 which I've checked in python. >>> >>> Thank you for your help! >>> >>> Best regards >>> ZHOU Yingjie >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Aug 9 12:17:10 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 9 Aug 2024 17:17:10 +0000 Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris Message-ID: Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris. My modules are: export MPICH_GPU_SUPPORT_ENABLED=1 module use /soft/modulefiles module load spack-pe-base cmake module unload darshan module load cudatoolkit-standalone PrgEnv-gnu cray-libsci and my configure line is: $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 What I see in the configure phase is: ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9OXszklO$ for SOWING ============================================================================================= ============================================================================================= Running configure on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running make on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running make install on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running arch-polaris-dbg/bin/bfort to generate Fortran stubs ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9Ho5-hpl$ for SUITESPARSE ============================================================================================= ============================================================================================= Configuring SUITESPARSE with CMake; this may take several minutes ============================================================================================= 
============================================================================================= Compiling and installing SUITESPARSE; this may take several minutes ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9JxTYrQ0$ for HYPRE ============================================================================================= ============================================================================================= Running configure on HYPRE; this may take several minutes ============================================================================================= ============================================================================================= Running make on HYPRE; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make; make install on HYPRE ********************************************************************************************* the configure.log file ends with: ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make; make install on HYPRE ********************************************************************************************* File "/home/mnv/Software/petsc/config/configure.py", line 462, in petsc_configure framework.configure(out = sys.stdout) File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1455, in configure self.processChildren() File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1443, in processChildren self.serialEvaluation(self.childGraph) File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1418, in serialEvaluation child.configure() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1354, in configure self.executeTest(self.configureLibrary) File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line 138, in executeTest ret = test(*args,**kargs) File "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", line 199, in configureLibrary config.package.Package.configureLibrary(self) File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1041, in configureLibrary for location, directory, lib, incl in self.generateGuesses(): File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 609, in generateGuesses d = self.checkDownload() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 743, in checkDownload return self.getInstallDir() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 545, in getInstallDir installDir = self.Install() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1892, in Install raise RuntimeError('Error running make; make install 
on '+self.PACKAGE) ================================================================================ Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000 ================================================================================ Any help in debugging this is much appreciated. I can provide the whole configure.log file if needed. Thank you for your time, Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay.anl at fastmail.org Fri Aug 9 12:44:18 2024 From: balay.anl at fastmail.org (Satish Balay) Date: Fri, 9 Aug 2024 12:44:18 -0500 (CDT) Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris In-Reply-To: References: Message-ID: If building on front-end - try using --with-make-np=8 [or 4] If you still have issues - send configure.log Satish On Fri, 9 Aug 2024, Vanella, Marcos (Fed) via petsc-users wrote: > Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris. > My modules are: > > export MPICH_GPU_SUPPORT_ENABLED=1 > module use /soft/modulefiles > module load spack-pe-base cmake > module unload darshan > module load cudatoolkit-standalone PrgEnv-gnu cray-libsci > > and my configure line is: > > $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 > > What I see in the configure phase is: > ============================================================================================= > Configuring PETSc to compile on your system > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9OXszklO$ for SOWING > ============================================================================================= > ============================================================================================= > Running configure on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make install on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running arch-polaris-dbg/bin/bfort to generate Fortran stubs > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9Ho5-hpl$ for SUITESPARSE > 
============================================================================================= > ============================================================================================= > Configuring SUITESPARSE with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing SUITESPARSE; this may take several minutes > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9JxTYrQ0$ for HYPRE > ============================================================================================= > ============================================================================================= > Running configure on HYPRE; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make on HYPRE; this may take several minutes > ============================================================================================= > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > ********************************************************************************************* > > the configure.log file ends with: > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > ********************************************************************************************* > File "/home/mnv/Software/petsc/config/configure.py", line 462, in petsc_configure > framework.configure(out = sys.stdout) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1455, in configure > self.processChildren() > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1443, in processChildren > self.serialEvaluation(self.childGraph) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1418, in serialEvaluation > child.configure() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1354, in configure > self.executeTest(self.configureLibrary) > File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line 138, in executeTest > ret = test(*args,**kargs) > File "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", line 199, in configureLibrary > config.package.Package.configureLibrary(self) > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1041, in configureLibrary > for location, directory, lib, incl in self.generateGuesses(): > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 609, 
in generateGuesses
>     d = self.checkDownload()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 743, in checkDownload
>     return self.getInstallDir()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 545, in getInstallDir
>     installDir = self.Install()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1892, in Install
>     raise RuntimeError('Error running make; make install on '+self.PACKAGE)
> ================================================================================
> Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000
> ================================================================================
>
> Any help in debugging this is much appreciated. I can provide the whole configure.log file if needed.
> Thank you for your time,
> Marcos

From junming.duan.math at gmail.com  Fri Aug 9 15:02:06 2024
From: junming.duan.math at gmail.com (Junming Duan)
Date: Fri, 9 Aug 2024 22:02:06 +0200
Subject: [petsc-users] MatZeroRowsColumns eliminates incorrectly in parallel
In-Reply-To:
References:
Message-ID:

Dear all,

I tried to use MatZeroRowsColumns to eliminate Dirichlet boundary nodes. However, it does not eliminate correctly in parallel. Please see the attached code, which uses DMDA to create the matrix.

When I use one process, it works as expected. For two processes, the domain is split in the x direction, but the 10th row and 20th column are not eliminated as they are when using one process. The results for two processes are also attached. I have input the same rows to be eliminated on both processes. Thank you for any help.

#include <petscdm.h>
#include <petscdmda.h>

int main(int argc, char **argv)
{
  PetscInt      M = 5, N = 3, m = PETSC_DECIDE, n = PETSC_DECIDE, ncomp = 2;
  PetscInt      i, j;
  DMDALocalInfo daInfo;
  DM            da;
  Mat           A;
  Vec           x, b;
  MatStencil    row, col[5];
  PetscScalar   v[5];
  PetscInt      n_dirichlet_rows = 0, dirichlet_rows[2*(M+N)];

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
  PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DMDA_STENCIL_BOX, M, N, m, n, ncomp, 1, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));

  PetscCall(DMView(da, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(DMDAGetLocalInfo(da, &daInfo));
  PetscCall(DMSetMatrixPreallocateOnly(da, PETSC_TRUE));

  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(MatZeroEntries(A));
  PetscCall(VecZeroEntries(x));
  PetscCall(VecZeroEntries(b));

  for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
    for (i = daInfo.xs; i < daInfo.xs + daInfo.xm; ++i) {
      /* component 0 */
      row.j = j; row.i = i; row.c = 0;
      col[0].j = j;     col[0].i = i;     col[0].c = 0; v[0] = row.i + col[0].i + 2;
      col[1].j = j;     col[1].i = i - 1; col[1].c = 0; v[1] = row.i + col[1].i + 2;
      col[2].j = j;     col[2].i = i + 1; col[2].c = 0; v[2] = row.i + col[2].i + 2;
      col[3].j = j - 1; col[3].i = i;     col[3].c = 0; v[3] = row.i + col[2].i + 2;
      col[4].j = j + 1; col[4].i = i;     col[4].c = 0; v[4] = row.i + col[2].i + 2;
      PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));

      /* component 1 */
      row.j = j; row.i = i; row.c = 1;
      col[0].j = j;     col[0].i = i;     col[0].c = 1; v[0] = row.j + col[0].j + 2;
      col[1].j = j - 1; col[1].i = i;     col[1].c = 1; v[1] = row.j + col[1].j + 2;
      col[2].j = j + 1; col[2].i = i;     col[2].c = 1; v[2] = row.j + col[2].j + 2;
      col[3].j = j;     col[3].i = i - 1; col[3].c = 1; v[3] = row.j + col[1].j + 2;
      col[4].j = j;     col[4].i = i + 1; col[4].c = 1; v[4] = row.j + col[2].j + 2;
      PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
    }
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  MatView(A, 0);

  for (j = 0; j < daInfo.my; ++j) {
    dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
    dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
  }
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "n_dirichlet_rows: %d\n", n_dirichlet_rows));
  for (j = 0; j < n_dirichlet_rows; ++j) {
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "%d, ", dirichlet_rows[j]));
  }
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "\n"));
  PetscCall(MatZeroRowsColumns(A, n_dirichlet_rows, dirichlet_rows, 1, NULL, NULL));
  MatView(A, 0);

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

-------------------

n_dirichlet_rows: 6
0, 8, 10, 18, 20, 28,
Mat Object: 2 MPI processes
  type: mpiaij
row 0: (0, 1.) (2, 0.) (10, 0.)
row 1: (1, 2.) (3, 3.) (11, 3.)
row 2: (0, 0.) (2, 4.) (4, 5.) (12, 0.)
row 3: (1, 1.) (3, 4.) (5, 3.) (13, 3.)
row 4: (2, 5.) (4, 6.) (6, 0.) (14, 0.)
row 5: (3, 1.) (5, 6.) (7, 3.) (15, 3.)
row 6: (4, 0.) (6, 1.) (8, 0.) (16, 0.)
row 7: (5, 1.) (7, 8.) (9, 3.) (17, 3.)
row 8: (6, 0.) (8, 1.) (18, 0.)
row 9: (7, 1.) (9, 10.) (19, 3.)
row 10: (0, 0.) (10, 2.) (12, 0.) (20, 3.)
row 11: (1, 3.) (11, 2.) (13, 5.) (21, 5.)
row 12: (2, 0.) (10, 0.) (12, 1.) (14, 0.) (22, 0.)
row 13: (3, 3.) (11, 3.) (13, 4.) (15, 5.) (23, 5.)
row 14: (4, 0.) (12, 0.) (14, 1.) (16, 0.) (24, 0.)
row 15: (5, 3.) (13, 3.) (15, 6.) (17, 5.) (25, 5.)
row 16: (6, 0.) (14, 0.) (16, 8.) (18, 9.) (26, 9.)
row 17: (7, 3.) (15, 3.) (17, 8.) (19, 5.) (27, 5.)
row 18: (8, 0.) (16, 9.) (18, 10.) (28, 0.)
row 19: (9, 3.) (17, 3.) (19, 10.) (29, 5.)
row 20: (10, 3.) (20, 2.) (22, 3.)
row 21: (11, 5.) (21, 2.) (23, 7.)
row 22: (12, 0.) (20, 3.) (22, 4.) (24, 5.)
row 23: (13, 5.) (21, 5.) (23, 4.) (25, 7.)
row 24: (14, 0.) (22, 5.) (24, 6.) (26, 7.)
row 25: (15, 5.) (23, 5.) (25, 6.) (27, 7.)
row 26: (16, 9.) (24, 7.) (26, 8.) (28, 0.)
row 27: (17, 5.) (25, 5.) (27, 8.) (29, 7.)
row 28: (18, 0.) (26, 0.) (28, 1.)
row 29: (19, 5.) (27, 5.) (29, 10.)

Junming
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Fri Aug 9 15:25:09 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 9 Aug 2024 16:25:09 -0400
Subject: [petsc-users] MatZeroRowsColumns eliminates incorrectly in parallel
In-Reply-To:
References:
Message-ID:

   This is incorrect and will not work.

for (j = 0; j < daInfo.my; ++j) {
  dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
  dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
}

   You are assuming that the PETSc global numbering of the matrix rows/columns is the same as the natural ordering (on a 2d mesh) across the entire mesh. It is not; rather, all nodes are numbered on the first MPI process, followed by all those on the second, etc. Thus the mapping between the PETSc ordering and the natural ordering is cumbersome.

   But, no worries, MatZeroRowsColumnsStencil() allows you to indicate the rows/columns to zero using the same stencil information you use to fill the matrix, so you never need to worry about the mapping between the PETSc ordering and the global natural ordering.
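   A rough sketch of the stencil-based version for the code above (untested; it keeps your choice of zeroing component 0 at the i = 0 and i = mx-1 faces, and each process lists only the rows it owns):

MatStencil *bnd;
PetscInt    nb = 0;

PetscCall(PetscMalloc1(2 * daInfo.my, &bnd));
for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
  if (daInfo.xs == 0) { /* this process owns the i = 0 face */
    bnd[nb].i = 0; bnd[nb].j = j; bnd[nb].c = 0; nb++;
  }
  if (daInfo.xs + daInfo.xm == daInfo.mx) { /* this process owns the i = mx-1 face */
    bnd[nb].i = daInfo.mx - 1; bnd[nb].j = j; bnd[nb].c = 0; nb++;
  }
}
PetscCall(MatZeroRowsColumnsStencil(A, nb, bnd, 1.0, NULL, NULL));
PetscCall(PetscFree(bnd));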
   Barry

> On Aug 9, 2024, at 4:02 PM, Junming Duan wrote:
>
> Dear all,
>
> I tried to use MatZeroRowsColumns to eliminate Dirichlet boundary nodes. However, it does not eliminate correctly in parallel.
> Please see the attached code, which uses DMDA to create the matrix.
> When I use one process, it works as expected.
> For two processes, the domain is split in the x direction, but the 10th row and 20th column are not eliminated as they are when using one process. The results for two processes are also attached.
> I have input the same rows to be eliminated on both processes.
> Thank you for any help.
>
> #include <petscdm.h>
> #include <petscdmda.h>
>
> int main(int argc, char **argv)
> {
>   PetscInt      M = 5, N = 3, m = PETSC_DECIDE, n = PETSC_DECIDE, ncomp = 2;
>   PetscInt      i, j;
>   DMDALocalInfo daInfo;
>   DM            da;
>   Mat           A;
>   Vec           x, b;
>   MatStencil    row, col[5];
>   PetscScalar   v[5];
>   PetscInt      n_dirichlet_rows = 0, dirichlet_rows[2*(M+N)];
>
>   PetscFunctionBeginUser;
>   PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
>   PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DMDA_STENCIL_BOX, M, N, m, n, ncomp, 1, NULL, NULL, &da));
>   PetscCall(DMSetFromOptions(da));
>   PetscCall(DMSetUp(da));
>
>   PetscCall(DMView(da, PETSC_VIEWER_STDOUT_WORLD));
>
>   PetscCall(DMDAGetLocalInfo(da, &daInfo));
>   PetscCall(DMSetMatrixPreallocateOnly(da, PETSC_TRUE));
>
>   PetscCall(DMCreateMatrix(da, &A));
>   PetscCall(MatCreateVecs(A, &x, &b));
>   PetscCall(MatZeroEntries(A));
>   PetscCall(VecZeroEntries(x));
>   PetscCall(VecZeroEntries(b));
>
>   for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
>     for (i = daInfo.xs; i < daInfo.xs + daInfo.xm; ++i) {
>       row.j = j; row.i = i; row.c = 0;
>       col[0].j = j;     col[0].i = i;     col[0].c = 0; v[0] = row.i + col[0].i + 2;
>       col[1].j = j;     col[1].i = i - 1; col[1].c = 0; v[1] = row.i + col[1].i + 2;
>       col[2].j = j;     col[2].i = i + 1; col[2].c = 0; v[2] = row.i + col[2].i + 2;
>       col[3].j = j - 1; col[3].i = i;     col[3].c = 0; v[3] = row.i + col[2].i + 2;
>       col[4].j = j + 1; col[4].i = i;     col[4].c = 0; v[4] = row.i + col[2].i + 2;
>       PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
>
>       row.j = j; row.i = i; row.c = 1;
>       col[0].j = j;     col[0].i = i;     col[0].c = 1; v[0] = row.j + col[0].j + 2;
>       col[1].j = j - 1; col[1].i = i;     col[1].c = 1; v[1] = row.j + col[1].j + 2;
>       col[2].j = j + 1; col[2].i = i;     col[2].c = 1; v[2] = row.j + col[2].j + 2;
>       col[3].j = j;     col[3].i = i - 1; col[3].c = 1; v[3] = row.j + col[1].j + 2;
>       col[4].j = j;     col[4].i = i + 1; col[4].c = 1; v[4] = row.j + col[2].j + 2;
>       PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
>     }
>   }
>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>   MatView(A, 0);
>
>   for (j = 0; j < daInfo.my; ++j) {
>     dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
>     dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
>   }
>   PetscCall(PetscPrintf(PETSC_COMM_SELF, "n_dirichlet_rows: %d\n", n_dirichlet_rows));
>   for (j = 0; j < n_dirichlet_rows; ++j) {
>     PetscCall(PetscPrintf(PETSC_COMM_SELF, "%d, ", dirichlet_rows[j]));
>   }
>   PetscCall(PetscPrintf(PETSC_COMM_SELF, "\n"));
>   PetscCall(MatZeroRowsColumns(A, n_dirichlet_rows, dirichlet_rows, 1, NULL, NULL));
>   MatView(A, 0);
>
>   PetscCall(VecDestroy(&x));
>   PetscCall(VecDestroy(&b));
>   PetscCall(DMDestroy(&da));
>   PetscCall(PetscFinalize());
>   return 0;
> }
>
> -------------------
>
> n_dirichlet_rows: 6
> 0, 8, 10, 18, 20, 28,
> Mat Object: 2 MPI processes
>   type: mpiaij
> row 0: (0, 1.) (2, 0.) (10, 0.)
> row 1: (1, 2.) (3, 3.) (11, 3.)
> row 2: (0, 0.) (2, 4.) (4, 5.) (12, 0.)
> row 3: (1, 1.) (3, 4.) (5, 3.) (13, 3.)
> row 4: (2, 5.) (4, 6.) (6, 0.) (14, 0.)
> row 5: (3, 1.) (5, 6.) (7, 3.) (15, 3.)
> row 6: (4, 0.) (6, 1.) (8, 0.) (16, 0.)
> row 7: (5, 1.) (7, 8.) (9, 3.) (17, 3.)
> row 8: (6, 0.) (8, 1.) (18, 0.)
> row 9: (7, 1.) (9, 10.) (19, 3.)
> row 10: (0, 0.) (10, 2.) (12, 0.) (20, 3.)
> row 11: (1, 3.) (11, 2.) (13, 5.) (21, 5.)
> row 12: (2, 0.) (10, 0.) (12, 1.) (14, 0.) (22, 0.)
> row 13: (3, 3.) (11, 3.) (13, 4.) (15, 5.) (23, 5.)
> row 14: (4, 0.) (12, 0.) (14, 1.) (16, 0.) (24, 0.)
> row 15: (5, 3.) (13, 3.) (15, 6.) (17, 5.) (25, 5.)
> row 16: (6, 0.) (14, 0.) (16, 8.) (18, 9.) (26, 9.)
> row 17: (7, 3.) (15, 3.) (17, 8.) (19, 5.) (27, 5.)
> row 18: (8, 0.) (16, 9.) (18, 10.) (28, 0.)
> row 19: (9, 3.) (17, 3.) (19, 10.) (29, 5.)
> row 20: (10, 3.) (20, 2.) (22, 3.)
> row 21: (11, 5.) (21, 2.) (23, 7.)
> row 22: (12, 0.) (20, 3.) (22, 4.) (24, 5.)
> row 23: (13, 5.) (21, 5.) (23, 4.) (25, 7.)
> row 24: (14, 0.) (22, 5.) (24, 6.) (26, 7.)
> row 25: (15, 5.) (23, 5.) (25, 6.) (27, 7.)
> row 26: (16, 9.) (24, 7.) (26, 8.) (28, 0.)
> row 27: (17, 5.) (25, 5.) (27, 8.) (29, 7.)
> row 28: (18, 0.) (26, 0.) (28, 1.)
> row 29: (19, 5.) (27, 5.) (29, 10.)
>
> Junming
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Fri Aug 9 15:12:20 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 9 Aug 2024 16:12:20 -0400
Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris
In-Reply-To:
References:
Message-ID:

As a start, please send configure.log

  Thanks,

     Matt

On Fri, Aug 9, 2024 at 1:17 PM Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:

> Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris.
> My modules are:
>
> export MPICH_GPU_SUPPORT_ENABLED=1
> module use /soft/modulefiles
> module load spack-pe-base cmake
> module unload darshan
> module load cudatoolkit-standalone PrgEnv-gnu cray-libsci
>
> and my configure line is:
>
> $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80
>
> What I see in the configure phase is:
> =============================================================================================
>                          Configuring PETSc to compile on your system
> =============================================================================================
> =============================================================================================
>                 Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xSeFpvUg$ for SOWING
> =============================================================================================
> =============================================================================================
>                 Running configure on SOWING; this may take several minutes
> =============================================================================================
> =============================================================================================
>                 Running make on SOWING; this may take several minutes
> =============================================================================================
> =============================================================================================
>                 Running make install on SOWING; this may
take several > minutes > > ============================================================================================= > > ============================================================================================= > Running arch-polaris-dbg/bin/bfort to generate Fortran > stubs > > ============================================================================================= > > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xQ43P-ld$ > > for SUITESPARSE > > ============================================================================================= > > ============================================================================================= > Configuring SUITESPARSE with CMake; this may take several > minutes > > ============================================================================================= > > ============================================================================================= > Compiling and installing SUITESPARSE; this may take several > minutes > > ============================================================================================= > > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xeK92H5d$ > > for HYPRE > > ============================================================================================= > > ============================================================================================= > Running configure on HYPRE; this may take several minutes > > ============================================================================================= > > ============================================================================================= > Running make on HYPRE; this may take several minutes > > ============================================================================================= > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > > ********************************************************************************************* > > the configure.log file ends with: > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > > ********************************************************************************************* > File "/home/mnv/Software/petsc/config/configure.py", line 462, in > petsc_configure > framework.configure(out = sys.stdout) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1455, in configure > self.processChildren() > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1443, in processChildren > self.serialEvaluation(self.childGraph) > File 
"/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1418, in serialEvaluation > child.configure() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1354, in configure > self.executeTest(self.configureLibrary) > File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line > 138, in executeTest > ret = test(*args,**kargs) > File > "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", > line 199, in configureLibrary > config.package.Package.configureLibrary(self) > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1041, in configureLibrary > for location, directory, lib, incl in self.generateGuesses(): > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 609, in generateGuesses > d = self.checkDownload() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 743, in checkDownload > return self.getInstallDir() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 545, in getInstallDir > installDir = self.Install() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1892, in Install > raise RuntimeError('Error running make; make install on '+self.PACKAGE) > > ================================================================================ > Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000 > > ================================================================================ > > Any help in debugging this is much appreciated. I can provide the whole > configure.log file if needed. > Thank you for your time, > Marcos > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xdqmkbnf$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Tue Aug 13 16:17:25 2024 From: liufield at gmail.com (neil liu) Date: Tue, 13 Aug 2024 17:17:25 -0400 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. Message-ID: Dear Petsc developers, I am testing PCBDDC for my vector based FEM solver(complex system). It can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It seems ASM can work well with petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view while PCBDDC eats up the memory (61 GB) when I tried petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu The following errors with BDDC came out. The memory usage for PCBDDC (different from PCASM) is also listed (I am assuming the unit is Bytes, right?). *Although the BDDC requires more memory, it still seems normal, right? * [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Out of memory. 
This could be due to allocating [0]PETSC ERROR: too large an object or bleeding by not properly [0]PETSC ERROR: destroying unneeded objects. [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire process 16899194880 [0] Memory usage sorted by function .... *[0] 1 240 PCBDDCGraphCreate()* *[0] 1 3551136 PCBDDCGraphInit()* *[0] 2045 32720 PCBDDCGraphSetUp()* *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* *[0] 1 784 PCCreate()* *[0] 1 1216 PCCreate_BDDC()* .... Thanks for your help. Xiaodong -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Aug 13 16:47:07 2024 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 13 Aug 2024 23:47:07 +0200 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. In-Reply-To: References: Message-ID: can you run the same options and add "-ksp_view -pc_bddc_check_level 1" for the smaller case? Also, can you send the full stack trace of the out-of-memory error using a debug version of PETSc? A note aside: you should not need pc_bddc_use_vertices (which is on by default) Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha scritto: > Dear Petsc developers, > > I am testing PCBDDC for my vector based FEM solver(complex system). It can > work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). Then > I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It seems > ASM can work well with > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm > -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 > -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view > > while PCBDDC eats up the memory (61 GB) when I tried > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc > -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices > -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 > -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic > -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu > > The following errors with BDDC came out. The memory usage for PCBDDC > (different from PCASM) is also listed (I am assuming the unit is Bytes, > right?). *Although the BDDC requires more memory, it still seems normal, > right? * > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Out of memory. This could be due to allocating > [0]PETSC ERROR: too large an object or bleeding by not properly > [0]PETSC ERROR: destroying unneeded objects. > [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire > process 16899194880 > [0] Memory usage sorted by function > .... > *[0] 1 240 PCBDDCGraphCreate()* > *[0] 1 3551136 PCBDDCGraphInit()* > *[0] 2045 32720 PCBDDCGraphSetUp()* > *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* > *[0] 1 784 PCCreate()* > *[0] 1 1216 PCCreate_BDDC()* > .... > > Thanks for your help. > > Xiaodong > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Tue Aug 13 22:15:13 2024 From: liufield at gmail.com (neil liu) Date: Tue, 13 Aug 2024 23:15:13 -0400 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. In-Reply-To: References: Message-ID: Hi, Stefano, Please see the attached for the smaller case(successful with BDDC). and the Error_largerMesh shows the error with the large mesh using petsc debug mode. 
Thanks a lot, Xiaodong On Tue, Aug 13, 2024 at 5:47?PM Stefano Zampini wrote: > can you run the same options and add "-ksp_view -pc_bddc_check_level 1" > for the smaller case? Also, can you send the full stack trace of the > out-of-memory error using a debug version of PETSc? > A note aside: you should not need pc_bddc_use_vertices (which is on by > default) > > Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha > scritto: > >> Dear Petsc developers, >> >> I am testing PCBDDC for my vector based FEM solver(complex system). It >> can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). >> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It >> seems ASM can work well with >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm >> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 >> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view >> >> while PCBDDC eats up the memory (61 GB) when I tried >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc >> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices >> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 >> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic >> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu >> >> The following errors with BDDC came out. The memory usage for PCBDDC >> (different from PCASM) is also listed (I am assuming the unit is Bytes, >> right?). *Although the BDDC requires more memory, it still seems normal, >> right? * >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Out of memory. This could be due to allocating >> [0]PETSC ERROR: too large an object or bleeding by not properly >> [0]PETSC ERROR: destroying unneeded objects. >> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire >> process 16899194880 >> [0] Memory usage sorted by function >> .... >> *[0] 1 240 PCBDDCGraphCreate()* >> *[0] 1 3551136 PCBDDCGraphInit()* >> *[0] 2045 32720 PCBDDCGraphSetUp()* >> *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* >> *[0] 1 784 PCCreate()* >> *[0] 1 1216 PCCreate_BDDC()* >> .... >> >> Thanks for your help. >> >> Xiaodong >> >> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Error_largerMesh Type: application/octet-stream Size: 6080 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Smaller_case Type: application/octet-stream Size: 323048 bytes Desc: not available URL: From balay at fastmail.org Wed Aug 14 09:21:52 2024 From: balay at fastmail.org (Satish Balay) Date: Wed, 14 Aug 2024 09:21:52 -0500 (CDT) Subject: [petsc-users] HDF5 (fwd) Message-ID: Please fix your 'contacts' to use 'petsc-users' and not 'petsc-users-bounces' Satish ---------- Forwarded message ---------- Date: Wed, 14 Aug 2024 07:09:31 +0000 From: Yang Yehua To: "petsc-users-bounces at mcs.anl.gov" Subject: HDF5 Dear all, I am trying to use HDF5 to save a DM object, but it is very slow. 
Here is the code I am using:

PetscViewer viewer;
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_WRITE, &viewer));
PetscCall(PetscObjectSetName((PetscObject)dm, "plexA"));
PetscCall(DMView(dm, viewer));
PetscCall(PetscViewerDestroy(&viewer));

The DM object is a parallel mesh with 2 MPI processes. Here are the details:
* Type: plex
* Parallel Mesh in 3 dimensions:
* Number of 0-cells per rank: 1954, 1948
* Number of 1-cells per rank: 11794, 11749
* Number of 2-cells per rank: 18851, 18773
* Number of 3-cells per rank: 9010, 8971
Labels:
* Depth: 4 strata with value/size (0 (1954), 1 (11794), 2 (18851), 3 (9010))
* Celltype: 4 strata with value/size (0 (1954), 1 (11794), 3 (18851), 6 (9010))
* Cell Sets: 1 strata with value/size (2 (9010))
* av_section: 1 strata with value/size (0 (9010))
Field phi_grad:
* Adjacency FEM

Despite the small mesh size, using 16 ranks takes several minutes, while using 2 ranks takes 21 seconds. Any suggestions on how to improve the performance would be greatly appreciated.

Best regards,
Yehua
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 988337 bytes
Desc: configure.log
URL:

From stefano.zampini at gmail.com  Wed Aug 14 10:54:17 2024
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Wed, 14 Aug 2024 17:54:17 +0200
Subject: [petsc-users] Question about the memory usage for BDDC preconditioner.
In-Reply-To: References: Message-ID:

Ok, the problem is that the default algorithm for detecting the connected components of the interface finds a lot of disconnected dofs. What discretization is this? Nedelec elements? Can you try using -pc_bddc_use_local_mat_graph 0? Also, you are using -pc_bddc_monolithic, but that flag aggregates different fields, and you only have one field. Note that with Nedelec elements, you need a special change of basis for BDDC to work, see e.g. https://urldefense.us/v3/__https://www.osti.gov/servlets/purl/1377770__;!!G_uCfscf7eWS!Z94Qs8Q7RYEdhbAbvkaNorzlyoN4UH_ttW0EmR6d-NKweo4S35ELp-_Y60aJAAE1vzgZpof2VQYVxX9Xm1kM2vwiioZBFlo$

Il giorno mer 14 ago 2024 alle ore 05:15 neil liu ha scritto:

> Hi, Stefano,
>
> Please see the attached for the smaller case (successful with BDDC),
> and the Error_largerMesh file shows the error with the large mesh using a
> petsc debug build.
>
> Thanks a lot,
>
> Xiaodong
>
>
> On Tue, Aug 13, 2024 at 5:47 PM Stefano Zampini wrote:
>
>> can you run the same options and add "-ksp_view -pc_bddc_check_level 1"
>> for the smaller case? Also, can you send the full stack trace of the
>> out-of-memory error using a debug version of PETSc?
>> A note aside: you should not need pc_bddc_use_vertices (which is on by
>> default)
>>
>> Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha scritto:
>>
>>> Dear Petsc developers,
>>>
>>> I am testing PCBDDC for my vector-based FEM solver (complex system). It
>>> can work well on a coarse mesh (tetrahedra cell #: 6,108; dof #: 39,596).
>>> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof #: 206,362). It
>>> seems ASM can work well with
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm
>>> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4
>>> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view
>>>
>>> while PCBDDC eats up the memory (61 GB) when I tried
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc
>>> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices
>>> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8
>>> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic
>>> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu
>>>
>>> The following errors with BDDC came out. The memory usage for PCBDDC
>>> (different from PCASM) is also listed (I am assuming the unit is bytes,
>>> right?). Although the BDDC requires more memory, it still seems
>>> normal, right?
>>>
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>> [0]PETSC ERROR: destroying unneeded objects.
>>> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire
>>> process 16899194880
>>> [0] Memory usage sorted by function
>>> ....
>>> [0] 1 240 PCBDDCGraphCreate()
>>> [0] 1 3551136 PCBDDCGraphInit()
>>> [0] 2045 32720 PCBDDCGraphSetUp()
>>> [0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()
>>> [0] 1 784 PCCreate()
>>> [0] 1 1216 PCCreate_BDDC()
>>> ....
>>>
>>> Thanks for your help.
>>>
>>> Xiaodong
>>>
>>
>> --
>> Stefano
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From coltonbryant2021 at u.northwestern.edu  Wed Aug 14 17:36:41 2024
From: coltonbryant2021 at u.northwestern.edu (Colton Bryant)
Date: Wed, 14 Aug 2024 16:36:41 -0600
Subject: [petsc-users] Using DM_BOUNDARY_GHOSTED in DMSTAG
Message-ID:

Hello,

I'm trying to understand the use of DM_BOUNDARY_GHOSTED and am a little confused. Is there any way for the linear solver to access and manipulate the ghost point value during the solve? I currently have a code using DM_BOUNDARY_PERIODIC, and at the periodic boundary I simply apply the same discretization as I do everywhere else; as I understand it, the value at e.g. i=-1 is set automatically by the periodic boundary condition. I would like to use DM_BOUNDARY_GHOSTED to set my own condition by which the point at i=-1 is set (a Neumann-type condition). I have seen some matrix-free examples, but is there an easy way to "add" such a condition to the linear system in this case?

Thanks for any help you can provide.

Best,
Colton Bryant
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liufield at gmail.com  Wed Aug 14 21:08:44 2024
From: liufield at gmail.com (neil liu)
Date: Wed, 14 Aug 2024 22:08:44 -0400
Subject: [petsc-users] Question about the memory usage for BDDC preconditioner.
In-Reply-To: References: Message-ID:

Thanks a lot, Stefano. Yes, I am using 2nd-order Nedelec elements. -pc_bddc_use_local_mat_graph 0 makes the code run. I am testing with more CPU counts.
I am testing my code using, petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -pc_bddc_use_vertices -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 2000 -ksp_view -malloc_view -pc_bddc_use_local_mat_graph 0 -ksp_converged_reason -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -pc_bddc_dirichlet_pc_type gamg -pc_bddc_dirichlet_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_dirichlet_approximate The residual dropped to 6e-5 very fast and then continued to reduce very slowly. Do you have any suggestions to improve this ? Will it be necessary to change the basis for BDDC in order to accelerate the convergence ? In addition, I tried -pc_bddc_use_deluxe_scaling, but it showed some errors. It seems deluxe scaling obviously requires a much larger size (*Global size overflow 3051678564*) than my problem. Thanks, Xiaodong [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Overflow in integer operation: https://urldefense.us/v3/__https://petsc.org/release/faq/*64-bit-indices__;Iw!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvkLFZcexA$ [0]PETSC ERROR: Global size overflow 3051678564. You may consider ./configure PETSc with --with-64-bit-indices for the case you are running [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_coarse_redundant_pc_type value: svd source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_neumann_pc_gamg_esteig_ksp_max_it value: 10 source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_neumann_pc_type value: gamg source: command line [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvlAB6PtAg$ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.21.1, unknown [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --download-fblaslapack --download-mpich --with-scalar-type=complex --download-triangle --with-debugging=no [0]PETSC ERROR: #1 PetscSplitOwnership() at /Documents/petsc-3.21.1/petsc/src/sys/utils/psplit.c:86 [0]PETSC ERROR: #2 PetscLayoutSetUp() at /Documents/petsc-3.21.1/petsc/src/vec/is/utils/pmap.c:244 [0]PETSC ERROR: #3 PetscLayoutCreateFromSizes() at /Documents/petsc-3.21.1/petsc/src/vec/is/utils/pmap.c:107 [0]PETSC ERROR: #4 ISGeneralSetIndices_General() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:569 [0]PETSC ERROR: #5 ISGeneralSetIndices() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:559 [0]PETSC ERROR: #6 ISCreateGeneral() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:530 [0]PETSC ERROR: #7 ISRenumber() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/interface/index.c:198 [0]PETSC ERROR: #8 PCBDDCSubSchursSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddcschurs.c:646 [0]PETSC ERROR: #9 PCBDDCSetUpSubSchurs() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddcprivate.c:9348 [0]PETSC ERROR: #10 PCSetUp_BDDC() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddc.c:1564 [0]PETSC ERROR: #11 PCSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/interface/precon.c:1079 [0]PETSC ERROR: #12 KSPSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:415 [0]PETSC ERROR: #13 KSPSolve_Private() at Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:831 [0]PETSC ERROR: #14 KSPSolve() at /Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:1078 On Wed, Aug 14, 2024 at 11:54?AM Stefano Zampini wrote: > Ok, the problem is that the default algorithm for detecting the connected > components of the interface finds a lot of disconnected dofs. > What discretization is this? Nedelec elements? Can you try using -pc_ > *bddc_use_lo*cal_mat_graph 0? > Also, you are using -pc_bddc_monolithic, but you only have one field. That > flag aggregates different fields, but you only have one. > Note that with Nedelec elements, you need a special change of basis for > BDDC to work, see e.g. https://urldefense.us/v3/__https://www.osti.gov/servlets/purl/1377770__;!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvlI_QH81A$ > > Il giorno mer 14 ago 2024 alle ore 05:15 neil liu ha > scritto: > >> Hi, Stefano, >> >> Please see the attached for the smaller case(successful with BDDC). >> and the Error_largerMesh shows the error with the large mesh using petsc >> debug mode. >> >> Thanks a lot, >> >> Xiaodong >> >> >> On Tue, Aug 13, 2024 at 5:47?PM Stefano Zampini < >> stefano.zampini at gmail.com> wrote: >> >>> can you run the same options and add "-ksp_view -pc_bddc_check_level 1" >>> for the smaller case? Also, can you send the full stack trace of the >>> out-of-memory error using a debug version of PETSc? >>> A note aside: you should not need pc_bddc_use_vertices (which is on by >>> default) >>> >>> Il giorno mar 13 ago 2024 alle ore 23:17 neil liu >>> ha scritto: >>> >>>> Dear Petsc developers, >>>> >>>> I am testing PCBDDC for my vector based FEM solver(complex system). It >>>> can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). >>>> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). 
It >>>> seems ASM can work well with >>>> >>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm >>>> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 >>>> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view >>>> >>>> while PCBDDC eats up the memory (61 GB) when I tried >>>> >>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc >>>> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices >>>> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 >>>> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic >>>> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu >>>> >>>> The following errors with BDDC came out. The memory usage for PCBDDC >>>> (different from PCASM) is also listed (I am assuming the unit is Bytes, >>>> right?). *Although the BDDC requires more memory, it still seems >>>> normal, right? * >>>> >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>>> [0]PETSC ERROR: too large an object or bleeding by not properly >>>> [0]PETSC ERROR: destroying unneeded objects. >>>> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire >>>> process 16899194880 >>>> [0] Memory usage sorted by function >>>> .... >>>> *[0] 1 240 PCBDDCGraphCreate()* >>>> *[0] 1 3551136 PCBDDCGraphInit()* >>>> *[0] 2045 32720 PCBDDCGraphSetUp()* >>>> *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* >>>> *[0] 1 784 PCCreate()* >>>> *[0] 1 1216 PCCreate_BDDC()* >>>> .... >>>> >>>> Thanks for your help. >>>> >>>> Xiaodong >>>> >>>> >>>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 14 22:01:17 2024 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 14 Aug 2024 23:01:17 -0400 Subject: [petsc-users] Using DM_BOUNDARY_GHOSTED in DMSTAG In-Reply-To: References: Message-ID: The linear solvers (unless you write your own matrix-free beast, which I don't think you want to do) has no concept of the "ghost" locations and cannot access or manipulate them because the linear solver (and most standard preconditioners) work only with the global vector (not the local ghosted vector). This means you have one equation for each "point" on the DMStag or DMDA (for simplicity, assuming a scalar problem) and similarly one variable for each of those DM "points". Resulting in a square matrix. It sounds like you would like to have one equation for each point on the DMStag but variables on both the points on the DMStag and the extra ghost points, resulting in a rectangular matrix. The rectangular matrix is nice because it has the same regular stencil on all of its rows; no rows connected to the boundary are missing part of the stencil. We don't support linear solvers that can work with the PETSc parallel matrices that can directly work with this form. Since you know the values on the ghost points, you can eliminate them (in theory or practice) by updating the right-hand side for the points on the DM. This elimination results in a square matrix (which now has the annoying boundary rows), which can then be solved. This is the model that we work with for linear problems, having unknowns and equations only for variables in the global vector. 
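To make the elimination step concrete, here is a minimal toy sketch in plain C (the 1D Poisson setup, the f = 1 forcing, and the Neumann data g are all made up for illustration; this is not taken from any PETSc example):

/* Toy illustration: fold a known ghost value into the right-hand side.
 * Solve -u'' = f on N points with mesh size h. A Neumann condition
 * u'(left) = g is expressed through the ghost value u[-1] = u[0] - h*g,
 * so the i = 0 stencil
 *   -u[-1] + 2*u[0] - u[1] = h*h*f
 * becomes
 *    u[0] - u[1] = h*h*f - h*g,
 * i.e. the ghost column disappears from the matrix row and the known
 * piece h*g moves to the right-hand side; the system stays square. */
#include <stdio.h>

#define N 8

int main(void)
{
  double h = 1.0 / N;         /* mesh size (illustrative)            */
  double g = 2.0;             /* Neumann data at the left end        */
  double A[N][N] = {{0.0}};
  double b[N];

  for (int i = 0; i < N; i++) {          /* standard interior stencil */
    b[i] = h * h * 1.0;                  /* take f = 1 everywhere     */
    A[i][i] = 2.0;
    if (i > 0) A[i][i - 1] = -1.0;
    if (i < N - 1) A[i][i + 1] = -1.0;
  }

  /* Ghost elimination at i = 0: modify the row and its rhs entry.   */
  A[0][0] = 1.0;
  b[0] -= h * g;

  printf("row 0 after elimination: A00=%.1f A01=%.1f rhs=%.4f\n",
         A[0][0], A[0][1], b[0]);
  return 0;
}

The same bookkeeping carries over to DMStag or DMDA in 2D/3D: only the rows whose stencils touch a ghost location need their matrix row and right-hand side entry adjusted.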
Now, for nonlinear problems, it is a different story; here, using ghost points is very useful in evaluating f(x) (what becomes the right-hand side in Newton's method) and the Jacobian J(x). So both the function evaluation and the Jacobian evaluation start by scattering the global x into a local x (filling the ghost points between processes); the other ghost points (for boundary conditions) are filled in by us as appropriate, and then a local function evaluation fills in the local points of the global vector using the values in the local vectors. Many of our SNES examples use this style.

If you do want to use your scheme for a linear problem directly, you can do it. Do not have DM ghost boundary locations; instead, increase the size of the domain by one stencil width on each side, put the ghost boundary locations in the global solution vector, and make identity equations for each ghost boundary location in the linear system. Conceptually, you have your nice rectangular matrix embedded in a square matrix by just having the other rows be rows of the identity matrix, and you put the values of your ghost locations in the right-hand side for those equations.

There are a bunch of ways of thinking about these issues if you get into it,

Barry

> On Aug 14, 2024, at 6:36 PM, Colton Bryant wrote:
>
> Hello,
>
> I'm trying to understand the use of DM_BOUNDARY_GHOSTED and am a little confused. Is there any way for the linear solver to access and manipulate the ghost point value during the solve? I currently have a code using DM_BOUNDARY_PERIODIC and at the periodic boundary I simply apply the same discretization as I do everywhere else and as I understand it the value at e.g. i=-1 is set automatically by the periodic boundary condition. I would like to use DM_BOUNDARY_GHOSTED to set my own condition by which the point at i=-1 is set (a Neumann type condition). I have seen some matrix free examples but is there an easy way to "add" such a condition to the linear system in this case?
>
> Thanks for any help you can provide.
>
> Best,
> Colton Bryant

From liufield at gmail.com  Thu Aug 15 16:03:41 2024
From: liufield at gmail.com (neil liu)
Date: Thu, 15 Aug 2024 17:03:41 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
Message-ID:

Dear Petsc developers,

Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with

petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view

Then I used 2 cases for a strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML.

Case 1:
cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
2       234.7                                              3.12
4       126.6                        0.92                  1.62
8        84.97                       0.69                  1.26

However, for Case 2:
cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
2       584.5                                              8.61
4       376.8                        0.77                  6.56
8       459.6                        0.31                 66.47

For these 2 cases, I checked the time for PCSetUp as an example. It seems 8 cpus for case 2 used too much time on PCSetUp. Do you have any ideas about what is going on here?

Thanks,
Xiaodong
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stefano.zampini at gmail.com Sat Aug 17 08:23:22 2024 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sat, 17 Aug 2024 16:23:22 +0300 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening. Can you please share the equations you are solving so we can provide suggestions on the solver configuration? As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black box solvers Below are some comments: - You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) - Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance. - Why 5000 of restart for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors. Il giorno ven 16 ago 2024 alle ore 00:04 neil liu ha scritto: > Dear Petsc developers, > > Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc > -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged > -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view > -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu > -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 > -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view > > Then I used 2 cases for strong scaling test. One case only involves real > numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd > case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to > PML. > > Case 1: > cpu # Time for 500 ksp steps (s) Parallel efficiency > PCsetup time(s) > 2 234.7 > 3.12 > 4 126.6 0.92 > 1.62 > 8 84.97 0.69 > 1.26 > However for Case 2, > cpu # Time for 500 ksp steps (s) Parallel efficiency > PCsetup time(s) > 2 584.5 > 8.61 > 4 376.8 0.77 > 6.56 > 8 459.6 0.31 > 66.47 > For these 2 cases, I checked the time for PCsetup as an example. It seems > 8 cpus for case 2 used too much time on PCsetup. > Do you have any ideas about what is going on here? > > Thanks, > Xiaodong > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sat Aug 17 11:35:28 2024 From: lzou at anl.gov (Zou, Ling) Date: Sat, 17 Aug 2024 16:35:28 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 Message-ID: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? 
output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 17 12:07:46 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 17 Aug 2024 13:07:46 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view? Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? > > Best, > > -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex5.c Type: application/octet-stream Size: 37704 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sat Aug 17 14:44:25 2024 From: lzou at anl.gov (Zou, Ling) Date: Sat, 17 Aug 2024 19:44:25 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Barry, thanks. I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. -Ling From: Barry Smith Date: Saturday, August 17, 2024 at 12:08?PM To: Zou, Ling Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. 
ZjQcmQRYFpfptBannerEnd I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users wrote: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Sat Aug 17 16:37:42 2024 From: liufield at gmail.com (neil liu) Date: Sat, 17 Aug 2024 17:37:42 -0400 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: Hi, Stefano, Please see the attached for the information with 4 and 8 CPUs for the complex matrix. I am solving Maxwell equations (Attahced) using 2nd-order Nedelec elements (two dofs each edge, and two dofs each face). The computational domain consists of different mediums, e.g., vacuum and substrate (different permitivity). The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix. Thanks a lot for your suggestions. I will try MUMPS. For now, I just want to fiddle with Petsc's built-in features to know more about it. Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. Thanks a lot. Have a good weekend. On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini wrote: > Please include the output of -log_view -ksp_view -ksp_monitor to > understand what's happening. > > Can you please share the equations you are solving so we can provide > suggestions on the solver configuration? > As I said, solving for Nedelec-type discretizations is challenging, and > not for off-the-shelf, black box solvers > > Below are some comments: > > > - You use a redundant SVD approach for the coarse solve, which can be > inefficient if your coarse space grows. You can use a parallel direct > solver like MUMPS (reconfigure with --download-mumps and use > -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) > - Why use ILU for the Dirichlet problem and GAMG for the Neumann > problem? With 8 processes and 300K total dofs, you will have around 40K > dofs per process, which is ok for a direct solver like MUMPS > (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). 
> With Nedelec dofs and the sparsity pattern they induce, I believe you can > push to 80K dofs per process with good performance. > - Why 5000 of restart for GMRES? It is highly inefficient to > re-orthogonalize such a large set of vectors. > > > Il giorno ven 16 ago 2024 alle ore 00:04 neil liu ha > scritto: > >> Dear Petsc developers, >> >> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc >> -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged >> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view >> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu >> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 >> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >> >> Then I used 2 cases for strong scaling test. One case only involves real >> numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd >> case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to >> PML. >> >> Case 1: >> cpu # Time for 500 ksp steps (s) Parallel efficiency >> PCsetup time(s) >> 2 234.7 >> 3.12 >> 4 126.6 0.92 >> 1.62 >> 8 84.97 0.69 >> 1.26 >> However for Case 2, >> cpu # Time for 500 ksp steps (s) Parallel efficiency >> PCsetup time(s) >> 2 584.5 >> 8.61 >> 4 376.8 0.77 >> 6.56 >> 8 459.6 0.31 >> 66.47 >> For these 2 cases, I checked the time for PCsetup as an example. It seems >> 8 cpus for case 2 used too much time on PCsetup. >> Do you have any ideas about what is going on here? >> >> Thanks, >> Xiaodong >> >> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView_KspView_KspMonitor_ComplexMatrix-4CPU Type: application/octet-stream Size: 58500 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView_KspView_KspMonitor_ComplexMatrix-8CPU Type: application/octet-stream Size: 59661 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Equations.pdf Type: application/pdf Size: 114430 bytes Desc: not available URL: From bsmith at petsc.dev Sun Aug 18 12:22:20 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Aug 2024 13:22:20 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, > On Aug 17, 2024, at 3:44?PM, Zou, Ling wrote: > > Barry, thanks. > I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. > -Ling > > From: Barry Smith > > Date: Saturday, August 17, 2024 at 12:08?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. 
However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. > > However I am not getting the problem you see, I am seeing, > > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: rcm > > > when I run with -pc_type ilu -snes_view > > Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. > > I am using the release branch of PETSc > > Barry > > > > > > > > > > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? > > Best, > > -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sun Aug 18 17:19:38 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 18 Aug 2024 22:19:38 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Thank you, Barry. You must be right in this case. I am defaulting to ILU. I did an additional test to confirm, with ?-pc_type ilu? in the command line, it works fine. If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType? -Ling From: Barry Smith Date: Sunday, August 18, 2024 at 12:22?PM To: Zou, Ling Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, On Aug 17, 2024, at 3:44?PM, Zou, Ling wrote: Barry, thanks. I am accessing PETSc through MOOSE. 
I need to figure out if the versions are consistent and how to test it. -Ling From: Barry Smith > Date: Saturday, August 17, 2024 at 12:08?PM To: Zou, Ling > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sun Aug 18 18:04:25 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 18 Aug 2024 23:04:25 +0000 Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance? Message-ID: Hi all, After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code having slightly-larger-than 10% of performance degradation (only in terms of execution time). I track the number of major function calls, they are identical between the two OS (so PETSc is not the one to blame), but just slower. Is this something expected, any one also experienced it? -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 18 18:31:11 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Aug 2024 19:31:11 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: You can call PCSetType(pc,PCILU); KSPSetFromOptions(ksp); PCFactorSetMatOrderingType(pc,....); This reproduces the old behavior in PETSc. You can still pass -pc_type somethingelse at runtime to use a different PC. > On Aug 18, 2024, at 6:19?PM, Zou, Ling wrote: > > Thank you, Barry. You must be right in this case. I am defaulting to ILU. > I did an additional test to confirm, with ?-pc_type ilu? in the command line, it works fine. 
> > If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType? > > -Ling > > > From: Barry Smith > > Date: Sunday, August 18, 2024 at 12:22?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? > > This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, > > > > > On Aug 17, 2024, at 3:44?PM, Zou, Ling > wrote: > > Barry, thanks. > I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. > -Ling > > From: Barry Smith > > Date: Saturday, August 17, 2024 at 12:08?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. > > However I am not getting the problem you see, I am seeing, > > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: rcm > > > when I run with -pc_type ilu -snes_view > > Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. > > I am using the release branch of PETSc > > Barry > > > > > > > > > > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? 
> > Best,
> >
> > -Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefano.zampini at gmail.com Mon Aug 19 03:15:28 2024
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Mon, 19 Aug 2024 11:15:28 +0300
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

It seems you are using DMPLEX to handle the mesh, correct? If so, you should configure using --download-parmetis to get a better domain decomposition, since the default one just splits the cells into chunks as they are ordered. This results in a large number of primal dofs on average (191, from the output of ksp_view)

...
Primal dofs : 176 204 191
...

which slows down the solver setup.

Again, you should not use approximate local solvers with BDDC unless you know what you are doing. The theory for approximate solvers for BDDC is small and only for SPD problems. Looking at the output of log_view, the coarse problem setup (PCBDDCCSet) and the primal functions setup (PCBDDCCorr) cost 35 + 63 seconds, respectively. Also, the 500 applications of the GAMG preconditioner for the Neumann solver (PCBDDCNeuS) take 129 seconds out of the 400 seconds of total solve time.

PCBDDCTopo         1 1.0 3.1563e-01 1.0    1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01  0  0  1  0  2   0  0  1  0  2    19
PCBDDCLKSP         2 1.0 2.0423e+00 1.7    9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0  3378
PCBDDCLWor         1 1.0 3.9178e-02 13.4   0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCBDDCCorr         1 1.0 6.3981e+01 2.2    8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11  0  0  0  11 11  0  0  0  8900
PCBDDCCSet         1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01  2  0  1  0  3   2  0  1  0  3     0
PCBDDCCKSP         1 1.0 6.3266e-01 1.3    0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01  0  0  0  0  1   0  0  0  0  1     0
PCBDDCScal         1 1.0 6.8274e-03 1.3    1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00  0  0  0  0  0   0  0  0  0  0   894
PCBDDCDirS      1000 1.0 6.0420e+00 3.5    6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  2995
PCBDDCNeuS       500 1.0 1.2901e+02 2.1    8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12  0  0  0  22 12  0  0  0  4828
PCBDDCCoaS       500 1.0 5.8757e-01 1.8    1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02  0  0 17  0 28   0  0 17  0 31 14901

Finally, if I look at the residual history, I see a sharp decrease and a very long plateau. This indicates a bad coarse space; as I said before, there's no hope of finding a suitable coarse space without first changing the basis of the Nedelec elements, which is done automatically if you prescribe the discrete gradient operator (see the paper I have linked to in my previous communication).

On Sun, Aug 18, 2024 at 00:37 neil liu wrote:

> Hi, Stefano,
> Please see the attached for the information with 4 and 8 CPUs for the complex matrix.
> I am solving the Maxwell equations (attached) using 2nd-order Nedelec elements (two dofs on each edge, and two dofs on each face).
> The computational domain consists of different mediums, e.g., vacuum and substrate (different permittivity).
> The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix.
>
> Thanks a lot for your suggestions. I will try MUMPS.
> For now, I just want to fiddle with PETSc's built-in features to know more about it.
> Yes, 5000 is large. A smaller value, e.g., 30, converges very slowly.
>
> Thanks a lot.
>
> Have a good weekend.
>
> On Sat, Aug 17, 2024 at 9:23 AM Stefano Zampini wrote:
>
>> Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening.
>>
>> Can you please share the equations you are solving so we can provide suggestions on the solver configuration? As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black-box solvers.
>>
>> Below are some comments:
>>
>> - You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps).
>> - Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance.
>> - Why a restart of 5000 for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors.
>>
>> On Fri, Aug 16, 2024 at 00:04 neil liu wrote:
>>
>>> Dear Petsc developers,
>>>
>>> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with:
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view
>>>
>>> Then I used 2 cases for a strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML.
>>>
>>> Case 1:
>>> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
>>> 2       234.7                                              3.12
>>> 4       126.6                        0.92                  1.62
>>> 8        84.97                       0.69                  1.26
>>>
>>> However, for Case 2:
>>> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
>>> 2       584.5                                              8.61
>>> 4       376.8                        0.77                  6.56
>>> 8       459.6                        0.31                 66.47
>>>
>>> For these 2 cases, I checked the time for PCSetUp as an example. It seems 8 cpus for case 2 used too much time on PCSetUp.
>>> Do you have any ideas about what is going on here?
>>>
>>> Thanks,
>>> Xiaodong

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
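A note for readers following the Nedelec discussion: the discrete gradient Stefano refers to is attached to the preconditioner through PCBDDCSetDiscreteGradient(). A minimal sketch of the call (not tested here; G is the vertex-to-edge gradient matrix you must assemble for your Nedelec space, and the meaning of the trailing arguments -- polynomial order, field number, global ordering, conforming mesh -- should be checked against the manual page):

  Mat G;   /* discrete gradient: maps P1 vertex dofs to Nedelec edge dofs */
  PC  pc;

  KSPGetPC(ksp, &pc);
  /* assumed arguments: 2nd-order Nedelec, field 0, G given in global ordering, conforming mesh */
  PCBDDCSetDiscreteGradient(pc, G, 2, 0, PETSC_TRUE, PETSC_TRUE);

The call has to happen before the preconditioner is set up, so that PCBDDC can build the change of basis from G.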
From lzou at anl.gov Mon Aug 19 08:25:47 2024
From: lzou at anl.gov (Zou, Ling)
Date: Mon, 19 Aug 2024 13:25:47 +0000
Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21
In-Reply-To:
References:
Message-ID:

That's nice. Thank you!

-Ling

From: Barry Smith
Date: Sunday, August 18, 2024 at 6:31 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21

You can call

  PCSetType(pc, PCILU);
  KSPSetFromOptions(ksp);
  PCFactorSetMatOrderingType(pc, ....);

This reproduces the old behavior in PETSc. You can still pass -pc_type somethingelse at runtime to use a different PC.

On Aug 18, 2024, at 6:19 PM, Zou, Ling wrote:

Thank you, Barry. You must be right in this case. I am defaulting to ILU. I did an additional test to confirm: with "-pc_type ilu" in the command line, it works fine.

If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType?

-Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
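Pulling Barry's answer together with the original snippet, the corrected call order would look roughly like this (a sketch only, in the unchecked-call style of the snippets above; it assumes snes already exists):

  SNESGetKSP(snes, &ksp);
  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCILU);           /* pin the type so the factor option below has a target */
  KSPSetFromOptions(ksp);         /* -pc_type <other> on the command line still wins */
  PCFactorSetMatOrderingType(pc, MATORDERINGRCM);
  SNESSetFromOptions(snes);

If -pc_type overrides ILU at runtime, the ordering call is simply ignored for the new type, which matches the old behavior Barry describes.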
From mail2amneet at gmail.com Mon Aug 19 19:23:53 2024
From: mail2amneet at gmail.com (Amneet Bhalla)
Date: Mon, 19 Aug 2024 17:23:53 -0700
Subject: [petsc-users] Configure issues with scalapack
Message-ID:

Hi Folks,

I am trying to build PETSc with MUMPS, which requires building/downloading scalapack. I used the following configure command to do this:

./configure --PETSC_ARCH=linux-opt --with-debugging=0 --download-hypre=1 --with-x=0 -download-mumps -download-scalapack -download-parmetis -download-metis -download-ptscotch --COPTFLAGS="-O3" --CXXOPTFLAGS="-O3" --FOPTFLAGS="-O3" --with-mpi-dir=/opt/intel/oneapi/mpi/latest

For some reason PETSc configure gets stuck at configuring SCALAPACK -- it's been more than 1 hour at this point:

=============================================================================================
              Configuring SCALAPACK with cmake; this may take several minutes
=============================================================================================

Any idea what might be going on?

Thanks,
-- 
--Amneet
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at petsc.dev Mon Aug 19 21:16:23 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 19 Aug 2024 22:16:23 -0400
Subject: [petsc-users] Configure issues with scalapack
In-Reply-To:
References:
Message-ID: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>

You need to send configure.log to petsc-maint at mcs.anl.gov so we can potentially locate the problem.

> On Aug 19, 2024, at 8:23 PM, Amneet Bhalla wrote:
>
> Hi Folks,
> I am trying to build PETSc with MUMPS, which requires building/downloading scalapack.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From junchao.zhang at gmail.com Mon Aug 19 22:03:51 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 19 Aug 2024 22:03:51 -0500
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Do you have a -log_view report so that we can know which petsc functions degraded? Or is it because the compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <petsc-users at mcs.anl.gov> wrote:

> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code run with slightly more than 10% performance degradation (only in terms of execution time).
>
> I track the number of major function calls; they are identical between the two OS versions (so PETSc is not the one to blame), just slower.
>
> Is this something expected? Has anyone else experienced it?
>
> -Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
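Junchao's -log_view suggestion can be narrowed further by registering a user event around any suspect region of the code, so the two OS installs can be compared function by function. A minimal sketch (the event and class names are made up for illustration; the pattern follows the PETSc profiling documentation):

  PetscLogEvent USER_EVENT;
  PetscClassId  classid;

  PetscClassIdRegister("UserClass", &classid);
  PetscLogEventRegister("UserRegion", classid, &USER_EVENT);

  PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);
  /* ... the code whose runtime changed between OS versions ... */
  PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);

With this in place, -log_view reports a separate "UserRegion" line whose time can be compared across the two runs.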
From balay.anl at fastmail.org Mon Aug 19 23:23:16 2024
From: balay.anl at fastmail.org (Satish Balay)
Date: Mon, 19 Aug 2024 23:23:16 -0500 (CDT)
Subject: [petsc-users] Configure issues with scalapack
In-Reply-To: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>
References: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>
Message-ID: <055212a0-40f8-6a7f-9961-ab942cf79f71@fastmail.org>

I would:

- use 'top' to see where the hang is
- retry the build and see if the hang persists
- tweak compiler options [change from -O3 to -O2 or such, or use the latest cmake] and see if that makes a difference.

Also note - the instructions for using Intel OneAPI MPI: https://urldefense.us/v3/__https://petsc.org/release/install/install/*mpi__;Iw!!G_uCfscf7eWS!fzmxXJAoz6sq09cMkWlxlC7lxKlGq7s8d1lmsvonkVcTttVqgkZBiY7idwr7nk6a4uOcMnl9J2WlJoXCnlQxhllRSCE$

Satish

On Mon, 19 Aug 2024, Barry Smith wrote:

> You need to send configure.log to petsc-maint at mcs.anl.gov so we can potentially locate the problem.

From liufield at gmail.com Tue Aug 20 12:01:22 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 13:01:22 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Thanks a lot for your explanation, Stefano. Very helpful.

Yes, I am using dmplex to read a tetrahedral mesh from gmsh. With parmetis, the scaling performance is improved a lot. I will read your paper about how to change the basis for the Nedelec elements.

cpu #   time for 500 ksp steps (s)   parallel efficiency
2       546
4       224                          120%
8       170                           80%

These results are much better than the previous attempt. Then I checked the time spent in several PETSc built-in functions by the ksp solver.

Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
VecMDot     78.32           43.28           30.47
VecMAXPY    92.95           48.37           30.798
MatMult     246.08          126.63          82.94

It seems that from 4 cpus to 8 cpus, the scaling is not as good as from 2 cpus to 4 cpus. Am I missing something?

Thanks a lot,

Xiaodong

On Mon, Aug 19, 2024 at 4:15 AM Stefano Zampini wrote:

> It seems you are using DMPLEX to handle the mesh, correct?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From knepley at gmail.com Tue Aug 20 12:16:22 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 13:16:22 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 1:10 PM neil liu wrote:

> Thanks a lot for your explanation, Stefano. Very helpful.
>
> Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
> VecMDot     78.32           43.28           30.47
> VecMAXPY    92.95           48.37           30.798
> MatMult     246.08          126.63          82.94
>
> It seems that from 4 cpus to 8 cpus, the scaling is not as good as from 2 cpus to 4 cpus. Am I missing something?

Did you normalize by the number of calls?

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cv8zEG85Ua5eMw-4Fw6dtM6pp_fpFPiPqLUZHoZeqOqx846JROreXMlDUQnUBwxMFMeAWi4j-2LV_1iMA-xF$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From liufield at gmail.com Tue Aug 20 12:35:50 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 13:35:50 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Hi, Matt,

I think the time listed here represents the maximum total time across the different processors.

Thanks a lot.

            -------- 2 cpus ---------   -------- 4 cpus ---------   -------- 8 cpus ---------
Event       Count      Time (sec)       Count      Time (sec)       Count      Time (sec)
            Max  Ratio Max        Ratio Max  Ratio Max        Ratio Max  Ratio Max        Ratio
VecMDot      530 1.0   7.8320e+01 1.0    530 1.0   4.3285e+01 1.1    530 1.0   3.0476e+01 1.1
VecMAXPY     534 1.0   9.2954e+01 1.0    534 1.0   4.8378e+01 1.1    534 1.0   3.0798e+01 1.1
MatMult     8055 1.0   2.4608e+02 1.0   8103 1.0   1.2663e+02 1.0   8367 1.0   8.2942e+01 1.1

On Tue, Aug 20, 2024 at 1:16 PM Matthew Knepley wrote:

> Did you normalize by the number of calls?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
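Normalizing the table above by the call counts (a quick check using only the numbers as printed) makes the question concrete. For MatMult:

  246.08 s / 8055 calls ≈ 30.5 ms per call   (2 cpus)
  126.63 s / 8103 calls ≈ 15.6 ms per call   (4 cpus)
   82.94 s / 8367 calls ≈  9.9 ms per call   (8 cpus)

so the per-call speedup is roughly 1.95 from 2 to 4 processes but only about 1.6 from 4 to 8, consistent with the 1.6 figure in the reply below.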
From knepley at gmail.com Tue Aug 20 12:45:36 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 13:45:36 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 1:36 PM neil liu wrote:

> Hi, Matt,
> I think the time listed here represents the maximum total time across the different processors.

For the number of calls listed:

1) The number of MatMults goes up, so you should normalize for that, but you still have about 1.6 speedup. However, this is all multiplications. Are we sure they have the same size and sparsity?

2) MAXPY is also 1.6.

3) MDot probably does not see the latency of one node, so again it is not speeding up as you might want.

This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node.

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eQty5R8qGgZBZNodHW90OVmUU1tsyjzmP4NkXVvtCk8QMzIM2XIAQEx4RrA_F814zU_1P_RsayqlJ7GNAhca$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
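For readers who have not used it: `make streams` in a PETSc build runs a STREAMS-style benchmark that measures the sustainable memory bandwidth of a node at increasing process counts, which is exactly what bandwidth-bound kernels such as MatMult saturate. A stand-alone sketch of the same idea (illustration only, not the PETSc benchmark itself; the array size and the 3-doubles-per-iteration accounting follow the usual STREAM triad conventions):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
    const size_t n = 50000000; /* large enough to spill all caches */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    double *c = malloc(n * sizeof(double));
    struct timespec t0, t1;

    for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) a[i] = b[i] + 3.0 * c[i]; /* STREAM triad */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    /* the triad moves 3 doubles per index: two loads and one store */
    printf("triad bandwidth ~ %.1f GB/s\n", 3.0 * n * sizeof(double) / sec / 1e9);
    free(a); free(b); free(c);
    return 0;
  }

When concurrent copies of such a kernel stop adding aggregate bandwidth, additional MPI ranks on that node stop helping MatMult, which is the effect described above.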
From liufield at gmail.com Tue Aug 20 13:31:21 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 14:31:21 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Thanks a lot for this explanation, Matt. I will explore whether the matrix has the same size and sparsity.

On Tue, Aug 20, 2024 at 1:45 PM Matthew Knepley wrote:

> This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
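One quick way to make that check from the code (a sketch; A stands for the assembled operator in each run, and error checking is omitted as in the snippets above):

  Mat      A;
  MatInfo  info;
  PetscInt m, n;

  MatGetSize(A, &m, &n);
  MatGetInfo(A, MAT_GLOBAL_SUM, &info);
  PetscPrintf(PETSC_COMM_WORLD, "global size %" PetscInt_FMT " x %" PetscInt_FMT ", nonzeros used %g\n",
              m, n, (double)info.nz_used);

If the printed global size and nonzero count agree across the 2-, 4-, and 8-process runs, the MatMult work is the same and the remaining gap points back at memory bandwidth.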
From knepley at gmail.com Tue Aug 20 16:53:20 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 17:53:20 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 2:31 PM neil liu wrote:

> Thanks a lot for this explanation, Matt. I will explore whether the matrix has the same size and sparsity.

I think it is much more likely that you just exhausted bandwidth on the node.

  Thanks,

     Matt
>>
>>> On Tue, Aug 20, 2024 at 1:16 PM Matthew Knepley wrote:
>>>
>>>> On Tue, Aug 20, 2024 at 1:10 PM neil liu wrote:
>>>>
>>>>> Thanks a lot for your explanation, Stefano. Very helpful.
>>>>> Yes, I am using DMPlex to read a tetrahedral mesh from gmsh. With
>>>>> parmetis, the scaling performance is improved a lot.
>>>>> I will read your paper about how to change the basis for Nedelec
>>>>> elements.
>>>>>
>>>>>   cpu #   time for 500 KSP steps (s)   parallel efficiency
>>>>>   2       546                          --
>>>>>   4       224                          120%
>>>>>   8       170                          80%
>>>>>
>>>>> These results are much better than the previous attempt. Then I checked
>>>>> the time spent in several PETSc built-in functions during the KSP solve.
>>>>>
>>>>>   Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
>>>>>   VecMDot     78.32           43.28           30.47
>>>>>   VecMAXPY    92.95           48.37           30.798
>>>>>   MatMult     246.08          126.63          82.94
>>>>>
>>>>> It seems that from 4 cpus to 8 cpus the scaling is not as good as from
>>>>> 2 cpus to 4 cpus.
>>>>> Am I missing something?
>>>>
>>>> Did you normalize by the number of calls?
>>>>
>>>> Thanks,
>>>>
>>>>    Matt
>>>>
>>>>> Thanks a lot,
>>>>>
>>>>> Xiaodong
>>>>>
>>>>>
>>>>> On Mon, Aug 19, 2024 at 4:15 AM Stefano Zampini <
>>>>> stefano.zampini at gmail.com> wrote:
>>>>>
>>>>>> It seems you are using DMPLEX to handle the mesh, correct?
>>>>>> If so, you should configure using --download-parmetis to get a better
>>>>>> domain decomposition, since the default partitioner just splits the
>>>>>> cells into chunks in the order they appear.
>>>>>> That results in a large number of primal dofs on average (191, from
>>>>>> the output of ksp_view)
>>>>>> ...
>>>>>> Primal  dofs   : 176 204 191
>>>>>> ...
>>>>>> which slows down the solver setup.
>>>>>>
>>>>>> Again, you should not use approximate local solvers with BDDC unless
>>>>>> you know what you are doing.
>>>>>> The theory for approximate solvers for BDDC is limited, and covers
>>>>>> only SPD problems.
>>>>>> Looking at the output of log_view, the coarse problem setup (PCBDDCCSet)
>>>>>> and the primal functions setup (PCBDDCCorr) cost 35 and 63 seconds,
>>>>>> respectively.
>>>>>> Also, the 500 applications of the GAMG preconditioner for the Neumann
>>>>>> solver (PCBDDCNeuS) take 129 seconds out of the 400 seconds of total
>>>>>> solve time.
>>>>>>
>>>>>> PCBDDCTopo      1 1.0 3.1563e-01    1.0 1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01  0  0  1  0  2   0  0  1  0  2    19
>>>>>> PCBDDCLKSP      2 1.0 2.0423e+00    1.7 9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0  3378
>>>>>> PCBDDCLWor      1 1.0 3.9178e-02   13.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> PCBDDCCorr      1 1.0 6.3981e+01    2.2 8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11  0  0  0  11 11  0  0  0  8900
>>>>>> PCBDDCCSet      1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01  2  0  1  0  3   2  0  1  0  3     0
>>>>>> PCBDDCCKSP      1 1.0 6.3266e-01    1.3 0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01  0  0  0  0  1   0  0  0  0  1     0
>>>>>> PCBDDCScal      1 1.0 6.8274e-03    1.3 1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00  0  0  0  0  0   0  0  0  0  0   894
>>>>>> PCBDDCDirS   1000 1.0 6.0420e+00    3.5 6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  2995
>>>>>> PCBDDCNeuS    500 1.0 1.2901e+02    2.1 8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12  0  0  0  22 12  0  0  0  4828
>>>>>> PCBDDCCoaS    500 1.0 5.8757e-01    1.8 1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02  0  0 17  0 28   0  0 17  0 31 14901
>>>>>>
>>>>>> Finally, if I look at the residual history, I see a sharp decrease
>>>>>> and a very long plateau.
This indicates a bad coarse space; as I said >>>>>> before, there's no hope of finding a suitable coarse space without first >>>>>> changing the basis of the Nedelec elements, which is done automatically if >>>>>> you prescribe the discrete gradient operator (see the paper I have linked >>>>>> to in my previous communication). >>>>>> >>>>>> >>>>>> >>>>>> Il giorno dom 18 ago 2024 alle ore 00:37 neil liu >>>>>> ha scritto: >>>>>> >>>>>>> Hi, Stefano, >>>>>>> Please see the attached for the information with 4 and 8 CPUs for >>>>>>> the complex matrix. >>>>>>> I am solving Maxwell equations (Attahced) using 2nd-order Nedelec >>>>>>> elements (two dofs each edge, and two dofs each face). >>>>>>> The computational domain consists of different mediums, e.g., >>>>>>> vacuum and substrate (different permitivity). >>>>>>> The PML is used to truncate the computational domain, absorbing the >>>>>>> outgoing wave and introducing complex numbers for the matrix. >>>>>>> >>>>>>> Thanks a lot for your suggestions. I will try MUMPS. >>>>>>> For now, I just want to fiddle with Petsc's built-in features to >>>>>>> know more about it. >>>>>>> Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. >>>>>>> >>>>>>> Thanks a lot. >>>>>>> >>>>>>> Have a good weekend. >>>>>>> >>>>>>> >>>>>>> On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini < >>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>> >>>>>>>> Please include the output of -log_view -ksp_view -ksp_monitor to >>>>>>>> understand what's happening. >>>>>>>> >>>>>>>> Can you please share the equations you are solving so we can >>>>>>>> provide suggestions on the solver configuration? >>>>>>>> As I said, solving for Nedelec-type discretizations is challenging, >>>>>>>> and not for off-the-shelf, black box solvers >>>>>>>> >>>>>>>> Below are some comments: >>>>>>>> >>>>>>>> >>>>>>>> - You use a redundant SVD approach for the coarse solve, which >>>>>>>> can be inefficient if your coarse space grows. You can use a parallel >>>>>>>> direct solver like MUMPS (reconfigure with --download-mumps and use >>>>>>>> -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) >>>>>>>> - Why use ILU for the Dirichlet problem and GAMG for the >>>>>>>> Neumann problem? With 8 processes and 300K total dofs, you will have around >>>>>>>> 40K dofs per process, which is ok for a direct solver like MUMPS >>>>>>>> (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). >>>>>>>> With Nedelec dofs and the sparsity pattern they induce, I believe you can >>>>>>>> push to 80K dofs per process with good performance. >>>>>>>> - Why 5000 of restart for GMRES? It is highly inefficient to >>>>>>>> re-orthogonalize such a large set of vectors. >>>>>>>> >>>>>>>> >>>>>>>> Il giorno ven 16 ago 2024 alle ore 00:04 neil liu < >>>>>>>> liufield at gmail.com> ha scritto: >>>>>>>> >>>>>>>>> Dear Petsc developers, >>>>>>>>> >>>>>>>>> Thanks for your previous help. 
Now, the PCBDDC can converge to >>>>>>>>> 1e-8 with, >>>>>>>>> >>>>>>>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type >>>>>>>>> bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged >>>>>>>>> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view >>>>>>>>> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu >>>>>>>>> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 >>>>>>>>> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >>>>>>>>> >>>>>>>>> Then I used 2 cases for strong scaling test. One case only >>>>>>>>> involves real numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and >>>>>>>>> rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: >>>>>>>>> 611,432) due to PML. >>>>>>>>> >>>>>>>>> Case 1: >>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel >>>>>>>>> efficiency PCsetup time(s) >>>>>>>>> 2 234.7 >>>>>>>>> 3.12 >>>>>>>>> 4 126.6 >>>>>>>>> 0.92 1.62 >>>>>>>>> 8 84.97 >>>>>>>>> 0.69 1.26 >>>>>>>>> However for Case 2, >>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel >>>>>>>>> efficiency PCsetup time(s) >>>>>>>>> 2 584.5 >>>>>>>>> 8.61 >>>>>>>>> 4 376.8 >>>>>>>>> 0.77 6.56 >>>>>>>>> 8 459.6 >>>>>>>>> 0.31 66.47 >>>>>>>>> For these 2 cases, I checked the time for PCsetup as an example. >>>>>>>>> It seems 8 cpus for case 2 used too much time on PCsetup. >>>>>>>>> Do you have any ideas about what is going on here? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Xiaodong >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Stefano >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Stefano >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 20 20:44:05 2024 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 20 Aug 2024 21:44:05 -0400 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: <683BA4D7-A421-4610-8D1F-3EE5A53C7B5A@petsc.dev> See the detailed discussion at https://urldefense.us/v3/__https://petsc.org/main/manual/streams/__;!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AG667XDu4$ > On Aug 20, 2024, at 5:53?PM, Matthew Knepley wrote: > > On Tue, Aug 20, 2024 at 2:31?PM neil liu > wrote: >> Thanks a lot for this explanation, Matt. 
I will explore whether the matrix has the same size and spaisity. > > I think it is much more likely that you just exhausted bandwidth on the node. > > Thanks, > > Matt > >> On Tue, Aug 20, 2024 at 1:45?PM Matthew Knepley > wrote: >>> On Tue, Aug 20, 2024 at 1:36?PM neil liu > wrote: >>>> Hi, Matt, >>>> I think the time listed here represents the maximum total time across different processors. >>>> >>>> Thanks a lot. >>>> 2 cpus 4 cpus 8 cpus >>>> Event Count Time (sec) Count Time (sec) Count Time (sec) >>>> Max Ratio Max Ratio Max Ratio Max Ratio Max Ratio Max Ratio >>>> VecMDot 530 1.0 7.8320e+01 1.0 530 1.0 4.3285e+01 1.1 530 1.0 3.0476e+01 1.1 >>>> VecMAXPY 534 1.0 9.2954e+01 1.0 534 1.0 4.8378e+01 1.1 534 1.0 3.0798e+01 1.1 >>>> MatMult 8055 1.0 2.4608e+02 1.0 8103 1.0 1.2663e+02 1.0 8367 1.0 8.2942e+01 1.1 >>> >>> For the number of calls listed. >>> >>> 1) The number of MatMults goes up, so you should normalize for that, but you still have about 1.6 speedup. However, this is >>> all multiplications. Are we sure they have the same size and sparsity? >>> >>> 2) MAXPY is also 1.6 >>> >>> 3) MDot probably does not see the latency of one node, so again it is not speeding up as you might want. >>> >>> This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs >>> (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node. >>> >>> Thanks, >>> >>> Matt >>> >>>> On Tue, Aug 20, 2024 at 1:16?PM Matthew Knepley > wrote: >>>>> On Tue, Aug 20, 2024 at 1:10?PM neil liu > wrote: >>>>>> Thanks a lot for your explanation, Stefano. Very helpful. >>>>>> Yes. I am using dmplex to read a tetrahdra mesh from gmsh. With parmetis, the scaling performance is improved a lot. >>>>>> I will read your paper about how to change the basis for Nedelec elements. >>>>>> >>>>>> cpu # time for 500 ksp steps (s) parallel efficiency >>>>>> 2 546 >>>>>> 4 224 120% >>>>>> 8 170 80% >>>>>> This results are much better than previous attempt. Then I checked the time spent by several Petsc built-in functions for the ksp solver. >>>>>> >>>>>> Functions time(2 cpus) time(4 cpus) time(8 cpus) >>>>>> VecMDot 78.32 43.28 30.47 >>>>>> VecMAXPY 92.95 48.37 30.798 >>>>>> MatMult 246.08 126.63 82.94 >>>>>> >>>>>> It seems from cpu 4 to cpu 8, the scaling is not as good as from cpu 2 to cpu 4. >>>>>> Am I missing something? >>>>> >>>>> Did you normalize by the number of calls? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thanks a lot, >>>>>> >>>>>> Xiaodong >>>>>> >>>>>> >>>>>> On Mon, Aug 19, 2024 at 4:15?AM Stefano Zampini > wrote: >>>>>>> It seems you are using DMPLEX to handle the mesh, correct? >>>>>>> If so, you should configure using --download-parmetis to have a better domain decomposition since the default one just splits the cells in chunks as they are ordered. >>>>>>> This results in a large number of primal dofs on average (191, from the output of ksp_view) >>>>>>> ... >>>>>>> Primal dofs : 176 204 191 >>>>>>> ... >>>>>>> that slows down the solver setup. >>>>>>> >>>>>>> Again, you should not use approximate local solvers with BDDC unless you know what you are doing. >>>>>>> The theory for approximate solvers for BDDC is small and only for SPD problems. >>>>>>> Looking at the output of log_view, coarse problem setup (PCBDDCCSet), and primal functions setup (PCBDDCCorr) costs 35 + 63 seconds, respectively. 
>>>>>>> Also, the 500 application of the GAMG preconditioner for the Neumann solver (PCBDDCNeuS) takes 129 seconds out of the 400 seconds of the total solve time. >>>>>>> >>>>>>> PCBDDCTopo 1 1.0 3.1563e-01 1.0 1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01 0 0 1 0 2 0 0 1 0 2 19 >>>>>>> PCBDDCLKSP 2 1.0 2.0423e+00 1.7 9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 3378 >>>>>>> PCBDDCLWor 1 1.0 3.9178e-02 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> PCBDDCCorr 1 1.0 6.3981e+01 2.2 8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11 0 0 0 11 11 0 0 0 8900 >>>>>>> PCBDDCCSet 1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01 2 0 1 0 3 2 0 1 0 3 0 >>>>>>> PCBDDCCKSP 1 1.0 6.3266e-01 1.3 0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01 0 0 0 0 1 0 0 0 0 1 0 >>>>>>> PCBDDCScal 1 1.0 6.8274e-03 1.3 1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00 0 0 0 0 0 0 0 0 0 0 894 >>>>>>> PCBDDCDirS 1000 1.0 6.0420e+00 3.5 6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 2995 >>>>>>> PCBDDCNeuS 500 1.0 1.2901e+02 2.1 8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12 0 0 0 22 12 0 0 0 4828 >>>>>>> PCBDDCCoaS 500 1.0 5.8757e-01 1.8 1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02 0 0 17 0 28 0 0 17 0 31 14901 >>>>>>> >>>>>>> Finally, if I look at the residual history, I see a sharp decrease and a very long plateau. This indicates a bad coarse space; as I said before, there's no hope of finding a suitable coarse space without first changing the basis of the Nedelec elements, which is done automatically if you prescribe the discrete gradient operator (see the paper I have linked to in my previous communication). >>>>>>> >>>>>>> >>>>>>> >>>>>>> Il giorno dom 18 ago 2024 alle ore 00:37 neil liu > ha scritto: >>>>>>>> Hi, Stefano, >>>>>>>> Please see the attached for the information with 4 and 8 CPUs for the complex matrix. >>>>>>>> I am solving Maxwell equations (Attahced) using 2nd-order Nedelec elements (two dofs each edge, and two dofs each face). >>>>>>>> The computational domain consists of different mediums, e.g., vacuum and substrate (different permitivity). >>>>>>>> The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix. >>>>>>>> >>>>>>>> Thanks a lot for your suggestions. I will try MUMPS. >>>>>>>> For now, I just want to fiddle with Petsc's built-in features to know more about it. >>>>>>>> Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. >>>>>>>> >>>>>>>> Thanks a lot. >>>>>>>> >>>>>>>> Have a good weekend. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini > wrote: >>>>>>>>> Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening. >>>>>>>>> >>>>>>>>> Can you please share the equations you are solving so we can provide suggestions on the solver configuration? >>>>>>>>> As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black box solvers >>>>>>>>> >>>>>>>>> Below are some comments: >>>>>>>>> >>>>>>>>> You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) >>>>>>>>> Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? 
With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance. >>>>>>>>> Why 5000 of restart for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors. >>>>>>>>> >>>>>>>>> Il giorno ven 16 ago 2024 alle ore 00:04 neil liu > ha scritto: >>>>>>>>>> Dear Petsc developers, >>>>>>>>>> >>>>>>>>>> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, >>>>>>>>>> >>>>>>>>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >>>>>>>>>> >>>>>>>>>> Then I used 2 cases for strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML. >>>>>>>>>> >>>>>>>>>> Case 1: >>>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel efficiency PCsetup time(s) >>>>>>>>>> 2 234.7 3.12 >>>>>>>>>> 4 126.6 0.92 1.62 >>>>>>>>>> 8 84.97 0.69 1.26 >>>>>>>>>> However for Case 2, >>>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel efficiency PCsetup time(s) >>>>>>>>>> 2 584.5 8.61 >>>>>>>>>> 4 376.8 0.77 6.56 >>>>>>>>>> 8 459.6 0.31 66.47 >>>>>>>>>> For these 2 cases, I checked the time for PCsetup as an example. It seems 8 cpus for case 2 used too much time on PCsetup. >>>>>>>>>> Do you have any ideas about what is going on here? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Xiaodong >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Stefano >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Wed Aug 21 08:57:18 2024 From: lzou at anl.gov (Zou, Ling) Date: Wed, 21 Aug 2024 13:57:18 +0000 Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance? 
In-Reply-To:
References:
Message-ID:

Hi Junchao,

Yeah, I have part of the log_view, for the same code and the same version of
PETSc (3.20), but two OS versions (Ventura vs. Sonoma).
Note that the PETSc function call counts are exactly the same.
I suspect that the OS itself has just become slower, or maybe it is
something related to the compiler.

-Ling

  Function           # of calls   Time spent (Ventura)   Time spent (Sonoma)
  MatMult MF         20463        3.718600E+00           4.467800E+00
  MatMult            20463        3.721000E+00           4.470500E+00
  MatFDColorApply    2062         4.507000E+00           5.394600E+00
  MatFDColorFunc     24744        4.472400E+00           5.356300E+00
  KSPSolve           2062         3.569700E+00           4.262400E+00
  SNESSolve          986          9.195900E+00           1.102000E+01
  SNESFunctionEval   23575        4.268600E+00           5.161100E+00
  SNESJacobianEval   2062         4.509300E+00           5.397500E+00
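(Taking the ratios row by row from the table as given: MatMult
4.4705/3.7210 ~ 1.20, KSPSolve 4.2624/3.5697 ~ 1.19, SNESSolve
1.1020e+01/9.1959 ~ 1.20. An almost flat ~20% penalty across unrelated
events points at something global - OS or toolchain - rather than one
slow code path.)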
From: Junchao Zhang
Date: Monday, August 19, 2024 at 10:04 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Do you have a -log_view report so that we can know which petsc functions
degraded? Or is it because the compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing slightly more
> than 10% performance degradation in my PETSc code (only in terms of
> execution time).
> I track the number of major function calls; they are identical between
> the two OS versions (so PETSc is not the one to blame), but everything is
> just slower.
> Is this something expected? Has anyone else experienced it?
>
> -Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From zonexo at gmail.com  Wed Aug 21 10:19:01 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Wed, 21 Aug 2024 23:19:01 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
Message-ID: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>

Hi,

I am using the latest PETSc through github. I compiled both the debug and
release versions of PETSc without problems.

I then use it with my CFD code, and the debug version works.

However, I have problems with the release version:

ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w
-I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now
deprecated and will be discontinued late 2024. Intel recommends that
customers transition now to using the LLVM-based Intel(R) Fortran Compiler
(ifx) for continued Windows* and Linux* support, new language support, new
language features, and optimizations. Use '-diag-disable=10448' to disable
this message.
global.F90(444): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
        call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr)
-------------^
global.F90(720): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr)
-----^
global.F90(774): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
-----^
global.F90(776): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
-----^
global.F90(949): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
-----^
global.F90(957): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
-----^
compilation aborted for global.F90 (code 1)

May I know what's the problem?

--

Thank you very much.

Yours sincerely,

================================================
TAY Wee-Beng (Zheng Weiming)
================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Wed Aug 21 11:03:49 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 21 Aug 2024 12:03:49 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
Message-ID: <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>

   You must declare them as

   PetscInt ksta_p, kend_p

   Perhaps they are declared as arrays?
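(For reference, a minimal sketch of the declarations being described,
assuming the modern `use petscmat` Fortran bindings; the surrounding
program setup is omitted and the names simply mirror the snippet above:)

    subroutine get_range(A_mat)
#include <petsc/finclude/petscmat.h>
      use petscmat
      implicit none
      Mat            :: A_mat
      PetscInt       :: ksta_p, kend_p   ! PetscInt, not a plain integer
      PetscErrorCode :: ierr
      call MatGetOwnershipRange(A_mat, ksta_p, kend_p, ierr)
    end subroutine get_range

With a plain `integer`, the kind may disagree with the PetscInt kind PETSc
was built with (e.g. with --with-64-bit-indices), so no specific routine
behind the generic interface matches - which is exactly the error above.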
> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I am using the latest PETSc through github. I compiled both the debug and
> release versions of PETSc without problems.
>
> I then use it with my CFD code, and the debug version works.
>
> However, I have problems with the release version:
>
> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w
> -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90
> [the six global.F90 error #6285 messages, quoted in full above]
> compilation aborted for global.F90 (code 1)
>
> May I know what's the problem?
>
> --
>
> Thank you very much.
>
> Yours sincerely,
>
> ================================================
> TAY Wee-Beng (Zheng Weiming)
> ================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From meator.dev at gmail.com  Wed Aug 21 13:15:59 2024
From: meator.dev at gmail.com (meator)
Date: Wed, 21 Aug 2024 20:15:59 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
Message-ID: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>

Hello. I have skimmed through PETSc's documentation to see whether PETSc
has any special policy for including header files, but I didn't find
anything, so I assume that standard C rules apply.

The problematic header file is <petscerror.h>. The following code doesn't
compile:

    #include <petscerror.h>

    int main() { return 0; }

It fails because <petscerror.h> expects `MPI_Comm` to be defined, but it is
(I assume) lacking the appropriate includes which would define it. This is
unfortunate, because many linters targeting C/C++ sort header files
alphabetically. Since "petsc" is the common prefix for most PETSc header
files, `petscerror.h` was put first in my header list because it begins
with an "e".

I'm using PETSc version 3.21.3.

Thanks in advance
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0x1A14CB3464CBE5BF.asc
Type: application/pgp-keys
Size: 6275 bytes
Desc: OpenPGP public key
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: OpenPGP digital signature
URL: 

From pierre at joliv.et  Wed Aug 21 13:40:25 2024
From: pierre at joliv.et (Pierre Jolivet)
Date: Wed, 21 Aug 2024 20:40:25 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
In-Reply-To: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
References: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
Message-ID:

Cross-referencing https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/issues/1254__;!!G_uCfscf7eWS!dXHKUheZr_zi40ctF3flLGD0N_qAfFixD8DHUmzFJXKbIKjQ1jFS1-kfRf_GGXnljrjgjIyvcP-9POvAjSl8zA$

Thanks,
Pierre

> On 21 Aug 2024, at 8:15 PM, meator <meator.dev at gmail.com> wrote:
>
> [the report above, quoted in full]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From meator.dev at gmail.com  Wed Aug 21 13:44:14 2024
From: meator.dev at gmail.com (meator)
Date: Wed, 21 Aug 2024 20:44:14 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
In-Reply-To:
References: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
Message-ID: <177d281c-06f1-4bf4-8d04-575596c1c797@gmail.com>

Ah, I didn't know that this bug had already been reported. Thanks for the
pointer!
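(Until that issue is resolved, a common workaround - assuming a standard
PETSc install, where petscsys.h pulls in the MPI definitions before
petscerror.h - is to include the broader header first, i.e. put
`#include <petscsys.h>` ahead of, or instead of, `#include <petscerror.h>`,
or include `mpi.h` beforehand so that `MPI_Comm` is already defined.)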
On 8/21/24 8:40 PM, Pierre Jolivet wrote:
> Cross-referencing https://gitlab.com/petsc/petsc/-/issues/1254
>
> Thanks,
> Pierre
>
>> On 21 Aug 2024, at 8:15 PM, meator wrote:
>>
>> [the original report, quoted in full above]
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0x1A14CB3464CBE5BF.asc
Type: application/pgp-keys
Size: 6275 bytes
Desc: OpenPGP public key
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: OpenPGP digital signature
URL: 

From bsmith at petsc.dev  Thu Aug 22 08:40:42 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 09:40:42 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To:
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
Message-ID: <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>

   Fortran 90 type checking is very tight; the dimension of each array or
scalar passed as an argument must match the expected dimension (f77 did not
do this type checking). Thus the ione argument must be a 1-d array, as must
the numerical values, so do

   call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)

See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!fyaeYb3WlpH1d83aZxEB9RHOQhvgYlgvDJA4PQ389kQZjJxTKqgLZj0Jdglufyhde7YlMWKSo8z5ZSw_DhHXk48$

   I am trying to support the old-fashioned F77 model, allowing mismatches
in the array dimensions while still doing proper type checking, but it will
take some time to simplify the API.

   Barry

> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> Hi Barry,
>
> I have declared them as integers in Fortran. Is that different from
> PetscInt, and how come it works in debug mode?
>
> Anyway, I changed them and that solved the problem. However, I have a
> similar problem in my boundary.F90:
>
> boundary.F90(6685): error #6285: There is no matching specific subroutine
> for this generic subroutine call.   [MATSETVALUES]
> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
> -----^
> I changed all of them to PetscInt and also PetscReal, but I still get the
> error.
>
> Why is this so now? Any solution?
>
> Thanks!
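(To make the array-shape requirement concrete, here is a sketch of the same
call rebuilt with explicit rank-1 temporaries. The names mirror the snippet
above, and the exact generic interfaces differ between PETSc versions, so
treat this as illustrative rather than definitive:)

      PetscInt    :: ione, row(1), col(1)
      PetscScalar :: val(1)
      ione   = 1
      row(1) = II               ! row index as a 1-d array
      col(1) = int_impl(k,5)    ! column index as a 1-d array, PetscInt
      val(1) = impl_mat_A       ! the value must be PetscScalar, also 1-d
      call MatSetValues(A_mat_uv, ione, row, ione, col, val, INSERT_VALUES, ierr)

If every index argument is a rank-1 PetscInt array and every value a rank-1
PetscScalar array, the compiler can match one of the specific subroutines
behind the MATSETVALUES generic.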
>>> >>> However, I have problems with the rel ver: >>> >>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>> -------------^ >>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>> -----^ >>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>> -----^ >>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>> -----^ >>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>> -----^ >>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>> -----^ >>> compilation aborted for global.F90 (code 1) >>> >>> May I know what's the problem? >>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Thu Aug 22 08:45:45 2024 From: zonexo at gmail.com (TAY Wee Beng) Date: Thu, 22 Aug 2024 21:45:45 +0800 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: Hi Barry, Do you mean that I change from: call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) to call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) ? I did it but the error is still there. On 22/8/2024 9:40 pm, Barry Smith wrote: > > ? ?Fortran 90 type checking is very tight; The dimension of the array, > or scalar passed as arguments must match the expected dimension (f77 > did not do this type checking). 
Thus the ione argument must be a 1-d > array as well as the numerical values so do > >> */call >> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/* >> > > See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!YtzPtW9XslKdNpPZd4zGIwtB0bpm5C24PUmAaH-renGV54WI9JpWuh7yYG-oSS4g9_KOnSqCEFPSwHcba_c$ > > I am trying to support the old-fashion F77 model, allowing > miss-matches in the array dimensions while still doing proper type > checking but it will take some time to simplify the API. > > ? ?Barry > > > >> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >> >> Hi Barry, >> >> I have declared them as integers in Fortran. Is that different from >> PetscInt and how come it works in debug mode? >> >> Anyway, I changed them and it solved the problem. However, I have a >> similar problem in my boundary.F90: >> >> */boundary.F90(6685): error #6285: There is no matching specific >> subroutine for this generic subroutine call.?? [MATSETVALUES] >> call >> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/* >> -----^ >> I changed all to PetscInt and also PetscReal but I still got the error. >> >> Why is this so now? Any solution? >> >> Thanks! >> >> On 22/8/2024 12:03 am, Barry Smith wrote: >>> >>> ? You must declare as >>> >>> */? PetscInt ksta_p,kend_p/* >>> >>> ? Perhaps they are declared as arrays? >>> >>> >>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>> >>>> Hi, >>>> >>>> I am using the latest PETSc thru github. I compiled both the debug >>>> and rel ver of PETSc w/o problem. >>>> >>>> I then use it with my CFD code and the debug ver works. >>>> >>>> However, I have problems with the rel ver: >>>> >>>> */ftn -o global.o -c -O3 -g -ip -ipo?? -fPIC? -save -w >>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is >>>> now deprecated and will be discontinued late 2024. Intel recommends >>>> that customers transition now to using the LLVM-based Intel(R) >>>> Fortran Compiler (ifx) for continued Windows* and Linux* support, >>>> new language support, new language features, and optimizations. Use >>>> '-diag-disable=10448' to disable this message. >>>> global.F90(444): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>> -------------^ >>>> global.F90(720): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>> -----^ >>>> global.F90(774): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>> -----^ >>>> global.F90(776): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>> -----^ >>>> global.F90(949): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>> -----^ >>>> global.F90(957): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. 
[MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>> -----^ >>>> compilation aborted for global.F90 (code 1)/* >>>> >>>> May I know what's the problem? >>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng ??? (Zheng Weiming) ================================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 22 08:54:50 2024 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Aug 2024 09:54:50 -0400 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: What is int_impl(k,5) defined type? > On Aug 22, 2024, at 9:45?AM, TAY Wee Beng wrote: > > Hi Barry, > > Do you mean that I change from: > > call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) > > to > > call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) > > ? > > I did it but the error is still there. > > On 22/8/2024 9:40 pm, Barry Smith wrote: >> >> Fortran 90 type checking is very tight; The dimension of the array, or scalar passed as arguments must match the expected dimension (f77 did not do this type checking). Thus the ione argument must be a 1-d array as well as the numerical values so do >> >>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) >>> >> >> See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!aWzePGVwtbySxFNNIyaUMeslM1i47XZ6Q8Cu-XOfcXtqa0fUhIxUXbnw6aeXBBE-k5uGtriqZ7_yShLv_cy0KmM$ >> >> I am trying to support the old-fashion F77 model, allowing miss-matches in the array dimensions while still doing proper type checking but it will take some time to simplify the API. >> >> Barry >> >> >> >>> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >>> >>> Hi Barry, >>> >>> I have declared them as integers in Fortran. Is that different from PetscInt and how come it works in debug mode? >>> >>> Anyway, I changed them and it solved the problem. However, I have a similar problem in my boundary.F90: >>> >>> boundary.F90(6685): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATSETVALUES] >>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) >>> -----^ >>> I changed all to PetscInt and also PetscReal but I still got the error. >>> >>> Why is this so now? Any solution? >>> >>> Thanks! >>> >>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>> >>>> You must declare as >>>> >>>> PetscInt ksta_p,kend_p >>>> >>>> Perhaps they are declared as arrays? >>>> >>>> >>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>> >>>>> I then use it with my CFD code and the debug ver works. 
>>>>> >>>>> However, I have problems with the rel ver: >>>>> >>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>> -------------^ >>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>> -----^ >>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>> -----^ >>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>> -----^ >>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>> -----^ >>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>> -----^ >>>>> compilation aborted for global.F90 (code 1) >>>>> >>>>> May I know what's the problem? >>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Thu Aug 22 08:55:53 2024 From: zonexo at gmail.com (TAY Wee Beng) Date: Thu, 22 Aug 2024 21:55:53 +0800 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com> On 22/8/2024 9:54 pm, Barry Smith wrote: > > ? What is int_impl(k,5) defined type? PetscInt > >> On Aug 22, 2024, at 9:45?AM, TAY Wee Beng wrote: >> >> Hi Barry, >> >> Do you mean that I change from: >> >> call >> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) >> >> to >> >> call >> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) >> >> ? 
>> >> I did it but the error is still there. >> >> On 22/8/2024 9:40 pm, Barry Smith wrote: >>> >>> ? ?Fortran 90 type checking is very tight; The dimension of the >>> array, or scalar passed as arguments must match the expected >>> dimension (f77 did not do this type checking). Thus the ione >>> argument must be a 1-d array as well as the numerical values so do >>> >>>> */call >>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/* >>>> >>> >>> See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!autVTPnQ7buLuq9rjvUR07AS8J_YKe2xLprKP48K_ELW64wGci2MCdQ2u2VxgZOFwjHSdmLTP8x3yfcAg30$ >>> >>> I am trying to support the old-fashion F77 model, allowing >>> miss-matches in the array dimensions while still doing proper type >>> checking but it will take some time to simplify the API. >>> >>> ? ?Barry >>> >>> >>> >>>> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >>>> >>>> Hi Barry, >>>> >>>> I have declared them as integers in Fortran. Is that different from >>>> PetscInt and how come it works in debug mode? >>>> >>>> Anyway, I changed them and it solved the problem. However, I have a >>>> similar problem in my boundary.F90: >>>> >>>> */boundary.F90(6685): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATSETVALUES] >>>> call >>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/* >>>> -----^ >>>> I changed all to PetscInt and also PetscReal but I still got the error. >>>> >>>> Why is this so now? Any solution? >>>> >>>> Thanks! >>>> >>>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>>> >>>>> ? You must declare as >>>>> >>>>> */? PetscInt ksta_p,kend_p/* >>>>> >>>>> ? Perhaps they are declared as arrays? >>>>> >>>>> >>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am using the latest PETSc thru github. I compiled both the >>>>>> debug and rel ver of PETSc w/o problem. >>>>>> >>>>>> I then use it with my CFD code and the debug ver works. >>>>>> >>>>>> However, I have problems with the rel ver: >>>>>> >>>>>> */ftn -o global.o -c -O3 -g -ip -ipo?? -fPIC? -save -w >>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>> global.F90 >>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) >>>>>> is now deprecated and will be discontinued late 2024. Intel >>>>>> recommends that customers transition now to using the LLVM-based >>>>>> Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* >>>>>> support, new language support, new language features, and >>>>>> optimizations. Use '-diag-disable=10448' to disable this message. >>>>>> global.F90(444): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>> -------------^ >>>>>> global.F90(720): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>> -----^ >>>>>> global.F90(774): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>> -----^ >>>>>> global.F90(776): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. 
[MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
>>>>>> -----^
>>>>>> global.F90(949): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call.   [MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
>>>>>> -----^
>>>>>> global.F90(957): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call.   [MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
>>>>>> -----^
>>>>>> compilation aborted for global.F90 (code 1)
>>>>>>
>>>>>> May I know what's the problem?
>>>>>>
>>>>>> --
>>>>>> Thank you very much.
>>>>>> Yours sincerely,
>>>>>> ================================================
>>>>>> TAY Wee-Beng (Zheng Weiming)
>>>>>> ================================================
>>>>>
>>>> --
>>>> Thank you very much.
>>>> Yours sincerely,
>>>> ================================================
>>>> TAY Wee-Beng (Zheng Weiming)
>>>> ================================================
>>>
>> --
>> Thank you very much.
>> Yours sincerely,
>> ================================================
>> TAY Wee-Beng (Zheng Weiming)
>> ================================================
>
> --
> Thank you very much.
> Yours sincerely,
> ================================================
> TAY Wee-Beng (Zheng Weiming)
> ================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From junchao.zhang at gmail.com  Thu Aug 22 09:28:20 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Thu, 22 Aug 2024 09:28:20 -0500
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Hi, Ling,
  MatMult alone degraded almost 20%, which is a lot. Do you have the
configure.log for the two builds? We might find compiler discrepancies
from it.

--Junchao Zhang

On Wed, Aug 21, 2024 at 8:57 AM Zou, Ling wrote:

> Hi Junchao,
>
> Yeah, I have part of the log_view, for the same code and the same version
> of PETSc (3.20), but two OS versions (Ventura vs. Sonoma).
>
> Note that the PETSc function call counts are exactly the same.
>
> I suspect that the OS itself has just become slower, or maybe it is
> something related to the compiler.
>
> -Ling
>
>   Function           # of calls   Time spent (Ventura)   Time spent (Sonoma)
>   MatMult MF         20463        3.718600E+00           4.467800E+00
>   MatMult            20463        3.721000E+00           4.470500E+00
>   MatFDColorApply    2062         4.507000E+00           5.394600E+00
>   MatFDColorFunc     24744        4.472400E+00           5.356300E+00
>   KSPSolve           2062         3.569700E+00           4.262400E+00
>   SNESSolve          986          9.195900E+00           1.102000E+01
>   SNESFunctionEval   23575        4.268600E+00           5.161100E+00
>   SNESJacobianEval   2062         4.509300E+00           5.397500E+00
>
> From: Junchao Zhang
> Date: Monday, August 19, 2024 at 10:04 PM
> To: Zou, Ling
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
>
> Do you have a -log_view report so that we can know which petsc functions
> degraded? Or is it because the compilers were different?
>
> --Junchao Zhang
>
> On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing slightly more
> than 10% performance degradation in my PETSc code (only in terms of
> execution time).
> I track the number of major function calls; they are identical between
> the two OS versions (so PETSc is not the one to blame), but everything is
> just slower.
> Is this something expected? Has anyone else experienced it?
>
> -Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Thu Aug 22 09:28:24 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 10:28:24 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
Message-ID: <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>

   It should be PetscReal, as it is a numerical value.

> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>
>> What is the declared type of int_impl(k,5)?
> PetscInt
>>
>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>
>>> Hi Barry,
>>>
>>> Do you mean that I change from:
>>>
>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>
>>> to
>>>
>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>
>>> ?
>>>
>>> I did it but the error is still there.
>>>
>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>
>>>> Fortran 90 type checking is very tight; the dimension of each array or
>>>> scalar passed as an argument must match the expected dimension (f77 did
>>>> not do this type checking). Thus the ione argument must be a 1-d array,
>>>> as must the numerical values, so do
>>>>
>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>
>>>> I am trying to support the old-fashioned F77 model, allowing mismatches
>>>> in the array dimensions while still doing proper type checking, but it
>>>> will take some time to simplify the API.
>>>>
>>>>    Barry
>>>>
>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I have declared them as integers in Fortran. Is that different from
>>>>> PetscInt, and how come it works in debug mode?
>>>>>
>>>>> Anyway, I changed them and that solved the problem. However, I have a
>>>>> similar problem in my boundary.F90:
>>>>>
>>>>> boundary.F90(6685): error #6285: There is no matching specific
>>>>> subroutine for this generic subroutine call.   [MATSETVALUES]
>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>> -----^
>>>>> I changed all of them to PetscInt and also PetscReal, but I still get
>>>>> the error.
>>>>>
>>>>> Why is this so now? Any solution?
>>>>>
>>>>> Thanks!
>>>>> >>>>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>>>> >>>>>> You must declare as >>>>>> >>>>>> PetscInt ksta_p,kend_p >>>>>> >>>>>> Perhaps they are declared as arrays? >>>>>> >>>>>> >>>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>>>> >>>>>>> I then use it with my CFD code and the debug ver works. >>>>>>> >>>>>>> However, I have problems with the rel ver: >>>>>>> >>>>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>> -------------^ >>>>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>> -----^ >>>>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>> -----^ >>>>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>> -----^ >>>>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>> -----^ >>>>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>> -----^ >>>>>>> compilation aborted for global.F90 (code 1) >>>>>>> >>>>>>> May I know what's the problem? >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Thank you very much. >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> ================================================ >>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>> ================================================ >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From zonexo at gmail.com  Thu Aug 22 09:33:44 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Thu, 22 Aug 2024 22:33:44 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
Message-ID: <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>

On 22/8/2024 10:28 pm, Barry Smith wrote:
>
>   Should be PetscReal as it is a numerical value.
Ok, I changed it but I still get the same error.
>
>
>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>
>>
>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>
>>>   What is int_impl(k,5) defined type?
>> PetscInt
>>>
>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>
>>>> Hi Barry,
>>>>
>>>> Do you mean that I change from:
>>>>
>>>> call
>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> to
>>>>
>>>> call
>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> ?
>>>>
>>>> I did it but the error is still there.
>>>>
>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>
>>>>>   Fortran 90 type checking is very tight; the dimension of the
>>>>> array, or scalar, passed as an argument must match the expected
>>>>> dimension (F77 did not do this type checking). Thus the ione
>>>>> argument must be a 1-d array as well as the numerical values, so do
>>>>>
>>>>>> */call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>
>>>>>
>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>
>>>>> I am trying to support the old-fashioned F77 model, allowing
>>>>> mismatches in the array dimensions while still doing proper type
>>>>> checking, but it will take some time to simplify the API.
>>>>>
>>>>>   Barry
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> I have declared them as integers in Fortran. Is that different
>>>>>> from PetscInt and how come it works in debug mode?
>>>>>>
>>>>>> Anyway, I changed them and it solved the problem. However, I have
>>>>>> a similar problem in my boundary.F90:
>>>>>>
>>>>>> */boundary.F90(6685): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call. [MATSETVALUES]
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>> -----^
>>>>>> I changed all to PetscInt and also PetscReal but I still got the
>>>>>> error.
>>>>>>
>>>>>> Why is this so now? Any solution?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>
>>>>>>>   You must declare as
>>>>>>>
>>>>>>> */  PetscInt ksta_p,kend_p/*
>>>>>>>
>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am using the latest PETSc thru github. I compiled both the
>>>>>>>> debug and rel ver of PETSc w/o problem.
>>>>>>>>
>>>>>>>> I then use it with my CFD code and the debug ver works.
>>>>>>>> >>>>>>>> However, I have problems with the rel ver: >>>>>>>> >>>>>>>> */ftn -o global.o -c -O3 -g -ip -ipo -fPIC? -save -w >>>>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>>>> global.F90 >>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) >>>>>>>> is now deprecated and will be discontinued late 2024. Intel >>>>>>>> recommends that customers transition now to using the >>>>>>>> LLVM-based Intel(R) Fortran Compiler (ifx) for continued >>>>>>>> Windows* and Linux* support, new language support, new language >>>>>>>> features, and optimizations. Use '-diag-disable=10448' to >>>>>>>> disable this message. >>>>>>>> global.F90(444): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>> -------------^ >>>>>>>> global.F90(720): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(774): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(776): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(949): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(957): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>> -----^ >>>>>>>> compilation aborted for global.F90 (code 1)/* >>>>>>>> >>>>>>>> May I know what's the problem? >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Thank you very much. >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> ================================================ >>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>> ================================================ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> >>>>>> Thank you very much. >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> ================================================ >>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>> ================================================ >>>>>> >>>>>> >>>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng ??? (Zheng Weiming) ================================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From bsmith at petsc.dev  Thu Aug 22 09:39:08 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 10:39:08 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
 <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
Message-ID: <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>

   Hmm, try using a standalone variable

   PetscReal value
   value = int_impl(k,5)

>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[value],impl_mat_A,INSERT_VALUES,ierr)
>>>>>

   Unfortunately, Fortran compilers in this situation are not good at telling us exactly what argument is giving it grief.

   Barry



> On Aug 22, 2024, at 10:33 AM, TAY Wee Beng wrote:
>
>
> On 22/8/2024 10:28 pm, Barry Smith wrote:
>>
>>   Should be PetscReal as it is a numerical value.
> Ok, I changed it but I still get the same error.
>>
>>
>>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>>
>>>
>>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>>
>>>>   What is int_impl(k,5) defined type?
>>> PetscInt
>>>>
>>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> Do you mean that I change from:
>>>>>
>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>
>>>>> to
>>>>>
>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>
>>>>> ?
>>>>>
>>>>> I did it but the error is still there.
>>>>>
>>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>>
>>>>>>   Fortran 90 type checking is very tight; the dimension of the array, or scalar, passed as an argument must match the expected dimension (F77 did not do this type checking). Thus the ione argument must be a 1-d array as well as the numerical values, so do
>>>>>>
>>>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>>
>>>>>>
>>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>>
>>>>>> I am trying to support the old-fashioned F77 model, allowing mismatches in the array dimensions while still doing proper type checking, but it will take some time to simplify the API.
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> I have declared them as integers in Fortran. Is that different from PetscInt and how come it works in debug mode?
>>>>>>>
>>>>>>> Anyway, I changed them and it solved the problem. However, I have a similar problem in my boundary.F90:
>>>>>>>
>>>>>>> boundary.F90(6685): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATSETVALUES]
>>>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>>> -----^
>>>>>>> I changed all to PetscInt and also PetscReal but I still got the error.
>>>>>>>
>>>>>>> Why is this so now? Any solution?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>>
>>>>>>>>   You must declare as
>>>>>>>>
>>>>>>>>   PetscInt ksta_p,kend_p
>>>>>>>>
>>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>> >>>>>>>> >>>>>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>>>>>> >>>>>>>>> I then use it with my CFD code and the debug ver works. >>>>>>>>> >>>>>>>>> However, I have problems with the rel ver: >>>>>>>>> >>>>>>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>>>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>>> -------------^ >>>>>>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>> -----^ >>>>>>>>> compilation aborted for global.F90 (code 1) >>>>>>>>> >>>>>>>>> May I know what's the problem? >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Thank you very much. >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> ================================================ >>>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>>> ================================================ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Thank you very much. >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> ================================================ >>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>> ================================================ >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? 
(Zheng Weiming)
>>> ================================================
>>>
>>>
>>
> --
>
> Thank you very much.
>
> Yours sincerely,
>
> ================================================
> TAY Wee-Beng ??? (Zheng Weiming)
> ================================================
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
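For readers who hit the same error #6285: below is a minimal sketch of the
declarations this thread converges on, assuming a real-valued PETSc build
(so PetscScalar equals PetscReal) and illustrative names (A_mat, row, col
are not from the original code). The key point is that every argument must
match the kind and rank the generic Fortran interface expects: counts and
indices must be PetscInt rather than plain integer, the values must be
PetscScalar rather than an integer expression such as int_impl(k,5), and
the index and value arguments are arrays.

      ! Sketch only, inside a routine that has "use petscmat"; whether the
      ! count arguments must themselves be 1-d arrays (the [ione] form
      ! tried above) depends on the PETSc version's Fortran interface.
      PetscInt       :: ksta_p, kend_p   ! PetscInt, not integer
      PetscInt       :: ione, row(1), col(1)
      PetscScalar    :: v(1)             ! PetscScalar, not an integer expression
      PetscErrorCode :: ierr

      ione   = 1
      call MatGetOwnershipRange(A_mat, ksta_p, kend_p, ierr)
      row(1) = ksta_p                    ! first locally owned row
      col(1) = ksta_p
      v(1)   = 1.0
      call MatSetValues(A_mat, ione, row, ione, col, v, INSERT_VALUES, ierr)

Since the debug build accepted the original code while the optimized build
did not, compiling a minimal program like the above against each build is a
quick way to see which specific interface the compiler is searching for.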
From edoardo.alinovi at gmail.com  Sun Aug 25 06:49:07 2024
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Sun, 25 Aug 2024 13:49:07 +0200
Subject: [petsc-users] Preliminaries to use gpu capabilities
Message-ID:

Hello Petsc friends,

As many people are doing, I would like to explore a bit the gpu
capabilities (cuda) in petsc.

Before attempting any coding effort, I would like to hear from you if all
of this makes sense:
- compile mpi with cuda support
- compile petsc with cuda support
- build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc
to use the gpu.

That's really it, or do I need to take care of something else?

I have seen that there is an amgXWrapper library around, but I am not sure
if it is still relevant now or not.

Thank you for the suggestions!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Sun Aug 25 11:17:20 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 25 Aug 2024 12:17:20 -0400
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID: <7465D12B-F30E-496F-B1AF-6C390410DD05@petsc.dev>

> On Aug 25, 2024, at 7:49 AM, Edoardo alinovi wrote:
>
> Hello Petsc friends,
>
> As many people are doing, I would like to explore a bit the gpu capabilities (cuda) in petsc.
>
> Before attempting any coding effort, I would like to hear from you if all of this makes sense:
> - compile mpi with cuda support

   This is commonly called CUDA-aware MPI, but actually only means that the MPI can send and receive messages from memory addresses directly on the GPU.

> - compile petsc with cuda support
> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc to use the gpu.

   As presented, this will compute the vector and matrix entries on the CPU, and then PETSc will automatically move the values to the GPU for the linear solver, which is a good start. You can run with -log_view -log_view_gpu_time to see the timings, how much data is moved between the CPU and GPU, and where the computation happens.

   If all goes well, then you will find almost all the compute time is in building the vectors and matrices and copying the values to the GPU. At that point you will need to think about moving your computation to the GPU. This is problem-dependent, but you can look at VecSetPreallocationCOO() and MatSetPreallocationCOO() for how you can efficiently provide the values to PETSc on the GPU.

   As always, feel free to ask questions; the process is not trivial or as simple as we would like it to be.

   Barry

> That's really it, or do I need to take care of something else?
>
> I have seen that there is an amgXWrapper library around, but I am not sure if it is still relevant now or not.
>
> Thank you for the suggestions!
>
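A minimal sketch of the starting point Barry describes: create the matrix
and vector with GPU types, assemble on the CPU, and let PETSc copy the
values to the GPU. The size n and all names are illustrative; the types can
equally be selected at run time with -mat_type aijcusparse -vec_type cuda.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      b;
  PetscInt n = 100; /* illustrative global size */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(A, MATAIJCUSPARSE)); /* GPU storage; CPU-set values are copied over */
  PetscCall(MatSetFromOptions(A));          /* allow command-line overrides */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
  PetscCall(VecSetSizes(b, PETSC_DECIDE, n));
  PetscCall(VecSetType(b, VECCUDA));
  /* ... MatSetValues()/VecSetValues() on the CPU, assembly, KSPSolve() ... */
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}

Running the result with, for example,

  mpiexec -n 2 ./app -log_view -log_view_gpu_time

shows, per event, where the time goes and how much data moves between host
and device, as suggested above.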
From edoardo.alinovi at gmail.com  Sun Aug 25 11:45:29 2024
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Sun, 25 Aug 2024 18:45:29 +0200
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID:

Thank you Barry, sounds great. I'll try it out in the next weeks! Is the
data copy such a bottleneck with respect to the solving time, in your
opinion?

I am not scared of building stuff on the gpu directly; I basically assemble
the petsc matrix and rhs in one place, so it would be ok to do that on the
gpu directly. Is the aij format ok for the gpu, or is CSR better?

Cheers.

On Sun, 25 Aug 2024, 13:49 Edoardo alinovi wrote:

> Hello Petsc friends,
>
> As many people are doing, I would like to explore a bit the gpu
> capabilities (cuda) in petsc.
>
> Before attempting any coding effort, I would like to hear from you if all
> of this makes sense:
> - compile mpi with cuda support
> - compile petsc with cuda support
> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc
> to use the gpu.
>
> That's really it, or do I need to take care of something else?
>
> I have seen that there is an amgXWrapper library around, but I am not sure
> if it is still relevant now or not.
>
> Thank you for the suggestions!
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Sun Aug 25 17:26:28 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 25 Aug 2024 18:26:28 -0400
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID:

> On Aug 25, 2024, at 12:45 PM, Edoardo alinovi wrote:
>
> Thank you Barry, sounds great. I'll try it out in the next weeks! Is the data copy such a bottleneck with respect to the solving time, in your opinion?

   If you are solving with the same matrix many times, then the matrix copy is not a big deal.

> I am not scared of building stuff on the gpu directly; I basically assemble the petsc matrix and rhs in one place, so it would be ok to do that on the gpu directly. Is the aij format ok for the gpu, or is CSR better?

   PETSc AIJ is essentially CSR and is what everyone supports. If you are solving with the same matrix many times, converting to MATAIJSELL up front will likely pay off, but this doesn't change user code.

>
> Cheers.
>
> On Sun, 25 Aug 2024, 13:49 Edoardo alinovi wrote:
>> Hello Petsc friends,
>>
>> As many people are doing, I would like to explore a bit the gpu capabilities (cuda) in petsc.
>>
>> Before attempting any coding effort, I would like to hear from you if all of this makes sense:
>> - compile mpi with cuda support
>> - compile petsc with cuda support
>> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc to use the gpu.
>>
>> That's really it, or do I need to take care of something else?
>>
>> I have seen that there is an amgXWrapper library around, but I am not sure if it is still relevant now or not.
>>
>> Thank you for the suggestions!
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
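Tying Barry's two replies together: if the sparsity pattern is fixed across
solves, the COO interface he pointed at earlier reduces the per-step
assembly to a single value upload. A minimal sketch with an illustrative
3x3 diagonal pattern (indices are global; names are hypothetical):

  Mat         A;
  PetscInt    coo_i[3] = {0, 1, 2}; /* global row indices    */
  PetscInt    coo_j[3] = {0, 1, 2}; /* global column indices */
  PetscScalar coo_v[3] = {2.0, 2.0, 2.0};

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 3, 3));
  PetscCall(MatSetType(A, MATAIJCUSPARSE));
  PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j)); /* pattern: set once */
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));   /* values: every step */

With a CUDA build, the value array handed to MatSetValuesCOO() may live in
device memory, which can avoid the host-to-device copy entirely;
VecSetPreallocationCOO()/VecSetValuesCOO() follow the same pattern for the
right-hand side.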
From zonexo at gmail.com  Sun Aug 25 22:46:17 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Mon, 26 Aug 2024 11:46:17 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
 <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
 <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>
Message-ID: <1ca6f1e4-7da1-4cc9-85cd-76e7b431c1b3@gmail.com>

Hi Barry,

Thanks, I'll try later. Back to using 3.20.6 which is working first.

On 22/8/2024 10:39 pm, Barry Smith wrote:
>
>   Hmm, try using a standalone variable
>
>   PetscReal value
>   value = int_impl(k,5)
>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[value],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>
> Unfortunately, Fortran compilers in this situation are not good at
> telling us exactly what argument is giving it grief.
>
> Barry
>
>
>
>> On Aug 22, 2024, at 10:33 AM, TAY Wee Beng wrote:
>>
>>
>> On 22/8/2024 10:28 pm, Barry Smith wrote:
>>>
>>>   Should be PetscReal as it is a numerical value.
>> Ok, I changed it but I still get the same error.
>>>
>>>
>>>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>>>
>>>>
>>>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>>>
>>>>>   What is int_impl(k,5) defined type?
>>>> PetscInt
>>>>>
>>>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> Do you mean that I change from:
>>>>>>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>>>>>> to
>>>>>>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> I did it but the error is still there.
>>>>>>
>>>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>>>
>>>>>>>   Fortran 90 type checking is very tight; the dimension of the
>>>>>>> array, or scalar, passed as an argument must match the expected
>>>>>>> dimension (F77 did not do this type checking). Thus the ione
>>>>>>> argument must be a 1-d array as well as the numerical values, so do
>>>>>>>
>>>>>>>> */call
>>>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>>>
>>>>>>>
>>>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>>>
>>>>>>> I am trying to support the old-fashioned F77 model, allowing
>>>>>>> mismatches in the array dimensions while still doing proper type
>>>>>>> checking, but it will take some time to simplify the API.
>>>>>>>
>>>>>>>   Barry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>>>
>>>>>>>> Hi Barry,
>>>>>>>>
>>>>>>>> I have declared them as integers in Fortran. Is that different
>>>>>>>> from PetscInt and how come it works in debug mode?
>>>>>>>>
>>>>>>>> Anyway, I changed them and it solved the problem. However, I have
>>>>>>>> a similar problem in my boundary.F90:
>>>>>>>>
>>>>>>>> */boundary.F90(6685): error #6285: There is no matching
>>>>>>>> specific subroutine for this generic subroutine call.
>>>>>>>> [MATSETVALUES]
>>>>>>>> call
>>>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>>> -----^
>>>>>>>> I changed all to PetscInt and also PetscReal but I still got
>>>>>>>> the error.
>>>>>>>>
>>>>>>>> Why is this so now? Any solution?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>   You must declare as
>>>>>>>>>
>>>>>>>>> */PetscInt ksta_p,kend_p/*
>>>>>>>>>
>>>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using the latest PETSc thru github. I compiled both the
>>>>>>>>>> debug and rel ver of PETSc w/o problem.
>>>>>>>>>>
>>>>>>>>>> I then use it with my CFD code and the debug ver works.
-fPIC -save -w >>>>>>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>>>>>> global.F90 >>>>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic >>>>>>>>>> (ifort) is now deprecated and will be discontinued late 2024. >>>>>>>>>> Intel recommends that customers transition now to using the >>>>>>>>>> LLVM-based Intel(R) Fortran Compiler (ifx) for continued >>>>>>>>>> Windows* and Linux* support, new language support, new >>>>>>>>>> language features, and optimizations. Use >>>>>>>>>> '-diag-disable=10448' to disable this message. >>>>>>>>>> global.F90(444): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>>>> -------------^ >>>>>>>>>> global.F90(720): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(774): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(776): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(949): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(957): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>>> -----^ >>>>>>>>>> compilation aborted for global.F90 (code 1)/* >>>>>>>>>> >>>>>>>>>> May I know what's the problem? >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> Thank you very much. >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> ================================================ >>>>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>>>> ================================================ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Thank you very much. >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> ================================================ >>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>> ================================================ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> >>>>>> Thank you very much. >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> ================================================ >>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>> ================================================ >>>>>> >>>>>> >>>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. 
Yours sincerely,

================================================
TAY Wee-Beng ??? (Zheng Weiming)
================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lzou at anl.gov  Mon Aug 26 08:10:38 2024
From: lzou at anl.gov (Zou, Ling)
Date: Mon, 26 Aug 2024 13:10:38 +0000
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Junchao, I am accessing a pre-compiled version of PETSc via MOOSE, so
unfortunately, I don't have those configure log files.
Note that all function calls became slower.

-Ling

From: Junchao Zhang
Date: Thursday, August 22, 2024 at 9:28 AM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Hi, Ling,
  MatMult almost degraded 20%, which is a lot.  Do you have configure.log
for the two builds?  We might find compiler discrepancies from it.

--Junchao Zhang

On Wed, Aug 21, 2024 at 8:57 AM Zou, Ling wrote:

Hi Junchao,

Yeah, I have part of the log_view, for the same code, same version of
PETSc (3.20), but two OS (Ventura vs. Sonoma).
Note that PETSc function call numbers are exactly the same.
I suspect that it's just that the OS became slower, or maybe something
related to the compiler.

-Ling

                    # of calls   Time spent (Ventura)   Time spent (Sonoma)
  MatMult MF            20463           3.718600E+00          4.467800E+00
  MatMult               20463           3.721000E+00          4.470500E+00
  MatFDColorApply        2062           4.507000E+00          5.394600E+00
  MatFDColorFunc        24744           4.472400E+00          5.356300E+00
  KSPSolve               2062           3.569700E+00          4.262400E+00
  SNESSolve               986           9.195900E+00          1.102000E+01
  SNESFunctionEval      23575           4.268600E+00          5.161100E+00
  SNESJacobianEval       2062           4.509300E+00          5.397500E+00

From: Junchao Zhang
Date: Monday, August 19, 2024 at 10:04 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Do you have -log_view report so that we can know which petsc functions
degraded? Or is it because compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
petsc-users at mcs.anl.gov> wrote:

Hi all,

After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code
having slightly-larger-than 10% performance degradation (only in terms of
execution time).
I track the number of major function calls; they are identical between the
two OS (so PETSc is not the one to blame), but just slower.
Is this something expected? Has anyone also experienced it?

-Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From edoardo.centofanti01 at universitadipavia.it  Thu Aug 29 05:09:25 2024
From: edoardo.centofanti01 at universitadipavia.it (Edoardo Centofanti)
Date: Thu, 29 Aug 2024 12:09:25 +0200
Subject: [petsc-users] Questions about DMPlex
Message-ID:

Dear PETSc users,

I am tinkering with DMPlex. I am trying to import a .msh file with some
labelling over it (the mesh is 3D; some are labels over points within a 3D
volume, others are labels over points lying on a surface, for boundary
conditions). This is what I get in the preamble of my msh file for the
"PhysicalNames" part:

$PhysicalNames
6
2 4 "surf1"
2 5 "surf2"
2 6 "boundary"
3 1 "vol1"
3 2 "vol2"
3 3 "vol3"
$EndPhysicalNames

I import the mesh through a function which should create a mesh distributed
across all the MPI processes:

PetscErrorCode ImportMsh(const char meshname[], DM *dm)
{
    PetscErrorCode ierr;
    DM distributedDM = NULL;

    // Create a DMPlex object and set its type
    ierr = DMCreate(PETSC_COMM_WORLD, dm); CHKERRQ(ierr);
    ierr = DMSetType(*dm, DMPLEX); CHKERRQ(ierr);
    // Import the mesh from an external gmsh file
    ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm); CHKERRQ(ierr);
    // Distribute the mesh across processors
    ierr = DMPlexDistribute(*dm, 0, NULL, &distributedDM); CHKERRQ(ierr);
    if (distributedDM) {
        ierr = DMDestroy(dm); CHKERRQ(ierr);
        *dm = distributedDM;
    }
    // View DMPlex
    DMView(*dm, PETSC_VIEWER_STDOUT_WORLD);
    return ierr;
}

The output of DMView with 1 processor is the following:

DM Object: DM_0x12a623ae0_0 1 MPI process
  type: plex
DM_0x12a623ae0_0 in 3 dimensions:
  Number of 0-cells per rank: 730
  Number of 1-cells per rank: 4010
  Number of 2-cells per rank: 6100
  Number of 3-cells per rank: 2819
Labels:
  celltype: 4 strata with value/size (0 (730), 6 (2819), 3 (6100), 1 (4010))
  depth: 4 strata with value/size (0 (730), 1 (4010), 2 (6100), 3 (2819))
  Cell Sets: 3 strata with value/size (2 (311), 3 (322), 1 (2186))
  Face Sets: 3 strata with value/size (6 (924), 4 (516), 5 (4))

While for 2 processes I get:

DM Object: Parallel Mesh 2 MPI processes
  type: plex
Parallel Mesh in 3 dimensions:
  Number of 0-cells per rank: 625 713
  Number of 1-cells per rank: 2792 3187
  Number of 2-cells per rank: 3546 3759
  Number of 3-cells per rank: 1410 1409
Labels:
  depth: 4 strata with value/size (0 (625), 1 (2792), 2 (3546), 3 (1410))
  celltype: 4 strata with value/size (0 (625), 1 (2792), 3 (3546), 6 (1410))
  Cell Sets: 3 strata with value/size (1 (777), 2 (311), 3 (322))
  Face Sets: 3 strata with value/size (4 (516), 5 (4), 6 (247))

*First question: *Where are the labels I gave in the .msh file? My
interpretation here is that they are the numbers in brackets: for example,
6 (924) means that 924 face elements are marked with 6, which corresponds
to "boundary" in my .msh file. However, in the 2-processor DMView I get
different numbers of elements. Again, my intuition is that only the rank 0
labels are printed.

*Second question: *Suppose I wanted to fill a matrix only in the entries
corresponding to the nodes of the elements marked with 5, namely "surf2".
How can I do that? So far, I have used

ierr = DMPlexGetDepthStratum(dm, 2, &pStart, &pEnd); CHKERRQ(ierr);

in order to get pStart and pEnd for the stratum related to faces (2). Then
I looped over p from pStart to pEnd in order to select the faces which are
marked with 5 (I cannot access the points directly since they do not seem
to be marked at all).
For the selected faces, I used

ierr = DMPlexGetConeSize(dm, p, &coneSize1); CHKERRQ(ierr);
ierr = DMPlexGetCone(dm, p, &cone); CHKERRQ(ierr);

in order to retrieve the edges associated with each marked face, then I
used GetConeSize and GetCone again in order to access the vertices, and
managed to build an IS object with the points I need (also using
ISSortRemoveDups to remove duplicates). But here I am stuck, since printing
this IS gives the right number of vertices, but with different local
numberings depending on the number of processors used, and with a variable
offset depending on the DAG associated with the local DMPlex. I wonder if
there exists a less cumbersome way to perform this task...

Thank you in advance,
Edoardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
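On the second question above, one common route, sketched below, is to ask
the label for its stratum directly and then take the transitive closure of
each marked face, which avoids the manual cone chasing. The sketch assumes
the Gmsh markers land in the "Face Sets" label, as the DMView output
suggests; the closure points are rank-local, so a local-to-global mapping
(for example, a PetscSection laid out over the vertices) is still needed
for parallel-consistent numbering. On the first question, the values in
brackets are the physical-group tags with their stratum sizes, and recent
PETSc versions can also keep the physical-group names as labels (see the
-dm_plex_gmsh_use_regions option).

  /* Sketch only: collect the vertices of all faces marked 5 ("surf2"). */
  DMLabel         label;
  IS              faceIS;
  const PetscInt *faces;
  PetscInt        nfaces, vStart, vEnd, f;

  PetscCall(DMGetLabel(dm, "Face Sets", &label));
  PetscCall(DMLabelGetStratumIS(label, 5, &faceIS));       /* NULL if this rank has no marked faces */
  PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd)); /* vertices are depth 0 */
  if (faceIS) {
    PetscCall(ISGetLocalSize(faceIS, &nfaces));
    PetscCall(ISGetIndices(faceIS, &faces));
    for (f = 0; f < nfaces; ++f) {
      PetscInt *closure = NULL, clSize, cl;

      PetscCall(DMPlexGetTransitiveClosure(dm, faces[f], PETSC_TRUE, &clSize, &closure));
      for (cl = 0; cl < 2 * clSize; cl += 2) { /* closure holds (point, orientation) pairs */
        if (closure[cl] >= vStart && closure[cl] < vEnd) {
          /* closure[cl] is a local vertex of a face marked 5 */
        }
      }
      PetscCall(DMPlexRestoreTransitiveClosure(dm, faces[f], PETSC_TRUE, &clSize, &closure));
    }
    PetscCall(ISRestoreIndices(faceIS, &faces));
    PetscCall(ISDestroy(&faceIS));
  }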
From martin.diehl at kuleuven.be  Thu Aug 29 08:56:48 2024
From: martin.diehl at kuleuven.be (Martin Diehl)
Date: Thu, 29 Aug 2024 13:56:48 +0000
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
Message-ID: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>

Dear PETSc team,

I have a question regarding the use of PetscDSGetTabulation from
Fortran.
PetscDSGetTabulation has a slightly different function signature
between Fortran and C. In addition, there is an (undocumented)
PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
it results in a segmentation fault.

I believe that PetscDSRestoreTabulation is not needed. At least our
Fortran FEM code compiles and runs without it. However, we have
convergence issues that we don't understand, so any suspicious code is
currently under investigation.

best regards,
Martin

--
KU Leuven
Department of Computer Science
Department of Materials Engineering
Celestijnenlaan 200a
3001 Leuven, Belgium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: This is a digitally signed message part
URL: 

From knepley at gmail.com  Thu Aug 29 16:18:25 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 29 Aug 2024 17:18:25 -0400
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl wrote:

> Dear PETSc team,
>
> I have a question regarding the use of PetscDSGetTabulation from
> Fortran.
> PetscDSGetTabulation has a slightly different function signature
> between Fortran and C. In addition, there is an (undocumented)
> PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
> it results in a segmentation fault.
>
> I believe that PetscDSRestoreTabulation is not needed. At least our
> Fortran FEM code compiles and runs without it. However, we have
> convergence issues that we don't understand, so any suspicious code is
> currently under investigation.
>

This may be due to my weak Fortran knowledge. Here is the code

https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads

I call F90Array1dCreate() in the GetTabulation and F90Array1dDestroy() in
the RestoreTabulation(), which I thought was right. However, I remember
something about interface declarations, which have now moved somewhere I
cannot find.
Barry, is the interface declaration for this function correct?

  Thanks,

     Matt

> best regards,
> Martin
>
> --
> KU Leuven
> Department of Computer Science
> Department of Materials Engineering
> Celestijnenlaan 200a
> 3001 Leuven, Belgium
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Thu Aug 29 16:21:19 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 29 Aug 2024 17:21:19 -0400
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To:
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

   The interface definition and Fortran stub look ok to my eyeballs.
However, eyeballs cannot compile code, so using the debugger to determine
the cause of the crash is best.

   Barry

> On Aug 29, 2024, at 5:18 PM, Matthew Knepley wrote:
>
> On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl wrote:
>> Dear PETSc team,
>>
>> I have a question regarding the use of PetscDSGetTabulation from
>> Fortran.
>> PetscDSGetTabulation has a slightly different function signature
>> between Fortran and C. In addition, there is an (undocumented)
>> PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
>> it results in a segmentation fault.
>>
>> I believe that PetscDSRestoreTabulation is not needed. At least our
>> Fortran FEM code compiles and runs without it. However, we have
>> convergence issues that we don't understand, so any suspicious code is
>> currently under investigation.
>
> This may be due to my weak Fortran knowledge. Here is the code
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads
>
> I call F90Array1dCreate() in the GetTabulation and F90Array1dDestroy() in
> the RestoreTabulation(), which I thought was right. However, I remember
> something about interface declarations, which have now moved somewhere I
> cannot find.
>
> Barry, is the interface declaration for this function correct?
>
>   Thanks,
>
>      Matt
>
>> best regards,
>> Martin
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From martin.diehl at kuleuven.be  Fri Aug 30 02:10:57 2024
From: martin.diehl at kuleuven.be (Martin Diehl)
Date: Fri, 30 Aug 2024 07:10:57 +0000
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To:
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

Dear Matt and Barry:

thanks for the quick reply.
Please forget about the segmentation fault, that was a mistake in my code. Regarding the necessity of PetscDSRestoreTabulation: It cleans up "b" and "bDer". Those are defined as "PetscReal, pointer". If they are defined in a function or subroutine, they go out of scope automatically. So I believed (backed up by measuring the memory consumption with and without PetscDSRestoreTabulation) that the PetscDSRestoreTabulation does not add anything important. Martin On Thu, 2024-08-29 at 17:21 -0400, Barry Smith wrote: > > ? ?The interface definition and Fortran stub look ok to my eyeballs. > However, eyeballs cannot compile code, so using the debugger to > determine the cause of the crash is best. > > ? ?Barry > > > > > On Aug 29, 2024, at 5:18?PM, Matthew Knepley > > wrote: > > > > On Thu, Aug 29, 2024 at 9:57?AM Martin Diehl > > wrote: > > > Dear PETSc team, > > > > > > I have a question regarding the use of PetscDSGetTabulation from > > > Fortran. > > > PetscDSGetTabulation has a slightly different function signature > > > between Fortran and C. In addition, there is an (undocumented) > > > PetscDSRestoreTabulation in Fortran which cleans up the arrays. > > > Calling > > > it results in a segmentation fault. > > > > > > I believe that PetscDSRestoreTabulation is not needed. At least > > > our > > > Fortran FEM code compiles and runs without it. However, we have > > > convergence issues that we don't understand so any suspicious > > > code is > > > currently under investigation. > > > > > > > > > This may be due to my weak Fortran knowledge. Here is the code > > > > ?? > > https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90- > > custom/zdtdsf90.c?ref_type=heads > > > > I call F90Array1dCreate() in the GetTabulation and > > F90Array1dDestroy() in the RestoreTabulation(), which I thought > > was right. However, I remember something about interface > > declarations, which have now moved somewhere I cannot find. > > > > Barry, is the interface declaration for this function correct? > > > > ? Thanks, > > > > ? ? ? Matt > > ? > > > best regards, > > > Martin > > > -- KU Leuven Department of Computer Science Department of Materials Engineering Celestijnenlaan 200a 3001 Leuven, Belgium -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 659 bytes Desc: This is a digitally signed message part URL: From bsmith at petsc.dev Fri Aug 30 11:04:42 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Aug 2024 12:04:42 -0400 Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation In-Reply-To: References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be> Message-ID: > On Aug 30, 2024, at 3:10?AM, Martin Diehl wrote: > > Dear Matt and Barry: > > thanks for the quick reply. > Please forget about the segmentation fault, that was a mistake in my > code. > > Regarding the necessity of PetscDSRestoreTabulation: > It cleans up "b" and "bDer". Those are defined as "PetscReal, pointer". > If they are defined in a function or subroutine, they go out of scope > automatically. So I believed (backed up by measuring the memory > consumption with and without PetscDSRestoreTabulation) that the > PetscDSRestoreTabulation does not add anything important. Our Fortran stub restore performs a nullify(ptr) so that the ptr is no longer associated with our C array. 
I guess you are saying that when the ptr goes out of scope, its (little)
memory is automatically freed regardless of whether the ptr is still
associated with something or not, so nullify is not needed in that case.
Thus, the restore is only a "safety" feature, preventing the caller from
accidentally using the associated C array later (which they should not do).
This is similar to how our restore in C nullifies the C pointer so that it
cannot accidentally be used later, which would result in memory corruption.

   Thanks for the clarification.

   Barry

   Note that some restores do dereference memory or objects, and those
must be called, or there will be a memory leak. Thus, it is best always to
call restore, though sometimes it may not be strictly necessary.

>
> Martin
>
>
> On Thu, 2024-08-29 at 17:21 -0400, Barry Smith wrote:
>>
>>    The interface definition and Fortran stub look ok to my eyeballs.
>> However, eyeballs cannot compile code, so using the debugger to
>> determine the cause of the crash is best.
>>
>>    Barry
>>
>>
>>
>>> On Aug 29, 2024, at 5:18 PM, Matthew Knepley
>>> wrote:
>>>
>>> On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl
>>> wrote:
>>>> Dear PETSc team,
>>>>
>>>> I have a question regarding the use of PetscDSGetTabulation from
>>>> Fortran.
>>>> PetscDSGetTabulation has a slightly different function signature
>>>> between Fortran and C. In addition, there is an (undocumented)
>>>> PetscDSRestoreTabulation in Fortran which cleans up the arrays.
>>>> Calling
>>>> it results in a segmentation fault.
>>>>
>>>> I believe that PetscDSRestoreTabulation is not needed. At least
>>>> our
>>>> Fortran FEM code compiles and runs without it. However, we have
>>>> convergence issues that we don't understand so any suspicious
>>>> code is
>>>> currently under investigation.
>>>
>>>
>>> This may be due to my weak Fortran knowledge. Here is the code
>>>
>>> https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads
>>>
>>> I call F90Array1dCreate() in the GetTabulation and
>>> F90Array1dDestroy() in the RestoreTabulation(), which I thought
>>> was right. However, I remember something about interface
>>> declarations, which have now moved somewhere I cannot find.
>>>
>>> Barry, is the interface declaration for this function correct?
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>> best regards,
>>>> Martin
>>>>
>
> --
> KU Leuven
> Department of Computer Science
> Department of Materials Engineering
> Celestijnenlaan 200a
> 3001 Leuven, Belgium
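For reference, the Fortran usage pattern under discussion, as a minimal
sketch consistent with what the thread describes; the ds handle, the field
number f, and the exact argument list are assumptions that should be
checked against the PETSc version in use.

      ! Sketch only, inside a routine that has "use petscds".
      ! b and bDer are Fortran pointers into PETSc-owned memory;
      ! the restore call merely nullifies them.
      PetscDS            :: ds
      PetscInt           :: f
      PetscReal, pointer :: b(:), bDer(:)
      PetscErrorCode     :: ierr

      f = 0
      call PetscDSGetTabulation(ds, f, b, bDer, ierr)
      ! ... use b (basis values) and bDer (basis derivatives) ...
      call PetscDSRestoreTabulation(ds, f, b, bDer, ierr)

As Barry explains above, skipping the restore leaks nothing here, since it
only disassociates the pointers; it simply guards against touching b or
bDer after they are no longer valid.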