From marco at kit.ac.jp Thu Aug 1 03:34:45 2024
From: marco at kit.ac.jp (Marco Seiz)
Date: Thu, 1 Aug 2024 17:34:45 +0900
Subject: [petsc-users] Right DM for a particle network
In-Reply-To: References: <54a9b9e5-691c-4535-bc49-5c00bc19a0df@kit.ac.jp>
Message-ID:

An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex_graphlapl.c Type: text/x-csrc Size: 11457 bytes Desc: not available

From Eric.Chamberland at giref.ulaval.ca Thu Aug 1 07:23:28 2024
From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland)
Date: Thu, 1 Aug 2024 08:23:28 -0400
Subject: [petsc-users] How to combine different element types into a single DMPlex?
In-Reply-To: References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca>
Message-ID: <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>

Hi Matthew,

we have our own format that uses MPI I/O for the initial read; then we would like to do almost exactly what we do in ex47.c (https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html), except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel:

1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?)
2- Call partitioning (DMPlexDistribute)
3- Compute overlap (DMPlexDistributeOverlap)
4- Also compute the corresponding mapping between original element numbers and partitioned+overlapped elements (DMPlexNaturalToGlobalBegin/End)

The main point here is overlap computation. And the big challenge is that we must always rely on the fact that no node ever reads the whole mesh: each node holds only a small part of it at the beginning, and from there we want parallel partitioning and overlap computation...

It is now working fine for a mesh with a single type of element, but if we could turn ex47.c into an example with mixed element types, that would achieve what we would like to do!

Thanks,

Eric

On 2024-07-31 22:09, Matthew Knepley wrote:
> On Wed, Jul 31, 2024 at 4:16 PM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> Hi Vaclav,
> Okay, I am coming back with this question after some time... ;)
> I am just wondering if it is now possible to call DMPlexBuildFromCellListParallel, or something else, to build a mesh that combines different element types into a single DMPlex (in parallel of course)?
> 1) Meshes with different cell types are fully functional, and some applications have been using them for a while now.
> 2) The Firedrake I/O methods support these hybrid meshes.
> 3) You can, for example, read in a GMsh or ExodusII file with different cell types.
> However, there is no direct interface like DMPlexBuildFromCellListParallel(). If you plan on creating meshes by hand, I can build that for you. No one so far has wanted that. Rather they want to read in a mesh in some format, or alter a base mesh by inserting other cell types.
> So, what is the motivating use case?
> Thanks,
> Matt
> Thanks,
> Eric
> On 2021-09-23 11:30, Hapla Vaclav wrote:
>> Note there will soon be a generalization of DMPlexBuildFromCellListParallel() around, as a side product of our current collaborative efforts with Firedrake guys.
>> It will take a PetscSection instead of relying on the blocksize [which is indeed always constant for the given dataset]. Stay tuned.
>> https://gitlab.com/petsc/petsc/-/merge_requests/4350
>> Thanks,
>> Vaclav
>>> On 23 Sep 2021, at 16:53, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>> Hi,
>>> oh, that's great news!
>>> In our case we have our home-made file format, invariant to the number of processes (thanks to MPI_File_set_view), that uses collective, asynchronous native MPI I/O calls for unstructured hybrid meshes and fields.
>>> So our needs are not for reading meshes but only for filling a hybrid DMPlex with DMPlexBuildFromCellListParallel (or something else to come?)... to exploit PETSc partitioners and parallel overlap computation...
>>> Thanks for the follow-up! :)
>>> Eric
>>> On 2021-09-22 7:20 a.m., Matthew Knepley wrote:
>>>> On Wed, Sep 22, 2021 at 3:04 AM Karin&NiKo wrote:
>>>> Dear Matthew,
>>>> This is great news!
>>>> For my part, I would be mostly interested in the parallel input interface. Sorry for that...
>>>> Indeed, in our application, we already have a parallel mesh data structure that supports hybrid meshes with parallel I/O and distribution (based on the MED format). We would like to use a DMPlex to do parallel mesh adaptation.
>>>> As a matter of fact, all our meshes are in the MED format. We could also contribute to extending the interface of DMPlex with MED (if you consider it could be useful).
>>>> An MED interface does exist. I stopped using it for two reasons:
>>>> 1) The code was not portable and the build was failing on different architectures. I had to manually fix it.
>>>> 2) The boundary markers did not provide global information, so that parallel reading was much harder.
>>>> Feel free to update my MED reader to a better design.
>>>> Thanks,
>>>> Matt
>>>> Best regards,
>>>> Nicolas
>>>> On Tue, Sep 21, 2021 at 21:56, Matthew Knepley wrote:
>>>> On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo wrote:
>>>> Dear Eric, dear Matthew,
>>>> I share Eric's desire to be able to manipulate meshes composed of different types of elements in a PETSc DMPlex.
>>>> Since this discussion, is there anything new on this feature for the DMPlex object, or am I missing something?
>>>> Thanks for finding this!
>>>> Okay, I did a rewrite of the Plex internals this summer. It should now be possible to interpolate a mesh with any number of cell types, partition it, redistribute it, and do many other manipulations.
>>>> You can read in some formats that support hybrid meshes. If you let me know how you plan to read it in, we can make it work. Right now, I don't want to make input interfaces that no one will ever use. We have a project, joint with Firedrake, to finalize parallel I/O. This will make parallel reading and writing for checkpointing possible, supporting topology, geometry, fields and layouts, for many meshes in one HDF5 file. I think we will finish in November.
>>>> Thanks,
>>>> Matt
>>>> Thanks,
>>>> Nicolas
>>>> On Wed, Jul 21, 2021 at 04:25, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>> Hi,
>>>>> On 2021-07-14 3:14 p.m., Matthew Knepley wrote:
>>>>>> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>>>> Hi,
>>>>>> while playing with DMPlexBuildFromCellListParallel, I noticed we have to specify "numCorners", which is a fixed value that gives a fixed number of nodes for a series of elements.
>>>>>> How can I then add, for example, triangles and quadrangles into a DMPlex?
>>>>>> You can't with that function. It would be much, much more complicated if you could, and I am not sure it is worth it for that function. The reason is that you would need index information to offset into the connectivity list, and that would need to be replicated to some extent so that all processes know what the others are doing. Possible, but complicated.
>>>>>> Maybe I can help suggest something for what you are trying to do?
>>>>> Yes: we are trying to partition our parallel mesh with PETSc functions. The mesh has been read in parallel, so each process owns a part of it, but we have to manage mixed element types.
>>>>> When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to describe the elements, which allows mixed elements.
>>>>> So, how would I read my mixed mesh in parallel and give it to a PETSc DMPlex so I can use a PetscPartitioner with DMPlexDistribute?
>>>>> A second goal we have is to use PETSc to compute the overlap, which is something I can't find in ParMETIS (or any other partitioning library?)
>>>>> Thanks,
>>>>> Eric
>>>>>> Thanks,
>>>>>> Matt
>>>>>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> -- Eric Chamberland, ing., M. Ing, Professionnel de recherche, GIREF/Université Laval, (418) 656-2131 poste 41 22 42

-- 
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
-------------- next part --------------
An HTML attachment was scrubbed...
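A minimal C sketch of the four steps Eric lists above, written against today's single-cell-type entry point (DMPlexCreateFromCellListParallelPetsc with one fixed numCorners); the mixed-element variant of step 1 is exactly what is being requested in this thread. The call signatures follow the current manual pages but should be checked against your PETSc version; error handling and cleanup are abbreviated.

    #include <petscdmplex.h>

    /* Each rank passes only the cells and vertices it read itself;
       no rank ever sees the whole mesh. */
    static PetscErrorCode BuildDistributeOverlap(MPI_Comm comm, PetscInt dim, PetscInt numCells, PetscInt numVertices, PetscInt numCorners, const PetscInt cells[], const PetscReal coords[], DM *dmFinal)
    {
      DM dm, dmDist, dmOverlap;

      PetscFunctionBeginUser;
      /* 1- Build a DMPlex from the locally read connectivity
         (today: one fixed numCorners, hence a single element type). */
      PetscCall(DMPlexCreateFromCellListParallelPetsc(comm, dim, numCells, numVertices, PETSC_DECIDE, numCorners, PETSC_TRUE, cells, dim, coords, NULL, NULL, &dm));
      /* Record the original (natural) numbering before redistribution,
         so that step 4 is possible afterwards. */
      PetscCall(DMSetUseNatural(dm, PETSC_TRUE));
      /* 2- Partition and distribute, with no overlap yet. */
      PetscCall(DMPlexDistribute(dm, 0, NULL, &dmDist));
      if (dmDist) {
        PetscCall(DMDestroy(&dm));
        dm = dmDist;
      }
      /* 3- Grow a one-cell overlap on the distributed mesh. */
      PetscCall(DMPlexDistributeOverlap(dm, 1, NULL, &dmOverlap));
      if (dmOverlap) {
        PetscCall(DMDestroy(&dm));
        dm = dmOverlap;
      }
      /* 4- Vectors can now be mapped between the original file ordering and
         the PETSc ordering with DMPlexNaturalToGlobalBegin/End (and their
         GlobalToNatural inverses). */
      *dmFinal = dm;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Because DMSetUseNatural() is called before distribution, the natural-to-global mapping of step 4 is built during DMPlexDistribute() and remains usable on the final DM.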
From sebastian.blauth at itwm.fraunhofer.de Thu Aug 1 08:19:52 2024
From: sebastian.blauth at itwm.fraunhofer.de (Blauth, Sebastian)
Date: Thu, 1 Aug 2024 13:19:52 +0000
Subject: [petsc-users] Question regarding naming of fieldsplit splits
In-Reply-To: References:
Message-ID:

Hello everyone,

I have a follow-up on my question. I noticed the following behavior. Let's assume I have 5 fields which I want to group with the following options:

-ksp_type fgmres
-ksp_max_it 1
-ksp_monitor_true_residual
-ksp_view
-pc_type fieldsplit
-pc_fieldsplit_type multiplicative
-pc_fieldsplit_0_fields 0,1
-pc_fieldsplit_1_fields 2
-pc_fieldsplit_2_fields 3,4
-fieldsplit_0_ksp_type preonly
-fieldsplit_0_pc_type jacobi
-fieldsplit_2_ksp_type preonly
-fieldsplit_2_pc_type jacobi

Then the first split is fine, but the second and third splits both get the same prefix, "fieldsplit_2", as shown in the -ksp_view output I attach below. The second split gets this prefix because it contains only a single field, so that field's index (2) is used as the split name; the third split groups two fields, so the "outer" split index (2) is used as its name instead. Is there any way to circumvent this, other than using custom names for the splits which are unique?

Thanks for your time and best regards,
Sebastian Blauth

The output of "-ksp_view" is the following:

KSP Object: 1 MPI process type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: fieldsplit FieldSplit with MULTIPLICATIVE composition: total splits = 3 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI process type: seqaij rows=243, cols=243 total: nonzeros=4473, allocated nonzeros=4473 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 86 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_2_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_2_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_2_) 1 MPI process type: seqaij rows=81, cols=81 total: nonzeros=497, allocated nonzeros=497 total number of mallocs used during MatSetValues calls=0 not using I-node routines Split number 2 Defined by IS KSP Object: (fieldsplit_2_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_2_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: (fieldsplit_2_) 1 MPI process type: seqaij rows=243, cols=243 total: nonzeros=4473, allocated nonzeros=4473 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 85 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=567, cols=567 total: nonzeros=24353, allocated nonzeros=24353 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 173 nodes, limit used is 5

-- 
Dr. Sebastian Blauth
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Abteilung Transportvorgänge
Fraunhofer-Platz 1, 67663 Kaiserslautern
Telefon: +49 631 31600-4968
sebastian.blauth at itwm.fraunhofer.de
https://www.itwm.fraunhofer.de

From: petsc-users On Behalf Of Blauth, Sebastian
Sent: Tuesday, July 2, 2024 11:47 AM
To: Matthew Knepley
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

Hi Matt,

thanks for the answer and clarification. Then I'll work around this issue in Python, where I set the options.

Best,
Sebastian

From: Matthew Knepley
Sent: Monday, July 1, 2024 4:30 PM
To: Blauth, Sebastian
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

On Mon, Jul 1, 2024 at 9:48 AM Blauth, Sebastian wrote:

Dear Matt,

thanks a lot for your help. Unfortunately, for me these extra options do not have any effect; I still get the "u" and "p" fieldnames. Also, this would not help me get rid of the "c" fieldname; on that level of the fieldsplit I am basically using your approach already, and it still shows up. The output of -ksp_view is unchanged, so I do not attach it here again. Maybe I misunderstood you?

Oh, we make an exception for single fields, since we think you would want to use the name. I have to make an extra option to shut off naming.

Thanks,

Matt

Thanks for the help and best regards,
Sebastian

From: Matthew Knepley
Sent: Monday, July 1, 2024 2:27 PM
To: Blauth, Sebastian
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question regarding naming of fieldsplit splits

On Fri, Jun 28, 2024 at 4:05 AM Blauth, Sebastian wrote:

Hello everyone,

I have a question regarding the naming convention using PETSc's PCFieldSplit.
I have been following https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html to create a DMShell with FEniCS in order to customize PCFieldSplit for my application. I am using the following options, which work nicely for me:

-ksp_type fgmres
-pc_type fieldsplit
-pc_fieldsplit_0_fields 0, 1
-pc_fieldsplit_1_fields 2
-pc_fieldsplit_type additive
-fieldsplit_0_ksp_type fgmres
-fieldsplit_0_pc_type fieldsplit
-fieldsplit_0_pc_fieldsplit_type schur
-fieldsplit_0_pc_fieldsplit_schur_fact_type full
-fieldsplit_0_pc_fieldsplit_schur_precondition selfp
-fieldsplit_0_fieldsplit_u_ksp_type preonly
-fieldsplit_0_fieldsplit_u_pc_type lu
-fieldsplit_0_fieldsplit_p_ksp_type cg
-fieldsplit_0_fieldsplit_p_ksp_rtol 1e-14
-fieldsplit_0_fieldsplit_p_ksp_atol 1e-30
-fieldsplit_0_fieldsplit_p_pc_type icc
-fieldsplit_0_ksp_rtol 1e-14
-fieldsplit_0_ksp_atol 1e-30
-fieldsplit_0_ksp_monitor_true_residual
-fieldsplit_c_ksp_type preonly
-fieldsplit_c_pc_type lu
-ksp_view

By default, we use the field names, but you can prevent this by specifying the fields by hand, so

-fieldsplit_0_pc_fieldsplit_0_fields 0
-fieldsplit_0_pc_fieldsplit_1_fields 1

should remove the 'u' and 'p' fieldnames. It is somewhat hacky, but I think easier to remember than some extra option.

Thanks,

Matt

Note that this is just an academic example (sorry for the low solver tolerances) to test the approach, consisting of a Stokes equation and a concentration equation (which is not even coupled to Stokes, just for testing). Completely analogous to https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html, I translate my ISs to a PetscSection, which is then supplied to a DMShell and assigned to a KSP. I am not so familiar with the code or how/why this works, but it seems to do so perfectly. I name my sections with petsc4py using

section.setFieldName(0, "u")
section.setFieldName(1, "p")
section.setFieldName(2, "c")

However, this is also reflected in the way I can access the fieldsplit options from the command line. My question is: Is there any way of not using the field names specified in Python but instead the index of the field as defined with "-pc_fieldsplit_0_fields 0, 1" and "-pc_fieldsplit_1_fields 2"? That is, instead of the prefix "fieldsplit_0_fieldsplit_u" I want to write "fieldsplit_0_fieldsplit_0", instead of "fieldsplit_0_fieldsplit_p" I want to use "fieldsplit_0_fieldsplit_1", and instead of "fieldsplit_c" I want to use "fieldsplit_1". Just changing the names of the fields to

section.setFieldName(0, "0")
section.setFieldName(1, "1")
section.setFieldName(2, "2")

obviously does not work as expected: it works for velocity and pressure, but not for the concentration; the prefix there is then "fieldsplit_2" and not "fieldsplit_1". In the docs, I have found https://petsc.org/main/manualpages/PC/PCFieldSplitSetFields/ which seems to suggest that the field name can potentially be supplied, but I don't see how to do so from the command line. Also, for the sake of completeness, I attach the output of the solve with "-ksp_view" below.

Thanks a lot in advance and best regards,
Sebastian

The output of ksp_view is the following:

KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-11, divergence=10000.
right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with ADDITIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-14, absolute=1e-30, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, factorization FULL Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 3.92639 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=4290, cols=4290 package used to perform factorization: petsc total: nonzeros=375944, allocated nonzeros=375944 using I-node routines: found 2548 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: seqaij rows=4290, cols=4290 total: nonzeros=95748, allocated nonzeros=95748 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-14, absolute=1e-30, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: icc out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using Manteuffel shift [POSITIVE_DEFINITE] matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqsbaij rows=561, cols=561 package used to perform factorization: petsc total: nonzeros=5120, allocated nonzeros=5120 block size is 1 linear system matrix followed by preconditioner matrix: Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: schurcomplement rows=561, cols=561 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=3729, allocated nonzeros=3729 total number of mallocs used during MatSetValues calls=0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=561, cols=4290 total: nonzeros=19938, allocated nonzeros=19938 total number of mallocs used during MatSetValues calls=0 not using I-node routines KSP of A00 KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 3.92639 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=4290, cols=4290 package used to perform factorization: petsc total: nonzeros=375944, allocated nonzeros=375944 using I-node routines: found 2548 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes type: seqaij rows=4290, cols=4290 total: nonzeros=95748, allocated nonzeros=95748 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 A01 Mat Object: 1 MPI processes type: seqaij rows=4290, cols=561 total: nonzeros=19938, allocated nonzeros=19938 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3287 nodes, limit used is 5 Mat Object: 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=9679, allocated nonzeros=9679 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=4851, cols=4851 total: nonzeros=139353, allocated nonzeros=139353 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3830 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_c_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_c_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 4.24323 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=561, cols=561 package used to perform factorization: petsc total: nonzeros=15823, allocated nonzeros=15823 not using I-node routines linear system matrix = precond matrix: Mat Object: (fieldsplit_c_) 1 MPI processes type: seqaij rows=561, cols=561 total: nonzeros=3729, allocated nonzeros=3729 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=5412, cols=5412 total: nonzeros=190416, allocated nonzeros=190416 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3833 nodes, limit used is 5

-- 
Dr. Sebastian Blauth
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Abteilung Transportvorgänge
Fraunhofer-Platz 1, 67663 Kaiserslautern
Telefon: +49 631 31600-4968
sebastian.blauth at itwm.fraunhofer.de
https://www.itwm.fraunhofer.de

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 7943 bytes Desc: not available
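Until the naming logic discussed by Barry further down this digest is reworked, the duplicated "fieldsplit_2" prefix above can be avoided by giving every split an explicit, unique name through PCFieldSplitSetIS() instead of the -pc_fieldsplit_i_fields options. A minimal C sketch; the split names "vel", "pres", "temp" and the index sets is0, is1, is2 are hypothetical placeholders, and in petsc4py the same names could be passed through PC.setFieldSplitIS(), if I recall the binding correctly.

    #include <petscksp.h>

    /* Name each split explicitly so the option prefixes cannot collide:
       they become -fieldsplit_vel_*, -fieldsplit_pres_*, -fieldsplit_temp_*.
       is0, is1, is2 are assumed to select the rows of each field group. */
    static PetscErrorCode SetNamedSplits(PC pc, IS is0, IS is1, IS is2)
    {
      PetscFunctionBeginUser;
      PetscCall(PCSetType(pc, PCFIELDSPLIT));
      PetscCall(PCFieldSplitSetIS(pc, "vel", is0));  /* fields 0,1 merged into one IS */
      PetscCall(PCFieldSplitSetIS(pc, "pres", is1)); /* field 2 */
      PetscCall(PCFieldSplitSetIS(pc, "temp", is2)); /* fields 3,4 merged into one IS */
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Since the names are chosen by the user, no two splits can end up with the same prefix.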
From knepley at gmail.com Thu Aug 1 08:24:47 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 1 Aug 2024 09:24:47 -0400
Subject: [petsc-users] How to combine different element types into a single DMPlex?
In-Reply-To: <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>
References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca> <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca>
Message-ID:

On Thu, Aug 1, 2024 at 8:23 AM Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> Hi Matthew,
> we have our own format that uses MPI I/O for the initial read; then we would like to do almost exactly what we do in ex47.c (https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html), except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel:
> 1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?)
> 2- Call partitioning (DMPlexDistribute)
> 3- Compute overlap (DMPlexDistributeOverlap)
> 4- Also compute the corresponding mapping between original element numbers and partitioned+overlapped elements (DMPlexNaturalToGlobalBegin/End)
> The main point here is overlap computation. And the big challenge is that we must always rely on the fact that no node ever reads the whole mesh: each node holds only a small part of it at the beginning, and from there we want parallel partitioning and overlap computation...
> It is now working fine for a mesh with a single type of element, but if we could turn ex47.c into an example with mixed element types, that would achieve what we would like to do!

We can do that. We only need to change step 1. I will put it on my TODO list. My thinking is the same as Vaclav, namely to replace numCorners with a PetscSection describing the cells[] array. Will that work for you?

Thanks,

Matt

> Thanks,
> Eric
> On 2024-07-31 22:09, Matthew Knepley wrote:
> [snip: the 2024-07-31 reply and the 2021 thread, quoted in full earlier in this digest]

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
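For illustration, a sketch of the caller's side of Matthew's proposal above: a PetscSection over the rank-local cells whose per-cell dof count is the cell's number of corners, playing the same role as the eptr offset array that ParMETIS_V3_PartMeshKway takes in the 2021 discussion. The PetscSection calls are existing API; the idea that a future DMPlexBuildFromCellListParallel() variant would consume such a section, and the function name below, are assumptions about the not-yet-written interface.

    #include <petscsection.h>

    /* Describe a mixed-element cells[] array: dof(c) = number of corners of
       cell c (e.g. 3 for a triangle, 4 for a quadrangle), and the section
       offsets say where each cell's vertex list starts inside cells[]. */
    static PetscErrorCode DescribeMixedCells(MPI_Comm comm, PetscInt numCells, const PetscInt numCorners[], PetscSection *cellSection)
    {
      PetscSection s;

      PetscFunctionBeginUser;
      PetscCall(PetscSectionCreate(comm, &s));
      PetscCall(PetscSectionSetChart(s, 0, numCells));
      for (PetscInt c = 0; c < numCells; ++c) PetscCall(PetscSectionSetDof(s, c, numCorners[c]));
      PetscCall(PetscSectionSetUp(s));
      /* Cell c's vertices then live in cells[off] .. cells[off+dof-1], with
         PetscSectionGetOffset(s, c, &off) and PetscSectionGetDof(s, c, &dof). */
      *cellSection = s;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Triangles and quadrangles can then be mixed freely in one flat cells[] array, since each cell's extent is given by the section rather than by a single global numCorners.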
From bsmith at petsc.dev Thu Aug 1 11:19:20 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 1 Aug 2024 12:19:20 -0400
Subject: [petsc-users] Question regarding naming of fieldsplit splits
In-Reply-To: References:
Message-ID: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev>

The current code is nonsensical. We can "fix" it in a patch to the release branch (but the fix may break some current usage) by changing

    if (nfields == 1) {
      PetscCall(PCFieldSplitSetIS(pc, fieldNames[ifields[0]], compField));
    } else {
      PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
      PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
    }

to

    PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
    PetscCall(PCFieldSplitSetIS(pc, splitname, compField));

but a "correct" fix will take some thought. The current model, which combines "inner" integer field names with outer field names (which can be anything, including integers), doesn't make sense.

> On Aug 1, 2024, at 9:19 AM, Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote:
> [snip: Sebastian's message of Aug 1, 08:19, quoted in full earlier in this digest]

-------------- next part --------------
An HTML attachment was scrubbed...
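For reference: under Barry's patched branch above, every split is named by its split index, so the three splits in Sebastian's example would get distinct prefixes and the collision disappears. The per-split options would then presumably read as follows (inferred from the patched code path, not taken from a released PETSc):

    -fieldsplit_0_ksp_type preonly
    -fieldsplit_0_pc_type jacobi
    -fieldsplit_1_ksp_type preonly
    -fieldsplit_1_pc_type jacobi
    -fieldsplit_2_ksp_type preonly
    -fieldsplit_2_pc_type jacobi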
> -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cAFFX-2D5mPl2LyzZdzgpGK1EsZCSss_e1OpkYmPPKSWI9R6M4qPL0ghruqbMv6bIKAYbSdHtCXmL68KAT1VwKQ$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cAFFX-2D5mPl2LyzZdzgpGK1EsZCSss_e1OpkYmPPKSWI9R6M4qPL0ghruqbMv6bIKAYbSdHtCXmL68KAT1VwKQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Thu Aug 1 15:40:08 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Thu, 1 Aug 2024 20:40:08 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) Message-ID: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Thu Aug 1 16:33:24 2024 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Thu, 1 Aug 2024 17:33:24 -0400 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> <9021c53e-18af-428a-978a-54a3c7371378@giref.ulaval.ca> <4545fc14-d9d5-46c4-bb16-fa304b27d106@giref.ulaval.ca> Message-ID: <4fc58cb6-10c8-40be-9c6f-2470e630c7b6@giref.ulaval.ca> On 2024-08-01 09:24, Matthew Knepley wrote: > On Thu, Aug 1, 2024 at 8:23 AM Eric Chamberland > wrote: > > Hi Matthew, > > we have our own format that uses MPI I/O for the initial read, > then we would like to do almost exactly what we do in ex47.c > (https://urldefense.us/v3/__https://petsc.org/main/src/dm/impls/plex/tests/ex47.c.html__;!!G_uCfscf7eWS!Yl2BQr5WaJV41Sq7-i2xoMTi_ZGsBeThe3GPDdLjQmRtNXOdQJKpIg1Ec8-av5NcnywNIyr2D9ew6B-O8jC5ICPpWzcZ0mNNE3n3bYIy$ ) > except for the very beginning of the program, which will read (MPI I/O) from the disk. Then, always in parallel: > > 1- Populate a DMPlex with multiple element types (with a variant of DMPlexBuildFromCellListParallel; do you have an example of this?) > > ... > > We can do that. We only need to change step 1. I will put it on my > TODO list. My thinking is the same as Vaclav, namely to replace > numCorners with a PetscSection describing the cells[] array. Will that > work for you? > Hi Matthew, That sounds fine for me! I can create a mixed mesh partition description so we add it to ex47.c... I'll ping @you in a MR for that... thanks a lot! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL:
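For reference, a rough sketch of the PetscSection idea from the exchange above: one section point per cell, with the dof recording how many corners that cell has. The PetscSection calls are standard PETSc API; the final build call is hypothetical, since the generalized interface is exactly what the MR is meant to design:

    PetscSection s;
    PetscInt     numCells   = 2;
    PetscInt     ncorners[] = {3, 4};                 /* a triangle and a quadrilateral */
    PetscInt     cells[]    = {0, 1, 2,  1, 3, 4, 2}; /* concatenated connectivity */

    PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, &s));
    PetscCall(PetscSectionSetChart(s, 0, numCells));
    for (PetscInt c = 0; c < numCells; ++c) PetscCall(PetscSectionSetDof(s, c, ncorners[c]));
    PetscCall(PetscSectionSetUp(s));
    /* Hypothetical generalized builder replacing the numCorners argument;
       the actual name and signature are still to be settled in the MR: */
    /* PetscCall(DMPlexBuildFromCellListParallel(dm, numCells, numVertices, NVertices, s, cells, &sfVert, NULL)); */
    PetscCall(PetscSectionDestroy(&s));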
From knepley at gmail.com Fri Aug 2 07:41:47 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 08:41:47 -0400 Subject: [petsc-users] Question regarding naming of fieldsplit splits In-Reply-To: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev> References: <54A1B0B8-A644-446F-854B-A4D60B47671F@petsc.dev> Message-ID: On Thu, Aug 1, 2024 at 12:19 PM Barry Smith wrote: > > The current code is nonsensical. We can "fix" it in a patch to the release branch (but the fix may break some current usage) by changing
>
>     if (nfields == 1) {
>       PetscCall(PCFieldSplitSetIS(pc, fieldNames[ifields[0]], compField));
>     } else {
>       PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
>       PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
>     }
>
> to
>
>     PetscCall(PetscSNPrintf(splitname, sizeof(splitname), "%" PetscInt_FMT, i));
>     PetscCall(PCFieldSplitSetIS(pc, splitname, compField));
>
> but a "correct" fix will take some thought. The current model, using a combination of some "inner" integer fieldnames and some outer fieldnames (which are whatever they are, including possible integers), doesn't make any sense. > My fix was going to be a flag that turns off names altogether. I think this will fix it for Sebastian, and is the only consistent fix I can think of. Thanks, Matt > > On Aug 1, 2024, at 9:19 AM, Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > Hello everyone, > I have a follow-up on my question. I noticed the following behavior. Let's assume I have 5 fields which I want to group with the following options: > -ksp_type fgmres > -ksp_max_it 1 > -ksp_monitor_true_residual > -ksp_view > -pc_type fieldsplit > -pc_fieldsplit_type multiplicative > -pc_fieldsplit_0_fields 0,1 > -pc_fieldsplit_1_fields 2 > -pc_fieldsplit_2_fields 3,4 > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type jacobi > -fieldsplit_2_ksp_type preonly > -fieldsplit_2_pc_type jacobi > Then, the first split is fine, but both the second and third splits get the same prefix, i.e., "fieldsplit_2". This is shown in the output of the ksp_view, which I attach below. > The first split gets the prefix because there is only a single field in it (and I choose as name the index), and the third split gets the name because it groups two other fields, so the "outer" name is taken. Is there any way to circumvent this, other than using custom names for the splits which are unique? > Thanks for your time and best regards, > Sebastian Blauth > > The output of "ksp_view" is the following: > KSP Object: 1 MPI process > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI process > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 3 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI process > type: seqaij > rows=243, cols=243 > total: nonzeros=4473, allocated nonzeros=4473 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 86 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_2_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_2_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_2_) 1 MPI process > type: seqaij > rows=81, cols=81 > total: nonzeros=497, allocated nonzeros=497 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Split number 2 Defined by IS > KSP Object: (fieldsplit_2_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_2_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: (fieldsplit_2_) 1 MPI process > type: seqaij > rows=243, cols=243 > total: nonzeros=4473, allocated nonzeros=4473 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 85 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: seqaij > rows=567, cols=567 > total: nonzeros=24353, allocated nonzeros=24353 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 173 nodes, limit used is 5 > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* petsc-users *On Behalf Of *Blauth, Sebastian > *Sent:* Tuesday, July 2, 2024 11:47 AM > *To:* Matthew Knepley > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > Hi Matt, > > thanks for the answer and clarification. Then I'll work around this issue in python, where I set the options. > > Best, > Sebastian > > -- > Dr.
Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* Matthew Knepley > *Sent:* Monday, July 1, 2024 4:30 PM > *To:* Blauth, Sebastian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > On Mon, Jul 1, 2024 at 9:48 AM Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > > Dear Matt, > > thanks a lot for your help. Unfortunately, for me these extra options do not have any effect, I still get the "u" and "p" fieldnames. Also, this would not help me to get rid of the "c" fieldname; on that level of the fieldsplit I am basically using your approach already, and still it does show up. The output of the -ksp_view is unchanged, so that I do not attach it here again. Maybe I misunderstood you? > > > Oh, we make an exception for single fields, since we think you would want to use the name. I have to make an extra option to shut off naming. > > Thanks, > > Matt > > > Thanks for the help and best regards, > Sebastian > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut für > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorgänge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > *From:* Matthew Knepley > *Sent:* Monday, July 1, 2024 2:27 PM > *To:* Blauth, Sebastian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Question regarding naming of fieldsplit splits > > On Fri, Jun 28, 2024 at 4:05 AM Blauth, Sebastian <sebastian.blauth at itwm.fraunhofer.de> wrote: > > Hello everyone, > > I have a question regarding the naming convention using PETSc's PCFieldSplit. I have been following https://urldefense.us/v3/__https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0Qyn5DYX$ to create a DMShell with FEniCS in order to customize PCFieldSplit for my application. > I am using the following options, which work nicely for me: > > -ksp_type fgmres > -pc_type fieldsplit > -pc_fieldsplit_0_fields 0, 1 > -pc_fieldsplit_1_fields 2 > -pc_fieldsplit_type additive > -fieldsplit_0_ksp_type fgmres > -fieldsplit_0_pc_type fieldsplit > -fieldsplit_0_pc_fieldsplit_type schur > -fieldsplit_0_pc_fieldsplit_schur_fact_type full > -fieldsplit_0_pc_fieldsplit_schur_precondition selfp > -fieldsplit_0_fieldsplit_u_ksp_type preonly > -fieldsplit_0_fieldsplit_u_pc_type lu > -fieldsplit_0_fieldsplit_p_ksp_type cg > -fieldsplit_0_fieldsplit_p_ksp_rtol 1e-14 > -fieldsplit_0_fieldsplit_p_ksp_atol 1e-30 > -fieldsplit_0_fieldsplit_p_pc_type icc > -fieldsplit_0_ksp_rtol 1e-14 > -fieldsplit_0_ksp_atol 1e-30 > -fieldsplit_0_ksp_monitor_true_residual > -fieldsplit_c_ksp_type preonly > -fieldsplit_c_pc_type lu > -ksp_view > > > By default, we use the field names, but you can prevent this by specifying the fields by hand, so > > -fieldsplit_0_pc_fieldsplit_0_fields 0 > -fieldsplit_0_pc_fieldsplit_1_fields 1 > > should remove the 'u' and 'p' fieldnames. It is somewhat hacky, but I think easier to remember than some extra option. > > Thanks, > > Matt > > > Note that this is just an academic example (sorry for the low solver tolerances) to test the approach, consisting of a Stokes equation and some concentration equation (which is not even coupled to Stokes, just for testing). > Completely analogous to https://urldefense.us/v3/__https://lists.mcs.anl.gov/pipermail/petsc-users/2019-January/037262.html__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0Qyn5DYX$ , I translate my ISs to a PETSc Section, which is then supplied to a DMShell and assigned to a KSP. I am not so familiar with the code or how / why this works, but it seems to do so perfectly. I name my sections with petsc4py using > > section.setFieldName(0, "u") > section.setFieldName(1, "p") > section.setFieldName(2, "c") > > However, this is also reflected in the way I can access the fieldsplit options from the command line. My question is: Is there any way of not using the FieldNames specified in python but use the index of the field as defined with "-pc_fieldsplit_0_fields 0, 1" and "-pc_fieldsplit_1_fields 2", i.e., instead of the prefix "fieldsplit_0_fieldsplit_u"
I want to write "fieldsplit_0_fieldsplit_0", instead of "fieldsplit_0_fieldsplit_p" I want to use "fieldsplit_0_fieldsplit_1", and instead of "fieldsplit_c" I want to use "fieldsplit_1". Just changing the names of the fields to > > section.setFieldName(0, "0") > section.setFieldName(1, "1") > section.setFieldName(2, "2") > > does obviously not work as expected, as it works for velocity and pressure, but not for the concentration: the prefix there is then "fieldsplit_2" and not "fieldsplit_1". In the docs, I have found https://urldefense.us/v3/__https://petsc.org/main/manualpages/PC/PCFieldSplitSetFields/__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0X9GdD2a$ which seems to suggest that the fieldname can potentially be supplied, but I don't see how to do so from the command line. Also, for the sake of completeness, I attach the output of the solve with -ksp_view below. > > Thanks a lot in advance and best regards, > Sebastian > > > The output of ksp_view is the following: > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-11, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with ADDITIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-14, absolute=1e-30, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 3.92639 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=4290 > package used to perform factorization: petsc > total: nonzeros=375944, allocated nonzeros=375944 > using I-node routines: found 2548 nodes, limit used is > 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: seqaij > rows=4290, cols=4290 > total: nonzeros=95748, allocated nonzeros=95748 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3287 nodes, limit used is 5 > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-14, absolute=1e-30, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: icc > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using Manteuffel shift [POSITIVE_DEFINITE] > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqsbaij > rows=561, cols=561 > package used to perform factorization: petsc > total: nonzeros=5120, allocated nonzeros=5120 > block size is 1 > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: schurcomplement > rows=561, cols=561 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_0_fieldsplit_p_) 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=3729, allocated nonzeros=3729 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=4290 > total: nonzeros=19938, allocated nonzeros=19938 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP of A00 > KSP Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 3.92639 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=4290 > package used to perform factorization: petsc > total: nonzeros=375944, allocated nonzeros=375944 > using I-node routines: found 2548 nodes, limit > used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_fieldsplit_u_) 1 MPI processes > type: seqaij > rows=4290, cols=4290 > total: nonzeros=95748, allocated nonzeros=95748 > total number of mallocs used during MatSetValues > calls=0 > using I-node routines: found 3287 nodes, limit used > is 5 > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=4290, cols=561 > total: nonzeros=19938, allocated nonzeros=19938 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3287 nodes, limit used is > 5 > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=9679, allocated nonzeros=9679 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=4851, cols=4851 > total: nonzeros=139353, allocated nonzeros=139353 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3830 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_c_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_c_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 4.24323 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=561, cols=561 > package used to perform factorization: petsc > total: nonzeros=15823, allocated nonzeros=15823 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: (fieldsplit_c_) 1 MPI processes > type: seqaij > rows=561, cols=561 > total: nonzeros=3729, allocated nonzeros=3729 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=5412, cols=5412 > total: nonzeros=190416, allocated nonzeros=190416 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 3833 nodes, limit used is 5 > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut f?r > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorg?nge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0e4-qdCV$ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!an6Idf-f7OiZlsU0N0Ftpr5mM5etD7GF_9ghya_ALFmQP_eL93oONwYYRLmLGz-0FSXHkB0bMsjj0dkG26YT$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 2 08:58:10 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 09:58:10 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: > > Dear all, > > I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. > > This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? > > I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? > > This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now.
We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt > Thank you, > Miguel > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fTP6CcczHauSge4FV5cI88RqYPhXISVNPhCpwU5IjmOea9z2VEtIlwEoPSlg5aJbEQzO0IQ8CIvAywPYjOAG$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 10:15:40 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 15:15:40 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? Also, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector (a sketch of this pattern follows below). 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. 4) Do the necessary computations using the vectors created with VecCreateGhost. Thanks, Miguel
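For concreteness, the ghost-vector pattern in step 2) is standard PETSc usage; a minimal sketch (the local size and ghost indices below are invented for illustration):

    Vec      xg;
    PetscInt n = 128, nghost = 2;
    PetscInt ghosts[2] = {300, 301}; /* hypothetical global indices owned by another rank */

    PetscCall(VecCreateGhost(PETSC_COMM_WORLD, n, PETSC_DETERMINE, nghost, ghosts, &xg));
    /* ... fill the owned entries ... */
    PetscCall(VecGhostUpdateBegin(xg, INSERT_VALUES, SCATTER_FORWARD));
    PetscCall(VecGhostUpdateEnd(xg, INSERT_VALUES, SCATTER_FORWARD));
    /* The local form now sees the owned entries followed by the ghost entries. */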
On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bwMUVbfsEDURwiD6tV7_-3EXq7Aogacpt43DZLysMRG2mTWcoK-ax5Ad2xtFGWdBZWNR_QnyvEOYuHqbu4PhgA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 2 17:33:02 2024 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Aug 2024 18:33:02 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: > Thank you Matt for your time, > > What you describe seems to me the ideal approach. > > 1) Add a particle field 'ghost' that identifies ghost vs owned particles. > I think it needs options OWNED, OVERLAP, and GHOST > > This means, locally, I need to allocate Nlocal + ghost particles > (duplicated) for my model? > I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` (a code sketch of this recipe follows below). > If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? > Also, as an alternative, what about: > 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. > 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector.
> 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. > 4) Do the necessary computations using the vectors created with VecCreateGhost. > This is essentially what Migrate() does. I was trying to reuse the code. > > Thanks, > > Matt > > >> Thanks, >> Miguel >> >> On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: >> >> On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: >> >>> Dear all, >>> >>> I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. >>> >>> This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? >>> >>> I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? >>> >>> >> This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles >> >> a) Mark chosen particles as OVERLAP >> >> b) Change rank to process we are sending to >> >> 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag >> >> 4) Mark OVERLAP particles as GHOST when they arrive >> >> There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.
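A minimal sketch of the owner-rank recipe above. It assumes the swarm carries a registered "owner rank" integer field, and isHalo() is a user-supplied test, since halo detection is application specific as noted; the DMSwarm calls are standard API:

    PetscInt *rank, *orank, Np;

    PetscCall(DMSwarmGetLocalSize(sw, &Np));
    PetscCall(DMSwarmGetField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rank));
    PetscCall(DMSwarmGetField(sw, "owner rank", NULL, NULL, (void **)&orank));
    for (PetscInt p = 0; p < Np; ++p) {
      PetscInt nbr;
      /* isHalo() decides whether particle p lies in the halo of neighbor nbr */
      if (isHalo(sw, p, &nbr)) rank[p] = nbr; /* ship a ghost copy to nbr */
      /* orank[p] keeps the owning rank, so received copies stay identifiable */
    }
    PetscCall(DMSwarmRestoreField(sw, "owner rank", NULL, NULL, (void **)&orank));
    PetscCall(DMSwarmRestoreField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rank));
    PetscCall(DMSwarmMigrate(sw, PETSC_FALSE)); /* PETSC_FALSE keeps the originals */

After migration, the ghost particles on the receiving side are exactly those whose "owner rank" differs from the local rank.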
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bV_q4tHUc2Lno7u6JeojubaRmzQjKJDlVFxnATMOtT6Soncx1isPiFXZBhekxMOgHSdyaz_fLrVfbGZhZdDQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 18:15:01 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 23:15:01 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> Message-ID: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying: 1. create a field define a field called "owner rank" for each particle. 2. Identify the phantom particles and modify the internal variable defined by the DMSwarmField_rank variable. 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. 4. Then, once the calculations are done, rename the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. Thank you, Miguel On Aug 2, 2024, at 5:33?PM, Matthew Knepley wrote: On Fri, Aug 2, 2024 at 11:15?AM MIGUEL MOLINOS PEREZ > wrote: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` If that so, how to do the communication between the ghost particles living in the rank i and their ?real? counterpart in the rank j. Algo, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global index of the neighbors particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector. 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transference of data between a vector obtained with DMSwarmCreateGlobalVectorFromField 4) Do necessary computations using the vectors created with VecCreateGhost. This is essentially what Migrate() does. I was trying to reuse the code. Thanks, Matt Thanks, Miguel On Aug 2, 2024, at 8:58?AM, Matthew Knepley > wrote: On Thu, Aug 1, 2024 at 4:40?PM MIGUEL MOLINOS PEREZ > wrote: This Message Is From an External Sender This message came from outside your organization. Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. 
This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YDFG5MS95W4ljXEG0E1Yev9-PFcRmx0YhN98aKOS9oNQtJv4IZo87H1hNoJmE6kU1F2wiGvHxReC2jqH5qCJNQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YDFG5MS95W4ljXEG0E1Yev9-PFcRmx0YhN98aKOS9oNQtJv4IZo87H1hNoJmE6kU1F2wiGvHxReC2jqH5qCJNQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Fri Aug 2 18:47:30 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Fri, 2 Aug 2024 23:47:30 +0000 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Message-ID: <63A1940C-EE12-48F1-8196-A3CE7C81DCA1@us.es> Sorry, I forgot to ask this: Is this procedure safe with overlapping ghost particles? Like, for instance, shared corners between 4 ranks in 2D. Thanks, Miguel On Aug 2, 2024, at 6:14 PM, MIGUEL MOLINOS PEREZ wrote: Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying: 1. Create a field called "owner rank" for each particle. 2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable. 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. 4.
Then, once the calculations are done, reset the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. Thank you, Miguel On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote: On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: Thank you Matt for your time, What you describe seems to me the ideal approach. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we 1) Setup the non-overlapping particles 2) Identify the halo particles 3) Change the `rank`, but not the `owner rank` 4) Call DMSwarmMigrate() Now we can identify ghost particles by the `owner rank` If that is so, how to do the communication between the ghost particles living in rank i and their "real" counterpart in rank j? Also, as an alternative, what about: 1) Use an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside of the rank. 2) Use VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector. 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the transfer of data between vectors obtained with DMSwarmCreateGlobalVectorFromField. 4) Do the necessary computations using the vectors created with VecCreateGhost. This is essentially what Migrate() does. I was trying to reuse the code. Thanks, Matt Thanks, Miguel On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: Dear all, I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST 2) At some interval identify particles that should be sent to other processes as ghosts.
I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles a) Mark chosen particles as OVERLAP b) Change rank to process we are sending to 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag 4) Mark OVERLAP particles as GHOST when they arrive There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified. Thanks, Matt Thank you, Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Aug 4 03:30:05 2024 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 4 Aug 2024 04:30:05 -0400 Subject: [petsc-users] Ghost particles for DMSWARM (or similar) In-Reply-To: <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es> Message-ID: On Fri, Aug 2, 2024 at 7:15 PM MIGUEL MOLINOS PEREZ wrote: > Thanks again Matt, that makes a lot more sense !! > > Just to check that we are on the same page. You are saying: > > 1. Create a field called "owner rank" for each particle. > > 2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable. > > 3. Call DMSwarmMigrate(*,PETSC_FALSE), do the calculations using the new local vector including the ghost particles. > > 4. Then, once the calculations are done, reset the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again. > I don't think we need this last step. We can just remove those ghost particles for the next step I think (a sketch of that cleanup follows below).
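For instance, the cleanup could look like this sketch, reusing the "owner rank" field from above (DMSwarmRemovePointAtIndex is standard API; removing from the back keeps the remaining indices valid):

    PetscInt   *orank, Np, *toRemove, nRemove = 0;
    PetscMPIInt myrank;

    PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)sw), &myrank));
    PetscCall(DMSwarmGetLocalSize(sw, &Np));
    PetscCall(PetscMalloc1(Np, &toRemove));
    PetscCall(DMSwarmGetField(sw, "owner rank", NULL, NULL, (void **)&orank));
    /* Collect the indices of received ghost copies (owner rank != local rank) */
    for (PetscInt p = 0; p < Np; ++p) if (orank[p] != (PetscInt)myrank) toRemove[nRemove++] = p;
    PetscCall(DMSwarmRestoreField(sw, "owner rank", NULL, NULL, (void **)&orank));
    /* Remove from the back so earlier indices are not invalidated */
    for (PetscInt i = nRemove - 1; i >= 0; --i) PetscCall(DMSwarmRemovePointAtIndex(sw, toRemove[i]));
    PetscCall(PetscFree(toRemove));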
Thanks, Matt > Thank you, > Miguel > > On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote: > > On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote: >> Thank you Matt for your time, >> >> What you describe seems to me the ideal approach. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> This means, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model? >> > I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles. Shoot! That marking will not work since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. > So then we > > 1) Setup the non-overlapping particles > > 2) Identify the halo particles > > 3) Change the `rank`, but not the `owner rank` > > 4) Call DMSwarmMigrate() > > Now we can identify ghost particles by the `owner rank` > > >> If that is so, how to do the communication between the ghost particles >> living in rank i and their "real" counterpart in rank j? >> >> Also, as an alternative, what about: >> 1) Use an IS tag which contains, for each rank, a list of the global >> indices of the neighbor particles outside of the rank. >> 2) Use VecCreateGhost to create a new vector which contains extra local >> space for the ghost components of the vector. >> 3) Use VecScatterCreate, VecScatterBegin, and VecScatterEnd to do the >> transfer of data between vectors obtained with >> DMSwarmCreateGlobalVectorFromField >> 4) Do the necessary computations using the vectors created with >> VecCreateGhost. >> > This is essentially what Migrate() does. I was trying to reuse the code. > > Thanks, > > Matt > >> Thanks, >> Miguel >> >> On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote: >> >> On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote: >> >>> Dear all, >>> >>> I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In the MD simulations we evaluate on each particle (atoms) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes, some of these neighbours lie on a different processor. >>> >>> This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something? >>> >>> I think this could be done by applying a buffer region by exploiting the background DMDA mesh that I already use to do domain decomposition. Then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions? >>> >> This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want. >> >> 1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST >> >> 2) At some interval identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles >> >> a) Mark chosen particles as OVERLAP >> >> b) Change rank to process we are sending to >> >> 3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag >> >> 4) Mark OVERLAP particles as GHOST when they arrive >> >> There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.
>> Thanks, >> Matt >>> Thank you, >>> Miguel >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bfs_cYi_MbewjyZ5saHoEqAx9SXEFMKekC6TOFsGAXCr11wOn1RrnuG5RTFV4WqHjWvBiHxouSdCL7B8UTwQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.murusidze at math.msu.ru Mon Aug 5 03:43:09 2024 From: konstantin.murusidze at math.msu.ru (Konstantin Murusidze) Date: Mon, 05 Aug 2024 11:43:09 +0300 Subject: [petsc-users] (no subject) In-Reply-To: References: <441311721646454@mail.yandex.ru> Message-ID: <234711722846959@mail.yandex.ru> An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 5 10:50:46 2024 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 5 Aug 2024 11:50:46 -0400 Subject: [petsc-users] (no subject) In-Reply-To: <234711722846959@mail.yandex.ru> References: <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru> Message-ID: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> PCFactorSetShiftType() only works for certain preconditioners, such as ILU. If the matrix is not symmetric, it also doesn't help. Are you sure the matrix is symmetric positive definite? > On Aug 5, 2024, at 4:43 AM, Konstantin Murusidze wrote: > After this procedure I saw "Linear solve did not converge due to DIVERGED_INDEFINITE_MAT". Then, as shown on https://urldefense.us/v3/__https://petsc.org/main/manualpages/KSP/KSP_DIVERGED_INDEFINITE_PC/__;!!G_uCfscf7eWS!dUAb2JB1g_tVNhPsjMe6qIzyNarGnl8HzZh2cOn_IkzWnHD5QJPY06dL8iwmKAJH2A_CYwssOq3IcWT2ButJOns$ , I added the line PetscCall(PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE)); but nothing changed and I still have divergence. > > 22.07.2024, 17:22, "Barry Smith": > Run with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view to see why it is stopping at 38 iterations. > Barry > > On Jul 22, 2024, at 7:37 AM, Konstantin Murusidze wrote: > Good afternoon. I am a student at the Faculty of Mathematics and for my course work I need to solve an SLAE to a relative accuracy of 1e-8 or better. To do this, I called PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));. But in the end, only 38 iterations were made and the relative norm ||Ax-b||/||b|| turns out to be 4.54011. If you reply to my email, I can give you more information about the solver settings. > -------------- next part -------------- An HTML attachment was scrubbed... URL:
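Following up on the debugging advice in this thread, a minimal way to interrogate the solver after the solve, in code rather than via command-line options (variable names are hypothetical; the KSP calls are standard PETSc API):

    KSPConvergedReason reason;
    PetscInt           its;

    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPGetConvergedReason(ksp, &reason));
    PetscCall(KSPGetIterationNumber(ksp, &its));
    if (reason < 0) PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Solve diverged (%s) after %" PetscInt_FMT " iterations\n", KSPConvergedReasons[reason], its));

A negative reason such as KSP_DIVERGED_INDEFINITE_MAT means the 38 iterations above ended in divergence rather than convergence, which is consistent with the large residual norm reported.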
From srvenkat at utexas.edu Mon Aug 5 12:10:26 2024 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Mon, 5 Aug 2024 12:10:26 -0500 Subject: [petsc-users] Read/Write large dense matrix Message-ID: I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32 bit integer value, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use? The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them to an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory). Thanks, Sreeram -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 5 12:25:27 2024 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 5 Aug 2024 13:25:27 -0400 Subject: [petsc-users] Read/Write large dense matrix In-Reply-To: References: Message-ID: On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote: > I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32 bit integer value, I get an error (MPI buffer size too big) when
From srvenkat at utexas.edu  Mon Aug 5 12:26:45 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Mon, 5 Aug 2024 12:26:45 -0500
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID:

I do have 64-bit indices turned on. The problem, I think, is that PetscMPIInt is always a 32-bit int, and that's what's overflowing.

On Mon, Aug 5, 2024 at 12:25 PM Matthew Knepley wrote:

> On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote:
>
>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>
> I think you need to reconfigure with --with-64-bit-indices.
>
> Thanks,
>
> Matt
>
>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>
>> Thanks,
>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Mon Aug 5 12:40:43 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 5 Aug 2024 13:40:43 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID:

On Mon, Aug 5, 2024 at 1:26 PM Sreeram R Venkat wrote:

> I do have 64-bit indices turned on. The problem, I think, is that PetscMPIInt is always a 32-bit int, and that's what's overflowing.

We should be using the large count support from MPI. However, it appears we forgot somewhere. Would it be possible to construct a simple example that I can run and find the error? You should be able to just create a dense matrix of zeros with the correct size.

  Thanks,

     Matt
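A reproducer along these lines should do it (an untested sketch; n is chosen so the entry count passes 2^31, which needs roughly 20 GB of aggregate memory for real scalars):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    n = 50000; /* 5e4 x 5e4 dense: 2.5e9 entries > 2^31 - 1 */
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, NULL, &A));
  PetscCall(MatZeroEntries(A));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.dat", FILE_MODE_WRITE, &viewer));
  PetscCall(PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE)); /* store dense on disk */
  PetscCall(MatView(A, viewer));
  PetscCall(PetscViewerPopFormat(viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}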
> On Mon, Aug 5, 2024 at 12:25 PM Matthew Knepley wrote:
>
>> On Mon, Aug 5, 2024 at 1:10 PM Sreeram R Venkat wrote:
>>
>>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>
>> I think you need to reconfigure with --with-64-bit-indices.
>>
>> Thanks,
>>
>> Matt
>>
>>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>>
>>> Thanks,
>>> Sreeram

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a-sxRcKHh_nd4gLTjiXZxx0nYU4_lvIBL8xVFhNVrOwEBeVFcnTWMFNkyHuJ15bZDhKacKWF1t8swumsFxgH$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Aug 5 13:19:37 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 5 Aug 2024 14:19:37 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References:
Message-ID: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>

   By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.

   To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:

PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
MatView(mat, viewer);
PetscViewerPopFormat(viewer);
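   Reading it back in later works the same way (an untested sketch; it assumes the file dense.dat was written in the native dense format as above):

Mat         B;
PetscViewer v;

PetscCall(MatCreate(PETSC_COMM_WORLD, &B));
PetscCall(MatSetType(B, MATDENSE)); /* tell MatLoad() to expect/keep a dense matrix */
PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "dense.dat", FILE_MODE_READ, &v));
PetscCall(MatLoad(B, v));
PetscCall(PetscViewerDestroy(&v));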
> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>
> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>
> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>
> Thanks,
> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From konstantin.murusidze at math.msu.ru  Mon Aug 5 13:27:51 2024
From: konstantin.murusidze at math.msu.ru (Konstantin Murusidze)
Date: Mon, 05 Aug 2024 21:27:51 +0300
Subject: [petsc-users] (no subject)
In-Reply-To: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev>
References: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru>
Message-ID: <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>

An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Aug 5 14:33:44 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 5 Aug 2024 15:33:44 -0400
Subject: [petsc-users] (no subject)
In-Reply-To: <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>
References: <634B195B-0BD7-43A5-8ABC-6D8D4E00ABF1@petsc.dev> <441311721646454@mail.yandex.ru> <234711722846959@mail.yandex.ru> <27547411722882471@wf4nrjvtssjecb53.iva.yp-c.yandex.net>
Message-ID:

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPCR/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1tbaF0nc$
The preconditioner must be POSITIVE-DEFINITE and the operator POSITIVE-SEMIDEFINITE.

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPMINRES/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1rmWgN0k$
The operator and the preconditioner must be symmetric and the preconditioner must be positive definite for this method.

https://urldefense.us/v3/__https://petsc.org/release/manualpages/KSP/KSPSYMMLQ/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1uMjjdeU$
The preconditioner must be POSITIVE-DEFINITE.

   Of course, you can always use KSPGMRES or KSPBCGS.
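   For a symmetric indefinite matrix, MINRES is usually the first thing to try; a minimal, untested sketch (A, b, and x stand for your operator, right-hand side, and solution):

KSP ksp;
PC  pc;

PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
PetscCall(KSPSetOperators(ksp, A, A));
PetscCall(KSPSetType(ksp, KSPMINRES));
PetscCall(KSPGetPC(ksp, &pc));
PetscCall(PCSetType(pc, PCJACOBI)); /* preconditioner must stay positive definite; Jacobi qualifies only if the diagonal is positive */
PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));
PetscCall(KSPSetFromOptions(ksp));
PetscCall(KSPSolve(ksp, b, x));

   The same thing can be selected from the command line with -ksp_type minres -pc_type jacobi.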
> On Aug 5, 2024, at 2:27 PM, Konstantin Murusidze wrote:
>
> I know that the matrix is symmetric, but it isn't positive definite. Is it possible to solve such a problem, maybe with another solver or preconditioner?
>
> --
> Sent from mobile Yandex Mail
>
> 05.08.2024, 18:51, "Barry Smith" :
>
> PCFactorSetShiftType() only works for certain preconditioners, such as ILU.
>
> If the matrix is not symmetric it also doesn't help.
>
> Are you sure the matrix is symmetric positive definite?
>
> > On Aug 5, 2024, at 4:43 AM, Konstantin Murusidze wrote:
> >
> > After this procedure I saw "Linear solution did not converge due to DIVERGED_INDEFINITE_MAT". Then, as shown on https://urldefense.us/v3/__https://petsc.org/main/manualpages/KSP/KSP_DIVERGED_INDEFINITE_PC/__;!!G_uCfscf7eWS!ZQFXJM_C7qCPyXSN6vUgcQaTYavgEpmvtDdWSpbY1dR7m_6DlTM1CF00dL26gVO4tjPEFFvV79SEmjS1OqyHDu0$ , I wrote the line PetscCall(PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE)); but nothing changed and I still have divergence.
> >
> > 22.07.2024, 17:22, "Barry Smith" :
> >
> > Run with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view to see why it is stopping at 38 iterations.
> >
> > Barry
> >
> > On Jul 22, 2024, at 7:37 AM, Konstantin Murusidze wrote:
> >
> > Good afternoon. I am a student at the Faculty of Mathematics, and for my course work I need to solve an SLAE to a relative accuracy of 1e-8 or better. To do this, I added the call PetscCall(KSPSetTolerances(ksp, 1.e-8, PETSC_DEFAULT, PETSC_DEFAULT, 100000));. But in the end, only 38 iterations were made and the relative norm ||Ax-b||/||b|| turns out to be 4.54011. If you reply to my email, I can give you more information about the solver settings.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_favicon.png
Type: image/png
Size: 1172 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_favicon.png
Type: image/png
Size: 1172 bytes
Desc: not available
URL: 

From srvenkat at utexas.edu  Mon Aug 5 20:19:51 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Mon, 5 Aug 2024 20:19:51 -0500
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
References: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
Message-ID:

Here's an example code that should replicate the error: https://urldefense.us/v3/__https://github.com/s769/petsc-test/tree/master__;!!G_uCfscf7eWS!dWU2gJCvykWqg3TTfkkQOsW3q32Sny3r399zmyr6MCiJQh6_dH-T3IktQLg9fbvc4okbbHP2koQZkzL0fCjOTrC90w$ .

I tried using PETSC_VIEWER_NATIVE, but I still get the error. I have a situation where the matrix is created on PETSC_COMM_WORLD but only has entries on the first process due to some layout constraints elsewhere in the program. The nodes I'm running on should have more than enough memory to hold the entire matrix on one process, and the error I get is not an out-of-memory error anyway.

Let me know if you aren't able to build the example.

I noticed that if I fully distribute the matrix over all processes, then the save works fine. Is there some way to do that after I create the matrix but before saving it?

On Mon, Aug 5, 2024 at 1:19 PM Barry Smith wrote:

> By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.
>
> To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:
>
> PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
> MatView(mat, viewer);
> PetscViewerPopFormat(viewer);
>
>> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>>
>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>
>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>
>> Thanks,
>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mmolinos at us.es  Tue Aug 6 21:22:29 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Wed, 7 Aug 2024 02:22:29 +0000
Subject: [petsc-users] Ghost particles for DMSWARM (or similar)
In-Reply-To:
References: <8FBAC7A5-B6AE-4B21-8FEB-52BE1C04A265@us.es> <1B9B1277-9566-444C-9DA8-7ED17684FE01@us.es>
Message-ID:

Thanks Matt, I think I'll start by making a small program as a proof of concept. Then, if it works, I'll implement it in my code and I'll be happy to share it too :-)

Miguel

On Aug 4, 2024, at 3:30 AM, Matthew Knepley wrote:

On Fri, Aug 2, 2024 at 7:15 PM MIGUEL MOLINOS PEREZ wrote:

Thanks again Matt, that makes a lot more sense !! Just to check that we are on the same page. You are saying:

1. Define a field called "owner rank" for each particle.
2. Identify the ghost particles and modify the internal variable defined by the DMSwarmField_rank variable.
3. Call DMSwarmMigrate(*,PETSC_FALSE), and do the calculations using the new local vector, including the ghost particles.
4. Then, once the calculations are done, restore the DMSwarmField_rank variable using the "owner rank" variable and call DMSwarmMigrate(*,PETSC_FALSE) once again.

I don't think we need this last step. We can just remove those ghost particles for the next step, I think.
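The marking step could look roughly like this (an untested sketch; "owner_rank" is a field you would register yourself with DMSwarmRegisterPetscDatatypeField(), and IsHaloParticle()/NeighborRank() stand in for your application-specific halo test):

PetscInt   *rankfield, *owner, npoints, p;
PetscMPIInt myrank;

PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)sw), &myrank));
PetscCall(DMSwarmGetLocalSize(sw, &npoints));
PetscCall(DMSwarmGetField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rankfield));
PetscCall(DMSwarmGetField(sw, "owner_rank", NULL, NULL, (void **)&owner));
for (p = 0; p < npoints; ++p) {
  owner[p] = myrank;                                             /* remember who owns each particle */
  if (IsHaloParticle(sw, p)) rankfield[p] = NeighborRank(sw, p); /* send a copy there */
}
PetscCall(DMSwarmRestoreField(sw, "owner_rank", NULL, NULL, (void **)&owner));
PetscCall(DMSwarmRestoreField(sw, DMSwarmField_rank, NULL, NULL, (void **)&rankfield));
PetscCall(DMSwarmMigrate(sw, PETSC_FALSE)); /* PETSC_FALSE keeps the originals */
/* after migration, particles with owner[p] != myrank are the received ghosts */

Then the ghosts can be dropped before the next step by deleting every particle whose owner rank differs from the local rank.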
  Thanks,

     Matt

Thank you,
Miguel

On Aug 2, 2024, at 5:33 PM, Matthew Knepley wrote:

On Fri, Aug 2, 2024 at 11:15 AM MIGUEL MOLINOS PEREZ wrote:

Thank you Matt for your time,

What you describe seems to me the ideal approach.

1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST

Does this mean that, locally, I need to allocate Nlocal + ghost particles (duplicated) for my model?

I would do it another way. I would allocate the particles with no overlap and set them up. Then I would identify the halo particles, mark them as OVERLAP, call DMSwarmMigrate(), and mark the migrated particles as GHOST, then unmark the OVERLAP particles.

Shoot! That marking will not work, since we cannot tell the difference between particles we received and particles we sent. Okay, instead of the `ghost` field we need an `owner rank` field. So then we

1) Set up the non-overlapping particles
2) Identify the halo particles
3) Change the `rank`, but not the `owner rank`
4) Call DMSwarmMigrate()

Now we can identify ghost particles by the `owner rank`.

If that is so, how do we do the communication between the ghost particles living on rank i and their "real" counterparts on rank j? Also, as an alternative, what about:

1) Using an IS tag which contains, for each rank, a list of the global indices of the neighbor particles outside the rank.
2) Using VecCreateGhost to create a new vector which contains extra local space for the ghost components of the vector.
3) Using VecScatterCreate, VecScatterBegin, and VecScatterEnd to transfer data between vectors obtained with DMSwarmCreateGlobalVectorFromField.
4) Doing the necessary computations using the vectors created with VecCreateGhost.

This is essentially what Migrate() does. I was trying to reuse the code.

  Thanks,

     Matt

Thanks,
Miguel

On Aug 2, 2024, at 8:58 AM, Matthew Knepley wrote:

On Thu, Aug 1, 2024 at 4:40 PM MIGUEL MOLINOS PEREZ wrote:

Dear all,

I am implementing a Molecular Dynamics (MD) code using the DMSWARM interface. In MD simulations we evaluate on each particle (atom) some kind of scalar functional using data from the neighbouring atoms. My problem lies in the parallel implementation of the model, because sometimes some of these neighbours lie on a different processor. This is usually solved by using ghost particles. A similar approach (with nodes instead) is already implemented for other PETSc mesh structures, like DMPlexConstructGhostCells. Unfortunately, I don't see this kind of construct for DMSWARM. Am I missing something?

I think this could be done by applying a buffer region, exploiting the background DMDA mesh that I already use to do domain decomposition, then using the buffer region of each cell to locate the ghost particles and finally using VecCreateGhost. Is this feasible? Or is there an easier approach using other PETSc functions.
This is feasible, but it would be good to develop a set of best practices, since we have been mainly focused on the case of non-redundant particles. Here is how I think I would do what you want.

1) Add a particle field 'ghost' that identifies ghost vs owned particles. I think it needs options OWNED, OVERLAP, and GHOST

2) At some interval, identify particles that should be sent to other processes as ghosts. I would call these "overlap particles". The determination seems application specific, so I would leave this determination to the user right now. We do two things to these particles:

  a) Mark chosen particles as OVERLAP

  b) Change rank to the process we are sending to

3) Call DMSwarmMigrate with PETSC_FALSE for the particle deletion flag

4) Mark OVERLAP particles as GHOST when they arrive

There is one problem in the above algorithm. It does not allow sending particles to multiple ranks. We would have to do this in phases right now, or make a small adjustment to the interface allowing replication of particles when a set of ranks is specified.

  Thanks,

     Matt

Thank you,
Miguel

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zaa9esHaq00hIlNsNZjjG-L4CQ2QxKeoyoqvAF4909vtCxKveI1Fh83DKxZnH24E5ToHwzs69i5yzVZlQGO6fA$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Tue Aug 6 22:23:30 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 6 Aug 2024 23:23:30 -0400
Subject: [petsc-users] Read/Write large dense matrix
In-Reply-To:
References: <48E31B64-61EB-463F-823F-314DE9E7C290@petsc.dev>
Message-ID: <2BE6C49A-B4CB-48A8-AA78-056753B8539F@petsc.dev>

   I have removed an unnecessary PetscMPIIntCast() on MPI rank zero that was causing your test code to fail. See https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7747__;!!G_uCfscf7eWS!fyVpTZyH7SS1nVHiZBR-6MSWa6uJ0mSExyg1aNmU4PyWfukw1682_dX9rwUKstiGP6Z8i22L4pmElEi9qsfHA6o$

   Thanks for reporting the problem.

   Barry

   BTW: I don't think we have code to distribute a dense matrix that has values only on one rank to all the ranks. The code would essentially be a combination of MatView_Dense_Binary/MatLoad_Dense_Binary with PetscViewerBinaryWriteReadAll, without the saving and reading from disk. It is likely relatively easy to fix the dense matrix view/load with the native format so that it does not need 64-bit indices to work with your test code.
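   In the meantime, something like this could spread a rank-0 MATSEQDENSE across a communicator (a rough, untested sketch; it inserts the rows in chunks so the value stash stays bounded):

static PetscErrorCode DistributeDenseFromRankZero(Mat Aseq, MPI_Comm comm, Mat *Adist)
{
  PetscMPIInt        rank;
  PetscInt           M = 0, N = 0, lda = 0, start, i, j;
  const PetscScalar *a = NULL;
  PetscScalar       *rowvals = NULL;
  PetscInt          *cols = NULL;
  const PetscInt     chunk = 1024; /* rows inserted between flushes */

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  if (rank == 0) PetscCall(MatGetSize(Aseq, &M, &N));
  PetscCallMPI(MPI_Bcast(&M, 1, MPIU_INT, 0, comm));
  PetscCallMPI(MPI_Bcast(&N, 1, MPIU_INT, 0, comm));
  PetscCall(MatCreateDense(comm, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, Adist));
  if (rank == 0) {
    PetscCall(MatDenseGetLDA(Aseq, &lda));
    PetscCall(MatDenseGetArrayRead(Aseq, &a));
    PetscCall(PetscMalloc2(N, &rowvals, N, &cols));
    for (j = 0; j < N; j++) cols[j] = j;
  }
  for (start = 0; start < M; start += chunk) { /* every rank runs the same number of iterations */
    if (rank == 0) {
      PetscInt end = PetscMin(start + chunk, M);
      for (i = start; i < end; i++) {
        for (j = 0; j < N; j++) rowvals[j] = a[i + j * lda]; /* dense storage is column-major */
        PetscCall(MatSetValues(*Adist, 1, &i, N, cols, rowvals, INSERT_VALUES));
      }
    }
    PetscCall(MatAssemblyBegin(*Adist, MAT_FLUSH_ASSEMBLY)); /* collective: ships the stashed rows */
    PetscCall(MatAssemblyEnd(*Adist, MAT_FLUSH_ASSEMBLY));
  }
  if (rank == 0) {
    PetscCall(MatDenseRestoreArrayRead(Aseq, &a));
    PetscCall(PetscFree2(rowvals, cols));
  }
  PetscCall(MatAssemblyBegin(*Adist, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*Adist, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}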
> On Aug 5, 2024, at 9:19 PM, Sreeram R Venkat wrote:
>
> Here's an example code that should replicate the error: https://urldefense.us/v3/__https://github.com/s769/petsc-test/tree/master__;!!G_uCfscf7eWS!fyVpTZyH7SS1nVHiZBR-6MSWa6uJ0mSExyg1aNmU4PyWfukw1682_dX9rwUKstiGP6Z8i22L4pmElEi9y0M6fi0$ .
>
> I tried using PETSC_VIEWER_NATIVE, but I still get the error. I have a situation where the matrix is created on PETSC_COMM_WORLD but only has entries on the first process due to some layout constraints elsewhere in the program. The nodes I'm running on should have more than enough memory to hold the entire matrix on one process, and the error I get is not an out-of-memory error anyway.
>
> Let me know if you aren't able to build the example.
>
> I noticed that if I fully distribute the matrix over all processes, then the save works fine. Is there some way to do that after I create the matrix but before saving it?
>
> On Mon, Aug 5, 2024 at 1:19 PM Barry Smith wrote:
>>
>> By default, PETSc MatView() to a binary viewer uses the "standard" compressed sparse storage format. This is not efficient (or reasonable) for dense matrices and produces issues with integer overflow.
>>
>> To store a dense matrix as dense on disk, use the PetscViewerFormat PETSC_VIEWER_NATIVE. So, for example:
>>
>> PetscViewerPushFormat(viewer, PETSC_VIEWER_NATIVE);
>> MatView(mat, viewer);
>> PetscViewerPopFormat(viewer);
>>
>>> On Aug 5, 2024, at 1:10 PM, Sreeram R Venkat wrote:
>>>
>>> I have a large dense matrix (size ranging from 5e4 to 1e5) that arises as a result of doing MatComputeOperator() on a MatShell. When the total number of nonzeros exceeds the 32-bit integer limit, I get an error (MPI buffer size too big) when trying to do MatView() on this to save to binary. Is there a way I can save this matrix to load again for later use?
>>>
>>> The other thing I tried was to save each column as a separate dataset in an hdf5 file. Then, I tried to load this in python, combine them into an np array, and then create/save a dense matrix with petsc4py. I was able to create the dense Mat, but the MatView() once again resulted in an error (out of memory).
>>>
>>> Thanks,
>>> Sreeram
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pentaerythrotol at gmail.com  Thu Aug 8 09:46:19 2024
From: pentaerythrotol at gmail.com (ZHOU Yingjie)
Date: Thu, 8 Aug 2024 22:46:19 +0800
Subject: [petsc-users] A question about loading data from .h5 file
Message-ID:

Dear Sir or Madam,

I ran into a problem loading data from an HDF5 file. Attached is a simple test program.

An introduction to the program is as follows:
I create a 3D DMDA object (50*50*50), and create 2 global vectors x and y with the DMDA, then I write the vector x into the file test.h5.
When I try to load the vector y from test.h5, I get the error "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000, which I've checked in Python.

Thank you for your help!

Best regards
ZHOU Yingjie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-c-code
Size: 1766 bytes
Desc: not available
URL: 

From bsmith at petsc.dev  Thu Aug 8 13:28:33 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 8 Aug 2024 14:28:33 -0400
Subject: [petsc-users] A question about loading data from .h5 file
In-Reply-To:
References:
Message-ID: <2C7817FD-EF02-455C-B6F9-17CEB8D72485@petsc.dev>

   Try reading the vector back in with a vector created from the DM.

> On Aug 8, 2024, at 10:46 AM, ZHOU Yingjie wrote:
>
> Dear Sir or Madam,
>
> I ran into a problem loading data from an HDF5 file. Attached is a simple test program.
>
> An introduction to the program is as follows:
> I create a 3D DMDA object (50*50*50), and create 2 global vectors x and y with the DMDA, then I write the vector x into the file test.h5.
> When I try to load the vector y from test.h5, I get the error "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000, which I've checked in Python.
>
> Thank you for your help!
>
> Best regards
> ZHOU Yingjie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Fri Aug 9 09:37:05 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 9 Aug 2024 10:37:05 -0400
Subject: [petsc-users] A question about loading data from .h5 file
In-Reply-To:
References: <2C7817FD-EF02-455C-B6F9-17CEB8D72485@petsc.dev>
Message-ID: <2A0602EC-91E6-4D5B-9D76-C1458BA23535@petsc.dev>

   Vectors created from a DMDA have multi-dimensional information attached to them. When the vector is saved with an HDF5 viewer, this multi-dimensional information is saved in the HDF5 file. When a vector created with a DMDA loads the HDF5 file, the multi-dimensional information is used for a successful load. When a plain vector (obtained with VecCreate) loads from the HDF5 file, it does not have the concept of multi-dimensional information and thus, for some reason, works incorrectly. I would consider the current error output to be a bug; it should provide a more useful error message.

   Note that for a DMDA in parallel, the DMDA vector data is transformed so that it is stored in the natural multi-dimensional ordering on disk. When it is read back in with a DMDA vector, the transformation is reversed to get the data back into PETSc's parallel ordering. So, in parallel, it would never make sense to store a DMDA vector with HDF5 and then read it back in with a "plain" vector, since the plain vector does not know how to transform the data.

   Barry
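   In code, the working round trip looks roughly like this (an untested sketch; da stands for the 50x50x50 DMDA from the test program, and PETSc must be configured with HDF5):

#include <petscviewerhdf5.h>

Vec         x, y;
PetscViewer viewer;

PetscCall(DMCreateGlobalVector(da, &x));
PetscCall(PetscObjectSetName((PetscObject)x, "field")); /* the object name selects the HDF5 dataset */
/* ... fill x ... */
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "test.h5", FILE_MODE_WRITE, &viewer));
PetscCall(VecView(x, viewer));
PetscCall(PetscViewerDestroy(&viewer));

PetscCall(DMCreateGlobalVector(da, &y)); /* created from the DM, as suggested above */
PetscCall(PetscObjectSetName((PetscObject)y, "field"));
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "test.h5", FILE_MODE_READ, &viewer));
PetscCall(VecLoad(y, viewer));
PetscCall(PetscViewerDestroy(&viewer));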
> On Aug 9, 2024, at 1:11 AM, ZHOU Yingjie wrote:
>
> Dear Barry,
>
> Thank you for your quick reply; the problem has been solved.
>
> I'm just still curious why, when I use PETSC_COMM_SELF for the viewer to load a 50*50*50 vector, the error says "Global size of array in file is 2500 (50*50), not 125000 (50*50*50) as expected".
>
> Best regards,
> Yingjie ZHOU
>
> Barry Smith wrote on Fri, Aug 9, 2024 at 02:28:
>>
>> Try reading the vector back in with a vector created from the DM.
>>
>>> On Aug 8, 2024, at 10:46 AM, ZHOU Yingjie wrote:
>>>
>>> Dear Sir or Madam,
>>>
>>> I ran into a problem loading data from an HDF5 file. Attached is a simple test program.
>>>
>>> An introduction to the program is as follows:
>>> I create a 3d DMDA object(50*50*50), and create 2 global vectors x and y by DMDA, then I write the vector x into file test.h5.
>>> When I try to load the vector y from test.h5, the error occurs that "Global size of array in file is 2500, not 125000 as expected", but the size of the array should be 125000 which I've checked in python. >>> >>> Thank you for your help! >>> >>> Best regards >>> ZHOU Yingjie >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Aug 9 12:17:10 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 9 Aug 2024 17:17:10 +0000 Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris Message-ID: Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris. My modules are: export MPICH_GPU_SUPPORT_ENABLED=1 module use /soft/modulefiles module load spack-pe-base cmake module unload darshan module load cudatoolkit-standalone PrgEnv-gnu cray-libsci and my configure line is: $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 What I see in the configure phase is: ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9OXszklO$ for SOWING ============================================================================================= ============================================================================================= Running configure on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running make on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running make install on SOWING; this may take several minutes ============================================================================================= ============================================================================================= Running arch-polaris-dbg/bin/bfort to generate Fortran stubs ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9Ho5-hpl$ for SUITESPARSE ============================================================================================= ============================================================================================= Configuring SUITESPARSE with CMake; this may take several minutes ============================================================================================= 
============================================================================================= Compiling and installing SUITESPARSE; this may take several minutes ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9JxTYrQ0$ for HYPRE ============================================================================================= ============================================================================================= Running configure on HYPRE; this may take several minutes ============================================================================================= ============================================================================================= Running make on HYPRE; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make; make install on HYPRE ********************************************************************************************* the configure.log file ends with: ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make; make install on HYPRE ********************************************************************************************* File "/home/mnv/Software/petsc/config/configure.py", line 462, in petsc_configure framework.configure(out = sys.stdout) File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1455, in configure self.processChildren() File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1443, in processChildren self.serialEvaluation(self.childGraph) File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1418, in serialEvaluation child.configure() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1354, in configure self.executeTest(self.configureLibrary) File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line 138, in executeTest ret = test(*args,**kargs) File "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", line 199, in configureLibrary config.package.Package.configureLibrary(self) File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1041, in configureLibrary for location, directory, lib, incl in self.generateGuesses(): File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 609, in generateGuesses d = self.checkDownload() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 743, in checkDownload return self.getInstallDir() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 545, in getInstallDir installDir = self.Install() File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1892, in Install raise RuntimeError('Error running make; make install 
on '+self.PACKAGE) ================================================================================ Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000 ================================================================================ Any help in debugging this is much appreciated. I can provide the whole configure.log file if needed. Thank you for your time, Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay.anl at fastmail.org Fri Aug 9 12:44:18 2024 From: balay.anl at fastmail.org (Satish Balay) Date: Fri, 9 Aug 2024 12:44:18 -0500 (CDT) Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris In-Reply-To: References: Message-ID: If building on front-end - try using --with-make-np=8 [or 4] If you still have issues - send configure.log Satish On Fri, 9 Aug 2024, Vanella, Marcos (Fed) via petsc-users wrote: > Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris. > My modules are: > > export MPICH_GPU_SUPPORT_ENABLED=1 > module use /soft/modulefiles > module load spack-pe-base cmake > module unload darshan > module load cudatoolkit-standalone PrgEnv-gnu cray-libsci > > and my configure line is: > > $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 > > What I see in the configure phase is: > ============================================================================================= > Configuring PETSc to compile on your system > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9OXszklO$ for SOWING > ============================================================================================= > ============================================================================================= > Running configure on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make install on SOWING; this may take several minutes > ============================================================================================= > ============================================================================================= > Running arch-polaris-dbg/bin/bfort to generate Fortran stubs > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9Ho5-hpl$ for SUITESPARSE > 
============================================================================================= > ============================================================================================= > Configuring SUITESPARSE with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing SUITESPARSE; this may take several minutes > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!cmVCEe9Yo9XY7yJT97YkbQmjoCgOuxhiJ2FQxtDUKX1EeWJlKWt0pLawxoHeUS0ZDgSfwCHAoJNUjc5uQW3gQdHH9JxTYrQ0$ for HYPRE > ============================================================================================= > ============================================================================================= > Running configure on HYPRE; this may take several minutes > ============================================================================================= > ============================================================================================= > Running make on HYPRE; this may take several minutes > ============================================================================================= > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > ********************************************************************************************* > > the configure.log file ends with: > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > ********************************************************************************************* > File "/home/mnv/Software/petsc/config/configure.py", line 462, in petsc_configure > framework.configure(out = sys.stdout) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1455, in configure > self.processChildren() > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1443, in processChildren > self.serialEvaluation(self.childGraph) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", line 1418, in serialEvaluation > child.configure() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1354, in configure > self.executeTest(self.configureLibrary) > File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line 138, in executeTest > ret = test(*args,**kargs) > File "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", line 199, in configureLibrary > config.package.Package.configureLibrary(self) > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1041, in configureLibrary > for location, directory, lib, incl in self.generateGuesses(): > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 609, 
in generateGuesses
>     d = self.checkDownload()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 743, in checkDownload
>     return self.getInstallDir()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 545, in getInstallDir
>     installDir = self.Install()
>   File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", line 1892, in Install
>     raise RuntimeError('Error running make; make install on '+self.PACKAGE)
> ================================================================================
> Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000
> ================================================================================
>
> Any help in debugging this is much appreciated. I can provide the whole configure.log file if needed.
> Thank you for your time,
> Marcos

From junming.duan.math at gmail.com  Fri Aug 9 15:02:06 2024
From: junming.duan.math at gmail.com (Junming Duan)
Date: Fri, 9 Aug 2024 22:02:06 +0200
Subject: [petsc-users] MatZeroRowsColumns eliminates incorrectly in parallel
In-Reply-To:
References:
Message-ID:

Dear all,

I tried to use MatZeroRowsColumns to eliminate Dirichlet boundary nodes. However, it does not eliminate correctly in parallel. Please see the attached code, which uses DMDA to create the matrix.

When I use one process, it works as expected. For two processes, the domain is split in the x direction, but the 10th row and 20th column are not eliminated as they are when using one process. The results for two processes are also attached. I have input the same rows to be eliminated on both processes. Thank you for any help.

#include <petscdm.h>
#include <petscdmda.h>

int main(int argc, char **argv)
{
  PetscInt      M = 5, N = 3, m = PETSC_DECIDE, n = PETSC_DECIDE, ncomp = 2;
  PetscInt      i, j;
  DMDALocalInfo daInfo;
  DM            da;
  Mat           A;
  Vec           x, b;
  MatStencil    row, col[5];
  PetscScalar   v[5];
  PetscInt      n_dirichlet_rows = 0, dirichlet_rows[2*(M+N)];

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
  PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DMDA_STENCIL_BOX, M, N, m, n, ncomp, 1, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));

  PetscCall(DMView(da, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(DMDAGetLocalInfo(da, &daInfo));
  PetscCall(DMSetMatrixPreallocateOnly(da, PETSC_TRUE));

  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(MatZeroEntries(A));
  PetscCall(VecZeroEntries(x));
  PetscCall(VecZeroEntries(b));

  for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
    for (i = daInfo.xs; i < daInfo.xs + daInfo.xm; ++i) {
      /* component 0 */
      row.j = j; row.i = i; row.c = 0;
      col[0].j = j;     col[0].i = i;     col[0].c = 0; v[0] = row.i + col[0].i + 2;
      col[1].j = j;     col[1].i = i - 1; col[1].c = 0; v[1] = row.i + col[1].i + 2;
      col[2].j = j;     col[2].i = i + 1; col[2].c = 0; v[2] = row.i + col[2].i + 2;
      col[3].j = j - 1; col[3].i = i;     col[3].c = 0; v[3] = row.i + col[2].i + 2;
      col[4].j = j + 1; col[4].i = i;     col[4].c = 0; v[4] = row.i + col[2].i + 2;
      PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));

      /* component 1 */
      row.j = j; row.i = i; row.c = 1;
      col[0].j = j;     col[0].i = i;     col[0].c = 1; v[0] = row.j + col[0].j + 2;
      col[1].j = j - 1; col[1].i = i;     col[1].c = 1; v[1] = row.j + col[1].j + 2;
      col[2].j = j + 1; col[2].i = i;     col[2].c = 1; v[2] = row.j + col[2].j + 2;
      col[3].j = j;     col[3].i = i - 1; col[3].c = 1; v[3] = row.j + col[1].j + 2;
      col[4].j = j;     col[4].i = i + 1; col[4].c = 1; v[4] = row.j + col[2].j + 2;
      PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
    }
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  MatView(A, 0);

  for (j = 0; j < daInfo.my; ++j) {
    dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
    dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
  }
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "n_dirichlet_rows: %d\n", n_dirichlet_rows));
  for (j = 0; j < n_dirichlet_rows; ++j) {
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "%d, ", dirichlet_rows[j]));
  }
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "\n"));
  PetscCall(MatZeroRowsColumns(A, n_dirichlet_rows, dirichlet_rows, 1, NULL, NULL));
  MatView(A, 0);

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

-------------------

n_dirichlet_rows: 6
0, 8, 10, 18, 20, 28,
Mat Object: 2 MPI processes
  type: mpiaij
row 0: (0, 1.) (2, 0.) (10, 0.)
row 1: (1, 2.) (3, 3.) (11, 3.)
row 2: (0, 0.) (2, 4.) (4, 5.) (12, 0.)
row 3: (1, 1.) (3, 4.) (5, 3.) (13, 3.)
row 4: (2, 5.) (4, 6.) (6, 0.) (14, 0.)
row 5: (3, 1.) (5, 6.) (7, 3.) (15, 3.)
row 6: (4, 0.) (6, 1.) (8, 0.) (16, 0.)
row 7: (5, 1.) (7, 8.) (9, 3.) (17, 3.)
row 8: (6, 0.) (8, 1.) (18, 0.)
row 9: (7, 1.) (9, 10.) (19, 3.)
row 10: (0, 0.) (10, 2.) (12, 0.) (20, 3.)
row 11: (1, 3.) (11, 2.) (13, 5.) (21, 5.)
row 12: (2, 0.) (10, 0.) (12, 1.) (14, 0.) (22, 0.)
row 13: (3, 3.) (11, 3.) (13, 4.) (15, 5.) (23, 5.)
row 14: (4, 0.) (12, 0.) (14, 1.) (16, 0.) (24, 0.)
row 15: (5, 3.) (13, 3.) (15, 6.) (17, 5.) (25, 5.)
row 16: (6, 0.) (14, 0.) (16, 8.) (18, 9.) (26, 9.)
row 17: (7, 3.) (15, 3.) (17, 8.) (19, 5.) (27, 5.)
row 18: (8, 0.) (16, 9.) (18, 10.) (28, 0.)
row 19: (9, 3.) (17, 3.) (19, 10.) (29, 5.)
row 20: (10, 3.) (20, 2.) (22, 3.)
row 21: (11, 5.) (21, 2.) (23, 7.)
row 22: (12, 0.) (20, 3.) (22, 4.) (24, 5.)
row 23: (13, 5.) (21, 5.) (23, 4.) (25, 7.)
row 24: (14, 0.) (22, 5.) (24, 6.) (26, 7.)
row 25: (15, 5.) (23, 5.) (25, 6.) (27, 7.)
row 26: (16, 9.) (24, 7.) (26, 8.) (28, 0.)
row 27: (17, 5.) (25, 5.) (27, 8.) (29, 7.)
row 28: (18, 0.) (26, 0.) (28, 1.)
row 29: (19, 5.) (27, 5.) (29, 10.)

Junming
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Fri Aug 9 15:25:09 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 9 Aug 2024 16:25:09 -0400
Subject: [petsc-users] MatZeroRowsColumns eliminates incorrectly in parallel
In-Reply-To:
References:
Message-ID:

   This is incorrect and will not work.

for (j = 0; j < daInfo.my; ++j) {
  dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
  dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
}

   You are assuming that the PETSc global numbering of the matrix rows/columns is the same as the natural ordering (on a 2d mesh) across the entire mesh. It is not; rather, all nodes are numbered on the first MPI process, followed by all those on the second, etc. Thus the mapping between the PETSc ordering and the natural ordering is cumbersome.

   But, no worries, MatZeroRowsColumnsStencil() allows you to indicate the rows/columns to zero using the same stencil information you use to fill the matrix, so you never need to worry about the mapping between the PETSc ordering and the global natural ordering.
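   A rough sketch of the stencil-based version for the code above (untested; it keeps your choice of zeroing component 0 at the i = 0 and i = mx-1 faces, and each process lists only the rows it owns):

MatStencil *bnd;
PetscInt    nb = 0;

PetscCall(PetscMalloc1(2 * daInfo.my, &bnd));
for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
  if (daInfo.xs == 0) { /* this process owns the i = 0 face */
    bnd[nb].i = 0; bnd[nb].j = j; bnd[nb].c = 0; nb++;
  }
  if (daInfo.xs + daInfo.xm == daInfo.mx) { /* this process owns the i = mx-1 face */
    bnd[nb].i = daInfo.mx - 1; bnd[nb].j = j; bnd[nb].c = 0; nb++;
  }
}
PetscCall(MatZeroRowsColumnsStencil(A, nb, bnd, 1.0, NULL, NULL));
PetscCall(PetscFree(bnd));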
   Barry

> On Aug 9, 2024, at 4:02 PM, Junming Duan wrote:
>
> Dear all,
>
> I tried to use MatZeroRowsColumns to eliminate Dirichlet boundary nodes. However, it does not eliminate correctly in parallel.
> Please see the attached code, which uses DMDA to create the matrix.
> When I use one process, it works as expected.
> For two processes, the domain is split in the x direction, but the 10th row and 20th column are not eliminated as they are when using one process. The results for two processes are also attached.
> I have input the same rows to be eliminated on both processes.
> Thank you for any help.
>
> #include <petscdm.h>
> #include <petscdmda.h>
>
> int main(int argc, char **argv)
> {
>   PetscInt      M = 5, N = 3, m = PETSC_DECIDE, n = PETSC_DECIDE, ncomp = 2;
>   PetscInt      i, j;
>   DMDALocalInfo daInfo;
>   DM            da;
>   Mat           A;
>   Vec           x, b;
>   MatStencil    row, col[5];
>   PetscScalar   v[5];
>   PetscInt      n_dirichlet_rows = 0, dirichlet_rows[2*(M+N)];
>
>   PetscFunctionBeginUser;
>   PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
>   PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DMDA_STENCIL_BOX, M, N, m, n, ncomp, 1, NULL, NULL, &da));
>   PetscCall(DMSetFromOptions(da));
>   PetscCall(DMSetUp(da));
>
>   PetscCall(DMView(da, PETSC_VIEWER_STDOUT_WORLD));
>
>   PetscCall(DMDAGetLocalInfo(da, &daInfo));
>   PetscCall(DMSetMatrixPreallocateOnly(da, PETSC_TRUE));
>
>   PetscCall(DMCreateMatrix(da, &A));
>   PetscCall(MatCreateVecs(A, &x, &b));
>   PetscCall(MatZeroEntries(A));
>   PetscCall(VecZeroEntries(x));
>   PetscCall(VecZeroEntries(b));
>
>   for (j = daInfo.ys; j < daInfo.ys + daInfo.ym; ++j) {
>     for (i = daInfo.xs; i < daInfo.xs + daInfo.xm; ++i) {
>       row.j = j; row.i = i; row.c = 0;
>       col[0].j = j;     col[0].i = i;     col[0].c = 0; v[0] = row.i + col[0].i + 2;
>       col[1].j = j;     col[1].i = i - 1; col[1].c = 0; v[1] = row.i + col[1].i + 2;
>       col[2].j = j;     col[2].i = i + 1; col[2].c = 0; v[2] = row.i + col[2].i + 2;
>       col[3].j = j - 1; col[3].i = i;     col[3].c = 0; v[3] = row.i + col[2].i + 2;
>       col[4].j = j + 1; col[4].i = i;     col[4].c = 0; v[4] = row.i + col[2].i + 2;
>       PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
>
>       row.j = j; row.i = i; row.c = 1;
>       col[0].j = j;     col[0].i = i;     col[0].c = 1; v[0] = row.j + col[0].j + 2;
>       col[1].j = j - 1; col[1].i = i;     col[1].c = 1; v[1] = row.j + col[1].j + 2;
>       col[2].j = j + 1; col[2].i = i;     col[2].c = 1; v[2] = row.j + col[2].j + 2;
>       col[3].j = j;     col[3].i = i - 1; col[3].c = 1; v[3] = row.j + col[1].j + 2;
>       col[4].j = j;     col[4].i = i + 1; col[4].c = 1; v[4] = row.j + col[2].j + 2;
>       PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, ADD_VALUES));
>     }
>   }
>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>   MatView(A, 0);
>
>   for (j = 0; j < daInfo.my; ++j) {
>     dirichlet_rows[n_dirichlet_rows++] = j * daInfo.mx * ncomp;
>     dirichlet_rows[n_dirichlet_rows++] = (j+1) * daInfo.mx * ncomp - ncomp;
>   }
>   PetscCall(PetscPrintf(PETSC_COMM_SELF, "n_dirichlet_rows: %d\n", n_dirichlet_rows));
>   for (j = 0; j < n_dirichlet_rows; ++j) {
>     PetscCall(PetscPrintf(PETSC_COMM_SELF, "%d, ", dirichlet_rows[j]));
>   }
>   PetscCall(PetscPrintf(PETSC_COMM_SELF, "\n"));
>   PetscCall(MatZeroRowsColumns(A, n_dirichlet_rows, dirichlet_rows, 1, NULL, NULL));
>   MatView(A, 0);
>
>   PetscCall(VecDestroy(&x));
>   PetscCall(VecDestroy(&b));
>   PetscCall(DMDestroy(&da));
>   PetscCall(PetscFinalize());
>   return 0;
> }
>
> -------------------
>
> n_dirichlet_rows: 6
> 0, 8, 10, 18, 20, 28,
> Mat Object: 2 MPI processes
>   type: mpiaij
> row 0: (0, 1.) (2, 0.) (10, 0.)
> row 1: (1, 2.) (3, 3.) (11, 3.)
> row 2: (0, 0.) (2, 4.) (4, 5.) (12, 0.)
> row 3: (1, 1.) (3, 4.) (5, 3.) (13, 3.)
> row 4: (2, 5.) (4, 6.) (6, 0.) (14, 0.)
> row 5: (3, 1.) (5, 6.) (7, 3.) (15, 3.)
> row 6: (4, 0.) (6, 1.) (8, 0.) (16, 0.)
> row 7: (5, 1.) (7, 8.) (9, 3.) (17, 3.)
> row 8: (6, 0.) (8, 1.) (18, 0.)
> row 9: (7, 1.) (9, 10.) (19, 3.)
> row 10: (0, 0.) (10, 2.) (12, 0.) (20, 3.)
> row 11: (1, 3.) (11, 2.) (13, 5.) (21, 5.)
> row 12: (2, 0.) (10, 0.) (12, 1.) (14, 0.) (22, 0.)
> row 13: (3, 3.) (11, 3.) (13, 4.) (15, 5.) (23, 5.)
> row 14: (4, 0.) (12, 0.) (14, 1.) (16, 0.) (24, 0.)
> row 15: (5, 3.) (13, 3.) (15, 6.) (17, 5.) (25, 5.)
> row 16: (6, 0.) (14, 0.) (16, 8.) (18, 9.) (26, 9.)
> row 17: (7, 3.) (15, 3.) (17, 8.) (19, 5.) (27, 5.)
> row 18: (8, 0.) (16, 9.) (18, 10.) (28, 0.)
> row 19: (9, 3.) (17, 3.) (19, 10.) (29, 5.)
> row 20: (10, 3.) (20, 2.) (22, 3.)
> row 21: (11, 5.) (21, 2.) (23, 7.)
> row 22: (12, 0.) (20, 3.) (22, 4.) (24, 5.)
> row 23: (13, 5.) (21, 5.) (23, 4.) (25, 7.)
> row 24: (14, 0.) (22, 5.) (24, 6.) (26, 7.)
> row 25: (15, 5.) (23, 5.) (25, 6.) (27, 7.)
> row 26: (16, 9.) (24, 7.) (26, 8.) (28, 0.)
> row 27: (17, 5.) (25, 5.) (27, 8.) (29, 7.)
> row 28: (18, 0.) (26, 0.) (28, 1.)
> row 29: (19, 5.) (27, 5.) (29, 10.)
>
> Junming
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Fri Aug 9 15:12:20 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 9 Aug 2024 16:12:20 -0400
Subject: [petsc-users] Issue configuring PETSc with HYPRE in Polaris
In-Reply-To:
References:
Message-ID:

As a start, please send configure.log

  Thanks,

     Matt

On Fri, Aug 9, 2024 at 1:17 PM Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:

> Hi All, I keep running into this issue when trying to configure PETSc downloading HYPRE in Polaris.
> My modules are:
>
> export MPICH_GPU_SUPPORT_ENABLED=1
> module use /soft/modulefiles
> module load spack-pe-base cmake
> module unload darshan
> module load cudatoolkit-standalone PrgEnv-gnu cray-libsci
>
> and my configure line is:
>
> $./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=1 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80
>
> What I see in the configure phase is:
> =============================================================================================
>                          Configuring PETSc to compile on your system
> =============================================================================================
> =============================================================================================
>                 Trying to download https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-sowing.git__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xSeFpvUg$ for SOWING
> =============================================================================================
> =============================================================================================
>                 Running configure on SOWING; this may take several minutes
> =============================================================================================
> =============================================================================================
>                 Running make on SOWING; this may take several minutes
> =============================================================================================
> =============================================================================================
>                 Running make install on SOWING; this may
take several > minutes > > ============================================================================================= > > ============================================================================================= > Running arch-polaris-dbg/bin/bfort to generate Fortran > stubs > > ============================================================================================= > > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/DrTimothyAldenDavis/SuiteSparse__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xQ43P-ld$ > > for SUITESPARSE > > ============================================================================================= > > ============================================================================================= > Configuring SUITESPARSE with CMake; this may take several > minutes > > ============================================================================================= > > ============================================================================================= > Compiling and installing SUITESPARSE; this may take several > minutes > > ============================================================================================= > > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/hypre-space/hypre__;!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xeK92H5d$ > > for HYPRE > > ============================================================================================= > > ============================================================================================= > Running configure on HYPRE; this may take several minutes > > ============================================================================================= > > ============================================================================================= > Running make on HYPRE; this may take several minutes > > ============================================================================================= > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > > ********************************************************************************************* > > the configure.log file ends with: > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > --------------------------------------------------------------------------------------------- > Error running make; make install on HYPRE > > ********************************************************************************************* > File "/home/mnv/Software/petsc/config/configure.py", line 462, in > petsc_configure > framework.configure(out = sys.stdout) > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1455, in configure > self.processChildren() > File "/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1443, in processChildren > self.serialEvaluation(self.childGraph) > File 
"/home/mnv/Software/petsc/config/BuildSystem/config/framework.py", > line 1418, in serialEvaluation > child.configure() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1354, in configure > self.executeTest(self.configureLibrary) > File "/home/mnv/Software/petsc/config/BuildSystem/config/base.py", line > 138, in executeTest > ret = test(*args,**kargs) > File > "/home/mnv/Software/petsc/config/BuildSystem/config/packages/hypre.py", > line 199, in configureLibrary > config.package.Package.configureLibrary(self) > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1041, in configureLibrary > for location, directory, lib, incl in self.generateGuesses(): > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 609, in generateGuesses > d = self.checkDownload() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 743, in checkDownload > return self.getInstallDir() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 545, in getInstallDir > installDir = self.Install() > File "/home/mnv/Software/petsc/config/BuildSystem/config/package.py", > line 1892, in Install > raise RuntimeError('Error running make; make install on '+self.PACKAGE) > > ================================================================================ > Finishing configure run at Fri, 09 Aug 2024 15:44:54 +0000 > > ================================================================================ > > Any help in debugging this is much appreciated. I can provide the whole > configure.log file if needed. > Thank you for your time, > Marcos > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bC2nF0niuYrmvBqOKJhC2c7ynXepezhMCen7e9RqnIO_bj8qEvum1TAPesC1XjzU0AEgkVpR4B20xdqmkbnf$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Tue Aug 13 16:17:25 2024 From: liufield at gmail.com (neil liu) Date: Tue, 13 Aug 2024 17:17:25 -0400 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. Message-ID: Dear Petsc developers, I am testing PCBDDC for my vector based FEM solver(complex system). It can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It seems ASM can work well with petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view while PCBDDC eats up the memory (61 GB) when I tried petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu The following errors with BDDC came out. The memory usage for PCBDDC (different from PCASM) is also listed (I am assuming the unit is Bytes, right?). *Although the BDDC requires more memory, it still seems normal, right? * [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Out of memory. 
This could be due to allocating [0]PETSC ERROR: too large an object or bleeding by not properly [0]PETSC ERROR: destroying unneeded objects. [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire process 16899194880 [0] Memory usage sorted by function .... *[0] 1 240 PCBDDCGraphCreate()* *[0] 1 3551136 PCBDDCGraphInit()* *[0] 2045 32720 PCBDDCGraphSetUp()* *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* *[0] 1 784 PCCreate()* *[0] 1 1216 PCCreate_BDDC()* .... Thanks for your help. Xiaodong -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Aug 13 16:47:07 2024 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 13 Aug 2024 23:47:07 +0200 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. In-Reply-To: References: Message-ID: can you run the same options and add "-ksp_view -pc_bddc_check_level 1" for the smaller case? Also, can you send the full stack trace of the out-of-memory error using a debug version of PETSc? A note aside: you should not need pc_bddc_use_vertices (which is on by default) Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha scritto: > Dear Petsc developers, > > I am testing PCBDDC for my vector based FEM solver(complex system). It can > work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). Then > I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It seems > ASM can work well with > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm > -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 > -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view > > while PCBDDC eats up the memory (61 GB) when I tried > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc > -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices > -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 > -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic > -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu > > The following errors with BDDC came out. The memory usage for PCBDDC > (different from PCASM) is also listed (I am assuming the unit is Bytes, > right?). *Although the BDDC requires more memory, it still seems normal, > right? * > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Out of memory. This could be due to allocating > [0]PETSC ERROR: too large an object or bleeding by not properly > [0]PETSC ERROR: destroying unneeded objects. > [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire > process 16899194880 > [0] Memory usage sorted by function > .... > *[0] 1 240 PCBDDCGraphCreate()* > *[0] 1 3551136 PCBDDCGraphInit()* > *[0] 2045 32720 PCBDDCGraphSetUp()* > *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* > *[0] 1 784 PCCreate()* > *[0] 1 1216 PCCreate_BDDC()* > .... > > Thanks for your help. > > Xiaodong > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Tue Aug 13 22:15:13 2024 From: liufield at gmail.com (neil liu) Date: Tue, 13 Aug 2024 23:15:13 -0400 Subject: [petsc-users] Question about the memory usage for BDDC preconditioner. In-Reply-To: References: Message-ID: Hi, Stefano, Please see the attached for the smaller case(successful with BDDC). and the Error_largerMesh shows the error with the large mesh using petsc debug mode. 
Thanks a lot, Xiaodong On Tue, Aug 13, 2024 at 5:47?PM Stefano Zampini wrote: > can you run the same options and add "-ksp_view -pc_bddc_check_level 1" > for the smaller case? Also, can you send the full stack trace of the > out-of-memory error using a debug version of PETSc? > A note aside: you should not need pc_bddc_use_vertices (which is on by > default) > > Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha > scritto: > >> Dear Petsc developers, >> >> I am testing PCBDDC for my vector based FEM solver(complex system). It >> can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). >> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). It >> seems ASM can work well with >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm >> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 >> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view >> >> while PCBDDC eats up the memory (61 GB) when I tried >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc >> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices >> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 >> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic >> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu >> >> The following errors with BDDC came out. The memory usage for PCBDDC >> (different from PCASM) is also listed (I am assuming the unit is Bytes, >> right?). *Although the BDDC requires more memory, it still seems normal, >> right? * >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Out of memory. This could be due to allocating >> [0]PETSC ERROR: too large an object or bleeding by not properly >> [0]PETSC ERROR: destroying unneeded objects. >> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire >> process 16899194880 >> [0] Memory usage sorted by function >> .... >> *[0] 1 240 PCBDDCGraphCreate()* >> *[0] 1 3551136 PCBDDCGraphInit()* >> *[0] 2045 32720 PCBDDCGraphSetUp()* >> *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* >> *[0] 1 784 PCCreate()* >> *[0] 1 1216 PCCreate_BDDC()* >> .... >> >> Thanks for your help. >> >> Xiaodong >> >> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Error_largerMesh Type: application/octet-stream Size: 6080 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Smaller_case Type: application/octet-stream Size: 323048 bytes Desc: not available URL: From balay at fastmail.org Wed Aug 14 09:21:52 2024 From: balay at fastmail.org (Satish Balay) Date: Wed, 14 Aug 2024 09:21:52 -0500 (CDT) Subject: [petsc-users] HDF5 (fwd) Message-ID: Please fix your 'contacts' to use 'petsc-users' and not 'petsc-users-bounces' Satish ---------- Forwarded message ---------- Date: Wed, 14 Aug 2024 07:09:31 +0000 From: Yang Yehua To: "petsc-users-bounces at mcs.anl.gov" Subject: HDF5 Dear all, I am trying to use HDF5 to save a DM object, but it is very slow. 
Here is the code I am using:

PetscViewer viewer;
PetscCall(PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_WRITE, &viewer));
PetscCall(PetscObjectSetName((PetscObject)dm, "plexA"));
PetscCall(DMView(dm, viewer));
PetscCall(PetscViewerDestroy(&viewer));

The DM object is a parallel mesh with 2 MPI processes. Here are the details:
* Type: plex
* Parallel Mesh in 3 dimensions:
* Number of 0-cells per rank: 1954, 1948
* Number of 1-cells per rank: 11794, 11749
* Number of 2-cells per rank: 18851, 18773
* Number of 3-cells per rank: 9010, 8971
Labels:
* Depth: 4 strata with value/size (0 (1954), 1 (11794), 2 (18851), 3 (9010))
* Celltype: 4 strata with value/size (0 (1954), 1 (11794), 3 (18851), 6 (9010))
* Cell Sets: 1 strata with value/size (2 (9010))
* av_section: 1 strata with value/size (0 (9010))
Field phi_grad:
* Adjacency FEM

Despite the small mesh size, using 16 ranks takes several minutes, while using 2 ranks takes 21 seconds. Any suggestions on how to improve the performance would be greatly appreciated.

Best regards,
Yehua
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 988337 bytes
Desc: configure.log
URL:

From stefano.zampini at gmail.com  Wed Aug 14 10:54:17 2024
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Wed, 14 Aug 2024 17:54:17 +0200
Subject: [petsc-users] Question about the memory usage for BDDC preconditioner.
In-Reply-To: References: Message-ID:

Ok, the problem is that the default algorithm for detecting the connected components of the interface finds a lot of disconnected dofs. What discretization is this? Nedelec elements? Can you try using -pc_bddc_use_local_mat_graph 0? Also, you are using -pc_bddc_monolithic, but that flag aggregates different fields, and you only have one field. Note that with Nedelec elements, you need a special change of basis for BDDC to work, see e.g. https://urldefense.us/v3/__https://www.osti.gov/servlets/purl/1377770__;!!G_uCfscf7eWS!Z94Qs8Q7RYEdhbAbvkaNorzlyoN4UH_ttW0EmR6d-NKweo4S35ELp-_Y60aJAAE1vzgZpof2VQYVxX9Xm1kM2vwiioZBFlo$

Il giorno mer 14 ago 2024 alle ore 05:15 neil liu ha scritto:

> Hi, Stefano,
>
> Please see the attached for the smaller case (successful with BDDC),
> and the Error_largerMesh file shows the error with the large mesh using a
> petsc debug build.
>
> Thanks a lot,
>
> Xiaodong
>
>
> On Tue, Aug 13, 2024 at 5:47 PM Stefano Zampini wrote:
>
>> can you run the same options and add "-ksp_view -pc_bddc_check_level 1"
>> for the smaller case? Also, can you send the full stack trace of the
>> out-of-memory error using a debug version of PETSc?
>> A note aside: you should not need pc_bddc_use_vertices (which is on by
>> default)
>>
>> Il giorno mar 13 ago 2024 alle ore 23:17 neil liu ha scritto:
>>
>>> Dear Petsc developers,
>>>
>>> I am testing PCBDDC for my vector-based FEM solver (complex system). It
>>> can work well on a coarse mesh (tetrahedra cell #: 6,108; dof #: 39,596).
>>> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof #: 206,362). It
>>> seems ASM can work well with
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm
>>> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4
>>> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view
>>>
>>> while PCBDDC eats up the memory (61 GB) when I tried
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc
>>> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices
>>> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8
>>> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic
>>> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu
>>>
>>> The following errors with BDDC came out. The memory usage for PCBDDC
>>> (different from PCASM) is also listed (I am assuming the unit is bytes,
>>> right?). Although the BDDC requires more memory, it still seems
>>> normal, right?
>>>
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>> [0]PETSC ERROR: destroying unneeded objects.
>>> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire
>>> process 16899194880
>>> [0] Memory usage sorted by function
>>> ....
>>> [0] 1 240 PCBDDCGraphCreate()
>>> [0] 1 3551136 PCBDDCGraphInit()
>>> [0] 2045 32720 PCBDDCGraphSetUp()
>>> [0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()
>>> [0] 1 784 PCCreate()
>>> [0] 1 1216 PCCreate_BDDC()
>>> ....
>>>
>>> Thanks for your help.
>>>
>>> Xiaodong
>>>
>>
>> --
>> Stefano
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From coltonbryant2021 at u.northwestern.edu  Wed Aug 14 17:36:41 2024
From: coltonbryant2021 at u.northwestern.edu (Colton Bryant)
Date: Wed, 14 Aug 2024 16:36:41 -0600
Subject: [petsc-users] Using DM_BOUNDARY_GHOSTED in DMSTAG
Message-ID:

Hello,

I'm trying to understand the use of DM_BOUNDARY_GHOSTED and am a little confused. Is there any way for the linear solver to access and manipulate the ghost point value during the solve? I currently have a code using DM_BOUNDARY_PERIODIC, and at the periodic boundary I simply apply the same discretization as I do everywhere else; as I understand it, the value at e.g. i=-1 is set automatically by the periodic boundary condition. I would like to use DM_BOUNDARY_GHOSTED to set my own condition by which the point at i=-1 is set (a Neumann-type condition). I have seen some matrix-free examples, but is there an easy way to "add" such a condition to the linear system in this case?

Thanks for any help you can provide.

Best,
Colton Bryant
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liufield at gmail.com  Wed Aug 14 21:08:44 2024
From: liufield at gmail.com (neil liu)
Date: Wed, 14 Aug 2024 22:08:44 -0400
Subject: [petsc-users] Question about the memory usage for BDDC preconditioner.
In-Reply-To: References: Message-ID:

Thanks a lot, Stefano. Yes, I am using 2nd-order Nedelec elements. -pc_bddc_use_local_mat_graph 0 makes the code run. I am testing with more CPU counts.
I am testing my code using, petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -pc_bddc_use_vertices -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 2000 -ksp_view -malloc_view -pc_bddc_use_local_mat_graph 0 -ksp_converged_reason -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -pc_bddc_dirichlet_pc_type gamg -pc_bddc_dirichlet_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_dirichlet_approximate The residual dropped to 6e-5 very fast and then continued to reduce very slowly. Do you have any suggestions to improve this ? Will it be necessary to change the basis for BDDC in order to accelerate the convergence ? In addition, I tried -pc_bddc_use_deluxe_scaling, but it showed some errors. It seems deluxe scaling obviously requires a much larger size (*Global size overflow 3051678564*) than my problem. Thanks, Xiaodong [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Overflow in integer operation: https://urldefense.us/v3/__https://petsc.org/release/faq/*64-bit-indices__;Iw!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvkLFZcexA$ [0]PETSC ERROR: Global size overflow 3051678564. You may consider ./configure PETSc with --with-64-bit-indices for the case you are running [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_coarse_redundant_pc_type value: svd source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_neumann_pc_gamg_esteig_ksp_max_it value: 10 source: command line [0]PETSC ERROR: Option left: name:-pc_bddc_neumann_pc_type value: gamg source: command line [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvlAB6PtAg$ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.21.1, unknown [0]PETSC ERROR: Configure options --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --download-fblaslapack --download-mpich --with-scalar-type=complex --download-triangle --with-debugging=no [0]PETSC ERROR: #1 PetscSplitOwnership() at /Documents/petsc-3.21.1/petsc/src/sys/utils/psplit.c:86 [0]PETSC ERROR: #2 PetscLayoutSetUp() at /Documents/petsc-3.21.1/petsc/src/vec/is/utils/pmap.c:244 [0]PETSC ERROR: #3 PetscLayoutCreateFromSizes() at /Documents/petsc-3.21.1/petsc/src/vec/is/utils/pmap.c:107 [0]PETSC ERROR: #4 ISGeneralSetIndices_General() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:569 [0]PETSC ERROR: #5 ISGeneralSetIndices() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:559 [0]PETSC ERROR: #6 ISCreateGeneral() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/impls/general/general.c:530 [0]PETSC ERROR: #7 ISRenumber() at /Documents/petsc-3.21.1/petsc/src/vec/is/is/interface/index.c:198 [0]PETSC ERROR: #8 PCBDDCSubSchursSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddcschurs.c:646 [0]PETSC ERROR: #9 PCBDDCSetUpSubSchurs() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddcprivate.c:9348 [0]PETSC ERROR: #10 PCSetUp_BDDC() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/impls/bddc/bddc.c:1564 [0]PETSC ERROR: #11 PCSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/pc/interface/precon.c:1079 [0]PETSC ERROR: #12 KSPSetUp() at /Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:415 [0]PETSC ERROR: #13 KSPSolve_Private() at Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:831 [0]PETSC ERROR: #14 KSPSolve() at /Documents/petsc-3.21.1/petsc/src/ksp/ksp/interface/itfunc.c:1078 On Wed, Aug 14, 2024 at 11:54?AM Stefano Zampini wrote: > Ok, the problem is that the default algorithm for detecting the connected > components of the interface finds a lot of disconnected dofs. > What discretization is this? Nedelec elements? Can you try using -pc_ > *bddc_use_lo*cal_mat_graph 0? > Also, you are using -pc_bddc_monolithic, but you only have one field. That > flag aggregates different fields, but you only have one. > Note that with Nedelec elements, you need a special change of basis for > BDDC to work, see e.g. https://urldefense.us/v3/__https://www.osti.gov/servlets/purl/1377770__;!!G_uCfscf7eWS!ZVyzxJb4s9N1kzsS2BV7raG-kJIn8X6skBNtfsvA8aHyjWPm8oYGfzk83j1n0PFstGE6nDCHpOIpMvlI_QH81A$ > > Il giorno mer 14 ago 2024 alle ore 05:15 neil liu ha > scritto: > >> Hi, Stefano, >> >> Please see the attached for the smaller case(successful with BDDC). >> and the Error_largerMesh shows the error with the large mesh using petsc >> debug mode. >> >> Thanks a lot, >> >> Xiaodong >> >> >> On Tue, Aug 13, 2024 at 5:47?PM Stefano Zampini < >> stefano.zampini at gmail.com> wrote: >> >>> can you run the same options and add "-ksp_view -pc_bddc_check_level 1" >>> for the smaller case? Also, can you send the full stack trace of the >>> out-of-memory error using a debug version of PETSc? >>> A note aside: you should not need pc_bddc_use_vertices (which is on by >>> default) >>> >>> Il giorno mar 13 ago 2024 alle ore 23:17 neil liu >>> ha scritto: >>> >>>> Dear Petsc developers, >>>> >>>> I am testing PCBDDC for my vector based FEM solver(complex system). It >>>> can work well on a coarse mesh(tetrahedra cell #: 6,108; dof # : 39,596). >>>> Then I tried a finer mesh (tetrahedra cell #: 32,036; dof # : 206,362). 
It >>>> seems ASM can work well with >>>> >>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type asm >>>> -ksp_converged_reason -ksp_monitor -ksp_gmres_restart 100 -ksp_rtol 1e-4 >>>> -pc_asm_overalp 4 -sub_pc_type ilu -malloc_view >>>> >>>> while PCBDDC eats up the memory (61 GB) when I tried >>>> >>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 4 ./app -pc_type bddc >>>> -pc_bddc_coarse_redundant_pc_type ilu -pc_bddc_use_vertices >>>> -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 >>>> -ksp_gmres_restart 30 -ksp_view -malloc_view -pc_bddc_monolithic >>>> -pc_bddc_neumann_pc_type ilu -pc_bddc_dirichlet_pc_type ilu >>>> >>>> The following errors with BDDC came out. The memory usage for PCBDDC >>>> (different from PCASM) is also listed (I am assuming the unit is Bytes, >>>> right?). *Although the BDDC requires more memory, it still seems >>>> normal, right? * >>>> >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: Out of memory. This could be due to allocating >>>> [0]PETSC ERROR: too large an object or bleeding by not properly >>>> [0]PETSC ERROR: destroying unneeded objects. >>>> [0] Maximum memory PetscMalloc()ed 30829727808 maximum size of entire >>>> process 16899194880 >>>> [0] Memory usage sorted by function >>>> .... >>>> *[0] 1 240 PCBDDCGraphCreate()* >>>> *[0] 1 3551136 PCBDDCGraphInit()* >>>> *[0] 2045 32720 PCBDDCGraphSetUp()* >>>> *[0] 2 8345696 PCBDDCSetLocalAdjacencyGraph_BDDC()* >>>> *[0] 1 784 PCCreate()* >>>> *[0] 1 1216 PCCreate_BDDC()* >>>> .... >>>> >>>> Thanks for your help. >>>> >>>> Xiaodong >>>> >>>> >>>> >>> >>> -- >>> Stefano >>> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 14 22:01:17 2024 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 14 Aug 2024 23:01:17 -0400 Subject: [petsc-users] Using DM_BOUNDARY_GHOSTED in DMSTAG In-Reply-To: References: Message-ID: The linear solvers (unless you write your own matrix-free beast, which I don't think you want to do) has no concept of the "ghost" locations and cannot access or manipulate them because the linear solver (and most standard preconditioners) work only with the global vector (not the local ghosted vector). This means you have one equation for each "point" on the DMStag or DMDA (for simplicity, assuming a scalar problem) and similarly one variable for each of those DM "points". Resulting in a square matrix. It sounds like you would like to have one equation for each point on the DMStag but variables on both the points on the DMStag and the extra ghost points, resulting in a rectangular matrix. The rectangular matrix is nice because it has the same regular stencil on all of its rows; no rows connected to the boundary are missing part of the stencil. We don't support linear solvers that can work with the PETSc parallel matrices that can directly work with this form. Since you know the values on the ghost points, you can eliminate them (in theory or practice) by updating the right-hand side for the points on the DM. This elimination results in a square matrix (which now has the annoying boundary rows), which can then be solved. This is the model that we work with for linear problems, having unknowns and equations only for variables in the global vector. 
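To make the elimination step concrete, here is a minimal toy sketch in plain C (the 1D Poisson setup, the f = 1 forcing, and the Neumann data g are all made up for illustration; this is not taken from any PETSc example):

/* Toy illustration: fold a known ghost value into the right-hand side.
 * Solve -u'' = f on N points with mesh size h. A Neumann condition
 * u'(left) = g is expressed through the ghost value u[-1] = u[0] - h*g,
 * so the i = 0 stencil
 *   -u[-1] + 2*u[0] - u[1] = h*h*f
 * becomes
 *    u[0] - u[1] = h*h*f - h*g,
 * i.e. the ghost column disappears from the matrix row and the known
 * piece h*g moves to the right-hand side; the system stays square. */
#include <stdio.h>

#define N 8

int main(void)
{
  double h = 1.0 / N;         /* mesh size (illustrative)            */
  double g = 2.0;             /* Neumann data at the left end        */
  double A[N][N] = {{0.0}};
  double b[N];

  for (int i = 0; i < N; i++) {          /* standard interior stencil */
    b[i] = h * h * 1.0;                  /* take f = 1 everywhere     */
    A[i][i] = 2.0;
    if (i > 0) A[i][i - 1] = -1.0;
    if (i < N - 1) A[i][i + 1] = -1.0;
  }

  /* Ghost elimination at i = 0: modify the row and its rhs entry.   */
  A[0][0] = 1.0;
  b[0] -= h * g;

  printf("row 0 after elimination: A00=%.1f A01=%.1f rhs=%.4f\n",
         A[0][0], A[0][1], b[0]);
  return 0;
}

The same bookkeeping carries over to DMStag or DMDA in 2D/3D: only the rows whose stencils touch a ghost location need their matrix row and right-hand side entry adjusted.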
Now, for nonlinear problems, it is a different story; here, using ghost points is very useful in evaluating f(x) (what becomes the right-hand side in Newton's method) and the Jacobian J(x). So both the function evaluation and the Jacobian evaluation start by scattering the global x into a local x (filling the ghost points between processes); the other ghost points (for boundary conditions) are filled in by us as appropriate, and then a local function evaluation fills in the local points of the global vector using the values in the local vectors. Many of our SNES examples use this style.

If you do want to use your scheme for a linear problem directly, you can do it. Do not have DM ghost boundary locations; instead, increase the size of the domain by one stencil width on each side, put the ghost boundary locations in the global solution vector, and make identity equations for each ghost boundary location in the linear system. Conceptually, you have your nice rectangular matrix embedded in a square matrix by just having the other rows be rows of the identity matrix, and you put the values of your ghost locations in the right-hand side for those equations.

There are a bunch of ways of thinking about these issues if you get into it,

Barry

> On Aug 14, 2024, at 6:36 PM, Colton Bryant wrote:
>
> Hello,
>
> I'm trying to understand the use of DM_BOUNDARY_GHOSTED and am a little confused. Is there any way for the linear solver to access and manipulate the ghost point value during the solve? I currently have a code using DM_BOUNDARY_PERIODIC and at the periodic boundary I simply apply the same discretization as I do everywhere else and as I understand it the value at e.g. i=-1 is set automatically by the periodic boundary condition. I would like to use DM_BOUNDARY_GHOSTED to set my own condition by which the point at i=-1 is set (a Neumann type condition). I have seen some matrix free examples but is there an easy way to "add" such a condition to the linear system in this case?
>
> Thanks for any help you can provide.
>
> Best,
> Colton Bryant

From liufield at gmail.com  Thu Aug 15 16:03:41 2024
From: liufield at gmail.com (neil liu)
Date: Thu, 15 Aug 2024 17:03:41 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
Message-ID:

Dear Petsc developers,

Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with

petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view

Then I used 2 cases for a strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML.

Case 1:
cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
2       234.7                                              3.12
4       126.6                        0.92                  1.62
8        84.97                       0.69                  1.26

However, for Case 2:
cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
2       584.5                                              8.61
4       376.8                        0.77                  6.56
8       459.6                        0.31                 66.47

For these 2 cases, I checked the time for PCSetUp as an example. It seems 8 cpus for case 2 used too much time on PCSetUp. Do you have any ideas about what is going on here?

Thanks,
Xiaodong
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stefano.zampini at gmail.com Sat Aug 17 08:23:22 2024 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sat, 17 Aug 2024 16:23:22 +0300 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening. Can you please share the equations you are solving so we can provide suggestions on the solver configuration? As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black box solvers Below are some comments: - You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) - Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance. - Why 5000 of restart for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors. Il giorno ven 16 ago 2024 alle ore 00:04 neil liu ha scritto: > Dear Petsc developers, > > Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, > > petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc > -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged > -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view > -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu > -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 > -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view > > Then I used 2 cases for strong scaling test. One case only involves real > numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd > case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to > PML. > > Case 1: > cpu # Time for 500 ksp steps (s) Parallel efficiency > PCsetup time(s) > 2 234.7 > 3.12 > 4 126.6 0.92 > 1.62 > 8 84.97 0.69 > 1.26 > However for Case 2, > cpu # Time for 500 ksp steps (s) Parallel efficiency > PCsetup time(s) > 2 584.5 > 8.61 > 4 376.8 0.77 > 6.56 > 8 459.6 0.31 > 66.47 > For these 2 cases, I checked the time for PCsetup as an example. It seems > 8 cpus for case 2 used too much time on PCsetup. > Do you have any ideas about what is going on here? > > Thanks, > Xiaodong > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sat Aug 17 11:35:28 2024 From: lzou at anl.gov (Zou, Ling) Date: Sat, 17 Aug 2024 16:35:28 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 Message-ID: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? 
output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 17 12:07:46 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 17 Aug 2024 13:07:46 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view? Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? > > Best, > > -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex5.c Type: application/octet-stream Size: 37704 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sat Aug 17 14:44:25 2024 From: lzou at anl.gov (Zou, Ling) Date: Sat, 17 Aug 2024 19:44:25 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Barry, thanks. I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. -Ling From: Barry Smith Date: Saturday, August 17, 2024 at 12:08?PM To: Zou, Ling Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. 
ZjQcmQRYFpfptBannerEnd I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users wrote: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From liufield at gmail.com Sat Aug 17 16:37:42 2024 From: liufield at gmail.com (neil liu) Date: Sat, 17 Aug 2024 17:37:42 -0400 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: Hi, Stefano, Please see the attached for the information with 4 and 8 CPUs for the complex matrix. I am solving Maxwell equations (Attahced) using 2nd-order Nedelec elements (two dofs each edge, and two dofs each face). The computational domain consists of different mediums, e.g., vacuum and substrate (different permitivity). The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix. Thanks a lot for your suggestions. I will try MUMPS. For now, I just want to fiddle with Petsc's built-in features to know more about it. Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. Thanks a lot. Have a good weekend. On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini wrote: > Please include the output of -log_view -ksp_view -ksp_monitor to > understand what's happening. > > Can you please share the equations you are solving so we can provide > suggestions on the solver configuration? > As I said, solving for Nedelec-type discretizations is challenging, and > not for off-the-shelf, black box solvers > > Below are some comments: > > > - You use a redundant SVD approach for the coarse solve, which can be > inefficient if your coarse space grows. You can use a parallel direct > solver like MUMPS (reconfigure with --download-mumps and use > -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) > - Why use ILU for the Dirichlet problem and GAMG for the Neumann > problem? With 8 processes and 300K total dofs, you will have around 40K > dofs per process, which is ok for a direct solver like MUMPS > (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). 
> With Nedelec dofs and the sparsity pattern they induce, I believe you can > push to 80K dofs per process with good performance. > - Why 5000 of restart for GMRES? It is highly inefficient to > re-orthogonalize such a large set of vectors. > > > Il giorno ven 16 ago 2024 alle ore 00:04 neil liu ha > scritto: > >> Dear Petsc developers, >> >> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, >> >> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc >> -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged >> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view >> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu >> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 >> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >> >> Then I used 2 cases for strong scaling test. One case only involves real >> numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd >> case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to >> PML. >> >> Case 1: >> cpu # Time for 500 ksp steps (s) Parallel efficiency >> PCsetup time(s) >> 2 234.7 >> 3.12 >> 4 126.6 0.92 >> 1.62 >> 8 84.97 0.69 >> 1.26 >> However for Case 2, >> cpu # Time for 500 ksp steps (s) Parallel efficiency >> PCsetup time(s) >> 2 584.5 >> 8.61 >> 4 376.8 0.77 >> 6.56 >> 8 459.6 0.31 >> 66.47 >> For these 2 cases, I checked the time for PCsetup as an example. It seems >> 8 cpus for case 2 used too much time on PCsetup. >> Do you have any ideas about what is going on here? >> >> Thanks, >> Xiaodong >> >> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView_KspView_KspMonitor_ComplexMatrix-4CPU Type: application/octet-stream Size: 58500 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView_KspView_KspMonitor_ComplexMatrix-8CPU Type: application/octet-stream Size: 59661 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Equations.pdf Type: application/pdf Size: 114430 bytes Desc: not available URL: From bsmith at petsc.dev Sun Aug 18 12:22:20 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Aug 2024 13:22:20 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, > On Aug 17, 2024, at 3:44?PM, Zou, Ling wrote: > > Barry, thanks. > I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. > -Ling > > From: Barry Smith > > Date: Saturday, August 17, 2024 at 12:08?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. 
However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. > > However I am not getting the problem you see, I am seeing, > > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: rcm > > > when I run with -pc_type ilu -snes_view > > Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. > > I am using the release branch of PETSc > > Barry > > > > > > > > > > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? > > Best, > > -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sun Aug 18 17:19:38 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 18 Aug 2024 22:19:38 +0000 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: Thank you, Barry. You must be right in this case. I am defaulting to ILU. I did an additional test to confirm, with ?-pc_type ilu? in the command line, it works fine. If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType? -Ling From: Barry Smith Date: Sunday, August 18, 2024 at 12:22?PM To: Zou, Ling Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, On Aug 17, 2024, at 3:44?PM, Zou, Ling wrote: Barry, thanks. I am accessing PETSc through MOOSE. 
I need to figure out if the versions are consistent and how to test it. -Ling From: Barry Smith > Date: Saturday, August 17, 2024 at 12:08?PM To: Zou, Ling > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: rcm when I run with -pc_type ilu -snes_view Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. I am using the release branch of PETSc Barry On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: Hi all, The following codes are how I used to setup PC mat ordering: // Setup KSP/PC (at this moment, user-input options and commandline options are available) SNESGetKSP(snes, &ksp); KSPSetFromOptions(ksp); PC pc; KSPGetPC(ksp, &pc); PCFactorSetMatOrderingType(pc, MATORDERINGRCM); // PCFactorSetLevels(pc, 5); SNESSetFromOptions(snes); After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: natural The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. Questions: * Is this a bug introduced in the new version, or * With the new version, I should call this function at a different time? Best, -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sun Aug 18 18:04:25 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 18 Aug 2024 23:04:25 +0000 Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance? Message-ID: Hi all, After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code having slightly-larger-than 10% of performance degradation (only in terms of execution time). I track the number of major function calls, they are identical between the two OS (so PETSc is not the one to blame), but just slower. Is this something expected, any one also experienced it? -Ling -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 18 18:31:11 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Aug 2024 19:31:11 -0400 Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 In-Reply-To: References: Message-ID: You can call PCSetType(pc,PCILU); KSPSetFromOptions(ksp); PCFactorSetMatOrderingType(pc,....); This reproduces the old behavior in PETSc. You can still pass -pc_type somethingelse at runtime to use a different PC. > On Aug 18, 2024, at 6:19?PM, Zou, Ling wrote: > > Thank you, Barry. You must be right in this case. I am defaulting to ILU. > I did an additional test to confirm, with ?-pc_type ilu? in the command line, it works fine. 
> > If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType? > > -Ling > > > From: Barry Smith > > Date: Sunday, August 18, 2024 at 12:22?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Are you using -pc_type ilu at the command line in your test? Or just letting it default to using ILU? > > This could explain the difference, the decision of what preconditioner to default to has moved until later in code, not when the PC is created or the matrix supplied but when it starts to build the preconditioner. Hence when you call PCFactorSetMatOrderingType() in the code the PC may not yet be set to ILU (or anything) hence the PCFactorSetMatOrderingType() is ignored, > > > > > On Aug 17, 2024, at 3:44?PM, Zou, Ling > wrote: > > Barry, thanks. > I am accessing PETSc through MOOSE. I need to figure out if the versions are consistent and how to test it. > -Ling > > From: Barry Smith > > Date: Saturday, August 17, 2024 at 12:08?PM > To: Zou, Ling > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21 > > I have attached src/snes/tutorials/ex5.?c in which I tried to reproduce your problem by inserting the code you've indicated. However I am not getting the problem you see, I am seeing, type: ilu out-of-place factorization 0 levels of fill tolerance > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > I have attached src/snes/tutorials/ex5.c in which I tried to reproduce your problem by inserting the code you've indicated. > > However I am not getting the problem you see, I am seeing, > > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: rcm > > > when I run with -pc_type ilu -snes_view > > Can you please confirm you get the same problem with the attached ex5.c ? You could send your code to see if I can reproduce the problem. > > I am using the release branch of PETSc > > Barry > > > > > > > > > > On Aug 17, 2024, at 12:35?PM, Zou, Ling via petsc-users > wrote: > > Hi all, > > The following codes are how I used to setup PC mat ordering: > > // Setup KSP/PC (at this moment, user-input options and commandline options are available) > SNESGetKSP(snes, &ksp); > KSPSetFromOptions(ksp); > PC pc; > KSPGetPC(ksp, &pc); > PCFactorSetMatOrderingType(pc, MATORDERINGRCM); > // PCFactorSetLevels(pc, 5); > SNESSetFromOptions(snes); > > After switching to PETSc 3.21, this no longer works, and can be confirmed from ?-snes_view? output: > > PC Object: 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > using diagonal shift to prevent zero pivot [NONZERO] > matrix ordering: natural > > The command line option still works, i.e., ?-pc_factor_mat_ordering_type rcm? gives me the correct behavior. > > Questions: > Is this a bug introduced in the new version, or > With the new version, I should call this function at a different time? 
> > Best,
> >
> > -Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefano.zampini at gmail.com Mon Aug 19 03:15:28 2024
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Mon, 19 Aug 2024 11:15:28 +0300
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

It seems you are using DMPLEX to handle the mesh, correct? If so, you should configure using --download-parmetis to get a better domain decomposition, since the default one just splits the cells into chunks as they are ordered. This results in a large number of primal dofs on average (191, from the output of ksp_view)

...
Primal dofs : 176 204 191
...

which slows down the solver setup.

Again, you should not use approximate local solvers with BDDC unless you know what you are doing. The theory for approximate solvers for BDDC is small and only for SPD problems. Looking at the output of log_view, the coarse problem setup (PCBDDCCSet) and the primal functions setup (PCBDDCCorr) cost 35 + 63 seconds, respectively. Also, the 500 applications of the GAMG preconditioner for the Neumann solver (PCBDDCNeuS) take 129 seconds out of the 400 seconds of total solve time.

PCBDDCTopo         1 1.0 3.1563e-01 1.0    1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01  0  0  1  0  2   0  0  1  0  2    19
PCBDDCLKSP         2 1.0 2.0423e+00 1.7    9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0  3378
PCBDDCLWor         1 1.0 3.9178e-02 13.4   0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCBDDCCorr         1 1.0 6.3981e+01 2.2    8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11  0  0  0  11 11  0  0  0  8900
PCBDDCCSet         1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01  2  0  1  0  3   2  0  1  0  3     0
PCBDDCCKSP         1 1.0 6.3266e-01 1.3    0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01  0  0  0  0  1   0  0  0  0  1     0
PCBDDCScal         1 1.0 6.8274e-03 1.3    1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00  0  0  0  0  0   0  0  0  0  0   894
PCBDDCDirS      1000 1.0 6.0420e+00 3.5    6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  2995
PCBDDCNeuS       500 1.0 1.2901e+02 2.1    8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12  0  0  0  22 12  0  0  0  4828
PCBDDCCoaS       500 1.0 5.8757e-01 1.8    1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02  0  0 17  0 28   0  0 17  0 31 14901

Finally, if I look at the residual history, I see a sharp decrease and a very long plateau. This indicates a bad coarse space; as I said before, there's no hope of finding a suitable coarse space without first changing the basis of the Nedelec elements, which is done automatically if you prescribe the discrete gradient operator (see the paper I have linked to in my previous communication).

On Sun, Aug 18, 2024 at 00:37 neil liu wrote:

> Hi, Stefano,
> Please see the attached for the information with 4 and 8 CPUs for the complex matrix.
> I am solving the Maxwell equations (attached) using 2nd-order Nedelec elements (two dofs on each edge, and two dofs on each face).
> The computational domain consists of different mediums, e.g., vacuum and substrate (different permittivity).
> The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix.
>
> Thanks a lot for your suggestions. I will try MUMPS.
> For now, I just want to fiddle with PETSc's built-in features to know more about it.
> Yes, 5000 is large. A smaller value, e.g., 30, converges very slowly.
>
> Thanks a lot.
>
> Have a good weekend.
>
> On Sat, Aug 17, 2024 at 9:23 AM Stefano Zampini wrote:
>
>> Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening.
>>
>> Can you please share the equations you are solving so we can provide suggestions on the solver configuration? As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black-box solvers.
>>
>> Below are some comments:
>>
>> - You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps).
>> - Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance.
>> - Why a restart of 5000 for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors.
>>
>> On Fri, Aug 16, 2024 at 00:04 neil liu wrote:
>>
>>> Dear Petsc developers,
>>>
>>> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with:
>>>
>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view
>>>
>>> Then I used 2 cases for a strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML.
>>>
>>> Case 1:
>>> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
>>> 2       234.7                                              3.12
>>> 4       126.6                        0.92                  1.62
>>> 8        84.97                       0.69                  1.26
>>>
>>> However, for Case 2:
>>> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCSetUp time (s)
>>> 2       584.5                                              8.61
>>> 4       376.8                        0.77                  6.56
>>> 8       459.6                        0.31                 66.47
>>>
>>> For these 2 cases, I checked the time for PCSetUp as an example. It seems 8 cpus for case 2 used too much time on PCSetUp.
>>> Do you have any ideas about what is going on here?
>>>
>>> Thanks,
>>> Xiaodong

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
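A note for readers following the Nedelec discussion: the discrete gradient Stefano refers to is attached to the preconditioner through PCBDDCSetDiscreteGradient(). A minimal sketch of the call (not tested here; G is the vertex-to-edge gradient matrix you must assemble for your Nedelec space, and the meaning of the trailing arguments -- polynomial order, field number, global ordering, conforming mesh -- should be checked against the manual page):

  Mat G;   /* discrete gradient: maps P1 vertex dofs to Nedelec edge dofs */
  PC  pc;

  KSPGetPC(ksp, &pc);
  /* assumed arguments: 2nd-order Nedelec, field 0, G given in global ordering, conforming mesh */
  PCBDDCSetDiscreteGradient(pc, G, 2, 0, PETSC_TRUE, PETSC_TRUE);

The call has to happen before the preconditioner is set up, so that PCBDDC can build the change of basis from G.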
From lzou at anl.gov Mon Aug 19 08:25:47 2024
From: lzou at anl.gov (Zou, Ling)
Date: Mon, 19 Aug 2024 13:25:47 +0000
Subject: [petsc-users] PCFactorSetMatOrderingType not working with 3.21
In-Reply-To:
References:
Message-ID:

That's nice. Thank you!

-Ling

From: Barry Smith
Date: Sunday, August 18, 2024 at 6:31 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] PCFactorSetMatOrderingType not working with 3.21

You can call

  PCSetType(pc, PCILU);
  KSPSetFromOptions(ksp);
  PCFactorSetMatOrderingType(pc, ....);

This reproduces the old behavior in PETSc. You can still pass -pc_type somethingelse at runtime to use a different PC.

On Aug 18, 2024, at 6:19 PM, Zou, Ling wrote:

Thank you, Barry. You must be right in this case. I am defaulting to ILU. I did an additional test to confirm: with "-pc_type ilu" in the command line, it works fine.

If I am defaulting to ILU, when should I call PCFactorSetMatOrderingType?

-Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
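Pulling Barry's answer together with the original snippet, the corrected call order would look roughly like this (a sketch only, in the unchecked-call style of the snippets above; it assumes snes already exists):

  SNESGetKSP(snes, &ksp);
  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCILU);           /* pin the type so the factor option below has a target */
  KSPSetFromOptions(ksp);         /* -pc_type <other> on the command line still wins */
  PCFactorSetMatOrderingType(pc, MATORDERINGRCM);
  SNESSetFromOptions(snes);

If -pc_type overrides ILU at runtime, the ordering call is simply ignored for the new type, which matches the old behavior Barry describes.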
From mail2amneet at gmail.com Mon Aug 19 19:23:53 2024
From: mail2amneet at gmail.com (Amneet Bhalla)
Date: Mon, 19 Aug 2024 17:23:53 -0700
Subject: [petsc-users] Configure issues with scalapack
Message-ID:

Hi Folks,

I am trying to build PETSc with MUMPS, which requires building/downloading scalapack. I used the following configure command to do this:

./configure --PETSC_ARCH=linux-opt --with-debugging=0 --download-hypre=1 --with-x=0 -download-mumps -download-scalapack -download-parmetis -download-metis -download-ptscotch --COPTFLAGS="-O3" --CXXOPTFLAGS="-O3" --FOPTFLAGS="-O3" --with-mpi-dir=/opt/intel/oneapi/mpi/latest

For some reason PETSc configure gets stuck at configuring SCALAPACK -- it's been more than 1 hour at this point:

=============================================================================================
              Configuring SCALAPACK with cmake; this may take several minutes
=============================================================================================

Any idea what might be going on?

Thanks,
-- 
--Amneet
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at petsc.dev Mon Aug 19 21:16:23 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 19 Aug 2024 22:16:23 -0400
Subject: [petsc-users] Configure issues with scalapack
In-Reply-To:
References:
Message-ID: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>

You need to send configure.log to petsc-maint at mcs.anl.gov so we can potentially locate the problem.

> On Aug 19, 2024, at 8:23 PM, Amneet Bhalla wrote:
>
> Hi Folks,
> I am trying to build PETSc with MUMPS, which requires building/downloading scalapack.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From junchao.zhang at gmail.com Mon Aug 19 22:03:51 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 19 Aug 2024 22:03:51 -0500
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Do you have a -log_view report so that we can know which petsc functions degraded? Or is it because the compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <petsc-users at mcs.anl.gov> wrote:

> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code run with slightly more than 10% performance degradation (only in terms of execution time).
>
> I track the number of major function calls; they are identical between the two OS versions (so PETSc is not the one to blame), just slower.
>
> Is this something expected? Has anyone else experienced it?
>
> -Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
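Junchao's -log_view suggestion can be narrowed further by registering a user event around any suspect region of the code, so the two OS installs can be compared function by function. A minimal sketch (the event and class names are made up for illustration; the pattern follows the PETSc profiling documentation):

  PetscLogEvent USER_EVENT;
  PetscClassId  classid;

  PetscClassIdRegister("UserClass", &classid);
  PetscLogEventRegister("UserRegion", classid, &USER_EVENT);

  PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);
  /* ... the code whose runtime changed between OS versions ... */
  PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);

With this in place, -log_view reports a separate "UserRegion" line whose time can be compared across the two runs.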
From balay.anl at fastmail.org Mon Aug 19 23:23:16 2024
From: balay.anl at fastmail.org (Satish Balay)
Date: Mon, 19 Aug 2024 23:23:16 -0500 (CDT)
Subject: [petsc-users] Configure issues with scalapack
In-Reply-To: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>
References: <75A3A976-12D4-4787-89F7-5A80CD69E8A2@petsc.dev>
Message-ID: <055212a0-40f8-6a7f-9961-ab942cf79f71@fastmail.org>

I would:

- use 'top' to see where the hang is
- retry the build and see if the hang persists
- tweak compiler options [change from -O3 to -O2 or such, or use the latest cmake] and see if that makes a difference.

Also note - the instructions for using Intel OneAPI MPI: https://urldefense.us/v3/__https://petsc.org/release/install/install/*mpi__;Iw!!G_uCfscf7eWS!fzmxXJAoz6sq09cMkWlxlC7lxKlGq7s8d1lmsvonkVcTttVqgkZBiY7idwr7nk6a4uOcMnl9J2WlJoXCnlQxhllRSCE$

Satish

On Mon, 19 Aug 2024, Barry Smith wrote:

> You need to send configure.log to petsc-maint at mcs.anl.gov so we can potentially locate the problem.

From liufield at gmail.com Tue Aug 20 12:01:22 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 13:01:22 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Thanks a lot for your explanation, Stefano. Very helpful.

Yes, I am using dmplex to read a tetrahedral mesh from gmsh. With parmetis, the scaling performance is improved a lot. I will read your paper about how to change the basis for the Nedelec elements.

cpu #   time for 500 ksp steps (s)   parallel efficiency
2       546
4       224                          120%
8       170                           80%

These results are much better than the previous attempt. Then I checked the time spent in several PETSc built-in functions by the ksp solver.

Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
VecMDot     78.32           43.28           30.47
VecMAXPY    92.95           48.37           30.798
MatMult     246.08          126.63          82.94

It seems that from 4 cpus to 8 cpus, the scaling is not as good as from 2 cpus to 4 cpus. Am I missing something?

Thanks a lot,

Xiaodong

On Mon, Aug 19, 2024 at 4:15 AM Stefano Zampini wrote:

> It seems you are using DMPLEX to handle the mesh, correct?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From knepley at gmail.com Tue Aug 20 12:16:22 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 13:16:22 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 1:10 PM neil liu wrote:

> Thanks a lot for your explanation, Stefano. Very helpful.
>
> Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
> VecMDot     78.32           43.28           30.47
> VecMAXPY    92.95           48.37           30.798
> MatMult     246.08          126.63          82.94
>
> It seems that from 4 cpus to 8 cpus, the scaling is not as good as from 2 cpus to 4 cpus. Am I missing something?

Did you normalize by the number of calls?

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cv8zEG85Ua5eMw-4Fw6dtM6pp_fpFPiPqLUZHoZeqOqx846JROreXMlDUQnUBwxMFMeAWi4j-2LV_1iMA-xF$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From liufield at gmail.com Tue Aug 20 12:35:50 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 13:35:50 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Hi, Matt,

I think the time listed here represents the maximum total time across the different processors.

Thanks a lot.

            -------- 2 cpus ---------   -------- 4 cpus ---------   -------- 8 cpus ---------
Event       Count      Time (sec)       Count      Time (sec)       Count      Time (sec)
            Max  Ratio Max        Ratio Max  Ratio Max        Ratio Max  Ratio Max        Ratio
VecMDot      530 1.0   7.8320e+01 1.0    530 1.0   4.3285e+01 1.1    530 1.0   3.0476e+01 1.1
VecMAXPY     534 1.0   9.2954e+01 1.0    534 1.0   4.8378e+01 1.1    534 1.0   3.0798e+01 1.1
MatMult     8055 1.0   2.4608e+02 1.0   8103 1.0   1.2663e+02 1.0   8367 1.0   8.2942e+01 1.1

On Tue, Aug 20, 2024 at 1:16 PM Matthew Knepley wrote:

> Did you normalize by the number of calls?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
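Normalizing the table above by the call counts (a quick check using only the numbers as printed) makes the question concrete. For MatMult:

  246.08 s / 8055 calls ≈ 30.5 ms per call   (2 cpus)
  126.63 s / 8103 calls ≈ 15.6 ms per call   (4 cpus)
   82.94 s / 8367 calls ≈  9.9 ms per call   (8 cpus)

so the per-call speedup is roughly 1.95 from 2 to 4 processes but only about 1.6 from 4 to 8, consistent with the 1.6 figure in the reply below.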
From knepley at gmail.com Tue Aug 20 12:45:36 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 13:45:36 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 1:36 PM neil liu wrote:

> Hi, Matt,
> I think the time listed here represents the maximum total time across the different processors.

For the number of calls listed:

1) The number of MatMults goes up, so you should normalize for that, but you still have about 1.6 speedup. However, this is all multiplications. Are we sure they have the same size and sparsity?

2) MAXPY is also 1.6.

3) MDot probably does not see the latency of one node, so again it is not speeding up as you might want.

This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node.

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eQty5R8qGgZBZNodHW90OVmUU1tsyjzmP4NkXVvtCk8QMzIM2XIAQEx4RrA_F814zU_1P_RsayqlJ7GNAhca$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
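For readers who have not used it: `make streams` in a PETSc build runs a STREAMS-style benchmark that measures the sustainable memory bandwidth of a node at increasing process counts, which is exactly what bandwidth-bound kernels such as MatMult saturate. A stand-alone sketch of the same idea (illustration only, not the PETSc benchmark itself; the array size and the 3-doubles-per-iteration accounting follow the usual STREAM triad conventions):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
    const size_t n = 50000000; /* large enough to spill all caches */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    double *c = malloc(n * sizeof(double));
    struct timespec t0, t1;

    for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) a[i] = b[i] + 3.0 * c[i]; /* STREAM triad */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    /* the triad moves 3 doubles per index: two loads and one store */
    printf("triad bandwidth ~ %.1f GB/s\n", 3.0 * n * sizeof(double) / sec / 1e9);
    free(a); free(b); free(c);
    return 0;
  }

When concurrent copies of such a kernel stop adding aggregate bandwidth, additional MPI ranks on that node stop helping MatMult, which is the effect described above.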
From liufield at gmail.com Tue Aug 20 13:31:21 2024
From: liufield at gmail.com (neil liu)
Date: Tue, 20 Aug 2024 14:31:21 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

Thanks a lot for this explanation, Matt. I will explore whether the matrix has the same size and sparsity.

On Tue, Aug 20, 2024 at 1:45 PM Matthew Knepley wrote:

> This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
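One quick way to make that check from the code (a sketch; A stands for the assembled operator in each run, and error checking is omitted as in the snippets above):

  Mat      A;
  MatInfo  info;
  PetscInt m, n;

  MatGetSize(A, &m, &n);
  MatGetInfo(A, MAT_GLOBAL_SUM, &info);
  PetscPrintf(PETSC_COMM_WORLD, "global size %" PetscInt_FMT " x %" PetscInt_FMT ", nonzeros used %g\n",
              m, n, (double)info.nz_used);

If the printed global size and nonzero count agree across the 2-, 4-, and 8-process runs, the MatMult work is the same and the remaining gap points back at memory bandwidth.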
From knepley at gmail.com Tue Aug 20 16:53:20 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 20 Aug 2024 17:53:20 -0400
Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
In-Reply-To:
References:
Message-ID:

On Tue, Aug 20, 2024 at 2:31 PM neil liu wrote:

> Thanks a lot for this explanation, Matt. I will explore whether the matrix has the same size and sparsity.

I think it is much more likely that you just exhausted bandwidth on the node.

  Thanks,

     Matt
>>
>>> On Tue, Aug 20, 2024 at 1:16 PM Matthew Knepley wrote:
>>>
>>>> On Tue, Aug 20, 2024 at 1:10 PM neil liu wrote:
>>>>
>>>>> Thanks a lot for your explanation, Stefano. Very helpful.
>>>>> Yes, I am using DMPlex to read a tetrahedral mesh from gmsh. With
>>>>> parmetis, the scaling performance is improved a lot.
>>>>> I will read your paper about how to change the basis for Nedelec
>>>>> elements.
>>>>>
>>>>>   cpu #   time for 500 KSP steps (s)   parallel efficiency
>>>>>   2       546                          --
>>>>>   4       224                          120%
>>>>>   8       170                          80%
>>>>>
>>>>> These results are much better than the previous attempt. Then I checked
>>>>> the time spent in several PETSc built-in functions during the KSP solve.
>>>>>
>>>>>   Functions   time (2 cpus)   time (4 cpus)   time (8 cpus)
>>>>>   VecMDot     78.32           43.28           30.47
>>>>>   VecMAXPY    92.95           48.37           30.798
>>>>>   MatMult     246.08          126.63          82.94
>>>>>
>>>>> It seems that from 4 cpus to 8 cpus the scaling is not as good as from
>>>>> 2 cpus to 4 cpus.
>>>>> Am I missing something?
>>>>
>>>> Did you normalize by the number of calls?
>>>>
>>>> Thanks,
>>>>
>>>>    Matt
>>>>
>>>>> Thanks a lot,
>>>>>
>>>>> Xiaodong
>>>>>
>>>>>
>>>>> On Mon, Aug 19, 2024 at 4:15 AM Stefano Zampini <
>>>>> stefano.zampini at gmail.com> wrote:
>>>>>
>>>>>> It seems you are using DMPLEX to handle the mesh, correct?
>>>>>> If so, you should configure using --download-parmetis to get a better
>>>>>> domain decomposition, since the default partitioner just splits the
>>>>>> cells into chunks in the order they appear.
>>>>>> That results in a large number of primal dofs on average (191, from
>>>>>> the output of ksp_view)
>>>>>> ...
>>>>>> Primal  dofs   : 176 204 191
>>>>>> ...
>>>>>> which slows down the solver setup.
>>>>>>
>>>>>> Again, you should not use approximate local solvers with BDDC unless
>>>>>> you know what you are doing.
>>>>>> The theory for approximate solvers for BDDC is limited, and covers
>>>>>> only SPD problems.
>>>>>> Looking at the output of log_view, the coarse problem setup (PCBDDCCSet)
>>>>>> and the primal functions setup (PCBDDCCorr) cost 35 and 63 seconds,
>>>>>> respectively.
>>>>>> Also, the 500 applications of the GAMG preconditioner for the Neumann
>>>>>> solver (PCBDDCNeuS) take 129 seconds out of the 400 seconds of total
>>>>>> solve time.
>>>>>>
>>>>>> PCBDDCTopo      1 1.0 3.1563e-01    1.0 1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01  0  0  1  0  2   0  0  1  0  2    19
>>>>>> PCBDDCLKSP      2 1.0 2.0423e+00    1.7 9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0  3378
>>>>>> PCBDDCLWor      1 1.0 3.9178e-02   13.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> PCBDDCCorr      1 1.0 6.3981e+01    2.2 8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11  0  0  0  11 11  0  0  0  8900
>>>>>> PCBDDCCSet      1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01  2  0  1  0  3   2  0  1  0  3     0
>>>>>> PCBDDCCKSP      1 1.0 6.3266e-01    1.3 0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01  0  0  0  0  1   0  0  0  0  1     0
>>>>>> PCBDDCScal      1 1.0 6.8274e-03    1.3 1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00  0  0  0  0  0   0  0  0  0  0   894
>>>>>> PCBDDCDirS   1000 1.0 6.0420e+00    3.5 6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  2995
>>>>>> PCBDDCNeuS    500 1.0 1.2901e+02    2.1 8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12  0  0  0  22 12  0  0  0  4828
>>>>>> PCBDDCCoaS    500 1.0 5.8757e-01    1.8 1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02  0  0 17  0 28   0  0 17  0 31 14901
>>>>>>
>>>>>> Finally, if I look at the residual history, I see a sharp decrease
>>>>>> and a very long plateau.
This indicates a bad coarse space; as I said >>>>>> before, there's no hope of finding a suitable coarse space without first >>>>>> changing the basis of the Nedelec elements, which is done automatically if >>>>>> you prescribe the discrete gradient operator (see the paper I have linked >>>>>> to in my previous communication). >>>>>> >>>>>> >>>>>> >>>>>> Il giorno dom 18 ago 2024 alle ore 00:37 neil liu >>>>>> ha scritto: >>>>>> >>>>>>> Hi, Stefano, >>>>>>> Please see the attached for the information with 4 and 8 CPUs for >>>>>>> the complex matrix. >>>>>>> I am solving Maxwell equations (Attahced) using 2nd-order Nedelec >>>>>>> elements (two dofs each edge, and two dofs each face). >>>>>>> The computational domain consists of different mediums, e.g., >>>>>>> vacuum and substrate (different permitivity). >>>>>>> The PML is used to truncate the computational domain, absorbing the >>>>>>> outgoing wave and introducing complex numbers for the matrix. >>>>>>> >>>>>>> Thanks a lot for your suggestions. I will try MUMPS. >>>>>>> For now, I just want to fiddle with Petsc's built-in features to >>>>>>> know more about it. >>>>>>> Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. >>>>>>> >>>>>>> Thanks a lot. >>>>>>> >>>>>>> Have a good weekend. >>>>>>> >>>>>>> >>>>>>> On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini < >>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>> >>>>>>>> Please include the output of -log_view -ksp_view -ksp_monitor to >>>>>>>> understand what's happening. >>>>>>>> >>>>>>>> Can you please share the equations you are solving so we can >>>>>>>> provide suggestions on the solver configuration? >>>>>>>> As I said, solving for Nedelec-type discretizations is challenging, >>>>>>>> and not for off-the-shelf, black box solvers >>>>>>>> >>>>>>>> Below are some comments: >>>>>>>> >>>>>>>> >>>>>>>> - You use a redundant SVD approach for the coarse solve, which >>>>>>>> can be inefficient if your coarse space grows. You can use a parallel >>>>>>>> direct solver like MUMPS (reconfigure with --download-mumps and use >>>>>>>> -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) >>>>>>>> - Why use ILU for the Dirichlet problem and GAMG for the >>>>>>>> Neumann problem? With 8 processes and 300K total dofs, you will have around >>>>>>>> 40K dofs per process, which is ok for a direct solver like MUMPS >>>>>>>> (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). >>>>>>>> With Nedelec dofs and the sparsity pattern they induce, I believe you can >>>>>>>> push to 80K dofs per process with good performance. >>>>>>>> - Why 5000 of restart for GMRES? It is highly inefficient to >>>>>>>> re-orthogonalize such a large set of vectors. >>>>>>>> >>>>>>>> >>>>>>>> Il giorno ven 16 ago 2024 alle ore 00:04 neil liu < >>>>>>>> liufield at gmail.com> ha scritto: >>>>>>>> >>>>>>>>> Dear Petsc developers, >>>>>>>>> >>>>>>>>> Thanks for your previous help. 
Now, the PCBDDC can converge to >>>>>>>>> 1e-8 with, >>>>>>>>> >>>>>>>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type >>>>>>>>> bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged >>>>>>>>> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view >>>>>>>>> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu >>>>>>>>> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 >>>>>>>>> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >>>>>>>>> >>>>>>>>> Then I used 2 cases for strong scaling test. One case only >>>>>>>>> involves real numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and >>>>>>>>> rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: >>>>>>>>> 611,432) due to PML. >>>>>>>>> >>>>>>>>> Case 1: >>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel >>>>>>>>> efficiency PCsetup time(s) >>>>>>>>> 2 234.7 >>>>>>>>> 3.12 >>>>>>>>> 4 126.6 >>>>>>>>> 0.92 1.62 >>>>>>>>> 8 84.97 >>>>>>>>> 0.69 1.26 >>>>>>>>> However for Case 2, >>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel >>>>>>>>> efficiency PCsetup time(s) >>>>>>>>> 2 584.5 >>>>>>>>> 8.61 >>>>>>>>> 4 376.8 >>>>>>>>> 0.77 6.56 >>>>>>>>> 8 459.6 >>>>>>>>> 0.31 66.47 >>>>>>>>> For these 2 cases, I checked the time for PCsetup as an example. >>>>>>>>> It seems 8 cpus for case 2 used too much time on PCsetup. >>>>>>>>> Do you have any ideas about what is going on here? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Xiaodong >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Stefano >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Stefano >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!c1-7PTlMFjRSGEtUBfqX0W9JQed5UTJTHCsmwhm4whuZoTMIll340dHxiKyGvIedaFLp4VcuBIrnBKMFP6GD$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 20 20:44:05 2024 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 20 Aug 2024 21:44:05 -0400 Subject: [petsc-users] Strong scaling concerns for PCBDDC with Vector FEM In-Reply-To: References: Message-ID: <683BA4D7-A421-4610-8D1F-3EE5A53C7B5A@petsc.dev> See the detailed discussion at https://urldefense.us/v3/__https://petsc.org/main/manual/streams/__;!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AG667XDu4$ > On Aug 20, 2024, at 5:53?PM, Matthew Knepley wrote: > > On Tue, Aug 20, 2024 at 2:31?PM neil liu > wrote: >> Thanks a lot for this explanation, Matt. 
I will explore whether the matrix has the same size and spaisity. > > I think it is much more likely that you just exhausted bandwidth on the node. > > Thanks, > > Matt > >> On Tue, Aug 20, 2024 at 1:45?PM Matthew Knepley > wrote: >>> On Tue, Aug 20, 2024 at 1:36?PM neil liu > wrote: >>>> Hi, Matt, >>>> I think the time listed here represents the maximum total time across different processors. >>>> >>>> Thanks a lot. >>>> 2 cpus 4 cpus 8 cpus >>>> Event Count Time (sec) Count Time (sec) Count Time (sec) >>>> Max Ratio Max Ratio Max Ratio Max Ratio Max Ratio Max Ratio >>>> VecMDot 530 1.0 7.8320e+01 1.0 530 1.0 4.3285e+01 1.1 530 1.0 3.0476e+01 1.1 >>>> VecMAXPY 534 1.0 9.2954e+01 1.0 534 1.0 4.8378e+01 1.1 534 1.0 3.0798e+01 1.1 >>>> MatMult 8055 1.0 2.4608e+02 1.0 8103 1.0 1.2663e+02 1.0 8367 1.0 8.2942e+01 1.1 >>> >>> For the number of calls listed. >>> >>> 1) The number of MatMults goes up, so you should normalize for that, but you still have about 1.6 speedup. However, this is >>> all multiplications. Are we sure they have the same size and sparsity? >>> >>> 2) MAXPY is also 1.6 >>> >>> 3) MDot probably does not see the latency of one node, so again it is not speeding up as you might want. >>> >>> This looks like you are using a single node with 2, 4, and 8 procs. The memory bandwidth is exhausted sometime before 8 procs >>> (maybe 6), so you cease to see speedup. You can check this by running `make streams` on the node. >>> >>> Thanks, >>> >>> Matt >>> >>>> On Tue, Aug 20, 2024 at 1:16?PM Matthew Knepley > wrote: >>>>> On Tue, Aug 20, 2024 at 1:10?PM neil liu > wrote: >>>>>> Thanks a lot for your explanation, Stefano. Very helpful. >>>>>> Yes. I am using dmplex to read a tetrahdra mesh from gmsh. With parmetis, the scaling performance is improved a lot. >>>>>> I will read your paper about how to change the basis for Nedelec elements. >>>>>> >>>>>> cpu # time for 500 ksp steps (s) parallel efficiency >>>>>> 2 546 >>>>>> 4 224 120% >>>>>> 8 170 80% >>>>>> This results are much better than previous attempt. Then I checked the time spent by several Petsc built-in functions for the ksp solver. >>>>>> >>>>>> Functions time(2 cpus) time(4 cpus) time(8 cpus) >>>>>> VecMDot 78.32 43.28 30.47 >>>>>> VecMAXPY 92.95 48.37 30.798 >>>>>> MatMult 246.08 126.63 82.94 >>>>>> >>>>>> It seems from cpu 4 to cpu 8, the scaling is not as good as from cpu 2 to cpu 4. >>>>>> Am I missing something? >>>>> >>>>> Did you normalize by the number of calls? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thanks a lot, >>>>>> >>>>>> Xiaodong >>>>>> >>>>>> >>>>>> On Mon, Aug 19, 2024 at 4:15?AM Stefano Zampini > wrote: >>>>>>> It seems you are using DMPLEX to handle the mesh, correct? >>>>>>> If so, you should configure using --download-parmetis to have a better domain decomposition since the default one just splits the cells in chunks as they are ordered. >>>>>>> This results in a large number of primal dofs on average (191, from the output of ksp_view) >>>>>>> ... >>>>>>> Primal dofs : 176 204 191 >>>>>>> ... >>>>>>> that slows down the solver setup. >>>>>>> >>>>>>> Again, you should not use approximate local solvers with BDDC unless you know what you are doing. >>>>>>> The theory for approximate solvers for BDDC is small and only for SPD problems. >>>>>>> Looking at the output of log_view, coarse problem setup (PCBDDCCSet), and primal functions setup (PCBDDCCorr) costs 35 + 63 seconds, respectively. 
>>>>>>> Also, the 500 application of the GAMG preconditioner for the Neumann solver (PCBDDCNeuS) takes 129 seconds out of the 400 seconds of the total solve time. >>>>>>> >>>>>>> PCBDDCTopo 1 1.0 3.1563e-01 1.0 1.11e+06 3.4 1.6e+03 3.9e+04 3.8e+01 0 0 1 0 2 0 0 1 0 2 19 >>>>>>> PCBDDCLKSP 2 1.0 2.0423e+00 1.7 9.31e+08 1.2 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 3378 >>>>>>> PCBDDCLWor 1 1.0 3.9178e-02 13.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>>> PCBDDCCorr 1 1.0 6.3981e+01 2.2 8.16e+10 1.6 0.0e+00 0.0e+00 0.0e+00 11 11 0 0 0 11 11 0 0 0 8900 >>>>>>> PCBDDCCSet 1 1.0 3.5453e+01 4564.9 1.06e+05 1.7 1.2e+03 5.3e+03 5.0e+01 2 0 1 0 3 2 0 1 0 3 0 >>>>>>> PCBDDCCKSP 1 1.0 6.3266e-01 1.3 0.00e+00 0.0 3.3e+02 1.1e+02 2.2e+01 0 0 0 0 1 0 0 0 0 1 0 >>>>>>> PCBDDCScal 1 1.0 6.8274e-03 1.3 1.11e+06 3.4 5.6e+01 3.2e+05 0.0e+00 0 0 0 0 0 0 0 0 0 0 894 >>>>>>> PCBDDCDirS 1000 1.0 6.0420e+00 3.5 6.64e+09 5.4 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 2995 >>>>>>> PCBDDCNeuS 500 1.0 1.2901e+02 2.1 8.28e+10 1.2 0.0e+00 0.0e+00 0.0e+00 22 12 0 0 0 22 12 0 0 0 4828 >>>>>>> PCBDDCCoaS 500 1.0 5.8757e-01 1.8 1.09e+09 1.0 2.8e+04 7.4e+02 5.0e+02 0 0 17 0 28 0 0 17 0 31 14901 >>>>>>> >>>>>>> Finally, if I look at the residual history, I see a sharp decrease and a very long plateau. This indicates a bad coarse space; as I said before, there's no hope of finding a suitable coarse space without first changing the basis of the Nedelec elements, which is done automatically if you prescribe the discrete gradient operator (see the paper I have linked to in my previous communication). >>>>>>> >>>>>>> >>>>>>> >>>>>>> Il giorno dom 18 ago 2024 alle ore 00:37 neil liu > ha scritto: >>>>>>>> Hi, Stefano, >>>>>>>> Please see the attached for the information with 4 and 8 CPUs for the complex matrix. >>>>>>>> I am solving Maxwell equations (Attahced) using 2nd-order Nedelec elements (two dofs each edge, and two dofs each face). >>>>>>>> The computational domain consists of different mediums, e.g., vacuum and substrate (different permitivity). >>>>>>>> The PML is used to truncate the computational domain, absorbing the outgoing wave and introducing complex numbers for the matrix. >>>>>>>> >>>>>>>> Thanks a lot for your suggestions. I will try MUMPS. >>>>>>>> For now, I just want to fiddle with Petsc's built-in features to know more about it. >>>>>>>> Yes. 5000 is larger. Smaller value. e.g., 30, converges very slowly. >>>>>>>> >>>>>>>> Thanks a lot. >>>>>>>> >>>>>>>> Have a good weekend. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Aug 17, 2024 at 9:23?AM Stefano Zampini > wrote: >>>>>>>>> Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening. >>>>>>>>> >>>>>>>>> Can you please share the equations you are solving so we can provide suggestions on the solver configuration? >>>>>>>>> As I said, solving for Nedelec-type discretizations is challenging, and not for off-the-shelf, black box solvers >>>>>>>>> >>>>>>>>> Below are some comments: >>>>>>>>> >>>>>>>>> You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps) >>>>>>>>> Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? 
With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is ok for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance. >>>>>>>>> Why 5000 of restart for GMRES? It is highly inefficient to re-orthogonalize such a large set of vectors. >>>>>>>>> >>>>>>>>> Il giorno ven 16 ago 2024 alle ore 00:04 neil liu > ha scritto: >>>>>>>>>> Dear Petsc developers, >>>>>>>>>> >>>>>>>>>> Thanks for your previous help. Now, the PCBDDC can converge to 1e-8 with, >>>>>>>>>> >>>>>>>>>> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view >>>>>>>>>> >>>>>>>>>> Then I used 2 cases for strong scaling test. One case only involves real numbers (tetra #: 49,152; dof #: 324, 224 ) for matrix and rhs. The 2nd case involves complex numbers (tetra #: 95,336; dof #: 611,432) due to PML. >>>>>>>>>> >>>>>>>>>> Case 1: >>>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel efficiency PCsetup time(s) >>>>>>>>>> 2 234.7 3.12 >>>>>>>>>> 4 126.6 0.92 1.62 >>>>>>>>>> 8 84.97 0.69 1.26 >>>>>>>>>> However for Case 2, >>>>>>>>>> cpu # Time for 500 ksp steps (s) Parallel efficiency PCsetup time(s) >>>>>>>>>> 2 584.5 8.61 >>>>>>>>>> 4 376.8 0.77 6.56 >>>>>>>>>> 8 459.6 0.31 66.47 >>>>>>>>>> For these 2 cases, I checked the time for PCsetup as an example. It seems 8 cpus for case 2 used too much time on PCsetup. >>>>>>>>>> Do you have any ideas about what is going on here? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Xiaodong >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Stefano >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a3P4JjUgPCzentaJNryo2MwVyxl-cDAbiuEsoucMRAbQELiLDTyLtn-3nuro0gjye5CW9EGD2cuep7AGveiw7Wc$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Wed Aug 21 08:57:18 2024 From: lzou at anl.gov (Zou, Ling) Date: Wed, 21 Aug 2024 13:57:18 +0000 Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance? 
In-Reply-To:
References:
Message-ID:

Hi Junchao,

Yeah, I have part of the log_view, for the same code and the same version of
PETSc (3.20), but two OS versions (Ventura vs. Sonoma).
Note that the PETSc function call counts are exactly the same.
I suspect that the OS itself has just become slower, or maybe it is
something related to the compiler.

-Ling

  Function           # of calls   Time spent (Ventura)   Time spent (Sonoma)
  MatMult MF         20463        3.718600E+00           4.467800E+00
  MatMult            20463        3.721000E+00           4.470500E+00
  MatFDColorApply    2062         4.507000E+00           5.394600E+00
  MatFDColorFunc     24744        4.472400E+00           5.356300E+00
  KSPSolve           2062         3.569700E+00           4.262400E+00
  SNESSolve          986          9.195900E+00           1.102000E+01
  SNESFunctionEval   23575        4.268600E+00           5.161100E+00
  SNESJacobianEval   2062         4.509300E+00           5.397500E+00
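(Taking the ratios row by row from the table as given: MatMult
4.4705/3.7210 ~ 1.20, KSPSolve 4.2624/3.5697 ~ 1.19, SNESSolve
1.1020e+01/9.1959 ~ 1.20. An almost flat ~20% penalty across unrelated
events points at something global - OS or toolchain - rather than one
slow code path.)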
From: Junchao Zhang
Date: Monday, August 19, 2024 at 10:04 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Do you have a -log_view report so that we can know which petsc functions
degraded? Or is it because the compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing slightly more
> than 10% performance degradation in my PETSc code (only in terms of
> execution time).
> I track the number of major function calls; they are identical between
> the two OS versions (so PETSc is not the one to blame), but everything is
> just slower.
> Is this something expected? Has anyone else experienced it?
>
> -Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From zonexo at gmail.com  Wed Aug 21 10:19:01 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Wed, 21 Aug 2024 23:19:01 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
Message-ID: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>

Hi,

I am using the latest PETSc through github. I compiled both the debug and
release versions of PETSc without problems.

I then use it with my CFD code, and the debug version works.

However, I have problems with the release version:

ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w
-I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now
deprecated and will be discontinued late 2024. Intel recommends that
customers transition now to using the LLVM-based Intel(R) Fortran Compiler
(ifx) for continued Windows* and Linux* support, new language support, new
language features, and optimizations. Use '-diag-disable=10448' to disable
this message.
global.F90(444): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
        call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr)
-------------^
global.F90(720): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr)
-----^
global.F90(774): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
-----^
global.F90(776): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
-----^
global.F90(949): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
-----^
global.F90(957): error #6285: There is no matching specific subroutine for
this generic subroutine call.   [MATGETOWNERSHIPRANGE]
     call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
-----^
compilation aborted for global.F90 (code 1)

May I know what's the problem?

--

Thank you very much.

Yours sincerely,

================================================
TAY Wee-Beng (Zheng Weiming)
================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Wed Aug 21 11:03:49 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 21 Aug 2024 12:03:49 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
Message-ID: <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>

   You must declare them as

   PetscInt ksta_p, kend_p

   Perhaps they are declared as arrays?
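(For reference, a minimal sketch of the declarations being described,
assuming the modern `use petscmat` Fortran bindings; the surrounding
program setup is omitted and the names simply mirror the snippet above:)

    subroutine get_range(A_mat)
#include <petsc/finclude/petscmat.h>
      use petscmat
      implicit none
      Mat            :: A_mat
      PetscInt       :: ksta_p, kend_p   ! PetscInt, not a plain integer
      PetscErrorCode :: ierr
      call MatGetOwnershipRange(A_mat, ksta_p, kend_p, ierr)
    end subroutine get_range

With a plain `integer`, the kind may disagree with the PetscInt kind PETSc
was built with (e.g. with --with-64-bit-indices), so no specific routine
behind the generic interface matches - which is exactly the error above.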
> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I am using the latest PETSc through github. I compiled both the debug and
> release versions of PETSc without problems.
>
> I then use it with my CFD code, and the debug version works.
>
> However, I have problems with the release version:
>
> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w
> -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90
> [the six global.F90 error #6285 messages, quoted in full above]
> compilation aborted for global.F90 (code 1)
>
> May I know what's the problem?
>
> --
>
> Thank you very much.
>
> Yours sincerely,
>
> ================================================
> TAY Wee-Beng (Zheng Weiming)
> ================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From meator.dev at gmail.com  Wed Aug 21 13:15:59 2024
From: meator.dev at gmail.com (meator)
Date: Wed, 21 Aug 2024 20:15:59 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
Message-ID: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>

Hello. I have skimmed through PETSc's documentation to see whether PETSc
has any special policy for including header files, but I didn't find
anything, so I assume that standard C rules apply.

The problematic header file is <petscerror.h>. The following code doesn't
compile:

    #include <petscerror.h>

    int main() { return 0; }

It fails because <petscerror.h> expects `MPI_Comm` to be defined, but it is
(I assume) lacking the appropriate includes which would define it. This is
unfortunate, because many linters targeting C/C++ sort header files
alphabetically. Since "petsc" is the common prefix for most PETSc header
files, `petscerror.h` was put first in my header list because it begins
with an "e".

I'm using PETSc version 3.21.3.

Thanks in advance
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0x1A14CB3464CBE5BF.asc
Type: application/pgp-keys
Size: 6275 bytes
Desc: OpenPGP public key
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: OpenPGP digital signature
URL: 

From pierre at joliv.et  Wed Aug 21 13:40:25 2024
From: pierre at joliv.et (Pierre Jolivet)
Date: Wed, 21 Aug 2024 20:40:25 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
In-Reply-To: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
References: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
Message-ID:

Cross-referencing https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/issues/1254__;!!G_uCfscf7eWS!dXHKUheZr_zi40ctF3flLGD0N_qAfFixD8DHUmzFJXKbIKjQ1jFS1-kfRf_GGXnljrjgjIyvcP-9POvAjSl8zA$

Thanks,
Pierre

> On 21 Aug 2024, at 8:15 PM, meator <meator.dev at gmail.com> wrote:
>
> [the report above, quoted in full]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From meator.dev at gmail.com  Wed Aug 21 13:44:14 2024
From: meator.dev at gmail.com (meator)
Date: Wed, 21 Aug 2024 20:44:14 +0200
Subject: [petsc-users] petscerror.h is potentially missing includes
In-Reply-To:
References: <207f84ff-fb28-48b2-af72-bc0b8ea0cd4c@gmail.com>
Message-ID: <177d281c-06f1-4bf4-8d04-575596c1c797@gmail.com>

Ah, I didn't know that this bug had already been reported. Thanks for the
pointer!
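(Until that issue is resolved, a common workaround - assuming a standard
PETSc install, where petscsys.h pulls in the MPI definitions before
petscerror.h - is to include the broader header first, i.e. put
`#include <petscsys.h>` ahead of, or instead of, `#include <petscerror.h>`,
or include `mpi.h` beforehand so that `MPI_Comm` is already defined.)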
On 8/21/24 8:40 PM, Pierre Jolivet wrote:
> Cross-referencing https://gitlab.com/petsc/petsc/-/issues/1254
>
> Thanks,
> Pierre
>
>> On 21 Aug 2024, at 8:15 PM, meator wrote:
>>
>> [the original report, quoted in full above]
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0x1A14CB3464CBE5BF.asc
Type: application/pgp-keys
Size: 6275 bytes
Desc: OpenPGP public key
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: OpenPGP digital signature
URL: 

From bsmith at petsc.dev  Thu Aug 22 08:40:42 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 09:40:42 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To:
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
Message-ID: <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>

   Fortran 90 type checking is very tight; the dimension of each array or
scalar passed as an argument must match the expected dimension (f77 did not
do this type checking). Thus the ione argument must be a 1-d array, as must
the numerical values, so do

   call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)

See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!fyaeYb3WlpH1d83aZxEB9RHOQhvgYlgvDJA4PQ389kQZjJxTKqgLZj0Jdglufyhde7YlMWKSo8z5ZSw_DhHXk48$

   I am trying to support the old-fashioned F77 model, allowing mismatches
in the array dimensions while still doing proper type checking, but it will
take some time to simplify the API.

   Barry

> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> Hi Barry,
>
> I have declared them as integers in Fortran. Is that different from
> PetscInt, and how come it works in debug mode?
>
> Anyway, I changed them and that solved the problem. However, I have a
> similar problem in my boundary.F90:
>
> boundary.F90(6685): error #6285: There is no matching specific subroutine
> for this generic subroutine call.   [MATSETVALUES]
> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
> -----^
> I changed all of them to PetscInt and also PetscReal, but I still get the
> error.
>
> Why is this so now? Any solution?
>
> Thanks!
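(To make the array-shape requirement concrete, here is a sketch of the same
call rebuilt with explicit rank-1 temporaries. The names mirror the snippet
above, and the exact generic interfaces differ between PETSc versions, so
treat this as illustrative rather than definitive:)

      PetscInt    :: ione, row(1), col(1)
      PetscScalar :: val(1)
      ione   = 1
      row(1) = II               ! row index as a 1-d array
      col(1) = int_impl(k,5)    ! column index as a 1-d array, PetscInt
      val(1) = impl_mat_A       ! the value must be PetscScalar, also 1-d
      call MatSetValues(A_mat_uv, ione, row, ione, col, val, INSERT_VALUES, ierr)

If every index argument is a rank-1 PetscInt array and every value a rank-1
PetscScalar array, the compiler can match one of the specific subroutines
behind the MATSETVALUES generic.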
>>> >>> However, I have problems with the rel ver: >>> >>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>> -------------^ >>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>> -----^ >>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>> -----^ >>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>> -----^ >>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>> -----^ >>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>> -----^ >>> compilation aborted for global.F90 (code 1) >>> >>> May I know what's the problem? >>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Thu Aug 22 08:45:45 2024 From: zonexo at gmail.com (TAY Wee Beng) Date: Thu, 22 Aug 2024 21:45:45 +0800 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: Hi Barry, Do you mean that I change from: call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) to call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) ? I did it but the error is still there. On 22/8/2024 9:40 pm, Barry Smith wrote: > > ? ?Fortran 90 type checking is very tight; The dimension of the array, > or scalar passed as arguments must match the expected dimension (f77 > did not do this type checking). 
Thus the ione argument must be a 1-d > array as well as the numerical values so do > >> */call >> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/* >> > > See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!YtzPtW9XslKdNpPZd4zGIwtB0bpm5C24PUmAaH-renGV54WI9JpWuh7yYG-oSS4g9_KOnSqCEFPSwHcba_c$ > > I am trying to support the old-fashion F77 model, allowing > miss-matches in the array dimensions while still doing proper type > checking but it will take some time to simplify the API. > > ? ?Barry > > > >> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >> >> Hi Barry, >> >> I have declared them as integers in Fortran. Is that different from >> PetscInt and how come it works in debug mode? >> >> Anyway, I changed them and it solved the problem. However, I have a >> similar problem in my boundary.F90: >> >> */boundary.F90(6685): error #6285: There is no matching specific >> subroutine for this generic subroutine call.?? [MATSETVALUES] >> call >> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/* >> -----^ >> I changed all to PetscInt and also PetscReal but I still got the error. >> >> Why is this so now? Any solution? >> >> Thanks! >> >> On 22/8/2024 12:03 am, Barry Smith wrote: >>> >>> ? You must declare as >>> >>> */? PetscInt ksta_p,kend_p/* >>> >>> ? Perhaps they are declared as arrays? >>> >>> >>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>> >>>> Hi, >>>> >>>> I am using the latest PETSc thru github. I compiled both the debug >>>> and rel ver of PETSc w/o problem. >>>> >>>> I then use it with my CFD code and the debug ver works. >>>> >>>> However, I have problems with the rel ver: >>>> >>>> */ftn -o global.o -c -O3 -g -ip -ipo?? -fPIC? -save -w >>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is >>>> now deprecated and will be discontinued late 2024. Intel recommends >>>> that customers transition now to using the LLVM-based Intel(R) >>>> Fortran Compiler (ifx) for continued Windows* and Linux* support, >>>> new language support, new language features, and optimizations. Use >>>> '-diag-disable=10448' to disable this message. >>>> global.F90(444): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>> -------------^ >>>> global.F90(720): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>> -----^ >>>> global.F90(774): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>> -----^ >>>> global.F90(776): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>> -----^ >>>> global.F90(949): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>> -----^ >>>> global.F90(957): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. 
[MATGETOWNERSHIPRANGE] >>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>> -----^ >>>> compilation aborted for global.F90 (code 1)/* >>>> >>>> May I know what's the problem? >>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng ??? (Zheng Weiming) ================================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 22 08:54:50 2024 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Aug 2024 09:54:50 -0400 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: What is int_impl(k,5) defined type? > On Aug 22, 2024, at 9:45?AM, TAY Wee Beng wrote: > > Hi Barry, > > Do you mean that I change from: > > call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) > > to > > call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) > > ? > > I did it but the error is still there. > > On 22/8/2024 9:40 pm, Barry Smith wrote: >> >> Fortran 90 type checking is very tight; The dimension of the array, or scalar passed as arguments must match the expected dimension (f77 did not do this type checking). Thus the ione argument must be a 1-d array as well as the numerical values so do >> >>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) >>> >> >> See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!aWzePGVwtbySxFNNIyaUMeslM1i47XZ6Q8Cu-XOfcXtqa0fUhIxUXbnw6aeXBBE-k5uGtriqZ7_yShLv_cy0KmM$ >> >> I am trying to support the old-fashion F77 model, allowing miss-matches in the array dimensions while still doing proper type checking but it will take some time to simplify the API. >> >> Barry >> >> >> >>> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >>> >>> Hi Barry, >>> >>> I have declared them as integers in Fortran. Is that different from PetscInt and how come it works in debug mode? >>> >>> Anyway, I changed them and it solved the problem. However, I have a similar problem in my boundary.F90: >>> >>> boundary.F90(6685): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATSETVALUES] >>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) >>> -----^ >>> I changed all to PetscInt and also PetscReal but I still got the error. >>> >>> Why is this so now? Any solution? >>> >>> Thanks! >>> >>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>> >>>> You must declare as >>>> >>>> PetscInt ksta_p,kend_p >>>> >>>> Perhaps they are declared as arrays? >>>> >>>> >>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>> >>>>> I then use it with my CFD code and the debug ver works. 
>>>>> >>>>> However, I have problems with the rel ver: >>>>> >>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>> -------------^ >>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>> -----^ >>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>> -----^ >>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>> -----^ >>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>> -----^ >>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>> -----^ >>>>> compilation aborted for global.F90 (code 1) >>>>> >>>>> May I know what's the problem? >>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Thu Aug 22 08:55:53 2024 From: zonexo at gmail.com (TAY Wee Beng) Date: Thu, 22 Aug 2024 21:55:53 +0800 Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE In-Reply-To: References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com> <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev> <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev> Message-ID: <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com> On 22/8/2024 9:54 pm, Barry Smith wrote: > > ? What is int_impl(k,5) defined type? PetscInt > >> On Aug 22, 2024, at 9:45?AM, TAY Wee Beng wrote: >> >> Hi Barry, >> >> Do you mean that I change from: >> >> call >> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr) >> >> to >> >> call >> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr) >> >> ? 
>> >> I did it but the error is still there. >> >> On 22/8/2024 9:40 pm, Barry Smith wrote: >>> >>> ? ?Fortran 90 type checking is very tight; The dimension of the >>> array, or scalar passed as arguments must match the expected >>> dimension (f77 did not do this type checking). Thus the ione >>> argument must be a 1-d array as well as the numerical values so do >>> >>>> */call >>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/* >>>> >>> >>> See Fortran at https://urldefense.us/v3/__https://petsc.org/main/changes/dev/__;!!G_uCfscf7eWS!autVTPnQ7buLuq9rjvUR07AS8J_YKe2xLprKP48K_ELW64wGci2MCdQ2u2VxgZOFwjHSdmLTP8x3yfcAg30$ >>> >>> I am trying to support the old-fashion F77 model, allowing >>> miss-matches in the array dimensions while still doing proper type >>> checking but it will take some time to simplify the API. >>> >>> ? ?Barry >>> >>> >>> >>>> On Aug 21, 2024, at 9:44?PM, TAY Wee Beng wrote: >>>> >>>> Hi Barry, >>>> >>>> I have declared them as integers in Fortran. Is that different from >>>> PetscInt and how come it works in debug mode? >>>> >>>> Anyway, I changed them and it solved the problem. However, I have a >>>> similar problem in my boundary.F90: >>>> >>>> */boundary.F90(6685): error #6285: There is no matching specific >>>> subroutine for this generic subroutine call. [MATSETVALUES] >>>> call >>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/* >>>> -----^ >>>> I changed all to PetscInt and also PetscReal but I still got the error. >>>> >>>> Why is this so now? Any solution? >>>> >>>> Thanks! >>>> >>>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>>> >>>>> ? You must declare as >>>>> >>>>> */? PetscInt ksta_p,kend_p/* >>>>> >>>>> ? Perhaps they are declared as arrays? >>>>> >>>>> >>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am using the latest PETSc thru github. I compiled both the >>>>>> debug and rel ver of PETSc w/o problem. >>>>>> >>>>>> I then use it with my CFD code and the debug ver works. >>>>>> >>>>>> However, I have problems with the rel ver: >>>>>> >>>>>> */ftn -o global.o -c -O3 -g -ip -ipo?? -fPIC? -save -w >>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>> global.F90 >>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) >>>>>> is now deprecated and will be discontinued late 2024. Intel >>>>>> recommends that customers transition now to using the LLVM-based >>>>>> Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* >>>>>> support, new language support, new language features, and >>>>>> optimizations. Use '-diag-disable=10448' to disable this message. >>>>>> global.F90(444): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>> -------------^ >>>>>> global.F90(720): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>> -----^ >>>>>> global.F90(774): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>> -----^ >>>>>> global.F90(776): error #6285: There is no matching specific >>>>>> subroutine for this generic subroutine call. 
[MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
>>>>>> -----^
>>>>>> global.F90(949): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call.   [MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr)
>>>>>> -----^
>>>>>> global.F90(957): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call.   [MATGETOWNERSHIPRANGE]
>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr)
>>>>>> -----^
>>>>>> compilation aborted for global.F90 (code 1)
>>>>>>
>>>>>> May I know what's the problem?
>>>>>>
>>>>>> --
>>>>>> Thank you very much.
>>>>>> Yours sincerely,
>>>>>> ================================================
>>>>>> TAY Wee-Beng (Zheng Weiming)
>>>>>> ================================================
>>>>>
>>>> --
>>>> Thank you very much.
>>>> Yours sincerely,
>>>> ================================================
>>>> TAY Wee-Beng (Zheng Weiming)
>>>> ================================================
>>>
>> --
>> Thank you very much.
>> Yours sincerely,
>> ================================================
>> TAY Wee-Beng (Zheng Weiming)
>> ================================================
>
> --
> Thank you very much.
> Yours sincerely,
> ================================================
> TAY Wee-Beng (Zheng Weiming)
> ================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From junchao.zhang at gmail.com  Thu Aug 22 09:28:20 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Thu, 22 Aug 2024 09:28:20 -0500
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Hi, Ling,
  MatMult alone degraded almost 20%, which is a lot. Do you have the
configure.log for the two builds? We might find compiler discrepancies
from it.

--Junchao Zhang

On Wed, Aug 21, 2024 at 8:57 AM Zou, Ling wrote:

> Hi Junchao,
>
> Yeah, I have part of the log_view, for the same code and the same version
> of PETSc (3.20), but two OS versions (Ventura vs. Sonoma).
>
> Note that the PETSc function call counts are exactly the same.
>
> I suspect that the OS itself has just become slower, or maybe it is
> something related to the compiler.
>
> -Ling
>
>   Function           # of calls   Time spent (Ventura)   Time spent (Sonoma)
>   MatMult MF         20463        3.718600E+00           4.467800E+00
>   MatMult            20463        3.721000E+00           4.470500E+00
>   MatFDColorApply    2062         4.507000E+00           5.394600E+00
>   MatFDColorFunc     24744        4.472400E+00           5.356300E+00
>   KSPSolve           2062         3.569700E+00           4.262400E+00
>   SNESSolve          986          9.195900E+00           1.102000E+01
>   SNESFunctionEval   23575        4.268600E+00           5.161100E+00
>   SNESJacobianEval   2062         4.509300E+00           5.397500E+00
>
> From: Junchao Zhang
> Date: Monday, August 19, 2024 at 10:04 PM
> To: Zou, Ling
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
>
> Do you have a -log_view report so that we can know which petsc functions
> degraded? Or is it because the compilers were different?
>
> --Junchao Zhang
>
> On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi all,
>
> After updating Mac OS from Ventura to Sonoma, I am seeing slightly more
> than 10% performance degradation in my PETSc code (only in terms of
> execution time).
> I track the number of major function calls; they are identical between
> the two OS versions (so PETSc is not the one to blame), but everything is
> just slower.
> Is this something expected? Has anyone else experienced it?
>
> -Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Thu Aug 22 09:28:24 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 10:28:24 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
Message-ID: <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>

   It should be PetscReal, as it is a numerical value.

> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng <zonexo at gmail.com> wrote:
>
> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>
>> What is the declared type of int_impl(k,5)?
> PetscInt
>>
>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>
>>> Hi Barry,
>>>
>>> Do you mean that I change from:
>>>
>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>
>>> to
>>>
>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>
>>> ?
>>>
>>> I did it but the error is still there.
>>>
>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>
>>>> Fortran 90 type checking is very tight; the dimension of each array or
>>>> scalar passed as an argument must match the expected dimension (f77 did
>>>> not do this type checking). Thus the ione argument must be a 1-d array,
>>>> as must the numerical values, so do
>>>>
>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>
>>>> I am trying to support the old-fashioned F77 model, allowing mismatches
>>>> in the array dimensions while still doing proper type checking, but it
>>>> will take some time to simplify the API.
>>>>
>>>>    Barry
>>>>
>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I have declared them as integers in Fortran. Is that different from
>>>>> PetscInt, and how come it works in debug mode?
>>>>>
>>>>> Anyway, I changed them and that solved the problem. However, I have a
>>>>> similar problem in my boundary.F90:
>>>>>
>>>>> boundary.F90(6685): error #6285: There is no matching specific
>>>>> subroutine for this generic subroutine call.   [MATSETVALUES]
>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>> -----^
>>>>> I changed all of them to PetscInt and also PetscReal, but I still get
>>>>> the error.
>>>>>
>>>>> Why is this so now? Any solution?
>>>>>
>>>>> Thanks!
>>>>> >>>>> On 22/8/2024 12:03 am, Barry Smith wrote: >>>>>> >>>>>> You must declare as >>>>>> >>>>>> PetscInt ksta_p,kend_p >>>>>> >>>>>> Perhaps they are declared as arrays? >>>>>> >>>>>> >>>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>>>> >>>>>>> I then use it with my CFD code and the debug ver works. >>>>>>> >>>>>>> However, I have problems with the rel ver: >>>>>>> >>>>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>> -------------^ >>>>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>> -----^ >>>>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>> -----^ >>>>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>> -----^ >>>>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>> -----^ >>>>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>> -----^ >>>>>>> compilation aborted for global.F90 (code 1) >>>>>>> >>>>>>> May I know what's the problem? >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Thank you very much. >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> ================================================ >>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>> ================================================ >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? (Zheng Weiming) > ================================================ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From zonexo at gmail.com  Thu Aug 22 09:33:44 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Thu, 22 Aug 2024 22:33:44 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
Message-ID: <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>

On 22/8/2024 10:28 pm, Barry Smith wrote:
>
>   Should be PetscReal as it is a numerical value.
Ok, I changed it but I still get the same error.
>
>
>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>
>>
>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>
>>>   What is int_impl(k,5) defined type?
>> PetscInt
>>>
>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>
>>>> Hi Barry,
>>>>
>>>> Do you mean that I change from:
>>>>
>>>> call
>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> to
>>>>
>>>> call
>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>
>>>> ?
>>>>
>>>> I did it but the error is still there.
>>>>
>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>
>>>>>   Fortran 90 type checking is very tight; the dimension of the
>>>>> array, or scalar, passed as an argument must match the expected
>>>>> dimension (F77 did not do this type checking). Thus the ione
>>>>> argument must be a 1-d array as well as the numerical values, so do
>>>>>
>>>>>> */call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>
>>>>>
>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>
>>>>> I am trying to support the old-fashioned F77 model, allowing
>>>>> mismatches in the array dimensions while still doing proper type
>>>>> checking, but it will take some time to simplify the API.
>>>>>
>>>>>   Barry
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> I have declared them as integers in Fortran. Is that different
>>>>>> from PetscInt and how come it works in debug mode?
>>>>>>
>>>>>> Anyway, I changed them and it solved the problem. However, I have
>>>>>> a similar problem in my boundary.F90:
>>>>>>
>>>>>> */boundary.F90(6685): error #6285: There is no matching specific
>>>>>> subroutine for this generic subroutine call. [MATSETVALUES]
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>> -----^
>>>>>> I changed all to PetscInt and also PetscReal but I still got the
>>>>>> error.
>>>>>>
>>>>>> Why is this so now? Any solution?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>
>>>>>>>   You must declare as
>>>>>>>
>>>>>>> */  PetscInt ksta_p,kend_p/*
>>>>>>>
>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am using the latest PETSc thru github. I compiled both the
>>>>>>>> debug and rel ver of PETSc w/o problem.
>>>>>>>>
>>>>>>>> I then use it with my CFD code and the debug ver works.
>>>>>>>> >>>>>>>> However, I have problems with the rel ver: >>>>>>>> >>>>>>>> */ftn -o global.o -c -O3 -g -ip -ipo -fPIC? -save -w >>>>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>>>> global.F90 >>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) >>>>>>>> is now deprecated and will be discontinued late 2024. Intel >>>>>>>> recommends that customers transition now to using the >>>>>>>> LLVM-based Intel(R) Fortran Compiler (ifx) for continued >>>>>>>> Windows* and Linux* support, new language support, new language >>>>>>>> features, and optimizations. Use '-diag-disable=10448' to >>>>>>>> disable this message. >>>>>>>> global.F90(444): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>> -------------^ >>>>>>>> global.F90(720): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(774): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(776): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(949): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>> -----^ >>>>>>>> global.F90(957): error #6285: There is no matching specific >>>>>>>> subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>> -----^ >>>>>>>> compilation aborted for global.F90 (code 1)/* >>>>>>>> >>>>>>>> May I know what's the problem? >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Thank you very much. >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> ================================================ >>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>> ================================================ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> >>>>>> Thank you very much. >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> ================================================ >>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>> ================================================ >>>>>> >>>>>> >>>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. Yours sincerely, ================================================ TAY Wee-Beng ??? (Zheng Weiming) ================================================ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From bsmith at petsc.dev  Thu Aug 22 09:39:08 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 22 Aug 2024 10:39:08 -0400
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
 <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
Message-ID: <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>

   Hmm, try using a standalone variable

   PetscReal value
   value = int_impl(k,5)

>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[value],impl_mat_A,INSERT_VALUES,ierr)
>>>>>

   Unfortunately, Fortran compilers in this situation are not good at telling us exactly what argument is giving it grief.

   Barry



> On Aug 22, 2024, at 10:33 AM, TAY Wee Beng wrote:
>
>
> On 22/8/2024 10:28 pm, Barry Smith wrote:
>>
>>   Should be PetscReal as it is a numerical value.
> Ok, I changed it but I still get the same error.
>>
>>
>>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>>
>>>
>>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>>
>>>>   What is int_impl(k,5) defined type?
>>> PetscInt
>>>>
>>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> Do you mean that I change from:
>>>>>
>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>
>>>>> to
>>>>>
>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>
>>>>> ?
>>>>>
>>>>> I did it but the error is still there.
>>>>>
>>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>>
>>>>>>   Fortran 90 type checking is very tight; the dimension of the array, or scalar, passed as an argument must match the expected dimension (F77 did not do this type checking). Thus the ione argument must be a 1-d array as well as the numerical values, so do
>>>>>>
>>>>>>> call MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>>
>>>>>>
>>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>>
>>>>>> I am trying to support the old-fashioned F77 model, allowing mismatches in the array dimensions while still doing proper type checking, but it will take some time to simplify the API.
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> I have declared them as integers in Fortran. Is that different from PetscInt and how come it works in debug mode?
>>>>>>>
>>>>>>> Anyway, I changed them and it solved the problem. However, I have a similar problem in my boundary.F90:
>>>>>>>
>>>>>>> boundary.F90(6685): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATSETVALUES]
>>>>>>> call MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>>> -----^
>>>>>>> I changed all to PetscInt and also PetscReal but I still got the error.
>>>>>>>
>>>>>>> Why is this so now? Any solution?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>>
>>>>>>>>   You must declare as
>>>>>>>>
>>>>>>>>   PetscInt ksta_p,kend_p
>>>>>>>>
>>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>> >>>>>>>> >>>>>>>>> On Aug 21, 2024, at 11:19?AM, TAY Wee Beng wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am using the latest PETSc thru github. I compiled both the debug and rel ver of PETSc w/o problem. >>>>>>>>> >>>>>>>>> I then use it with my CFD code and the debug ver works. >>>>>>>>> >>>>>>>>> However, I have problems with the rel ver: >>>>>>>>> >>>>>>>>> ftn -o global.o -c -O3 -g -ip -ipo -fPIC -save -w -I/home/project/11003851/lib/petsc_210824_intel_rel/include global.F90 >>>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message. >>>>>>>>> global.F90(444): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>>> -------------^ >>>>>>>>> global.F90(720): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(774): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(776): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(949): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>> -----^ >>>>>>>>> global.F90(957): error #6285: There is no matching specific subroutine for this generic subroutine call. [MATGETOWNERSHIPRANGE] >>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>> -----^ >>>>>>>>> compilation aborted for global.F90 (code 1) >>>>>>>>> >>>>>>>>> May I know what's the problem? >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Thank you very much. >>>>>>>>> >>>>>>>>> Yours sincerely, >>>>>>>>> >>>>>>>>> ================================================ >>>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>>> ================================================ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Thank you very much. >>>>>>> >>>>>>> Yours sincerely, >>>>>>> >>>>>>> ================================================ >>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>> ================================================ >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> >>>>> Thank you very much. >>>>> >>>>> Yours sincerely, >>>>> >>>>> ================================================ >>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>> ================================================ >>>>> >>>>> >>>> >>> -- >>> >>> Thank you very much. >>> >>> Yours sincerely, >>> >>> ================================================ >>> TAY Wee-Beng ??? (Zheng Weiming) >>> ================================================ >>> >>> >> > -- > > Thank you very much. > > Yours sincerely, > > ================================================ > TAY Wee-Beng ??? 
(Zheng Weiming)
>>> ================================================
>>>
>>>
>>
> --
>
> Thank you very much.
>
> Yours sincerely,
>
> ================================================
> TAY Wee-Beng ??? (Zheng Weiming)
> ================================================
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
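For readers who hit the same error #6285: below is a minimal sketch of the
declarations this thread converges on, assuming a real-valued PETSc build
(so PetscScalar equals PetscReal) and illustrative names (A_mat, row, col
are not from the original code). The key point is that every argument must
match the kind and rank the generic Fortran interface expects: counts and
indices must be PetscInt rather than plain integer, the values must be
PetscScalar rather than an integer expression such as int_impl(k,5), and
the index and value arguments are arrays.

      ! Sketch only, inside a routine that has "use petscmat"; whether the
      ! count arguments must themselves be 1-d arrays (the [ione] form
      ! tried above) depends on the PETSc version's Fortran interface.
      PetscInt       :: ksta_p, kend_p   ! PetscInt, not integer
      PetscInt       :: ione, row(1), col(1)
      PetscScalar    :: v(1)             ! PetscScalar, not an integer expression
      PetscErrorCode :: ierr

      ione   = 1
      call MatGetOwnershipRange(A_mat, ksta_p, kend_p, ierr)
      row(1) = ksta_p                    ! first locally owned row
      col(1) = ksta_p
      v(1)   = 1.0
      call MatSetValues(A_mat, ione, row, ione, col, v, INSERT_VALUES, ierr)

Since the debug build accepted the original code while the optimized build
did not, compiling a minimal program like the above against each build is a
quick way to see which specific interface the compiler is searching for.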
From edoardo.alinovi at gmail.com  Sun Aug 25 06:49:07 2024
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Sun, 25 Aug 2024 13:49:07 +0200
Subject: [petsc-users] Preliminaries to use gpu capabilities
Message-ID:

Hello Petsc friends,

As many people are doing, I would like to explore a bit the gpu
capabilities (cuda) in petsc.

Before attempting any coding effort, I would like to hear from you if all
of this makes sense:
- compile mpi with cuda support
- compile petsc with cuda support
- build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc
to use the gpu.

That's really it, or do I need to take care of something else?

I have seen that there is an amgXWrapper library around, but I am not sure
if it is still relevant now or not.

Thank you for the suggestions!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Sun Aug 25 11:17:20 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 25 Aug 2024 12:17:20 -0400
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID: <7465D12B-F30E-496F-B1AF-6C390410DD05@petsc.dev>

> On Aug 25, 2024, at 7:49 AM, Edoardo alinovi wrote:
>
> Hello Petsc friends,
>
> As many people are doing, I would like to explore a bit the gpu capabilities (cuda) in petsc.
>
> Before attempting any coding effort, I would like to hear from you if all of this makes sense:
> - compile mpi with cuda support

   This is commonly called CUDA-aware MPI, but actually only means that the MPI can send and receive messages from memory addresses directly on the GPU.

> - compile petsc with cuda support
> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc to use the gpu.

   As presented, this will compute the vector and matrix entries on the CPU, and then PETSc will automatically move the values to the GPU for the linear solver, which is a good start. You can run with -log_view -log_view_gpu_time to see the timings, how much data is moved between the CPU and GPU, and where the computation happens.

   If all goes well, then you will find almost all the compute time is in building the vectors and matrices and copying the values to the GPU. At that point you will need to think about moving your computation to the GPU. This is problem-dependent, but you can look at VecSetPreallocationCOO() and MatSetPreallocationCOO() for how you can efficiently provide the values to PETSc on the GPU.

   As always, feel free to ask questions; the process is not trivial or as simple as we would like it to be.

   Barry

> That's really it, or do I need to take care of something else?
>
> I have seen that there is an amgXWrapper library around, but I am not sure if it is still relevant now or not.
>
> Thank you for the suggestions!
>
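A minimal sketch of the starting point Barry describes: create the matrix
and vector with GPU types, assemble on the CPU, and let PETSc copy the
values to the GPU. The size n and all names are illustrative; the types can
equally be selected at run time with -mat_type aijcusparse -vec_type cuda.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      b;
  PetscInt n = 100; /* illustrative global size */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(A, MATAIJCUSPARSE)); /* GPU storage; CPU-set values are copied over */
  PetscCall(MatSetFromOptions(A));          /* allow command-line overrides */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
  PetscCall(VecSetSizes(b, PETSC_DECIDE, n));
  PetscCall(VecSetType(b, VECCUDA));
  /* ... MatSetValues()/VecSetValues() on the CPU, assembly, KSPSolve() ... */
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}

Running the result with, for example,

  mpiexec -n 2 ./app -log_view -log_view_gpu_time

shows, per event, where the time goes and how much data moves between host
and device, as suggested above.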
From edoardo.alinovi at gmail.com  Sun Aug 25 11:45:29 2024
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Sun, 25 Aug 2024 18:45:29 +0200
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID:

Thank you Barry, sounds great. I'll try it out in the next weeks! Is the
data copy such a bottleneck with respect to the solving time, in your
opinion?

I am not scared of building stuff on the gpu directly; I basically assemble
the petsc matrix and rhs in one place, so it would be ok to do that on the
gpu directly. Is the aij format ok for the gpu, or is CSR better?

Cheers.

On Sun, 25 Aug 2024, 13:49 Edoardo alinovi wrote:

> Hello Petsc friends,
>
> As many people are doing, I would like to explore a bit the gpu
> capabilities (cuda) in petsc.
>
> Before attempting any coding effort, I would like to hear from you if all
> of this makes sense:
> - compile mpi with cuda support
> - compile petsc with cuda support
> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc
> to use the gpu.
>
> That's really it, or do I need to take care of something else?
>
> I have seen that there is an amgXWrapper library around, but I am not sure
> if it is still relevant now or not.
>
> Thank you for the suggestions!
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Sun Aug 25 17:26:28 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 25 Aug 2024 18:26:28 -0400
Subject: [petsc-users] Preliminaries to use gpu capabilities
In-Reply-To:
References:
Message-ID:

> On Aug 25, 2024, at 12:45 PM, Edoardo alinovi wrote:
>
> Thank you Barry, sounds great. I'll try it out in the next weeks! Is the data copy such a bottleneck with respect to the solving time, in your opinion?

   If you are solving with the same matrix many times, then the matrix copy is not a big deal.

> I am not scared of building stuff on the gpu directly; I basically assemble the petsc matrix and rhs in one place, so it would be ok to do that on the gpu directly. Is the aij format ok for the gpu, or is CSR better?

   PETSc AIJ is essentially CSR and is what everyone supports. If you are solving with the same matrix many times, converting to MATAIJSELL up front will likely pay off, but this doesn't change user code.

>
> Cheers.
>
> On Sun, 25 Aug 2024, 13:49 Edoardo alinovi wrote:
>> Hello Petsc friends,
>>
>> As many people are doing, I would like to explore a bit the gpu capabilities (cuda) in petsc.
>>
>> Before attempting any coding effort, I would like to hear from you if all of this makes sense:
>> - compile mpi with cuda support
>> - compile petsc with cuda support
>> - build matrix and vectors as MATAIJCUSPARSE and VECMPICUDA to tell petsc to use the gpu.
>>
>> That's really it, or do I need to take care of something else?
>>
>> I have seen that there is an amgXWrapper library around, but I am not sure if it is still relevant now or not.
>>
>> Thank you for the suggestions!
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
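Tying Barry's two replies together: if the sparsity pattern is fixed across
solves, the COO interface he pointed at earlier reduces the per-step
assembly to a single value upload. A minimal sketch with an illustrative
3x3 diagonal pattern (indices are global; names are hypothetical):

  Mat         A;
  PetscInt    coo_i[3] = {0, 1, 2}; /* global row indices    */
  PetscInt    coo_j[3] = {0, 1, 2}; /* global column indices */
  PetscScalar coo_v[3] = {2.0, 2.0, 2.0};

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 3, 3));
  PetscCall(MatSetType(A, MATAIJCUSPARSE));
  PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j)); /* pattern: set once */
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));   /* values: every step */

With a CUDA build, the value array handed to MatSetValuesCOO() may live in
device memory, which can avoid the host-to-device copy entirely;
VecSetPreallocationCOO()/VecSetValuesCOO() follow the same pattern for the
right-hand side.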
From zonexo at gmail.com  Sun Aug 25 22:46:17 2024
From: zonexo at gmail.com (TAY Wee Beng)
Date: Mon, 26 Aug 2024 11:46:17 +0800
Subject: [petsc-users] Error during code compile with MATGETOWNERSHIPRANGE
In-Reply-To: <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>
References: <28024196-ea0c-4ec7-ab13-a893d2852a04@gmail.com>
 <2CC29C86-EF68-4405-97F2-93EA0C25B9F2@petsc.dev>
 <8F81DAC9-6A51-4AD2-9D8F-AC6FDCF2A007@petsc.dev>
 <493e2fc8-8ce6-47d9-83e2-8f6087d49422@gmail.com>
 <5561D4D2-5ADD-4076-8DB3-4B07EC893FD2@petsc.dev>
 <18521985-6cbf-4e75-8274-c7ae303816cf@gmail.com>
 <80D9E1BB-1124-4650-8BD3-1697C51A0D86@petsc.dev>
Message-ID: <1ca6f1e4-7da1-4cc9-85cd-76e7b431c1b3@gmail.com>

Hi Barry,

Thanks, I'll try later. Back to using 3.20.6 which is working first.

On 22/8/2024 10:39 pm, Barry Smith wrote:
>
>   Hmm, try using a standalone variable
>
>   PetscReal value
>   value = int_impl(k,5)
>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[value],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>
> Unfortunately, Fortran compilers in this situation are not good at
> telling us exactly what argument is giving it grief.
>
> Barry
>
>
>
>> On Aug 22, 2024, at 10:33 AM, TAY Wee Beng wrote:
>>
>>
>> On 22/8/2024 10:28 pm, Barry Smith wrote:
>>>
>>>   Should be PetscReal as it is a numerical value.
>> Ok, I changed it but I still get the same error.
>>>
>>>
>>>> On Aug 22, 2024, at 9:55 AM, TAY Wee Beng wrote:
>>>>
>>>>
>>>> On 22/8/2024 9:54 pm, Barry Smith wrote:
>>>>>
>>>>>   What is int_impl(k,5) defined type?
>>>> PetscInt
>>>>>
>>>>>> On Aug 22, 2024, at 9:45 AM, TAY Wee Beng wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> Do you mean that I change from:
>>>>>>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>>>>>> to
>>>>>>
>>>>>> call
>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> I did it but the error is still there.
>>>>>>
>>>>>> On 22/8/2024 9:40 pm, Barry Smith wrote:
>>>>>>>
>>>>>>>   Fortran 90 type checking is very tight; the dimension of the
>>>>>>> array, or scalar, passed as an argument must match the expected
>>>>>>> dimension (F77 did not do this type checking). Thus the ione
>>>>>>> argument must be a 1-d array as well as the numerical values, so do
>>>>>>>
>>>>>>>> */call
>>>>>>>> MatSetValues(A_mat_uv,[ione],II,[ione],[int_impl(k,5)],impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>>>
>>>>>>>
>>>>>>> See Fortran at https://petsc.org/main/changes/dev/
>>>>>>>
>>>>>>> I am trying to support the old-fashioned F77 model, allowing
>>>>>>> mismatches in the array dimensions while still doing proper type
>>>>>>> checking, but it will take some time to simplify the API.
>>>>>>>
>>>>>>>   Barry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 21, 2024, at 9:44 PM, TAY Wee Beng wrote:
>>>>>>>>
>>>>>>>> Hi Barry,
>>>>>>>>
>>>>>>>> I have declared them as integers in Fortran. Is that different
>>>>>>>> from PetscInt and how come it works in debug mode?
>>>>>>>>
>>>>>>>> Anyway, I changed them and it solved the problem. However, I have
>>>>>>>> a similar problem in my boundary.F90:
>>>>>>>>
>>>>>>>> */boundary.F90(6685): error #6285: There is no matching
>>>>>>>> specific subroutine for this generic subroutine call.
>>>>>>>> [MATSETVALUES]
>>>>>>>> call
>>>>>>>> MatSetValues(A_mat_uv,ione,II,ione,int_impl(k,5),impl_mat_A,INSERT_VALUES,ierr)/*
>>>>>>>> -----^
>>>>>>>> I changed all to PetscInt and also PetscReal but I still got
>>>>>>>> the error.
>>>>>>>>
>>>>>>>> Why is this so now? Any solution?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On 22/8/2024 12:03 am, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>   You must declare as
>>>>>>>>>
>>>>>>>>> */PetscInt ksta_p,kend_p/*
>>>>>>>>>
>>>>>>>>>   Perhaps they are declared as arrays?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 21, 2024, at 11:19 AM, TAY Wee Beng
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using the latest PETSc thru github. I compiled both the
>>>>>>>>>> debug and rel ver of PETSc w/o problem.
>>>>>>>>>>
>>>>>>>>>> I then use it with my CFD code and the debug ver works.
-fPIC -save -w >>>>>>>>>> -I/home/project/11003851/lib/petsc_210824_intel_rel/include >>>>>>>>>> global.F90 >>>>>>>>>> ifort: remark #10448: Intel(R) Fortran Compiler Classic >>>>>>>>>> (ifort) is now deprecated and will be discontinued late 2024. >>>>>>>>>> Intel recommends that customers transition now to using the >>>>>>>>>> LLVM-based Intel(R) Fortran Compiler (ifx) for continued >>>>>>>>>> Windows* and Linux* support, new language support, new >>>>>>>>>> language features, and optimizations. Use >>>>>>>>>> '-diag-disable=10448' to disable this message. >>>>>>>>>> global.F90(444): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> ??????? call MatGetOwnershipRange(A_mat,ksta_p,kend_p,ierr) >>>>>>>>>> -------------^ >>>>>>>>>> global.F90(720): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_mat_uv,ksta_m,kend_m,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(774): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(776): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(949): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_x,ksta_mx,kend_mx,ierr) >>>>>>>>>> -----^ >>>>>>>>>> global.F90(957): error #6285: There is no matching specific >>>>>>>>>> subroutine for this generic subroutine call. >>>>>>>>>> [MATGETOWNERSHIPRANGE] >>>>>>>>>> call MatGetOwnershipRange(A_semi_y,ksta_my,kend_my,ierr) >>>>>>>>>> -----^ >>>>>>>>>> compilation aborted for global.F90 (code 1)/* >>>>>>>>>> >>>>>>>>>> May I know what's the problem? >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> Thank you very much. >>>>>>>>>> >>>>>>>>>> Yours sincerely, >>>>>>>>>> >>>>>>>>>> ================================================ >>>>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>>>> ================================================ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Thank you very much. >>>>>>>> >>>>>>>> Yours sincerely, >>>>>>>> >>>>>>>> ================================================ >>>>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>>>> ================================================ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>>>> >>>>>> Thank you very much. >>>>>> >>>>>> Yours sincerely, >>>>>> >>>>>> ================================================ >>>>>> TAY Wee-Beng ??? (Zheng Weiming) >>>>>> ================================================ >>>>>> >>>>>> >>>>> >>>> -- >>>> >>>> Thank you very much. >>>> >>>> Yours sincerely, >>>> >>>> ================================================ >>>> TAY Wee-Beng ??? (Zheng Weiming) >>>> ================================================ >>>> >>>> >>> >> -- >> >> Thank you very much. >> >> Yours sincerely, >> >> ================================================ >> TAY Wee-Beng ??? (Zheng Weiming) >> ================================================ >> >> > -- Thank you very much. 
Yours sincerely,

================================================
TAY Wee-Beng ??? (Zheng Weiming)
================================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lzou at anl.gov  Mon Aug 26 08:10:38 2024
From: lzou at anl.gov (Zou, Ling)
Date: Mon, 26 Aug 2024 13:10:38 +0000
Subject: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?
In-Reply-To:
References:
Message-ID:

Junchao, I am accessing a pre-compiled version of PETSc via MOOSE, so
unfortunately, I don't have those configure log files.
Note that all function calls became slower.

-Ling

From: Junchao Zhang
Date: Thursday, August 22, 2024 at 9:28 AM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Hi, Ling,
  MatMult almost degraded 20%, which is a lot.  Do you have configure.log
for the two builds?  We might find compiler discrepancies from it.

--Junchao Zhang

On Wed, Aug 21, 2024 at 8:57 AM Zou, Ling wrote:

Hi Junchao,

Yeah, I have part of the log_view, for the same code, same version of
PETSc (3.20), but two OS (Ventura vs. Sonoma).
Note that PETSc function call numbers are exactly the same.
I suspect that it's just that the OS became slower, or maybe something
related to the compiler.

-Ling

                    # of calls   Time spent (Ventura)   Time spent (Sonoma)
  MatMult MF            20463           3.718600E+00          4.467800E+00
  MatMult               20463           3.721000E+00          4.470500E+00
  MatFDColorApply        2062           4.507000E+00          5.394600E+00
  MatFDColorFunc        24744           4.472400E+00          5.356300E+00
  KSPSolve               2062           3.569700E+00          4.262400E+00
  SNESSolve               986           9.195900E+00          1.102000E+01
  SNESFunctionEval      23575           4.268600E+00          5.161100E+00
  SNESJacobianEval       2062           4.509300E+00          5.397500E+00

From: Junchao Zhang
Date: Monday, August 19, 2024 at 10:04 PM
To: Zou, Ling
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Would Mac OS version affect PETSc/C/C++ performance?

Do you have -log_view report so that we can know which petsc functions
degraded? Or is it because compilers were different?

--Junchao Zhang

On Sun, Aug 18, 2024 at 6:04 PM Zou, Ling via petsc-users <
petsc-users at mcs.anl.gov> wrote:

Hi all,

After updating Mac OS from Ventura to Sonoma, I am seeing my PETSc code
having slightly-larger-than 10% performance degradation (only in terms of
execution time).
I track the number of major function calls; they are identical between the
two OS (so PETSc is not the one to blame), but just slower.
Is this something expected? Has anyone also experienced it?

-Ling
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From edoardo.centofanti01 at universitadipavia.it  Thu Aug 29 05:09:25 2024
From: edoardo.centofanti01 at universitadipavia.it (Edoardo Centofanti)
Date: Thu, 29 Aug 2024 12:09:25 +0200
Subject: [petsc-users] Questions about DMPlex
Message-ID:

Dear PETSc users,

I am tinkering with DMPlex. I am trying to import a .msh file with some
labelling over it (the mesh is 3D; some are labels over points within a 3D
volume, others are labels over points lying on a surface, for boundary
conditions). This is what I get in the preamble of my msh file for the
"PhysicalNames" part:

$PhysicalNames
6
2 4 "surf1"
2 5 "surf2"
2 6 "boundary"
3 1 "vol1"
3 2 "vol2"
3 3 "vol3"
$EndPhysicalNames

I import the mesh through a function which should create a mesh distributed
across all the MPI processes:

PetscErrorCode ImportMsh(const char meshname[], DM *dm)
{
    PetscErrorCode ierr;
    DM distributedDM = NULL;

    // Create a DMPlex object and set its type
    ierr = DMCreate(PETSC_COMM_WORLD, dm); CHKERRQ(ierr);
    ierr = DMSetType(*dm, DMPLEX); CHKERRQ(ierr);
    // Import the mesh from an external gmsh file
    ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm); CHKERRQ(ierr);
    // Distribute the mesh across processors
    ierr = DMPlexDistribute(*dm, 0, NULL, &distributedDM); CHKERRQ(ierr);
    if (distributedDM) {
        ierr = DMDestroy(dm); CHKERRQ(ierr);
        *dm = distributedDM;
    }
    // View DMPlex
    DMView(*dm, PETSC_VIEWER_STDOUT_WORLD);
    return ierr;
}

The output of DMView with 1 processor is the following:

DM Object: DM_0x12a623ae0_0 1 MPI process
  type: plex
DM_0x12a623ae0_0 in 3 dimensions:
  Number of 0-cells per rank: 730
  Number of 1-cells per rank: 4010
  Number of 2-cells per rank: 6100
  Number of 3-cells per rank: 2819
Labels:
  celltype: 4 strata with value/size (0 (730), 6 (2819), 3 (6100), 1 (4010))
  depth: 4 strata with value/size (0 (730), 1 (4010), 2 (6100), 3 (2819))
  Cell Sets: 3 strata with value/size (2 (311), 3 (322), 1 (2186))
  Face Sets: 3 strata with value/size (6 (924), 4 (516), 5 (4))

While for 2 processes I get:

DM Object: Parallel Mesh 2 MPI processes
  type: plex
Parallel Mesh in 3 dimensions:
  Number of 0-cells per rank: 625 713
  Number of 1-cells per rank: 2792 3187
  Number of 2-cells per rank: 3546 3759
  Number of 3-cells per rank: 1410 1409
Labels:
  depth: 4 strata with value/size (0 (625), 1 (2792), 2 (3546), 3 (1410))
  celltype: 4 strata with value/size (0 (625), 1 (2792), 3 (3546), 6 (1410))
  Cell Sets: 3 strata with value/size (1 (777), 2 (311), 3 (322))
  Face Sets: 3 strata with value/size (4 (516), 5 (4), 6 (247))

*First question: *Where are the labels I gave in the .msh file? My
interpretation here is that they are the numbers in brackets: for example,
6 (924) means that 924 face elements are marked with 6, which corresponds
to "boundary" in my .msh file. However, in the 2-processor DMView I get
different numbers of elements. Again, my intuition is that only the rank 0
labels are printed.

*Second question: *Suppose I wanted to fill a matrix only in the entries
corresponding to the nodes of the elements marked with 5, namely "surf2".
How can I do that? So far, I have used

ierr = DMPlexGetDepthStratum(dm, 2, &pStart, &pEnd); CHKERRQ(ierr);

in order to get pStart and pEnd for the stratum related to faces (2). Then
I looped over p from pStart to pEnd in order to select the faces which are
marked with 5 (I cannot access the points directly since they do not seem
to be marked at all).
For the selected faces, I used

ierr = DMPlexGetConeSize(dm, p, &coneSize1); CHKERRQ(ierr);
ierr = DMPlexGetCone(dm, p, &cone); CHKERRQ(ierr);

in order to retrieve the edges associated with each marked face, then I
used GetConeSize and GetCone again in order to access the vertices, and
managed to build an IS object with the points I need (also using
ISSortRemoveDups to remove duplicates). But here I am stuck, since printing
this IS gives the right number of vertices, but with different local
numberings depending on the number of processors used, and with a variable
offset depending on the DAG associated with the local DMPlex. I wonder if
there exists a less cumbersome way to perform this task...

Thank you in advance,
Edoardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
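On the second question above, one common route, sketched below, is to ask
the label for its stratum directly and then take the transitive closure of
each marked face, which avoids the manual cone chasing. The sketch assumes
the Gmsh markers land in the "Face Sets" label, as the DMView output
suggests; the closure points are rank-local, so a local-to-global mapping
(for example, a PetscSection laid out over the vertices) is still needed
for parallel-consistent numbering. On the first question, the values in
brackets are the physical-group tags with their stratum sizes, and recent
PETSc versions can also keep the physical-group names as labels (see the
-dm_plex_gmsh_use_regions option).

  /* Sketch only: collect the vertices of all faces marked 5 ("surf2"). */
  DMLabel         label;
  IS              faceIS;
  const PetscInt *faces;
  PetscInt        nfaces, vStart, vEnd, f;

  PetscCall(DMGetLabel(dm, "Face Sets", &label));
  PetscCall(DMLabelGetStratumIS(label, 5, &faceIS));       /* NULL if this rank has no marked faces */
  PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd)); /* vertices are depth 0 */
  if (faceIS) {
    PetscCall(ISGetLocalSize(faceIS, &nfaces));
    PetscCall(ISGetIndices(faceIS, &faces));
    for (f = 0; f < nfaces; ++f) {
      PetscInt *closure = NULL, clSize, cl;

      PetscCall(DMPlexGetTransitiveClosure(dm, faces[f], PETSC_TRUE, &clSize, &closure));
      for (cl = 0; cl < 2 * clSize; cl += 2) { /* closure holds (point, orientation) pairs */
        if (closure[cl] >= vStart && closure[cl] < vEnd) {
          /* closure[cl] is a local vertex of a face marked 5 */
        }
      }
      PetscCall(DMPlexRestoreTransitiveClosure(dm, faces[f], PETSC_TRUE, &clSize, &closure));
    }
    PetscCall(ISRestoreIndices(faceIS, &faces));
    PetscCall(ISDestroy(&faceIS));
  }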
From martin.diehl at kuleuven.be  Thu Aug 29 08:56:48 2024
From: martin.diehl at kuleuven.be (Martin Diehl)
Date: Thu, 29 Aug 2024 13:56:48 +0000
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
Message-ID: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>

Dear PETSc team,

I have a question regarding the use of PetscDSGetTabulation from
Fortran.
PetscDSGetTabulation has a slightly different function signature
between Fortran and C. In addition, there is an (undocumented)
PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
it results in a segmentation fault.

I believe that PetscDSRestoreTabulation is not needed. At least our
Fortran FEM code compiles and runs without it. However, we have
convergence issues that we don't understand, so any suspicious code is
currently under investigation.

best regards,
Martin

--
KU Leuven
Department of Computer Science
Department of Materials Engineering
Celestijnenlaan 200a
3001 Leuven, Belgium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: This is a digitally signed message part
URL: 

From knepley at gmail.com  Thu Aug 29 16:18:25 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 29 Aug 2024 17:18:25 -0400
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl wrote:

> Dear PETSc team,
>
> I have a question regarding the use of PetscDSGetTabulation from
> Fortran.
> PetscDSGetTabulation has a slightly different function signature
> between Fortran and C. In addition, there is an (undocumented)
> PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
> it results in a segmentation fault.
>
> I believe that PetscDSRestoreTabulation is not needed. At least our
> Fortran FEM code compiles and runs without it. However, we have
> convergence issues that we don't understand, so any suspicious code is
> currently under investigation.
>

This may be due to my weak Fortran knowledge. Here is the code

https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads

I call F90Array1dCreate() in the GetTabulation and F90Array1dDestroy() in
the RestoreTabulation(), which I thought was right. However, I remember
something about interface declarations, which have now moved somewhere I
cannot find.
Barry, is the interface declaration for this function correct?

  Thanks,

     Matt

> best regards,
> Martin
>
> --
> KU Leuven
> Department of Computer Science
> Department of Materials Engineering
> Celestijnenlaan 200a
> 3001 Leuven, Belgium
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Thu Aug 29 16:21:19 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 29 Aug 2024 17:21:19 -0400
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To:
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

   The interface definition and Fortran stub look ok to my eyeballs.
However, eyeballs cannot compile code, so using the debugger to determine
the cause of the crash is best.

   Barry

> On Aug 29, 2024, at 5:18 PM, Matthew Knepley wrote:
>
> On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl wrote:
>> Dear PETSc team,
>>
>> I have a question regarding the use of PetscDSGetTabulation from
>> Fortran.
>> PetscDSGetTabulation has a slightly different function signature
>> between Fortran and C. In addition, there is an (undocumented)
>> PetscDSRestoreTabulation in Fortran which cleans up the arrays. Calling
>> it results in a segmentation fault.
>>
>> I believe that PetscDSRestoreTabulation is not needed. At least our
>> Fortran FEM code compiles and runs without it. However, we have
>> convergence issues that we don't understand, so any suspicious code is
>> currently under investigation.
>
> This may be due to my weak Fortran knowledge. Here is the code
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads
>
> I call F90Array1dCreate() in the GetTabulation and F90Array1dDestroy() in
> the RestoreTabulation(), which I thought was right. However, I remember
> something about interface declarations, which have now moved somewhere I
> cannot find.
>
> Barry, is the interface declaration for this function correct?
>
>   Thanks,
>
>      Matt
>
>> best regards,
>> Martin
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From martin.diehl at kuleuven.be  Fri Aug 30 02:10:57 2024
From: martin.diehl at kuleuven.be (Martin Diehl)
Date: Fri, 30 Aug 2024 07:10:57 +0000
Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation
In-Reply-To:
References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be>
Message-ID:

Dear Matt and Barry:

thanks for the quick reply.
Please forget about the segmentation fault, that was a mistake in my code. Regarding the necessity of PetscDSRestoreTabulation: It cleans up "b" and "bDer". Those are defined as "PetscReal, pointer". If they are defined in a function or subroutine, they go out of scope automatically. So I believed (backed up by measuring the memory consumption with and without PetscDSRestoreTabulation) that the PetscDSRestoreTabulation does not add anything important. Martin On Thu, 2024-08-29 at 17:21 -0400, Barry Smith wrote: > > ? ?The interface definition and Fortran stub look ok to my eyeballs. > However, eyeballs cannot compile code, so using the debugger to > determine the cause of the crash is best. > > ? ?Barry > > > > > On Aug 29, 2024, at 5:18?PM, Matthew Knepley > > wrote: > > > > On Thu, Aug 29, 2024 at 9:57?AM Martin Diehl > > wrote: > > > Dear PETSc team, > > > > > > I have a question regarding the use of PetscDSGetTabulation from > > > Fortran. > > > PetscDSGetTabulation has a slightly different function signature > > > between Fortran and C. In addition, there is an (undocumented) > > > PetscDSRestoreTabulation in Fortran which cleans up the arrays. > > > Calling > > > it results in a segmentation fault. > > > > > > I believe that PetscDSRestoreTabulation is not needed. At least > > > our > > > Fortran FEM code compiles and runs without it. However, we have > > > convergence issues that we don't understand so any suspicious > > > code is > > > currently under investigation. > > > > > > > > > This may be due to my weak Fortran knowledge. Here is the code > > > > ?? > > https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90- > > custom/zdtdsf90.c?ref_type=heads > > > > I call F90Array1dCreate() in the GetTabulation and > > F90Array1dDestroy() in the RestoreTabulation(), which I thought > > was right. However, I remember something about interface > > declarations, which have now moved somewhere I cannot find. > > > > Barry, is the interface declaration for this function correct? > > > > ? Thanks, > > > > ? ? ? Matt > > ? > > > best regards, > > > Martin > > > -- KU Leuven Department of Computer Science Department of Materials Engineering Celestijnenlaan 200a 3001 Leuven, Belgium -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 659 bytes Desc: This is a digitally signed message part URL: From bsmith at petsc.dev Fri Aug 30 11:04:42 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Aug 2024 12:04:42 -0400 Subject: [petsc-users] Fortran: PetscDSRestoreTabulation + PetscDSGetTabulation In-Reply-To: References: <2554e3743adb38378641ca97e5cc9c828d06e8f9.camel@kuleuven.be> Message-ID: > On Aug 30, 2024, at 3:10?AM, Martin Diehl wrote: > > Dear Matt and Barry: > > thanks for the quick reply. > Please forget about the segmentation fault, that was a mistake in my > code. > > Regarding the necessity of PetscDSRestoreTabulation: > It cleans up "b" and "bDer". Those are defined as "PetscReal, pointer". > If they are defined in a function or subroutine, they go out of scope > automatically. So I believed (backed up by measuring the memory > consumption with and without PetscDSRestoreTabulation) that the > PetscDSRestoreTabulation does not add anything important. Our Fortran stub restore performs a nullify(ptr) so that the ptr is no longer associated with our C array. 
I guess you are saying that when the ptr goes out of scope, its (little)
memory is automatically freed regardless of whether the ptr is still
associated with something or not, so nullify is not needed in that case.
Thus, the restore is only a "safety" feature, preventing the caller from
accidentally using the associated C array later (which they should not do).
This is similar to how our restore in C nullifies the C pointer so that it
cannot accidentally be used later, which would result in memory corruption.

   Thanks for the clarification.

   Barry

   Note that some restores do dereference memory or objects, and those
must be called, or there will be a memory leak. Thus, it is best always to
call restore, though sometimes it may not be strictly necessary.

>
> Martin
>
>
> On Thu, 2024-08-29 at 17:21 -0400, Barry Smith wrote:
>>
>>    The interface definition and Fortran stub look ok to my eyeballs.
>> However, eyeballs cannot compile code, so using the debugger to
>> determine the cause of the crash is best.
>>
>>    Barry
>>
>>
>>
>>> On Aug 29, 2024, at 5:18 PM, Matthew Knepley
>>> wrote:
>>>
>>> On Thu, Aug 29, 2024 at 9:57 AM Martin Diehl
>>> wrote:
>>>> Dear PETSc team,
>>>>
>>>> I have a question regarding the use of PetscDSGetTabulation from
>>>> Fortran.
>>>> PetscDSGetTabulation has a slightly different function signature
>>>> between Fortran and C. In addition, there is an (undocumented)
>>>> PetscDSRestoreTabulation in Fortran which cleans up the arrays.
>>>> Calling
>>>> it results in a segmentation fault.
>>>>
>>>> I believe that PetscDSRestoreTabulation is not needed. At least
>>>> our
>>>> Fortran FEM code compiles and runs without it. However, we have
>>>> convergence issues that we don't understand so any suspicious
>>>> code is
>>>> currently under investigation.
>>>
>>>
>>> This may be due to my weak Fortran knowledge. Here is the code
>>>
>>> https://gitlab.com/petsc/petsc/-/blob/main/src/dm/dt/interface/f90-custom/zdtdsf90.c?ref_type=heads
>>>
>>> I call F90Array1dCreate() in the GetTabulation and
>>> F90Array1dDestroy() in the RestoreTabulation(), which I thought
>>> was right. However, I remember something about interface
>>> declarations, which have now moved somewhere I cannot find.
>>>
>>> Barry, is the interface declaration for this function correct?
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>> best regards,
>>>> Martin
>>>>
>
> --
> KU Leuven
> Department of Computer Science
> Department of Materials Engineering
> Celestijnenlaan 200a
> 3001 Leuven, Belgium
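For reference, the Fortran usage pattern under discussion, as a minimal
sketch consistent with what the thread describes; the ds handle, the field
number f, and the exact argument list are assumptions that should be
checked against the PETSc version in use.

      ! Sketch only, inside a routine that has "use petscds".
      ! b and bDer are Fortran pointers into PETSc-owned memory;
      ! the restore call merely nullifies them.
      PetscDS            :: ds
      PetscInt           :: f
      PetscReal, pointer :: b(:), bDer(:)
      PetscErrorCode     :: ierr

      f = 0
      call PetscDSGetTabulation(ds, f, b, bDer, ierr)
      ! ... use b (basis values) and bDer (basis derivatives) ...
      call PetscDSRestoreTabulation(ds, f, b, bDer, ierr)

As Barry explains above, skipping the restore leaks nothing here, since it
only disassociates the pointers; it simply guards against touching b or
bDer after they are no longer valid.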