From paul.grosse-bley at ziti.uni-heidelberg.de Wed Mar 1 09:30:33 2023
From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley)
Date: Wed, 01 Mar 2023 16:30:33 +0100
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
Message-ID: <369419-63ff6f80-cd-ca552d@2229059>

Thank you for the detailed answer, Barry. I had hit a dead end on my side.

> If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes.

Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda instead of using the ghosted cells for them?

Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric information) will cause the boundary condition on the coarser grids to be finite (>0)?

Ideally I would like to just use some kind of GPU-parallel (colored) SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement Red-Black GS using cuSPARSE's masked matrix-vector products, but I have not found any information on implementing a custom preconditioner in PETSc.

Best,
Paul Grosse-Bley

On Wednesday, March 01, 2023 05:38 CET, Barry Smith wrote:

Ok, here is the situation. The command line options as given do not result in multigrid-quality convergence in any of the runs; the error contraction factor is around .94 (meaning that, for the modes the multigrid algorithm does the worst on, it only removes about 6 percent of them per iteration).

But this is hidden by the initial right-hand side for the linear system as written in ex45.c, which has O(h) values on the boundary nodes and O(h^3) values on the interior nodes. The first iterations are largely working on the boundary residual and making great progress attacking that, so it looks like one has a good error contraction factor. One then sees the error contraction factor start to get worse and worse for the later iterations. With the 0 on the boundary, the iterations quickly get to the bad regime where the error contraction factor is near one. One can see this by using -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each iteration. Though it appears the 0 boundary condition case converges much slower (since it requires many more iterations), if you factor out the huge advantage of the nonzero boundary condition case at the beginning (in terms of decreasing the residual), you see they both have an asymptotic error contraction factor of around .94 (which is horrible for multigrid).

I now add -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero boundary right-hand side); they take 35 and 41 iterations (much better):
initial residual norm 14.6993
next residual norm 0.84167 0.0572591
next residual norm 0.0665392 0.00452668
next residual norm 0.0307273 0.00209039
next residual norm 0.0158949 0.00108134
next residual norm 0.00825189 0.000561378
next residual norm 0.00428474 0.000291492
next residual norm 0.00222482 0.000151355
next residual norm 0.00115522 7.85898e-05
next residual norm 0.000599836 4.0807e-05
next residual norm 0.000311459 2.11887e-05
next residual norm 0.000161722 1.1002e-05
next residual norm 8.39727e-05 5.71269e-06
next residual norm 4.3602e-05 2.96626e-06
next residual norm 2.26399e-05 1.5402e-06
next residual norm 1.17556e-05 7.99735e-07
next residual norm 6.10397e-06 4.15255e-07
next residual norm 3.16943e-06 2.15617e-07
next residual norm 1.64569e-06 1.11957e-07
next residual norm 8.54511e-07 5.81326e-08
next residual norm 4.43697e-07 3.01848e-08
next residual norm 2.30385e-07 1.56732e-08
next residual norm 1.19625e-07 8.13815e-09
next residual norm 6.21143e-08 4.22566e-09
next residual norm 3.22523e-08 2.19413e-09
next residual norm 1.67467e-08 1.13928e-09
next residual norm 8.69555e-09 5.91561e-10
next residual norm 4.51508e-09 3.07162e-10
next residual norm 2.34441e-09 1.59491e-10
next residual norm 1.21731e-09 8.28143e-11
next residual norm 6.32079e-10 4.30005e-11
next residual norm 3.28201e-10 2.23276e-11
next residual norm 1.70415e-10 1.15934e-11
next residual norm 8.84865e-11 6.01976e-12
next residual norm 4.59457e-11 3.1257e-12
next residual norm 2.38569e-11 1.62299e-12
next residual norm 1.23875e-11 8.42724e-13
Linear solve converged due to CONVERGED_RTOL iterations 35
Residual norm 1.23875e-11

initial residual norm 172.601
next residual norm 154.803 0.896887
next residual norm 66.9409 0.387837
next residual norm 34.4572 0.199636
next residual norm 17.8836 0.103612
next residual norm 9.28582 0.0537995
next residual norm 4.82161 0.027935
next residual norm 2.50358 0.014505
next residual norm 1.29996 0.0075316
next residual norm 0.674992 0.00391071
next residual norm 0.350483 0.0020306
next residual norm 0.181985 0.00105437
next residual norm 0.094494 0.000547472
next residual norm 0.0490651 0.000284269
next residual norm 0.0254766 0.000147604
next residual norm 0.0132285 7.6642e-05
next residual norm 0.00686876 3.97956e-05
next residual norm 0.00356654 2.06635e-05
next residual norm 0.00185189 1.07293e-05
next residual norm 0.000961576 5.5711e-06
next residual norm 0.000499289 2.89274e-06
next residual norm 0.000259251 1.50203e-06
next residual norm 0.000134614 7.79914e-07
next residual norm 6.98969e-05 4.04963e-07
next residual norm 3.62933e-05 2.10273e-07
next residual norm 1.88449e-05 1.09182e-07
next residual norm 9.78505e-06 5.66919e-08
next residual norm 5.0808e-06 2.94367e-08
next residual norm 2.63815e-06 1.52847e-08
next residual norm 1.36984e-06 7.93645e-09
next residual norm 7.11275e-07 4.12093e-09
next residual norm 3.69322e-07 2.13975e-09
next residual norm 1.91767e-07 1.11105e-09
next residual norm 9.95733e-08 5.769e-10
next residual norm 5.17024e-08 2.99549e-10
next residual norm 2.6846e-08 1.55538e-10
next residual norm 1.39395e-08 8.07615e-11
next residual norm 7.23798e-09 4.19348e-11
next residual norm 3.75824e-09 2.17742e-11
next residual norm 1.95138e-09 1.13058e-11
next residual norm 1.01327e-09 5.87059e-12
next residual norm 5.26184e-10 3.04856e-12
next residual norm 2.73182e-10 1.58274e-12
next residual norm 1.41806e-10 8.21586e-13
Linear solve converged due to CONVERGED_RTOL iterations 42
Residual norm 1.41806e-10
Notice in the first run the residual norm still dives much more quickly for the first 2 iterations than in the second run. This is because the first run has "lucky error" that gets wiped out easily from the big boundary term. After that you can see that the convergence for both is very similar, with both having a reasonable error contraction factor of .51.

I've attached the modified src/ksp/pc/impls/mg/mg.c that prints the residuals along the way.

From bsmith at petsc.dev Wed Mar 1 09:51:00 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 1 Mar 2023 10:51:00 -0500
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: <369419-63ff6f80-cd-ca552d@2229059>
References: <369419-63ff6f80-cd-ca552d@2229059>
Message-ID: <6E352B23-E2B8-4538-B088-EBEDCE3315B6@petsc.dev>

> On Mar 1, 2023, at 10:30 AM, Paul Grosse-Bley wrote:
>
> Thank you for the detailed answer, Barry. I had hit a dead end on my side.
>
> If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes.
>
> Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda instead of using the ghosted cells for them?

   I don't know anything about hpgmg-cuda and what it means by "ghosted cells". I am just saying I think it is reasonable to use the style of ex45.c that retains Dirichlet unknowns in the global matrix (with zero on those points) in PETSc to compare with other codes that may or may not do something different. But you need to use zero for Dirichlet points to ensure that the "funny" convergence rates do not happen at the beginning, making the comparison unbalanced between the two codes.

> Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric information) will cause the boundary condition on the coarser grids to be finite (>0)?

   In general, even if the Dirichlet points on the fine grid are zero, during PCMG with DMDA as in ex45.c those "boundary" values on the coarser grids may end up non-zero during the iterative process. But this is "harmless", just part of the algorithm; it does mean the convergence with a code that does not include those points will be different (not necessarily better or worse, just different).

> Ideally I would like to just use some kind of GPU-parallel (colored) SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement Red-Black GS using cuSPARSE's masked matrix-vector products, but I have not found any information on implementing a custom preconditioner in PETSc.

   You can use https://petsc.org/release/docs/manualpages/PC/PCSHELL/ You can look at src/ksp/pc/impls/jacobi.c for detailed comments on what goes into a preconditioner object. If you write such a GPU-parallel (colored) SOR/Gauss-Seidel we would love to include it in PETSc. Note also https://petsc.org/release/docs/manualpages/KSP/KSPCHEBYSHEV/ and potentially other "polynomial preconditioners" are an alternative approach for having more powerful parallel smoothers.

> Best,
> Paul Grosse-Bley
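As a concrete illustration of the PCSHELL route Barry points to above, a minimal sketch of a shell preconditioner is given below. It is not code from this thread: UserRedBlackGSSweep and UserSmootherCtx are hypothetical placeholders for the cuSPARSE-based red-black Gauss-Seidel kernel being discussed, and attaching the shell to one PCMG level smoother (obtained with PCMGGetSmoother) is only one possible way to wire it up. Everything else uses the standard PETSc C API.

#include <petscksp.h>

typedef struct {
  Mat A; /* operator the smoother sweeps on; GPU buffers, color masks, etc. would go here */
} UserSmootherCtx;

/* Hypothetical user kernel, e.g. a cuSPARSE-based red-black Gauss-Seidel sweep. */
extern PetscErrorCode UserRedBlackGSSweep(Mat A, Vec b, Vec x);

static PetscErrorCode ShellApply(PC pc, Vec b, Vec x)
{
  UserSmootherCtx *ctx;

  PetscFunctionBeginUser;
  PetscCall(PCShellGetContext(pc, &ctx));
  PetscCall(VecSet(x, 0.0));                    /* apply the smoother from a zero initial guess */
  PetscCall(UserRedBlackGSSweep(ctx->A, b, x)); /* one (or a few) colored sweeps */
  PetscFunctionReturn(0);
}

/* Attach the shell, e.g. to the KSP of one PCMG level obtained with PCMGGetSmoother(). */
static PetscErrorCode UseShellSmoother(KSP smoother, UserSmootherCtx *ctx)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(smoother, &pc));
  PetscCall(PCSetType(pc, PCSHELL));
  PetscCall(PCShellSetContext(pc, ctx));
  PetscCall(PCShellSetApply(pc, ShellApply));
  PetscCall(PCShellSetName(pc, "gpu-red-black-gs"));
  PetscFunctionReturn(0);
}

The shell then appears under the given name in -ksp_view output; src/ksp/pc/impls/jacobi.c in the PETSc source remains the reference for the full set of optional callbacks (setup, destroy, and so on).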
From paul.grosse-bley at ziti.uni-heidelberg.de Wed Mar 1 10:10:26 2023
From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley)
Date: Wed, 01 Mar 2023 17:10:26 +0100
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: <6E352B23-E2B8-4538-B088-EBEDCE3315B6@petsc.dev>
Message-ID: <3831b9-63ff7900-e9-7ca7f900@96463981>

I previously thought that a custom Red-Black GS solution would be too specific for PETSc, as it will only work for star-shaped stencils of width 1. For that specific operation, a custom kernel would probably be able to achieve significantly better performance, as it would not have to load a mask from memory and one can make use of the very regular structure of the problem. We still wanted to use cuSPARSE here because it seemed to be more in line with the present GPU support of PETSc, and using custom kernels in one library might be seen as unfair when comparing it to others.

But if you are interested anyway, I will share my implementation if/when I get to implementing that.

Best,
Paul Grosse-Bley
From jchristopher at anl.gov Wed Mar 1 18:17:03 2023
From: jchristopher at anl.gov (Christopher, Joshua)
Date: Thu, 2 Mar 2023 00:17:03 +0000
Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG
Message-ID: 

Hello,

I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using:

-ksp_type gmres
-pc_type hypre
-pc_hypre_type boomeramg

and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but was also previously able to successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options:

-ksp_view_pre
-ksp_view
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_test_null_space

My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues).

I have tried using other preconditioners (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion.

Do you have any advice on speeding up the convergence of this system?

Thank you,
Joshua
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: petsc_gmres_boomeramg.txt
URL: 

From mfadams at lbl.gov Thu Mar 2 02:57:37 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 2 Mar 2023 09:57:37 +0100
Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG
In-Reply-To: 
References: 
Message-ID: 

Can you give us a bit more detail on your equations?

* You are going to want to use FieldSplit.

* AMG usually takes some effort to get working well. You want to start simple, even just a Laplacian or two decoupled Laplacians in your code, to get the expected MG performance (see the sketch below). Then add realistic geometry, then more terms, etc., and ramp up to what you want to do, and we can help you address problems that arise at each step. Verify the results at each step.
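The sketch below is one possible version of the stripped-down baseline Mark describes: it assembles a plain 5-point Laplacian on a DMDA and solves it with whatever solver is chosen on the command line, e.g. the same -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg options used above. The grid size, right-hand side, and boundary handling are arbitrary choices for illustration, not taken from this thread.

#include <petscdmda.h>
#include <petscksp.h>

int main(int argc, char **argv)
{
  DM       da;
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, j, xs, ys, xm, ym;
  const PetscInt M = 513, N = 513; /* global grid size, arbitrary for this baseline */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_STAR,
                         M, N, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
  for (j = ys; j < ys + ym; j++) {
    for (i = xs; i < xs + xm; i++) {
      MatStencil  row = {0}, col[5];
      PetscScalar v[5];
      PetscInt    n = 0;
      row.i = i; row.j = j;
      if (i == 0 || j == 0 || i == M - 1 || j == N - 1) {
        col[n] = row; v[n++] = 1.0;            /* keep Dirichlet rows in the system, ex45-style */
      } else {
        col[n] = row;                 v[n++] = 4.0;
        col[n] = row; col[n].i = i-1; v[n++] = -1.0;
        col[n] = row; col[n].i = i+1; v[n++] = -1.0;
        col[n] = row; col[n].j = j-1; v[n++] = -1.0;
        col[n] = row; col[n].j = j+1; v[n++] = -1.0;
      }
      PetscCall(MatSetValuesStencil(A, 1, &row, n, col, v, INSERT_VALUES));
    }
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(DMCreateGlobalVector(da, &b));
  PetscCall(VecDuplicate(b, &x));
  PetscCall(VecSet(b, 1.0));         /* any nonzero right-hand side will do for a convergence check */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp)); /* e.g. -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

If such a baseline shows the expected few-iteration multigrid convergence while the production system does not, the difference has to come from the extra terms, the geometry, or the coupling rather than the solver configuration itself.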
Mark On Thu, Mar 2, 2023 at 5:51?AM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Mar 2 07:47:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 08:47:21 -0500 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. 
I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options:
> 
> -ksp_view_pre
> -ksp_view
> -ksp_converged_reason
> -ksp_monitor_true_residual
> -ksp_test_null_space
> 
> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues).
> 
> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion.
> 
> Do you have any advice on speeding up the convergence of this system?
> 
> Thank you,
> Joshua

From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 11:43:16 2023
From: fengshw3 at mail2.sysu.edu.cn (冯上玮)
Date: Fri, 3 Mar 2023 01:43:16 +0800
Subject: [petsc-users] Error in configuring PETSc with Cygwin
Message-ID: 

Hi team,

Recently I tried to install PETSc with Cygwin, since I'd like to use PETSc with Visual Studio on the Windows 10 platform. For the sake of clarity, I first list the software/packages used below:

1. PETSc: version 3.18.5
2. VS: version 2019
3. Intel Parallel Studio XE: version 2020
4. Cygwin with py3.8 and make (and default installation)

Because I plan to use Intel MPI, the compiler options in the configuration are:

./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack

where there is no option for MPI.

The PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as:

Cannot run executables created with FC. If this machine uses a batch system
to submit jobs you will need to configure using ./configure with the additional option --with-batch.
Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'?

Note that both ifort for x64 and ifort for ia-32 ended with the same error above, and I installed IPS with the options related to MKL and fblaslapack. Something a bit suspicious is that I open Cygwin from DOS (in particular, from the "Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019" environment, and from the x86 environment for the ifort ia-32 test).

Therefore, I am writing this e-mail to confirm whether I should add "--with-batch", or whether the error is caused by some other reason, such as ifort?

Looking forward to your reply!

Sincerely,
FENG.

From balay at mcs.anl.gov Thu Mar 2 12:13:49 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Thu, 2 Mar 2023 12:13:49 -0600 (CST)
Subject: [petsc-users] Error in configuring PETSc with Cygwin
In-Reply-To: 
References: 
Message-ID: 

On Fri, 3 Mar 2023, 冯上玮
wrote: > Hi team, > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > 1. PETSc: version 3.18.5 > 2. VS: version 2019 > 3. Intel Parallel Studio XE: version 2020 > 4. Cygwin with py3.8 and make (and default installation) > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > where there is no option for mpi. > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > Cannot run executables created with FC. If this machine uses a batch system > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] If you are still encountering errors - send us configure.log for the failed build. Satish > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > Looking forward your reply! > > > Sinserely, > FENG. From jchristopher at anl.gov Thu Mar 2 15:22:38 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Thu, 2 Mar 2023 21:22:38 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper: https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. 
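Since both Mark and Barry point to PCFIELDSPLIT for this coupled Poisson / convection-diffusion system, here is a minimal sketch of how two fields could be registered with a fieldsplit preconditioner. It is an illustration only: the index sets isPhi and isRho, the split names, and the per-block options are assumptions about the application's degree-of-freedom layout, not something taken from this thread.

#include <petscksp.h>

/* A (coupled operator), b, x, and the two index sets are assumed to be built by the application. */
PetscErrorCode SolveCoupled(Mat A, Vec b, Vec x, IS isPhi, IS isRho)
{
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPFGMRES));          /* flexible outer Krylov method */
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCFIELDSPLIT));
  PetscCall(PCFieldSplitSetIS(pc, "phi", isPhi)); /* Poisson (potential) block */
  PetscCall(PCFieldSplitSetIS(pc, "rho", isRho)); /* convection-diffusion (charge) block */
  PetscCall(PCFieldSplitSetType(pc, PC_COMPOSITE_MULTIPLICATIVE));
  /* Per-block solvers can then be chosen on the command line, for example
     -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type hypre -fieldsplit_phi_pc_hypre_type boomeramg
     -fieldsplit_rho_ksp_type preonly -fieldsplit_rho_pc_type bjacobi -fieldsplit_rho_sub_pc_type ilu */
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(0);
}

The idea is that each block sees a preconditioner suited to its character (AMG for the elliptic potential block, something cheaper for the convection-diffusion block), instead of asking a single BoomerAMG hierarchy to handle the coupled, non-symmetric operator.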
Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_preonly_mumps.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_preonly_superlu.txt URL: From bsmith at petsc.dev Thu Mar 2 15:47:19 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 16:47:19 -0500 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: ? Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. 
The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM > To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. > > Barry > > >> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: >> >> Hello, >> >> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >> >> -ksp_type gmres >> -pc_type hypre >> -pc_hypre_type boomeramg >> >> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >> >> -ksp_view_pre >> -ksp_view >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_test_null_space >> >> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >> >> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. 
>> >> Do you have any advice on speeding up the convergence of this system? >> >> Thank you, >> Joshua >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: not available URL: From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 20:12:35 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Fri, 3 Mar 2023 10:12:35 +0800 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: Message-ID: Hi,  This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. Unfortunately, however, another error appeared as: Cxx libraries cannot directly be used with C as linker. If you don't need the C++ compiler to build external packages or for you application you can run ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers    The attachment is the log file, but some parts are unreadable.  Thanks for your continuous aid! ------------------ Original ------------------ From:  "Satish Balay" -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.txt Type: application/octet-stream Size: 957665 bytes Desc: not available URL: From bsmith at petsc.dev Thu Mar 2 20:27:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 21:27:31 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: Message-ID: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> The compiler is burping out some warning message which confuses configure into thinking there is a problem. cl: ?????? warning D9035 :??experimental:preprocessor???????????????????????????? cl: ?????? warning D9036 :????Zc:preprocessor??????????experimental:preprocessor?? cl: ?????? warning D9002 :?????????-Qwd10161??: Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > On Mar 2, 2023, at 9:12 PM, ??? wrote: > > Hi, > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > Unfortunately, however, another error appeared as: > > Cxx libraries cannot directly be used with C as linker. > If you don't need the C++ compiler to build external packages or for you application you can run > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > The attachment is the log file, but some parts are unreadable. > > Thanks for your continuous aid! > ------------------ Original ------------------ > From: "Satish Balay"; > Date: Fri, Mar 3, 2023 02:13 AM > To: "???"; > Cc: "petsc-users"; > Subject: Re: [petsc-users] Error in configuring PETSc with Cygwin > > On Fri, 3 Mar 2023, ??? wrote: > > > Hi team, > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > 1. PETSc: version 3.18.5 > > 2. VS: version 2019 > > 3. Intel Parallel Studio XE: version 2020 > > 4. 
Cygwin with py3.8 and make (and default installation) > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > where there is no option for mpi. > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > If you are still encountering errors - send us configure.log for the failed build. > > Satish > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > Looking forward your reply! > > > > > > Sinserely, > > FENG. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Mar 2 22:12:45 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 2 Mar 2023 22:12:45 -0600 (CST) Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> Message-ID: <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Perhaps the compilers are installed without english - so we can't read the error messages. > ???? x64 ?? Microsoft (R) C/C++ ????????? 19.29.30147 ?? We test with: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31329 for x64 I guess that's VS2019 vs VS2022? You can try using --with-cxx=0 and see if that works. Satish On Thu, 2 Mar 2023, Barry Smith wrote: > > The compiler is burping out some warning message which confuses configure into thinking there is a problem. > > cl: ?????? warning D9035 :??experimental:preprocessor???????????????????????????? > cl: ?????? warning D9036 :????Zc:preprocessor??????????experimental:preprocessor?? > cl: ?????? warning D9002 :?????????-Qwd10161??: > > Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > > > > > On Mar 2, 2023, at 9:12 PM, ??? wrote: > > > > Hi, > > > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > > > Unfortunately, however, another error appeared as: > > > > Cxx libraries cannot directly be used with C as linker. 
> > If you don't need the C++ compiler to build external packages or for you application you can run > > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > > > The attachment is the log file, but some parts are unreadable. > > > > Thanks for your continuous aid! > > ------------------ Original ------------------ > > From: "Satish Balay"; > > Date: Fri, Mar 3, 2023 02:13 AM > > To: "???"; > > Cc: "petsc-users"; > > Subject: Re: [petsc-users] Error in configuring PETSc with Cygwin > > > > On Fri, 3 Mar 2023, ??? wrote: > > > > > Hi team, > > > > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > > > > 1. PETSc: version 3.18.5 > > > 2. VS: version 2019 > > > 3. Intel Parallel Studio XE: version 2020 > > > 4. Cygwin with py3.8 and make (and default installation) > > > > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > > > > > where there is no option for mpi. > > > > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > > > If you are still encountering errors - send us configure.log for the failed build. > > > > Satish > > > > > > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > > > > Looking forward your reply! > > > > > > > > > Sinserely, > > > FENG. > > > > > > From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 22:29:31 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Fri, 3 Mar 2023 12:29:31 +0800 Subject: [petsc-users] =?utf-8?b?5Zue5aSNOlJlOiAgRXJyb3IgaW4gY29uZmlndXJp?= =?utf-8?q?ng_PETSc_with_Cygwin?= Message-ID: My program is coded in C++, I think that it's not a good choice for cxx=0, whatever the configuration in this way works or not. Anyway, I'll search for getting VS2022 and retry installation. 
--------------????-------------- ????"Satish Balay " From jchristopher at anl.gov Fri Mar 3 11:24:32 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Fri, 3 Mar 2023 17:24:32 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG [Untitled.png] Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. 
You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: Untitled.png URL: From pierre at joliv.et Fri Mar 3 11:45:05 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 3 Mar 2023 18:45:05 +0100 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. 
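A minimal sketch of what such an outer fieldsplit could look like on the command line, assuming the two fields (potential and charge density) are exposed to the preconditioner through PCFieldSplitSetIS or a DM; the executable name and the inner solver choices are placeholders, not a setup taken from this thread.

```shell
# Outer fieldsplit sketch for the coupled Poisson / convection-diffusion
# system; "./my_app" is a placeholder, and the split assumes the two fields
# (potential, charge density) are registered with the PC as fields 0 and 1.
mpiexec -n 100 ./my_app -ksp_type fgmres \
  -pc_type fieldsplit -pc_fieldsplit_type multiplicative \
  -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type hypre \
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type bjacobi -fieldsplit_1_sub_pc_type ilu
```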
Thanks, Pierre > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users wrote: > > I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. > > Thank you, > Joshua > From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM > To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > > ? > > Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? > > Is epsilon bounded away from 0? > >> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: >> >> Hi Barry and Mark, >> >> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >> >> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >> >> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >> >> Thank you again, >> Joshua >> From: Barry Smith >> Sent: Thursday, March 2, 2023 7:47 AM >> To: Christopher, Joshua >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> >> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. 
If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >> >> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >> >> Barry >> >> >>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: >>> >>> Hello, >>> >>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>> >>> -ksp_type gmres >>> -pc_type hypre >>> -pc_hypre_type boomeramg >>> >>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>> >>> -ksp_view_pre >>> -ksp_view >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_test_null_space >>> >>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >>> >>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>> >>> Do you have any advice on speeding up the convergence of this system? >>> >>> Thank you, >>> Joshua >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: not available URL: From danyang.su at gmail.com Fri Mar 3 18:36:10 2023 From: danyang.su at gmail.com (danyang.su at gmail.com) Date: Fri, 3 Mar 2023 16:36:10 -0800 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? Message-ID: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? Thanks, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 3 22:58:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 4 Mar 2023 05:58:05 +0100 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Message-ID: On Sat, Mar 4, 2023 at 1:35 AM wrote: > Hi All, > > > > I get a very strange error after upgrading PETSc version to 3.18.3, > indicating some object is already free. The error is benign and does not > crash the code. There is no error before PETSc 3.17.5 versions. > We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt > > > !Check coordinates > > call DMGetCoordinateDM(dmda_flow%da,cda,ierr) > > CHKERRQ(ierr) > > call DMGetCoordinates(dmda_flow%da,gc,ierr) > > CHKERRQ(ierr) > > call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) > > CHKERRQ(ierr) > > call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) > > CHKERRQ(ierr) > > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind > > [0]PETSC ERROR: Object already free: Parameter # 1 > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 > > [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named > starblazer by dsu Fri Mar 3 16:26:03 2023 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-scalapack > --download-parmetis --download-metis --download-mumps --download-ptscotch > --download-chaco --download-fblaslapack --download-hypre > --download-superlu_dist --download-hdf5=yes --download-ctetgen > --download-zlib --download-pnetcdf --download-cmake > --with-hdf5-fortran-bindings --with-debugging=1 > > [0]PETSC ERROR: #1 VecGetArrayRead() at > /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 > > [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at > /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 > > [0]PETSC ERROR: #3 > /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 > > > > Any suggestion on this? > > > > Thanks, > > > > Danyang > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fengshw3 at mail2.sysu.edu.cn Sat Mar 4 06:22:01 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Sat, 4 Mar 2023 20:22:01 +0800 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Message-ID: Hi, VS2022 really sovled this error and there is no more error with the compiler, this is a good news! However, a new problem comes with the link option for MS-MPI (since MPICH2 doesn't work): I've made reference to PETSc website and downloaded MS-MPI in directory D:\MicrosoftMPI and D:\MicrosoftSDKs to avoid space (by the way, method on https://petsc.org/release/install/windows/ which use shortname for a path is not useful anymore for win10 because shortname doesn't exist, see [1]). I have no idea if my tying format is not correct since the PETSc website doesn't show the coding for two include directories. Below is my typing: ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --with-shared-libraries=0 --with-mpi-include='[/cygdrive/d/MicrosoftSDKs/MPI/Include,/cygdrive/d/MicrosoftSDKs/MPI/Include/x64]' --with-mpi-lib=-L"/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64 msmpifec.lib msmpi.lib" --with-mpiexec="/cygdrive/d/MicrosoftMPI/Bin/mpiexec" This ends up with the error information: --with-mpi-lib=['-L/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64', 'msmpifec.lib', 'msmpi.lib'] and --with-mpi-include=['/cygdrive/d/MicrosoftSDKs/MPI/Include', '/cygdrive/d/MicrosoftSDKs/MPI/Include/x64'] did not work This may not be a very delicacy problem? And I am voluntary to make a summary about this installation once it succeed. Sorry for always bother with problem, FENG [1] https://superuser.com/questions/348079/how-can-i-find-the-short-path-of-a-windows-directory-file     ------------------ Original ------------------ From:  "Satish Balay" -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log.txt Type: application/octet-stream Size: 1218733 bytes Desc: not available URL: From yangzongze at gmail.com Sat Mar 4 07:30:38 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 21:30:38 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 Message-ID: Hi, I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): ```shell [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 ``` Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. ```shell (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning -- type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: external -- type: mumps rows=1050625, cols=1050625 package used to perform factorization: mumps total: nonzeros=115025949, allocated nonzeros=115025949 -- type: mpiaij rows=1050625, cols=1050625 total: nonzeros=7346177, allocated nonzeros=7346177 total number of mallocs used during MatSetValues calls=0 (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning -- type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: external -- type: mumps rows=1050625, cols=1050625 package used to perform factorization: mumps total: nonzeros=115377847, allocated nonzeros=115377847 -- type: mpiaij rows=1050625, cols=1050625 total: nonzeros=7346177, allocated nonzeros=7346177 total number of mallocs used during MatSetValues calls=0 ``` I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_mumps.py Type: text/x-python Size: 763 bytes Desc: not available URL: From pierre at joliv.et Sat Mar 4 07:37:15 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 14:37:15 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: > > Hi, > > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): > ```shell > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 > ``` > > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. > > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: > > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? Exact factorizations introduce fill-in. The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. Thanks, Pierre > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. > > ```shell > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > -- > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: external > -- > type: mumps > rows=1050625, cols=1050625 > package used to perform factorization: mumps > total: nonzeros=115025949, allocated nonzeros=115025949 > -- > type: mpiaij > rows=1050625, cols=1050625 > total: nonzeros=7346177, allocated nonzeros=7346177 > total number of mallocs used during MatSetValues calls=0 > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > -- > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: external > -- > type: mumps > rows=1050625, cols=1050625 > package used to perform factorization: mumps > total: nonzeros=115377847, allocated nonzeros=115377847 > -- > type: mpiaij > rows=1050625, cols=1050625 > total: nonzeros=7346177, allocated nonzeros=7346177 > total number of mallocs used during MatSetValues calls=0 > ``` > > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. > > Best wishes, > Zongze > From yangzongze at gmail.com Sat Mar 4 07:51:09 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 21:51:09 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: > > > > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: > > > > Hi, > > > > I am writing to seek your advice regarding a problem I encountered while > using multigrid to solve a certain issue. > > I am currently using multigrid with the coarse problem solved by PCLU. > However, the PC failed randomly with the error below (the value of INFO(2) > may differ): > > ```shell > > [ 0] Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=36 > > ``` > > > > Upon checking the documentation of MUMPS, I discovered that increasing > the value of ICNTL(14) may help resolve the issue. Specifically, I set the > option -mat_mumps_icntl_14 to a higher value (such as 40), and the error > seemed to disappear after I set the value of ICNTL(14) to 80. However, I am > still curious as to why MUMPS failed randomly in the first place. > > > > Upon further inspection, I found that the number of nonzeros of the > PETSc matrix and the MUMPS matrix were different every time I ran the code. > I am now left with the following questions: > > > > 1. What could be causing the number of nonzeros of the MUMPS matrix to > change every time I run the code? > > Is the Mat being fed to MUMPS distributed on a communicator of size > greater than one? > If yes, then, depending on the pivoting and the renumbering, you may get > non-deterministic results. > Hi, Pierre, Thank you for your prompt reply. Yes, the size of the communicator is greater than one. Even if the size of the communicator is equal, are the results still non-deterministic? Can I assume the Mat being fed to MUMPS is the same in this case? Is the pivoting and renumbering all done by MUMPS other than PETSc? > > 2. Why is the number of nonzeros of the MUMPS matrix significantly > greater than that of the PETSc matrix (as seen in the output of ksp_view, > 115025949 vs 7346177)? > > Exact factorizations introduce fill-in. > The number of nonzeros you are seeing for MUMPS is the number of nonzeros > in the factors. > > > 3. Is it possible that the varying number of nonzeros of the MUMPS > matrix is the cause of the random failure? > > Yes, MUMPS uses dynamic scheduling, which will depend on numerical > pivoting, and which may generate factors with different number of nonzeros. > Got it. Thank you for your clear explanation. Zongze > Thanks, > Pierre > > I have attached a test example written in Firedrake. The output of > `ksp_view` after running the code twice is included below for your > reference. > > In the output, the number of nonzeros of the MUMPS matrix was 115025949 > and 115377847, respectively, while that of the PETSc matrix was only > 7346177. 
> > > > ```shell > > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view > ::ascii_info_detail | grep -A3 "type: " > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > -- > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: external > > -- > > type: mumps > > rows=1050625, cols=1050625 > > package used to perform factorization: mumps > > total: nonzeros=115025949, allocated nonzeros=115025949 > > -- > > type: mpiaij > > rows=1050625, cols=1050625 > > total: nonzeros=7346177, allocated nonzeros=7346177 > > total number of mallocs used during MatSetValues calls=0 > > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view > ::ascii_info_detail | grep -A3 "type: " > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > -- > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: external > > -- > > type: mumps > > rows=1050625, cols=1050625 > > package used to perform factorization: mumps > > total: nonzeros=115377847, allocated nonzeros=115377847 > > -- > > type: mpiaij > > rows=1050625, cols=1050625 > > total: nonzeros=7346177, allocated nonzeros=7346177 > > total number of mallocs used during MatSetValues calls=0 > > ``` > > > > I would greatly appreciate any insights you may have on this matter. > Thank you in advance for your time and assistance. > > > > Best wishes, > > Zongze > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sat Mar 4 08:03:10 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 15:03:10 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: > On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: > > > > On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet > wrote: >> >> >> > On 4 Mar 2023, at 2:30 PM, Zongze Yang > wrote: >> > >> > Hi, >> > >> > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. >> > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): >> > ```shell >> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 >> > ``` >> > >> > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. >> > >> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: >> > >> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? >> >> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? >> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. > > Hi, Pierre, > Thank you for your prompt reply. Yes, the size of the communicator is greater than one. 
> Even if the size of the communicator is equal, are the results still non-deterministic? In the most general case, yes. > Can I assume the Mat being fed to MUMPS is the same in this case? Are you doing algebraic or geometric multigrid? Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG? If it?s the latter, I believe the Mat being fed to MUMPS should always be the same. If it?s the former, you?ll have to ask the Firedrake people if there may be non-determinism in the coarsening process. > Is the pivoting and renumbering all done by MUMPS other than PETSc? You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. Thanks, Pierre >> >> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? >> >> Exact factorizations introduce fill-in. >> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. >> >> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? >> >> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. > > Got it. Thank you for your clear explanation. > Zongze > >> >> Thanks, >> Pierre >> >> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. >> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. >> > >> > ```shell >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115025949, allocated nonzeros=115025949 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115377847, allocated nonzeros=115377847 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > ``` >> > >> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. >> > >> > Best wishes, >> > Zongze >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yangzongze at gmail.com Sat Mar 4 08:26:00 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 22:26:00 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: > > > On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: > > > > On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: > >> >> >> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >> > >> > Hi, >> > >> > I am writing to seek your advice regarding a problem I encountered >> while using multigrid to solve a certain issue. >> > I am currently using multigrid with the coarse problem solved by PCLU. >> However, the PC failed randomly with the error below (the value of INFO(2) >> may differ): >> > ```shell >> > [ 0] Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-9, INFO(2)=36 >> > ``` >> > >> > Upon checking the documentation of MUMPS, I discovered that increasing >> the value of ICNTL(14) may help resolve the issue. Specifically, I set the >> option -mat_mumps_icntl_14 to a higher value (such as 40), and the error >> seemed to disappear after I set the value of ICNTL(14) to 80. However, I am >> still curious as to why MUMPS failed randomly in the first place. >> > >> > Upon further inspection, I found that the number of nonzeros of the >> PETSc matrix and the MUMPS matrix were different every time I ran the code. >> I am now left with the following questions: >> > >> > 1. What could be causing the number of nonzeros of the MUMPS matrix to >> change every time I run the code? >> >> Is the Mat being fed to MUMPS distributed on a communicator of size >> greater than one? >> If yes, then, depending on the pivoting and the renumbering, you may get >> non-deterministic results. >> > > Hi, Pierre, > Thank you for your prompt reply. Yes, the size of the communicator is > greater than one. > Even if the size of the communicator is equal, are the results > still non-deterministic? > > > In the most general case, yes. > > Can I assume the Mat being fed to MUMPS is the same in this case? > > > Are you doing algebraic or geometric multigrid? > Are the prolongation operators computed by Firedrake or by PETSc, e.g., > through GAMG? > If it?s the latter, I believe the Mat being fed to MUMPS should always be > the same. > If it?s the former, you?ll have to ask the Firedrake people if there may > be non-determinism in the coarsening process. > I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. Thanks for your suggestion, I will ask the Firedrake people. > > Is the pivoting and renumbering all done by MUMPS other than PETSc? > > > You could provide your own numbering, but by default, this is outsourced > to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. > I think I won't do this. By the way, does the result of superlu_dist have a similar non-deterministic? Thanks, Zongze > Thanks, > Pierre > > >> > 2. Why is the number of nonzeros of the MUMPS matrix significantly >> greater than that of the PETSc matrix (as seen in the output of ksp_view, >> 115025949 vs 7346177)? >> >> Exact factorizations introduce fill-in. >> The number of nonzeros you are seeing for MUMPS is the number of nonzeros >> in the factors. >> >> > 3. Is it possible that the varying number of nonzeros of the MUMPS >> matrix is the cause of the random failure? 
>> >> Yes, MUMPS uses dynamic scheduling, which will depend on numerical >> pivoting, and which may generate factors with different number of nonzeros. >> > > Got it. Thank you for your clear explanation. > Zongze > > >> Thanks, >> Pierre > > >> > I have attached a test example written in Firedrake. The output of >> `ksp_view` after running the code twice is included below for your >> reference. >> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 >> and 115377847, respectively, while that of the PETSc matrix was only >> 7346177. >> > >> > ```shell >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >> ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115025949, allocated nonzeros=115025949 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >> ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115377847, allocated nonzeros=115377847 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > ``` >> > >> > I would greatly appreciate any insights you may have on this matter. >> Thank you in advance for your time and assistance. >> > >> > Best wishes, >> > Zongze >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Mar 4 08:30:29 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 4 Mar 2023 08:30:29 -0600 (CST) Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Message-ID: <756c6920-aee1-df97-3b33-0a87079fa3be@mcs.anl.gov> > Defined "SIZEOF_VOID_P" to "4" It won't work with 32bit compilers. Can you use 64bit compilers [with 64bit ms-mpi]? BTW: Our testing is on Windows10 and short paths do work. But yeah - if you an avoid spaces - thats one way to simplify. https://gitlab.com/petsc/petsc/-/jobs/3873889443 MPI: Version: 2 Includes: -I/cygdrive/c/PROGRA~2/MICROS~3/MPI/Include -I/cygdrive/c/PROGRA~2/MICROS~3/MPI/Include/x64 Libraries: /cygdrive/c/PROGRA~2/MICROS~3/MPI/lib/x64/msmpifec.lib /cygdrive/c/PROGRA~2/MICROS~3/MPI/lib/x64/msmpi.lib mpiexec: /cygdrive/c/PROGRA~1/MICROS~3/Bin/mpiexec.exe Satish On Sat, 4 Mar 2023, ??? wrote: > Hi, > > > VS2022 really sovled this error and there is no more error with the compiler, this is a good news! 
> > > However, a new problem comes with the link option for MS-MPI (since MPICH2 doesn't work): > > > I've made reference to PETSc website and downloaded MS-MPI in directory D:\MicrosoftMPI and D:\MicrosoftSDKs to avoid space (by the way, method on https://petsc.org/release/install/windows/ which use shortname for a path is not useful anymore for win10 because shortname doesn't exist, see [1]). I have no idea if my tying format is not correct since the PETSc website doesn't show the coding for two include directories. Below is my typing: > > > ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --with-shared-libraries=0 --with-mpi-include='[/cygdrive/d/MicrosoftSDKs/MPI/Include,/cygdrive/d/MicrosoftSDKs/MPI/Include/x64]' --with-mpi-lib=-L"/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64 msmpifec.lib msmpi.lib" --with-mpiexec="/cygdrive/d/MicrosoftMPI/Bin/mpiexec" > > > This ends up with the error information: > > > --with-mpi-lib=['-L/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64', 'msmpifec.lib', 'msmpi.lib'] and > --with-mpi-include=['/cygdrive/d/MicrosoftSDKs/MPI/Include', '/cygdrive/d/MicrosoftSDKs/MPI/Include/x64'] did not work > > > This may not be a very delicacy problem? And I am voluntary to make a summary about this installation once it succeed. > > > Sorry for always bother with problem, > FENG > > > > [1] https://superuser.com/questions/348079/how-can-i-find-the-short-path-of-a-windows-directory-file >   >   > ------------------ Original ------------------ > From:  "Satish Balay" Date:  Fri, Mar 3, 2023 12:12 PM > To:  "Barry Smith" Cc:  "???" Subject:  Re: [petsc-users] Error in configuring PETSc with Cygwin > >   > > Perhaps the compilers are installed without english - so we can't read the error messages. > > >   x64   Microsoft (R) C/C++  ?  19.29.30147  > > We test with: > > Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31329 for x64 > > I guess that's VS2019 vs VS2022? > > You can try using --with-cxx=0 and see if that works. > > Satish > > On Thu, 2 Mar 2023, Barry Smith wrote: > > > > >    The compiler is burping out some warning message which confuses configure into thinking there is a problem. > > > > cl:   warning D9035 : experimental:preprocessor ? ? ? ?? ? > > cl:   warning D9036 :? ? Zc:preprocessor ? ? experimental:preprocessor > > cl:   warning D9002 : ??? ?-Qwd10161 : > > > > Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > > > > > > > > > On Mar 2, 2023, at 9:12 PM, ??? > > > > > Hi, > > > > > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > > > > > Unfortunately, however, another error appeared as: > > > > > > Cxx libraries cannot directly be used with C as linker. > > > If you don't need the C++ compiler to build external packages or for you application you can run > > > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > >  > > >  The attachment is the log file, but some parts are unreadable. > > > > > > Thanks for your continuous aid! > > > ------------------ Original ------------------ > > > From:  "Satish Balay" > > Date:  Fri, Mar 3, 2023 02:13 AM > > > To:  "???" > > Cc:  "petsc-users" > > Subject:  Re: [petsc-users] Error in configuring PETSc with Cygwin > > >  > > > On Fri, 3 Mar 2023, ??? 
wrote: > > > > > > > Hi team, > > > > > > > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > > > > > > > 1. PETSc: version 3.18.5 > > > > 2. VS: version 2019 > > > > 3. Intel Parallel Studio XE: version 2020 > > > > 4. Cygwin with py3.8 and make (and default installation) > > > > > > > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > > > > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > > > > > > > > > where there is no option for mpi. > > > > > > > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > > > to submit jobs you will need to configure using ./configure with the additional option&nbsp; --with-batch. > > > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > > > > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > > > > > If you are still encountering errors - send us configure.log for the failed build. > > > > > > Satish > > > > > > > > > > > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > > > > > > > Looking forward your reply! > > > > > > > > > > > > Sinserely, > > > > FENG. > > > > > > > > > From pierre at joliv.et Sat Mar 4 09:09:38 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 16:09:38 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> > On 4 Mar 2023, at 3:26 PM, Zongze Yang wrote: > > ? > > >> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: >> >> >>>> On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: >>>> >>>> >>>> >>>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: >>>>> >>>>> >>>>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >>>>> > >>>>> > Hi, >>>>> > >>>>> > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. >>>>> > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): >>>>> > ```shell >>>>> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 >>>>> > ``` >>>>> > >>>>> > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. 
Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. >>>>> > >>>>> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: >>>>> > >>>>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? >>>>> >>>>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? >>>>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. >>>> >>>> Hi, Pierre, >>>> Thank you for your prompt reply. Yes, the size of the communicator is greater than one. >>>> Even if the size of the communicator is equal, are the results still non-deterministic? >>> >>> In the most general case, yes. >>> >>> Can I assume the Mat being fed to MUMPS is the same in this case? >> >> Are you doing algebraic or geometric multigrid? >> Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG? >> If it?s the latter, I believe the Mat being fed to MUMPS should always be the same. >> If it?s the former, you?ll have to ask the Firedrake people if there may be non-determinism in the coarsening process. > > I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. > Thanks for your suggestion, I will ask the Firedrake people. > >> >>> Is the pivoting and renumbering all done by MUMPS other than PETSc? >> >> You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. > > I think I won't do this. > By the way, does the result of superlu_dist have a similar non-deterministic? SuperLU_DIST uses static pivoting as far as I know, so it may be more deterministic. Thanks, Pierre > Thanks, > Zongze > >> >> Thanks, >> Pierre >> >>>> >>>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? >>>> >>>> Exact factorizations introduce fill-in. >>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. >>>> >>>> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? >>>> >>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. >>> >>> Got it. Thank you for your clear explanation. >>> Zongze >>> >>>> >>>> Thanks, >>>> Pierre >>>> >>>> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. >>>> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. >>>> > >>>> > ```shell >>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >>>> > type: preonly >>>> > maximum iterations=10000, initial guess is zero >>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>>> > left preconditioning >>>> > -- >>>> > type: lu >>>> > out-of-place factorization >>>> > tolerance for zero pivot 2.22045e-14 >>>> > matrix ordering: external >>>> > -- >>>> > type: mumps >>>> > rows=1050625, cols=1050625 >>>> > package used to perform factorization: mumps >>>> > total: nonzeros=115025949, allocated nonzeros=115025949 >>>> > -- >>>> > type: mpiaij >>>> > rows=1050625, cols=1050625 >>>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>>> > total number of mallocs used during MatSetValues calls=0 >>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >>>> > type: preonly >>>> > maximum iterations=10000, initial guess is zero >>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> > left preconditioning >>>> > -- >>>> > type: lu >>>> > out-of-place factorization >>>> > tolerance for zero pivot 2.22045e-14 >>>> > matrix ordering: external >>>> > -- >>>> > type: mumps >>>> > rows=1050625, cols=1050625 >>>> > package used to perform factorization: mumps >>>> > total: nonzeros=115377847, allocated nonzeros=115377847 >>>> > -- >>>> > type: mpiaij >>>> > rows=1050625, cols=1050625 >>>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>>> > total number of mallocs used during MatSetValues calls=0 >>>> > ``` >>>> > >>>> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. >>>> > >>>> > Best wishes, >>>> > Zongze >>>> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Mar 4 09:45:03 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 23:45:03 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> References: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> Message-ID: Thanks, I will give it a try. Best wishes, Zongze On Sat, 4 Mar 2023 at 23:09, Pierre Jolivet wrote: > > > On 4 Mar 2023, at 3:26 PM, Zongze Yang wrote: > > ? > > > On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: > >> >> >> On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: >> >> >> >> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: >> >>> >>> >>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >>> > >>> > Hi, >>> > >>> > I am writing to seek your advice regarding a problem I encountered >>> while using multigrid to solve a certain issue. >>> > I am currently using multigrid with the coarse problem solved by PCLU. >>> However, the PC failed randomly with the error below (the value of INFO(2) >>> may differ): >>> > ```shell >>> > [ 0] Error reported by MUMPS in numerical factorization phase: >>> INFOG(1)=-9, INFO(2)=36 >>> > ``` >>> > >>> > Upon checking the documentation of MUMPS, I discovered that increasing >>> the value of ICNTL(14) may help resolve the issue. Specifically, I set the >>> option -mat_mumps_icntl_14 to a higher value (such as 40), and the error >>> seemed to disappear after I set the value of ICNTL(14) to 80. However, I am >>> still curious as to why MUMPS failed randomly in the first place. >>> > >>> > Upon further inspection, I found that the number of nonzeros of the >>> PETSc matrix and the MUMPS matrix were different every time I ran the code. >>> I am now left with the following questions: >>> > >>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to >>> change every time I run the code? 
>>> >>> Is the Mat being fed to MUMPS distributed on a communicator of size >>> greater than one? >>> If yes, then, depending on the pivoting and the renumbering, you may get >>> non-deterministic results. >>> >> >> Hi, Pierre, >> Thank you for your prompt reply. Yes, the size of the communicator is >> greater than one. >> Even if the size of the communicator is equal, are the results >> still non-deterministic? >> >> >> In the most general case, yes. >> >> Can I assume the Mat being fed to MUMPS is the same in this case? >> >> >> Are you doing algebraic or geometric multigrid? >> Are the prolongation operators computed by Firedrake or by PETSc, e.g., >> through GAMG? >> If it?s the latter, I believe the Mat being fed to MUMPS should always be >> the same. >> If it?s the former, you?ll have to ask the Firedrake people if there may >> be non-determinism in the coarsening process. >> > > I am using geometric multigrid, and the prolongation operators, I think, > are computed by Firedrake. > Thanks for your suggestion, I will ask the Firedrake people. > > >> >> Is the pivoting and renumbering all done by MUMPS other than PETSc? >> >> >> You could provide your own numbering, but by default, this is outsourced >> to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. >> > > I think I won't do this. > By the way, does the result of superlu_dist have a similar > non-deterministic? > > > SuperLU_DIST uses static pivoting as far as I know, so it may be more > deterministic. > > Thanks, > Pierre > > Thanks, > Zongze > > >> Thanks, >> Pierre >> >> >>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly >>> greater than that of the PETSc matrix (as seen in the output of ksp_view, >>> 115025949 vs 7346177)? >>> >>> Exact factorizations introduce fill-in. >>> The number of nonzeros you are seeing for MUMPS is the number of >>> nonzeros in the factors. >>> >>> > 3. Is it possible that the varying number of nonzeros of the MUMPS >>> matrix is the cause of the random failure? >>> >>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical >>> pivoting, and which may generate factors with different number of nonzeros. >>> >> >> Got it. Thank you for your clear explanation. >> Zongze >> >> >>> Thanks, >>> Pierre >> >> >>> > I have attached a test example written in Firedrake. The output of >>> `ksp_view` after running the code twice is included below for your >>> reference. >>> > In the output, the number of nonzeros of the MUMPS matrix was >>> 115025949 and 115377847, respectively, while that of the PETSc matrix was >>> only 7346177. >>> > >>> > ```shell >>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >>> ::ascii_info_detail | grep -A3 "type: " >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> > left preconditioning >>> > -- >>> > type: lu >>> > out-of-place factorization >>> > tolerance for zero pivot 2.22045e-14 >>> > matrix ordering: external >>> > -- >>> > type: mumps >>> > rows=1050625, cols=1050625 >>> > package used to perform factorization: mumps >>> > total: nonzeros=115025949, allocated nonzeros=115025949 >>> > -- >>> > type: mpiaij >>> > rows=1050625, cols=1050625 >>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>> > total number of mallocs used during MatSetValues calls=0 >>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >>> ::ascii_info_detail | grep -A3 "type: " >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> > left preconditioning >>> > -- >>> > type: lu >>> > out-of-place factorization >>> > tolerance for zero pivot 2.22045e-14 >>> > matrix ordering: external >>> > -- >>> > type: mumps >>> > rows=1050625, cols=1050625 >>> > package used to perform factorization: mumps >>> > total: nonzeros=115377847, allocated nonzeros=115377847 >>> > -- >>> > type: mpiaij >>> > rows=1050625, cols=1050625 >>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>> > total number of mallocs used during MatSetValues calls=0 >>> > ``` >>> > >>> > I would greatly appreciate any insights you may have on this matter. >>> Thank you in advance for your time and assistance. >>> > >>> > Best wishes, >>> > Zongze >>> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Sat Mar 4 18:38:15 2023 From: danyang.su at gmail.com (Danyang Su) Date: Sat, 04 Mar 2023 16:38:15 -0800 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Message-ID: <64E48C68-62C7-4624-9D35-63F604DA3C3C@gmail.com> Hi Matt, Attached is the source code and example. I have deleted most of the unused source code but it is still a bit length. Sorry about that. The errors come after DMGetLocalBoundingBox and DMGetBoundingBox. -> To compile the code Please type 'make exe' and the executable file petsc_bounding will be created under the same folder. -> To test the code Please go to fold 'test' and type 'mpiexec -n 1 ../petsc_bounding'. -> The output from PETSc 3.18, error information input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../petsc_bounding on a linux-gnu-dbg named starblazer by dsu Sat Mar? 
4 16:20:51 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/bug-check/petsc_bounding/src/solver_ddmethod.F90:1920 Total volume of simulation domain?? 0.20000000E+01 Total volume of simulation domain?? 0.20000000E+01 -> The output from PETSc 3.17 and earlier, no error input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ Total volume of simulation domain?? 0.20000000E+01 Total volume of simulation domain?? 0.20000000E+01 Thanks, Danyang From: Matthew Knepley Date: Friday, March 3, 2023 at 8:58 PM To: Cc: Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? On Sat, Mar 4, 2023 at 1:35?AM wrote: Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? 
Thanks, Danyang -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_bounding_check.tar.gz Type: application/x-gzip Size: 77551 bytes Desc: not available URL: From yangzongze at gmail.com Sun Mar 5 02:14:12 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sun, 5 Mar 2023 16:14:12 +0800 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` Message-ID: Hello, I am trying to catch the "not converged" error in a loop with the `ksp_error_if_not_converged` option on. However, it seems that PETSc only raises the exception once, even though the solver does not converge after that. Is this expected behavior? Can I make it raise an exception every time? I have included a code snippet of the loop below, and the complete code is attached: ```python for i in range(3): printf(f"Loop i = {i}") try: solver.solve() except ConvergenceError: printf(f" Error from Firedrake: solver did not converged: {get_ksp_reason(solver)}") except PETSc.Error as e: if e.ierr == 91: printf(f" Error from PETSc: solver did not converged: {get_ksp_reason(solver)}") else: raise ``` The output of the code looks like this: ```python (complex-int32-mkl) $ python test_error.py Loop i = 0 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from PETSc: solver did not converged: DIVERGED_MAX_IT Loop i = 1 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from Firedrake: solver did not converged: DIVERGED_MAX_IT Loop i = 2 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from Firedrake: solver did not converged: DIVERGED_MAX_IT ``` Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_error.py Type: application/octet-stream Size: 1476 bytes Desc: not available URL: From knepley at gmail.com Sun Mar 5 12:40:04 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Mar 2023 13:40:04 -0500 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` In-Reply-To: References: Message-ID: On Sun, Mar 5, 2023 at 3:14?AM Zongze Yang wrote: > > > Hello, > > I am trying to catch the "not converged" error in a loop with the > `ksp_error_if_not_converged` option on. However, it seems that PETSc only > raises the exception once, even though the solver does not converge after > that. Is this expected behavior? Can I make it raise an exception every > time? > When an error is raised, we do not guarantee a consistent state for recovery, so errors terminate the program. If you want to do something useful with non-convergence, then you do not set -ksp_error_if_not_converged. Rather you check the convergence code, and if it is not convergence, you take your action. 
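To make the suggestion above concrete, a minimal petsc4py sketch of acting on the convergence reason after each solve, instead of setting -ksp_error_if_not_converged and catching the error, could look like this. The helper name and the surrounding loop are illustrative assumptions, not code from this thread:

```python
from petsc4py import PETSc

def solve_and_report(ksp, b, x):
    # Query the reason after the solve instead of asking PETSc to error out.
    ksp.solve(b, x)
    reason = ksp.getConvergedReason()
    if reason == PETSc.KSP.ConvergedReason.DIVERGED_MAX_IT:
        print("hit the iteration limit after", ksp.getIterationNumber(), "iterations")
        return False
    if reason < 0:
        # Any negative reason code is a divergence.
        print("solve diverged, reason code", reason)
        return False
    return True

# inside the outer loop:
# for i in range(3):
#     if not solve_and_report(ksp, b, x):
#         pass  # adjust the setup and retry instead of aborting
```

The reason codes are the same ones reported by -ksp_converged_reason, so this check mirrors what the log lines in the quoted snippet print.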
Thanks, Matt > I have included a code snippet of the loop below, and the complete code is > attached: > ```python > for i in range(3): > printf(f"Loop i = {i}") > try: > solver.solve() > except ConvergenceError: > printf(f" Error from Firedrake: solver did not converged: > {get_ksp_reason(solver)}") > except PETSc.Error as e: > if e.ierr == 91: > printf(f" Error from PETSc: solver did not converged: > {get_ksp_reason(solver)}") > else: > raise > ``` > > The output of the code looks like this: > ```python > (complex-int32-mkl) $ python test_error.py > Loop i = 0 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from PETSc: solver did not converged: DIVERGED_MAX_IT > Loop i = 1 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from Firedrake: solver did not converged: DIVERGED_MAX_IT > Loop i = 2 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from Firedrake: solver did not converged: DIVERGED_MAX_IT > ``` > > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Tue Mar 7 00:09:52 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Tue, 7 Mar 2023 14:09:52 +0800 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` In-Reply-To: References: Message-ID: Thank you for your suggestion. Best wishes, Zongze On Mon, 6 Mar 2023 at 02:40, Matthew Knepley wrote: > On Sun, Mar 5, 2023 at 3:14?AM Zongze Yang wrote: > >> >> >> Hello, >> >> I am trying to catch the "not converged" error in a loop with the >> `ksp_error_if_not_converged` option on. However, it seems that PETSc only >> raises the exception once, even though the solver does not converge after >> that. Is this expected behavior? Can I make it raise an exception every >> time? >> > > When an error is raised, we do not guarantee a consistent state for > recovery, so errors terminate the program. If you want > to do something useful with non-convergence, then you do not set > -ksp_error_if_not_converged. Rather you check the convergence > code, and if it is not convergence, you take your action. 
> > Thanks, > > Matt > > >> I have included a code snippet of the loop below, and the complete code >> is attached: >> ```python >> for i in range(3): >> printf(f"Loop i = {i}") >> try: >> solver.solve() >> except ConvergenceError: >> printf(f" Error from Firedrake: solver did not converged: >> {get_ksp_reason(solver)}") >> except PETSc.Error as e: >> if e.ierr == 91: >> printf(f" Error from PETSc: solver did not converged: >> {get_ksp_reason(solver)}") >> else: >> raise >> ``` >> >> The output of the code looks like this: >> ```python >> (complex-int32-mkl) $ python test_error.py >> Loop i = 0 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from PETSc: solver did not converged: DIVERGED_MAX_IT >> Loop i = 1 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from Firedrake: solver did not converged: DIVERGED_MAX_IT >> Loop i = 2 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from Firedrake: solver did not converged: DIVERGED_MAX_IT >> ``` >> >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Wed Mar 8 22:52:41 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Wed, 8 Mar 2023 21:52:41 -0700 Subject: [petsc-users] O3 versus O2 Message-ID: Dear community, We are in the middle of testing a simulator where the main computational bottleneck is solving a linear problem. We do this by calling GMRES+BoomerAMG through PETSc. This is a commercial code, pretended to serve clients with workstations or with access to clusters. Would you recommend O3 versus O2 optimizations? Maybe just to compile the linear algebra libraries? Some years ago, I worked on another project where going back to O2 solved a weird runtime error that I was never able to solve. This triggers my untrust. Thank you for your time! Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Mar 8 23:36:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 08 Mar 2023 22:36:50 -0700 Subject: [petsc-users] O3 versus O2 In-Reply-To: References: Message-ID: <87o7p2zdsd.fsf@jedbrown.org> You can test a benchmark problem with both. It probably doesn't make a lot of difference with the solver configuration you've selected (most of those operations are memory bandwidth limited). If your residual and Jacobian assembly code is written to vectorize, you may get significant benefit from architecture-specific optimizations like -march=skylake. Alfredo Jaramillo writes: > Dear community, > > We are in the middle of testing a simulator where the main computational > bottleneck is solving a linear problem. We do this by calling > GMRES+BoomerAMG through PETSc. > > This is a commercial code, pretended to serve clients with workstations or > with access to clusters. > > Would you recommend O3 versus O2 optimizations? Maybe just to compile the > linear algebra libraries? > > Some years ago, I worked on another project where going back to O2 solved a > weird runtime error that I was never able to solve. This triggers my > untrust. > > Thank you for your time! 
> Alfredo From qingyuanhu at jiangnan.edu.cn Fri Mar 10 03:34:17 2023 From: qingyuanhu at jiangnan.edu.cn (=?utf-8?B?6IOh5riF5YWD?=) Date: Fri, 10 Mar 2023 17:34:17 +0800 Subject: [petsc-users] Questions about vec filter and recover Message-ID: Hi there, I am a fresh user of Petsc, from Jiangnan University. Now I am trying to use Petsc for FEM and topology optimization. Since I use the background pixel elements, some of elements I don't want them to be calculated, so I have to filter them out. Then after my calculation, I want to have them back. For example, in the context of "mpiexec -np 2": I have a Vec xPassive=[1, 1, 0, 0, 0 | 1, 1, 1, 1, 1] showing the design-able elements (1) and the not-design-able elements (0) to be filtered out. This vec is auto sliced into 5+5  by the 2 threads. At the same time, I have a Vec density=[0.0, 0.1, 1.0, 1.0, 1.0 | 0.5, 0.6, 0.7, 0.8, 0.9]. In order to narrow down the density, I make an array and count, like resarray=[0.0, 0.1] with count=2 and resarray=[0.5, 0.6, 0.7, 0.8, 0.9] with count=5, then by the method VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, count, PETSC_DECIDE, resarray, &density_new),  I get Vec density_new = [0.0, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9] successfully. Next, I put the density_new into some methods to get the new values like density_new=[0.01, 0.11, 0.51, 0.61 | 0.71, 0.81, 0.91], note that since the density_new is of size 7, it becomes 4+3 for the 2 threads. Finally, I have to recover them as Vec density_recover=[0.01, 0.11, 1.0, 1.0, 1.0 | 0.51, 0.61, 0.71, 0.81, 0.91], in this process I fill the default 1.0 for the place where xPassive value=0. In the last step, when I try to recover the density vector, I tried to use VecGetValues but it seems can only get local values, cannot cross threads. I tried also to use VecScatterCreate(density_new, NULL, density_recover, idx_to, &scatter), however, my idx_to=[0, 1 | 5, 6, 7, 8, 9] and not works well like normal [0, 1, 5, 6, 7, 8, 9]. Could you help me with this please? Thank you soooooo much for your time! Best regards, Qingyuan HU School of Science, Jiangnan University -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Mar 10 09:32:27 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Mar 2023 10:32:27 -0500 Subject: [petsc-users] Questions about vec filter and recover In-Reply-To: References: Message-ID: On Fri, Mar 10, 2023 at 9:23?AM ??? wrote: > Hi there, > > I am a fresh user of Petsc, from Jiangnan University. Now I am trying to > use Petsc for FEM and topology optimization. > Since I use the background pixel elements, some of elements I don't want > them to be calculated, so I have to filter them out. Then after my > calculation, I want to have them back. > > For example, in the context of "mpiexec -np 2": > I have a Vec xPassive=[1, 1, 0, 0, 0 | 1, 1, 1, 1, 1] showing the > design-able elements (1) and the not-design-able elements (0) to be > filtered out. This vec is auto sliced into 5+5 by the 2 threads. > At the same time, I have a Vec density=[0.0, 0.1, 1.0, 1.0, 1.0 | 0.5, > 0.6, 0.7, 0.8, 0.9]. > In order to narrow down the density, I make an array and count, like > resarray=[0.0, 0.1] with count=2 and resarray=[0.5, 0.6, 0.7, 0.8, 0.9] > with count=5, then by the method VecCreateMPIWithArray(PETSC_COMM_WORLD, > 1, count, PETSC_DECIDE, resarray, &density_new), I get Vec density_new = > [0.0, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9] successfully. 
> Next, I put the density_new into some methods to get the new values like > density_new=[0.01, 0.11, 0.51, 0.61 | 0.71, 0.81, 0.91], note that since > the density_new is of size 7, it becomes 4+3 for the 2 threads. > So the method changes the paralelled decompostion but not the order (strange, but that is fine) > Finally, I have to recover them as Vec density_recover=[0.01, 0.11, 1.0, > 1.0, 1.0 | 0.51, 0.61, 0.71, 0.81, 0.91], in this process I fill the > default 1.0 for the place where xPassive value=0. > > In the last step, when I try to recover the density vector, I tried to use > VecGetValues but it seems can only get local values, cannot cross threads. > Yes, you can only get local values with VecGetValues. > > I tried also to use VecScatterCreate(density_new, NULL, density_recover, > idx_to, &scatter), however, my idx_to=[0, 1 | 5, 6, 7, 8, 9] and not > works well like normal [0, 1, 5, 6, 7, 8, 9]. > Could you help me with this please? Thank you soooooo much for your time! > You have 4+3 so you want your IS to be of that size. One IS can be NULL because you are scattering all values. I think you want: [ 0 1 5 6 | 7 8 9 ] And set the values to 1.0 before the scatter to get your 1.0, 1.0, 1.0 in there. I always just have to play around with this kind of stuff to get it right. Good luck, Mark > > Best regards, > Qingyuan HU > School of Science, Jiangnan University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.w.hester at icloud.com Mon Mar 13 09:58:35 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Mon, 13 Mar 2023 07:58:35 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? Message-ID: Hello everyone, Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. Thanks, Eric From jroman at dsic.upv.es Mon Mar 13 10:10:51 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 13 Mar 2023 16:10:51 +0100 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: References: Message-ID: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. Jose > El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: > > Hello everyone, > > Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? > > For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. > > While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. 
And I don?t see a reference to a matrix shell create method in the petsc4py api. > > If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. > > Thanks, > Eric > > From eric.w.hester at icloud.com Mon Mar 13 11:37:53 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Mon, 13 Mar 2023 09:37:53 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> Message-ID: <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> Ah ok. I see how the poisson2d example works. Thanks for the quick reply. Eric > On Mar 13, 2023, at 08:10, Jose E. Roman wrote: > > Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. > > Jose > > > >> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >> >> Hello everyone, >> >> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >> >> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >> >> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >> >> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >> >> Thanks, >> Eric >> >> > From wuktsinghua at gmail.com Tue Mar 14 06:25:20 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Tue, 14 Mar 2023 12:25:20 +0100 Subject: [petsc-users] KSP for successive linear systems Message-ID: Hi all, Good day! I am trying to solve an optimization problem where I need to solve multiple successive linear systems inside each optimization loop. The matrices are based on the same grid, but their data structure will change for each linear system. Currently I am doing it by setting up just one single KSP object. Then call KSPSetOperators() and KSPSolve() for each solve. This means the KSP object is solving the successive linear systems one by one, and in the next optimization iteration, it starts all over again. I am wondering that should I use separate KSP objects for each linear system so that during optimization the same KSP will be specialized in solving its corresponding system all the time? I use non-zero initial guess, so I pay attention to use different x vectors for different linear systems, so that the x vectors from the previous iteration can be used as initial guesses for linear systems in the next iteration. Not sure whether some similar thing should also be done for KSP? Thanks for your kind help! Best regards, Kai -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Mar 14 07:27:22 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 08:27:22 -0400 Subject: [petsc-users] KSP for successive linear systems In-Reply-To: References: Message-ID: To gain an advantage in reusing the KSP the Mat must be the same size and the Mat must have the same nonzero structure (different numerical values are fine). Otherwise there is no measurable improvement in reusing the same KSP. Barry > On Mar 14, 2023, at 7:25 AM, K. Wu wrote: > > Hi all, > > Good day! > > I am trying to solve an optimization problem where I need to solve multiple successive linear systems inside each optimization loop. The matrices are based on the same grid, but their data structure will change for each linear system. > > Currently I am doing it by setting up just one single KSP object. Then call KSPSetOperators() and KSPSolve() for each solve. This means the KSP object is solving the successive linear systems one by one, and in the next optimization iteration, it starts all over again. > > I am wondering that should I use separate KSP objects for each linear system so that during optimization the same KSP will be specialized in solving its corresponding system all the time? > > I use non-zero initial guess, so I pay attention to use different x vectors for different linear systems, so that the x vectors from the previous iteration can be used as initial guesses for linear systems in the next iteration. Not sure whether some similar thing should also be done for KSP? > > Thanks for your kind help! > > Best regards, > Kai > From wuktsinghua at gmail.com Tue Mar 14 09:06:50 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Tue, 14 Mar 2023 15:06:50 +0100 Subject: [petsc-users] KSP for successive linear systems In-Reply-To: References: Message-ID: Thank you for the clarification. Barry Smith ?2023?3?14??? 13:27??? > > To gain an advantage in reusing the KSP the Mat must be the same size > and the Mat must have the same nonzero structure (different numerical > values are fine). Otherwise there is no measurable improvement in reusing > the same KSP. > > Barry > > > > On Mar 14, 2023, at 7:25 AM, K. Wu wrote: > > > > Hi all, > > > > Good day! > > > > I am trying to solve an optimization problem where I need to solve > multiple successive linear systems inside each optimization loop. The > matrices are based on the same grid, but their data structure will change > for each linear system. > > > > Currently I am doing it by setting up just one single KSP object. Then > call KSPSetOperators() and KSPSolve() for each solve. This means the KSP > object is solving the successive linear systems one by one, and in the next > optimization iteration, it starts all over again. > > > > I am wondering that should I use separate KSP objects for each linear > system so that during optimization the same KSP will be specialized in > solving its corresponding system all the time? > > > > I use non-zero initial guess, so I pay attention to use different x > vectors for different linear systems, so that the x vectors from the > previous iteration can be used as initial guesses for linear systems in the > next iteration. Not sure whether some similar thing should also be done for > KSP? > > > > Thanks for your kind help! > > > > Best regards, > > Kai > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
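As a reference point for the KSP-reuse discussion above, a sketch of reusing one KSP across an optimization loop when the matrix keeps its size and nonzero pattern could look like the following in petsc4py; n_outer, update_values, A, b, and x are placeholders, not code from the thread:

```python
from petsc4py import PETSc

# One KSP reused across the outer loop: A keeps its size and nonzero
# pattern, only the numerical values change between solves.
ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setInitialGuessNonzero(True)      # keep x from the previous iteration
ksp.setFromOptions()

for it in range(n_outer):
    update_values(A)                  # new entries, same structure
    A.assemble()
    ksp.setOperators(A)               # flags the preconditioner for a refresh
    ksp.solve(b, x)                   # x doubles as the next initial guess
```

If the nonzero structure did change between solves, Barry's point is that this reuse buys nothing, and a fresh KSP per system is just as good.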
URL: From pmoschopoulos at outlook.com Tue Mar 14 01:22:03 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 06:22:03 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. Message-ID: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdara at dtu.dk Tue Mar 14 07:48:03 2023 From: jdara at dtu.dk (Jonathan Davud Razi Seyed Mirpourian) Date: Tue, 14 Mar 2023 12:48:03 +0000 Subject: [petsc-users] Dmplex+PetscFe+KSP Message-ID: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Dear Petsc team, I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. I have struggled to do so, as all the examples I found ( for example: https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically assemble the system matrix A. I hope my questions is reasonable. All the best, Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Mar 14 09:40:46 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Mar 2023 07:40:46 -0700 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > I am guessing this observation is seen when you run a parallel job. 
MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave > Thank you very much, > Pantelis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Tue Mar 14 09:59:34 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 14:59:34 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Thank you very much, Pantelis ________________________________ From: Dave May Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 14 10:17:36 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 11:17:36 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Message-ID: KSP/SNES do not automatically assemble the linear system, that is the responsibility of DMPLEX in this case. Thus the process for assembling the matrix is largely the same whether done with KSP or SNES and DMPLEX. 
The difference is, of course, that constructing the linear matrix does not depend on some ?solution? vector as with SNES. Note also you can simply use SNES for a linear problem by selecting the SNESType of SNESKSP; this will just as efficient as using KSP directly. You should be able to locate a SNES example and extract the calls for defining the mesh and building the matrix but using them with KSP. Barry > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via petsc-users wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. > > I have struggled to do so, as all the examples I found ( for example:https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 14 10:21:57 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 11:21:57 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> Yes, you should partition the elements and redistribute them for optimal parallelism. You can use the MatPartitioning object to partition the graph of the elements which will tell you what elements should be assigned to each MPI process. But then you need to move the element information to the correct process. At that point your code will remain pretty much as it is now. Barry > On Mar 14, 2023, at 10:59 AM, Pantelis Moschopoulos wrote: > > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. > > Thank you very much, > Pantelis > From: Dave May > Sent: Tuesday, March 14, 2023 4:40 PM > To: Pantelis Moschopoulos > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. > What I am missing here? 
> > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Tue Mar 14 10:32:35 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 15:32:35 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> References: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> Message-ID: Ok, I will try to implement your suggestions. Thank you very much for your help, Pantelis ________________________________ From: Barry Smith Sent: Tuesday, March 14, 2023 5:21 PM To: Pantelis Moschopoulos Cc: Dave May ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. Yes, you should partition the elements and redistribute them for optimal parallelism. You can use the MatPartitioning object to partition the graph of the elements which will tell you what elements should be assigned to each MPI process. But then you need to move the element information to the correct process. At that point your code will remain pretty much as it is now. Barry On Mar 14, 2023, at 10:59 AM, Pantelis Moschopoulos wrote: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Thank you very much, Pantelis ________________________________ From: Dave May Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. 
Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Mar 14 11:00:39 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Mar 2023 09:00:39 -0700 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout > of the matrix? In my implementation, I read the mesh file, and the I split > the domain where the first rank gets the first N elements, the second rank > gets the next N elements etc. Should I use metis to distribute elements? > > Note that I use continuous finite elements, which means that some values > will be cached in a temporary buffer. > Sure. With CG FE you will always have some DOFs which need to be cached, however the number of cached values will be minimized if you follow Barry's advice. If you do what Barry suggests, only the DOFs which live on the boundary of your element-wise defined sub-domains would need to cached. Thanks, Dave > > Thank you very much, > Pantelis > ------------------------------ > *From:* Dave May > *Sent:* Tuesday, March 14, 2023 4:40 PM > *To:* Pantelis Moschopoulos > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > > > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the > values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the > layout of the DOFs on each mesh sub-domain, then a huge number of values > can potentially be cached. After you call MatAssemblyBegin(), > MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis > > -------------- next part -------------- An HTML attachment was scrubbed... 
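As an illustration of the layout Dave and Barry describe in this thread, here is a minimal petsc4py sketch in which each rank assembles only the element contributions it owns, so that most MatSetValues calls hit locally owned rows and the off-process stash stays small. The names nlocal, N, d_nz, o_nz, my_elements, and element_matrix are placeholders (the original poster works in Fortran, where the corresponding calls have the same meaning):

```python
from petsc4py import PETSc

# nlocal rows of the global N x N system live on this rank; assembling
# mostly rows in [rstart, rend) keeps the off-process stash small.
A = PETSc.Mat().createAIJ(((nlocal, N), (nlocal, N)), comm=PETSc.COMM_WORLD)
A.setPreallocationNNZ((d_nz, o_nz))   # per-row diagonal / off-diagonal estimates
rstart, rend = A.getOwnershipRange()

for e in my_elements:                 # elements assigned to this rank
    rows, block = element_matrix(e)   # global dof indices and dense element matrix
    A.setValues(rows, rows, block, addv=PETSc.InsertMode.ADD_VALUES)

A.assemble()                          # only shared-boundary entries travel between ranks
```

With the elements partitioned the way Barry suggests (for example via MatPartitioning or METIS on the element graph), my_elements would contain mostly elements whose dofs fall inside the rank's ownership range.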
URL: From bsmith at petsc.dev Tue Mar 14 11:11:11 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 12:11:11 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Message-ID: <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> Matt can help you more directly. Barry > On Mar 14, 2023, at 11:40 AM, Jonathan Davud Razi Seyed Mirpourian wrote: > > Dear Barry, > > Thank you very much for the quick answer! > > To my understanding, in the snes examples, it is the call: DMPlexSetSnesLocalFEM that takes care of computing the identities important for snes (jacobian, residual, boundary values). > Is there an equivalent for KSP (just computing the system Matrix A and the rhs b)? I cannot find any DMPlexSetKSPLocalFEM in the docs or am I missing something? > > Also, I was not aware of SNESKSP, so thank you very much for that, it will be my fallback strategy. > > All the best, > Jonathan > > From: Barry Smith > > Sent: 14. marts 2023 16:18 > To: Jonathan Davud Razi Seyed Mirpourian > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Dmplex+PetscFe+KSP > > > KSP/SNES do not automatically assemble the linear system, that is the responsibility of DMPLEX in this case. Thus the process for assembling the matrix is largely the same whether done with KSP or SNES and DMPLEX. The difference is, of course, that constructing the linear matrix does not depend on some ?solution? vector as with SNES. > > Note also you can simply use SNES for a linear problem by selecting the SNESType of SNESKSP; this will just as efficient as using KSP directly. > > You should be able to locate a SNES example and extract the calls for defining the mesh and building the matrix but using them with KSP. > > Barry > > > > > > > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via petsc-users > wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. > > I have struggled to do so, as all the examples I found ( for example:https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.w.hester at icloud.com Tue Mar 14 11:45:00 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Tue, 14 Mar 2023 09:45:00 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> Message-ID: <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> Is there a similar example of how to create shell preconditioners using petsc4py? Thanks, Eric > On Mar 13, 2023, at 09:37, Eric Hester wrote: > > Ah ok. I see how the poisson2d example works. Thanks for the quick reply. > > Eric > >> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: >> >> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. 
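A toy version of the mult-based pattern Jose describes just above might look like the following; the operator, sizes, and solver choices are made up for illustration, and the real FFT-based action would go inside mult(). PETSc picks the parallel row layout, so the same sketch runs in serial or in parallel:

```python
from petsc4py import PETSc

class MyOperator:
    # Matrix-free operator: only the action y = A*x is defined.
    def mult(self, mat, x, y):
        # stand-in for the FFT-based action; here simply A = 2*I
        x.copy(y)
        y.scale(2.0)

n = 100
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setSizes([n, n])
A.setType('python')
A.setPythonContext(MyOperator())
A.setUp()

b = A.createVecLeft(); b.set(1.0)
x = A.createVecRight()

ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('none')           # no assembled entries, so no algebraic PC
ksp.setFromOptions()
ksp.solve(b, x)
```

A shell preconditioner follows the same idea, with a Python context that provides an apply(self, pc, x, y) action instead of mult(); that is the spirit of the ex100 examples linked further down.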
>> >> Jose >> >> >> >>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >>> >>> Hello everyone, >>> >>> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >>> >>> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >>> >>> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >>> >>> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >>> >>> Thanks, >>> Eric >>> >>> >> > From jroman at dsic.upv.es Tue Mar 14 11:49:51 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 14 Mar 2023 17:49:51 +0100 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> Message-ID: <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> Have a look at ex100.c ex100.py: https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.c https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.py Jose > El 14 mar 2023, a las 17:45, Eric Hester escribi?: > > Is there a similar example of how to create shell preconditioners using petsc4py? > > Thanks, > Eric > >> On Mar 13, 2023, at 09:37, Eric Hester wrote: >> >> Ah ok. I see how the poisson2d example works. Thanks for the quick reply. >> >> Eric >> >>> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: >>> >>> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. >>> >>> Jose >>> >>> >>> >>>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >>>> >>>> Hello everyone, >>>> >>>> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >>>> >>>> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >>>> >>>> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >>>> >>>> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >>>> >>>> Thanks, >>>> Eric >>>> >>>> >>> >> > From stefano.zampini at gmail.com Tue Mar 14 12:13:53 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 14 Mar 2023 20:13:53 +0300 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? 
In-Reply-To: <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> Message-ID: You can find other examples at https://gitlab.com/stefanozampini/petscexamples On Tue, Mar 14, 2023, 19:50 Jose E. Roman wrote: > Have a look at ex100.c ex100.py: > > https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.c > > https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.py > > Jose > > > > El 14 mar 2023, a las 17:45, Eric Hester > escribi?: > > > > Is there a similar example of how to create shell preconditioners using > petsc4py? > > > > Thanks, > > Eric > > > >> On Mar 13, 2023, at 09:37, Eric Hester > wrote: > >> > >> Ah ok. I see how the poisson2d example works. Thanks for the quick > reply. > >> > >> Eric > >> > >>> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: > >>> > >>> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices > via a mult(self,mat,X,Y) function defined in the python side. Another > example is ex3.py in slepc4py. > >>> > >>> Jose > >>> > >>> > >>> > >>>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > >>>> > >>>> Hello everyone, > >>>> > >>>> Does petsc4py support matrix-free iterative solvers (as for > Matrix-Free matrices in petsc)? > >>>> > >>>> For context, I have a distributed matrix problem to solve. It comes > from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix > is dense, but it is fast to evaluate using fftw. It is also distributed in > memory. > >>>> > >>>> While I?ve found some petsc4py tutorial examples in > "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free > example. And I don?t see a reference to a matrix shell create method in the > petsc4py api. > >>>> > >>>> If petsc4py does support matrix free iterative solvers, it would be > really helpful if someone could provide even a toy example of that. Serial > would work, though a parallelised one would be better. > >>>> > >>>> Thanks, > >>>> Eric > >>>> > >>>> > >>> > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Tue Mar 14 12:14:06 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Tue, 14 Mar 2023 17:14:06 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> Message-ID: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. 
When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. 
I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG [Untitled.png] Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). 
On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: Untitled.png URL: From bsmith at petsc.dev Tue Mar 14 13:35:30 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 14:35:30 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> Message-ID: <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). 
The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > From: Pierre Jolivet > > Sent: Friday, March 3, 2023 11:45 AM > To: Christopher, Joshua > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > >> On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: >> >> I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. >> >> I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. >> >> Thank you, >> Joshua >> From: Barry Smith > >> Sent: Thursday, March 2, 2023 3:47 PM >> To: Christopher, Joshua > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> >> >> >> >> >> Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? >> >> Is epsilon bounded away from 0? 
>> >>> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: >>> >>> Hi Barry and Mark, >>> >>> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >>> >>> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >>> >>> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >>> >>> Thank you again, >>> Joshua >>> From: Barry Smith > >>> Sent: Thursday, March 2, 2023 7:47 AM >>> To: Christopher, Joshua > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>> >>> >>> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >>> >>> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >>> >>> Barry >>> >>> >>>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: >>>> >>>> Hello, >>>> >>>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>>> >>>> -ksp_type gmres >>>> -pc_type hypre >>>> -pc_hypre_type boomeramg >>>> >>>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>>> >>>> -ksp_view_pre >>>> -ksp_view >>>> -ksp_converged_reason >>>> -ksp_monitor_true_residual >>>> -ksp_test_null_space >>>> >>>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). 
>>>> >>>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>>> >>>> Do you have any advice on speeding up the convergence of this system? >>>> >>>> Thank you, >>>> Joshua >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 14 15:52:12 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Mar 2023 16:52:12 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> Message-ID: On Tue, Mar 14, 2023 at 12:11?PM Barry Smith wrote: > > Matt can help you more directly. > > Barry > > > On Mar 14, 2023, at 11:40 AM, Jonathan Davud Razi Seyed Mirpourian < > jdara at dtu.dk> wrote: > > Dear Barry, > > Thank you very much for the quick answer! > > To my understanding, in the snes examples, it is the call: > DMPlexSetSnesLocalFEM that takes care of computing the identities important > for snes (jacobian, residual, boundary values). > Is there an equivalent for KSP (just computing the system Matrix A and the > rhs b)? I cannot find any DMPlexSetKSPLocalFEM in the docs or am I missing > something? > > Also, I was not aware of SNESKSP, so thank you very much for that, it will > be my fallback strategy. > > There are no KSP analogues. The intent is for you to use -snes_type ksp for truly linear problems. Thanks, Matt > All the best, > Jonathan > > *From:* Barry Smith > *Sent:* 14. marts 2023 16:18 > *To:* Jonathan Davud Razi Seyed Mirpourian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Dmplex+PetscFe+KSP > > > KSP/SNES do not automatically assemble the linear system, that is the > responsibility of DMPLEX in this case. Thus the process for assembling the > matrix is largely the same whether done with KSP or SNES and DMPLEX. The > difference is, of course, that constructing the linear matrix does not > depend on some ?solution? vector as with SNES. > > Note also you can simply use SNES for a linear problem by selecting the > SNESType of SNESKSP; this will just as efficient as using KSP directly. > > You should be able to locate a SNES example and extract the calls for > defining the mesh and building the matrix but using them with KSP. > > Barry > > > > > > > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via > petsc-users wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a > linear system. > > I have struggled to do so, as all the examples I found ( for example: > https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the > mesh management, then create the discretization with PetscFE and then get > KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
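(A rough sketch of the suggested route for a purely linear problem, using SNESKSPONLY so the "nonlinear" solve reduces to a single linear solve. This is only an illustration: dm is assumed to be a DMPlex that already carries the PetscFE discretization and the residual/Jacobian callbacks, and u is the solution vector.)

    SNES snes;
    Vec  u;
    SNESCreate(PETSC_COMM_WORLD, &snes);
    SNESSetDM(snes, dm);
    /* residual/Jacobian hooked up as in the SNES tutorials, e.g. via DMPlexSetSNESLocalFEM() */
    SNESSetType(snes, SNESKSPONLY);   /* one linearization, i.e. effectively just a KSP solve */
    SNESSetFromOptions(snes);         /* the usual -ksp_type / -pc_type options still apply */
    DMCreateGlobalVector(dm, &u);
    SNESSolve(snes, NULL, u);
    VecDestroy(&u);
    SNESDestroy(&snes);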
URL: From knepley at gmail.com Tue Mar 14 15:55:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Mar 2023 16:55:14 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue, Mar 14, 2023 at 12:01?PM Dave May wrote: > > > On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > >> Dear Dave, >> >> Yes, I observe this in parallel runs. How I can change the parallel >> layout of the matrix? In my implementation, I read the mesh file, and the I >> split the domain where the first rank gets the first N elements, the second >> rank gets the next N elements etc. Should I use metis to distribute >> elements? >> > > >> Note that I use continuous finite elements, which means that some values >> will be cached in a temporary buffer. >> > > Sure. With CG FE you will always have some DOFs which need to be cached, > however the number of cached values will be minimized if you follow Barry's > advice. If you do what Barry suggests, only the DOFs which live on the > boundary of your element-wise defined sub-domains would need to cached. > Note that we have direct support for unstructured meshes (Plex) with partitioning and redistribution, rather than translating them to purely algebraic language. Thanks, Matt > Thanks, > Dave > > >> >> Thank you very much, >> Pantelis >> ------------------------------ >> *From:* Dave May >> *Sent:* Tuesday, March 14, 2023 4:40 PM >> *To:* Pantelis Moschopoulos >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. >> >> >> >> On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < >> pmoschopoulos at outlook.com> wrote: >> >> Hi everyone, >> >> I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. >> My question concerns the sudden increase of the memory that Petsc needs >> during the assembly of the jacobian matrix. After this point, memory is >> freed. It seems to me like Petsc performs memory allocations and the >> deallocations during assembly. >> I have used the following commands with no success: >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, >> PETSC_TRUE,ier). >> CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) >> >> The structure of the matrix does not change during my simulation, just >> the values. I am expecting this behavior the first time that I create this >> matrix because the preallocation instructions that I use are not very >> accurate but this continues every time I assemble the matrix. >> What I am missing here? >> >> >> I am guessing this observation is seen when you run a parallel job. >> >> MatSetValues() will cache values in a temporary memory buffer if the >> values are to be sent to a different MPI rank. >> Hence if the parallel layout of your matrix doesn?t closely match the >> layout of the DOFs on each mesh sub-domain, then a huge number of values >> can potentially be cached. After you call MatAssemblyBegin(), >> MatAssemblyEnd() this cache will be freed. >> >> Thanks, >> Dave >> >> >> >> Thank you very much, >> Pantelis >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Wed Mar 15 01:34:58 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Wed, 15 Mar 2023 06:34:58 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: Dear all, Thank you all very much for your suggestions. Dave, I am using also the reverse Cuthill?McKee algorithm when I load the mesh information and then the simulation proceeds. I can use partitioning after the reordering right? Matt, with PLEX you refer to DMPLEX? To be honest, I have never tried the DM structures of Petsc up to this point. Pantelis ________________________________ From: Matthew Knepley Sent: Tuesday, March 14, 2023 10:55 PM To: Dave May Cc: Pantelis Moschopoulos ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue, Mar 14, 2023 at 12:01?PM Dave May > wrote: On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos > wrote: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Sure. With CG FE you will always have some DOFs which need to be cached, however the number of cached values will be minimized if you follow Barry's advice. If you do what Barry suggests, only the DOFs which live on the boundary of your element-wise defined sub-domains would need to cached. Note that we have direct support for unstructured meshes (Plex) with partitioning and redistribution, rather than translating them to purely algebraic language. Thanks, Matt Thanks, Dave Thank you very much, Pantelis ________________________________ From: Dave May > Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. 
Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksl7912 at snu.ac.kr Wed Mar 15 02:38:07 2023 From: ksl7912 at snu.ac.kr (=?UTF-8?B?wq3qtozsirnrpqwgLyDtlZnsg50gLyDtla3qs7XsmrDso7zqs7XtlZnqs7w=?=) Date: Wed, 15 Mar 2023 16:38:07 +0900 Subject: [petsc-users] Question about time issues in parallel computing Message-ID: Dear petsc developers. Hello. I am trying to solve the structural problem with FEM and test parallel computing works well. However, even if I change the number of cores, the total time is calculated the same. I have tested on a simple problem using a MUMPS solver using: mpiexec -n 1 mpiexec -n 2 mpiexec -n 4 ... Could you give me some advice if you have experienced this problem? Best regards Seung Lee Kwon -- Seung Lee Kwon, Ph.D.Candidate Aerospace Structures and Materials Laboratory Department of Mechanical and Aerospace Engineering Seoul National University Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 E-mail : ksl7912 at snu.ac.kr Office : +82-2-880-7389 C. P : +82-10-4695-1062 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 15 06:07:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Mar 2023 07:07:51 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 2:34?AM Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Dear all, > Thank you all very much for your suggestions. > > Dave, I am using also the reverse Cuthill?McKee algorithm when I load the > mesh information and then the simulation proceeds. I can use partitioning > after the reordering right? > Yes. > Matt, with PLEX you refer to DMPLEX? To be honest, I have never tried the > DM structures of Petsc up to this point. > Yes. It can read a variety of mesh formats, but if everything is working, there is no need to switch. Thanks Matt > Pantelis > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, March 14, 2023 10:55 PM > *To:* Dave May > *Cc:* Pantelis Moschopoulos ; > petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > On Tue, Mar 14, 2023 at 12:01?PM Dave May wrote: > > > > On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout > of the matrix? In my implementation, I read the mesh file, and the I split > the domain where the first rank gets the first N elements, the second rank > gets the next N elements etc. Should I use metis to distribute elements? > > > > Note that I use continuous finite elements, which means that some values > will be cached in a temporary buffer. > > > Sure. With CG FE you will always have some DOFs which need to be cached, > however the number of cached values will be minimized if you follow Barry's > advice. 
If you do what Barry suggests, only the DOFs which live on the > boundary of your element-wise defined sub-domains would need to cached. > > > Note that we have direct support for unstructured meshes (Plex) with > partitioning and redistribution, rather than translating them to purely > algebraic language. > > Thanks, > > Matt > > > Thanks, > Dave > > > > Thank you very much, > Pantelis > ------------------------------ > *From:* Dave May > *Sent:* Tuesday, March 14, 2023 4:40 PM > *To:* Pantelis Moschopoulos > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > > > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the > values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the > layout of the DOFs on each mesh sub-domain, then a huge number of values > can potentially be cached. After you call MatAssemblyBegin(), > MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 15 06:49:39 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Mar 2023 07:49:39 -0400 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? wrote: > Dear petsc developers. > > Hello. > I am trying to solve the structural problem with FEM and test parallel > computing works well. > > However, even if I change the number of cores, the total time is > calculated the same. > > I have tested on a simple problem using a MUMPS solver using: > mpiexec -n 1 > mpiexec -n 2 > mpiexec -n 4 > ... > > Could you give me some advice if you have experienced this problem? 
> If your problem is small, you could very well see no speedup: https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup Thanks, Matt > Best regards > Seung Lee Kwon > -- > Seung Lee Kwon, Ph.D.Candidate > Aerospace Structures and Materials Laboratory > Department of Mechanical and Aerospace Engineering > Seoul National University > Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 > E-mail : ksl7912 at snu.ac.kr > Office : +82-2-880-7389 > C. P : +82-10-4695-1062 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksl7912 at snu.ac.kr Wed Mar 15 20:08:50 2023 From: ksl7912 at snu.ac.kr (=?UTF-8?B?wq3qtozsirnrpqwgLyDtlZnsg50gLyDtla3qs7XsmrDso7zqs7XtlZnqs7w=?=) Date: Thu, 16 Mar 2023 10:08:50 +0900 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: Thank you for your reply. It was a simple problem, but it has more than 1000 degrees of freedom. Is this not enough to check speedup? Best regards Seung Lee Kwon 2023? 3? 15? (?) ?? 8:50, Matthew Knepley ?? ??: > On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? > wrote: > >> Dear petsc developers. >> >> Hello. >> I am trying to solve the structural problem with FEM and test parallel >> computing works well. >> >> However, even if I change the number of cores, the total time is >> calculated the same. >> >> I have tested on a simple problem using a MUMPS solver using: >> mpiexec -n 1 >> mpiexec -n 2 >> mpiexec -n 4 >> ... >> >> Could you give me some advice if you have experienced this problem? >> > > If your problem is small, you could very well see no speedup: > > > https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup > > Thanks, > > Matt > > >> Best regards >> Seung Lee Kwon >> -- >> Seung Lee Kwon, Ph.D.Candidate >> Aerospace Structures and Materials Laboratory >> Department of Mechanical and Aerospace Engineering >> Seoul National University >> Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 >> E-mail : ksl7912 at snu.ac.kr >> Office : +82-2-880-7389 >> C. P : +82-10-4695-1062 >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Seung Lee Kwon, Ph.D.Candidate Aerospace Structures and Materials Laboratory Department of Mechanical and Aerospace Engineering Seoul National University Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 E-mail : ksl7912 at snu.ac.kr Office : +82-2-880-7389 C. P : +82-10-4695-1062 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Mar 15 20:13:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 15 Mar 2023 21:13:52 -0400 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: Speed up to 4 processors should have at least 40,000 equations for 3D problems and more for 2D. At least for iterative solvers. This is probably a good place to start with direct solvers but you might see benefit with a little less. 
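(When comparing core counts it also helps to let PETSc report where the time actually goes. A minimal sketch, assuming ksp, b and x are the already configured solver and vectors:)

    PetscLogStage stage;
    PetscLogStageRegister("MUMPS solve", &stage);
    PetscLogStagePush(stage);
    KSPSolve(ksp, b, x);
    PetscLogStagePop();

Running with -log_view then breaks the time down per stage and per event, which makes it easier to see whether the factorization or the solve is the part that should be scaling.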
Mark On Wed, Mar 15, 2023 at 9:09?PM ???? / ?? / ??????? wrote: > Thank you for your reply. > > It was a simple problem, but it has more than 1000 degrees of freedom. > > Is this not enough to check speedup? > > Best regards > Seung Lee Kwon > > 2023? 3? 15? (?) ?? 8:50, Matthew Knepley ?? ??: > >> On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? >> wrote: >> >>> Dear petsc developers. >>> >>> Hello. >>> I am trying to solve the structural problem with FEM and test parallel >>> computing works well. >>> >>> However, even if I change the number of cores, the total time is >>> calculated the same. >>> >>> I have tested on a simple problem using a MUMPS solver using: >>> mpiexec -n 1 >>> mpiexec -n 2 >>> mpiexec -n 4 >>> ... >>> >>> Could you give me some advice if you have experienced this problem? >>> >> >> If your problem is small, you could very well see no speedup: >> >> >> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Thanks, >> >> Matt >> >> >>> Best regards >>> Seung Lee Kwon >>> -- >>> Seung Lee Kwon, Ph.D.Candidate >>> Aerospace Structures and Materials Laboratory >>> Department of Mechanical and Aerospace Engineering >>> Seoul National University >>> Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 >>> E-mail : ksl7912 at snu.ac.kr >>> Office : +82-2-880-7389 >>> C. P : +82-10-4695-1062 >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Seung Lee Kwon, Ph.D.Candidate > Aerospace Structures and Materials Laboratory > Department of Mechanical and Aerospace Engineering > Seoul National University > Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 > E-mail : ksl7912 at snu.ac.kr > Office : +82-2-880-7389 > C. P : +82-10-4695-1062 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Thu Mar 16 07:26:37 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Thu, 16 Mar 2023 21:26:37 +0900 Subject: [petsc-users] Difference between opt and debug Message-ID: Hello, I have some issues about different mode and different command. 1. Exactly the same code, but no error occurs in debug mode, but an error occurs in opt mode. In this case, what should I be suspicious of? 2. When executed with ./application, no error occurs, but when executed with mpiexec -n 1 ./app, an error may occur. What should be suspected in this case? Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 16 07:51:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Mar 2023 08:51:13 -0400 Subject: [petsc-users] Difference between opt and debug In-Reply-To: References: Message-ID: On Thu, Mar 16, 2023 at 8:26?AM user_gong Kim wrote: > Hello, > > I have some issues about different mode and different command. > > 1. Exactly the same code, but no error occurs in debug mode, but an error > occurs in opt mode. > In this case, what should I be suspicious of? > Memory overwrites, since debug and opt can have different memory layouts. Run under valgrind, or suing address sanitizer. > 2. When executed with ./application, no error occurs, but when executed > with mpiexec -n 1 ./app, an error may occur. What should be suspected in > this case? 
> Same thing. Thanks, Matt > Thanks, > Hyung Kim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 04:50:45 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Fri, 17 Mar 2023 18:50:45 +0900 Subject: [petsc-users] Question about MatView Message-ID: Hello, I have 2 questions about MatView. 1. I would like to ask if the process below is possible. When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 17 05:34:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 06:34:46 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: > Hello, > > > > I have 2 questions about MatView. > > > > 1. I would like to ask if the process below is possible. > When running in parallel, is it possible to make the matrix of the mpiaij > format into a txt file, output it, and read it again so that the entire > process has the same matrix? > No. However, you can do this with a binary viewer. I suggest using MatViewFromOptions(mat, NULL, "-my_view"); and then the command line argument -my_view binary:mat.bin and then you can read this in using MatCreate(PETSC_COMM_WORLD, &mat); PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); MatLoad(mat, viewer); ViewerDestroy(&viewer); THanks, Matt > 2. If possible, please let me know which function can be used to > create a txt file and how to read the txt file. > > > > Thanks, > > Hyung Kim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 08:45:44 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Fri, 17 Mar 2023 22:45:44 +0900 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: Following your comments, I did an test. However, if I run the application in parallel. In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. As in the previous case of saving in binary, it is read in parallel divided form. Is it impossible to want to get the all value in the whole process? Thanks, Hyung Kim 2023? 3? 17? (?) ?? 7:35, Matthew Knepley ?? ??: > On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: > >> Hello, >> >> >> >> I have 2 questions about MatView. >> >> >> >> 1. I would like to ask if the process below is possible. >> When running in parallel, is it possible to make the matrix of the mpiaij >> format into a txt file, output it, and read it again so that the entire >> process has the same matrix? >> > No. However, you can do this with a binary viewer. 
I suggest using > > MatViewFromOptions(mat, NULL, "-my_view"); > > and then the command line argument > > -my_view binary:mat.bin > > and then you can read this in using > > MatCreate(PETSC_COMM_WORLD, &mat); > PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, > &viewer); > MatLoad(mat, viewer); > ViewerDestroy(&viewer); > > THanks, > > Matt > > > >> 2. If possible, please let me know which function can be used to >> create a txt file and how to read the txt file. >> >> >> >> Thanks, >> >> Hyung Kim >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 09:10:17 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 15:10:17 +0100 Subject: [petsc-users] Create a nest not aligned by processors Message-ID: Dear all, I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). Is it possible to change that ? Note that I am coding in fortran if that has ay consequence. Thank you, Sincerely, -- Cl?ment BERGER ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 09:48:40 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 10:48:40 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: Message-ID: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. Barry > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? 
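(To illustrate the suggestion above in code: instead of PETSC_DECIDE, each process can fix its local row count for every block up front, possibly choosing 0 rows of a given block on a given rank. A minimal sketch in C; the Fortran calls take the same arguments plus ierr, and the splitting rule here is just PETSc's default one for illustration:)

    PetscInt NA = 4, NB = 4, mA = PETSC_DECIDE, mB = PETSC_DECIDE;
    Mat      A, B, C, blocks[4];
    PetscSplitOwnership(PETSC_COMM_WORLD, &mA, &NA);   /* pick the local row counts once */
    PetscSplitOwnership(PETSC_COMM_WORLD, &mB, &NB);
    MatCreateConstantDiagonal(PETSC_COMM_WORLD, mA, mA, NA, NA, 2.0, &A);
    MatCreateConstantDiagonal(PETSC_COMM_WORLD, mB, mB, NB, NB, 1.0, &B);
    blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
    MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C);

This does not change the fact that the nest's global ordering interleaves the blocks rank by rank, but it does give explicit control over how many rows of each block live on each rank.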
> > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 09:53:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 10:53:21 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: Use >> MatCreate(PETSC_COMM_SELF, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, &viewer); If it one program running that both views and loads the matrix you can use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI rank. It is better than using the filesystem to do it. > On Mar 17, 2023, at 9:45 AM, user_gong Kim wrote: > > Following your comments, I did an test. > However, if I run the application in parallel. > In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. > As in the previous case of saving in binary, it is read in parallel divided form. > Is it impossible to want to get the all value in the whole process? > > > Thanks, > Hyung Kim > > 2023? 3? 17? (?) ?? 7:35, Matthew Knepley >?? ??: >> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim > wrote: >>> Hello, >>> >>> >>> I have 2 questions about MatView. >>> >>> >>> 1. I would like to ask if the process below is possible. >>> When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? >>> >> No. However, you can do this with a binary viewer. I suggest using >> >> MatViewFromOptions(mat, NULL, "-my_view"); >> >> and then the command line argument >> >> -my_view binary:mat.bin >> >> and then you can read this in using >> >> MatCreate(PETSC_COMM_WORLD, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); >> MatLoad(mat, viewer); >> ViewerDestroy(&viewer); >> >> THanks, >> >> Matt >> >> >>> 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. >>> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 10:05:49 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Sat, 18 Mar 2023 00:05:49 +0900 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: PETSC_COMM_SELF generates an error in more than 2 processes. I would like to use the other method you said, matcreateredundantmatrix. However, there is no example in the manual. Can you give an example using this function? 2023? 3? 17? (?) ?? 11:53, Barry Smith ?? ??: > > Use > > MatCreate(PETSC_COMM_SELF, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, >> &viewer); >> > > > If it one program running that both views and loads the matrix you can > use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI > rank. It is better than using the filesystem to do it. > > > On Mar 17, 2023, at 9:45 AM, user_gong Kim wrote: > > Following your comments, I did an test. > However, if I run the application in parallel. 
> In all processes, it is not possible to obtain values at all positions in > the matrix through MatGetValue. > As in the previous case of saving in binary, it is read in parallel > divided form. > Is it impossible to want to get the all value in the whole process? > > > Thanks, > Hyung Kim > > 2023? 3? 17? (?) ?? 7:35, Matthew Knepley ?? ??: > >> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: >> >>> Hello, >>> >>> >>> I have 2 questions about MatView. >>> >>> >>> 1. I would like to ask if the process below is possible. >>> When running in parallel, is it possible to make the matrix of the >>> mpiaij format into a txt file, output it, and read it again so that the >>> entire process has the same matrix? >>> >> No. However, you can do this with a binary viewer. I suggest using >> >> MatViewFromOptions(mat, NULL, "-my_view"); >> >> and then the command line argument >> >> -my_view binary:mat.bin >> >> and then you can read this in using >> >> MatCreate(PETSC_COMM_WORLD, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, >> &viewer); >> MatLoad(mat, viewer); >> ViewerDestroy(&viewer); >> >> THanks, >> >> Matt >> >> >> >>> 2. If possible, please let me know which function can be used to >>> create a txt file and how to read the txt file. >>> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 10:51:10 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 11:51:10 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: <9589CEBA-DF30-4580-95F4-1D30FD1BB36A@petsc.dev> src/mat/tests/ex9.c and other examples in that directory use it. > On Mar 17, 2023, at 11:05 AM, user_gong Kim wrote: > > PETSC_COMM_SELF generates an error in more than 2 processes. It should not > > I would like to use the other method you said, matcreateredundantmatrix. However, there is no example in the manual. Can you give an example using this function? > > > > 2023? 3? 17? (?) ?? 11:53, Barry Smith >?? ??: >> >> Use >> >>>> MatCreate(PETSC_COMM_SELF, &mat); >>>> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, &viewer); >> >> >> If it one program running that both views and loads the matrix you can use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI rank. It is better than using the filesystem to do it. >> >> >>> On Mar 17, 2023, at 9:45 AM, user_gong Kim > wrote: >>> >>> Following your comments, I did an test. >>> However, if I run the application in parallel. >>> In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. >>> As in the previous case of saving in binary, it is read in parallel divided form. >>> Is it impossible to want to get the all value in the whole process? >>> >>> >>> Thanks, >>> Hyung Kim >>> >>> 2023? 3? 17? (?) ?? 7:35, Matthew Knepley >?? ??: >>>> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim > wrote: >>>>> Hello, >>>>> >>>>> >>>>> I have 2 questions about MatView. >>>>> >>>>> >>>>> 1. I would like to ask if the process below is possible. 
>>>>> When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? >>>>> >>>> No. However, you can do this with a binary viewer. I suggest using >>>> >>>> MatViewFromOptions(mat, NULL, "-my_view"); >>>> >>>> and then the command line argument >>>> >>>> -my_view binary:mat.bin >>>> >>>> and then you can read this in using >>>> >>>> MatCreate(PETSC_COMM_WORLD, &mat); >>>> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); >>>> MatLoad(mat, viewer); >>>> ViewerDestroy(&viewer); >>>> >>>> THanks, >>>> >>>> Matt >>>> >>>> >>>>> 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Hyung Kim >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 11:14:11 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 17:14:11 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> Message-ID: It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > >> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >> >> Dear all, >> >> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >> >> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> Note that I am coding in fortran if that has ay consequence. >> >> Thank you, >> >> Sincerely, >> >> -- >> Cl?ment BERGER >> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Mar 17 11:34:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 12:34:52 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> Message-ID: <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. Barry Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > >> >> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >> >> Barry >> >> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 12:19:43 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 18:19:43 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Message-ID: I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. 
Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. Is that clear ? I don't know if I provided too many or not enough details. Thank you --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
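One way the conversion-and-solve step described above might look, as a sketch in C with hypothetical variable names, no error checking, and assuming PETSc was configured with MUMPS:

Mat Caij;
KSP ksp;
PC  pc;
Vec rhs, sol;

MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij);  /* flatten the nest for MUMPS */
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, Caij, Caij);
KSPSetType(ksp, KSPPREONLY);                       /* direct solve only */
KSPGetPC(ksp, &pc);
PCSetType(pc, PCLU);
PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
KSPSetFromOptions(ksp);
MatCreateVecs(Caij, &sol, &rhs);                   /* vectors with the layout of Caij */
/* ... copy the needed entries from the nest vectors into rhs ... */
KSPSolve(ksp, rhs, sol);                           /* repeated solves reuse the factorization */

Because the converted matrix keeps the nest's global ordering, whatever is copied into rhs has to follow that same ordering; that ordering question comes up again below.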
URL: From jchristopher at anl.gov Fri Mar 17 12:26:34 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Fri, 17 Mar 2023 17:26:34 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. From https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? My matrix is organized as one block after another. Thank you, Joshua ________________________________ From: Barry Smith Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. 
I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. 
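Coming back to the question above about what goes into each index set: a sketch, in C, for the 100x100 example with rho in global rows 0-49 and phi in global rows 50-99 (so the second split starts at 50, not 49). Here A stands for the assembled matrix and pc for the PCFIELDSPLIT preconditioner, both assumed to exist already; each rank passes only the global rows it owns that belong to that field:

PetscInt rstart, rend, first, last;
IS       is_rho, is_phi;

MatGetOwnershipRange(A, &rstart, &rend);             /* global rows owned by this rank */
first = PetscMax(rstart, 0);                         /* rho occupies rows 0..49 */
last  = PetscMin(rend, 50);
ISCreateStride(PETSC_COMM_WORLD, PetscMax(last - first, 0), first, 1, &is_rho);
first = PetscMax(rstart, 50);                        /* phi occupies rows 50..99 */
last  = PetscMin(rend, 100);
ISCreateStride(PETSC_COMM_WORLD, PetscMax(last - first, 0), first, 1, &is_phi);
PCFieldSplitSetIS(pc, "rho", is_rho);
PCFieldSplitSetIS(pc, "phi", is_phi);

The entries are global row numbers, and with the one-block-after-another layout on two ranks this typically leaves rank 0 with an empty phi set and rank 1 with an empty rho set.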
The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. 
Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Mar 17 12:34:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 17 Mar 2023 13:34:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: That sounds right, See the docs and examples at https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ On Fri, Mar 17, 2023 at 1:26?PM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > My matrix is organized as one block after another. > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. 
Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. 
> > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. 
The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Fri Mar 17 13:02:33 2023 From: danyang.su at gmail.com (danyang.su at gmail.com) Date: Fri, 17 Mar 2023 11:02:33 -0700 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: <8426FD29-CAD9-4B7B-8937-C03D1EF9C831@gmail.com> References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> <8426FD29-CAD9-4B7B-8937-C03D1EF9C831@gmail.com> Message-ID: <001601d958fa$a6b83a60$f428af20$@gmail.com> Hi Matt, I am following up to check if you can reproduce the problem on your side. Thanks and have a great weekend, Danyang From: Danyang Su Sent: March 4, 2023 4:38 PM To: Matthew Knepley Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? Hi Matt, Attached is the source code and example. I have deleted most of the unused source code but it is still a bit length. Sorry about that. The errors come after DMGetLocalBoundingBox and DMGetBoundingBox. -> To compile the code Please type 'make exe' and the executable file petsc_bounding will be created under the same folder. -> To test the code Please go to fold 'test' and type 'mpiexec -n 1 ../petsc_bounding'. -> The output from PETSc 3.18, error information input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../petsc_bounding on a linux-gnu-dbg named starblazer by dsu Sat Mar 4 16:20:51 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/bug-check/petsc_bounding/src/solver_ddmethod.F90:1920 Total volume of simulation domain 0.20000000E+01 Total volume of simulation domain 0.20000000E+01 -> The output from PETSc 3.17 and earlier, no error input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ Total volume of simulation domain 0.20000000E+01 Total volume of simulation domain 0.20000000E+01 Thanks, Danyang From: Matthew Knepley > Date: Friday, March 3, 2023 at 8:58 PM To: > Cc: > Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? On Sat, Mar 4, 2023 at 1:35?AM > wrote: Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? Thanks, Danyang -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 13:14:17 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:14:17 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Message-ID: <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> This sounds like a fine use of MATNEST. Now back to the original question >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). 
Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > >> >> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >> >> Barry >> >> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
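A short way to see the ordering Barry describes is to ask the nest itself where each block's rows ended up; a sketch in C, with C being the nest from the earlier example:

IS             rows[2];
PetscInt       n0;
const PetscInt *idx;

MatNestGetISs(C, rows, NULL);       /* global row indices of block row 0 and block row 1 */
ISGetLocalSize(rows[0], &n0);
ISGetIndices(rows[0], &idx);        /* on this rank, the global rows holding block A */
/* ... inspect or print idx[0..n0-1] ... */
ISRestoreIndices(rows[0], &idx);

On one rank the rows of A simply come before the rows of B; on several ranks each rank's share of A is followed by its share of B, which is exactly the reordering observed in this thread.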
URL: From bsmith at petsc.dev Fri Mar 17 13:22:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:22:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. > > My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry > > > Thank you, > Joshua > From: Barry Smith > > Sent: Tuesday, March 14, 2023 1:35 PM > To: Christopher, Joshua > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. > > > >> On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: >> >> Hello PETSc users, >> >> I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. >> >> Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? >> >> On the FieldSplit preconditioner, is my understanding here correct: >> >> To use FieldSplit, I must have a DM. 
Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. >> >> Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? >> >> Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? >> >> Thank you so much for your help! >> Regards, >> Joshua >> From: Pierre Jolivet > >> Sent: Friday, March 3, 2023 11:45 AM >> To: Christopher, Joshua > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: >> 1) with renumbering via ParMETIS >> -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 >> -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 >> 2) without renumbering via ParMETIS >> -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 >> -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 >> Using on outer fieldsplit may help fix this. >> >> Thanks, >> Pierre >> >>> On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: >>> >>> I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. >>> >>> I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. 
>>> >>> Thank you, >>> Joshua >>> From: Barry Smith > >>> Sent: Thursday, March 2, 2023 3:47 PM >>> To: Christopher, Joshua > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>> >>> >>> >>> >>> >>> >>> Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? >>> >>> Is epsilon bounded away from 0? >>> >>>> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: >>>> >>>> Hi Barry and Mark, >>>> >>>> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >>>> >>>> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >>>> >>>> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >>>> >>>> Thank you again, >>>> Joshua >>>> >>>> From: Barry Smith > >>>> Sent: Thursday, March 2, 2023 7:47 AM >>>> To: Christopher, Joshua > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>>> >>>> >>>> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >>>> >>>> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >>>> >>>> Barry >>>> >>>> >>>>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: >>>>> >>>>> Hello, >>>>> >>>>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>>>> >>>>> -ksp_type gmres >>>>> -pc_type hypre >>>>> -pc_hypre_type boomeramg >>>>> >>>>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. 
only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>>>> >>>>> -ksp_view_pre >>>>> -ksp_view >>>>> -ksp_converged_reason >>>>> -ksp_monitor_true_residual >>>>> -ksp_test_null_space >>>>> >>>>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >>>>> >>>>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>>>> >>>>> Do you have any advice on speeding up the convergence of this system? >>>>> >>>>> Thank you, >>>>> Joshua >>>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:23:04 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:23:04 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. 
Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
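For the multiplication check mentioned above, comparing per-field quantities through the nest's own vectors sidesteps the ordering question; a sketch in C, assuming MatCreateVecs on a MATNEST returns nest vectors (which it does in recent PETSc) and that C is the nest:

Vec       x, y, sub;
PetscReal nrm;

MatCreateVecs(C, &x, &y);
VecNestGetSubVec(x, 0, &sub);
VecSet(sub, 1.0);                            /* field 0 of the input */
VecNestGetSubVec(x, 1, &sub);
VecSet(sub, 2.0);                            /* field 1 of the input */
MatMult(C, x, y);
VecNestGetSubVec(y, 0, &sub);
VecNorm(sub, NORM_2, &nrm);                  /* should agree across rank counts up to roundoff */
PetscPrintf(PETSC_COMM_WORLD, "field 0 norm %g\n", (double)nrm);

Checked this way, the result of the product does not depend on how many ranks the test is run on, so any remaining discrepancy points at the data being fed in rather than at the nest's ordering.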
URL: From bsmith at petsc.dev Fri Mar 17 13:27:22 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:27:22 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. Barry > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > >> >> This sounds like a fine use of MATNEST. Now back to the original question >> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. >> >> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). 
Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
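A short C sketch of the MatView check suggested above (C and b are assumed names for the user's nest matrix and right-hand side). Running it with mpiexec -n 1, 2 and 3 shows whether only the ordering differs (expected) or entries actually land in the wrong block:

  Mat Caij;

  PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- MATNEST ----\n"));
  PetscCall(MatView(C, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- converted MATAIJ ----\n"));
  PetscCall(MatView(Caij, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- right-hand side ----\n"));
  PetscCall(VecView(b, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&Caij));

Comparing the three outputs side by side also shows whether the right-hand side follows the same global numbering as the converted matrix, which is the mismatch suspected in this thread.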
URL: From knepley at gmail.com Fri Mar 17 13:29:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 14:29:13 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: On Fri, Mar 17, 2023 at 2:23?PM Berger Clement wrote: > My issue is that it seems to improperly with some step of my process, the > solve step doesn't provide the same result depending on the number of > processors I use. I manually tried to multiply one the matrices I defined > as a nest against a vector, and the result is not the same with e.g. 1 and > 3 processors. That's why I tried the toy program I wrote in the first > place, which highlights the misplacement of elements. > Ah, now I think I understand. The PETSC_DECIDE arguments for sizes change with a different number of processes. You can put in har numbers if you want. Thanks, Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > > > This sounds like a fine use of MATNEST. Now back to the original > question > > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > If I understand correctly it is behaving as expected. It is the same > matrix on 1 and 2 MPI processes, the only difference is the ordering of the > rows and columns. > > Both matrix blocks are split among the two MPI processes. This is how > MATNEST works and likely what you want in practice. > > On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block > sizes differ from one another, because they correspond to a different > physical variable. One of the block has the particularity that it has to be > updated at each iteration. This update is performed by replacing it with a > product of multiple matrices that depend on the result of the previous > iteration. Note that these intermediate matrices are not square (because > they also correspond to other types of variables), and that they must be > completely refilled by hand (i.e. they are not the result of some simple > linear operations). Finally, I use this final block matrix to solve > multiple linear systems (with different righthand sides), so for now I use > MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, > filled the different matrices, and created different nests of vectors / > matrices for my operations. 
When the time comes to use KSPSolve, I use > MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy > the few vector data I need from my nests in a regular Vector, I solve, I > get back my data in my nest and carry on with the operations needed for my > updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > > > Perhaps if you provide a brief summary of what you would like to do and > we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI > processes within the original communicator. That is if the original > communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST > that only lives on ranks 1,2 but you could have it have 0 rows on rank zero > so effectively it lives only on rank 1 and 2 (though its communicator is > all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: > > It would be possible in the case I showed you but in mine that would > actually be quite complicated, isn't there any other workaround ? I precise > that I am not entitled to utilizing the MATNEST format, it's just that I > think the other ones wouldn't work. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > > > You may be able to mimic what you want by not using PETSC_DECIDE but > instead computing up front how many rows of each matrix you want stored on > each MPI process. You can use 0 for on certain MPI processes for certain > matrices if you don't want any rows of that particular matrix stored on > that particular MPI process. > > Barry > > > On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > -- > Cl?ment BERGER > ENS de Lyon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
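A small C sketch that makes the "misplacement" visible (C is an assumed name for the assembled MATNEST): printing each rank's ownership range and the index sets returned by MatNestGetISs shows exactly which global rows each block occupies for a given number of processes, and how that changes when PETSC_DECIDE picks a different split:

  IS          rows[2];
  PetscInt    rstart, rend;
  PetscMPIInt rank;

  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(MatGetOwnershipRange(C, &rstart, &rend));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d owns global rows %" PetscInt_FMT " to %" PetscInt_FMT "\n", rank, rstart, rend));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

  PetscCall(MatNestGetISs(C, rows, NULL));  /* index sets are borrowed, do not destroy */
  PetscCall(ISView(rows[0], PETSC_VIEWER_STDOUT_WORLD));  /* global rows of the (0,0) block */
  PetscCall(ISView(rows[1], PETSC_VIEWER_STDOUT_WORLD));  /* global rows of the (1,1) block */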
URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:35:33 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:35:33 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. 
This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
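The ordering question just raised is the crux: the global numbering of a MATNEST (and of the MATAIJ that MatConvert produces from it) interleaves the blocks rank by rank, so copying the sub-vectors one after the other into a standard vector only matches on a single process. A hedged C sketch of one way to build a standard right-hand side in the matrix's own ordering, going through the nest's index sets (the names C for the MATNEST, Caij for the converted matrix and b_nest for the VECNEST right-hand side are assumptions; a 2x2 nest as in this thread is assumed):

  Vec      b_std, *subs, work;
  IS       rows[2];
  PetscInt nsub;

  PetscCall(MatCreateVecs(Caij, NULL, &b_std));   /* standard Vec with the AIJ layout */
  PetscCall(MatNestGetISs(C, rows, NULL));        /* global rows of C taken by each block */
  PetscCall(VecNestGetSubVecs(b_nest, &nsub, &subs));
  for (PetscInt i = 0; i < nsub; i++) {
    PetscCall(VecGetSubVector(b_std, rows[i], &work));  /* view of b_std on block i's rows */
    PetscCall(VecCopy(subs[i], work));
    PetscCall(VecRestoreSubVector(b_std, rows[i], &work));
  }

After KSPSolve the same loop run in the other direction (VecCopy from the sub-view of the solution into subs[i]) puts the result back into the nest without any hand reordering.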
URL: From bsmith at petsc.dev Fri Mar 17 13:39:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:39:52 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> Message-ID: <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:27, Barry Smith a ?crit : > >> >> I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. >> >> Barry >> >> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: >>> >>> My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original question >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. 
>>> >>> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:52:49 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:52:49 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> Message-ID: <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. 
> > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. 
> > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 17 13:53:59 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 14:53:59 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > But this is to properly fill up the VecNest am I right ? Because this one > is correct, but I can't directly use it in the KSPSolve, I need to copy it > into a standard vector > I do not understand what you mean here. You can definitely use a VecNest in a KSP. Thanks, Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:39, Barry Smith a ?crit : > > > I think the intention is that you use VecNestGetSubVecs() > or VecNestGetSubVec() and fill up the sub-vectors in the same style as the > matrices; this decreases the change of a reordering mistake in trying to do > it by hand in your code. > > > > On Mar 17, 2023, at 2:35 PM, Berger Clement > wrote: > > That might be it, I didn't find the equivalent of MatConvert for the > vectors, so when I need to solve my linear system, with my righthandside > properly computed in nest format, I create a new vector using VecDuplicate, > and then I copy into it my data using VecGetArrayF90 and copiing each > element by hand. Does it create an incorrect ordering ? If so how can I get > the correct one ? > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:27, Barry Smith a ?crit : > > > I would run your code with small sizes on 1, 2, 3 MPI ranks and use > MatView() to examine the matrices. They will definitely be ordered > differently but should otherwise be the same. 
My guess is that the right > hand side may not have the correct ordering with respect to the matrix > ordering in parallel. Note also that when the right hand side does have the > correct ordering the solution will have a different ordering for each > different number of MPI ranks when printed (but changing the ordering > should give the same results up to machine precision. > > Barry > > > On Mar 17, 2023, at 2:23 PM, Berger Clement > wrote: > > My issue is that it seems to improperly with some step of my process, the > solve step doesn't provide the same result depending on the number of > processors I use. I manually tried to multiply one the matrices I defined > as a nest against a vector, and the result is not the same with e.g. 1 and > 3 processors. That's why I tried the toy program I wrote in the first > place, which highlights the misplacement of elements. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > > > This sounds like a fine use of MATNEST. Now back to the original > question > > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > If I understand correctly it is behaving as expected. It is the same > matrix on 1 and 2 MPI processes, the only difference is the ordering of the > rows and columns. > > Both matrix blocks are split among the two MPI processes. This is how > MATNEST works and likely what you want in practice. > > On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block > sizes differ from one another, because they correspond to a different > physical variable. One of the block has the particularity that it has to be > updated at each iteration. This update is performed by replacing it with a > product of multiple matrices that depend on the result of the previous > iteration. Note that these intermediate matrices are not square (because > they also correspond to other types of variables), and that they must be > completely refilled by hand (i.e. they are not the result of some simple > linear operations). Finally, I use this final block matrix to solve > multiple linear systems (with different righthand sides), so for now I use > MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, > filled the different matrices, and created different nests of vectors / > matrices for my operations. When the time comes to use KSPSolve, I use > MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy > the few vector data I need from my nests in a regular Vector, I solve, I > get back my data in my nest and carry on with the operations needed for my > updates. > > Is that clear ? 
I don't know if I provided too many or not enough details. > > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > > > Perhaps if you provide a brief summary of what you would like to do and > we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI > processes within the original communicator. That is if the original > communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST > that only lives on ranks 1,2 but you could have it have 0 rows on rank zero > so effectively it lives only on rank 1 and 2 (though its communicator is > all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: > > It would be possible in the case I showed you but in mine that would > actually be quite complicated, isn't there any other workaround ? I precise > that I am not entitled to utilizing the MATNEST format, it's just that I > think the other ones wouldn't work. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > > > You may be able to mimic what you want by not using PETSC_DECIDE but > instead computing up front how many rows of each matrix you want stored on > each MPI process. You can use 0 for on certain MPI processes for certain > matrices if you don't want any rows of that particular matrix stored on > that particular MPI process. > > Barry > > > On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > -- > Cl?ment BERGER > ENS de Lyon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
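A short C sketch of the sub-vector filling described above (b_nest is an assumed name for a VECNEST created with VecCreateNest to match the MATNEST; some_rhs_value is a placeholder for the user's own formula). The vectors returned by VecNestGetSubVecs are the ones stored inside the nest, not copies, so filling them fills the nest:

  Vec     *subs;
  PetscInt nsub;

  PetscCall(VecNestGetSubVecs(b_nest, &nsub, &subs));
  for (PetscInt i = 0; i < nsub; i++) {
    PetscInt lo, hi;
    PetscCall(VecGetOwnershipRange(subs[i], &lo, &hi));
    for (PetscInt row = lo; row < hi; row++) {
      PetscCall(VecSetValue(subs[i], row, some_rhs_value(i, row), INSERT_VALUES));
    }
    PetscCall(VecAssemblyBegin(subs[i]));
    PetscCall(VecAssemblyEnd(subs[i]));
  }
  /* b_nest is now filled block by block, with no manual reordering */

With the MATNEST itself as the KSP operator (for instance an iterative method, possibly with PCFIELDSPLIT) such a VECNEST can be passed straight to KSPSolve, as stated above; with the MatConvert/MUMPS route a standard vector is still needed, as in the earlier sketch.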
URL: From clement.berger at ens-lyon.fr Fri Mar 17 14:22:20 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 20:22:20 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:53, Matthew Knepley a ?crit : > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > >> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. 
You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregory.meyer at berkeley.edu Fri Mar 17 15:13:26 2023 From: gregory.meyer at berkeley.edu (Greg Kahanamoku-Meyer) Date: Fri, 17 Mar 2023 13:13:26 -0700 Subject: [petsc-users] SLEPc: GPU accelerated shift-invert Message-ID: Hi, I'm trying to accelerate a shift-invert eigensolve with GPU, but the computation seems to be spending a lot of its time in the CPU. Looking at the output with "-log_view -log_view_gpu_time" I see that MatLUFactorNum is not using the GPU (GPU Mflops/s is 0), and is taking the majority of the computation time. Is LU factorization on the GPU supported? I am currently applying the command line options "-vec_type cuda -mat_type aijcusparse", please let me know if there are other options I can apply to accelerate the LU factorization as well. I tried digging through the documentation but couldn't find a clear answer. Thanks in advance! Kind regards, Greg KM -- *Gregory D. Kahanamoku-Meyer* PhD Candidate quantum computing | cryptography | high-performance computing Department of Physics University of California at Berkeley personal website -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 15:57:09 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 16:57:09 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. 
Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. Barry > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > >> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement > wrote: >>> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector >>> >>> >> I do not understand what you mean here. You can definitely use a VecNest in a KSP. >> >> Thanks, >> >> Matt >> >> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>> >>> >>> I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. >>> >>> >>> >>> On Mar 17, 2023, at 2:35 PM, Berger Clement > wrote: >>> >>> That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>> >>> >>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement > wrote: >>> >>> My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original question >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. >>> >>> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. 
>>> >>> --- >>> Clément BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a écrit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocks, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts with everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that? >>> >>> Note that I am coding in Fortran if that has any consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Clément BERGER >>> ENS de Lyon >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Mar 18 06:46:19 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 18 Mar 2023 12:46:19 +0100 Subject: [petsc-users] SLEPc: GPU accelerated shift-invert In-Reply-To: References: Message-ID: <1ACEEC7D-35E4-453E-9F96-3B9271C4F946@dsic.upv.es> When using an aijcusparse matrix, by default it will select the cusparse solver, i.e., as if you had added the option -st_pc_factor_mat_solver_type cusparse The problem is that CUSPARSE does not have functionality for computing the LU factorization on the GPU, as far as I know. So what PETSc does is factorize the matrix on the CPU (the largest cost) and then use the GPU for the triangular solves. In SLEPc computations, the number of triangular solves is usually small, so there is no gain in doing those on the GPU. Furthermore, these flops do not seem to be correctly logged to appear on the GPU side. Probably someone like Stefano or Junchao can provide more information about factorizations on the GPU. You could try doing inexact shift-and-invert, i.e., using an iterative linear solver such as bcgs+ilu. In the case of ILU, it is implemented on the GPU with CUSPARSE. However, inexact shift-and-invert is not viable in many applications, depending on the distribution of eigenvalues, due to non-convergence of the KSP. A final alternative is to avoid shift-and-invert completely and use STFILTER. Again, this will not work in all cases. Basically, it trades a factorization for a huge amount of matrix-vector products, which may be good for GPU computation. If you want, send me a matrix and I can do some tests.
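To make the inexact shift-and-invert alternative mentioned above concrete, one possible set of runtime options is sketched here; ./ex1 stands for whatever SLEPc executable is being run, the tolerance and number of eigenvalues are arbitrary, and, as noted above, the KSP may simply fail to converge depending on the spectrum:

    ./ex1 -eps_nev 4 -eps_target 0.0 -st_type sinvert \
          -st_ksp_type bcgs -st_ksp_rtol 1e-9 -st_pc_type ilu \
          -st_pc_factor_mat_solver_type cusparse \
          -mat_type aijcusparse -vec_type cuda

The st_ prefix reaches the KSP and PC inside the spectral transformation, which is where the ILU preconditioned iterative solve replaces the exact LU factorization.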
Jose > El 17 mar 2023, a las 21:13, Greg Kahanamoku-Meyer escribi?: > > Hi, > > I'm trying to accelerate a shift-invert eigensolve with GPU, but the computation seems to be spending a lot of its time in the CPU. Looking at the output with "-log_view -log_view_gpu_time" I see that MatLUFactorNum is not using the GPU (GPU Mflops/s is 0), and is taking the majority of the computation time. Is LU factorization on the GPU supported? I am currently applying the command line options "-vec_type cuda -mat_type aijcusparse", please let me know if there are other options I can apply to accelerate the LU factorization as well. I tried digging through the documentation but couldn't find a clear answer. > > Thanks in advance! > > Kind regards, > Greg KM > > -- > Gregory D. Kahanamoku-Meyer > PhD Candidate > quantum computing | cryptography | high-performance computing > Department of Physics > University of California at Berkeley > personal website From mail2amneet at gmail.com Sun Mar 19 12:43:37 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 10:43:37 -0700 Subject: [petsc-users] PETSc build asks for network connections Message-ID: Hi Folks, I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using the latest version (v3.18.5). During the configure and make check stage I get a request about accepting network connections. The configure and check proceeds without my input but the dialog box stays in place. Please see the screenshot. I'm wondering if it is benign or something to be concerned about? Do I need to accept any network certificate to not see this dialog box? Thanks, -- --Amneet [image: Screenshot 2023-03-19 at 10.38.57 AM.png][image: Screenshot 2023-03-19 at 10.33.01 AM.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 10.38.57 AM.png Type: image/png Size: 1018501 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 10.33.01 AM.png Type: image/png Size: 2269564 bytes Desc: not available URL: From balay at mcs.anl.gov Sun Mar 19 12:56:28 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 19 Mar 2023 12:56:28 -0500 (CDT) Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I think its due to some of the system calls from MPI. You can verify this with a '--with-mpi=0' build. I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. Satish On Sun, 19 Mar 2023, Amneet Bhalla wrote: > Hi Folks, > > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using > the latest version (v3.18.5). During the configure and make check stage I > get a request about accepting network connections. The configure and check > proceeds without my input but the dialog box stays in place. Please see the > screenshot. I'm wondering if it is benign or something to be concerned > about? Do I need to accept any network certificate to not see this dialog > box? > > Thanks, > > From mail2amneet at gmail.com Sun Mar 19 12:59:10 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 10:59:10 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I'm building PETSc without mpi (I built mpich v 4.1.1 locally). 
Here is the configure command line that I used: ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 On Sun, Mar 19, 2023 at 10:56 AM Satish Balay wrote: > I think its due to some of the system calls from MPI. > > You can verify this with a '--with-mpi=0' build. > > I wonder if there is a way to build mpich or openmpi - that doesn't > trigger Apple's firewall.. > > Satish > > On Sun, 19 Mar 2023, Amneet Bhalla wrote: > > > Hi Folks, > > > > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using > > the latest version (v3.18.5). During the configure and make check stage I > > get a request about accepting network connections. The configure and check > > proceeds without my input but the dialog box stays in place. Please see the > > screenshot. I'm wondering if it is benign or something to be concerned > > about? Do I need to accept any network certificate to not see this dialog > > box? > > > > Thanks, > > > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Mar 19 13:00:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 19 Mar 2023 14:00:58 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: On Sun, Mar 19, 2023 at 1:59 PM Amneet Bhalla wrote: > I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is > the configure command line that I used: > > ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg > --with-debugging=1 --download-hypre=1 --with-x=0 > > No, this uses MPI, it just does not build it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. Thanks, Matt > On Sun, Mar 19, 2023 at 10:56 AM Satish Balay wrote: >> I think its due to some of the system calls from MPI. >> >> You can verify this with a '--with-mpi=0' build. >> >> I wonder if there is a way to build mpich or openmpi - that doesn't >> trigger Apple's firewall.. >> >> Satish >> >> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >> >> > Hi Folks, >> > >> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >> using >> > the latest version (v3.18.5). During the configure and make check stage >> I >> > get a request about accepting network connections. The configure and >> check >> > proceeds without my input but the dialog box stays in place. Please see >> the >> > screenshot. I'm wondering if it is benign or something to be concerned >> > about? Do I need to accept any network certificate to not see this >> dialog >> > box? >> > >> > Thanks, >> > >> > >> >> > > -- > --Amneet > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Sun Mar 19 16:25:53 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 14:25:53 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: Yes, this is MPI that is triggering the Apple firewall. If I allow it, it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list).
Any way to suppress it for all executables linked to mpi in the first place? [image: Screenshot 2023-03-19 at 2.19.53 PM.png] On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley wrote: > On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: > >> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is >> the configure command line that I used: >> >> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg >> --with-debugging=1 --download-hypre=1 --with-x=0 >> >> > No, this uses MPI, it just does not built it. Configuring with > --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is > bugging the firewall. > > Thanks, > > Matt > > >> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay wrote: >> >>> I think its due to some of the system calls from MPI. >>> >>> You can verify this with a '--with-mpi=0' build. >>> >>> I wonder if there is a way to build mpich or openmpi - that doesn't >>> trigger Apple's firewall.. >>> >>> Satish >>> >>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>> >>> > Hi Folks, >>> > >>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >>> using >>> > the latest version (v3.18.5). During the configure and make check >>> stage I >>> > get a request about accepting network connections. The configure and >>> check >>> > proceeds without my input but the dialog box stays in place. Please >>> see the >>> > screenshot. I'm wondering if it is benign or something to be concerned >>> > about? Do I need to accept any network certificate to not see this >>> dialog >>> > box? >>> > >>> > Thanks, >>> > >>> > >>> >>> >> >> -- >> --Amneet >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 2.19.53 PM.png Type: image/png Size: 300582 bytes Desc: not available URL: From bsmith at petsc.dev Sun Mar 19 17:51:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 19 Mar 2023 18:51:31 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: ./configure option with-macos-firewall-rules > On Mar 19, 2023, at 5:25 PM, Amneet Bhalla wrote: > > Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? > > > > On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>> >>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>> >> >> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >> >> Thanks, >> >> Matt >> >>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>> I think its due to some of the system calls from MPI. 
>>>> >>>> You can verify this with a '--with-mpi=0' build. >>>> >>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>> >>>> Satish >>>> >>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>> >>>> > Hi Folks, >>>> > >>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>> > the latest version (v3.18.5). During the configure and make check stage I >>>> > get a request about accepting network connections. The configure and check >>>> > proceeds without my input but the dialog box stays in place. Please see the >>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>> > about? Do I need to accept any network certificate to not see this dialog >>>> > box? >>>> > >>>> > Thanks, >>>> > >>>> > >>>> >>> >>> >>> -- >>> --Amneet >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > -- > --Amneet > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Sun Mar 19 19:10:19 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 17:10:19 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? On Sun, Mar 19, 2023 at 3:51?PM Barry Smith wrote: > > ./configure option with-macos-firewall-rules > > > On Mar 19, 2023, at 5:25 PM, Amneet Bhalla wrote: > > Yes, this is MPI that is triggering the apple firewall. If I allow it it > gets added to the allowed list (see the screenshot) and it does not trigger > the firewall again. However, this needs to be done for all executables > (there will be several main2d's in the list). Any way to suppress it > for all executables linked to mpi in the first place? > > > > On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: > >> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla >> wrote: >> >>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is >>> the configure command line that I used: >>> >>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg >>> --with-debugging=1 --download-hypre=1 --with-x=0 >>> >>> >> No, this uses MPI, it just does not built it. Configuring with >> --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is >> bugging the firewall. >> >> Thanks, >> >> Matt >> >> >>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay wrote: >>> >>>> I think its due to some of the system calls from MPI. >>>> >>>> You can verify this with a '--with-mpi=0' build. >>>> >>>> I wonder if there is a way to build mpich or openmpi - that doesn't >>>> trigger Apple's firewall.. >>>> >>>> Satish >>>> >>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>> >>>> > Hi Folks, >>>> > >>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >>>> using >>>> > the latest version (v3.18.5). During the configure and make check >>>> stage I >>>> > get a request about accepting network connections. The configure and >>>> check >>>> > proceeds without my input but the dialog box stays in place. Please >>>> see the >>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>> > about? 
Do I need to accept any network certificate to not see this >>>> dialog >>>> > box? >>>> > >>>> > Thanks, >>>> > >>>> > >>>> >>>> >>> >>> -- >>> --Amneet >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > --Amneet > > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Mar 19 20:45:03 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 19 Mar 2023 21:45:03 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I found a bit more information in gmakefile.test which has the magic sauce used by make test to stop the firewall popups while running the test suite. # MACOS FIREWALL HANDLING # - if run with MACOS_FIREWALL=1 # (automatically set in $PETSC_ARCH/lib/petsc/conf/petscvariables if configured --with-macos-firewall-rules), # ensure mpiexec and test executable is on firewall list # ifeq ($(MACOS_FIREWALL),1) FW := /usr/libexec/ApplicationFirewall/socketfilterfw # There is no reliable realpath command in macOS without need for 3rd party tools like homebrew coreutils # Using Python's realpath seems like the most robust way here realpath-py = $(shell $(PYTHON) -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' $(1)) # define macos-firewall-register @APP=$(call realpath-py, $(1)); \ if ! sudo -n true 2>/dev/null; then printf "Asking for sudo password to add new firewall rule for\n $$APP\n"; fi; \ sudo $(FW) --remove $$APP --add $$APP --blockapp $$APP endef endif and below. When building each executable it automatically calls socketfilterfw on that executable so it won't popup. From this I think you can reverse engineer how to turn it off for your executables. Perhaps PETSc's make ex1 etc should also apply this magic sauce, Pierre? > On Mar 19, 2023, at 8:10 PM, Amneet Bhalla wrote: > > This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? > > On Sun, Mar 19, 2023 at 3:51?PM Barry Smith > wrote: >> >> ./configure option with-macos-firewall-rules >> >> >>> On Mar 19, 2023, at 5:25 PM, Amneet Bhalla > wrote: >>> >>> Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? >>> >>> >>> >>> On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >>>> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>>>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>>>> >>>>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>>>> >>>> >>>> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>>>> I think its due to some of the system calls from MPI. 
>>>>>> >>>>>> You can verify this with a '--with-mpi=0' build. >>>>>> >>>>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>>>> >>>>>> Satish >>>>>> >>>>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>>>> >>>>>> > Hi Folks, >>>>>> > >>>>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>>>> > the latest version (v3.18.5). During the configure and make check stage I >>>>>> > get a request about accepting network connections. The configure and check >>>>>> > proceeds without my input but the dialog box stays in place. Please see the >>>>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>>>> > about? Do I need to accept any network certificate to not see this dialog >>>>>> > box? >>>>>> > >>>>>> > Thanks, >>>>>> > >>>>>> > >>>>>> >>>>> >>>>> >>>>> -- >>>>> --Amneet >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> --Amneet >>> >>> >>> >> > > > -- > --Amneet > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Mar 20 01:39:11 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 20 Mar 2023 07:39:11 +0100 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: > On 20 Mar 2023, at 2:45 AM, Barry Smith wrote: > > > I found a bit more information in gmakefile.test which has the magic sauce used by make test to stop the firewall popups while running the test suite. > > # MACOS FIREWALL HANDLING > # - if run with MACOS_FIREWALL=1 > # (automatically set in $PETSC_ARCH/lib/petsc/conf/petscvariables if configured --with-macos-firewall-rules), > # ensure mpiexec and test executable is on firewall list > # > ifeq ($(MACOS_FIREWALL),1) > FW := /usr/libexec/ApplicationFirewall/socketfilterfw > # There is no reliable realpath command in macOS without need for 3rd party tools like homebrew coreutils > # Using Python's realpath seems like the most robust way here > realpath-py = $(shell $(PYTHON) -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' $(1)) > # > define macos-firewall-register > @APP=$(call realpath-py, $(1)); \ > if ! sudo -n true 2>/dev/null; then printf "Asking for sudo password to add new firewall rule for\n $$APP\n"; fi; \ > sudo $(FW) --remove $$APP --add $$APP --blockapp $$APP > endef > endif > > and below. When building each executable it automatically calls socketfilterfw on that executable so it won't popup. > > From this I think you can reverse engineer how to turn it off for your executables. > > Perhaps PETSc's make ex1 etc should also apply this magic sauce, Pierre? This configure option was added in https://gitlab.com/petsc/petsc/-/merge_requests/3131 but it never worked on my machines. I just tried again this morning a make check with MACOS_FIREWALL=1, it?s asking for my password to register MPICH in the firewall, but the popups are still appearing afterwards. That?s why I?ve never used that configure option and why I?m not sure if I can trust this code from makefile.test, but I?m probably being paranoid. 
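For a single user executable (say one of the main2d binaries mentioned earlier in this thread), the registration performed by the gmakefile.test rule quoted above can in principle be reproduced by hand with the same socketfilterfw calls; a sketch, where the path is a placeholder and where, as described in this thread, recent macOS versions may keep showing the popup anyway:

    FW=/usr/libexec/ApplicationFirewall/socketfilterfw
    APP=$(python3 -c 'import os,sys; print(os.path.realpath(sys.argv[1]))' /path/to/main2d)
    sudo $FW --remove "$APP"
    sudo $FW --add "$APP"
    sudo $FW --blockapp "$APP"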
Prior to Ventura, when I was running the test suite, I manually disabled the firewall https://support.apple.com/en-gb/guide/mac-help/mh11783/12.0/mac/12.0 Apple has done yet again Apple things, and even if you disable the firewall on Ventura (https://support.apple.com/en-gb/guide/mac-help/mh11783/13.0/mac/13.0), the popups are still appearing. Right now, I don?t have a solution, except for not using my machine while the test suite runs? I don?t recall whether this has been mentioned by any of the other devs, but this is a completely harmless (though frustrating) message: MPI and/or PETSc cannot be used without an action from the user to allow others to get access to your machine. Thanks, Pierre >> On Mar 19, 2023, at 8:10 PM, Amneet Bhalla wrote: >> >> This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? >> >> On Sun, Mar 19, 2023 at 3:51?PM Barry Smith > wrote: >>> >>> ./configure option with-macos-firewall-rules >>> >>> >>>> On Mar 19, 2023, at 5:25 PM, Amneet Bhalla > wrote: >>>> >>>> Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? >>>> >>>> >>>> >>>> On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >>>>> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>>>>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>>>>> >>>>>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>>>>> >>>>> >>>>> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>>>>> I think its due to some of the system calls from MPI. >>>>>>> >>>>>>> You can verify this with a '--with-mpi=0' build. >>>>>>> >>>>>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>>>>> >>>>>>> Satish >>>>>>> >>>>>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>>>>> >>>>>>> > Hi Folks, >>>>>>> > >>>>>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>>>>> > the latest version (v3.18.5). During the configure and make check stage I >>>>>>> > get a request about accepting network connections. The configure and check >>>>>>> > proceeds without my input but the dialog box stays in place. Please see the >>>>>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>>>>> > about? Do I need to accept any network certificate to not see this dialog >>>>>>> > box? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > >>>>>>> > >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> --Amneet >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> --Amneet >>>> >>>> >>>> >>> >> >> >> -- >> --Amneet >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Mar 20 01:40:51 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 20 Mar 2023 14:40:51 +0800 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. Message-ID: Hi, Hope this email finds you well. I am using Firedrake to solve linear problems, which uses SNES with KSPONLY. I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` makes it return before the solution is updated. Is this the expected behavior? Can I just increase the value of `maxLinearSolveFailures` so that the solution is updated, without introducing other side effects? Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 05:18:30 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 11:18:30 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> I simplified the problem with the initial test I talked about because I thought I had identified the issue, so I will walk you through my whole problem:
- first, the solve doesn't produce the same results, as mentioned
- I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1); I didn't think much of it, but I now realize that with 4 processes, for instance, MUMPS crashes when factorizing
- I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 procs because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, and looking at 100x100 matrices is not really nice... Instead I tried to multiply them by a vector full of ones (and afterwards by the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used
- that's why I tried the toy problem I wrote to you in the first place
I hope it's clearer now. Thank you --- Clément BERGER ENS de Lyon Le 2023-03-17 21:57, Barry Smith a écrit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices.
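As a small illustration of the sub-vector route discussed in the quoted messages below (filling the right-hand side through the VECNEST sub-vectors rather than copying into a hand-ordered array), here is a Fortran sketch; bnest is a placeholder for the nested right-hand side, the loop assumes two blocks, the written values are arbitrary, and the usual PETSc Fortran modules are assumed.

    Vec                  :: bnest, bsub
    PetscScalar, pointer :: xx(:)
    PetscInt             :: iblock, i
    PetscErrorCode       :: ierr

    do iblock = 0, 1
       ! Each sub-vector is an ordinary parallel Vec whose layout matches the
       ! corresponding matrix block, so no manual reordering is needed.
       Call VecNestGetSubVec(bnest,iblock,bsub,ierr)
       Call VecGetArrayF90(bsub,xx,ierr)
       do i = 1, size(xx)
          xx(i) = 1.0E0_wp
       end do
       Call VecRestoreArrayF90(bsub,xx,ierr)
    end do

Comparing MatMult results obtained this way on 1 and 2 ranks for a tiny case should then differ only by the row ordering, as Barry notes above.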
> > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. 
It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 20 05:31:01 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 20 Mar 2023 06:31:01 -0400 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: I have no idea, keep on the list. Mark On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com wrote: > Thank you very much, I still have a question about the test code after > vectorization. I did not find the Examples of the sell storage format in > the petsc document. I would like to know which example you use to test the > efficiency of vectorization? > > Mark Adams ?2023?3?16??? 19:40??? > >> >> >> On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com >> wrote: >> >>> Ok, maybe I can try to vectorize this format and make it part of the >>> article. >>> >> >> That would be great, and it would be a good learning experience for you >> and a good way to get exposure. >> See https://petsc.org/release/developers/contributing/ for guidance. >> >> Good luck, >> Mark >> >> >>> >>> Mark Adams ?2023?3?15??? 19:57??? >>> >>>> I don't believe that we have an effort here. It could be a good >>>> opportunity to contribute. >>>> >>>> Mark >>>> >>>> On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com < >>>> ch1057458756 at gmail.com> wrote: >>>> >>>>> I checked the sell.c file and found that this algorithm supports AVX >>>>> vectorization. Will the vectorization support of ARM architecture be added >>>>> in the future? >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Mar 20 06:53:22 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 07:53:22 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > I simplified the problem with the initial test I talked about because I > thought I identified the issue, so I will walk you through my whole problem > : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was > also not consistent with the number of processors used (it is longer with 3 > processes than with 1), I didn't think much of it but I now realize that > for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to > compare them with 1 or 2 proc because I work with a quite specific > geometry, so in order not to fall into some weird particular case I need to > use at least roughly 100 points, so looking at 100x100 matrices is not > really nice...Instead I tried to multiply them by a vector full of one > (after I used the vector v such that v(i)=i). I tried it on two matrices, > and the results didn't depend on the number of procs, but when I tried to > multiply against the nest of these two matrices (a 2x2 block diagonal > nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE for sizes will do this. 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one with something different. It sounds like you have such a small example, but I have missed it. Can you attach this example? Then I can run it, look at the matrices, and see what is different. Thanks, Matt > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 21:57, Barry Smith a ?crit : > > > Yes, you would benefit from a VecConvert() to produce a standard vector. > But you should be able to use VecGetArray() on the nest array and on the > standard array and copy the values between the arrays any way you like. You > don't need to do any reordering when you copy. Is that not working and what > are the symptoms (more than just the answers to the linear solve are > different)? Again you can run on one and two MPI processes with a tiny > problem to see if things are not in the correct order in the vectors and > matrices. 
> > Barry > > > On Mar 17, 2023, at 3:22 PM, Berger Clement > wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not > MATNEST), after that if I use a VECNEST for the left and right hanside, I > get an error during the solve procedure, it is removed if I copy my data in > a vector with standard format, I couldn't find any other way > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement > wrote: > >> But this is to properly fill up the VecNest am I right ? Because this one >> is correct, but I can't directly use it in the KSPSolve, I need to copy it >> into a standard vector >> >> > I do not understand what you mean here. You can definitely use a > VecNest in a KSP. > > Thanks, > > Matt > > > >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:39, Barry Smith a ?crit : >> >> >> I think the intention is that you use VecNestGetSubVecs() >> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >> matrices; this decreases the change of a reordering mistake in trying to do >> it by hand in your code. >> >> >> >> On Mar 17, 2023, at 2:35 PM, Berger Clement >> wrote: >> >> That might be it, I didn't find the equivalent of MatConvert for the >> vectors, so when I need to solve my linear system, with my righthandside >> properly computed in nest format, I create a new vector using VecDuplicate, >> and then I copy into it my data using VecGetArrayF90 and copiing each >> element by hand. Does it create an incorrect ordering ? If so how can I get >> the correct one ? >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:27, Barry Smith a ?crit : >> >> >> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >> MatView() to examine the matrices. They will definitely be ordered >> differently but should otherwise be the same. My guess is that the right >> hand side may not have the correct ordering with respect to the matrix >> ordering in parallel. Note also that when the right hand side does have the >> correct ordering the solution will have a different ordering for each >> different number of MPI ranks when printed (but changing the ordering >> should give the same results up to machine precision. >> >> Barry >> >> >> On Mar 17, 2023, at 2:23 PM, Berger Clement >> wrote: >> >> My issue is that it seems to improperly with some step of my process, the >> solve step doesn't provide the same result depending on the number of >> processors I use. I manually tried to multiply one the matrices I defined >> as a nest against a vector, and the result is not the same with e.g. 1 and >> 3 processors. That's why I tried the toy program I wrote in the first >> place, which highlights the misplacement of elements. >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:14, Barry Smith a ?crit : >> >> >> This sounds like a fine use of MATNEST. Now back to the original >> question >> >> >> I want to construct a matrix by blocs, each block having different sizes >> and partially stored by multiple processors. If I am not mistaken, the >> right way to do so is by using the MATNEST type. 
However, the following code >> >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call >> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. >> It seems that it starts by everything owned by the first proc for A and B, >> then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> If I understand correctly it is behaving as expected. It is the same >> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >> rows and columns. >> >> Both matrix blocks are split among the two MPI processes. This is how >> MATNEST works and likely what you want in practice. >> >> On Mar 17, 2023, at 1:19 PM, Berger Clement >> wrote: >> >> I have a matrix with four different blocks (2rows - 2columns). The block >> sizes differ from one another, because they correspond to a different >> physical variable. One of the block has the particularity that it has to be >> updated at each iteration. This update is performed by replacing it with a >> product of multiple matrices that depend on the result of the previous >> iteration. Note that these intermediate matrices are not square (because >> they also correspond to other types of variables), and that they must be >> completely refilled by hand (i.e. they are not the result of some simple >> linear operations). Finally, I use this final block matrix to solve >> multiple linear systems (with different righthand sides), so for now I use >> MUMPS as only the first solve takes time (but I might change it). >> >> Considering this setting, I created each type of variable separately, >> filled the different matrices, and created different nests of vectors / >> matrices for my operations. When the time comes to use KSPSolve, I use >> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >> the few vector data I need from my nests in a regular Vector, I solve, I >> get back my data in my nest and carry on with the operations needed for my >> updates. >> >> Is that clear ? I don't know if I provided too many or not enough details. >> >> Thank you >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 17:34, Barry Smith a ?crit : >> >> >> Perhaps if you provide a brief summary of what you would like to do >> and we may have ideas on how to achieve it. >> >> Barry >> >> Note: that MATNEST does require that all matrices live on all the MPI >> processes within the original communicator. That is if the original >> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >> so effectively it lives only on rank 1 and 2 (though its communicator is >> all three ranks). >> >> On Mar 17, 2023, at 12:14 PM, Berger Clement >> wrote: >> >> It would be possible in the case I showed you but in mine that would >> actually be quite complicated, isn't there any other workaround ? I precise >> that I am not entitled to utilizing the MATNEST format, it's just that I >> think the other ones wouldn't work. 
>> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 15:48, Barry Smith a ?crit : >> >> >> You may be able to mimic what you want by not using PETSC_DECIDE but >> instead computing up front how many rows of each matrix you want stored on >> each MPI process. You can use 0 for on certain MPI processes for certain >> matrices if you don't want any rows of that particular matrix stored on >> that particular MPI process. >> >> Barry >> >> >> On Mar 17, 2023, at 10:10 AM, Berger Clement >> wrote: >> >> Dear all, >> >> I want to construct a matrix by blocs, each block having different sizes >> and partially stored by multiple processors. If I am not mistaken, the >> right way to do so is by using the MATNEST type. However, the following code >> >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call >> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. >> It seems that it starts by everything owned by the first proc for A and B, >> then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> Note that I am coding in fortran if that has ay consequence. >> >> Thank you, >> >> Sincerely, >> -- >> Cl?ment BERGER >> ENS de Lyon >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 07:00:47 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 08:00:47 -0400 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. In-Reply-To: References: Message-ID: On Mon, Mar 20, 2023 at 2:41?AM Zongze Yang wrote: > Hi, > > Hope this email finds you well. I am using firedrake to solve linear > problems, which use SNES with KSPONLY. > > I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. > The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` make it > return before the solution is updated. > Yes, this is the intended behavior. We do not guarantee cleanup on errors. > Is this behavior as expected? Can I just increase the value of `maxLinearSolveFailures` > to make the solution updated without introducing other side effects? > Yes, that is right. It will not have other side effects with this SNES type. Thanks, Matt > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Mar 20 07:59:56 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 20 Mar 2023 20:59:56 +0800 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. 
In-Reply-To: References: Message-ID: Thank you for your clarification. Best wishes, Zongze On Mon, 20 Mar 2023 at 20:00, Matthew Knepley wrote: > On Mon, Mar 20, 2023 at 2:41 AM Zongze Yang wrote: > >> Hi, >> >> Hope this email finds you well. I am using firedrake to solve linear >> problems, which use SNES with KSPONLY. >> >> I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. >> The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` make it >> return before the solution is updated. >> > > Yes, this is the intended behavior. We do not guarantee cleanup on errors. > > >> Is this behavior as expected? Can I just increase the value of `maxLinearSolveFailures` >> to make the solution updated without introducing other side effects? >> > > Yes, that is right. It will not have other side effects with this SNES > type. > > Thanks, > > Matt > > >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 08:35:38 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 14:35:38 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> Message-ID: <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> 1) Yes, in my program I use PETSC_DETERMINE, but I don't see what the issue is there. From what I understand, it just lets PETSc set the total size from the local sizes provided; am I mistaken? 2) I attached a small script; when I run it with 1 proc the output vector is not the same as when I run it with 2 procs, and I don't know what I should do to make them match. PS: To be clear, I am not trying to point out a bug here; I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression! Thank you, --- Clément BERGER ENS de Lyon Le 2023-03-20 12:53, Matthew Knepley a écrit : > On Mon, Mar 20, 2023 at 6:18 AM Berger Clement wrote: > >> I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : >> >> - first the solve doesn't produce the same results as mentioned >> >> - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing >> >> - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i).
I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used >> >> - that's why I tried the toy problem I wrote to you in the first place >> >> I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. 
My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. 
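
As an illustration of the MatConvert-plus-MUMPS step described just above, a minimal sketch assuming the standard PETSc C API (the nest matrix C and the vectors b and x are placeholder names for this illustration, not code from the thread):

    /* Convert the MATNEST to AIJ so MUMPS can factor it, then do a direct solve. */
    Mat Caij;
    KSP ksp;
    PC  pc;

    PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, Caij, Caij));
    PetscCall(KSPSetType(ksp, KSPPREONLY));                  /* factor once, then back-solve */
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCLU));
    PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));

Note that b and x must use the same global row numbering as Caij, which is the interleaved numbering of the original nest; that numbering is exactly what the rest of this thread turns on.
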
> > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mainTest.f90 Type: text/x-c Size: 1087 bytes Desc: not available URL: From knepley at gmail.com Mon Mar 20 08:58:59 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 09:58:59 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the > issue there. From what I understand, it just lets PETSc set the total size > from the local sizes provided, am I mistaken ? > > 2) I attached a small script, when I run it with 1 proc the output vector > is not the same as if I run it with 2 procs, I don't know what I should do > to make them match. > > PS : I precise that I am not trying to point out a bug here, I realize > that my usage is wrong somehow, I just can't determine why, sorry if I gave > you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view Mat Object: 1 MPI process type: nest Matrix object: type=nest, rows=2, cols=2 MatNest structure: (0,0) : type=constantdiagonal, rows=4, cols=4 (0,1) : NULL (1,0) : NULL (1,1) : type=constantdiagonal, rows=4, cols=4 Mat Object: 1 MPI process type: seqaij row 0: (0, 2.) row 1: (1, 2.) row 2: (2, 2.) row 3: (3, 2.) row 4: (4, 1.) row 5: (5, 1.) row 6: (6, 1.) row 7: (7, 1.) Vec Object: 1 MPI process type: seq 0. 1. 2. 3. 4. 5. 6. 7. Vec Object: 1 MPI process type: seq 0. 2. 4. 6. 4. 5. 6. 7. This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view Mat Object: 2 MPI processes type: nest Matrix object: type=nest, rows=2, cols=2 MatNest structure: (0,0) : type=constantdiagonal, rows=4, cols=4 (0,1) : NULL (1,0) : NULL (1,1) : type=constantdiagonal, rows=4, cols=4 Mat Object: 2 MPI processes type: mpiaij row 0: (0, 2.) row 1: (1, 2.) row 2: (2, 1.) row 3: (3, 1.) row 4: (4, 2.) row 5: (5, 2.) row 6: (6, 1.) row 7: (7, 1.) Vec Object: 2 MPI processes type: mpi Process [0] 0. 1. 2. 3. Process [1] 4. 5. 6. 7. Vec Object: 2 MPI processes type: mpi Process [0] 0. 2. 2. 3. Process [1] 8. 10. 6. 7. Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
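
As an illustration of that last point, one way to obtain an input vector with exactly that interleaved layout is to let the nest matrix create it. A minimal sketch, assuming the standard PETSc C API and the 2x2 diagonal nest C from the test above (MatCreateVecs() on a MATNEST should hand back nest vectors whose sub-vectors match the blocks):

    Vec      x, y, *sub;
    PetscInt nsub, boffset = 0;

    PetscCall(MatCreateVecs(C, &x, &y));             /* layouts compatible with C */
    PetscCall(VecNestGetSubVecs(x, &nsub, &sub));    /* borrowed sub-vectors      */
    for (PetscInt b = 0; b < nsub; ++b) {
      PetscInt rstart, rend, N;
      PetscCall(VecGetSize(sub[b], &N));
      PetscCall(VecGetOwnershipRange(sub[b], &rstart, &rend));
      for (PetscInt row = rstart; row < rend; ++row) {
        /* v(i) = i in the block-by-block (A then B) ordering, regardless of
           how the nest interleaves the blocks across processes */
        PetscCall(VecSetValue(sub[b], row, (PetscScalar)(row + boffset), INSERT_VALUES));
      }
      PetscCall(VecAssemblyBegin(sub[b]));
      PetscCall(VecAssemblyEnd(sub[b]));
      boffset += N;
    }
    PetscCall(MatMult(C, x, y));                     /* block-wise result is independent of np */

Filling through the sub-vectors sidesteps the interleaving entirely, which is the same advice Barry gives further down the thread.
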
Thanks, Matt > Thank you, > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement > wrote: > >> I simplified the problem with the initial test I talked about because I >> thought I identified the issue, so I will walk you through my whole problem >> : >> >> - first the solve doesn't produce the same results as mentioned >> >> - I noticed that the duration of the factorization step of the matrix was >> also not consistent with the number of processors used (it is longer with 3 >> processes than with 1), I didn't think much of it but I now realize that >> for instance with 4 processes, MUMPS crashes when factorizing >> >> - I thought my matrices were wrong, but it's hard for me to use MatView >> to compare them with 1 or 2 proc because I work with a quite specific >> geometry, so in order not to fall into some weird particular case I need to >> use at least roughly 100 points, so looking at 100x100 matrices is not >> really nice...Instead I tried to multiply them by a vector full of one >> (after I used the vector v such that v(i)=i). I tried it on two matrices, >> and the results didn't depend on the number of procs, but when I tried to >> multiply against the nest of these two matrices (a 2x2 block diagonal >> nest), the result changed depending on the number of processors used >> >> - that's why I tried the toy problem I wrote to you in the first place >> >> I hope it's clearer now. >> > > Unfortunately, it is not clear to me. There is nothing attached to this > email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number > of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the > number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have > something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small > example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and > see what is different. > > Thanks, > > Matt > > >> Thank you >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 21:57, Barry Smith a ?crit : >> >> >> Yes, you would benefit from a VecConvert() to produce a standard >> vector. But you should be able to use VecGetArray() on the nest array and >> on the standard array and copy the values between the arrays any way you >> like. You don't need to do any reordering when you copy. Is that not >> working and what are the symptoms (more than just the answers to the linear >> solve are different)? Again you can run on one and two MPI processes with a >> tiny problem to see if things are not in the correct order in the vectors >> and matrices. 
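
If copying through raw arrays feels error-prone, an alternative sketch (standard PETSc C API assumed) is to ask the nest matrix for the index sets of its blocks and copy block by block; here xnest is a nest vector laid out like the 2x2 nest C, and xaij is a contiguous vector created from the converted matrix, both placeholder names for this illustration:

    IS       rows[2];
    Vec     *sub;
    PetscInt nsub;

    PetscCall(MatNestGetISs(C, rows, NULL));             /* global rows of each block */
    PetscCall(VecNestGetSubVecs(xnest, &nsub, &sub));
    for (PetscInt b = 0; b < nsub; ++b) {
      Vec view;
      PetscCall(VecGetSubVector(xaij, rows[b], &view));  /* rows of block b in xaij   */
      PetscCall(VecCopy(sub[b], view));                  /* nest block -> contiguous  */
      PetscCall(VecRestoreSubVector(xaij, rows[b], &view));
    }

Using VecCopy(view, sub[b]) instead moves data the other way, e.g. to pull a solution back into the nest.
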
>> >> Barry >> >> >> On Mar 17, 2023, at 3:22 PM, Berger Clement >> wrote: >> >> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >> not MATNEST), after that if I use a VECNEST for the left and right hanside, >> I get an error during the solve procedure, it is removed if I copy my data >> in a vector with standard format, I couldn't find any other way >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >> >> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> But this is to properly fill up the VecNest am I right ? Because this >>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>> it into a standard vector >>> >>> >> I do not understand what you mean here. You can definitely use a >> VecNest in a KSP. >> >> Thanks, >> >> Matt >> >> >> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>> >>> >>> I think the intention is that you use VecNestGetSubVecs() >>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>> matrices; this decreases the change of a reordering mistake in trying to do >>> it by hand in your code. >>> >>> >>> >>> On Mar 17, 2023, at 2:35 PM, Berger Clement >>> wrote: >>> >>> That might be it, I didn't find the equivalent of MatConvert for the >>> vectors, so when I need to solve my linear system, with my righthandside >>> properly computed in nest format, I create a new vector using VecDuplicate, >>> and then I copy into it my data using VecGetArrayF90 and copiing each >>> element by hand. Does it create an incorrect ordering ? If so how can I get >>> the correct one ? >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>> >>> >>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>> MatView() to examine the matrices. They will definitely be ordered >>> differently but should otherwise be the same. My guess is that the right >>> hand side may not have the correct ordering with respect to the matrix >>> ordering in parallel. Note also that when the right hand side does have the >>> correct ordering the solution will have a different ordering for each >>> different number of MPI ranks when printed (but changing the ordering >>> should give the same results up to machine precision. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement >>> wrote: >>> >>> My issue is that it seems to improperly with some step of my process, >>> the solve step doesn't provide the same result depending on the number of >>> processors I use. I manually tried to multiply one the matrices I defined >>> as a nest against a vector, and the result is not the same with e.g. 1 and >>> 3 processors. That's why I tried the toy program I wrote in the first >>> place, which highlights the misplacement of elements. >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original >>> question >>> >>> >>> I want to construct a matrix by blocs, each block having different sizes >>> and partially stored by multiple processors. If I am not mistaken, the >>> right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call >>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. >>> It seems that it starts by everything owned by the first proc for A and B, >>> then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same >>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>> rows and columns. >>> >>> Both matrix blocks are split among the two MPI processes. This is how >>> MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement >>> wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block >>> sizes differ from one another, because they correspond to a different >>> physical variable. One of the block has the particularity that it has to be >>> updated at each iteration. This update is performed by replacing it with a >>> product of multiple matrices that depend on the result of the previous >>> iteration. Note that these intermediate matrices are not square (because >>> they also correspond to other types of variables), and that they must be >>> completely refilled by hand (i.e. they are not the result of some simple >>> linear operations). Finally, I use this final block matrix to solve >>> multiple linear systems (with different righthand sides), so for now I use >>> MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, >>> filled the different matrices, and created different nests of vectors / >>> matrices for my operations. When the time comes to use KSPSolve, I use >>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>> the few vector data I need from my nests in a regular Vector, I solve, I >>> get back my data in my nest and carry on with the operations needed for my >>> updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough >>> details. >>> >>> Thank you >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do >>> and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI >>> processes within the original communicator. That is if the original >>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>> so effectively it lives only on rank 1 and 2 (though its communicator is >>> all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement >>> wrote: >>> >>> It would be possible in the case I showed you but in mine that would >>> actually be quite complicated, isn't there any other workaround ? I precise >>> that I am not entitled to utilizing the MATNEST format, it's just that I >>> think the other ones wouldn't work. 
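
For reference, a minimal sketch of the workaround being discussed here (standard PETSc C API assumed, and it assumes exactly two MPI ranks): giving explicit local sizes instead of PETSC_DECIDE so that all of A lands on rank 0 and all of B on rank 1, in which case the nest's contiguous numbering puts A's rows first and B's rows last, just as in the sequential case:

    PetscMPIInt rank;
    Mat         A, B, C, blocks[4];

    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
    /* local size 4 on the owning rank, 0 elsewhere; the global size stays 4 */
    PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, rank == 0 ? 4 : 0,
                                        rank == 0 ? 4 : 0, 4, 4, 2.0, &A));
    PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, rank == 1 ? 4 : 0,
                                        rank == 1 ? 4 : 0, 4, 4, 1.0, &B));
    blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
    PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C));

The price is load balance: each block then lives entirely on one rank, which is why the interleaved default is usually the better choice in practice.
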
>>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but >>> instead computing up front how many rows of each matrix you want stored on >>> each MPI process. You can use 0 for on certain MPI processes for certain >>> matrices if you don't want any rows of that particular matrix stored on >>> that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement >>> wrote: >>> >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes >>> and partially stored by multiple processors. If I am not mistaken, the >>> right way to do so is by using the MATNEST type. However, the following code >>> >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call >>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. >>> It seems that it starts by everything owned by the first proc for A and B, >>> then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> -- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: nestTest.c Type: application/octet-stream Size: 1352 bytes Desc: not available URL: From clement.berger at ens-lyon.fr Mon Mar 20 09:09:57 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 15:09:57 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> Message-ID: <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Ok so this means that if I define my vectors via VecNest (organized as the matrices), everything will be correctly ordered ? How does that behave with MatConvert ? In the sense that if I convert a MatNest to MatAIJ via MatConvert, will the vector as built in the example I showed you work properly ? Thank you ! 
--- Cl?ment BERGER ENS de Lyon Le 2023-03-20 14:58, Matthew Knepley a ?crit : > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > >> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the issue there. From what I understand, it just lets PETSc set the total size from the local sizes provided, am I mistaken ? >> >> 2) I attached a small script, when I run it with 1 proc the output vector is not the same as if I run it with 2 procs, I don't know what I should do to make them match. >> >> PS : I precise that I am not trying to point out a bug here, I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. > 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
> > Thanks, > > Matt > > Thank you, > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > > I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? 
Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). 
The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. 
> > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 09:51:11 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 10:51:11 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 10:09?AM Berger Clement wrote: > Ok so this means that if I define my vectors via VecNest (organized as the > matrices), everything will be correctly ordered ? > Yes. > How does that behave with MatConvert ? In the sense that if I convert a > MatNest to MatAIJ via MatConvert, will the vector as built in the example I > showed you work properly ? > > No. MatConvert just changes the storage format, not the ordering. Thanks, Matt > Thank you ! > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 14:58, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement > wrote: > >> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the >> issue there. From what I understand, it just lets PETSc set the total size >> from the local sizes provided, am I mistaken ? >> >> 2) I attached a small script, when I run it with 1 proc the output vector >> is not the same as if I run it with 2 procs, I don't know what I should do >> to make them match. >> >> PS : I precise that I am not trying to point out a bug here, I realize >> that my usage is wrong somehow, I just can't determine why, sorry if I gave >> you the wrong impression ! >> >> >> I think I can now explain this clearly. Thank you for the nice simple > example. I attach my slightly changed version (I think better in C). Here > is running on one process: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 > ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. 
> 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and > reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 > ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so > each has two rows on process 0 and two rows on process 1. In the MatNest > they are interleaved because we asked for contiguous numbering (by giving > NULL for the IS of global row numbers). If we want to reproduce the same > output, we would need to produce our input vector with the same interleaved > numbering. > > Thanks, > > Matt > >> Thank you, >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-20 12:53, Matthew Knepley a ?crit : >> >> On Mon, Mar 20, 2023 at 6:18?AM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> I simplified the problem with the initial test I talked about because I >>> thought I identified the issue, so I will walk you through my whole problem >>> : >>> >>> - first the solve doesn't produce the same results as mentioned >>> >>> - I noticed that the duration of the factorization step of the matrix >>> was also not consistent with the number of processors used (it is longer >>> with 3 processes than with 1), I didn't think much of it but I now realize >>> that for instance with 4 processes, MUMPS crashes when factorizing >>> >>> - I thought my matrices were wrong, but it's hard for me to use MatView >>> to compare them with 1 or 2 proc because I work with a quite specific >>> geometry, so in order not to fall into some weird particular case I need to >>> use at least roughly 100 points, so looking at 100x100 matrices is not >>> really nice...Instead I tried to multiply them by a vector full of one >>> (after I used the vector v such that v(i)=i). I tried it on two matrices, >>> and the results didn't depend on the number of procs, but when I tried to >>> multiply against the nest of these two matrices (a 2x2 block diagonal >>> nest), the result changed depending on the number of processors used >>> >>> - that's why I tried the toy problem I wrote to you in the first place >>> >>> I hope it's clearer now. >>> >> >> Unfortunately, it is not clear to me. There is nothing attached to this >> email. I will try to describe things from my end. >> >> 1) There are lots of tests. Internally, Nest does not depend on the >> number of processes unless you make it so. This leads >> me to believe that your construction of the matrix changes with the >> number of processes. For example, using PETSC_DETERMINE >> for sizes will do this. 
>> >> 2) In order to understand what you want to achieve, we need to have >> something running in two cases, one with "correct" output and one >> with something different. It sounds like you have such a small >> example, but I have missed it. >> >> Can you attach this example? Then I can run it, look at the matrices, and >> see what is different. >> >> Thanks, >> >> Matt >> >> >>> Thank you >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 21:57, Barry Smith a ?crit : >>> >>> >>> Yes, you would benefit from a VecConvert() to produce a standard >>> vector. But you should be able to use VecGetArray() on the nest array and >>> on the standard array and copy the values between the arrays any way you >>> like. You don't need to do any reordering when you copy. Is that not >>> working and what are the symptoms (more than just the answers to the linear >>> solve are different)? Again you can run on one and two MPI processes with a >>> tiny problem to see if things are not in the correct order in the vectors >>> and matrices. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 3:22 PM, Berger Clement >>> wrote: >>> >>> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >>> not MATNEST), after that if I use a VECNEST for the left and right hanside, >>> I get an error during the solve procedure, it is removed if I copy my data >>> in a vector with standard format, I couldn't find any other way >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >>> >>> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >>> clement.berger at ens-lyon.fr> wrote: >>> >>>> But this is to properly fill up the VecNest am I right ? Because this >>>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>>> it into a standard vector >>>> >>>> >>> I do not understand what you mean here. You can definitely use a >>> VecNest in a KSP. >>> >>> Thanks, >>> >>> Matt >>> >>> >>> >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>>> >>>> >>>> I think the intention is that you use VecNestGetSubVecs() >>>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>>> matrices; this decreases the change of a reordering mistake in trying to do >>>> it by hand in your code. >>>> >>>> >>>> >>>> On Mar 17, 2023, at 2:35 PM, Berger Clement >>>> wrote: >>>> >>>> That might be it, I didn't find the equivalent of MatConvert for the >>>> vectors, so when I need to solve my linear system, with my righthandside >>>> properly computed in nest format, I create a new vector using VecDuplicate, >>>> and then I copy into it my data using VecGetArrayF90 and copiing each >>>> element by hand. Does it create an incorrect ordering ? If so how can I get >>>> the correct one ? >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>>> >>>> >>>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>>> MatView() to examine the matrices. They will definitely be ordered >>>> differently but should otherwise be the same. My guess is that the right >>>> hand side may not have the correct ordering with respect to the matrix >>>> ordering in parallel. Note also that when the right hand side does have the >>>> correct ordering the solution will have a different ordering for each >>>> different number of MPI ranks when printed (but changing the ordering >>>> should give the same results up to machine precision. 
>>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 2:23 PM, Berger Clement >>>> wrote: >>>> >>>> My issue is that it seems to improperly with some step of my process, >>>> the solve step doesn't provide the same result depending on the number of >>>> processors I use. I manually tried to multiply one the matrices I defined >>>> as a nest against a vector, and the result is not the same with e.g. 1 and >>>> 3 processors. That's why I tried the toy program I wrote in the first >>>> place, which highlights the misplacement of elements. >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>>> >>>> >>>> This sounds like a fine use of MATNEST. Now back to the original >>>> question >>>> >>>> >>>> I want to construct a matrix by blocs, each block having different >>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>> the right way to do so is by using the MATNEST type. However, the following >>>> code >>>> >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>> Call >>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>> >>>> does not generate the same matrix depending on the number of >>>> processors. It seems that it starts by everything owned by the first proc >>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>> clear). >>>> >>>> Is it possible to change that ? >>>> >>>> If I understand correctly it is behaving as expected. It is the same >>>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>>> rows and columns. >>>> >>>> Both matrix blocks are split among the two MPI processes. This is how >>>> MATNEST works and likely what you want in practice. >>>> >>>> On Mar 17, 2023, at 1:19 PM, Berger Clement >>>> wrote: >>>> >>>> I have a matrix with four different blocks (2rows - 2columns). The >>>> block sizes differ from one another, because they correspond to a different >>>> physical variable. One of the block has the particularity that it has to be >>>> updated at each iteration. This update is performed by replacing it with a >>>> product of multiple matrices that depend on the result of the previous >>>> iteration. Note that these intermediate matrices are not square (because >>>> they also correspond to other types of variables), and that they must be >>>> completely refilled by hand (i.e. they are not the result of some simple >>>> linear operations). Finally, I use this final block matrix to solve >>>> multiple linear systems (with different righthand sides), so for now I use >>>> MUMPS as only the first solve takes time (but I might change it). >>>> >>>> Considering this setting, I created each type of variable separately, >>>> filled the different matrices, and created different nests of vectors / >>>> matrices for my operations. When the time comes to use KSPSolve, I use >>>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>>> the few vector data I need from my nests in a regular Vector, I solve, I >>>> get back my data in my nest and carry on with the operations needed for my >>>> updates. >>>> >>>> Is that clear ? I don't know if I provided too many or not enough >>>> details. 
>>>> >>>> Thank you >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>>> >>>> >>>> Perhaps if you provide a brief summary of what you would like to do >>>> and we may have ideas on how to achieve it. >>>> >>>> Barry >>>> >>>> Note: that MATNEST does require that all matrices live on all the MPI >>>> processes within the original communicator. That is if the original >>>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>>> so effectively it lives only on rank 1 and 2 (though its communicator is >>>> all three ranks). >>>> >>>> On Mar 17, 2023, at 12:14 PM, Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>> It would be possible in the case I showed you but in mine that would >>>> actually be quite complicated, isn't there any other workaround ? I precise >>>> that I am not entitled to utilizing the MATNEST format, it's just that I >>>> think the other ones wouldn't work. >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>>> >>>> >>>> You may be able to mimic what you want by not using PETSC_DECIDE but >>>> instead computing up front how many rows of each matrix you want stored on >>>> each MPI process. You can use 0 for on certain MPI processes for certain >>>> matrices if you don't want any rows of that particular matrix stored on >>>> that particular MPI process. >>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 10:10 AM, Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>> Dear all, >>>> >>>> I want to construct a matrix by blocs, each block having different >>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>> the right way to do so is by using the MATNEST type. However, the following >>>> code >>>> >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>> Call >>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>> >>>> does not generate the same matrix depending on the number of >>>> processors. It seems that it starts by everything owned by the first proc >>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>> clear). >>>> >>>> Is it possible to change that ? >>>> >>>> Note that I am coding in fortran if that has ay consequence. >>>> >>>> Thank you, >>>> >>>> Sincerely, >>>> -- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Mar 20 11:12:04 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 20 Mar 2023 16:12:04 +0000 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: See https://dl.acm.org/doi/10.1145/3225058.3225100 for more information about SELL. The example used in the paper is src/ts/tutorials/advection-diffusion-reaction/ex5adj.c src/mat/tests/bench_spmv.c provides a driver to test SpMV using matrices stored in binary or matrix market format (from the SuiteSparse benchmark collection). If you would like to dive deeper, check this configurable script to see how the benchmark testing can be automated: src/benchmarks/run_petsc_benchmarks.sh Hong (Mr.) On Mar 20, 2023, at 5:31 AM, Mark Adams wrote: I have no idea, keep on the list. Mark On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com > wrote: Thank you very much, I still have a question about the test code after vectorization. I did not find the Examples of the sell storage format in the petsc document. I would like to know which example you use to test the efficiency of vectorization? Mark Adams > ?2023?3?16??? 19:40??? On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com > wrote: Ok, maybe I can try to vectorize this format and make it part of the article. That would be great, and it would be a good learning experience for you and a good way to get exposure. See https://petsc.org/release/developers/contributing/ for guidance. Good luck, Mark Mark Adams > ?2023?3?15??? 19:57??? I don't believe that we have an effort here. It could be a good opportunity to contribute. Mark On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com > wrote: I checked the sell.c file and found that this algorithm supports AVX vectorization. Will the vectorization support of ARM architecture be added in the future? -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 11:20:06 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 17:20:06 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Message-ID: <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> That seems to be working fine, thank you ! --- Cl?ment BERGER ENS de Lyon Le 2023-03-20 15:51, Matthew Knepley a ?crit : > On Mon, Mar 20, 2023 at 10:09?AM Berger Clement wrote: > >> Ok so this means that if I define my vectors via VecNest (organized as the matrices), everything will be correctly ordered ? > > Yes. > >> How does that behave with MatConvert ? In the sense that if I convert a MatNest to MatAIJ via MatConvert, will the vector as built in the example I showed you work properly ? > > No. 
MatConvert just changes the storage format, not the ordering. > > Thanks, > > Matt > > Thank you ! > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 14:58, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > > 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the issue there. From what I understand, it just lets PETSc set the total size from the local sizes provided, am I mistaken ? > > 2) I attached a small script, when I run it with 1 proc the output vector is not the same as if I run it with 2 procs, I don't know what I should do to make them match. > > PS : I precise that I am not trying to point out a bug here, I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. > 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
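A minimal sketch of one way to obtain an input vector that already follows this interleaved numbering, assuming C is the MatNest assembled as above (function and variable names are placeholders, not from this thread): ask the matrix itself for compatible vectors and fill them through the nest sub-vectors rather than by global index.

#include <petscmat.h>

/* Sketch: C is assumed to be the MatNest built above. MatCreateVecs() returns
   VECNEST vectors whose parallel layout matches C, so filling the sub-vectors
   avoids reproducing the interleaved numbering by hand. */
static PetscErrorCode CreateAndFillRHS(Mat C, Vec *bout)
{
  Vec      b, *sub;
  PetscInt nb, i, rstart, rend, row;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(C, NULL, &b));      /* b has the same row layout as C */
  PetscCall(VecNestGetSubVecs(b, &nb, &sub)); /* one sub-vector per block row */
  for (i = 0; i < nb; i++) {
    PetscCall(VecGetOwnershipRange(sub[i], &rstart, &rend));
    for (row = rstart; row < rend; row++) PetscCall(VecSetValue(sub[i], row, (PetscScalar)row, INSERT_VALUES));
    PetscCall(VecAssemblyBegin(sub[i]));
    PetscCall(VecAssemblyEnd(sub[i]));
  }
  *bout = b;
  PetscFunctionReturn(0);
}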
> > Thanks, > > Matt > > Thank you, > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > > I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? 
Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). 
The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. 
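A small C sketch of the suggestion above, assuming the same 4x4 diagonal blocks and exactly two MPI ranks (all sizes purely illustrative): passing explicit local sizes instead of PETSC_DECIDE lets the caller decide how many rows of each block each rank owns, here all of A on rank 0 and all of B on rank 1, which reproduces a block-after-block layout.

#include <petscmat.h>

/* Sketch for exactly two ranks (run with mpiexec -n 2): explicit local sizes
   instead of PETSC_DECIDE put all of A on rank 0 and all of B on rank 1. */
int main(int argc, char **argv)
{
  Mat         A, B, C, blocks[4];
  PetscMPIInt rank;
  PetscInt    nlocA, nlocB;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  nlocA = (rank == 0) ? 4 : 0; /* rank 0 owns every row/column of A */
  nlocB = (rank == 0) ? 0 : 4; /* rank 1 owns every row/column of B */
  PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, nlocA, nlocA, 4, 4, 2.0, &A));
  PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, nlocB, nlocB, 4, 4, 1.0, &B));
  blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
  PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C));
  PetscCall(MatView(C, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&A));
  PetscCall(MatDestroy(&B));
  PetscCall(MatDestroy(&C));
  PetscCall(PetscFinalize());
  return 0;
}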
> > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 12:03:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 13:03:05 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 12:20?PM Berger Clement wrote: > That seems to be working fine, thank you ! > Great! Thanks Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 15:51, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 10:09?AM Berger Clement < > clement.berger at ens-lyon.fr> wrote: > >> Ok so this means that if I define my vectors via VecNest (organized as >> the matrices), everything will be correctly ordered ? >> > Yes. > >> How does that behave with MatConvert ? In the sense that if I convert a >> MatNest to MatAIJ via MatConvert, will the vector as built in the example I >> showed you work properly ? >> >> >> No. MatConvert just changes the storage format, not the ordering. > > Thanks, > > Matt > >> Thank you ! >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-20 14:58, Matthew Knepley a ?crit : >> >> On Mon, Mar 20, 2023 at 9:35?AM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the >>> issue there. From what I understand, it just lets PETSc set the total size >>> from the local sizes provided, am I mistaken ? >>> >>> 2) I attached a small script, when I run it with 1 proc the output >>> vector is not the same as if I run it with 2 procs, I don't know what I >>> should do to make them match. >>> >>> PS : I precise that I am not trying to point out a bug here, I realize >>> that my usage is wrong somehow, I just can't determine why, sorry if I gave >>> you the wrong impression ! >>> >>> >>> I think I can now explain this clearly. Thank you for the nice simple >> example. I attach my slightly changed version (I think better in C). 
Here >> is running on one process: >> >> master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 >> ./nestTest -left_view -right_view -nest_view -full_view >> Mat Object: 1 MPI process >> type: nest >> Matrix object: >> type=nest, rows=2, cols=2 >> MatNest structure: >> (0,0) : type=constantdiagonal, rows=4, cols=4 >> (0,1) : NULL >> (1,0) : NULL >> (1,1) : type=constantdiagonal, rows=4, cols=4 >> Mat Object: 1 MPI process >> type: seqaij >> row 0: (0, 2.) >> row 1: (1, 2.) >> row 2: (2, 2.) >> row 3: (3, 2.) >> row 4: (4, 1.) >> row 5: (5, 1.) >> row 6: (6, 1.) >> row 7: (7, 1.) >> Vec Object: 1 MPI process >> type: seq >> 0. >> 1. >> 2. >> 3. >> 4. >> 5. >> 6. >> 7. >> Vec Object: 1 MPI process >> type: seq >> 0. >> 2. >> 4. >> 6. >> 4. >> 5. >> 6. >> 7. >> >> This looks like what you expect. Doubling the first four rows and >> reproducing the last four. Now let's run on two processes: >> >> master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 >> ./nestTest -left_view -right_view -nest_view -full_view >> Mat Object: 2 MPI processes >> type: nest >> Matrix object: >> type=nest, rows=2, cols=2 >> MatNest structure: >> (0,0) : type=constantdiagonal, rows=4, cols=4 >> (0,1) : NULL >> (1,0) : NULL >> (1,1) : type=constantdiagonal, rows=4, cols=4 >> Mat Object: 2 MPI processes >> type: mpiaij >> row 0: (0, 2.) >> row 1: (1, 2.) >> row 2: (2, 1.) >> row 3: (3, 1.) >> row 4: (4, 2.) >> row 5: (5, 2.) >> row 6: (6, 1.) >> row 7: (7, 1.) >> Vec Object: 2 MPI processes >> type: mpi >> Process [0] >> 0. >> 1. >> 2. >> 3. >> Process [1] >> 4. >> 5. >> 6. >> 7. >> Vec Object: 2 MPI processes >> type: mpi >> Process [0] >> 0. >> 2. >> 2. >> 3. >> Process [1] >> 8. >> 10. >> 6. >> 7. >> >> Let me describe what has changed. The matrices A and B are parallel, so >> each has two rows on process 0 and two rows on process 1. In the MatNest >> they are interleaved because we asked for contiguous numbering (by giving >> NULL for the IS of global row numbers). If we want to reproduce the same >> output, we would need to produce our input vector with the same interleaved >> numbering. >> >> Thanks, >> >> Matt >> >>> Thank you, >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-20 12:53, Matthew Knepley a ?crit : >>> >>> On Mon, Mar 20, 2023 at 6:18?AM Berger Clement < >>> clement.berger at ens-lyon.fr> wrote: >>> >>>> I simplified the problem with the initial test I talked about because I >>>> thought I identified the issue, so I will walk you through my whole problem >>>> : >>>> >>>> - first the solve doesn't produce the same results as mentioned >>>> >>>> - I noticed that the duration of the factorization step of the matrix >>>> was also not consistent with the number of processors used (it is longer >>>> with 3 processes than with 1), I didn't think much of it but I now realize >>>> that for instance with 4 processes, MUMPS crashes when factorizing >>>> >>>> - I thought my matrices were wrong, but it's hard for me to use MatView >>>> to compare them with 1 or 2 proc because I work with a quite specific >>>> geometry, so in order not to fall into some weird particular case I need to >>>> use at least roughly 100 points, so looking at 100x100 matrices is not >>>> really nice...Instead I tried to multiply them by a vector full of one >>>> (after I used the vector v such that v(i)=i). 
I tried it on two matrices, >>>> and the results didn't depend on the number of procs, but when I tried to >>>> multiply against the nest of these two matrices (a 2x2 block diagonal >>>> nest), the result changed depending on the number of processors used >>>> >>>> - that's why I tried the toy problem I wrote to you in the first place >>>> >>>> I hope it's clearer now. >>>> >>> >>> Unfortunately, it is not clear to me. There is nothing attached to this >>> email. I will try to describe things from my end. >>> >>> 1) There are lots of tests. Internally, Nest does not depend on the >>> number of processes unless you make it so. This leads >>> me to believe that your construction of the matrix changes with the >>> number of processes. For example, using PETSC_DETERMINE >>> for sizes will do this. >>> >>> 2) In order to understand what you want to achieve, we need to have >>> something running in two cases, one with "correct" output and one >>> with something different. It sounds like you have such a small >>> example, but I have missed it. >>> >>> Can you attach this example? Then I can run it, look at the matrices, >>> and see what is different. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thank you >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 21:57, Barry Smith a ?crit : >>>> >>>> >>>> Yes, you would benefit from a VecConvert() to produce a standard >>>> vector. But you should be able to use VecGetArray() on the nest array and >>>> on the standard array and copy the values between the arrays any way you >>>> like. You don't need to do any reordering when you copy. Is that not >>>> working and what are the symptoms (more than just the answers to the linear >>>> solve are different)? Again you can run on one and two MPI processes with a >>>> tiny problem to see if things are not in the correct order in the vectors >>>> and matrices. >>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 3:22 PM, Berger Clement >>>> wrote: >>>> >>>> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >>>> not MATNEST), after that if I use a VECNEST for the left and right hanside, >>>> I get an error during the solve procedure, it is removed if I copy my data >>>> in a vector with standard format, I couldn't find any other way >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >>>> >>>> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>>> But this is to properly fill up the VecNest am I right ? Because this >>>>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>>>> it into a standard vector >>>>> >>>>> >>>> I do not understand what you mean here. You can definitely use a >>>> VecNest in a KSP. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>>>> >>>>> >>>>> I think the intention is that you use VecNestGetSubVecs() >>>>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>>>> matrices; this decreases the change of a reordering mistake in trying to do >>>>> it by hand in your code. 
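A sketch of that copy, under the assumption that the flat vector follows the same interleaved ordering as the MatConvert'ed matrix (C, Caij and bnest are placeholder names, not from this thread): the row index sets that MatNest builds internally can be retrieved with MatNestGetISs() and used to place each sub-vector into the flat vector, instead of copying element by element.

#include <petscmat.h>

/* Sketch: C is the original 2x2 MATNEST, Caij its MatConvert'ed AIJ copy and
   bnest a VECNEST right-hand side; all names are placeholders. The row index
   sets created internally by MatNest map each sub-vector into the interleaved
   global ordering that Caij uses as well. */
static PetscErrorCode NestToFlat(Mat C, Mat Caij, Vec bnest, Vec *bflat)
{
  IS       isrow[2];
  Vec     *sub;
  PetscInt nb, i;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(Caij, NULL, bflat)); /* flat vector with Caij's row layout */
  PetscCall(MatNestGetISs(C, isrow, NULL));    /* global rows of each block */
  PetscCall(VecNestGetSubVecs(bnest, &nb, &sub));
  for (i = 0; i < nb; i++) PetscCall(VecISCopy(*bflat, isrow[i], SCATTER_FORWARD, sub[i])); /* bflat[isrow[i]] = sub[i] */
  PetscFunctionReturn(0);
}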
>>>>> >>>>> >>>>> >>>>> On Mar 17, 2023, at 2:35 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> That might be it, I didn't find the equivalent of MatConvert for the >>>>> vectors, so when I need to solve my linear system, with my righthandside >>>>> properly computed in nest format, I create a new vector using VecDuplicate, >>>>> and then I copy into it my data using VecGetArrayF90 and copiing each >>>>> element by hand. Does it create an incorrect ordering ? If so how can I get >>>>> the correct one ? >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>>>> >>>>> >>>>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>>>> MatView() to examine the matrices. They will definitely be ordered >>>>> differently but should otherwise be the same. My guess is that the right >>>>> hand side may not have the correct ordering with respect to the matrix >>>>> ordering in parallel. Note also that when the right hand side does have the >>>>> correct ordering the solution will have a different ordering for each >>>>> different number of MPI ranks when printed (but changing the ordering >>>>> should give the same results up to machine precision. >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Mar 17, 2023, at 2:23 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> My issue is that it seems to improperly with some step of my process, >>>>> the solve step doesn't provide the same result depending on the number of >>>>> processors I use. I manually tried to multiply one the matrices I defined >>>>> as a nest against a vector, and the result is not the same with e.g. 1 and >>>>> 3 processors. That's why I tried the toy program I wrote in the first >>>>> place, which highlights the misplacement of elements. >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>>>> >>>>> >>>>> This sounds like a fine use of MATNEST. Now back to the original >>>>> question >>>>> >>>>> >>>>> I want to construct a matrix by blocs, each block having different >>>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>>> the right way to do so is by using the MATNEST type. However, the following >>>>> code >>>>> >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>>> Call >>>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>>> >>>>> does not generate the same matrix depending on the number of >>>>> processors. It seems that it starts by everything owned by the first proc >>>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>>> clear). >>>>> >>>>> Is it possible to change that ? >>>>> >>>>> If I understand correctly it is behaving as expected. It is the same >>>>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>>>> rows and columns. >>>>> >>>>> Both matrix blocks are split among the two MPI processes. This is >>>>> how MATNEST works and likely what you want in practice. >>>>> >>>>> On Mar 17, 2023, at 1:19 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> I have a matrix with four different blocks (2rows - 2columns). 
The >>>>> block sizes differ from one another, because they correspond to a different >>>>> physical variable. One of the block has the particularity that it has to be >>>>> updated at each iteration. This update is performed by replacing it with a >>>>> product of multiple matrices that depend on the result of the previous >>>>> iteration. Note that these intermediate matrices are not square (because >>>>> they also correspond to other types of variables), and that they must be >>>>> completely refilled by hand (i.e. they are not the result of some simple >>>>> linear operations). Finally, I use this final block matrix to solve >>>>> multiple linear systems (with different righthand sides), so for now I use >>>>> MUMPS as only the first solve takes time (but I might change it). >>>>> >>>>> Considering this setting, I created each type of variable separately, >>>>> filled the different matrices, and created different nests of vectors / >>>>> matrices for my operations. When the time comes to use KSPSolve, I use >>>>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>>>> the few vector data I need from my nests in a regular Vector, I solve, I >>>>> get back my data in my nest and carry on with the operations needed for my >>>>> updates. >>>>> >>>>> Is that clear ? I don't know if I provided too many or not enough >>>>> details. >>>>> >>>>> Thank you >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>>>> >>>>> >>>>> Perhaps if you provide a brief summary of what you would like to do >>>>> and we may have ideas on how to achieve it. >>>>> >>>>> Barry >>>>> >>>>> Note: that MATNEST does require that all matrices live on all the MPI >>>>> processes within the original communicator. That is if the original >>>>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>>>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>>>> so effectively it lives only on rank 1 and 2 (though its communicator is >>>>> all three ranks). >>>>> >>>>> On Mar 17, 2023, at 12:14 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> It would be possible in the case I showed you but in mine that would >>>>> actually be quite complicated, isn't there any other workaround ? I precise >>>>> that I am not entitled to utilizing the MATNEST format, it's just that I >>>>> think the other ones wouldn't work. >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>>>> >>>>> >>>>> You may be able to mimic what you want by not using PETSC_DECIDE >>>>> but instead computing up front how many rows of each matrix you want stored >>>>> on each MPI process. You can use 0 for on certain MPI processes for certain >>>>> matrices if you don't want any rows of that particular matrix stored on >>>>> that particular MPI process. >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Mar 17, 2023, at 10:10 AM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> Dear all, >>>>> >>>>> I want to construct a matrix by blocs, each block having different >>>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>>> the right way to do so is by using the MATNEST type. 
However, the following >>>>> code >>>>> >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>>> Call >>>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>>> >>>>> does not generate the same matrix depending on the number of >>>>> processors. It seems that it starts by everything owned by the first proc >>>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>>> clear). >>>>> >>>>> Is it possible to change that ? >>>>> >>>>> Note that I am coding in fortran if that has ay consequence. >>>>> >>>>> Thank you, >>>>> >>>>> Sincerely, >>>>> -- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ch1057458756 at gmail.com Mon Mar 20 12:04:38 2023 From: ch1057458756 at gmail.com (CaoHao@gmail.com) Date: Tue, 21 Mar 2023 01:04:38 +0800 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: Thank you very much. I will continue to study these contents. Zhang, Hong ? 2023?3?21? ???00:12??? > See https://dl.acm.org/doi/10.1145/3225058.3225100 for more information > about SELL. > > The example used in the paper > is src/ts/tutorials/advection-diffusion-reaction/ex5adj.c > > src/mat/tests/bench_spmv.c provides a driver to test SpMV using matrices > stored in binary or matrix market format (from the SuiteSparse benchmark > collection). > > If you would like to dive deeper, check this configurable script to see > how the benchmark testing can be automated: > src/benchmarks/run_petsc_benchmarks.sh > > Hong (Mr.) > > On Mar 20, 2023, at 5:31 AM, Mark Adams wrote: > > I have no idea, keep on the list. > Mark > > On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com > wrote: > >> Thank you very much, I still have a question about the test code after >> vectorization. I did not find the Examples of the sell storage format in >> the petsc document. I would like to know which example you use to test the >> efficiency of vectorization? >> >> Mark Adams ?2023?3?16??? 19:40??? 
>> >>> >>> >>> On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com >>> wrote: >>> >>>> Ok, maybe I can try to vectorize this format and make it part of the >>>> article. >>>> >>> >>> That would be great, and it would be a good learning experience for you >>> and a good way to get exposure. >>> See https://petsc.org/release/developers/contributing/ for guidance. >>> >>> Good luck, >>> Mark >>> >>> >>>> >>>> Mark Adams ?2023?3?15??? 19:57??? >>>> >>>>> I don't believe that we have an effort here. It could be a good >>>>> opportunity to contribute. >>>>> >>>>> Mark >>>>> >>>>> On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com < >>>>> ch1057458756 at gmail.com> wrote: >>>>> >>>>>> I checked the sell.c file and found that this algorithm supports AVX >>>>>> vectorization. Will the vectorization support of ARM architecture be added >>>>>> in the future? >>>>>> >>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Mon Mar 20 17:45:04 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Mon, 20 Mar 2023 22:45:04 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: Hi Barry and Mark, Thank you for your responses. I implemented the index sets in my application and it appears to work in serial. Unfortunately I am having some trouble running in parallel. The error I am getting is: [1]PETSC ERROR: Petsc has generated inconsistent data [1]PETSC ERROR: Number of entries found in complement 1000 does not match expected 500 1]PETSC ERROR: #1 ISComplement() at petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 [1]PETSC ERROR: #3 PCSetUp() at petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 [1]PETSC ERROR: #4 KSPSetUp() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 [1]PETSC ERROR: #5 KSPSolve_Private() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 [1]PETSC ERROR: #6 KSPSolve() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 I am testing with two processors and a 2000x2000 matrix. I have two fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. Am I right thinking that I have some issue with my IS layouts? Thank you, Joshua ________________________________ From: Barry Smith Sent: Friday, March 17, 2023 1:22 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mar 17, 2023, at 1:26 PM, Christopher, Joshua wrote: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. 
So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry Thank you, Joshua ________________________________ From: Barry Smith > Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! 
Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. 
removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: phi_IS.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: rho_IS.txt URL: From knepley at gmail.com Mon Mar 20 18:16:52 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 19:16:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry and Mark, > > Thank you for your responses. I implemented the index sets in my > application and it appears to work in serial. Unfortunately I am having > some trouble running in parallel. The error I am getting is: > [1]PETSC ERROR: Petsc has generated inconsistent data > [1]PETSC ERROR: Number of entries found in complement 1000 does not match > expected 500 > 1]PETSC ERROR: #1 ISComplement() at > petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 > [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at > petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 > [1]PETSC ERROR: #3 PCSetUp() at > petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 > [1]PETSC ERROR: #4 KSPSetUp() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 > [1]PETSC ERROR: #5 KSPSolve_Private() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 > [1]PETSC ERROR: #6 KSPSolve() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 > [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 > > I am testing with two processors and a 2000x2000 matrix. I have two > fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 > for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 > and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. > Am I right thinking that I have some issue with my IS layouts? > I do not understand your explanation. Your matrix is 2000x2000, and I assume split so that proc 0 has rows 0 -- 999 proc 1 has rows 1000 -- 1999 Now, when you call PCFieldSplitSetIS(), each process gives an IS which indicates the dofs _owned by that process_ the contribute to field k. If you do not give unknowns within the global row bounds for that process, the ISComplement() call will not work. Of course, we should check that the entries are not out of bounds when they are submitted. if you want to do it, it would be a cool submission. Thanks, Matt > Thank you, > Joshua > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, March 17, 2023 1:22 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > > As Mark said, yes this sounds fine. > > > My matrix is organized as one block after another. 
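A minimal sketch of that, for the two-field layout described above (all phi rows first, then all rho rows), assuming the matrix uses PETSc's usual contiguous row ownership; the function and variable names are placeholders, not code from this thread. Each rank hands PCFieldSplitSetIS() only the rows it owns.

#include <petscksp.h>

/* Sketch: global ordering is all phi rows [0,nphi) followed by all rho rows
   [nphi,N), and each rank owns the contiguous range [rstart,rend). The ISs
   contain only locally owned rows, as PCFieldSplitSetIS() expects. */
static PetscErrorCode SetupFieldSplit(KSP ksp, Mat A, PetscInt nphi)
{
  PC       pc;
  IS       isphi, isrho;
  PetscInt rstart, rend, lo, hi;

  PetscFunctionBeginUser;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  lo = rstart;                 /* owned piece of the phi block */
  hi = PetscMin(rend, nphi);
  PetscCall(ISCreateStride(PetscObjectComm((PetscObject)A), PetscMax(hi - lo, 0), lo, 1, &isphi));
  lo = PetscMax(rstart, nphi); /* owned piece of the rho block */
  hi = rend;
  PetscCall(ISCreateStride(PetscObjectComm((PetscObject)A), PetscMax(hi - lo, 0), lo, 1, &isrho));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCFIELDSPLIT));
  PetscCall(PCFieldSplitSetIS(pc, "phi", isphi));
  PetscCall(PCFieldSplitSetIS(pc, "rho", isrho));
  PetscCall(ISDestroy(&isphi));
  PetscCall(ISDestroy(&isrho));
  PetscFunctionReturn(0);
}

With named splits like this, the per-field solvers can then be chosen on the command line, for example (only a suggestion, not a recommendation from this thread): -pc_fieldsplit_type multiplicative -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type hypre -fieldsplit_rho_ksp_type gmres -fieldsplit_rho_pc_type hypre.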
> > > When you are running in parallel with MPI, how will you organize the > unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI > process? You will need to take this into account when you build the IS on > each MPI process. > > Barry > > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! 
> Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. 
For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Tue Mar 21 09:28:29 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Tue, 21 Mar 2023 14:28:29 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: Hi Matt, Sorry for the unclear explanation. My layout is like this: Proc 0: Rows 0--499 and rows 1000--1499 Proc 1: Rows 500-999 and rows 1500-1999 I have two unknowns, rho and phi, both correspond to a contiguous chunk of rows. Phi: Rows 0-999 Rho: Rows 1000-1999 My source data (an OpenFOAM matrix) has the unknowns row-contiguous, which is why my layout is like this. My understanding is that my IS are set up correctly to match this matrix structure, which is why I am uncertain why I am getting the error message. I attached the output of my IS in my previous message. Thank you, Joshua ________________________________ From: Matthew Knepley Sent: Monday, March 20, 2023 6:16 PM To: Christopher, Joshua Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users > wrote: Hi Barry and Mark, Thank you for your responses. I implemented the index sets in my application and it appears to work in serial. Unfortunately I am having some trouble running in parallel. The error I am getting is: [1]PETSC ERROR: Petsc has generated inconsistent data [1]PETSC ERROR: Number of entries found in complement 1000 does not match expected 500 1]PETSC ERROR: #1 ISComplement() at petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 [1]PETSC ERROR: #3 PCSetUp() at petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 [1]PETSC ERROR: #4 KSPSetUp() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 [1]PETSC ERROR: #5 KSPSolve_Private() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 [1]PETSC ERROR: #6 KSPSolve() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 I am testing with two processors and a 2000x2000 matrix. I have two fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. Am I right thinking that I have some issue with my IS layouts? I do not understand your explanation. Your matrix is 2000x2000, and I assume split so that proc 0 has rows 0 -- 999 proc 1 has rows 1000 -- 1999 Now, when you call PCFieldSplitSetIS(), each process gives an IS which indicates the dofs _owned by that process_ the contribute to field k. If you do not give unknowns within the global row bounds for that process, the ISComplement() call will not work. Of course, we should check that the entries are not out of bounds when they are submitted. if you want to do it, it would be a cool submission. 
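The out-of-bounds check Matt mentions can also be approximated on the user side before calling PCFieldSplitSetIS(); a rough sketch (the helper name is made up, and it assumes a recent PETSc that provides PetscCheck()):

```c
#include <petscmat.h>
#include <petscis.h>

/* Verify that every entry of an IS lies in this rank's ownership range of A. */
static PetscErrorCode CheckISWithinLocalRows(Mat A, IS is)
{
  PetscInt        rstart, rend, n;
  const PetscInt *idx;

  PetscFunctionBeginUser;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  PetscCall(ISGetLocalSize(is, &n));
  PetscCall(ISGetIndices(is, &idx));
  for (PetscInt i = 0; i < n; i++) {
    PetscCheck(idx[i] >= rstart && idx[i] < rend, PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
               "IS entry %" PetscInt_FMT " is %" PetscInt_FMT ", outside the local rows [%" PetscInt_FMT ", %" PetscInt_FMT ")",
               i, idx[i], rstart, rend);
  }
  PetscCall(ISRestoreIndices(is, &idx));
  PetscFunctionReturn(0);
}
```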
Thanks, Matt Thank you, Joshua ________________________________ From: Barry Smith > Sent: Friday, March 17, 2023 1:22 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry Thank you, Joshua ________________________________ From: Barry Smith > Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. 
I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? 
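Just above, Joshua describes converting his parallel L-D-U matrix by traversing it and calling MatSetValues(); a bare-bones sketch of that kind of conversion loop is below (names such as nlocal, max_nz_per_row, and GetRowFromLDU() are placeholders for application data, not anything from the thread):

```c
#include <petscmat.h>

/* Placeholder for the application's L-D-U traversal: returns one assembled
   row as global column indices and values. */
extern PetscErrorCode GetRowFromLDU(PetscInt row, PetscInt *ncols, const PetscInt **cols, const PetscScalar **vals);

static PetscErrorCode ConvertLDUToAIJ(PetscInt nlocal, PetscInt max_nz_per_row, Mat *Aout)
{
  Mat      A;
  PetscInt rstart, rend;

  PetscFunctionBeginUser;
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetFromOptions(A)); /* AIJ by default; overridable with -mat_type */
  PetscCall(MatSeqAIJSetPreallocation(A, max_nz_per_row, NULL));
  PetscCall(MatMPIAIJSetPreallocation(A, max_nz_per_row, NULL, max_nz_per_row, NULL));
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (PetscInt row = rstart; row < rend; row++) {
    PetscInt           ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;
    PetscCall(GetRowFromLDU(row, &ncols, &cols, &vals));
    PetscCall(MatSetValues(A, 1, &row, ncols, cols, vals, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  *Aout = A;
  PetscFunctionReturn(0);
}
```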
On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. 
The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 21 09:37:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Mar 2023 10:37:56 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: On Tue, Mar 21, 2023 at 10:28?AM Christopher, Joshua wrote: > Hi Matt, > > Sorry for the unclear explanation. My layout is like this: > > Proc 0: Rows 0--499 and rows 1000--1499 > Proc 1: Rows 500-999 and rows 1500-1999 > That is not a possible layout in PETSc. This is the source of the misunderstanding. Rows are always contiguous in PETSc. Thanks, Matt > I have two unknowns, rho and phi, both correspond to a contiguous chunk of > rows. > > Phi: Rows 0-999 > Rho: Rows 1000-1999 > > My source data (an OpenFOAM matrix) has the unknowns row-contiguous, which > is why my layout is like this. My understanding is that my IS are set up > correctly to match this matrix structure, which is why I am uncertain why I > am getting the error message. I attached the output of my IS in my previous > message. > > Thank you, > Joshua > ------------------------------ > *From:* Matthew Knepley > *Sent:* Monday, March 20, 2023 6:16 PM > *To:* Christopher, Joshua > *Cc:* Barry Smith ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi Barry and Mark, > > Thank you for your responses. I implemented the index sets in my > application and it appears to work in serial. Unfortunately I am having > some trouble running in parallel. The error I am getting is: > [1]PETSC ERROR: Petsc has generated inconsistent data > [1]PETSC ERROR: Number of entries found in complement 1000 does not match > expected 500 > 1]PETSC ERROR: #1 ISComplement() at > petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 > [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at > petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 > [1]PETSC ERROR: #3 PCSetUp() at > petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 > [1]PETSC ERROR: #4 KSPSetUp() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 > [1]PETSC ERROR: #5 KSPSolve_Private() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 > [1]PETSC ERROR: #6 KSPSolve() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 > [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 > > I am testing with two processors and a 2000x2000 matrix. I have two > fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 > for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 > and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. > Am I right thinking that I have some issue with my IS layouts? > > > I do not understand your explanation. 
Your matrix is 2000x2000, and I > assume split so that > > proc 0 has rows 0 -- 999 > proc 1 has rows 1000 -- 1999 > > Now, when you call PCFieldSplitSetIS(), each process gives an IS which > indicates the dofs _owned by that process_ the contribute to field k. If you > do not give unknowns within the global row bounds for that process, the > ISComplement() call will not work. > > Of course, we should check that the entries are not out of bounds when > they are submitted. if you want to do it, it would be a cool submission. > > Thanks, > > Matt > > > Thank you, > Joshua > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, March 17, 2023 1:22 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > > As Mark said, yes this sounds fine. > > > My matrix is organized as one block after another. > > > When you are running in parallel with MPI, how will you organize the > unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI > process? You will need to take this into account when you build the IS on > each MPI process. > > Barry > > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. 
I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. 
I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. 
I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Fri Mar 24 13:07:08 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Fri, 24 Mar 2023 18:07:08 +0000 Subject: [petsc-users] GAMG failure Message-ID: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> Hi, I am having issue with GAMG for some very ill-conditioned 2D linearized elasticity problems (sharp variation of elastic moduli with thin regions of nearly incompressible material). I use snes_type newtonls, linesearch_type cp, and pc_type gamg without any further options. pc_type Jacobi converges fine (although slowly of course). I am not really surprised that gamg would not converge out of the box, but don?t know where to start to investigate the convergence failure. Can anybody help? Blaise ? Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 
27243 From jed at jedbrown.org Fri Mar 24 13:47:02 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 24 Mar 2023 12:47:02 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> Message-ID: <87y1nmj8bd.fsf@jedbrown.org> You can -pc_gamg_threshold .02 to slow the coarsening and either stronger smoother or increase number of iterations used for estimation (or increase tolerance). I assume your system is SPD and you've set the near-null space. Blaise Bourdin writes: > Hi, > > I am having issue with GAMG for some very ill-conditioned 2D linearized elasticity problems (sharp variation of elastic moduli with thin regions of nearly incompressible material). I use snes_type newtonls, linesearch_type cp, and pc_type gamg without any further options. pc_type Jacobi converges fine (although slowly of course). > > > I am not really surprised that gamg would not converge out of the box, but don?t know where to start to investigate the convergence failure. Can anybody help? > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From mfadams at lbl.gov Fri Mar 24 14:21:08 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 24 Mar 2023 15:21:08 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: <87y1nmj8bd.fsf@jedbrown.org> References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: * Do you set: PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); Do that to get CG Eigen estimates. Outright failure is usually caused by a bad Eigen estimate. -pc_gamg_esteig_ksp_monitor_singular_value Will print out the estimates as its iterating. You can look at that to check that the max has converged. * -pc_gamg_aggressive_coarsening 0 will slow coarsening as well as threshold. * you can run with '-info :pc' and send me the output (grep on GAMG) Mark On Fri, Mar 24, 2023 at 2:47?PM Jed Brown wrote: > You can -pc_gamg_threshold .02 to slow the coarsening and either stronger > smoother or increase number of iterations used for estimation (or increase > tolerance). I assume your system is SPD and you've set the near-null space. > > Blaise Bourdin writes: > > > Hi, > > > > I am having issue with GAMG for some very ill-conditioned 2D linearized > elasticity problems (sharp variation of elastic moduli with thin regions > of nearly incompressible material). I use snes_type newtonls, > linesearch_type cp, and pc_type gamg without any further options. pc_type > Jacobi converges fine (although slowly of course). > > > > > > I am not really surprised that gamg would not converge out of the box, > but don?t know where to start to investigate the convergence failure. Can > anybody help? > > > > Blaise > > > > ? > > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > > Professor, Department of Mathematics & Statistics > > Hamilton Hall room 409A, McMaster University > > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > -------------- next part -------------- An HTML attachment was scrubbed... 
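A compact sketch of the setup Jed and Mark point at above for the elasticity case (not code from the thread; `coords` is assumed to be a Vec of nodal coordinates with the same blocked layout as the solution vector):

```c
#include <petscmat.h>

static PetscErrorCode SetElasticityNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace nullsp;

  PetscFunctionBeginUser;
  /* Rigid-body modes as the near-null space, so GAMG builds sensible coarse spaces */
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nullsp));
  PetscCall(MatSetNearNullSpace(A, nullsp));
  PetscCall(MatNullSpaceDestroy(&nullsp));
  /* Per Mark's note: lets GAMG use CG for its eigenvalue estimates */
  PetscCall(MatSetOption(A, MAT_SPD, PETSC_TRUE));
  PetscCall(MatSetOption(A, MAT_SPD_ETERNAL, PETSC_TRUE));
  PetscFunctionReturn(0);
}
```

With that in place, the options mentioned above (-pc_gamg_threshold, -pc_gamg_aggressive_coarsening 0, -pc_gamg_esteig_ksp_monitor_singular_value, -info :pc) can be used to slow the coarsening and inspect the eigenvalue estimates.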
URL: From daniele.prada85 at gmail.com Mon Mar 27 05:14:58 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Mon, 27 Mar 2023 12:14:58 +0200 Subject: [petsc-users] Using PETSc Testing System Message-ID: Hello everyone, I would like to use the PETSc Testing System for testing a package that I am developing. I have read the PETSc developer documentation and have written some tests using the PETSc Test Description Language. I am going through the files in ${PETSC_DIR}/config but I am not able to make the testing system look into the directory tree of my project. Any suggestions? Thanks in advance Daniele -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Mon Mar 27 09:13:23 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Mon, 27 Mar 2023 14:13:23 +0000 Subject: [petsc-users] DMSwarm documentation Message-ID: Hello, I am writing to you as I am trying to find documentation about a function that would remove several particles (given their index). I was using: DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); But need something to remove several particles at one time. Petsc.org seems to be down and I was wondering if there was any other way to get this kind of information. Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Mon Mar 27 09:14:36 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Mon, 27 Mar 2023 10:14:36 -0400 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: Message-ID: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Our testing framework was pretty much tailor-made for the PETSc src tree and as such has many hard-coded paths and decisions. I?m going to go out on a limb and say you probably won?t get this to work... That being said, one of the ?base? paths that the testing harness uses to initially find tests is the `TESTSRCDIR` variable in `${PETSC_DIR}/gmakefile.test`. It is currently defined as ``` # TESTSRCDIR is always relative to gmakefile.test # This must be before includes mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) TESTSRCDIR := $(dir $(mkfile_path))src ``` You should start by changing this to ``` # TESTSRCDIR is always relative to gmakefile.test # This must be before includes mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) TESTSRCDIR ?= $(dir $(mkfile_path))src ``` That way you could run your tests via ``` $ make test TESTSRCDIR=/path/to/your/src/dir ``` I am sure there are many other modifications you will need to make. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Mar 27, 2023, at 06:14, Daniele Prada wrote: > > Hello everyone, > > I would like to use the PETSc Testing System for testing a package that I am developing. > > I have read the PETSc developer documentation and have written some tests using the PETSc Test Description Language. I am going through the files in ${PETSC_DIR}/config but I am not able to make the testing system look into the directory tree of my project. > > Any suggestions? 
> > Thanks in advance > Daniele From knepley at gmail.com Mon Mar 27 09:37:49 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 10:37:49 -0400 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch wrote: > Our testing framework was pretty much tailor-made for the PETSc src tree > and as such has many hard-coded paths and decisions. I?m going to go out on > a limb and say you probably won?t get this to work... > I think we can help you get this to work. I have wanted to generalize the test framework for a long time. Everything is build by confg/gmakegentest.py and I think we can get away with just changing paths here and everything will work. Thanks! Matt > That being said, one of the ?base? paths that the testing harness uses to > initially find tests is the `TESTSRCDIR` variable in > `${PETSC_DIR}/gmakefile.test`. It is currently defined as > ``` > # TESTSRCDIR is always relative to gmakefile.test > # This must be before includes > mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > TESTSRCDIR := $(dir $(mkfile_path))src > ``` > You should start by changing this to > ``` > # TESTSRCDIR is always relative to gmakefile.test > # This must be before includes > mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > TESTSRCDIR ?= $(dir $(mkfile_path))src > ``` > That way you could run your tests via > ``` > $ make test TESTSRCDIR=/path/to/your/src/dir > ``` > I am sure there are many other modifications you will need to make. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Mar 27, 2023, at 06:14, Daniele Prada > wrote: > > > > Hello everyone, > > > > I would like to use the PETSc Testing System for testing a package that > I am developing. > > > > I have read the PETSc developer documentation and have written some > tests using the PETSc Test Description Language. I am going through the > files in ${PETSC_DIR}/config but I am not able to make the testing system > look into the directory tree of my project. > > > > Any suggestions? > > > > Thanks in advance > > Daniele > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Mar 27 09:51:15 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Mar 2023 10:51:15 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm documentation In-Reply-To: References: Message-ID: <75E0AEDB-D0D0-497A-BB1F-90CD25175382@petsc.dev> petsc.org can be flaky and hang for a few seconds or not respond occasionally but trying again should work. Barry > On Mar 27, 2023, at 10:13 AM, Joauma Marichal wrote: > > Hello, > > I am writing to you as I am trying to find documentation about a function that would remove several particles (given their index). I was using: > DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); > But need something to remove several particles at one time. > > Petsc.org seems to be down and I was wondering if there was any other way to get this kind of information. > > Thanks a lot for your help. > > Best regards, > > Joauma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From facklerpw at ornl.gov Mon Mar 27 12:23:28 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 27 Mar 2023 17:23:28 +0000 Subject: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device. In-Reply-To: References: Message-ID: Junchao, I'm realizing I left you hanging in this email thread. Thank you so much for addressing the problem. I have tested it (successfully) using one process and one GPU. I'm still attempting to test with multiple GPUs (one per task) on another machine. I'll let you know if I see any more trouble. Thanks again, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Tuesday, February 7, 2023 16:26 To: Fackler, Philip Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov ; Blondel, Sophie ; Roth, Philip Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I believe this MR https://gitlab.com/petsc/petsc/-/merge_requests/6030 would fix the problem. It is a fix to petsc/release, but you can cherry-pick it to petsc/main. Could you try that in your case? Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:31 AM Junchao Zhang > wrote: Sorry, no progress. I guess that is because a vector was gotten but not restored (e.g., VecRestoreArray() etc), causing host and device data not synced. Maybe in your code, or in petsc code. After the ECP AM, I will have more time on this bug. Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:00 AM Fackler, Philip > wrote: Any progress on this? Any info/help needed? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Fackler, Philip > Sent: Thursday, December 8, 2022 09:07 To: Junchao Zhang > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Great! Thank you! Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Wednesday, December 7, 2022 18:47 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I could reproduce the error. I need to find a way to debug it. Thanks. /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10 *** 1 failure is detected in the test module "Regression" --Junchao Zhang On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip > wrote: I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run: ./test/system/SystemTester -t System/PSI_1 -- -v? 
(No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang > wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip > wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. 
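Junchao's earlier guess in this thread was a vector "gotten but not restored", which with the CUDA/Kokkos backends leaves the host and device copies out of sync. As a small, self-contained reminder of the intended get/restore pairing (a generic sketch, not code from xolotl or PETSc itself):

```c
#include <petscvec.h>

static PetscErrorCode ScaleAndPeek(Vec x, PetscScalar alpha)
{
  PetscScalar       *w;
  const PetscScalar *r;
  PetscInt           n;

  PetscFunctionBeginUser;
  PetscCall(VecGetLocalSize(x, &n));
  PetscCall(VecGetArray(x, &w)); /* write access: host copy is checked out */
  for (PetscInt i = 0; i < n; i++) w[i] *= alpha;
  PetscCall(VecRestoreArray(x, &w)); /* tells PETSc the host copy changed, so the device copy is refreshed before the next GPU use */

  PetscCall(VecGetArrayRead(x, &r)); /* read-only access avoids marking the vector as modified */
  if (n) PetscCall(PetscPrintf(PETSC_COMM_SELF, "first local entry: %g\n", (double)PetscRealPart(r[0])));
  PetscCall(VecRestoreArrayRead(x, &r));
  PetscFunctionReturn(0);
}
```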
--Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip > wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
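For reference, the per-stage breakdown in a -log_view report like the one above comes from user-registered logging stages (PetscLogStagePush()/PetscLogStagePop(), as the legend notes). Below is a minimal sketch of registering and timing one stage; it is not taken from Xolotl and the names are illustrative.

#include <petsc.h>

int main(int argc, char **argv)
{
  PetscLogStage solve_stage;
  Vec           x;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* A named stage gets its own section in the -log_view event table,
     separating (say) solve cost from setup cost. */
  PetscCall(PetscLogStageRegister("MySolve", &solve_stage));

  PetscCall(VecCreateSeq(PETSC_COMM_SELF, 100, &x));

  PetscCall(PetscLogStagePush(solve_stage));
  PetscCall(VecSet(x, 1.0)); /* stand-in for the real work being profiled */
  PetscCall(PetscLogStagePop());

  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize()); /* the -log_view table is printed here */
  return 0;
}

Running the executable with -log_view (plus -log_view_gpu_times for the GPU columns) produces a report of the same form as the one above.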
________________________________ From: Junchao Zhang > Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip > wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Zhang, Junchao >; Roth, Philip > Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users > wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Mon Mar 27 16:06:11 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Mon, 27 Mar 2023 21:06:11 +0000 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Mar 27 16:32:16 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 27 Mar 2023 15:32:16 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: <87lejhnan3.fsf@jedbrown.org> Try -pc_gamg_reuse_interpolation 0. I thought this was disabled by default, but I see pc_gamg->reuse_prol = PETSC_TRUE in the code. 
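For reference, the option Jed suggests can also be set in code. A minimal sketch, assuming ksp is a KSP whose preconditioner is GAMG; PCGAMGSetReuseInterpolation() is, as far as I know, the call behind -pc_gamg_reuse_interpolation, but treat the snippet as illustrative rather than as the exact fix being proposed.

#include <petscksp.h>

/* Sketch: force GAMG to rebuild its interpolation whenever the matrix changes,
   i.e. the programmatic equivalent of -pc_gamg_reuse_interpolation 0. */
static PetscErrorCode ConfigureGAMGNoReuse(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));
  PetscCall(PCGAMGSetReuseInterpolation(pc, PETSC_FALSE));
  PetscCall(KSPSetFromOptions(ksp)); /* command-line options still take precedence */
  PetscFunctionReturn(0);
}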
Blaise Bourdin writes: > On Mar 24, 2023, at 3:21 PM, Mark Adams wrote: > > * Do you set: > > PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); > > PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); > > Yes > > Do that to get CG Eigen estimates. Outright failure is usually caused by a bad Eigen estimate. > -pc_gamg_esteig_ksp_monitor_singular_value > Will print out the estimates as its iterating. You can look at that to check that the max has converged. > > I just did, and something is off: > I do multiple calls to SNESSolve (staggered scheme for phase-field fracture), but only get informations on the first solve (which is > not the one failing, of course) > Here is what I get: > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 7.636421712860e+01 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min 1.114319928921e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min 5.739351119078e-01 max/min > 2.615528402732e+00 > 3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min 3.263683482596e-01 max/min > 5.038329074347e+00 > 4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min 2.044075812142e-01 max/min > 8.388479789416e+00 > 5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min 1.409327403114e-01 max/min > 1.248878386367e+01 > 6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min 1.023484740555e-01 max/min > 1.751853463603e+01 > 7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min 7.818018001928e-02 max/min > 2.329871248104e+01 > 8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min 6.178572472468e-02 max/min > 3.002538337458e+01 > 9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min 5.008612895206e-02 max/min > 3.800609738466e+01 > 10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min 4.130572033722e-02 max/min > 4.748390475004e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 2.373573910237e+02 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min 1.081192207576e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min 5.768825326129e-01 max/min > 2.333138869267e+00 > 3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min 3.070603564913e-01 max/min > 4.823655974348e+00 > 4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min 1.953487694959e-01 max/min > 7.941563771968e+00 > 5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min 1.313871480574e-01 max/min > 1.221267391199e+01 > 6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min 9.322735730688e-02 max/min > 1.770389646804e+01 > 7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min 6.945419058256e-02 max/min > 2.443687883140e+01 > 8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min 5.319942519758e-02 max/min > 3.265624999621e+01 > 9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min 4.229776586667e-02 max/min > 4.188657656771e+01 > 10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min 3.460455235448e-02 max/min > 5.209818645753e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 1.361990679391e+03 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min 1.086812916769e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min 6.475176340551e-01 max/min > 2.276745994212e+00 > 3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min 4.391851040105e-01 max/min > 3.747963441500e+00 > 4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min 2.972886103795e-01 max/min > 5.921528614526e+00 > 5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min 1.921591413785e-01 max/min > 9.571494061277e+00 > 6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min 1.275920366984e-01 max/min > 1.478981048966e+01 > 7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min 8.853577120467e-02 max/min > 2.166233300122e+01 > 8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min 6.706949937710e-02 max/min > 2.886345093206e+01 > 9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min 5.354310733090e-02 max/min > 3.635059671458e+01 > 10 KSP Residual norm 7.138356892221e+01 % max 1.954382723686e+00 min 4.367661484659e-02 max/min > 4.474666204216e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 3.702300162209e+03 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 1.255008322497e+03 % max 9.938792139169e-01 min 9.938792139169e-01 max/min > 1.000000000000e+00 > 2 KSP Residual norm 6.727201181977e+02 % max 1.297844907149e+00 min 6.478406586220e-01 max/min > 2.003339694532e+00 > 3 KSP Residual norm 5.218419298230e+02 % max 1.435817121668e+00 min 3.868381643086e-01 max/min > 3.711673909512e+00 > 4 KSP Residual norm 4.562548407646e+02 % max 1.507841675332e+00 min 1.835807205925e-01 max/min > 8.213507771759e+00 > 5 KSP Residual norm 3.829651184063e+02 % max 1.544809112105e+00 min 9.645201420491e-02 max/min > 1.601634890510e+01 > 6 KSP Residual norm 2.858162778588e+02 % max 1.571662611009e+00 min 6.326714268751e-02 max/min > 2.484168786904e+01 > 7 KSP Residual norm 2.074805889949e+02 % max 1.587767457742e+00 min 5.145942909400e-02 max/min > 3.085474296347e+01 > 8 KSP Residual norm 1.566220417755e+02 % max 1.597548616381e+00 min 4.650092979233e-02 max/min > 3.435519727274e+01 > 9 KSP Residual norm 1.157894309297e+02 % max 1.603863600136e+00 min 4.344076378399e-02 max/min > 3.692070443585e+01 > 10 KSP Residual norm 8.447209442299e+01 % max 1.608204129656e+00 min 4.123402730882e-02 max/min > 3.900186895670e+01 > Linear Displacement_ solve converged due to CONVERGED_RTOL iterations 14 > > * -pc_gamg_aggressive_coarsening 0 > > will slow coarsening as well as threshold. > > That did not help > > * you can run with '-info :pc' and send me the output (grep on GAMG) > > Let?s try to figure out if the fact that -pc_gamg_esteig_ksp_monitor_singular_value is an indication of a problem first. > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 
27243 From knepley at gmail.com Mon Mar 27 19:36:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 20:36:34 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm documentation In-Reply-To: References: Message-ID: On Mon, Mar 27, 2023 at 10:19?AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Hello, > > > > I am writing to you as I am trying to find documentation about a function > that would remove several particles (given their index). I was using: > > DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); > > But need something to remove several particles at one time. > There are no functions taking a list of points to remove. Thanks, Matt > Petsc.org seems to be down and I was wondering if there was any other way > to get this kind of information. > > > > Thanks a lot for your help. > > Best regards, > > > > Joauma > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 27 19:48:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 20:48:54 -0400 Subject: [petsc-users] Petsc DMLabel Fortran Stub request In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 10:03?AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Petsc Users > I apologize. I found this email today and it looks like no one answered. > I am trying to use the sequence of > call DMLabelPropagateBegin(synchLabel,sf,ierr) > call > DMLabelPropagatePush(synchLabel,sf,PETSC_NULL_OPTIONS,PETSC_NULL_INTEGER,ierr) > call DMLabelPropagateEnd(synchLabel,sf, ierr) > in fortran. > > I apologize if I messed something up, it appears as if the > DMLabelPropagatePush command doesn't have an appropriate Fortran interface > as I get an undefined reference when it is called. > Yes, it takes a function pointer, and using function pointers with Fortran is not easy, although it can be done. It might be better to create a C function with some default marking and then wrap that. What do you want to do? Thanks, Matt > I would appreciate any assistance. > > As a side note in practice, what is the proper Fortran NULL pointer to use > for void arguments? I used an integer one temporarily to get to the > undefined reference error but I assume it doesn't matter? > > > Sincerely > Nicholas > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 27 20:11:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 27 Mar 2023 21:11:57 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: Yes, the eigen estimates are converging slowly. BTW, have you tried hypre? It is a good solver (lots lots more woman years) These eigen estimates are conceptually simple, but they can lead to problems like this (hypre and an eigen estimate free smoother). 
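For reference, a minimal sketch of trying hypre as suggested above, assuming PETSc was configured with hypre (e.g. --download-hypre); choosing BoomerAMG here is only an example, not a recommendation specific to this problem.

#include <petscksp.h>

/* Sketch: switch an existing KSP from GAMG to hypre BoomerAMG.
   Command-line equivalent: -pc_type hypre -pc_hypre_type boomeramg */
static PetscErrorCode UseBoomerAMG(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCHYPRE));
  PetscCall(PCHYPRESetType(pc, "boomeramg"));
  PetscCall(KSPSetFromOptions(ksp));
  PetscFunctionReturn(0);
}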
But try this (good to have options anyway): -pc_gamg_esteig_ksp_max_it 20 Chevy will scale the estimate that we give by, I think, 5% by default. Maybe 10. You can set that with: -mg_levels_ksp_chebyshev_esteig 0,0.2,0,*1.05* 0.2 is the scaling of the high eigen estimate for the low eigen value in Chebyshev. On Mon, Mar 27, 2023 at 5:06?PM Blaise Bourdin wrote: > > > On Mar 24, 2023, at 3:21 PM, Mark Adams wrote: > > * Do you set: > > PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); > > PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); > > > Yes > > > Do that to get CG Eigen estimates. Outright failure is usually caused by a > bad Eigen estimate. > -pc_gamg_esteig_ksp_monitor_singular_value > Will print out the estimates as its iterating. You can look at that to > check that the max has converged. > > > I just did, and something is off: > I do multiple calls to SNESSolve (staggered scheme for phase-field > fracture), but only get informations on the first solve (which is not the > one failing, of course) > Here is what I get: > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 7.636421712860e+01 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min > 1.114319928921e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min > 5.739351119078e-01 max/min 2.615528402732e+00 > 3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min > 3.263683482596e-01 max/min 5.038329074347e+00 > 4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min > 2.044075812142e-01 max/min 8.388479789416e+00 > 5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min > 1.409327403114e-01 max/min 1.248878386367e+01 > 6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min > 1.023484740555e-01 max/min 1.751853463603e+01 > 7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min > 7.818018001928e-02 max/min 2.329871248104e+01 > 8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min > 6.178572472468e-02 max/min 3.002538337458e+01 > 9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min > 5.008612895206e-02 max/min 3.800609738466e+01 > 10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min > 4.130572033722e-02 max/min 4.748390475004e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 2.373573910237e+02 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min > 1.081192207576e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min > 5.768825326129e-01 max/min 2.333138869267e+00 > 3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min > 3.070603564913e-01 max/min 4.823655974348e+00 > 4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min > 1.953487694959e-01 max/min 7.941563771968e+00 > 5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min > 1.313871480574e-01 max/min 1.221267391199e+01 > 6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min > 9.322735730688e-02 max/min 1.770389646804e+01 > 7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min > 6.945419058256e-02 max/min 2.443687883140e+01 > 8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min > 5.319942519758e-02 max/min 3.265624999621e+01 > 9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min > 4.229776586667e-02 max/min 4.188657656771e+01 > 10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min > 3.460455235448e-02 max/min 5.209818645753e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 1.361990679391e+03 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min > 1.086812916769e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min > 6.475176340551e-01 max/min 2.276745994212e+00 > 3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min > 4.391851040105e-01 max/min 3.747963441500e+00 > 4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min > 2.972886103795e-01 max/min 5.921528614526e+00 > 5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min > 1.921591413785e-01 max/min 9.571494061277e+00 > 6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min > 1.275920366984e-01 max/min 1.478981048966e+01 > 7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min > 8.853577120467e-02 max/min 2.166233300122e+01 > 8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min > 6.706949937710e-02 max/min 2.886345093206e+01 > 9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min > 5.354310733090e-02 max/min 3.635059671458e+01 > 10 KSP Residual norm 7.138356892221e+01 % max 1.954382723686e+00 min > 4.367661484659e-02 max/min 4.474666204216e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 3.702300162209e+03 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 1.255008322497e+03 % max 9.938792139169e-01 min > 9.938792139169e-01 max/min 1.000000000000e+00 > 2 KSP Residual norm 6.727201181977e+02 % max 1.297844907149e+00 min > 6.478406586220e-01 max/min 2.003339694532e+00 > 3 KSP Residual norm 5.218419298230e+02 % max 1.435817121668e+00 min > 3.868381643086e-01 max/min 3.711673909512e+00 > 4 KSP Residual norm 4.562548407646e+02 % max 1.507841675332e+00 min > 1.835807205925e-01 max/min 8.213507771759e+00 > 5 KSP Residual norm 3.829651184063e+02 % max 1.544809112105e+00 min > 9.645201420491e-02 max/min 1.601634890510e+01 > 6 KSP Residual norm 2.858162778588e+02 % max 1.571662611009e+00 min > 6.326714268751e-02 max/min 2.484168786904e+01 > 7 KSP Residual norm 2.074805889949e+02 % max 1.587767457742e+00 min > 5.145942909400e-02 max/min 3.085474296347e+01 > 8 KSP Residual norm 1.566220417755e+02 % max 1.597548616381e+00 min > 4.650092979233e-02 max/min 3.435519727274e+01 > 9 KSP Residual norm 1.157894309297e+02 % max 1.603863600136e+00 min > 4.344076378399e-02 max/min 3.692070443585e+01 > 10 KSP Residual norm 8.447209442299e+01 % max 1.608204129656e+00 min > 4.123402730882e-02 max/min 3.900186895670e+01 > Linear Displacement_ solve converged due to CONVERGED_RTOL iterations 14 > > > > > > * -pc_gamg_aggressive_coarsening 0 > > will slow coarsening as well as threshold. > > That did not help > > > * you can run with '-info :pc' and send me the output (grep on GAMG) > > Let?s try to figure out if the fact > that -pc_gamg_esteig_ksp_monitor_singular_value is an indication of a > problem first. > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele.prada85 at gmail.com Tue Mar 28 10:24:15 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Tue, 28 Mar 2023 17:24:15 +0200 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: Dear Matthew, dear Jacob, Thank you very much for your useful remarks. I managed to use the PETSc Testing System by doing as follows: 1. Redefined TESTDIR when running make 2. Used a project tree similar to that of PETSc. For examples, tests for 'package1' are in $MYLIB/src/package1/tests/ 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test Inside gmakefile.test: 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have $MYLIB exported (note: this affects TESTSRCDIR) 5. Redefined variable pkgs as "pkgs := package1" 6. Introduced a few variables to make PETSC_COMPILE.c work: CFLAGS := -I$(MYLIB)/include LDFLAGS = -L$(MYLIB)/lib LDLIBS = -lmylib 7. Changed the call to gmakegentest.py as follows $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src --pkg-pkgs=$(pkgs) 8. Changed the rule $(testexe.c) as follows: $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) $(LDLIBS) 9. 
Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) when calling query_tests.py, for example: TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') What do you think? Best, Daniele On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley wrote: > On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch > wrote: > >> Our testing framework was pretty much tailor-made for the PETSc src tree >> and as such has many hard-coded paths and decisions. I?m going to go out on >> a limb and say you probably won?t get this to work... >> > > I think we can help you get this to work. I have wanted to generalize the > test framework for a long time. Everything is build by > > confg/gmakegentest.py > > and I think we can get away with just changing paths here and everything > will work. > > Thanks! > > Matt > > >> That being said, one of the ?base? paths that the testing harness uses to >> initially find tests is the `TESTSRCDIR` variable in >> `${PETSC_DIR}/gmakefile.test`. It is currently defined as >> ``` >> # TESTSRCDIR is always relative to gmakefile.test >> # This must be before includes >> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >> TESTSRCDIR := $(dir $(mkfile_path))src >> ``` >> You should start by changing this to >> ``` >> # TESTSRCDIR is always relative to gmakefile.test >> # This must be before includes >> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >> TESTSRCDIR ?= $(dir $(mkfile_path))src >> ``` >> That way you could run your tests via >> ``` >> $ make test TESTSRCDIR=/path/to/your/src/dir >> ``` >> I am sure there are many other modifications you will need to make. >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> > On Mar 27, 2023, at 06:14, Daniele Prada >> wrote: >> > >> > Hello everyone, >> > >> > I would like to use the PETSc Testing System for testing a package that >> I am developing. >> > >> > I have read the PETSc developer documentation and have written some >> tests using the PETSc Test Description Language. I am going through the >> files in ${PETSC_DIR}/config but I am not able to make the testing system >> look into the directory tree of my project. >> > >> > Any suggestions? >> > >> > Thanks in advance >> > Daniele >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Lutsko at ulb.be Tue Mar 28 10:32:37 2023 From: Jim.Lutsko at ulb.be (LUTSKO James) Date: Tue, 28 Mar 2023 15:32:37 +0000 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector Message-ID: Hello, I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. 
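For reference, a minimal sketch of seeding a new run with an eigenvector stored by a previous one. In the SLEPc versions I know the call is EPSSetInitialSpace(), which takes an array of Vecs and is presumably what EPSSetInitialSubspace refers to above; the file name and helper function are illustrative only.

#include <slepceps.h>

/* Sketch: load a previously saved eigenvector and use it as the initial space.
   The earlier run would have written it with VecView() into a binary viewer. */
static PetscErrorCode RestartFromSavedVector(EPS eps, Mat A)
{
  Vec         x0;
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(A, &x0, NULL));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "eigvec.dat", FILE_MODE_READ, &viewer));
  PetscCall(VecLoad(x0, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(EPSSetInitialSpace(eps, 1, &x0)); /* one vector is enough here */
  PetscCall(VecDestroy(&x0));

  PetscCall(EPSSetFromOptions(eps)); /* keeps -eps_type jd etc. from the command line */
  PetscCall(EPSSolve(eps));
  PetscFunctionReturn(0);
}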
thanks jim % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be % fax: +32-2-650-5767 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Tue Mar 28 11:38:54 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Tue, 28 Mar 2023 16:38:54 +0000 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Mar 28 12:18:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 Mar 2023 11:18:50 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: <87v8iklrph.fsf@jedbrown.org> This suite has been good for my solid mechanics solvers. (It's written here as a coarse grid solver because we do matrix-free p-MG first, but you can use it directly.) https://github.com/hypre-space/hypre/issues/601#issuecomment-1069426997 Blaise Bourdin writes: > On Mar 27, 2023, at 9:11 PM, Mark Adams wrote: > > Yes, the eigen estimates are converging slowly. > > BTW, have you tried hypre? It is a good solver (lots lots more woman years) > These eigen estimates are conceptually simple, but they can lead to problems like this (hypre and an eigen estimate free > smoother). > > I just moved from petsc 3.3 to main, so my experience with an old version of hyper has not been very convincing. Strangely > enough, ML has always been the most efficient PC for me. Maybe it?s time to revisit. > That said, I would really like to get decent performances out of gamg. One day, I?d like to be able to account for the special structure > of phase-field fracture in the construction of the coarse space. > > But try this (good to have options anyway): > > -pc_gamg_esteig_ksp_max_it 20 > > Chevy will scale the estimate that we give by, I think, 5% by default. Maybe 10. > You can set that with: > > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 > > 0.2 is the scaling of the high eigen estimate for the low eigen value in Chebyshev. > > Jed?s suggestion of using -pc_gamg_reuse_interpolation 0 worked. I am testing your options at the moment. > > Thanks a lot, > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From jed at jedbrown.org Tue Mar 28 12:27:58 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 Mar 2023 11:27:58 -0600 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: <87sfdolra9.fsf@jedbrown.org> Great that you got it working. We would accept a merge request that made our infrastructure less PETSc-specific so long as it doesn't push more complexity on the end user. That would likely make it easier for you to pull updates in the future. Daniele Prada writes: > Dear Matthew, dear Jacob, > > Thank you very much for your useful remarks. I managed to use the PETSc > Testing System by doing as follows: > > 1. Redefined TESTDIR when running make > 2. Used a project tree similar to that of PETSc. 
For examples, tests for > 'package1' are in $MYLIB/src/package1/tests/ > 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test > > Inside gmakefile.test: > > 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have > $MYLIB exported (note: this affects TESTSRCDIR) > 5. Redefined variable pkgs as "pkgs := package1" > 6. Introduced a few variables to make PETSC_COMPILE.c work: > > CFLAGS := -I$(MYLIB)/include > LDFLAGS = -L$(MYLIB)/lib > LDLIBS = -lmylib > > 7. Changed the call to gmakegentest.py as follows > > $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) > --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src > --pkg-pkgs=$(pkgs) > > 8. Changed the rule $(testexe.c) as follows: > > $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) > $(LDLIBS) > > 9. Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) > when calling query_tests.py, for example: > > TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py > --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) > --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') > > > > What do you think? > > Best, > Daniele > > On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley wrote: > >> On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch >> wrote: >> >>> Our testing framework was pretty much tailor-made for the PETSc src tree >>> and as such has many hard-coded paths and decisions. I?m going to go out on >>> a limb and say you probably won?t get this to work... >>> >> >> I think we can help you get this to work. I have wanted to generalize the >> test framework for a long time. Everything is build by >> >> confg/gmakegentest.py >> >> and I think we can get away with just changing paths here and everything >> will work. >> >> Thanks! >> >> Matt >> >> >>> That being said, one of the ?base? paths that the testing harness uses to >>> initially find tests is the `TESTSRCDIR` variable in >>> `${PETSC_DIR}/gmakefile.test`. It is currently defined as >>> ``` >>> # TESTSRCDIR is always relative to gmakefile.test >>> # This must be before includes >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >>> TESTSRCDIR := $(dir $(mkfile_path))src >>> ``` >>> You should start by changing this to >>> ``` >>> # TESTSRCDIR is always relative to gmakefile.test >>> # This must be before includes >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >>> TESTSRCDIR ?= $(dir $(mkfile_path))src >>> ``` >>> That way you could run your tests via >>> ``` >>> $ make test TESTSRCDIR=/path/to/your/src/dir >>> ``` >>> I am sure there are many other modifications you will need to make. >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> >>> > On Mar 27, 2023, at 06:14, Daniele Prada >>> wrote: >>> > >>> > Hello everyone, >>> > >>> > I would like to use the PETSc Testing System for testing a package that >>> I am developing. >>> > >>> > I have read the PETSc developer documentation and have written some >>> tests using the PETSc Test Description Language. I am going through the >>> files in ${PETSC_DIR}/config but I am not able to make the testing system >>> look into the directory tree of my project. >>> > >>> > Any suggestions? >>> > >>> > Thanks in advance >>> > Daniele >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> From jroman at dsic.upv.es Tue Mar 28 12:43:33 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 28 Mar 2023 19:43:33 +0200 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: What do you mean that EPSSetInitialSubspace() does not work? Doesn't it improve convergence with respect to not using it? Is your smallest eigenvalue positive? And do you have many eigenvalues close to the smallest one? Jose > El 28 mar 2023, a las 17:32, LUTSKO James via petsc-users escribi?: > > Hello, > I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options > > -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 > > Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. > > thanks > jim > > % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, > % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium > % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be > % fax: +32-2-650-5767 From narnoldm at umich.edu Tue Mar 28 14:28:53 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Tue, 28 Mar 2023 15:28:53 -0400 Subject: [petsc-users] Petsc DMLabel Fortran Stub request In-Reply-To: References: Message-ID: Hi Matthew Thanks for checking in on this. Fortunately, I was able to get the behavior I needed through alternate means so I probably wouldn't investigate doing this further. Sincerely Nicholas On Mon, Mar 27, 2023 at 8:49?PM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 10:03?AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Petsc Users >> > > I apologize. I found this email today and it looks like no one answered. > > >> I am trying to use the sequence of >> call DMLabelPropagateBegin(synchLabel,sf,ierr) >> call >> DMLabelPropagatePush(synchLabel,sf,PETSC_NULL_OPTIONS,PETSC_NULL_INTEGER,ierr) >> call DMLabelPropagateEnd(synchLabel,sf, ierr) >> in fortran. >> >> I apologize if I messed something up, it appears as if the >> DMLabelPropagatePush command doesn't have an appropriate Fortran interface >> as I get an undefined reference when it is called. >> > > Yes, it takes a function pointer, and using function pointers with Fortran > is not easy, although it can be done. It might be better to create a C > function with some default marking and then wrap that. What do you want to > do? > > Thanks, > > Matt > > >> I would appreciate any assistance. >> >> As a side note in practice, what is the proper Fortran NULL pointer to >> use for void arguments? I used an integer one temporarily to get to the >> undefined reference error but I assume it doesn't matter? >> >> >> Sincerely >> Nicholas >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. 
Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele.prada85 at gmail.com Tue Mar 28 16:51:38 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Tue, 28 Mar 2023 23:51:38 +0200 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: <87sfdolra9.fsf@jedbrown.org> References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> <87sfdolra9.fsf@jedbrown.org> Message-ID: Will do, thanks! On Tue, Mar 28, 2023 at 7:28?PM Jed Brown wrote: > Great that you got it working. We would accept a merge request that made > our infrastructure less PETSc-specific so long as it doesn't push more > complexity on the end user. That would likely make it easier for you to > pull updates in the future. > > Daniele Prada writes: > > > Dear Matthew, dear Jacob, > > > > Thank you very much for your useful remarks. I managed to use the PETSc > > Testing System by doing as follows: > > > > 1. Redefined TESTDIR when running make > > 2. Used a project tree similar to that of PETSc. For examples, tests for > > 'package1' are in $MYLIB/src/package1/tests/ > > 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test > > > > Inside gmakefile.test: > > > > 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have > > $MYLIB exported (note: this affects TESTSRCDIR) > > 5. Redefined variable pkgs as "pkgs := package1" > > 6. Introduced a few variables to make PETSC_COMPILE.c work: > > > > CFLAGS := -I$(MYLIB)/include > > LDFLAGS = -L$(MYLIB)/lib > > LDLIBS = -lmylib > > > > 7. Changed the call to gmakegentest.py as follows > > > > $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) > > --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src > > --pkg-pkgs=$(pkgs) > > > > 8. Changed the rule $(testexe.c) as follows: > > > > $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) > > $(LDLIBS) > > > > 9. Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) > > when calling query_tests.py, for example: > > > > TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py > > --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) > > --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') > > > > > > > > What do you think? > > > > Best, > > Daniele > > > > On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley > wrote: > > > >> On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch < > jacob.fai at gmail.com> > >> wrote: > >> > >>> Our testing framework was pretty much tailor-made for the PETSc src > tree > >>> and as such has many hard-coded paths and decisions. I?m going to go > out on > >>> a limb and say you probably won?t get this to work... > >>> > >> > >> I think we can help you get this to work. I have wanted to generalize > the > >> test framework for a long time. Everything is build by > >> > >> confg/gmakegentest.py > >> > >> and I think we can get away with just changing paths here and everything > >> will work. > >> > >> Thanks! > >> > >> Matt > >> > >> > >>> That being said, one of the ?base? paths that the testing harness uses > to > >>> initially find tests is the `TESTSRCDIR` variable in > >>> `${PETSC_DIR}/gmakefile.test`. 
It is currently defined as > >>> ``` > >>> # TESTSRCDIR is always relative to gmakefile.test > >>> # This must be before includes > >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > >>> TESTSRCDIR := $(dir $(mkfile_path))src > >>> ``` > >>> You should start by changing this to > >>> ``` > >>> # TESTSRCDIR is always relative to gmakefile.test > >>> # This must be before includes > >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > >>> TESTSRCDIR ?= $(dir $(mkfile_path))src > >>> ``` > >>> That way you could run your tests via > >>> ``` > >>> $ make test TESTSRCDIR=/path/to/your/src/dir > >>> ``` > >>> I am sure there are many other modifications you will need to make. > >>> > >>> Best regards, > >>> > >>> Jacob Faibussowitsch > >>> (Jacob Fai - booss - oh - vitch) > >>> > >>> > On Mar 27, 2023, at 06:14, Daniele Prada > >>> wrote: > >>> > > >>> > Hello everyone, > >>> > > >>> > I would like to use the PETSc Testing System for testing a package > that > >>> I am developing. > >>> > > >>> > I have read the PETSc developer documentation and have written some > >>> tests using the PETSc Test Description Language. I am going through the > >>> files in ${PETSC_DIR}/config but I am not able to make the testing > system > >>> look into the directory tree of my project. > >>> > > >>> > Any suggestions? > >>> > > >>> > Thanks in advance > >>> > Daniele > >>> > >>> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Mar 28 19:46:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 28 Mar 2023 20:46:57 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: On Tue, Mar 28, 2023 at 12:38?PM Blaise Bourdin wrote: > > > On Mar 27, 2023, at 9:11 PM, Mark Adams wrote: > > Yes, the eigen estimates are converging slowly. > > BTW, have you tried hypre? It is a good solver (lots lots more woman years) > These eigen estimates are conceptually simple, but they can lead to > problems like this (hypre and an eigen estimate free smoother). > > I just moved from petsc 3.3 to main, so my experience with an old version > of hyper has not been very convincing. Strangely enough, ML has always been > the most efficient PC for me. > ML is a good solver. > Maybe it?s time to revisit. > That said, I would really like to get decent performances out of gamg. One > day, I?d like to be able to account for the special structure of > phase-field fracture in the construction of the coarse space. > > > But try this (good to have options anyway): > > -pc_gamg_esteig_ksp_max_it 20 > > Chevy will scale the estimate that we give by, I think, 5% by default. > Maybe 10. > You can set that with: > > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,*1.05* > > 0.2 is the scaling of the high eigen estimate for the low eigen value in > Chebyshev. > > > > Jed?s suggestion of using -pc_gamg_reuse_interpolation 0 worked. > OK, have to admit I am surprised. But I guess with your fracture the matrix/physics/dynamics does change a lot > I am testing your options at the moment. > There are a lot of options and it is cumbersome but they are finite and good to know. 
Glad its working, > > Thanks a lot, > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Lutsko at ulb.be Wed Mar 29 02:58:47 2023 From: Jim.Lutsko at ulb.be (LUTSKO James) Date: Wed, 29 Mar 2023 07:58:47 +0000 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: I withdraw my question. In trying to put together an example to illustrate the problem, I found that EPSSetInitialSubspace() does indeed "work" - in the sense that the eigenvalue, which decreases monotonically during the calculation, picks up from where the previous run left off. I am not sure why I had previously thought it was otherwise. thanks % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be % fax: +32-2-650-5767 ________________________________ From: Jose E. Roman Sent: Tuesday, March 28, 2023 7:43 PM To: LUTSKO James Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Restarting a SLEPC run to refine an eigenvector What do you mean that EPSSetInitialSubspace() does not work? Doesn't it improve convergence with respect to not using it? Is your smallest eigenvalue positive? And do you have many eigenvalues close to the smallest one? Jose > El 28 mar 2023, a las 17:32, LUTSKO James via petsc-users escribi?: > > Hello, > I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options > > -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 > > Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. > > thanks > jim > > % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, > % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium > % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be > % fax: +32-2-650-5767 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Mar 29 05:53:06 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 29 Mar 2023 12:53:06 +0200 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: > El 29 mar 2023, a las 9:58, LUTSKO James escribi?: > > I withdraw my question. In trying to put together an example to illustrate the problem, I found thatEPSSetInitialSubspace() does indeed "work" - in the sense that the eigenvalue, which decreases monotonically during the calculation, picks up from where the previous run left off. I am not sure why I had previously thought it was otherwise. > > thanks > Your use case is a typical one that might have slow convergence. In Davidson-type methods it is generally better if you can use a more powerful preconditioner (jacobi is likely not helping). 
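For reference, a hedged sketch of giving the Davidson inner solve a stronger preconditioner than jacobi; GAMG is used purely as an example and assumes the matrix is amenable to it.

#include <slepceps.h>

/* Sketch: reach the inner KSP/PC of a Davidson-type solver through the ST object.
   Command-line equivalent: -st_pc_type gamg instead of -st_pc_type jacobi. */
static PetscErrorCode StrongerInnerPC(EPS eps)
{
  ST  st;
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(EPSGetST(eps, &st));
  PetscCall(STGetKSP(st, &ksp));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));
  PetscCall(EPSSetFromOptions(eps));
  PetscFunctionReturn(0);
}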
If you can afford to factorize the matrix I would suggest using shift-and-invert with target=0 (or whatever is better for your application) with the default solver (Krylov-Schur). But normally factorizing a large Hamiltonian matrix is prohibitive. If you want, send me a sample matrix and I can do some tests. Jose From elias.karabelas at uni-graz.at Thu Mar 30 10:56:37 2023 From: elias.karabelas at uni-graz.at (Karabelas, Elias (elias.karabelas@uni-graz.at)) Date: Thu, 30 Mar 2023 15:56:37 +0000 Subject: [petsc-users] Augmented Linear System Message-ID: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Dear Community, I have a linear system of the form |K B| du ?? f1 ?????? = |C D| dp ?? f2 where K is a big m x m sparsematrix that comes from some FE discretization, B is a coupling matrix (of the form m x 4), C is of the form (4 x m) and D is 4x4. I save B and C as 4 Vecs and D as a 4x4 double array. D might be singular so at the moment I use the following schur-complement approach to solve this system 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B [i, j] = VecDot(C[i], invB[j]) 3) invert S (this seems to be mostly non-singular) to get dp 4) calculate du with dp So counting this, I need 5x to call KSP which can be expensive and I thought of somehow doing the Schur-Complement the other way around, however due to the (possible) singularity of D this seems like a bad idea (even using a pseudoinverse) Two things puzzle me here still A) Can this be done more efficiently? B) In case my above matrix is the Jacobian in a newton method, how do I make sure with any form of Schur Complement approach that I hit the correct residual reduction? Thanks Elias -- Dr. Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! From mfadams at lbl.gov Thu Mar 30 12:41:43 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 30 Mar 2023 13:41:43 -0400 Subject: [petsc-users] Augmented Linear System In-Reply-To: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID: You can lag the update of the Schur complement and use your solver as a preconditioner. If your problems don't change much you might converge fast enough (ie, < 4 iterations with one solve per iteration), but what you have is not bad if the size of your auxiliary, p, space does not grow. Mark On Thu, Mar 30, 2023 at 11:56?AM Karabelas, Elias ( elias.karabelas at uni-graz.at) wrote: > Dear Community, > > I have a linear system of the form > > |K B| du f1 > > = > > |C D| dp f2 > > where K is a big m x m sparsematrix that comes from some FE > discretization, B is a coupling matrix (of the form m x 4), C is of the > form (4 x m) and D is 4x4. > > I save B and C as 4 Vecs and D as a 4x4 double array. 
D might be > singular so at the moment I use the following schur-complement approach > to solve this system > > 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * > B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] > > 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B > [i, j] = VecDot(C[i], invB[j]) > > 3) invert S (this seems to be mostly non-singular) to get dp > > 4) calculate du with dp > > So counting this, I need 5x to call KSP which can be expensive and I > thought of somehow doing the Schur-Complement the other way around, > however due to the (possible) singularity of D this seems like a bad > idea (even using a pseudoinverse) > > Two things puzzle me here still > > A) Can this be done more efficiently? > > B) In case my above matrix is the Jacobian in a newton method, how do I > make sure with any form of Schur Complement approach that I hit the > correct residual reduction? > > Thanks > > Elias > > -- > Dr. Elias Karabelas > Universit?tsassistent | PostDoc > > Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of > Mathematics & Scientific Computing > Universit?t Graz | University of Graz > > Heinrichstra?e 36, 8010 Graz > Tel.: +43/(0)316/380-8546 > E-Mail: elias.karabelas at uni-graz.at > Web: https://ccl.medunigraz.at > > Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! > Please consider the environment before printing this e-mail. Thank you! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Thu Mar 30 22:00:15 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Thu, 30 Mar 2023 22:00:15 -0500 Subject: [petsc-users] DMPlex PETSCViewer for Surface Component In-Reply-To: References: <87pmhje1nz.fsf@jedbrown.org> Message-ID: Hi Matt, this is a follow-up to the previous question: I have created a short code as below to create a sub-dm on the surface, extracted from the original volume dmplex: ------------------------------------- call DMClone(dm_origin, dm_wall, ierr);CHKERRA(ierr) ! label for face sets call DMGetLabel(dm_wall, "Face Sets", label_facesets, ierr);CHKERRA(ierr) call DMPlexLabelComplete(dm_wall, label_facesets, ierr);CHKERRA(ierr) ! label for vertex on surface call DMCreateLabel(dm_wall, "Wall", ierr);CHKERRA(ierr) call DMGetLabel(dm_wall, "Wall", label_surf, ierr);CHKERRA(ierr) call DMPlexGetChart(dm_wall, ist, iend, ierr);CHKERRA(ierr) do i=ist,iend call DMLabelGetValue(label_facesets, i, val, ierr);CHKERRA(ierr) if(val .eq. ID_wall) then call DMLabelSetValue(label_surf, i, ID_wall, ierr);CHKERRA(ierr) endif enddo call DMPlexLabelComplete(dm_wall, label_surf, ierr);CHKERRA(ierr) ! create submesh call DMPlexCreateSubmesh(dm_wall, label_surf, ID_wall, PETSC_TRUE, dm_sub, ierr);CHKERRA(ierr) call DMPlexGetSubpointMap(dm_sub, label_sub, ierr);CHKERRA(ierr) ------------------------------------- It is a bit unclear how to map the vector on each vertex of the original volume dm (dm_wall) into the subdm (dm_sub). The function DMPlexGetSubpointMap() seems to create a subpointMap (label_sub) for this mapping, but it is hard to get an idea how to use that DMLabel for the mapping. Shall I create DMCreateInterpolation()? But it uses Mat and Vec to define mapping rule. Could I ask for any comments? Thanks, Mike > Yes, you can use DMPlexCreateSubmesh() (and friends depending on exactly > what kind of submesh you want). 
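Mike's question above asks how to move the per-vertex values from the volume DM (dm_wall) onto the submesh (dm_sub). As a hedged sketch only (not Matt's answer; it assumes a section with matching dofs has been set up on the shared points of both DMs, that local vectors are used, and a recent PETSc such as 3.19), one way with standard DMPlex calls is:

-------------------------------------
#include <petscdmplex.h>

/* Copy a point-wise field from the volume DM to the submesh created by
   DMPlexCreateSubmesh(). v_wall_loc / v_sub_loc are LOCAL vectors of the
   two DMs; the function name and argument list are illustrative. */
static PetscErrorCode CopyFieldToSubmesh(DM dm_wall, DM dm_sub, Vec v_wall_loc, Vec v_sub_loc)
{
  IS                 subpointIS;
  const PetscInt    *subpoints;
  PetscSection       secWall, secSub;
  const PetscScalar *aw;
  PetscScalar       *as;
  PetscInt           pStart, pEnd, p;

  PetscFunctionBeginUser;
  PetscCall(DMPlexGetSubpointIS(dm_sub, &subpointIS)); /* submesh point -> original point */
  PetscCall(ISGetIndices(subpointIS, &subpoints));
  PetscCall(DMGetLocalSection(dm_wall, &secWall));
  PetscCall(DMGetLocalSection(dm_sub, &secSub));
  PetscCall(DMPlexGetChart(dm_sub, &pStart, &pEnd));
  PetscCall(VecGetArrayRead(v_wall_loc, &aw));
  PetscCall(VecGetArray(v_sub_loc, &as));
  for (p = pStart; p < pEnd; ++p) {
    PetscInt dof, offSub, offWall, d;

    PetscCall(PetscSectionGetDof(secSub, p, &dof));
    if (!dof) continue; /* skip points that carry no unknowns */
    PetscCall(PetscSectionGetOffset(secSub, p, &offSub));
    PetscCall(PetscSectionGetOffset(secWall, subpoints[p - pStart], &offWall));
    for (d = 0; d < dof; ++d) as[offSub + d] = aw[offWall + d];
  }
  PetscCall(VecRestoreArray(v_sub_loc, &as));
  PetscCall(VecRestoreArrayRead(v_wall_loc, &aw));
  PetscCall(ISRestoreIndices(subpointIS, &subpoints));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------------------------

DMPlexGetSubpointIS() returns, for every point of the submesh chart, the corresponding point of the original mesh, so the copy is just a section-offset lookup on both sides; a DMLocalToGlobal() on dm_sub afterwards would assemble the global subvector for viewing.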
This will allow you to create a vector over > only this mesh, and map your volumetric solution to that subvector. Then > you can view the subvector (which pulls in the submesh). > > Thanks, > > Matt > > On Mon, Aug 1, 2022 at 10:59 PM Jed Brown wrote: > >> I would create a sub-DM containing only the part you want to view. >> >> Mike Michell writes: >> >> > Hi, >> > >> > I am a user of DMPlex object in 3D grid topology. Currently the solution >> > field is printed out using viewer PETSCVIEWERVTK in .vtu format. By >> doing >> > that the entire volume components are written in the file. I was >> wondering >> > if there is an option that I can tell PETSc viewer to print out only >> > surface component, instead of the entire volume. >> > >> > Best, >> > Mike >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From gongding at cn.cogenda.com Thu Mar 30 23:14:22 2023 From: gongding at cn.cogenda.com (Gong Ding) Date: Fri, 31 Mar 2023 12:14:22 +0800 Subject: [petsc-users] help: use real and complex petsc together Message-ID:

Dear petsc developer,

We are considering using a complex matrix for the eigenvalue decomposition via SLEPc, while keeping the DC/Tran simulation with real scalars. We can compile each solver (AC, DC, Tran, etc.) as a dynamic library (.so) and load it with dlopen. So, is it possible to load a solver that links to either the real or the complex PETSc, run the simulation, then dlclose it and load the next solver?

Another question: we must keep the MPI communicator in the main code and call PetscInitialize/PetscFinalize inside the .so; it seems PETSc supports this mechanism, right?

Gong Ding

From elias.karabelas at uni-graz.at Fri Mar 31 03:58:31 2023 From: elias.karabelas at uni-graz.at (Karabelas, Elias (elias.karabelas@uni-graz.at)) Date: Fri, 31 Mar 2023 08:58:31 +0000 Subject: [petsc-users] Augmented Linear System In-Reply-To: References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID:

Hi Mark,

thanks for the input, however I didn't quite get what you mean. Maybe I should be a bit more precise about what I want to achieve and why:

So this specific form of block system arises in a biomedical application that colleagues and I published in https://www.sciencedirect.com/science/article/pii/S0045782521004230 (the interesting part is Appendix B.3).

It boils down to a Newton method for solving nonlinear mechanics describing the motion of the human heart, which is coupled on some Neumann surfaces (the surfaces of the inner cavities of each blood pool in the heart) with a pressure that comes from a complicated 0D ODE model that describes cardiovascular physiology. This comes to look like

|F_1(u,p)|   |0|
|F_2(u,p)| = |0|

where F_1 is the residual of nonlinear mechanics plus a nonlinear boundary coupling term and F_2 is a coupling term to the ODE system. In this case u is the displacement and p is the pressure calculated from the ODE model (one for each cavity in the heart, which gives four).

After linearization, we arrive exactly at the aforementioned block system, which we solve at the moment by a Schur complement approach based on K.
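To make the procedure described in this thread concrete, here is a minimal hedged sketch of the 4x4 Schur complement solve (steps 1-4 of the first message), assuming ksp wraps K together with its preconditioner, B and C are stored as 4 Vecs, D and f2 are plain arrays, and Solve4x4() is a user-supplied dense 4x4 solver (an assumption, not a PETSc routine):

-------------------------------------
#include <petscksp.h>

/* user-supplied dense 4x4 solve, e.g. Gaussian elimination with pivoting (assumption) */
extern PetscErrorCode Solve4x4(PetscScalar A[4][4], const PetscScalar b[4], PetscScalar x[4]);

/* Solve [K B; C D] [du; dp] = [f1; f2] via S = C K^{-1} B - D. */
PetscErrorCode SchurSolve(KSP ksp, Vec B[4], Vec C[4], PetscScalar D[4][4],
                          Vec f1, const PetscScalar f2[4], Vec du, PetscScalar dp[4])
{
  Vec         v1, invB[4];
  PetscScalar S[4][4], rhs[4];
  PetscInt    i, j;

  PetscFunctionBeginUser;
  PetscCall(VecDuplicate(f1, &v1));
  PetscCall(KSPSolve(ksp, f1, v1));                /* v1 = K^{-1} f1          (1 solve)  */
  for (j = 0; j < 4; ++j) {
    PetscCall(VecDuplicate(B[j], &invB[j]));
    PetscCall(KSPSolve(ksp, B[j], invB[j]));       /* invB[j] = K^{-1} B[j]   (4 solves) */
  }
  for (i = 0; i < 4; ++i) {
    PetscCall(VecDot(v1, C[i], &rhs[i]));          /* rhs = C K^{-1} f1 - f2 */
    rhs[i] -= f2[i];
    for (j = 0; j < 4; ++j) {
      PetscCall(VecDot(invB[j], C[i], &S[i][j])); /* S = C K^{-1} B - D */
      S[i][j] -= D[i][j];
    }
  }
  PetscCall(Solve4x4(S, rhs, dp));                 /* dp = S^{-1} rhs */
  PetscCall(VecCopy(v1, du));                      /* du = K^{-1} (f1 - B dp), no extra solve */
  for (j = 0; j < 4; ++j) PetscCall(VecAXPY(du, -dp[j], invB[j]));
  for (j = 0; j < 4; ++j) PetscCall(VecDestroy(&invB[j]));
  PetscCall(VecDestroy(&v1));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------------------------

Counting KSPSolve() calls this is exactly the 1 + 4 = 5 solves mentioned above; du is recovered from the already computed K^{-1} f1 and K^{-1} B[j], so step 4 costs no additional solve. Note that with complex scalars VecDot() conjugates its second argument, which would affect the C[i]; with real scalars, as in this application, it is just the dot product.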
However here it gets tricky: The outer newton loop is working based on an inexact newton method with a forcing term from Walker et al. So basically the atol and rtol of the KSP are not constant but vary, so I guess this will influence how well we actually resolve the solution to the schur-complement system (I tried to find some works that can explain how to choose forcing terms in this case but found none). This brought me to think if we can do this the other way around and do a pseudo-inverse of D because it's 4x4 and there is no need for a KSP there. I did a test implementation with a MATSHELL that realizes (K - B D^+ C) and use just K for building an GAMG prec however this fails spectaculary, because D^+ can behave very badly and the other way around I have (C K^-1 B - D) and this behaves more like a singular pertubation of the Matrix C K^-1 B which behaves nicer. So here I stopped investigating because my PETSc expertise is not bad but certainly not good enough to judge which approach would pay off more in terms of runtime (by gut feeling was that building a MATSHELL requires then only one KSP solve vs the other 5). However I'm happy to hear some alternatives that I could persue in order to speed up our current solver strategy or even be able to build a nice MATSHELL. Thanks Elias Am 30.03.23 um 19:41 schrieb Mark Adams: You can lag the update of the Schur complement and use your solver as a preconditioner. If your problems don't change much you might converge fast enough (ie, < 4 iterations with one solve per iteration), but what you have is not bad if the size of your auxiliary, p, space does not grow. Mark On Thu, Mar 30, 2023 at 11:56?AM Karabelas, Elias (elias.karabelas at uni-graz.at) > wrote: Dear Community, I have a linear system of the form |K B| du f1 = |C D| dp f2 where K is a big m x m sparsematrix that comes from some FE discretization, B is a coupling matrix (of the form m x 4), C is of the form (4 x m) and D is 4x4. I save B and C as 4 Vecs and D as a 4x4 double array. D might be singular so at the moment I use the following schur-complement approach to solve this system 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B [i, j] = VecDot(C[i], invB[j]) 3) invert S (this seems to be mostly non-singular) to get dp 4) calculate du with dp So counting this, I need 5x to call KSP which can be expensive and I thought of somehow doing the Schur-Complement the other way around, however due to the (possible) singularity of D this seems like a bad idea (even using a pseudoinverse) Two things puzzle me here still A) Can this be done more efficiently? B) In case my above matrix is the Jacobian in a newton method, how do I make sure with any form of Schur Complement approach that I hit the correct residual reduction? Thanks Elias -- Dr. Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! -- Dr. 
Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Mar 31 08:01:00 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 31 Mar 2023 15:01:00 +0200 Subject: [petsc-users] help: use real and complex petsc together In-Reply-To: References: Message-ID: <11FB728F-BBCE-46B8-8F08-975E1FB2887F@dsic.upv.es> I don't know enough about dlopen to answer you question. Maybe other people can comment. My suggestion is that you do everything in complex scalars and avoid complications trying to mix real and complex. Regarding the second question, if I am not wrong PetscInitialize/PetscFinalize can be called several times, so it should not be a problem. Jose > El 31 mar 2023, a las 6:14, Gong Ding escribi?: > > Dear petsc developer, > > We are considering use complex matrix for eigen value decomposition via slepc will keep DC/Tran simulation in real world. > > We can, compiler each solver (AC, DC, Tran, etc) as dynamic library (.so), and load them by dlopen. > > So, is it possible, each time we load a solver, which links to real/comples petsc, to do the simulation. After that, we dlclose it and load next solver. > > And another question is, we must keep MPI commnucator in the main code and call PetscInitialize/PetscFinalize in the so, it seems petsc support this mechanism, right? > > > Gong Ding > > From mfadams at lbl.gov Fri Mar 31 08:25:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 31 Mar 2023 09:25:57 -0400 Subject: [petsc-users] Augmented Linear System In-Reply-To: References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID: On Fri, Mar 31, 2023 at 4:58?AM Karabelas, Elias ( elias.karabelas at uni-graz.at) wrote: > Hi Mark, > > thanks for the input, however I didn't quite get what you mean. > > Maybe I should be a bit more precisce what I want to achieve and why: > > So this specific form of block system arises in some biomedical > application that colleagues and me published in > https://www.sciencedirect.com/science/article/pii/S0045782521004230 (the > intersting part is Appendix B.3) > > It boils down to a Newton method for solving nolinear mechanics describing > the motion of the human hear, that is coupled on some Neumann surfaces (the > surfaces of the inner cavities of each bloodpool in the heart) with a > pressure that comes from a complicated 0D ODE model that describes > cardiovascular physiology. This comes to look like > > |F_1(u,p)| | 0 | > | | = | | > |F_2(u,p)| | 0 | > > with F_1 is the residual of nonlinear mechanics plus a nonlinear boundary > coupling term and F_2 is a coupling term to the ODE system. In this case u > is displacement and p is the pressure calculated from the ODE model (one > for each cavity in the heart, which gives four). > > After linearization, we arrive exactly at the aforementioned blocksystem, > that we solve at the moment by a Schurcomplement approach based on K. 
> So using this we can get around without doing a MATSHELL for the > schurcomplement as we can just apply the KSP for K five times in order to > approximate the solution of the schur-complement system. >

So you compute an explicit Schur complement (4 solves) and then the real solve uses 1 more K solve. I think this is pretty good as is. You are lucky with only 4 of these pressure equations. I've actually done this on a problem with 100s of extra equations (surface averaging equations), but the problem was linear and ran for 1000s of time steps, or more, so this huge setup cost was amortized.

> However here it gets tricky: The outer newton loop is working based on an > inexact newton method with a forcing term from Walker et al. So basically > the atol and rtol of the KSP are not constant but vary, so I guess this > will influence how well we actually resolve the solution to the > schur-complement system (I tried to find some works that can explain how to > choose forcing terms in this case but found none). > >

Honestly, Walker is a great guy, but I would not get too hung up on this. I've done a lot of plasticity work long ago and gave up on Walker et al. Others have had the same experience. What is new with your problem is how accurately you want the Schur complement (4) solves done.

> This brought me to think if we can do this the other way around and do a > pseudo-inverse of D because it's 4x4 and there is no need for a KSP there. > I did a test implementation with a MATSHELL that realizes (K - B D^+ C) > and use just K for building an GAMG prec however this fails spectaculary, > because D^+ can behave very badly and the other way around I have (C K^-1 B > - D) and this behaves more like a singular pertubation of the Matrix C K^-1 > B which behaves nicer. So here I stopped investigating because my PETSc > expertise is not bad but certainly not good enough to judge which approach > would pay off more in terms of runtime (by gut feeling was that building a > MATSHELL requires then only one KSP solve vs the other 5). > > However I'm happy to hear some alternatives that I could persue in order > to speed up our current solver strategy or even be able to build a nice > MATSHELL. >

OK, so you have tried what I was alluding to. I don't follow what you did exactly and have not worked it out, but there should be an iteration on the pressure equation with a (lagged) Schur solve as a preconditioner. But with only 4 extra solves in your case, I don't think it is worth it unless you want to write solver papers. And AMG in general really has to have a normal PDE, e.g. the K^-1 solve, and if K is too far away from the Laplacian (or elasticity) then all bets are off. Good luck, Mark

> Thanks > Elias > > > > > Am 30.03.23 um 19:41 schrieb Mark Adams: > > You can lag the update of the Schur complement and use your solver as a > preconditioner. > If your problems don't change much you might converge fast enough (ie, < 4 > iterations with one solve per iteration), but what you have is not bad if > the size of your auxiliary, p, space does not grow. > > Mark > > On Thu, Mar 30, 2023 at 11:56 AM Karabelas, Elias ( > elias.karabelas at uni-graz.at) wrote: >> Dear Community, >> >> I have a linear system of the form >> >> |K B| du f1 >> >> = >> >> |C D| dp f2 >> >> where K is a big m x m sparsematrix that comes from some FE >> discretization, B is a coupling matrix (of the form m x 4), C is of the >> form (4 x m) and D is 4x4. >> >> I save B and C as 4 Vecs and D as a 4x4 double array.
D might be >> singular so at the moment I use the following schur-complement approach >> to solve this system >> >> 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * >> B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] >> >> 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B >> [i, j] = VecDot(C[i], invB[j]) >> >> 3) invert S (this seems to be mostly non-singular) to get dp >> >> 4) calculate du with dp >> >> So counting this, I need 5x to call KSP which can be expensive and I >> thought of somehow doing the Schur-Complement the other way around, >> however due to the (possible) singularity of D this seems like a bad >> idea (even using a pseudoinverse) >> >> Two things puzzle me here still >> >> A) Can this be done more efficiently? >> >> B) In case my above matrix is the Jacobian in a newton method, how do I >> make sure with any form of Schur Complement approach that I hit the >> correct residual reduction? >> >> Thanks >> >> Elias >> >> -- >> Dr. Elias Karabelas >> Universit?tsassistent | PostDoc >> >> Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of >> Mathematics & Scientific Computing >> Universit?t Graz | University of Graz >> >> Heinrichstra?e 36, 8010 Graz >> Tel.: +43/(0)316/380-8546 >> E-Mail: elias.karabelas at uni-graz.at >> Web: https://ccl.medunigraz.at >> >> Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! >> Please consider the environment before printing this e-mail. Thank you! >> >> > -- > Dr. Elias Karabelas > Universit?tsassistent | PostDoc > > Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing > Universit?t Graz | University of Graz > > Heinrichstra?e 36, 8010 Graz > Tel.: +43/(0)316/380-8546 > E-Mail: elias.karabelas at uni-graz.at > Web: https://ccl.medunigraz.at > > Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! > Please consider the environment before printing this e-mail. Thank you! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 31 12:47:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 31 Mar 2023 13:47:31 -0400 Subject: [petsc-users] PETSc release announcement and reminder to register for the PETSc annual meeting Message-ID: We are pleased to announce the release of PETSc version 3.19.0, now available at https://petsc.org/release/download/ A list of the major changes and updates can be found at https://petsc.org/release/changes/319/ We'd also like to remind everyone to register for the PETSc Annual Meeting June 5-7 in Chicago https://www.eventbrite.com/e/petsc-2023-user-meeting-tickets-494165441137 and to submit an abstract https://docs.google.com/forms/d/e/1FAIpQLSesh47RGVb9YD9F1qu4obXSe1X6fn7vVmjewllePBDxBItfOw/viewform. Information on the meeting is available at https://petsc.org/release/community/meetings/2023/#meeting We recommend upgrading to PETSc 3.19.0 soon. As always, please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov This release includes contributions from Aidan Hamilton Alp Dener Ashish Patel Barry Smith Blaise Bourdin danofinn David Wells DenverCoder9 Duncan Campbell Eric Chamberland Fernando S. 
Pacheco Frederic Vi Heeho Park Hong Zhang Igor Baratta Jacob Faibussowitsch JDBetteridge Jed Brown Jeremy L Thompson Joseph Pusztay Jose Roman Junchao Zhang Justin Chang Koki Sagiyama Lisandro Dalcin Malachi Phillips Mark Adams Mark Lohry Martin Diehl Matthew Knepley Mr. Hong Zhang Nicholas Arnold-Medabalimi Pablo Brubeck Patrick Sanan Pierre Jolivet Ricardo Jesus Richard Tran Mills Samuel Khuvis Satish Balay Shao-Ching Huang Stefano Zampini Suyash Tandon tlanyan Toby Isaac Umberto Zerbinati Vaclav Hapla Yang Zongze

and bug reports/proposed improvements received from

Balin, Riccardo Carlos Alonso Aznarán Laos chiara.puglisi at mumps-tech.com dalcinl Daniel Taller Dave May Don Fernando Philip Fackler Felix Wilms Hom Nath Gharti Hong Zhang Jacob Faibussowitsch Jed Brown Jin Chen Mike Michell Nicholas Arnold-Medabalimi Robert Nourgaliev Pierre Jolivet Richard Katz Sajid Ali Syed Sanjay Govindjee Stephan Köhler Valentin Churavy Victor Eijkhout yangzongze

As always, thanks for your support, Barry