From paul.grosse-bley at ziti.uni-heidelberg.de Wed Mar 1 09:30:33 2023
From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley)
Date: Wed, 01 Mar 2023 16:30:33 +0100
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
Message-ID: <369419-63ff6f80-cd-ca552d@2229059>

Thank you for the detailed answer, Barry. I had hit a dead end on my side.

> If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes.

Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda instead of using the ghosted cells for them?

Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric information) will cause the boundary condition on the coarser grids to be finite (>0)?

Ideally I would like to just use some kind of GPU-parallel (colored) SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement Red-Black GS using cuSPARSE's masked matrix-vector products, but I have not found any information on implementing a custom preconditioner in PETSc.

Best,
Paul Grosse-Bley

On Wednesday, March 01, 2023 05:38 CET, Barry Smith wrote:

Ok, here is the situation. The command line options as given do not result in multigrid-quality convergence in any of the runs; the error contraction factor is around .94 (meaning that, for the modes the multigrid algorithm does the worst on, it only removes about 6 percent of them per iteration).

But this is hidden by the initial right-hand side for the linear system as written in ex45.c, which has O(h) values on the boundary nodes and O(h^3) values on the interior nodes. The first iterations are largely working on the boundary residual and making great progress attacking that, so it looks like one has a good error contraction factor. One then sees the error contraction factor start to get worse and worse for the later iterations. With the 0 on the boundary, the iterations quickly get to the bad regime where the error contraction factor is near one. One can see this by using -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each iteration. Though it appears the 0 boundary condition case converges much slower (since it requires many more iterations), if you factor out the huge advantage of the nonzero boundary condition case at the beginning (in terms of decreasing the residual), you see they both have an asymptotic error contraction factor of around .94 (which is horrible for multigrid).

I now add -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero boundary right-hand side); they take 35 and 41 iterations (much better):
initial residual norm 14.6993
next residual norm 0.84167 0.0572591
next residual norm 0.0665392 0.00452668
next residual norm 0.0307273 0.00209039
next residual norm 0.0158949 0.00108134
next residual norm 0.00825189 0.000561378
next residual norm 0.00428474 0.000291492
next residual norm 0.00222482 0.000151355
next residual norm 0.00115522 7.85898e-05
next residual norm 0.000599836 4.0807e-05
next residual norm 0.000311459 2.11887e-05
next residual norm 0.000161722 1.1002e-05
next residual norm 8.39727e-05 5.71269e-06
next residual norm 4.3602e-05 2.96626e-06
next residual norm 2.26399e-05 1.5402e-06
next residual norm 1.17556e-05 7.99735e-07
next residual norm 6.10397e-06 4.15255e-07
next residual norm 3.16943e-06 2.15617e-07
next residual norm 1.64569e-06 1.11957e-07
next residual norm 8.54511e-07 5.81326e-08
next residual norm 4.43697e-07 3.01848e-08
next residual norm 2.30385e-07 1.56732e-08
next residual norm 1.19625e-07 8.13815e-09
next residual norm 6.21143e-08 4.22566e-09
next residual norm 3.22523e-08 2.19413e-09
next residual norm 1.67467e-08 1.13928e-09
next residual norm 8.69555e-09 5.91561e-10
next residual norm 4.51508e-09 3.07162e-10
next residual norm 2.34441e-09 1.59491e-10
next residual norm 1.21731e-09 8.28143e-11
next residual norm 6.32079e-10 4.30005e-11
next residual norm 3.28201e-10 2.23276e-11
next residual norm 1.70415e-10 1.15934e-11
next residual norm 8.84865e-11 6.01976e-12
next residual norm 4.59457e-11 3.1257e-12
next residual norm 2.38569e-11 1.62299e-12
next residual norm 1.23875e-11 8.42724e-13
Linear solve converged due to CONVERGED_RTOL iterations 35
Residual norm 1.23875e-11

initial residual norm 172.601
next residual norm 154.803 0.896887
next residual norm 66.9409 0.387837
next residual norm 34.4572 0.199636
next residual norm 17.8836 0.103612
next residual norm 9.28582 0.0537995
next residual norm 4.82161 0.027935
next residual norm 2.50358 0.014505
next residual norm 1.29996 0.0075316
next residual norm 0.674992 0.00391071
next residual norm 0.350483 0.0020306
next residual norm 0.181985 0.00105437
next residual norm 0.094494 0.000547472
next residual norm 0.0490651 0.000284269
next residual norm 0.0254766 0.000147604
next residual norm 0.0132285 7.6642e-05
next residual norm 0.00686876 3.97956e-05
next residual norm 0.00356654 2.06635e-05
next residual norm 0.00185189 1.07293e-05
next residual norm 0.000961576 5.5711e-06
next residual norm 0.000499289 2.89274e-06
next residual norm 0.000259251 1.50203e-06
next residual norm 0.000134614 7.79914e-07
next residual norm 6.98969e-05 4.04963e-07
next residual norm 3.62933e-05 2.10273e-07
next residual norm 1.88449e-05 1.09182e-07
next residual norm 9.78505e-06 5.66919e-08
next residual norm 5.0808e-06 2.94367e-08
next residual norm 2.63815e-06 1.52847e-08
next residual norm 1.36984e-06 7.93645e-09
next residual norm 7.11275e-07 4.12093e-09
next residual norm 3.69322e-07 2.13975e-09
next residual norm 1.91767e-07 1.11105e-09
next residual norm 9.95733e-08 5.769e-10
next residual norm 5.17024e-08 2.99549e-10
next residual norm 2.6846e-08 1.55538e-10
next residual norm 1.39395e-08 8.07615e-11
next residual norm 7.23798e-09 4.19348e-11
next residual norm 3.75824e-09 2.17742e-11
next residual norm 1.95138e-09 1.13058e-11
next residual norm 1.01327e-09 5.87059e-12
next residual norm 5.26184e-10 3.04856e-12
next residual norm 2.73182e-10 1.58274e-12
next residual norm 1.41806e-10 8.21586e-13
Linear solve converged due to CONVERGED_RTOL iterations 42
Residual norm 1.41806e-10
Notice in the first run the residual norm still dives much more quickly for the first 2 iterations than in the second run. This is because the first run has "lucky error" that gets wiped out easily from the big boundary term. After that you can see that the convergence for both is very similar, with both having a reasonable error contraction factor of .51.

I've attached the modified src/ksp/pc/impls/mg/mg.c that prints the residuals along the way.

From bsmith at petsc.dev Wed Mar 1 09:51:00 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 1 Mar 2023 10:51:00 -0500
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: <369419-63ff6f80-cd-ca552d@2229059>
References: <369419-63ff6f80-cd-ca552d@2229059>
Message-ID: <6E352B23-E2B8-4538-B088-EBEDCE3315B6@petsc.dev>

> On Mar 1, 2023, at 10:30 AM, Paul Grosse-Bley wrote:
>
> Thank you for the detailed answer, Barry. I had hit a dead end on my side.
>
> If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes.
>
> Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda instead of using the ghosted cells for them?

   I don't know anything about hpgmg-cuda and what it means by "ghosted cells". I am just saying I think it is reasonable to use the style of ex45.c that retains Dirichlet unknowns in the global matrix (with zero on those points) in PETSc to compare with other codes that may or may not do something different. But you need to use zero for Dirichlet points to ensure that the "funny" convergence rates do not happen at the beginning, making the comparison unbalanced between the two codes.

> Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric information) will cause the boundary condition on the coarser grids to be finite (>0)?

   In general, even if the Dirichlet points on the fine grid are zero, during PCMG with DMDA as in ex45.c those "boundary" values on the coarser grids may end up non-zero during the iterative process. But this is "harmless", just part of the algorithm; it does mean the convergence with a code that does not include those points will be different (not necessarily better or worse, just different).

> Ideally I would like to just use some kind of GPU-parallel (colored) SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement Red-Black GS using cuSPARSE's masked matrix-vector products, but I have not found any information on implementing a custom preconditioner in PETSc.

   You can use https://petsc.org/release/docs/manualpages/PC/PCSHELL/ You can look at src/ksp/pc/impls/jacobi.c for detailed comments on what goes into a preconditioner object. If you write such a GPU-parallel (colored) SOR/Gauss-Seidel we would love to include it in PETSc. Note also https://petsc.org/release/docs/manualpages/KSP/KSPCHEBYSHEV/ and potentially other "polynomial preconditioners" are an alternative approach for having more powerful parallel smoothers.

> Best,
> Paul Grosse-Bley
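As a concrete illustration of the PCSHELL route Barry points to above, a minimal sketch of a shell preconditioner is given below. It is not code from this thread: UserRedBlackGSSweep and UserSmootherCtx are hypothetical placeholders for the cuSPARSE-based red-black Gauss-Seidel kernel being discussed, and attaching the shell to one PCMG level smoother (obtained with PCMGGetSmoother) is only one possible way to wire it up. Everything else uses the standard PETSc C API.

#include <petscksp.h>

typedef struct {
  Mat A; /* operator the smoother sweeps on; GPU buffers, color masks, etc. would go here */
} UserSmootherCtx;

/* Hypothetical user kernel, e.g. a cuSPARSE-based red-black Gauss-Seidel sweep. */
extern PetscErrorCode UserRedBlackGSSweep(Mat A, Vec b, Vec x);

static PetscErrorCode ShellApply(PC pc, Vec b, Vec x)
{
  UserSmootherCtx *ctx;

  PetscFunctionBeginUser;
  PetscCall(PCShellGetContext(pc, &ctx));
  PetscCall(VecSet(x, 0.0));                    /* apply the smoother from a zero initial guess */
  PetscCall(UserRedBlackGSSweep(ctx->A, b, x)); /* one (or a few) colored sweeps */
  PetscFunctionReturn(0);
}

/* Attach the shell, e.g. to the KSP of one PCMG level obtained with PCMGGetSmoother(). */
static PetscErrorCode UseShellSmoother(KSP smoother, UserSmootherCtx *ctx)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(smoother, &pc));
  PetscCall(PCSetType(pc, PCSHELL));
  PetscCall(PCShellSetContext(pc, ctx));
  PetscCall(PCShellSetApply(pc, ShellApply));
  PetscCall(PCShellSetName(pc, "gpu-red-black-gs"));
  PetscFunctionReturn(0);
}

The shell then appears under the given name in -ksp_view output; src/ksp/pc/impls/jacobi.c in the PETSc source remains the reference for the full set of optional callbacks (setup, destroy, and so on).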
From paul.grosse-bley at ziti.uni-heidelberg.de Wed Mar 1 10:10:26 2023
From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley)
Date: Wed, 01 Mar 2023 17:10:26 +0100
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: <6E352B23-E2B8-4538-B088-EBEDCE3315B6@petsc.dev>
Message-ID: <3831b9-63ff7900-e9-7ca7f900@96463981>

I previously thought that a custom Red-Black GS solution would be too specific for PETSc, as it will only work for star-shaped stencils of width 1. For that specific operation, a custom kernel would probably be able to achieve significantly better performance, as it would not have to load a mask from memory and one can make use of the very regular structure of the problem. We still wanted to use cuSPARSE here because it seemed to be more in line with the present GPU support of PETSc, and using custom kernels in one library might be seen as unfair when comparing it to others.

But if you are interested anyway, I will share my implementation if/when I get to implementing that.

Best,
Paul Grosse-Bley
From jchristopher at anl.gov Wed Mar 1 18:17:03 2023
From: jchristopher at anl.gov (Christopher, Joshua)
Date: Thu, 2 Mar 2023 00:17:03 +0000
Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG
Message-ID: 

Hello,

I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using:

-ksp_type gmres
-pc_type hypre
-pc_hypre_type boomeramg

and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but was also previously able to successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options:

-ksp_view_pre
-ksp_view
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_test_null_space

My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues).

I have tried using other preconditioners (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion.

Do you have any advice on speeding up the convergence of this system?

Thank you,
Joshua
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: petsc_gmres_boomeramg.txt
URL: 

From mfadams at lbl.gov Thu Mar 2 02:57:37 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 2 Mar 2023 09:57:37 +0100
Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG
In-Reply-To: 
References: 
Message-ID: 

Can you give us a bit more detail on your equations?

* You are going to want to use FieldSplit.

* AMG usually takes some effort to get working well. You want to start simple, even just a Laplacian or two decoupled Laplacians in your code, to get the expected MG performance (see the sketch below). Then add realistic geometry, then more terms, etc., and ramp up to what you want to do, and we can help you address problems that arise at each step. Verify the results at each step.
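The sketch below is one possible version of the stripped-down baseline Mark describes: it assembles a plain 5-point Laplacian on a DMDA and solves it with whatever solver is chosen on the command line, e.g. the same -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg options used above. The grid size, right-hand side, and boundary handling are arbitrary choices for illustration, not taken from this thread.

#include <petscdmda.h>
#include <petscksp.h>

int main(int argc, char **argv)
{
  DM       da;
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, j, xs, ys, xm, ym;
  const PetscInt M = 513, N = 513; /* global grid size, arbitrary for this baseline */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_STAR,
                         M, N, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
  for (j = ys; j < ys + ym; j++) {
    for (i = xs; i < xs + xm; i++) {
      MatStencil  row = {0}, col[5];
      PetscScalar v[5];
      PetscInt    n = 0;
      row.i = i; row.j = j;
      if (i == 0 || j == 0 || i == M - 1 || j == N - 1) {
        col[n] = row; v[n++] = 1.0;            /* keep Dirichlet rows in the system, ex45-style */
      } else {
        col[n] = row;                 v[n++] = 4.0;
        col[n] = row; col[n].i = i-1; v[n++] = -1.0;
        col[n] = row; col[n].i = i+1; v[n++] = -1.0;
        col[n] = row; col[n].j = j-1; v[n++] = -1.0;
        col[n] = row; col[n].j = j+1; v[n++] = -1.0;
      }
      PetscCall(MatSetValuesStencil(A, 1, &row, n, col, v, INSERT_VALUES));
    }
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(DMCreateGlobalVector(da, &b));
  PetscCall(VecDuplicate(b, &x));
  PetscCall(VecSet(b, 1.0));         /* any nonzero right-hand side will do for a convergence check */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp)); /* e.g. -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

If such a baseline shows the expected few-iteration multigrid convergence while the production system does not, the difference has to come from the extra terms, the geometry, or the coupling rather than the solver configuration itself.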
Mark On Thu, Mar 2, 2023 at 5:51?AM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Mar 2 07:47:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 08:47:21 -0500 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. 
I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options:
> 
> -ksp_view_pre
> -ksp_view
> -ksp_converged_reason
> -ksp_monitor_true_residual
> -ksp_test_null_space
> 
> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues).
> 
> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion.
> 
> Do you have any advice on speeding up the convergence of this system?
> 
> Thank you,
> Joshua

From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 11:43:16 2023
From: fengshw3 at mail2.sysu.edu.cn (冯上玮)
Date: Fri, 3 Mar 2023 01:43:16 +0800
Subject: [petsc-users] Error in configuring PETSc with Cygwin
Message-ID: 

Hi team,

Recently I tried to install PETSc with Cygwin, since I'd like to use PETSc with Visual Studio on the Windows 10 platform. For the sake of clarity, I first list the software/packages used below:

1. PETSc: version 3.18.5
2. VS: version 2019
3. Intel Parallel Studio XE: version 2020
4. Cygwin with py3.8 and make (and default installation)

Because I plan to use Intel MPI, the compiler options in the configuration are:

./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack

where there is no option for MPI.

The PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as:

Cannot run executables created with FC. If this machine uses a batch system
to submit jobs you will need to configure using ./configure with the additional option --with-batch.
Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'?

Note that both ifort for x64 and ifort for ia-32 ended with the same error above, and I installed IPS with the options related to MKL and fblaslapack. Something a bit suspicious is that I open Cygwin from DOS (in particular, from the "Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019" environment, and from the x86 environment for the ifort ia-32 test).

Therefore, I am writing this e-mail to confirm whether I should add "--with-batch", or whether the error is caused by some other reason, such as ifort?

Looking forward to your reply!

Sincerely,
FENG.

From balay at mcs.anl.gov Thu Mar 2 12:13:49 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Thu, 2 Mar 2023 12:13:49 -0600 (CST)
Subject: [petsc-users] Error in configuring PETSc with Cygwin
In-Reply-To: 
References: 
Message-ID: 

On Fri, 3 Mar 2023, 冯上玮
wrote: > Hi team, > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > 1. PETSc: version 3.18.5 > 2. VS: version 2019 > 3. Intel Parallel Studio XE: version 2020 > 4. Cygwin with py3.8 and make (and default installation) > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > where there is no option for mpi. > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > Cannot run executables created with FC. If this machine uses a batch system > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] If you are still encountering errors - send us configure.log for the failed build. Satish > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > Looking forward your reply! > > > Sinserely, > FENG. From jchristopher at anl.gov Thu Mar 2 15:22:38 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Thu, 2 Mar 2023 21:22:38 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper: https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. 
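Since both Mark and Barry point to PCFIELDSPLIT for this coupled Poisson / convection-diffusion system, here is a minimal sketch of how two fields could be registered with a fieldsplit preconditioner. It is an illustration only: the index sets isPhi and isRho, the split names, and the per-block options are assumptions about the application's degree-of-freedom layout, not something taken from this thread.

#include <petscksp.h>

/* A (coupled operator), b, x, and the two index sets are assumed to be built by the application. */
PetscErrorCode SolveCoupled(Mat A, Vec b, Vec x, IS isPhi, IS isRho)
{
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPFGMRES));          /* flexible outer Krylov method */
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCFIELDSPLIT));
  PetscCall(PCFieldSplitSetIS(pc, "phi", isPhi)); /* Poisson (potential) block */
  PetscCall(PCFieldSplitSetIS(pc, "rho", isRho)); /* convection-diffusion (charge) block */
  PetscCall(PCFieldSplitSetType(pc, PC_COMPOSITE_MULTIPLICATIVE));
  /* Per-block solvers can then be chosen on the command line, for example
     -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type hypre -fieldsplit_phi_pc_hypre_type boomeramg
     -fieldsplit_rho_ksp_type preonly -fieldsplit_rho_pc_type bjacobi -fieldsplit_rho_sub_pc_type ilu */
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(0);
}

The idea is that each block sees a preconditioner suited to its character (AMG for the elliptic potential block, something cheaper for the convection-diffusion block), instead of asking a single BoomerAMG hierarchy to handle the coupled, non-symmetric operator.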
Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_preonly_mumps.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_preonly_superlu.txt URL: From bsmith at petsc.dev Thu Mar 2 15:47:19 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 16:47:19 -0500 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: ? Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. 
The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM > To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. > > Barry > > >> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: >> >> Hello, >> >> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >> >> -ksp_type gmres >> -pc_type hypre >> -pc_hypre_type boomeramg >> >> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >> >> -ksp_view_pre >> -ksp_view >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_test_null_space >> >> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >> >> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. 
>> >> Do you have any advice on speeding up the convergence of this system? >> >> Thank you, >> Joshua >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: not available URL: From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 20:12:35 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Fri, 3 Mar 2023 10:12:35 +0800 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: Message-ID: Hi,  This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. Unfortunately, however, another error appeared as: Cxx libraries cannot directly be used with C as linker. If you don't need the C++ compiler to build external packages or for you application you can run ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers    The attachment is the log file, but some parts are unreadable.  Thanks for your continuous aid! ------------------ Original ------------------ From:  "Satish Balay" -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.txt Type: application/octet-stream Size: 957665 bytes Desc: not available URL: From bsmith at petsc.dev Thu Mar 2 20:27:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Mar 2023 21:27:31 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: Message-ID: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> The compiler is burping out some warning message which confuses configure into thinking there is a problem. cl: ?????? warning D9035 :??experimental:preprocessor???????????????????????????? cl: ?????? warning D9036 :????Zc:preprocessor??????????experimental:preprocessor?? cl: ?????? warning D9002 :?????????-Qwd10161??: Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > On Mar 2, 2023, at 9:12 PM, ??? wrote: > > Hi, > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > Unfortunately, however, another error appeared as: > > Cxx libraries cannot directly be used with C as linker. > If you don't need the C++ compiler to build external packages or for you application you can run > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > The attachment is the log file, but some parts are unreadable. > > Thanks for your continuous aid! > ------------------ Original ------------------ > From: "Satish Balay"; > Date: Fri, Mar 3, 2023 02:13 AM > To: "???"; > Cc: "petsc-users"; > Subject: Re: [petsc-users] Error in configuring PETSc with Cygwin > > On Fri, 3 Mar 2023, ??? wrote: > > > Hi team, > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > 1. PETSc: version 3.18.5 > > 2. VS: version 2019 > > 3. Intel Parallel Studio XE: version 2020 > > 4. 
Cygwin with py3.8 and make (and default installation) > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > where there is no option for mpi. > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > If you are still encountering errors - send us configure.log for the failed build. > > Satish > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > Looking forward your reply! > > > > > > Sinserely, > > FENG. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Mar 2 22:12:45 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 2 Mar 2023 22:12:45 -0600 (CST) Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> Message-ID: <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Perhaps the compilers are installed without english - so we can't read the error messages. > ???? x64 ?? Microsoft (R) C/C++ ????????? 19.29.30147 ?? We test with: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31329 for x64 I guess that's VS2019 vs VS2022? You can try using --with-cxx=0 and see if that works. Satish On Thu, 2 Mar 2023, Barry Smith wrote: > > The compiler is burping out some warning message which confuses configure into thinking there is a problem. > > cl: ?????? warning D9035 :??experimental:preprocessor???????????????????????????? > cl: ?????? warning D9036 :????Zc:preprocessor??????????experimental:preprocessor?? > cl: ?????? warning D9002 :?????????-Qwd10161??: > > Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > > > > > On Mar 2, 2023, at 9:12 PM, ??? wrote: > > > > Hi, > > > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > > > Unfortunately, however, another error appeared as: > > > > Cxx libraries cannot directly be used with C as linker. 
> > If you don't need the C++ compiler to build external packages or for you application you can run > > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > > > The attachment is the log file, but some parts are unreadable. > > > > Thanks for your continuous aid! > > ------------------ Original ------------------ > > From: "Satish Balay"; > > Date: Fri, Mar 3, 2023 02:13 AM > > To: "???"; > > Cc: "petsc-users"; > > Subject: Re: [petsc-users] Error in configuring PETSc with Cygwin > > > > On Fri, 3 Mar 2023, ??? wrote: > > > > > Hi team, > > > > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > > > > 1. PETSc: version 3.18.5 > > > 2. VS: version 2019 > > > 3. Intel Parallel Studio XE: version 2020 > > > 4. Cygwin with py3.8 and make (and default installation) > > > > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > > > > > where there is no option for mpi. > > > > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > > to submit jobs you will need to configure using ./configure with the additional option  --with-batch. > > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > > > If you are still encountering errors - send us configure.log for the failed build. > > > > Satish > > > > > > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > > > > Looking forward your reply! > > > > > > > > > Sinserely, > > > FENG. > > > > > > From fengshw3 at mail2.sysu.edu.cn Thu Mar 2 22:29:31 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Fri, 3 Mar 2023 12:29:31 +0800 Subject: [petsc-users] =?utf-8?b?5Zue5aSNOlJlOiAgRXJyb3IgaW4gY29uZmlndXJp?= =?utf-8?q?ng_PETSc_with_Cygwin?= Message-ID: My program is coded in C++, I think that it's not a good choice for cxx=0, whatever the configuration in this way works or not. Anyway, I'll search for getting VS2022 and retry installation. 
--------------????-------------- ????"Satish Balay " From jchristopher at anl.gov Fri Mar 3 11:24:32 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Fri, 3 Mar 2023 17:24:32 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG [Untitled.png] Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. 
You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: Untitled.png URL: From pierre at joliv.et Fri Mar 3 11:45:05 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 3 Mar 2023 18:45:05 +0100 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: Message-ID: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. 
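A minimal sketch of what such an outer fieldsplit could look like on the command line, assuming the two fields (potential and charge density) are exposed to the preconditioner through PCFieldSplitSetIS or a DM; the executable name and the inner solver choices are placeholders, not a setup taken from this thread.

```shell
# Outer fieldsplit sketch for the coupled Poisson / convection-diffusion
# system; "./my_app" is a placeholder, and the split assumes the two fields
# (potential, charge density) are registered with the PC as fields 0 and 1.
mpiexec -n 100 ./my_app -ksp_type fgmres \
  -pc_type fieldsplit -pc_fieldsplit_type multiplicative \
  -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type hypre \
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type bjacobi -fieldsplit_1_sub_pc_type ilu
```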
Thanks, Pierre > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users wrote: > > I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. > > Thank you, > Joshua > From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM > To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > > ? > > Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? > > Is epsilon bounded away from 0? > >> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: >> >> Hi Barry and Mark, >> >> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >> >> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >> >> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >> >> Thank you again, >> Joshua >> From: Barry Smith >> Sent: Thursday, March 2, 2023 7:47 AM >> To: Christopher, Joshua >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> >> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. 
If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >> >> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >> >> Barry >> >> >>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: >>> >>> Hello, >>> >>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>> >>> -ksp_type gmres >>> -pc_type hypre >>> -pc_hypre_type boomeramg >>> >>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>> >>> -ksp_view_pre >>> -ksp_view >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_test_null_space >>> >>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >>> >>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>> >>> Do you have any advice on speeding up the convergence of this system? >>> >>> Thank you, >>> Joshua >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: not available URL: From danyang.su at gmail.com Fri Mar 3 18:36:10 2023 From: danyang.su at gmail.com (danyang.su at gmail.com) Date: Fri, 3 Mar 2023 16:36:10 -0800 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? Message-ID: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? Thanks, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 3 22:58:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 4 Mar 2023 05:58:05 +0100 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Message-ID: On Sat, Mar 4, 2023 at 1:35 AM wrote: > Hi All, > > > > I get a very strange error after upgrading PETSc version to 3.18.3, > indicating some object is already free. The error is benign and does not > crash the code. There is no error before PETSc 3.17.5 versions. > We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt > > > !Check coordinates > > call DMGetCoordinateDM(dmda_flow%da,cda,ierr) > > CHKERRQ(ierr) > > call DMGetCoordinates(dmda_flow%da,gc,ierr) > > CHKERRQ(ierr) > > call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) > > CHKERRQ(ierr) > > call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) > > CHKERRQ(ierr) > > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind > > [0]PETSC ERROR: Object already free: Parameter # 1 > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 > > [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named > starblazer by dsu Fri Mar 3 16:26:03 2023 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-scalapack > --download-parmetis --download-metis --download-mumps --download-ptscotch > --download-chaco --download-fblaslapack --download-hypre > --download-superlu_dist --download-hdf5=yes --download-ctetgen > --download-zlib --download-pnetcdf --download-cmake > --with-hdf5-fortran-bindings --with-debugging=1 > > [0]PETSC ERROR: #1 VecGetArrayRead() at > /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 > > [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at > /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 > > [0]PETSC ERROR: #3 > /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 > > > > Any suggestion on this? > > > > Thanks, > > > > Danyang > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fengshw3 at mail2.sysu.edu.cn Sat Mar 4 06:22:01 2023 From: fengshw3 at mail2.sysu.edu.cn (=?utf-8?B?5Yav5LiK546u?=) Date: Sat, 4 Mar 2023 20:22:01 +0800 Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Message-ID: Hi, VS2022 really sovled this error and there is no more error with the compiler, this is a good news! However, a new problem comes with the link option for MS-MPI (since MPICH2 doesn't work): I've made reference to PETSc website and downloaded MS-MPI in directory D:\MicrosoftMPI and D:\MicrosoftSDKs to avoid space (by the way, method on https://petsc.org/release/install/windows/ which use shortname for a path is not useful anymore for win10 because shortname doesn't exist, see [1]). I have no idea if my tying format is not correct since the PETSc website doesn't show the coding for two include directories. Below is my typing: ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --with-shared-libraries=0 --with-mpi-include='[/cygdrive/d/MicrosoftSDKs/MPI/Include,/cygdrive/d/MicrosoftSDKs/MPI/Include/x64]' --with-mpi-lib=-L"/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64 msmpifec.lib msmpi.lib" --with-mpiexec="/cygdrive/d/MicrosoftMPI/Bin/mpiexec" This ends up with the error information: --with-mpi-lib=['-L/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64', 'msmpifec.lib', 'msmpi.lib'] and --with-mpi-include=['/cygdrive/d/MicrosoftSDKs/MPI/Include', '/cygdrive/d/MicrosoftSDKs/MPI/Include/x64'] did not work This may not be a very delicacy problem? And I am voluntary to make a summary about this installation once it succeed. Sorry for always bother with problem, FENG [1] https://superuser.com/questions/348079/how-can-i-find-the-short-path-of-a-windows-directory-file     ------------------ Original ------------------ From:  "Satish Balay" -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log.txt Type: application/octet-stream Size: 1218733 bytes Desc: not available URL: From yangzongze at gmail.com Sat Mar 4 07:30:38 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 21:30:38 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 Message-ID: Hi, I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): ```shell [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 ``` Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. ```shell (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning -- type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: external -- type: mumps rows=1050625, cols=1050625 package used to perform factorization: mumps total: nonzeros=115025949, allocated nonzeros=115025949 -- type: mpiaij rows=1050625, cols=1050625 total: nonzeros=7346177, allocated nonzeros=7346177 total number of mallocs used during MatSetValues calls=0 (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning -- type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: external -- type: mumps rows=1050625, cols=1050625 package used to perform factorization: mumps total: nonzeros=115377847, allocated nonzeros=115377847 -- type: mpiaij rows=1050625, cols=1050625 total: nonzeros=7346177, allocated nonzeros=7346177 total number of mallocs used during MatSetValues calls=0 ``` I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_mumps.py Type: text/x-python Size: 763 bytes Desc: not available URL: From pierre at joliv.et Sat Mar 4 07:37:15 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 14:37:15 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: > > Hi, > > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): > ```shell > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 > ``` > > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. > > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: > > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? Exact factorizations introduce fill-in. The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. Thanks, Pierre > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. > > ```shell > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > -- > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: external > -- > type: mumps > rows=1050625, cols=1050625 > package used to perform factorization: mumps > total: nonzeros=115025949, allocated nonzeros=115025949 > -- > type: mpiaij > rows=1050625, cols=1050625 > total: nonzeros=7346177, allocated nonzeros=7346177 > total number of mallocs used during MatSetValues calls=0 > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > -- > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: external > -- > type: mumps > rows=1050625, cols=1050625 > package used to perform factorization: mumps > total: nonzeros=115377847, allocated nonzeros=115377847 > -- > type: mpiaij > rows=1050625, cols=1050625 > total: nonzeros=7346177, allocated nonzeros=7346177 > total number of mallocs used during MatSetValues calls=0 > ``` > > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. > > Best wishes, > Zongze > From yangzongze at gmail.com Sat Mar 4 07:51:09 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 21:51:09 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: > > > > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: > > > > Hi, > > > > I am writing to seek your advice regarding a problem I encountered while > using multigrid to solve a certain issue. > > I am currently using multigrid with the coarse problem solved by PCLU. > However, the PC failed randomly with the error below (the value of INFO(2) > may differ): > > ```shell > > [ 0] Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=36 > > ``` > > > > Upon checking the documentation of MUMPS, I discovered that increasing > the value of ICNTL(14) may help resolve the issue. Specifically, I set the > option -mat_mumps_icntl_14 to a higher value (such as 40), and the error > seemed to disappear after I set the value of ICNTL(14) to 80. However, I am > still curious as to why MUMPS failed randomly in the first place. > > > > Upon further inspection, I found that the number of nonzeros of the > PETSc matrix and the MUMPS matrix were different every time I ran the code. > I am now left with the following questions: > > > > 1. What could be causing the number of nonzeros of the MUMPS matrix to > change every time I run the code? > > Is the Mat being fed to MUMPS distributed on a communicator of size > greater than one? > If yes, then, depending on the pivoting and the renumbering, you may get > non-deterministic results. > Hi, Pierre, Thank you for your prompt reply. Yes, the size of the communicator is greater than one. Even if the size of the communicator is equal, are the results still non-deterministic? Can I assume the Mat being fed to MUMPS is the same in this case? Is the pivoting and renumbering all done by MUMPS other than PETSc? > > 2. Why is the number of nonzeros of the MUMPS matrix significantly > greater than that of the PETSc matrix (as seen in the output of ksp_view, > 115025949 vs 7346177)? > > Exact factorizations introduce fill-in. > The number of nonzeros you are seeing for MUMPS is the number of nonzeros > in the factors. > > > 3. Is it possible that the varying number of nonzeros of the MUMPS > matrix is the cause of the random failure? > > Yes, MUMPS uses dynamic scheduling, which will depend on numerical > pivoting, and which may generate factors with different number of nonzeros. > Got it. Thank you for your clear explanation. Zongze > Thanks, > Pierre > > I have attached a test example written in Firedrake. The output of > `ksp_view` after running the code twice is included below for your > reference. > > In the output, the number of nonzeros of the MUMPS matrix was 115025949 > and 115377847, respectively, while that of the PETSc matrix was only > 7346177. 
> > > > ```shell > > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view > ::ascii_info_detail | grep -A3 "type: " > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > -- > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: external > > -- > > type: mumps > > rows=1050625, cols=1050625 > > package used to perform factorization: mumps > > total: nonzeros=115025949, allocated nonzeros=115025949 > > -- > > type: mpiaij > > rows=1050625, cols=1050625 > > total: nonzeros=7346177, allocated nonzeros=7346177 > > total number of mallocs used during MatSetValues calls=0 > > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view > ::ascii_info_detail | grep -A3 "type: " > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > -- > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: external > > -- > > type: mumps > > rows=1050625, cols=1050625 > > package used to perform factorization: mumps > > total: nonzeros=115377847, allocated nonzeros=115377847 > > -- > > type: mpiaij > > rows=1050625, cols=1050625 > > total: nonzeros=7346177, allocated nonzeros=7346177 > > total number of mallocs used during MatSetValues calls=0 > > ``` > > > > I would greatly appreciate any insights you may have on this matter. > Thank you in advance for your time and assistance. > > > > Best wishes, > > Zongze > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sat Mar 4 08:03:10 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 15:03:10 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: > On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: > > > > On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet > wrote: >> >> >> > On 4 Mar 2023, at 2:30 PM, Zongze Yang > wrote: >> > >> > Hi, >> > >> > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. >> > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): >> > ```shell >> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 >> > ``` >> > >> > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. >> > >> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: >> > >> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? >> >> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? >> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. > > Hi, Pierre, > Thank you for your prompt reply. Yes, the size of the communicator is greater than one. 
> Even if the size of the communicator is equal, are the results still non-deterministic? In the most general case, yes. > Can I assume the Mat being fed to MUMPS is the same in this case? Are you doing algebraic or geometric multigrid? Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG? If it?s the latter, I believe the Mat being fed to MUMPS should always be the same. If it?s the former, you?ll have to ask the Firedrake people if there may be non-determinism in the coarsening process. > Is the pivoting and renumbering all done by MUMPS other than PETSc? You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. Thanks, Pierre >> >> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? >> >> Exact factorizations introduce fill-in. >> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. >> >> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? >> >> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. > > Got it. Thank you for your clear explanation. > Zongze > >> >> Thanks, >> Pierre >> >> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. >> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. >> > >> > ```shell >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115025949, allocated nonzeros=115025949 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115377847, allocated nonzeros=115377847 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > ``` >> > >> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. >> > >> > Best wishes, >> > Zongze >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yangzongze at gmail.com Sat Mar 4 08:26:00 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 22:26:00 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: > > > On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: > > > > On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: > >> >> >> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >> > >> > Hi, >> > >> > I am writing to seek your advice regarding a problem I encountered >> while using multigrid to solve a certain issue. >> > I am currently using multigrid with the coarse problem solved by PCLU. >> However, the PC failed randomly with the error below (the value of INFO(2) >> may differ): >> > ```shell >> > [ 0] Error reported by MUMPS in numerical factorization phase: >> INFOG(1)=-9, INFO(2)=36 >> > ``` >> > >> > Upon checking the documentation of MUMPS, I discovered that increasing >> the value of ICNTL(14) may help resolve the issue. Specifically, I set the >> option -mat_mumps_icntl_14 to a higher value (such as 40), and the error >> seemed to disappear after I set the value of ICNTL(14) to 80. However, I am >> still curious as to why MUMPS failed randomly in the first place. >> > >> > Upon further inspection, I found that the number of nonzeros of the >> PETSc matrix and the MUMPS matrix were different every time I ran the code. >> I am now left with the following questions: >> > >> > 1. What could be causing the number of nonzeros of the MUMPS matrix to >> change every time I run the code? >> >> Is the Mat being fed to MUMPS distributed on a communicator of size >> greater than one? >> If yes, then, depending on the pivoting and the renumbering, you may get >> non-deterministic results. >> > > Hi, Pierre, > Thank you for your prompt reply. Yes, the size of the communicator is > greater than one. > Even if the size of the communicator is equal, are the results > still non-deterministic? > > > In the most general case, yes. > > Can I assume the Mat being fed to MUMPS is the same in this case? > > > Are you doing algebraic or geometric multigrid? > Are the prolongation operators computed by Firedrake or by PETSc, e.g., > through GAMG? > If it?s the latter, I believe the Mat being fed to MUMPS should always be > the same. > If it?s the former, you?ll have to ask the Firedrake people if there may > be non-determinism in the coarsening process. > I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. Thanks for your suggestion, I will ask the Firedrake people. > > Is the pivoting and renumbering all done by MUMPS other than PETSc? > > > You could provide your own numbering, but by default, this is outsourced > to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. > I think I won't do this. By the way, does the result of superlu_dist have a similar non-deterministic? Thanks, Zongze > Thanks, > Pierre > > >> > 2. Why is the number of nonzeros of the MUMPS matrix significantly >> greater than that of the PETSc matrix (as seen in the output of ksp_view, >> 115025949 vs 7346177)? >> >> Exact factorizations introduce fill-in. >> The number of nonzeros you are seeing for MUMPS is the number of nonzeros >> in the factors. >> >> > 3. Is it possible that the varying number of nonzeros of the MUMPS >> matrix is the cause of the random failure? 
>> >> Yes, MUMPS uses dynamic scheduling, which will depend on numerical >> pivoting, and which may generate factors with different number of nonzeros. >> > > Got it. Thank you for your clear explanation. > Zongze > > >> Thanks, >> Pierre > > >> > I have attached a test example written in Firedrake. The output of >> `ksp_view` after running the code twice is included below for your >> reference. >> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 >> and 115377847, respectively, while that of the PETSc matrix was only >> 7346177. >> > >> > ```shell >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >> ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115025949, allocated nonzeros=115025949 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >> ::ascii_info_detail | grep -A3 "type: " >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > -- >> > type: lu >> > out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > matrix ordering: external >> > -- >> > type: mumps >> > rows=1050625, cols=1050625 >> > package used to perform factorization: mumps >> > total: nonzeros=115377847, allocated nonzeros=115377847 >> > -- >> > type: mpiaij >> > rows=1050625, cols=1050625 >> > total: nonzeros=7346177, allocated nonzeros=7346177 >> > total number of mallocs used during MatSetValues calls=0 >> > ``` >> > >> > I would greatly appreciate any insights you may have on this matter. >> Thank you in advance for your time and assistance. >> > >> > Best wishes, >> > Zongze >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Mar 4 08:30:29 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 4 Mar 2023 08:30:29 -0600 (CST) Subject: [petsc-users] Error in configuring PETSc with Cygwin In-Reply-To: References: <4ECF3541-271E-449E-B9FF-45EB24913F25@petsc.dev> <6bb8769f-976c-fa32-8076-757e2d04b54c@mcs.anl.gov> Message-ID: <756c6920-aee1-df97-3b33-0a87079fa3be@mcs.anl.gov> > Defined "SIZEOF_VOID_P" to "4" It won't work with 32bit compilers. Can you use 64bit compilers [with 64bit ms-mpi]? BTW: Our testing is on Windows10 and short paths do work. But yeah - if you an avoid spaces - thats one way to simplify. https://gitlab.com/petsc/petsc/-/jobs/3873889443 MPI: Version: 2 Includes: -I/cygdrive/c/PROGRA~2/MICROS~3/MPI/Include -I/cygdrive/c/PROGRA~2/MICROS~3/MPI/Include/x64 Libraries: /cygdrive/c/PROGRA~2/MICROS~3/MPI/lib/x64/msmpifec.lib /cygdrive/c/PROGRA~2/MICROS~3/MPI/lib/x64/msmpi.lib mpiexec: /cygdrive/c/PROGRA~1/MICROS~3/Bin/mpiexec.exe Satish On Sat, 4 Mar 2023, ??? wrote: > Hi, > > > VS2022 really sovled this error and there is no more error with the compiler, this is a good news! 
> > > However, a new problem comes with the link option for MS-MPI (since MPICH2 doesn't work): > > > I've made reference to PETSc website and downloaded MS-MPI in directory D:\MicrosoftMPI and D:\MicrosoftSDKs to avoid space (by the way, method on https://petsc.org/release/install/windows/ which use shortname for a path is not useful anymore for win10 because shortname doesn't exist, see [1]). I have no idea if my tying format is not correct since the PETSc website doesn't show the coding for two include directories. Below is my typing: > > > ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --with-shared-libraries=0 --with-mpi-include='[/cygdrive/d/MicrosoftSDKs/MPI/Include,/cygdrive/d/MicrosoftSDKs/MPI/Include/x64]' --with-mpi-lib=-L"/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64 msmpifec.lib msmpi.lib" --with-mpiexec="/cygdrive/d/MicrosoftMPI/Bin/mpiexec" > > > This ends up with the error information: > > > --with-mpi-lib=['-L/cygdrive/d/MicrosoftSDKs/MPI/Lib/x64', 'msmpifec.lib', 'msmpi.lib'] and > --with-mpi-include=['/cygdrive/d/MicrosoftSDKs/MPI/Include', '/cygdrive/d/MicrosoftSDKs/MPI/Include/x64'] did not work > > > This may not be a very delicacy problem? And I am voluntary to make a summary about this installation once it succeed. > > > Sorry for always bother with problem, > FENG > > > > [1] https://superuser.com/questions/348079/how-can-i-find-the-short-path-of-a-windows-directory-file >   >   > ------------------ Original ------------------ > From:  "Satish Balay" Date:  Fri, Mar 3, 2023 12:12 PM > To:  "Barry Smith" Cc:  "???" Subject:  Re: [petsc-users] Error in configuring PETSc with Cygwin > >   > > Perhaps the compilers are installed without english - so we can't read the error messages. > > >   x64   Microsoft (R) C/C++  ?  19.29.30147  > > We test with: > > Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31329 for x64 > > I guess that's VS2019 vs VS2022? > > You can try using --with-cxx=0 and see if that works. > > Satish > > On Thu, 2 Mar 2023, Barry Smith wrote: > > > > >    The compiler is burping out some warning message which confuses configure into thinking there is a problem. > > > > cl:   warning D9035 : experimental:preprocessor ? ? ? ?? ? > > cl:   warning D9036 :? ? Zc:preprocessor ? ? experimental:preprocessor > > cl:   warning D9002 : ??? ?-Qwd10161 : > > > > Any chance you can use a more recent version of VS. If not, we'll need to send you a file for the warning message. > > > > > > > > > On Mar 2, 2023, at 9:12 PM, ??? > > > > > Hi, > > > > > > This time I try with ./configure --with-cc='win32fe cl' --with-fc=0 --with-cxx='win32fe cl' --download-f2cblaslapack, without fortran may have no problem in consideration that other libs will be used are CGNS and METIS. > > > > > > Unfortunately, however, another error appeared as: > > > > > > Cxx libraries cannot directly be used with C as linker. > > > If you don't need the C++ compiler to build external packages or for you application you can run > > > ./configure with --with-cxx=0. Otherwise you need a different combination of C and C++ compilers > > >  > > >  The attachment is the log file, but some parts are unreadable. > > > > > > Thanks for your continuous aid! > > > ------------------ Original ------------------ > > > From:  "Satish Balay" > > Date:  Fri, Mar 3, 2023 02:13 AM > > > To:  "???" > > Cc:  "petsc-users" > > Subject:  Re: [petsc-users] Error in configuring PETSc with Cygwin > > >  > > > On Fri, 3 Mar 2023, ??? 
wrote: > > > > > > > Hi team, > > > > > > > > > > > > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform. For the sake of clarity, I firstly list the softwares/packages used below: > > > > > > > > > > > > 1. PETSc: version 3.18.5 > > > > 2. VS: version 2019 > > > > 3. Intel Parallel Studio XE: version 2020 > > > > 4. Cygwin with py3.8 and make (and default installation) > > > > > > > > > > > > And because I plan to use Intel mpi, the compiler option in configuration is: > > > > > > > > > > > > ./configure --with-cc='win32fe cl' --with-fc='win32fe ifort' --with-cxx='win32fe cl' --download-fblaslapack > > > > > > Check config/examples/arch-ci-mswin-opt-impi.py for an example on specifying IMPI [and MKL - instead of fblaslapack]. And if you don't need MPI - you can use --with-mpi=0 > > > > > > > > > > > > > > > where there is no option for mpi. > > > > > > > > > > > > While the PROBLEM came with the compiler option --with-fc='win32fe ifort', which returned an error (or two) as: > > > > > > > > > > > > Cannot run executables created with FC. If this machine uses a batch system > > > > to submit jobs you will need to configure using ./configure with the additional option&nbsp; --with-batch. > > > > Otherwise there is problem with the compilers. Can you compile and run code with your compiler '/cygdrive/d/petsc/petsc-3.18.5/lib/petsc/bin/win32fe/win32fe ifort'? > > > > > > If you are not using PETSc from fortran - you don't need ifort. You can use --with-fc=0 [with MKL or --download-f2cblaslapack] > > > > > > If you are still encountering errors - send us configure.log for the failed build. > > > > > > Satish > > > > > > > > > > > > > > > > > > > Note that both ifort of x64 and ifort of ia-32 ended with the same error above and I install IPS with options related to mkl and fblaslapack. Something a bit suspectable is that I open Cygwin with dos. (actually the Intel Compiler 19.1 Update 3 Intel 64 Visual Studio 2019, x86 environment for the test of ifort ia-32 ,in particularlly) > > > > > > > > > > > > Therefore, I write this e-mail to you in order to confirm if I should add "--with-batch" or the error is caused by other reason, such as ifort ? > > > > > > > > > > > > Looking forward your reply! > > > > > > > > > > > > Sinserely, > > > > FENG. > > > > > > > > > From pierre at joliv.et Sat Mar 4 09:09:38 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 4 Mar 2023 16:09:38 +0100 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: References: Message-ID: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> > On 4 Mar 2023, at 3:26 PM, Zongze Yang wrote: > > ? > > >> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: >> >> >>>> On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: >>>> >>>> >>>> >>>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: >>>>> >>>>> >>>>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >>>>> > >>>>> > Hi, >>>>> > >>>>> > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue. >>>>> > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ): >>>>> > ```shell >>>>> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36 >>>>> > ``` >>>>> > >>>>> > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. 
Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place. >>>>> > >>>>> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions: >>>>> > >>>>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code? >>>>> >>>>> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one? >>>>> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results. >>>> >>>> Hi, Pierre, >>>> Thank you for your prompt reply. Yes, the size of the communicator is greater than one. >>>> Even if the size of the communicator is equal, are the results still non-deterministic? >>> >>> In the most general case, yes. >>> >>> Can I assume the Mat being fed to MUMPS is the same in this case? >> >> Are you doing algebraic or geometric multigrid? >> Are the prolongation operators computed by Firedrake or by PETSc, e.g., through GAMG? >> If it?s the latter, I believe the Mat being fed to MUMPS should always be the same. >> If it?s the former, you?ll have to ask the Firedrake people if there may be non-determinism in the coarsening process. > > I am using geometric multigrid, and the prolongation operators, I think, are computed by Firedrake. > Thanks for your suggestion, I will ask the Firedrake people. > >> >>> Is the pivoting and renumbering all done by MUMPS other than PETSc? >> >> You could provide your own numbering, but by default, this is outsourced to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. > > I think I won't do this. > By the way, does the result of superlu_dist have a similar non-deterministic? SuperLU_DIST uses static pivoting as far as I know, so it may be more deterministic. Thanks, Pierre > Thanks, > Zongze > >> >> Thanks, >> Pierre >> >>>> >>>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)? >>>> >>>> Exact factorizations introduce fill-in. >>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors. >>>> >>>> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure? >>>> >>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros. >>> >>> Got it. Thank you for your clear explanation. >>> Zongze >>> >>>> >>>> Thanks, >>>> Pierre >>>> >>>> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference. >>>> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177. >>>> > >>>> > ```shell >>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >>>> > type: preonly >>>> > maximum iterations=10000, initial guess is zero >>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>>> > left preconditioning >>>> > -- >>>> > type: lu >>>> > out-of-place factorization >>>> > tolerance for zero pivot 2.22045e-14 >>>> > matrix ordering: external >>>> > -- >>>> > type: mumps >>>> > rows=1050625, cols=1050625 >>>> > package used to perform factorization: mumps >>>> > total: nonzeros=115025949, allocated nonzeros=115025949 >>>> > -- >>>> > type: mpiaij >>>> > rows=1050625, cols=1050625 >>>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>>> > total number of mallocs used during MatSetValues calls=0 >>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: " >>>> > type: preonly >>>> > maximum iterations=10000, initial guess is zero >>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> > left preconditioning >>>> > -- >>>> > type: lu >>>> > out-of-place factorization >>>> > tolerance for zero pivot 2.22045e-14 >>>> > matrix ordering: external >>>> > -- >>>> > type: mumps >>>> > rows=1050625, cols=1050625 >>>> > package used to perform factorization: mumps >>>> > total: nonzeros=115377847, allocated nonzeros=115377847 >>>> > -- >>>> > type: mpiaij >>>> > rows=1050625, cols=1050625 >>>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>>> > total number of mallocs used during MatSetValues calls=0 >>>> > ``` >>>> > >>>> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance. >>>> > >>>> > Best wishes, >>>> > Zongze >>>> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Mar 4 09:45:03 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 4 Mar 2023 23:45:03 +0800 Subject: [petsc-users] Random Error of mumps: out of memory: INFOG(1)=-9 In-Reply-To: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> References: <20049ECA-AEA6-4BED-B692-97C8C2AAAA3A@joliv.et> Message-ID: Thanks, I will give it a try. Best wishes, Zongze On Sat, 4 Mar 2023 at 23:09, Pierre Jolivet wrote: > > > On 4 Mar 2023, at 3:26 PM, Zongze Yang wrote: > > ? > > > On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet wrote: > >> >> >> On 4 Mar 2023, at 2:51 PM, Zongze Yang wrote: >> >> >> >> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet wrote: >> >>> >>> >>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang wrote: >>> > >>> > Hi, >>> > >>> > I am writing to seek your advice regarding a problem I encountered >>> while using multigrid to solve a certain issue. >>> > I am currently using multigrid with the coarse problem solved by PCLU. >>> However, the PC failed randomly with the error below (the value of INFO(2) >>> may differ): >>> > ```shell >>> > [ 0] Error reported by MUMPS in numerical factorization phase: >>> INFOG(1)=-9, INFO(2)=36 >>> > ``` >>> > >>> > Upon checking the documentation of MUMPS, I discovered that increasing >>> the value of ICNTL(14) may help resolve the issue. Specifically, I set the >>> option -mat_mumps_icntl_14 to a higher value (such as 40), and the error >>> seemed to disappear after I set the value of ICNTL(14) to 80. However, I am >>> still curious as to why MUMPS failed randomly in the first place. >>> > >>> > Upon further inspection, I found that the number of nonzeros of the >>> PETSc matrix and the MUMPS matrix were different every time I ran the code. >>> I am now left with the following questions: >>> > >>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to >>> change every time I run the code? 
>>> >>> Is the Mat being fed to MUMPS distributed on a communicator of size >>> greater than one? >>> If yes, then, depending on the pivoting and the renumbering, you may get >>> non-deterministic results. >>> >> >> Hi, Pierre, >> Thank you for your prompt reply. Yes, the size of the communicator is >> greater than one. >> Even if the size of the communicator is equal, are the results >> still non-deterministic? >> >> >> In the most general case, yes. >> >> Can I assume the Mat being fed to MUMPS is the same in this case? >> >> >> Are you doing algebraic or geometric multigrid? >> Are the prolongation operators computed by Firedrake or by PETSc, e.g., >> through GAMG? >> If it?s the latter, I believe the Mat being fed to MUMPS should always be >> the same. >> If it?s the former, you?ll have to ask the Firedrake people if there may >> be non-determinism in the coarsening process. >> > > I am using geometric multigrid, and the prolongation operators, I think, > are computed by Firedrake. > Thanks for your suggestion, I will ask the Firedrake people. > > >> >> Is the pivoting and renumbering all done by MUMPS other than PETSc? >> >> >> You could provide your own numbering, but by default, this is outsourced >> to MUMPS indeed, which will itself outsourced this to METIS, AMD, etc. >> > > I think I won't do this. > By the way, does the result of superlu_dist have a similar > non-deterministic? > > > SuperLU_DIST uses static pivoting as far as I know, so it may be more > deterministic. > > Thanks, > Pierre > > Thanks, > Zongze > > >> Thanks, >> Pierre >> >> >>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly >>> greater than that of the PETSc matrix (as seen in the output of ksp_view, >>> 115025949 vs 7346177)? >>> >>> Exact factorizations introduce fill-in. >>> The number of nonzeros you are seeing for MUMPS is the number of >>> nonzeros in the factors. >>> >>> > 3. Is it possible that the varying number of nonzeros of the MUMPS >>> matrix is the cause of the random failure? >>> >>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical >>> pivoting, and which may generate factors with different number of nonzeros. >>> >> >> Got it. Thank you for your clear explanation. >> Zongze >> >> >>> Thanks, >>> Pierre >> >> >>> > I have attached a test example written in Firedrake. The output of >>> `ksp_view` after running the code twice is included below for your >>> reference. >>> > In the output, the number of nonzeros of the MUMPS matrix was >>> 115025949 and 115377847, respectively, while that of the PETSc matrix was >>> only 7346177. >>> > >>> > ```shell >>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >>> ::ascii_info_detail | grep -A3 "type: " >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> > left preconditioning >>> > -- >>> > type: lu >>> > out-of-place factorization >>> > tolerance for zero pivot 2.22045e-14 >>> > matrix ordering: external >>> > -- >>> > type: mumps >>> > rows=1050625, cols=1050625 >>> > package used to perform factorization: mumps >>> > total: nonzeros=115025949, allocated nonzeros=115025949 >>> > -- >>> > type: mpiaij >>> > rows=1050625, cols=1050625 >>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>> > total number of mallocs used during MatSetValues calls=0 >>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view >>> ::ascii_info_detail | grep -A3 "type: " >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> > left preconditioning >>> > -- >>> > type: lu >>> > out-of-place factorization >>> > tolerance for zero pivot 2.22045e-14 >>> > matrix ordering: external >>> > -- >>> > type: mumps >>> > rows=1050625, cols=1050625 >>> > package used to perform factorization: mumps >>> > total: nonzeros=115377847, allocated nonzeros=115377847 >>> > -- >>> > type: mpiaij >>> > rows=1050625, cols=1050625 >>> > total: nonzeros=7346177, allocated nonzeros=7346177 >>> > total number of mallocs used during MatSetValues calls=0 >>> > ``` >>> > >>> > I would greatly appreciate any insights you may have on this matter. >>> Thank you in advance for your time and assistance. >>> > >>> > Best wishes, >>> > Zongze >>> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Sat Mar 4 18:38:15 2023 From: danyang.su at gmail.com (Danyang Su) Date: Sat, 04 Mar 2023 16:38:15 -0800 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> Message-ID: <64E48C68-62C7-4624-9D35-63F604DA3C3C@gmail.com> Hi Matt, Attached is the source code and example. I have deleted most of the unused source code but it is still a bit length. Sorry about that. The errors come after DMGetLocalBoundingBox and DMGetBoundingBox. -> To compile the code Please type 'make exe' and the executable file petsc_bounding will be created under the same folder. -> To test the code Please go to fold 'test' and type 'mpiexec -n 1 ../petsc_bounding'. -> The output from PETSc 3.18, error information input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../petsc_bounding on a linux-gnu-dbg named starblazer by dsu Sat Mar? 
4 16:20:51 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/bug-check/petsc_bounding/src/solver_ddmethod.F90:1920 Total volume of simulation domain?? 0.20000000E+01 Total volume of simulation domain?? 0.20000000E+01 -> The output from PETSc 3.17 and earlier, no error input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ Total volume of simulation domain?? 0.20000000E+01 Total volume of simulation domain?? 0.20000000E+01 Thanks, Danyang From: Matthew Knepley Date: Friday, March 3, 2023 at 8:58 PM To: Cc: Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? On Sat, Mar 4, 2023 at 1:35?AM wrote: Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? 
Thanks, Danyang -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_bounding_check.tar.gz Type: application/x-gzip Size: 77551 bytes Desc: not available URL: From yangzongze at gmail.com Sun Mar 5 02:14:12 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sun, 5 Mar 2023 16:14:12 +0800 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` Message-ID: Hello, I am trying to catch the "not converged" error in a loop with the `ksp_error_if_not_converged` option on. However, it seems that PETSc only raises the exception once, even though the solver does not converge after that. Is this expected behavior? Can I make it raise an exception every time? I have included a code snippet of the loop below, and the complete code is attached: ```python for i in range(3): printf(f"Loop i = {i}") try: solver.solve() except ConvergenceError: printf(f" Error from Firedrake: solver did not converged: {get_ksp_reason(solver)}") except PETSc.Error as e: if e.ierr == 91: printf(f" Error from PETSc: solver did not converged: {get_ksp_reason(solver)}") else: raise ``` The output of the code looks like this: ```python (complex-int32-mkl) $ python test_error.py Loop i = 0 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from PETSc: solver did not converged: DIVERGED_MAX_IT Loop i = 1 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from Firedrake: solver did not converged: DIVERGED_MAX_IT Loop i = 2 Linear solve did not converge due to DIVERGED_ITS iterations 4 Error from Firedrake: solver did not converged: DIVERGED_MAX_IT ``` Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_error.py Type: application/octet-stream Size: 1476 bytes Desc: not available URL: From knepley at gmail.com Sun Mar 5 12:40:04 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Mar 2023 13:40:04 -0500 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` In-Reply-To: References: Message-ID: On Sun, Mar 5, 2023 at 3:14?AM Zongze Yang wrote: > > > Hello, > > I am trying to catch the "not converged" error in a loop with the > `ksp_error_if_not_converged` option on. However, it seems that PETSc only > raises the exception once, even though the solver does not converge after > that. Is this expected behavior? Can I make it raise an exception every > time? > When an error is raised, we do not guarantee a consistent state for recovery, so errors terminate the program. If you want to do something useful with non-convergence, then you do not set -ksp_error_if_not_converged. Rather you check the convergence code, and if it is not convergence, you take your action. 
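To make the suggestion above concrete, a minimal petsc4py sketch of acting on the convergence reason after each solve, instead of setting -ksp_error_if_not_converged and catching the error, could look like this. The helper name and the surrounding loop are illustrative assumptions, not code from this thread:

```python
from petsc4py import PETSc

def solve_and_report(ksp, b, x):
    # Query the reason after the solve instead of asking PETSc to error out.
    ksp.solve(b, x)
    reason = ksp.getConvergedReason()
    if reason == PETSc.KSP.ConvergedReason.DIVERGED_MAX_IT:
        print("hit the iteration limit after", ksp.getIterationNumber(), "iterations")
        return False
    if reason < 0:
        # Any negative reason code is a divergence.
        print("solve diverged, reason code", reason)
        return False
    return True

# inside the outer loop:
# for i in range(3):
#     if not solve_and_report(ksp, b, x):
#         pass  # adjust the setup and retry instead of aborting
```

The reason codes are the same ones reported by -ksp_converged_reason, so this check mirrors what the log lines in the quoted snippet print.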
Thanks, Matt > I have included a code snippet of the loop below, and the complete code is > attached: > ```python > for i in range(3): > printf(f"Loop i = {i}") > try: > solver.solve() > except ConvergenceError: > printf(f" Error from Firedrake: solver did not converged: > {get_ksp_reason(solver)}") > except PETSc.Error as e: > if e.ierr == 91: > printf(f" Error from PETSc: solver did not converged: > {get_ksp_reason(solver)}") > else: > raise > ``` > > The output of the code looks like this: > ```python > (complex-int32-mkl) $ python test_error.py > Loop i = 0 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from PETSc: solver did not converged: DIVERGED_MAX_IT > Loop i = 1 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from Firedrake: solver did not converged: DIVERGED_MAX_IT > Loop i = 2 > Linear solve did not converge due to DIVERGED_ITS iterations 4 > Error from Firedrake: solver did not converged: DIVERGED_MAX_IT > ``` > > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Tue Mar 7 00:09:52 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Tue, 7 Mar 2023 14:09:52 +0800 Subject: [petsc-users] petsc4py did not raise for a second time with option `ksp_error_if_not_converged` In-Reply-To: References: Message-ID: Thank you for your suggestion. Best wishes, Zongze On Mon, 6 Mar 2023 at 02:40, Matthew Knepley wrote: > On Sun, Mar 5, 2023 at 3:14?AM Zongze Yang wrote: > >> >> >> Hello, >> >> I am trying to catch the "not converged" error in a loop with the >> `ksp_error_if_not_converged` option on. However, it seems that PETSc only >> raises the exception once, even though the solver does not converge after >> that. Is this expected behavior? Can I make it raise an exception every >> time? >> > > When an error is raised, we do not guarantee a consistent state for > recovery, so errors terminate the program. If you want > to do something useful with non-convergence, then you do not set > -ksp_error_if_not_converged. Rather you check the convergence > code, and if it is not convergence, you take your action. 
> > Thanks, > > Matt > > >> I have included a code snippet of the loop below, and the complete code >> is attached: >> ```python >> for i in range(3): >> printf(f"Loop i = {i}") >> try: >> solver.solve() >> except ConvergenceError: >> printf(f" Error from Firedrake: solver did not converged: >> {get_ksp_reason(solver)}") >> except PETSc.Error as e: >> if e.ierr == 91: >> printf(f" Error from PETSc: solver did not converged: >> {get_ksp_reason(solver)}") >> else: >> raise >> ``` >> >> The output of the code looks like this: >> ```python >> (complex-int32-mkl) $ python test_error.py >> Loop i = 0 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from PETSc: solver did not converged: DIVERGED_MAX_IT >> Loop i = 1 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from Firedrake: solver did not converged: DIVERGED_MAX_IT >> Loop i = 2 >> Linear solve did not converge due to DIVERGED_ITS iterations 4 >> Error from Firedrake: solver did not converged: DIVERGED_MAX_IT >> ``` >> >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Wed Mar 8 22:52:41 2023 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Wed, 8 Mar 2023 21:52:41 -0700 Subject: [petsc-users] O3 versus O2 Message-ID: Dear community, We are in the middle of testing a simulator where the main computational bottleneck is solving a linear problem. We do this by calling GMRES+BoomerAMG through PETSc. This is a commercial code, pretended to serve clients with workstations or with access to clusters. Would you recommend O3 versus O2 optimizations? Maybe just to compile the linear algebra libraries? Some years ago, I worked on another project where going back to O2 solved a weird runtime error that I was never able to solve. This triggers my untrust. Thank you for your time! Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Mar 8 23:36:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 08 Mar 2023 22:36:50 -0700 Subject: [petsc-users] O3 versus O2 In-Reply-To: References: Message-ID: <87o7p2zdsd.fsf@jedbrown.org> You can test a benchmark problem with both. It probably doesn't make a lot of difference with the solver configuration you've selected (most of those operations are memory bandwidth limited). If your residual and Jacobian assembly code is written to vectorize, you may get significant benefit from architecture-specific optimizations like -march=skylake. Alfredo Jaramillo writes: > Dear community, > > We are in the middle of testing a simulator where the main computational > bottleneck is solving a linear problem. We do this by calling > GMRES+BoomerAMG through PETSc. > > This is a commercial code, pretended to serve clients with workstations or > with access to clusters. > > Would you recommend O3 versus O2 optimizations? Maybe just to compile the > linear algebra libraries? > > Some years ago, I worked on another project where going back to O2 solved a > weird runtime error that I was never able to solve. This triggers my > untrust. > > Thank you for your time! 
> Alfredo From qingyuanhu at jiangnan.edu.cn Fri Mar 10 03:34:17 2023 From: qingyuanhu at jiangnan.edu.cn (=?utf-8?B?6IOh5riF5YWD?=) Date: Fri, 10 Mar 2023 17:34:17 +0800 Subject: [petsc-users] Questions about vec filter and recover Message-ID: Hi there, I am a fresh user of Petsc, from Jiangnan University. Now I am trying to use Petsc for FEM and topology optimization. Since I use the background pixel elements, some of elements I don't want them to be calculated, so I have to filter them out. Then after my calculation, I want to have them back. For example, in the context of "mpiexec -np 2": I have a Vec xPassive=[1, 1, 0, 0, 0 | 1, 1, 1, 1, 1] showing the design-able elements (1) and the not-design-able elements (0) to be filtered out. This vec is auto sliced into 5+5  by the 2 threads. At the same time, I have a Vec density=[0.0, 0.1, 1.0, 1.0, 1.0 | 0.5, 0.6, 0.7, 0.8, 0.9]. In order to narrow down the density, I make an array and count, like resarray=[0.0, 0.1] with count=2 and resarray=[0.5, 0.6, 0.7, 0.8, 0.9] with count=5, then by the method VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, count, PETSC_DECIDE, resarray, &density_new),  I get Vec density_new = [0.0, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9] successfully. Next, I put the density_new into some methods to get the new values like density_new=[0.01, 0.11, 0.51, 0.61 | 0.71, 0.81, 0.91], note that since the density_new is of size 7, it becomes 4+3 for the 2 threads. Finally, I have to recover them as Vec density_recover=[0.01, 0.11, 1.0, 1.0, 1.0 | 0.51, 0.61, 0.71, 0.81, 0.91], in this process I fill the default 1.0 for the place where xPassive value=0. In the last step, when I try to recover the density vector, I tried to use VecGetValues but it seems can only get local values, cannot cross threads. I tried also to use VecScatterCreate(density_new, NULL, density_recover, idx_to, &scatter), however, my idx_to=[0, 1 | 5, 6, 7, 8, 9] and not works well like normal [0, 1, 5, 6, 7, 8, 9]. Could you help me with this please? Thank you soooooo much for your time! Best regards, Qingyuan HU School of Science, Jiangnan University -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Mar 10 09:32:27 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Mar 2023 10:32:27 -0500 Subject: [petsc-users] Questions about vec filter and recover In-Reply-To: References: Message-ID: On Fri, Mar 10, 2023 at 9:23?AM ??? wrote: > Hi there, > > I am a fresh user of Petsc, from Jiangnan University. Now I am trying to > use Petsc for FEM and topology optimization. > Since I use the background pixel elements, some of elements I don't want > them to be calculated, so I have to filter them out. Then after my > calculation, I want to have them back. > > For example, in the context of "mpiexec -np 2": > I have a Vec xPassive=[1, 1, 0, 0, 0 | 1, 1, 1, 1, 1] showing the > design-able elements (1) and the not-design-able elements (0) to be > filtered out. This vec is auto sliced into 5+5 by the 2 threads. > At the same time, I have a Vec density=[0.0, 0.1, 1.0, 1.0, 1.0 | 0.5, > 0.6, 0.7, 0.8, 0.9]. > In order to narrow down the density, I make an array and count, like > resarray=[0.0, 0.1] with count=2 and resarray=[0.5, 0.6, 0.7, 0.8, 0.9] > with count=5, then by the method VecCreateMPIWithArray(PETSC_COMM_WORLD, > 1, count, PETSC_DECIDE, resarray, &density_new), I get Vec density_new = > [0.0, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9] successfully. 
> Next, I put the density_new into some methods to get the new values like > density_new=[0.01, 0.11, 0.51, 0.61 | 0.71, 0.81, 0.91], note that since > the density_new is of size 7, it becomes 4+3 for the 2 threads. > So the method changes the paralelled decompostion but not the order (strange, but that is fine) > Finally, I have to recover them as Vec density_recover=[0.01, 0.11, 1.0, > 1.0, 1.0 | 0.51, 0.61, 0.71, 0.81, 0.91], in this process I fill the > default 1.0 for the place where xPassive value=0. > > In the last step, when I try to recover the density vector, I tried to use > VecGetValues but it seems can only get local values, cannot cross threads. > Yes, you can only get local values with VecGetValues. > > I tried also to use VecScatterCreate(density_new, NULL, density_recover, > idx_to, &scatter), however, my idx_to=[0, 1 | 5, 6, 7, 8, 9] and not > works well like normal [0, 1, 5, 6, 7, 8, 9]. > Could you help me with this please? Thank you soooooo much for your time! > You have 4+3 so you want your IS to be of that size. One IS can be NULL because you are scattering all values. I think you want: [ 0 1 5 6 | 7 8 9 ] And set the values to 1.0 before the scatter to get your 1.0, 1.0, 1.0 in there. I always just have to play around with this kind of stuff to get it right. Good luck, Mark > > Best regards, > Qingyuan HU > School of Science, Jiangnan University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.w.hester at icloud.com Mon Mar 13 09:58:35 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Mon, 13 Mar 2023 07:58:35 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? Message-ID: Hello everyone, Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. Thanks, Eric From jroman at dsic.upv.es Mon Mar 13 10:10:51 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 13 Mar 2023 16:10:51 +0100 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: References: Message-ID: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. Jose > El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: > > Hello everyone, > > Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? > > For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. > > While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. 
And I don?t see a reference to a matrix shell create method in the petsc4py api. > > If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. > > Thanks, > Eric > > From eric.w.hester at icloud.com Mon Mar 13 11:37:53 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Mon, 13 Mar 2023 09:37:53 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> Message-ID: <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> Ah ok. I see how the poisson2d example works. Thanks for the quick reply. Eric > On Mar 13, 2023, at 08:10, Jose E. Roman wrote: > > Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. > > Jose > > > >> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >> >> Hello everyone, >> >> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >> >> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >> >> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >> >> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >> >> Thanks, >> Eric >> >> > From wuktsinghua at gmail.com Tue Mar 14 06:25:20 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Tue, 14 Mar 2023 12:25:20 +0100 Subject: [petsc-users] KSP for successive linear systems Message-ID: Hi all, Good day! I am trying to solve an optimization problem where I need to solve multiple successive linear systems inside each optimization loop. The matrices are based on the same grid, but their data structure will change for each linear system. Currently I am doing it by setting up just one single KSP object. Then call KSPSetOperators() and KSPSolve() for each solve. This means the KSP object is solving the successive linear systems one by one, and in the next optimization iteration, it starts all over again. I am wondering that should I use separate KSP objects for each linear system so that during optimization the same KSP will be specialized in solving its corresponding system all the time? I use non-zero initial guess, so I pay attention to use different x vectors for different linear systems, so that the x vectors from the previous iteration can be used as initial guesses for linear systems in the next iteration. Not sure whether some similar thing should also be done for KSP? Thanks for your kind help! Best regards, Kai -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Mar 14 07:27:22 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 08:27:22 -0400 Subject: [petsc-users] KSP for successive linear systems In-Reply-To: References: Message-ID: To gain an advantage in reusing the KSP the Mat must be the same size and the Mat must have the same nonzero structure (different numerical values are fine). Otherwise there is no measurable improvement in reusing the same KSP. Barry > On Mar 14, 2023, at 7:25 AM, K. Wu wrote: > > Hi all, > > Good day! > > I am trying to solve an optimization problem where I need to solve multiple successive linear systems inside each optimization loop. The matrices are based on the same grid, but their data structure will change for each linear system. > > Currently I am doing it by setting up just one single KSP object. Then call KSPSetOperators() and KSPSolve() for each solve. This means the KSP object is solving the successive linear systems one by one, and in the next optimization iteration, it starts all over again. > > I am wondering that should I use separate KSP objects for each linear system so that during optimization the same KSP will be specialized in solving its corresponding system all the time? > > I use non-zero initial guess, so I pay attention to use different x vectors for different linear systems, so that the x vectors from the previous iteration can be used as initial guesses for linear systems in the next iteration. Not sure whether some similar thing should also be done for KSP? > > Thanks for your kind help! > > Best regards, > Kai > From wuktsinghua at gmail.com Tue Mar 14 09:06:50 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Tue, 14 Mar 2023 15:06:50 +0100 Subject: [petsc-users] KSP for successive linear systems In-Reply-To: References: Message-ID: Thank you for the clarification. Barry Smith ?2023?3?14??? 13:27??? > > To gain an advantage in reusing the KSP the Mat must be the same size > and the Mat must have the same nonzero structure (different numerical > values are fine). Otherwise there is no measurable improvement in reusing > the same KSP. > > Barry > > > > On Mar 14, 2023, at 7:25 AM, K. Wu wrote: > > > > Hi all, > > > > Good day! > > > > I am trying to solve an optimization problem where I need to solve > multiple successive linear systems inside each optimization loop. The > matrices are based on the same grid, but their data structure will change > for each linear system. > > > > Currently I am doing it by setting up just one single KSP object. Then > call KSPSetOperators() and KSPSolve() for each solve. This means the KSP > object is solving the successive linear systems one by one, and in the next > optimization iteration, it starts all over again. > > > > I am wondering that should I use separate KSP objects for each linear > system so that during optimization the same KSP will be specialized in > solving its corresponding system all the time? > > > > I use non-zero initial guess, so I pay attention to use different x > vectors for different linear systems, so that the x vectors from the > previous iteration can be used as initial guesses for linear systems in the > next iteration. Not sure whether some similar thing should also be done for > KSP? > > > > Thanks for your kind help! > > > > Best regards, > > Kai > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
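As a reference point for the KSP-reuse discussion above, a sketch of reusing one KSP across an optimization loop when the matrix keeps its size and nonzero pattern could look like the following in petsc4py; n_outer, update_values, A, b, and x are placeholders, not code from the thread:

```python
from petsc4py import PETSc

# One KSP reused across the outer loop: A keeps its size and nonzero
# pattern, only the numerical values change between solves.
ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setInitialGuessNonzero(True)      # keep x from the previous iteration
ksp.setFromOptions()

for it in range(n_outer):
    update_values(A)                  # new entries, same structure
    A.assemble()
    ksp.setOperators(A)               # flags the preconditioner for a refresh
    ksp.solve(b, x)                   # x doubles as the next initial guess
```

If the nonzero structure did change between solves, Barry's point is that this reuse buys nothing, and a fresh KSP per system is just as good.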
URL: From pmoschopoulos at outlook.com Tue Mar 14 01:22:03 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 06:22:03 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. Message-ID: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdara at dtu.dk Tue Mar 14 07:48:03 2023 From: jdara at dtu.dk (Jonathan Davud Razi Seyed Mirpourian) Date: Tue, 14 Mar 2023 12:48:03 +0000 Subject: [petsc-users] Dmplex+PetscFe+KSP Message-ID: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Dear Petsc team, I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. I have struggled to do so, as all the examples I found ( for example: https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically assemble the system matrix A. I hope my questions is reasonable. All the best, Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Mar 14 09:40:46 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Mar 2023 07:40:46 -0700 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > I am guessing this observation is seen when you run a parallel job. 
MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave > Thank you very much, > Pantelis > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Tue Mar 14 09:59:34 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 14:59:34 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Thank you very much, Pantelis ________________________________ From: Dave May Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 14 10:17:36 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 11:17:36 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Message-ID: KSP/SNES do not automatically assemble the linear system, that is the responsibility of DMPLEX in this case. Thus the process for assembling the matrix is largely the same whether done with KSP or SNES and DMPLEX. 
The difference is, of course, that constructing the linear matrix does not depend on some ?solution? vector as with SNES. Note also you can simply use SNES for a linear problem by selecting the SNESType of SNESKSP; this will just as efficient as using KSP directly. You should be able to locate a SNES example and extract the calls for defining the mesh and building the matrix but using them with KSP. Barry > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via petsc-users wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. > > I have struggled to do so, as all the examples I found ( for example:https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Mar 14 10:21:57 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 11:21:57 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> Yes, you should partition the elements and redistribute them for optimal parallelism. You can use the MatPartitioning object to partition the graph of the elements which will tell you what elements should be assigned to each MPI process. But then you need to move the element information to the correct process. At that point your code will remain pretty much as it is now. Barry > On Mar 14, 2023, at 10:59 AM, Pantelis Moschopoulos wrote: > > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. > > Thank you very much, > Pantelis > From: Dave May > Sent: Tuesday, March 14, 2023 4:40 PM > To: Pantelis Moschopoulos > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. > What I am missing here? 
> > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Tue Mar 14 10:32:35 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Tue, 14 Mar 2023 15:32:35 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> References: <579AEAB4-2C37-44B5-9FD9-8F5EF7A1D7C1@petsc.dev> Message-ID: Ok, I will try to implement your suggestions. Thank you very much for your help, Pantelis ________________________________ From: Barry Smith Sent: Tuesday, March 14, 2023 5:21 PM To: Pantelis Moschopoulos Cc: Dave May ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. Yes, you should partition the elements and redistribute them for optimal parallelism. You can use the MatPartitioning object to partition the graph of the elements which will tell you what elements should be assigned to each MPI process. But then you need to move the element information to the correct process. At that point your code will remain pretty much as it is now. Barry On Mar 14, 2023, at 10:59 AM, Pantelis Moschopoulos wrote: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Thank you very much, Pantelis ________________________________ From: Dave May Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. 
Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Mar 14 11:00:39 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Mar 2023 09:00:39 -0700 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout > of the matrix? In my implementation, I read the mesh file, and the I split > the domain where the first rank gets the first N elements, the second rank > gets the next N elements etc. Should I use metis to distribute elements? > > Note that I use continuous finite elements, which means that some values > will be cached in a temporary buffer. > Sure. With CG FE you will always have some DOFs which need to be cached, however the number of cached values will be minimized if you follow Barry's advice. If you do what Barry suggests, only the DOFs which live on the boundary of your element-wise defined sub-domains would need to cached. Thanks, Dave > > Thank you very much, > Pantelis > ------------------------------ > *From:* Dave May > *Sent:* Tuesday, March 14, 2023 4:40 PM > *To:* Pantelis Moschopoulos > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > > > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the > values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the > layout of the DOFs on each mesh sub-domain, then a huge number of values > can potentially be cached. After you call MatAssemblyBegin(), > MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis > > -------------- next part -------------- An HTML attachment was scrubbed... 
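As an illustration of the layout Dave and Barry describe in this thread, here is a minimal petsc4py sketch in which each rank assembles only the element contributions it owns, so that most MatSetValues calls hit locally owned rows and the off-process stash stays small. The names nlocal, N, d_nz, o_nz, my_elements, and element_matrix are placeholders (the original poster works in Fortran, where the corresponding calls have the same meaning):

```python
from petsc4py import PETSc

# nlocal rows of the global N x N system live on this rank; assembling
# mostly rows in [rstart, rend) keeps the off-process stash small.
A = PETSc.Mat().createAIJ(((nlocal, N), (nlocal, N)), comm=PETSc.COMM_WORLD)
A.setPreallocationNNZ((d_nz, o_nz))   # per-row diagonal / off-diagonal estimates
rstart, rend = A.getOwnershipRange()

for e in my_elements:                 # elements assigned to this rank
    rows, block = element_matrix(e)   # global dof indices and dense element matrix
    A.setValues(rows, rows, block, addv=PETSc.InsertMode.ADD_VALUES)

A.assemble()                          # only shared-boundary entries travel between ranks
```

With the elements partitioned the way Barry suggests (for example via MatPartitioning or METIS on the element graph), my_elements would contain mostly elements whose dofs fall inside the rank's ownership range.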
URL: From bsmith at petsc.dev Tue Mar 14 11:11:11 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 12:11:11 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> Message-ID: <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> Matt can help you more directly. Barry > On Mar 14, 2023, at 11:40 AM, Jonathan Davud Razi Seyed Mirpourian wrote: > > Dear Barry, > > Thank you very much for the quick answer! > > To my understanding, in the snes examples, it is the call: DMPlexSetSnesLocalFEM that takes care of computing the identities important for snes (jacobian, residual, boundary values). > Is there an equivalent for KSP (just computing the system Matrix A and the rhs b)? I cannot find any DMPlexSetKSPLocalFEM in the docs or am I missing something? > > Also, I was not aware of SNESKSP, so thank you very much for that, it will be my fallback strategy. > > All the best, > Jonathan > > From: Barry Smith > > Sent: 14. marts 2023 16:18 > To: Jonathan Davud Razi Seyed Mirpourian > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Dmplex+PetscFe+KSP > > > KSP/SNES do not automatically assemble the linear system, that is the responsibility of DMPLEX in this case. Thus the process for assembling the matrix is largely the same whether done with KSP or SNES and DMPLEX. The difference is, of course, that constructing the linear matrix does not depend on some ?solution? vector as with SNES. > > Note also you can simply use SNES for a linear problem by selecting the SNESType of SNESKSP; this will just as efficient as using KSP directly. > > You should be able to locate a SNES example and extract the calls for defining the mesh and building the matrix but using them with KSP. > > Barry > > > > > > > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via petsc-users > wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a linear system. > > I have struggled to do so, as all the examples I found ( for example:https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the mesh management, then create the discretization with PetscFE and then get KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.w.hester at icloud.com Tue Mar 14 11:45:00 2023 From: eric.w.hester at icloud.com (Eric Hester) Date: Tue, 14 Mar 2023 09:45:00 -0700 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> Message-ID: <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> Is there a similar example of how to create shell preconditioners using petsc4py? Thanks, Eric > On Mar 13, 2023, at 09:37, Eric Hester wrote: > > Ah ok. I see how the poisson2d example works. Thanks for the quick reply. > > Eric > >> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: >> >> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. 
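A toy version of the mult-based pattern Jose describes just above might look like the following; the operator, sizes, and solver choices are made up for illustration, and the real FFT-based action would go inside mult(). PETSc picks the parallel row layout, so the same sketch runs in serial or in parallel:

```python
from petsc4py import PETSc

class MyOperator:
    # Matrix-free operator: only the action y = A*x is defined.
    def mult(self, mat, x, y):
        # stand-in for the FFT-based action; here simply A = 2*I
        x.copy(y)
        y.scale(2.0)

n = 100
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setSizes([n, n])
A.setType('python')
A.setPythonContext(MyOperator())
A.setUp()

b = A.createVecLeft(); b.set(1.0)
x = A.createVecRight()

ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('none')           # no assembled entries, so no algebraic PC
ksp.setFromOptions()
ksp.solve(b, x)
```

A shell preconditioner follows the same idea, with a Python context that provides an apply(self, pc, x, y) action instead of mult(); that is the spirit of the ex100 examples linked further down.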
>> >> Jose >> >> >> >>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >>> >>> Hello everyone, >>> >>> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >>> >>> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >>> >>> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >>> >>> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >>> >>> Thanks, >>> Eric >>> >>> >> > From jroman at dsic.upv.es Tue Mar 14 11:49:51 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 14 Mar 2023 17:49:51 +0100 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? In-Reply-To: <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> Message-ID: <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> Have a look at ex100.c ex100.py: https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.c https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.py Jose > El 14 mar 2023, a las 17:45, Eric Hester escribi?: > > Is there a similar example of how to create shell preconditioners using petsc4py? > > Thanks, > Eric > >> On Mar 13, 2023, at 09:37, Eric Hester wrote: >> >> Ah ok. I see how the poisson2d example works. Thanks for the quick reply. >> >> Eric >> >>> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: >>> >>> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices via a mult(self,mat,X,Y) function defined in the python side. Another example is ex3.py in slepc4py. >>> >>> Jose >>> >>> >>> >>>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users escribi?: >>>> >>>> Hello everyone, >>>> >>>> Does petsc4py support matrix-free iterative solvers (as for Matrix-Free matrices in petsc)? >>>> >>>> For context, I have a distributed matrix problem to solve. It comes from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix is dense, but it is fast to evaluate using fftw. It is also distributed in memory. >>>> >>>> While I?ve found some petsc4py tutorial examples in "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free example. And I don?t see a reference to a matrix shell create method in the petsc4py api. >>>> >>>> If petsc4py does support matrix free iterative solvers, it would be really helpful if someone could provide even a toy example of that. Serial would work, though a parallelised one would be better. >>>> >>>> Thanks, >>>> Eric >>>> >>>> >>> >> > From stefano.zampini at gmail.com Tue Mar 14 12:13:53 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 14 Mar 2023 20:13:53 +0300 Subject: [petsc-users] Does petsc4py support matrix-free iterative solvers? 
In-Reply-To: <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> References: <59FB83B2-68E6-4A53-926A-1C0727269A87@dsic.upv.es> <4AB5F676-1D25-42C3-B668-A5FF65C070D5@icloud.com> <68BB11A0-CE85-4F87-877C-2BBC1A57DD9A@icloud.com> <65C6B1F1-41F3-4F19-AC5D-E6C1A408D5AD@dsic.upv.es> Message-ID: You can find other examples at https://gitlab.com/stefanozampini/petscexamples On Tue, Mar 14, 2023, 19:50 Jose E. Roman wrote: > Have a look at ex100.c ex100.py: > > https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.c > > https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.py > > Jose > > > > El 14 mar 2023, a las 17:45, Eric Hester > escribi?: > > > > Is there a similar example of how to create shell preconditioners using > petsc4py? > > > > Thanks, > > Eric > > > >> On Mar 13, 2023, at 09:37, Eric Hester > wrote: > >> > >> Ah ok. I see how the poisson2d example works. Thanks for the quick > reply. > >> > >> Eric > >> > >>> On Mar 13, 2023, at 08:10, Jose E. Roman wrote: > >>> > >>> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices > via a mult(self,mat,X,Y) function defined in the python side. Another > example is ex3.py in slepc4py. > >>> > >>> Jose > >>> > >>> > >>> > >>>> El 13 mar 2023, a las 15:58, Eric Hester via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > >>>> > >>>> Hello everyone, > >>>> > >>>> Does petsc4py support matrix-free iterative solvers (as for > Matrix-Free matrices in petsc)? > >>>> > >>>> For context, I have a distributed matrix problem to solve. It comes > from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix > is dense, but it is fast to evaluate using fftw. It is also distributed in > memory. > >>>> > >>>> While I?ve found some petsc4py tutorial examples in > "/petsc/src/binding/petsc4py/demo/?, they don?t seem to show a matrix free > example. And I don?t see a reference to a matrix shell create method in the > petsc4py api. > >>>> > >>>> If petsc4py does support matrix free iterative solvers, it would be > really helpful if someone could provide even a toy example of that. Serial > would work, though a parallelised one would be better. > >>>> > >>>> Thanks, > >>>> Eric > >>>> > >>>> > >>> > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Tue Mar 14 12:14:06 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Tue, 14 Mar 2023 17:14:06 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> Message-ID: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. 
When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. 
I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG [Untitled.png] Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). 
On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 165137 bytes Desc: Untitled.png URL: From bsmith at petsc.dev Tue Mar 14 13:35:30 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Mar 2023 14:35:30 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> Message-ID: <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). 
The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > From: Pierre Jolivet > > Sent: Friday, March 3, 2023 11:45 AM > To: Christopher, Joshua > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > >> On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: >> >> I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. >> >> I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. >> >> Thank you, >> Joshua >> From: Barry Smith > >> Sent: Thursday, March 2, 2023 3:47 PM >> To: Christopher, Joshua > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> >> >> >> >> >> Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? >> >> Is epsilon bounded away from 0? 
>> >>> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: >>> >>> Hi Barry and Mark, >>> >>> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >>> >>> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >>> >>> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >>> >>> Thank you again, >>> Joshua >>> From: Barry Smith > >>> Sent: Thursday, March 2, 2023 7:47 AM >>> To: Christopher, Joshua > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>> >>> >>> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >>> >>> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >>> >>> Barry >>> >>> >>>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: >>>> >>>> Hello, >>>> >>>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>>> >>>> -ksp_type gmres >>>> -pc_type hypre >>>> -pc_hypre_type boomeramg >>>> >>>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>>> >>>> -ksp_view_pre >>>> -ksp_view >>>> -ksp_converged_reason >>>> -ksp_monitor_true_residual >>>> -ksp_test_null_space >>>> >>>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). 
>>>> >>>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>>> >>>> Do you have any advice on speeding up the convergence of this system? >>>> >>>> Thank you, >>>> Joshua >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 14 15:52:12 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Mar 2023 16:52:12 -0400 Subject: [petsc-users] Dmplex+PetscFe+KSP In-Reply-To: <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> References: <44a1501d8bd64690a6189d1e4271e8c7@dtu.dk> <8029C240-EFC7-42E2-8EEA-32A8A30EF364@petsc.dev> Message-ID: On Tue, Mar 14, 2023 at 12:11?PM Barry Smith wrote: > > Matt can help you more directly. > > Barry > > > On Mar 14, 2023, at 11:40 AM, Jonathan Davud Razi Seyed Mirpourian < > jdara at dtu.dk> wrote: > > Dear Barry, > > Thank you very much for the quick answer! > > To my understanding, in the snes examples, it is the call: > DMPlexSetSnesLocalFEM that takes care of computing the identities important > for snes (jacobian, residual, boundary values). > Is there an equivalent for KSP (just computing the system Matrix A and the > rhs b)? I cannot find any DMPlexSetKSPLocalFEM in the docs or am I missing > something? > > Also, I was not aware of SNESKSP, so thank you very much for that, it will > be my fallback strategy. > > There are no KSP analogues. The intent is for you to use -snes_type ksp for truly linear problems. Thanks, Matt > All the best, > Jonathan > > *From:* Barry Smith > *Sent:* 14. marts 2023 16:18 > *To:* Jonathan Davud Razi Seyed Mirpourian > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Dmplex+PetscFe+KSP > > > KSP/SNES do not automatically assemble the linear system, that is the > responsibility of DMPLEX in this case. Thus the process for assembling the > matrix is largely the same whether done with KSP or SNES and DMPLEX. The > difference is, of course, that constructing the linear matrix does not > depend on some ?solution? vector as with SNES. > > Note also you can simply use SNES for a linear problem by selecting the > SNESType of SNESKSP; this will just as efficient as using KSP directly. > > You should be able to locate a SNES example and extract the calls for > defining the mesh and building the matrix but using them with KSP. > > Barry > > > > > > > On Mar 14, 2023, at 8:48 AM, Jonathan Davud Razi Seyed Mirpourian via > petsc-users wrote: > > Dear Petsc team, > > I am trying to use DMplex in combination with PetscFE and KSP to solve a > linear system. > > I have struggled to do so, as all the examples I found ( for example: > https://petsc.org/release/src/snes/tutorials/ex26.c.html) use SNES. > > Is there a way to avoid this? Optimally I would like to use dmplex for the > mesh management, then create the discretization with PetscFE and then get > KSP to automatically > assemble the system matrix A. > > I hope my questions is reasonable. > > All the best, > > Jonathan > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
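(A rough sketch of the suggested route for a purely linear problem, using SNESKSPONLY so the "nonlinear" solve reduces to a single linear solve. This is only an illustration: dm is assumed to be a DMPlex that already carries the PetscFE discretization and the residual/Jacobian callbacks, and u is the solution vector.)

    SNES snes;
    Vec  u;
    SNESCreate(PETSC_COMM_WORLD, &snes);
    SNESSetDM(snes, dm);
    /* residual/Jacobian hooked up as in the SNES tutorials, e.g. via DMPlexSetSNESLocalFEM() */
    SNESSetType(snes, SNESKSPONLY);   /* one linearization, i.e. effectively just a KSP solve */
    SNESSetFromOptions(snes);         /* the usual -ksp_type / -pc_type options still apply */
    DMCreateGlobalVector(dm, &u);
    SNESSolve(snes, NULL, u);
    VecDestroy(&u);
    SNESDestroy(&snes);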
URL: From knepley at gmail.com Tue Mar 14 15:55:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Mar 2023 16:55:14 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Tue, Mar 14, 2023 at 12:01?PM Dave May wrote: > > > On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > >> Dear Dave, >> >> Yes, I observe this in parallel runs. How I can change the parallel >> layout of the matrix? In my implementation, I read the mesh file, and the I >> split the domain where the first rank gets the first N elements, the second >> rank gets the next N elements etc. Should I use metis to distribute >> elements? >> > > >> Note that I use continuous finite elements, which means that some values >> will be cached in a temporary buffer. >> > > Sure. With CG FE you will always have some DOFs which need to be cached, > however the number of cached values will be minimized if you follow Barry's > advice. If you do what Barry suggests, only the DOFs which live on the > boundary of your element-wise defined sub-domains would need to cached. > Note that we have direct support for unstructured meshes (Plex) with partitioning and redistribution, rather than translating them to purely algebraic language. Thanks, Matt > Thanks, > Dave > > >> >> Thank you very much, >> Pantelis >> ------------------------------ >> *From:* Dave May >> *Sent:* Tuesday, March 14, 2023 4:40 PM >> *To:* Pantelis Moschopoulos >> *Cc:* petsc-users at mcs.anl.gov >> *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. >> >> >> >> On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < >> pmoschopoulos at outlook.com> wrote: >> >> Hi everyone, >> >> I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. >> My question concerns the sudden increase of the memory that Petsc needs >> during the assembly of the jacobian matrix. After this point, memory is >> freed. It seems to me like Petsc performs memory allocations and the >> deallocations during assembly. >> I have used the following commands with no success: >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) >> CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, >> PETSC_TRUE,ier). >> CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) >> >> The structure of the matrix does not change during my simulation, just >> the values. I am expecting this behavior the first time that I create this >> matrix because the preallocation instructions that I use are not very >> accurate but this continues every time I assemble the matrix. >> What I am missing here? >> >> >> I am guessing this observation is seen when you run a parallel job. >> >> MatSetValues() will cache values in a temporary memory buffer if the >> values are to be sent to a different MPI rank. >> Hence if the parallel layout of your matrix doesn?t closely match the >> layout of the DOFs on each mesh sub-domain, then a huge number of values >> can potentially be cached. After you call MatAssemblyBegin(), >> MatAssemblyEnd() this cache will be freed. >> >> Thanks, >> Dave >> >> >> >> Thank you very much, >> Pantelis >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmoschopoulos at outlook.com Wed Mar 15 01:34:58 2023 From: pmoschopoulos at outlook.com (Pantelis Moschopoulos) Date: Wed, 15 Mar 2023 06:34:58 +0000 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: Dear all, Thank you all very much for your suggestions. Dave, I am using also the reverse Cuthill?McKee algorithm when I load the mesh information and then the simulation proceeds. I can use partitioning after the reordering right? Matt, with PLEX you refer to DMPLEX? To be honest, I have never tried the DM structures of Petsc up to this point. Pantelis ________________________________ From: Matthew Knepley Sent: Tuesday, March 14, 2023 10:55 PM To: Dave May Cc: Pantelis Moschopoulos ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue, Mar 14, 2023 at 12:01?PM Dave May > wrote: On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos > wrote: Dear Dave, Yes, I observe this in parallel runs. How I can change the parallel layout of the matrix? In my implementation, I read the mesh file, and the I split the domain where the first rank gets the first N elements, the second rank gets the next N elements etc. Should I use metis to distribute elements? Note that I use continuous finite elements, which means that some values will be cached in a temporary buffer. Sure. With CG FE you will always have some DOFs which need to be cached, however the number of cached values will be minimized if you follow Barry's advice. If you do what Barry suggests, only the DOFs which live on the boundary of your element-wise defined sub-domains would need to cached. Note that we have direct support for unstructured meshes (Plex) with partitioning and redistribution, rather than translating them to purely algebraic language. Thanks, Matt Thanks, Dave Thank you very much, Pantelis ________________________________ From: Dave May > Sent: Tuesday, March 14, 2023 4:40 PM To: Pantelis Moschopoulos > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Memory Usage in Matrix Assembly. On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos > wrote: Hi everyone, I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. My question concerns the sudden increase of the memory that Petsc needs during the assembly of the jacobian matrix. After this point, memory is freed. It seems to me like Petsc performs memory allocations and the deallocations during assembly. I have used the following commands with no success: CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) The structure of the matrix does not change during my simulation, just the values. I am expecting this behavior the first time that I create this matrix because the preallocation instructions that I use are not very accurate but this continues every time I assemble the matrix. What I am missing here? I am guessing this observation is seen when you run a parallel job. MatSetValues() will cache values in a temporary memory buffer if the values are to be sent to a different MPI rank. 
Hence if the parallel layout of your matrix doesn?t closely match the layout of the DOFs on each mesh sub-domain, then a huge number of values can potentially be cached. After you call MatAssemblyBegin(), MatAssemblyEnd() this cache will be freed. Thanks, Dave Thank you very much, Pantelis -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksl7912 at snu.ac.kr Wed Mar 15 02:38:07 2023 From: ksl7912 at snu.ac.kr (=?UTF-8?B?wq3qtozsirnrpqwgLyDtlZnsg50gLyDtla3qs7XsmrDso7zqs7XtlZnqs7w=?=) Date: Wed, 15 Mar 2023 16:38:07 +0900 Subject: [petsc-users] Question about time issues in parallel computing Message-ID: Dear petsc developers. Hello. I am trying to solve the structural problem with FEM and test parallel computing works well. However, even if I change the number of cores, the total time is calculated the same. I have tested on a simple problem using a MUMPS solver using: mpiexec -n 1 mpiexec -n 2 mpiexec -n 4 ... Could you give me some advice if you have experienced this problem? Best regards Seung Lee Kwon -- Seung Lee Kwon, Ph.D.Candidate Aerospace Structures and Materials Laboratory Department of Mechanical and Aerospace Engineering Seoul National University Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 E-mail : ksl7912 at snu.ac.kr Office : +82-2-880-7389 C. P : +82-10-4695-1062 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 15 06:07:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Mar 2023 07:07:51 -0400 Subject: [petsc-users] Memory Usage in Matrix Assembly. In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 2:34?AM Pantelis Moschopoulos < pmoschopoulos at outlook.com> wrote: > Dear all, > Thank you all very much for your suggestions. > > Dave, I am using also the reverse Cuthill?McKee algorithm when I load the > mesh information and then the simulation proceeds. I can use partitioning > after the reordering right? > Yes. > Matt, with PLEX you refer to DMPLEX? To be honest, I have never tried the > DM structures of Petsc up to this point. > Yes. It can read a variety of mesh formats, but if everything is working, there is no need to switch. Thanks Matt > Pantelis > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, March 14, 2023 10:55 PM > *To:* Dave May > *Cc:* Pantelis Moschopoulos ; > petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > On Tue, Mar 14, 2023 at 12:01?PM Dave May wrote: > > > > On Tue, 14 Mar 2023 at 07:59, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Dear Dave, > > Yes, I observe this in parallel runs. How I can change the parallel layout > of the matrix? In my implementation, I read the mesh file, and the I split > the domain where the first rank gets the first N elements, the second rank > gets the next N elements etc. Should I use metis to distribute elements? > > > > Note that I use continuous finite elements, which means that some values > will be cached in a temporary buffer. > > > Sure. With CG FE you will always have some DOFs which need to be cached, > however the number of cached values will be minimized if you follow Barry's > advice. 
If you do what Barry suggests, only the DOFs which live on the > boundary of your element-wise defined sub-domains would need to cached. > > > Note that we have direct support for unstructured meshes (Plex) with > partitioning and redistribution, rather than translating them to purely > algebraic language. > > Thanks, > > Matt > > > Thanks, > Dave > > > > Thank you very much, > Pantelis > ------------------------------ > *From:* Dave May > *Sent:* Tuesday, March 14, 2023 4:40 PM > *To:* Pantelis Moschopoulos > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Memory Usage in Matrix Assembly. > > > > On Tue 14. Mar 2023 at 07:15, Pantelis Moschopoulos < > pmoschopoulos at outlook.com> wrote: > > Hi everyone, > > I am a new Petsc user that incorporates Petsc for FEM in a Fortran code. > My question concerns the sudden increase of the memory that Petsc needs > during the assembly of the jacobian matrix. After this point, memory is > freed. It seems to me like Petsc performs memory allocations and the > deallocations during assembly. > I have used the following commands with no success: > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ier) > CALL MatSetOption(petsc_A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE,ier). > CALL MatSetOption(petsc_A, MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ier) > > The structure of the matrix does not change during my simulation, just the > values. I am expecting this behavior the first time that I create this > matrix because the preallocation instructions that I use are not very > accurate but this continues every time I assemble the matrix. > What I am missing here? > > > I am guessing this observation is seen when you run a parallel job. > > MatSetValues() will cache values in a temporary memory buffer if the > values are to be sent to a different MPI rank. > Hence if the parallel layout of your matrix doesn?t closely match the > layout of the DOFs on each mesh sub-domain, then a huge number of values > can potentially be cached. After you call MatAssemblyBegin(), > MatAssemblyEnd() this cache will be freed. > > Thanks, > Dave > > > > Thank you very much, > Pantelis > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Mar 15 06:49:39 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Mar 2023 07:49:39 -0400 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? wrote: > Dear petsc developers. > > Hello. > I am trying to solve the structural problem with FEM and test parallel > computing works well. > > However, even if I change the number of cores, the total time is > calculated the same. > > I have tested on a simple problem using a MUMPS solver using: > mpiexec -n 1 > mpiexec -n 2 > mpiexec -n 4 > ... > > Could you give me some advice if you have experienced this problem? 
> If your problem is small, you could very well see no speedup: https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup Thanks, Matt > Best regards > Seung Lee Kwon > -- > Seung Lee Kwon, Ph.D.Candidate > Aerospace Structures and Materials Laboratory > Department of Mechanical and Aerospace Engineering > Seoul National University > Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 > E-mail : ksl7912 at snu.ac.kr > Office : +82-2-880-7389 > C. P : +82-10-4695-1062 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksl7912 at snu.ac.kr Wed Mar 15 20:08:50 2023 From: ksl7912 at snu.ac.kr (=?UTF-8?B?wq3qtozsirnrpqwgLyDtlZnsg50gLyDtla3qs7XsmrDso7zqs7XtlZnqs7w=?=) Date: Thu, 16 Mar 2023 10:08:50 +0900 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: Thank you for your reply. It was a simple problem, but it has more than 1000 degrees of freedom. Is this not enough to check speedup? Best regards Seung Lee Kwon 2023? 3? 15? (?) ?? 8:50, Matthew Knepley ?? ??: > On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? > wrote: > >> Dear petsc developers. >> >> Hello. >> I am trying to solve the structural problem with FEM and test parallel >> computing works well. >> >> However, even if I change the number of cores, the total time is >> calculated the same. >> >> I have tested on a simple problem using a MUMPS solver using: >> mpiexec -n 1 >> mpiexec -n 2 >> mpiexec -n 4 >> ... >> >> Could you give me some advice if you have experienced this problem? >> > > If your problem is small, you could very well see no speedup: > > > https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup > > Thanks, > > Matt > > >> Best regards >> Seung Lee Kwon >> -- >> Seung Lee Kwon, Ph.D.Candidate >> Aerospace Structures and Materials Laboratory >> Department of Mechanical and Aerospace Engineering >> Seoul National University >> Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 >> E-mail : ksl7912 at snu.ac.kr >> Office : +82-2-880-7389 >> C. P : +82-10-4695-1062 >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Seung Lee Kwon, Ph.D.Candidate Aerospace Structures and Materials Laboratory Department of Mechanical and Aerospace Engineering Seoul National University Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 E-mail : ksl7912 at snu.ac.kr Office : +82-2-880-7389 C. P : +82-10-4695-1062 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Mar 15 20:13:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 15 Mar 2023 21:13:52 -0400 Subject: [petsc-users] Question about time issues in parallel computing In-Reply-To: References: Message-ID: Speed up to 4 processors should have at least 40,000 equations for 3D problems and more for 2D. At least for iterative solvers. This is probably a good place to start with direct solvers but you might see benefit with a little less. 
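(When comparing core counts it also helps to let PETSc report where the time actually goes. A minimal sketch, assuming ksp, b and x are the already configured solver and vectors:)

    PetscLogStage stage;
    PetscLogStageRegister("MUMPS solve", &stage);
    PetscLogStagePush(stage);
    KSPSolve(ksp, b, x);
    PetscLogStagePop();

Running with -log_view then breaks the time down per stage and per event, which makes it easier to see whether the factorization or the solve is the part that should be scaling.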
Mark On Wed, Mar 15, 2023 at 9:09?PM ???? / ?? / ??????? wrote: > Thank you for your reply. > > It was a simple problem, but it has more than 1000 degrees of freedom. > > Is this not enough to check speedup? > > Best regards > Seung Lee Kwon > > 2023? 3? 15? (?) ?? 8:50, Matthew Knepley ?? ??: > >> On Wed, Mar 15, 2023 at 3:38?AM ???? / ?? / ??????? >> wrote: >> >>> Dear petsc developers. >>> >>> Hello. >>> I am trying to solve the structural problem with FEM and test parallel >>> computing works well. >>> >>> However, even if I change the number of cores, the total time is >>> calculated the same. >>> >>> I have tested on a simple problem using a MUMPS solver using: >>> mpiexec -n 1 >>> mpiexec -n 2 >>> mpiexec -n 4 >>> ... >>> >>> Could you give me some advice if you have experienced this problem? >>> >> >> If your problem is small, you could very well see no speedup: >> >> >> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Thanks, >> >> Matt >> >> >>> Best regards >>> Seung Lee Kwon >>> -- >>> Seung Lee Kwon, Ph.D.Candidate >>> Aerospace Structures and Materials Laboratory >>> Department of Mechanical and Aerospace Engineering >>> Seoul National University >>> Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 >>> E-mail : ksl7912 at snu.ac.kr >>> Office : +82-2-880-7389 >>> C. P : +82-10-4695-1062 >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Seung Lee Kwon, Ph.D.Candidate > Aerospace Structures and Materials Laboratory > Department of Mechanical and Aerospace Engineering > Seoul National University > Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 > E-mail : ksl7912 at snu.ac.kr > Office : +82-2-880-7389 > C. P : +82-10-4695-1062 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Thu Mar 16 07:26:37 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Thu, 16 Mar 2023 21:26:37 +0900 Subject: [petsc-users] Difference between opt and debug Message-ID: Hello, I have some issues about different mode and different command. 1. Exactly the same code, but no error occurs in debug mode, but an error occurs in opt mode. In this case, what should I be suspicious of? 2. When executed with ./application, no error occurs, but when executed with mpiexec -n 1 ./app, an error may occur. What should be suspected in this case? Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Mar 16 07:51:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Mar 2023 08:51:13 -0400 Subject: [petsc-users] Difference between opt and debug In-Reply-To: References: Message-ID: On Thu, Mar 16, 2023 at 8:26?AM user_gong Kim wrote: > Hello, > > I have some issues about different mode and different command. > > 1. Exactly the same code, but no error occurs in debug mode, but an error > occurs in opt mode. > In this case, what should I be suspicious of? > Memory overwrites, since debug and opt can have different memory layouts. Run under valgrind, or suing address sanitizer. > 2. When executed with ./application, no error occurs, but when executed > with mpiexec -n 1 ./app, an error may occur. What should be suspected in > this case? 
> Same thing. Thanks, Matt > Thanks, > Hyung Kim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 04:50:45 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Fri, 17 Mar 2023 18:50:45 +0900 Subject: [petsc-users] Question about MatView Message-ID: Hello, I have 2 questions about MatView. 1. I would like to ask if the process below is possible. When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 17 05:34:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 06:34:46 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: > Hello, > > > > I have 2 questions about MatView. > > > > 1. I would like to ask if the process below is possible. > When running in parallel, is it possible to make the matrix of the mpiaij > format into a txt file, output it, and read it again so that the entire > process has the same matrix? > No. However, you can do this with a binary viewer. I suggest using MatViewFromOptions(mat, NULL, "-my_view"); and then the command line argument -my_view binary:mat.bin and then you can read this in using MatCreate(PETSC_COMM_WORLD, &mat); PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); MatLoad(mat, viewer); ViewerDestroy(&viewer); THanks, Matt > 2. If possible, please let me know which function can be used to > create a txt file and how to read the txt file. > > > > Thanks, > > Hyung Kim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 08:45:44 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Fri, 17 Mar 2023 22:45:44 +0900 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: Following your comments, I did an test. However, if I run the application in parallel. In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. As in the previous case of saving in binary, it is read in parallel divided form. Is it impossible to want to get the all value in the whole process? Thanks, Hyung Kim 2023? 3? 17? (?) ?? 7:35, Matthew Knepley ?? ??: > On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: > >> Hello, >> >> >> >> I have 2 questions about MatView. >> >> >> >> 1. I would like to ask if the process below is possible. >> When running in parallel, is it possible to make the matrix of the mpiaij >> format into a txt file, output it, and read it again so that the entire >> process has the same matrix? >> > No. However, you can do this with a binary viewer. 
I suggest using > > MatViewFromOptions(mat, NULL, "-my_view"); > > and then the command line argument > > -my_view binary:mat.bin > > and then you can read this in using > > MatCreate(PETSC_COMM_WORLD, &mat); > PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, > &viewer); > MatLoad(mat, viewer); > ViewerDestroy(&viewer); > > THanks, > > Matt > > > >> 2. If possible, please let me know which function can be used to >> create a txt file and how to read the txt file. >> >> >> >> Thanks, >> >> Hyung Kim >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 09:10:17 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 15:10:17 +0100 Subject: [petsc-users] Create a nest not aligned by processors Message-ID: Dear all, I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). Is it possible to change that ? Note that I am coding in fortran if that has ay consequence. Thank you, Sincerely, -- Cl?ment BERGER ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 09:48:40 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 10:48:40 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: Message-ID: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. Barry > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? 
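(To illustrate the suggestion above in code: instead of PETSC_DECIDE, each process can fix its local row count for every block up front, possibly choosing 0 rows of a given block on a given rank. A minimal sketch in C; the Fortran calls take the same arguments plus ierr, and the splitting rule here is just PETSc's default one for illustration:)

    PetscInt NA = 4, NB = 4, mA = PETSC_DECIDE, mB = PETSC_DECIDE;
    Mat      A, B, C, blocks[4];
    PetscSplitOwnership(PETSC_COMM_WORLD, &mA, &NA);   /* pick the local row counts once */
    PetscSplitOwnership(PETSC_COMM_WORLD, &mB, &NB);
    MatCreateConstantDiagonal(PETSC_COMM_WORLD, mA, mA, NA, NA, 2.0, &A);
    MatCreateConstantDiagonal(PETSC_COMM_WORLD, mB, mB, NB, NB, 1.0, &B);
    blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
    MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C);

This does not change the fact that the nest's global ordering interleaves the blocks rank by rank, but it does give explicit control over how many rows of each block live on each rank.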
> > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 09:53:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 10:53:21 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: Use >> MatCreate(PETSC_COMM_SELF, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, &viewer); If it one program running that both views and loads the matrix you can use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI rank. It is better than using the filesystem to do it. > On Mar 17, 2023, at 9:45 AM, user_gong Kim wrote: > > Following your comments, I did an test. > However, if I run the application in parallel. > In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. > As in the previous case of saving in binary, it is read in parallel divided form. > Is it impossible to want to get the all value in the whole process? > > > Thanks, > Hyung Kim > > 2023? 3? 17? (?) ?? 7:35, Matthew Knepley >?? ??: >> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim > wrote: >>> Hello, >>> >>> >>> I have 2 questions about MatView. >>> >>> >>> 1. I would like to ask if the process below is possible. >>> When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? >>> >> No. However, you can do this with a binary viewer. I suggest using >> >> MatViewFromOptions(mat, NULL, "-my_view"); >> >> and then the command line argument >> >> -my_view binary:mat.bin >> >> and then you can read this in using >> >> MatCreate(PETSC_COMM_WORLD, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); >> MatLoad(mat, viewer); >> ViewerDestroy(&viewer); >> >> THanks, >> >> Matt >> >> >>> 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. >>> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Mar 17 10:05:49 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Sat, 18 Mar 2023 00:05:49 +0900 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: PETSC_COMM_SELF generates an error in more than 2 processes. I would like to use the other method you said, matcreateredundantmatrix. However, there is no example in the manual. Can you give an example using this function? 2023? 3? 17? (?) ?? 11:53, Barry Smith ?? ??: > > Use > > MatCreate(PETSC_COMM_SELF, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, >> &viewer); >> > > > If it one program running that both views and loads the matrix you can > use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI > rank. It is better than using the filesystem to do it. > > > On Mar 17, 2023, at 9:45 AM, user_gong Kim wrote: > > Following your comments, I did an test. > However, if I run the application in parallel. 
> In all processes, it is not possible to obtain values at all positions in > the matrix through MatGetValue. > As in the previous case of saving in binary, it is read in parallel > divided form. > Is it impossible to want to get the all value in the whole process? > > > Thanks, > Hyung Kim > > 2023? 3? 17? (?) ?? 7:35, Matthew Knepley ?? ??: > >> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim wrote: >> >>> Hello, >>> >>> >>> I have 2 questions about MatView. >>> >>> >>> 1. I would like to ask if the process below is possible. >>> When running in parallel, is it possible to make the matrix of the >>> mpiaij format into a txt file, output it, and read it again so that the >>> entire process has the same matrix? >>> >> No. However, you can do this with a binary viewer. I suggest using >> >> MatViewFromOptions(mat, NULL, "-my_view"); >> >> and then the command line argument >> >> -my_view binary:mat.bin >> >> and then you can read this in using >> >> MatCreate(PETSC_COMM_WORLD, &mat); >> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, >> &viewer); >> MatLoad(mat, viewer); >> ViewerDestroy(&viewer); >> >> THanks, >> >> Matt >> >> >> >>> 2. If possible, please let me know which function can be used to >>> create a txt file and how to read the txt file. >>> >>> >>> Thanks, >>> >>> Hyung Kim >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 10:51:10 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 11:51:10 -0400 Subject: [petsc-users] Question about MatView In-Reply-To: References: Message-ID: <9589CEBA-DF30-4580-95F4-1D30FD1BB36A@petsc.dev> src/mat/tests/ex9.c and other examples in that directory use it. > On Mar 17, 2023, at 11:05 AM, user_gong Kim wrote: > > PETSC_COMM_SELF generates an error in more than 2 processes. It should not > > I would like to use the other method you said, matcreateredundantmatrix. However, there is no example in the manual. Can you give an example using this function? > > > > 2023? 3? 17? (?) ?? 11:53, Barry Smith >?? ??: >> >> Use >> >>>> MatCreate(PETSC_COMM_SELF, &mat); >>>> PetscViewerBinaryOpen(PETSC_COMM_SELF, "mat.bin", FILE_MODE_READ, &viewer); >> >> >> If it one program running that both views and loads the matrix you can use MatCreateRedundantMatrix() to reproduce the entire matrix on each MPI rank. It is better than using the filesystem to do it. >> >> >>> On Mar 17, 2023, at 9:45 AM, user_gong Kim > wrote: >>> >>> Following your comments, I did an test. >>> However, if I run the application in parallel. >>> In all processes, it is not possible to obtain values at all positions in the matrix through MatGetValue. >>> As in the previous case of saving in binary, it is read in parallel divided form. >>> Is it impossible to want to get the all value in the whole process? >>> >>> >>> Thanks, >>> Hyung Kim >>> >>> 2023? 3? 17? (?) ?? 7:35, Matthew Knepley >?? ??: >>>> On Fri, Mar 17, 2023 at 5:51?AM user_gong Kim > wrote: >>>>> Hello, >>>>> >>>>> >>>>> I have 2 questions about MatView. >>>>> >>>>> >>>>> 1. I would like to ask if the process below is possible. 
>>>>> When running in parallel, is it possible to make the matrix of the mpiaij format into a txt file, output it, and read it again so that the entire process has the same matrix? >>>>> >>>> No. However, you can do this with a binary viewer. I suggest using >>>> >>>> MatViewFromOptions(mat, NULL, "-my_view"); >>>> >>>> and then the command line argument >>>> >>>> -my_view binary:mat.bin >>>> >>>> and then you can read this in using >>>> >>>> MatCreate(PETSC_COMM_WORLD, &mat); >>>> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat.bin", FILE_MODE_READ, &viewer); >>>> MatLoad(mat, viewer); >>>> ViewerDestroy(&viewer); >>>> >>>> THanks, >>>> >>>> Matt >>>> >>>> >>>>> 2. If possible, please let me know which function can be used to create a txt file and how to read the txt file. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Hyung Kim >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 11:14:11 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 17:14:11 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> Message-ID: It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > >> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >> >> Dear all, >> >> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >> >> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> Note that I am coding in fortran if that has ay consequence. >> >> Thank you, >> >> Sincerely, >> >> -- >> Cl?ment BERGER >> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Mar 17 11:34:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 12:34:52 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> Message-ID: <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. Barry Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > >> >> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >> >> Barry >> >> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 12:19:43 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 18:19:43 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Message-ID: I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. 
Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. Is that clear ? I don't know if I provided too many or not enough details. Thank you --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
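One way the conversion-and-solve step described above might look, as a sketch in C with hypothetical variable names, no error checking, and assuming PETSc was configured with MUMPS:

Mat Caij;
KSP ksp;
PC  pc;
Vec rhs, sol;

MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij);  /* flatten the nest for MUMPS */
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, Caij, Caij);
KSPSetType(ksp, KSPPREONLY);                       /* direct solve only */
KSPGetPC(ksp, &pc);
PCSetType(pc, PCLU);
PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
KSPSetFromOptions(ksp);
MatCreateVecs(Caij, &sol, &rhs);                   /* vectors with the layout of Caij */
/* ... copy the needed entries from the nest vectors into rhs ... */
KSPSolve(ksp, rhs, sol);                           /* repeated solves reuse the factorization */

Because the converted matrix keeps the nest's global ordering, whatever is copied into rhs has to follow that same ordering; that ordering question comes up again below.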
URL: From jchristopher at anl.gov Fri Mar 17 12:26:34 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Fri, 17 Mar 2023 17:26:34 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. From https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? My matrix is organized as one block after another. Thank you, Joshua ________________________________ From: Barry Smith Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. 
I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. 
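Coming back to the question above about what goes into each index set: a sketch, in C, for the 100x100 example with rho in global rows 0-49 and phi in global rows 50-99 (so the second split starts at 50, not 49). Here A stands for the assembled matrix and pc for the PCFIELDSPLIT preconditioner, both assumed to exist already; each rank passes only the global rows it owns that belong to that field:

PetscInt rstart, rend, first, last;
IS       is_rho, is_phi;

MatGetOwnershipRange(A, &rstart, &rend);             /* global rows owned by this rank */
first = PetscMax(rstart, 0);                         /* rho occupies rows 0..49 */
last  = PetscMin(rend, 50);
ISCreateStride(PETSC_COMM_WORLD, PetscMax(last - first, 0), first, 1, &is_rho);
first = PetscMax(rstart, 50);                        /* phi occupies rows 50..99 */
last  = PetscMin(rend, 100);
ISCreateStride(PETSC_COMM_WORLD, PetscMax(last - first, 0), first, 1, &is_phi);
PCFieldSplitSetIS(pc, "rho", is_rho);
PCFieldSplitSetIS(pc, "phi", is_phi);

The entries are global row numbers, and with the one-block-after-another layout on two ranks this typically leaves rank 0 with an empty phi set and rank 1 with an empty rho set.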
The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. 
Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Mar 17 12:34:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 17 Mar 2023 13:34:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: That sounds right, See the docs and examples at https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ On Fri, Mar 17, 2023 at 1:26?PM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > My matrix is organized as one block after another. > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. 
Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. 
> > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. 
The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Fri Mar 17 13:02:33 2023 From: danyang.su at gmail.com (danyang.su at gmail.com) Date: Fri, 17 Mar 2023 11:02:33 -0700 Subject: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? In-Reply-To: <8426FD29-CAD9-4B7B-8937-C03D1EF9C831@gmail.com> References: <00ab01d94e31$51fdc590$f5f950b0$@gmail.com> <8426FD29-CAD9-4B7B-8937-C03D1EF9C831@gmail.com> Message-ID: <001601d958fa$a6b83a60$f428af20$@gmail.com> Hi Matt, I am following up to check if you can reproduce the problem on your side. Thanks and have a great weekend, Danyang From: Danyang Su Sent: March 4, 2023 4:38 PM To: Matthew Knepley Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? Hi Matt, Attached is the source code and example. I have deleted most of the unused source code but it is still a bit length. Sorry about that. The errors come after DMGetLocalBoundingBox and DMGetBoundingBox. -> To compile the code Please type 'make exe' and the executable file petsc_bounding will be created under the same folder. -> To test the code Please go to fold 'test' and type 'mpiexec -n 1 ../petsc_bounding'. -> The output from PETSc 3.18, error information input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../petsc_bounding on a linux-gnu-dbg named starblazer by dsu Sat Mar 4 16:20:51 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/bug-check/petsc_bounding/src/solver_ddmethod.F90:1920 Total volume of simulation domain 0.20000000E+01 Total volume of simulation domain 0.20000000E+01 -> The output from PETSc 3.17 and earlier, no error input file: stedvs.dat ------------------------------------------------------------------------ global control parameters ------------------------------------------------------------------------ Total volume of simulation domain 0.20000000E+01 Total volume of simulation domain 0.20000000E+01 Thanks, Danyang From: Matthew Knepley > Date: Friday, March 3, 2023 at 8:58 PM To: > Cc: > Subject: Re: [petsc-users] PETSC ERROR in DMGetLocalBoundingBox? On Sat, Mar 4, 2023 at 1:35?AM > wrote: Hi All, I get a very strange error after upgrading PETSc version to 3.18.3, indicating some object is already free. The error is begin and does not crash the code. There is no error before PETSc 3.17.5 versions. We have changed the way coordinates are handled in order to support higher order coordinate fields. Is it possible to send something that we can run that has this error? It could be on our end, but it could also be that you are destroying a coordinate vector accidentally. Thanks, Matt !Check coordinates call DMGetCoordinateDM(dmda_flow%da,cda,ierr) CHKERRQ(ierr) call DMGetCoordinates(dmda_flow%da,gc,ierr) CHKERRQ(ierr) call DMGetLocalBoundingBox(dmda_flow%da,lmin,lmax,ierr) CHKERRQ(ierr) call DMGetBoundingBox(dmda_flow%da,gmin,gmax,ierr) CHKERRQ(ierr) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: ../min3p-hpc-mpi-petsc-3.18.3 on a linux-gnu-dbg named starblazer by dsu Fri Mar 3 16:26:03 2023 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-scalapack --download-parmetis --download-metis --download-mumps --download-ptscotch --download-chaco --download-fblaslapack --download-hypre --download-superlu_dist --download-hdf5=yes --download-ctetgen --download-zlib --download-pnetcdf --download-cmake --with-hdf5-fortran-bindings --with-debugging=1 [0]PETSC ERROR: #1 VecGetArrayRead() at /home/dsu/Soft/petsc/petsc-3.18.3/src/vec/vec/interface/rvector.c:1928 [0]PETSC ERROR: #2 DMGetLocalBoundingBox() at /home/dsu/Soft/petsc/petsc-3.18.3/src/dm/interface/dmcoordinates.c:897 [0]PETSC ERROR: #3 /home/dsu/Work/min3p-dbs-backup/src/project/makefile_p/../../solver/solver_ddmethod.F90:2140 Any suggestion on this? Thanks, Danyang -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 13:14:17 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:14:17 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> Message-ID: <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> This sounds like a fine use of MATNEST. Now back to the original question >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). 
Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > >> >> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >> >> Barry >> >> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
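A short way to see the ordering Barry describes is to ask the nest itself where each block's rows ended up; a sketch in C, with C being the nest from the earlier example:

IS             rows[2];
PetscInt       n0;
const PetscInt *idx;

MatNestGetISs(C, rows, NULL);       /* global row indices of block row 0 and block row 1 */
ISGetLocalSize(rows[0], &n0);
ISGetIndices(rows[0], &idx);        /* on this rank, the global rows holding block A */
/* ... inspect or print idx[0..n0-1] ... */
ISRestoreIndices(rows[0], &idx);

On one rank the rows of A simply come before the rows of B; on several ranks each rank's share of A is followed by its share of B, which is exactly the reordering observed in this thread.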
URL: From bsmith at petsc.dev Fri Mar 17 13:22:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:22:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> Message-ID: <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. > > My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry > > > Thank you, > Joshua > From: Barry Smith > > Sent: Tuesday, March 14, 2023 1:35 PM > To: Christopher, Joshua > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. > > > >> On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: >> >> Hello PETSc users, >> >> I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. >> >> Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? >> >> On the FieldSplit preconditioner, is my understanding here correct: >> >> To use FieldSplit, I must have a DM. 
Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. >> >> Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? >> >> Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? >> >> Thank you so much for your help! >> Regards, >> Joshua >> From: Pierre Jolivet > >> Sent: Friday, March 3, 2023 11:45 AM >> To: Christopher, Joshua > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >> >> For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: >> 1) with renumbering via ParMETIS >> -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 >> -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 >> 2) without renumbering via ParMETIS >> -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 >> -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 >> Using on outer fieldsplit may help fix this. >> >> Thanks, >> Pierre >> >>> On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: >>> >>> I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. >>> >>> I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. 
>>> >>> Thank you, >>> Joshua >>> From: Barry Smith > >>> Sent: Thursday, March 2, 2023 3:47 PM >>> To: Christopher, Joshua > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>> >>> >>> >>> >>> >>> >>> Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? >>> >>> Is epsilon bounded away from 0? >>> >>>> On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: >>>> >>>> Hi Barry and Mark, >>>> >>>> Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf >>>> >>>> I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! >>>> >>>> I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. >>>> >>>> Thank you again, >>>> Joshua >>>> >>>> From: Barry Smith > >>>> Sent: Thursday, March 2, 2023 7:47 AM >>>> To: Christopher, Joshua > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG >>>> >>>> >>>> Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. >>>> >>>> I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. >>>> >>>> Barry >>>> >>>> >>>>> On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: >>>>> >>>>> Hello, >>>>> >>>>> I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: >>>>> >>>>> -ksp_type gmres >>>>> -pc_type hypre >>>>> -pc_hypre_type boomeramg >>>>> >>>>> and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. 
only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: >>>>> >>>>> -ksp_view_pre >>>>> -ksp_view >>>>> -ksp_converged_reason >>>>> -ksp_monitor_true_residual >>>>> -ksp_test_null_space >>>>> >>>>> My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). >>>>> >>>>> I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. >>>>> >>>>> Do you have any advice on speeding up the convergence of this system? >>>>> >>>>> Thank you, >>>>> Joshua >>>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:23:04 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:23:04 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. 
Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
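For the multiplication check mentioned above, comparing per-field quantities through the nest's own vectors sidesteps the ordering question; a sketch in C, assuming MatCreateVecs on a MATNEST returns nest vectors (which it does in recent PETSc) and that C is the nest:

Vec       x, y, sub;
PetscReal nrm;

MatCreateVecs(C, &x, &y);
VecNestGetSubVec(x, 0, &sub);
VecSet(sub, 1.0);                            /* field 0 of the input */
VecNestGetSubVec(x, 1, &sub);
VecSet(sub, 2.0);                            /* field 1 of the input */
MatMult(C, x, y);
VecNestGetSubVec(y, 0, &sub);
VecNorm(sub, NORM_2, &nrm);                  /* should agree across rank counts up to roundoff */
PetscPrintf(PETSC_COMM_WORLD, "field 0 norm %g\n", (double)nrm);

Checked this way, the result of the product does not depend on how many ranks the test is run on, so any remaining discrepancy points at the data being fed in rather than at the nest's ordering.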
URL: From bsmith at petsc.dev Fri Mar 17 13:27:22 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:27:22 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. Barry > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > >> >> This sounds like a fine use of MATNEST. Now back to the original question >> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. >> >> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). 
Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
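A short C sketch of the MatView check suggested above (C and b are assumed names for the user's nest matrix and right-hand side). Running it with mpiexec -n 1, 2 and 3 shows whether only the ordering differs (expected) or entries actually land in the wrong block:

  Mat Caij;

  PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- MATNEST ----\n"));
  PetscCall(MatView(C, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- converted MATAIJ ----\n"));
  PetscCall(MatView(Caij, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "---- right-hand side ----\n"));
  PetscCall(VecView(b, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&Caij));

Comparing the three outputs side by side also shows whether the right-hand side follows the same global numbering as the converted matrix, which is the mismatch suspected in this thread.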
URL: From knepley at gmail.com Fri Mar 17 13:29:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 14:29:13 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: On Fri, Mar 17, 2023 at 2:23?PM Berger Clement wrote: > My issue is that it seems to improperly with some step of my process, the > solve step doesn't provide the same result depending on the number of > processors I use. I manually tried to multiply one the matrices I defined > as a nest against a vector, and the result is not the same with e.g. 1 and > 3 processors. That's why I tried the toy program I wrote in the first > place, which highlights the misplacement of elements. > Ah, now I think I understand. The PETSC_DECIDE arguments for sizes change with a different number of processes. You can put in har numbers if you want. Thanks, Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > > > This sounds like a fine use of MATNEST. Now back to the original > question > > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > If I understand correctly it is behaving as expected. It is the same > matrix on 1 and 2 MPI processes, the only difference is the ordering of the > rows and columns. > > Both matrix blocks are split among the two MPI processes. This is how > MATNEST works and likely what you want in practice. > > On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block > sizes differ from one another, because they correspond to a different > physical variable. One of the block has the particularity that it has to be > updated at each iteration. This update is performed by replacing it with a > product of multiple matrices that depend on the result of the previous > iteration. Note that these intermediate matrices are not square (because > they also correspond to other types of variables), and that they must be > completely refilled by hand (i.e. they are not the result of some simple > linear operations). Finally, I use this final block matrix to solve > multiple linear systems (with different righthand sides), so for now I use > MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, > filled the different matrices, and created different nests of vectors / > matrices for my operations. 
When the time comes to use KSPSolve, I use > MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy > the few vector data I need from my nests in a regular Vector, I solve, I > get back my data in my nest and carry on with the operations needed for my > updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > > > Perhaps if you provide a brief summary of what you would like to do and > we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI > processes within the original communicator. That is if the original > communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST > that only lives on ranks 1,2 but you could have it have 0 rows on rank zero > so effectively it lives only on rank 1 and 2 (though its communicator is > all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: > > It would be possible in the case I showed you but in mine that would > actually be quite complicated, isn't there any other workaround ? I precise > that I am not entitled to utilizing the MATNEST format, it's just that I > think the other ones wouldn't work. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > > > You may be able to mimic what you want by not using PETSC_DECIDE but > instead computing up front how many rows of each matrix you want stored on > each MPI process. You can use 0 for on certain MPI processes for certain > matrices if you don't want any rows of that particular matrix stored on > that particular MPI process. > > Barry > > > On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > -- > Cl?ment BERGER > ENS de Lyon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
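A small C sketch that makes the "misplacement" visible (C is an assumed name for the assembled MATNEST): printing each rank's ownership range and the index sets returned by MatNestGetISs shows exactly which global rows each block occupies for a given number of processes, and how that changes when PETSC_DECIDE picks a different split:

  IS          rows[2];
  PetscInt    rstart, rend;
  PetscMPIInt rank;

  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(MatGetOwnershipRange(C, &rstart, &rend));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d owns global rows %" PetscInt_FMT " to %" PetscInt_FMT "\n", rank, rstart, rend));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

  PetscCall(MatNestGetISs(C, rows, NULL));  /* index sets are borrowed, do not destroy */
  PetscCall(ISView(rows[0], PETSC_VIEWER_STDOUT_WORLD));  /* global rows of the (0,0) block */
  PetscCall(ISView(rows[1], PETSC_VIEWER_STDOUT_WORLD));  /* global rows of the (1,1) block */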
URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:35:33 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:35:33 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> Message-ID: <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. 
This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... 
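The ordering question just raised is the crux: the global numbering of a MATNEST (and of the MATAIJ that MatConvert produces from it) interleaves the blocks rank by rank, so copying the sub-vectors one after the other into a standard vector only matches on a single process. A hedged C sketch of one way to build a standard right-hand side in the matrix's own ordering, going through the nest's index sets (the names C for the MATNEST, Caij for the converted matrix and b_nest for the VECNEST right-hand side are assumptions; a 2x2 nest as in this thread is assumed):

  Vec      b_std, *subs, work;
  IS       rows[2];
  PetscInt nsub;

  PetscCall(MatCreateVecs(Caij, NULL, &b_std));   /* standard Vec with the AIJ layout */
  PetscCall(MatNestGetISs(C, rows, NULL));        /* global rows of C taken by each block */
  PetscCall(VecNestGetSubVecs(b_nest, &nsub, &subs));
  for (PetscInt i = 0; i < nsub; i++) {
    PetscCall(VecGetSubVector(b_std, rows[i], &work));  /* view of b_std on block i's rows */
    PetscCall(VecCopy(subs[i], work));
    PetscCall(VecRestoreSubVector(b_std, rows[i], &work));
  }

After KSPSolve the same loop run in the other direction (VecCopy from the sub-view of the solution into subs[i]) puts the result back into the nest without any hand reordering.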
URL: From bsmith at petsc.dev Fri Mar 17 13:39:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 14:39:52 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> Message-ID: <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:27, Barry Smith a ?crit : > >> >> I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. >> >> Barry >> >> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: >>> >>> My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original question >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. 
>>> >>> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Cl?ment BERGER >>> ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Fri Mar 17 13:52:49 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 19:52:49 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> Message-ID: <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. 
> > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. 
> > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Mar 17 13:53:59 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Mar 2023 14:53:59 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > But this is to properly fill up the VecNest am I right ? Because this one > is correct, but I can't directly use it in the KSPSolve, I need to copy it > into a standard vector > I do not understand what you mean here. You can definitely use a VecNest in a KSP. Thanks, Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:39, Barry Smith a ?crit : > > > I think the intention is that you use VecNestGetSubVecs() > or VecNestGetSubVec() and fill up the sub-vectors in the same style as the > matrices; this decreases the change of a reordering mistake in trying to do > it by hand in your code. > > > > On Mar 17, 2023, at 2:35 PM, Berger Clement > wrote: > > That might be it, I didn't find the equivalent of MatConvert for the > vectors, so when I need to solve my linear system, with my righthandside > properly computed in nest format, I create a new vector using VecDuplicate, > and then I copy into it my data using VecGetArrayF90 and copiing each > element by hand. Does it create an incorrect ordering ? If so how can I get > the correct one ? > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:27, Barry Smith a ?crit : > > > I would run your code with small sizes on 1, 2, 3 MPI ranks and use > MatView() to examine the matrices. They will definitely be ordered > differently but should otherwise be the same. 
My guess is that the right > hand side may not have the correct ordering with respect to the matrix > ordering in parallel. Note also that when the right hand side does have the > correct ordering the solution will have a different ordering for each > different number of MPI ranks when printed (but changing the ordering > should give the same results up to machine precision. > > Barry > > > On Mar 17, 2023, at 2:23 PM, Berger Clement > wrote: > > My issue is that it seems to improperly with some step of my process, the > solve step doesn't provide the same result depending on the number of > processors I use. I manually tried to multiply one the matrices I defined > as a nest against a vector, and the result is not the same with e.g. 1 and > 3 processors. That's why I tried the toy program I wrote in the first > place, which highlights the misplacement of elements. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:14, Barry Smith a ?crit : > > > This sounds like a fine use of MATNEST. Now back to the original > question > > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > If I understand correctly it is behaving as expected. It is the same > matrix on 1 and 2 MPI processes, the only difference is the ordering of the > rows and columns. > > Both matrix blocks are split among the two MPI processes. This is how > MATNEST works and likely what you want in practice. > > On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block > sizes differ from one another, because they correspond to a different > physical variable. One of the block has the particularity that it has to be > updated at each iteration. This update is performed by replacing it with a > product of multiple matrices that depend on the result of the previous > iteration. Note that these intermediate matrices are not square (because > they also correspond to other types of variables), and that they must be > completely refilled by hand (i.e. they are not the result of some simple > linear operations). Finally, I use this final block matrix to solve > multiple linear systems (with different righthand sides), so for now I use > MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, > filled the different matrices, and created different nests of vectors / > matrices for my operations. When the time comes to use KSPSolve, I use > MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy > the few vector data I need from my nests in a regular Vector, I solve, I > get back my data in my nest and carry on with the operations needed for my > updates. > > Is that clear ? 
I don't know if I provided too many or not enough details. > > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 17:34, Barry Smith a ?crit : > > > Perhaps if you provide a brief summary of what you would like to do and > we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI > processes within the original communicator. That is if the original > communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST > that only lives on ranks 1,2 but you could have it have 0 rows on rank zero > so effectively it lives only on rank 1 and 2 (though its communicator is > all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: > > It would be possible in the case I showed you but in mine that would > actually be quite complicated, isn't there any other workaround ? I precise > that I am not entitled to utilizing the MATNEST format, it's just that I > think the other ones wouldn't work. > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 15:48, Barry Smith a ?crit : > > > You may be able to mimic what you want by not using PETSC_DECIDE but > instead computing up front how many rows of each matrix you want stored on > each MPI process. You can use 0 for on certain MPI processes for certain > matrices if you don't want any rows of that particular matrix stored on > that particular MPI process. > > Barry > > > On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes > and partially stored by multiple processors. If I am not mistaken, the > right way to do so is by using the MATNEST type. However, the following code > > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call > MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call > MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. > It seems that it starts by everything owned by the first proc for A and B, > then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > -- > Cl?ment BERGER > ENS de Lyon > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
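A short C sketch of the sub-vector filling described above (b_nest is an assumed name for a VECNEST created with VecCreateNest to match the MATNEST; some_rhs_value is a placeholder for the user's own formula). The vectors returned by VecNestGetSubVecs are the ones stored inside the nest, not copies, so filling them fills the nest:

  Vec     *subs;
  PetscInt nsub;

  PetscCall(VecNestGetSubVecs(b_nest, &nsub, &subs));
  for (PetscInt i = 0; i < nsub; i++) {
    PetscInt lo, hi;
    PetscCall(VecGetOwnershipRange(subs[i], &lo, &hi));
    for (PetscInt row = lo; row < hi; row++) {
      PetscCall(VecSetValue(subs[i], row, some_rhs_value(i, row), INSERT_VALUES));
    }
    PetscCall(VecAssemblyBegin(subs[i]));
    PetscCall(VecAssemblyEnd(subs[i]));
  }
  /* b_nest is now filled block by block, with no manual reordering */

With the MATNEST itself as the KSP operator (for instance an iterative method, possibly with PCFIELDSPLIT) such a VECNEST can be passed straight to KSPSolve, as stated above; with the MatConvert/MUMPS route a standard vector is still needed, as in the earlier sketch.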
URL: From clement.berger at ens-lyon.fr Fri Mar 17 14:22:20 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Fri, 17 Mar 2023 20:22:20 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way --- Cl?ment BERGER ENS de Lyon Le 2023-03-17 19:53, Matthew Knepley a ?crit : > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > >> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. 
You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregory.meyer at berkeley.edu Fri Mar 17 15:13:26 2023 From: gregory.meyer at berkeley.edu (Greg Kahanamoku-Meyer) Date: Fri, 17 Mar 2023 13:13:26 -0700 Subject: [petsc-users] SLEPc: GPU accelerated shift-invert Message-ID: Hi, I'm trying to accelerate a shift-invert eigensolve with GPU, but the computation seems to be spending a lot of its time in the CPU. Looking at the output with "-log_view -log_view_gpu_time" I see that MatLUFactorNum is not using the GPU (GPU Mflops/s is 0), and is taking the majority of the computation time. Is LU factorization on the GPU supported? I am currently applying the command line options "-vec_type cuda -mat_type aijcusparse", please let me know if there are other options I can apply to accelerate the LU factorization as well. I tried digging through the documentation but couldn't find a clear answer. Thanks in advance! Kind regards, Greg KM -- *Gregory D. Kahanamoku-Meyer* PhD Candidate quantum computing | cryptography | high-performance computing Department of Physics University of California at Berkeley personal website -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 17 15:57:09 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Mar 2023 16:57:09 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. 
Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. Barry > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > >> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement > wrote: >>> But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector >>> >>> >> I do not understand what you mean here. You can definitely use a VecNest in a KSP. >> >> Thanks, >> >> Matt >> >> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>> >>> >>> I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. >>> >>> >>> >>> On Mar 17, 2023, at 2:35 PM, Berger Clement > wrote: >>> >>> That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>> >>> >>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement > wrote: >>> >>> My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original question >>> >>> I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. >>> >>> Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement > wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough details. >>> >>> Thank you >>> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement > wrote: >>> >>> It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. 
>>> >>> --- >>> Clément BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a écrit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement > wrote: >>> Dear all, >>> >>> I want to construct a matrix by blocks, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code >>> >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. It seems that it starts with everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that? >>> >>> Note that I am coding in Fortran if that has any consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> >>> -- >>> Clément BERGER >>> ENS de Lyon >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Mar 18 06:46:19 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 18 Mar 2023 12:46:19 +0100 Subject: [petsc-users] SLEPc: GPU accelerated shift-invert In-Reply-To: References: Message-ID: <1ACEEC7D-35E4-453E-9F96-3B9271C4F946@dsic.upv.es> When using an aijcusparse matrix, by default it will select the cusparse solver, i.e., as if you had added the option -st_pc_factor_mat_solver_type cusparse The problem is that CUSPARSE does not have functionality for computing the LU factorization on the GPU, as far as I know. So what PETSc does is factorize the matrix on the CPU (the largest cost) and then use the GPU for the triangular solves. In SLEPc computations, the number of triangular solves is usually small, so there is no gain in doing those on the GPU. Furthermore, these flops do not seem to be correctly logged to appear on the GPU side. Probably someone like Stefano or Junchao can provide more information about factorizations on the GPU. You could try doing inexact shift-and-invert, i.e., using an iterative linear solver such as bcgs+ilu. In the case of ILU, it is implemented on the GPU with CUSPARSE. However, inexact shift-and-invert is not viable in many applications, depending on the distribution of eigenvalues, due to non-convergence of the KSP. A final alternative is to avoid shift-and-invert completely and use STFILTER. Again, this will not work in all cases. Basically, it trades a factorization for a huge amount of matrix-vector products, which may be good for GPU computation. If you want, send me a matrix and I can do some tests.
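To make the inexact shift-and-invert alternative mentioned above concrete, one possible set of runtime options is sketched here; ./ex1 stands for whatever SLEPc executable is being run, the tolerance and number of eigenvalues are arbitrary, and, as noted above, the KSP may simply fail to converge depending on the spectrum:

    ./ex1 -eps_nev 4 -eps_target 0.0 -st_type sinvert \
          -st_ksp_type bcgs -st_ksp_rtol 1e-9 -st_pc_type ilu \
          -st_pc_factor_mat_solver_type cusparse \
          -mat_type aijcusparse -vec_type cuda

The st_ prefix reaches the KSP and PC inside the spectral transformation, which is where the ILU preconditioned iterative solve replaces the exact LU factorization.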
Jose > El 17 mar 2023, a las 21:13, Greg Kahanamoku-Meyer escribi?: > > Hi, > > I'm trying to accelerate a shift-invert eigensolve with GPU, but the computation seems to be spending a lot of its time in the CPU. Looking at the output with "-log_view -log_view_gpu_time" I see that MatLUFactorNum is not using the GPU (GPU Mflops/s is 0), and is taking the majority of the computation time. Is LU factorization on the GPU supported? I am currently applying the command line options "-vec_type cuda -mat_type aijcusparse", please let me know if there are other options I can apply to accelerate the LU factorization as well. I tried digging through the documentation but couldn't find a clear answer. > > Thanks in advance! > > Kind regards, > Greg KM > > -- > Gregory D. Kahanamoku-Meyer > PhD Candidate > quantum computing | cryptography | high-performance computing > Department of Physics > University of California at Berkeley > personal website From mail2amneet at gmail.com Sun Mar 19 12:43:37 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 10:43:37 -0700 Subject: [petsc-users] PETSc build asks for network connections Message-ID: Hi Folks, I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using the latest version (v3.18.5). During the configure and make check stage I get a request about accepting network connections. The configure and check proceeds without my input but the dialog box stays in place. Please see the screenshot. I'm wondering if it is benign or something to be concerned about? Do I need to accept any network certificate to not see this dialog box? Thanks, -- --Amneet [image: Screenshot 2023-03-19 at 10.38.57 AM.png][image: Screenshot 2023-03-19 at 10.33.01 AM.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 10.38.57 AM.png Type: image/png Size: 1018501 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 10.33.01 AM.png Type: image/png Size: 2269564 bytes Desc: not available URL: From balay at mcs.anl.gov Sun Mar 19 12:56:28 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 19 Mar 2023 12:56:28 -0500 (CDT) Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I think its due to some of the system calls from MPI. You can verify this with a '--with-mpi=0' build. I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. Satish On Sun, 19 Mar 2023, Amneet Bhalla wrote: > Hi Folks, > > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using > the latest version (v3.18.5). During the configure and make check stage I > get a request about accepting network connections. The configure and check > proceeds without my input but the dialog box stays in place. Please see the > screenshot. I'm wondering if it is benign or something to be concerned > about? Do I need to accept any network certificate to not see this dialog > box? > > Thanks, > > From mail2amneet at gmail.com Sun Mar 19 12:59:10 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 10:59:10 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I'm building PETSc without mpi (I built mpich v 4.1.1 locally). 
Here is the configure command line that I used: ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 On Sun, Mar 19, 2023 at 10:56 AM Satish Balay wrote: > I think its due to some of the system calls from MPI. > > You can verify this with a '--with-mpi=0' build. > > I wonder if there is a way to build mpich or openmpi - that doesn't > trigger Apple's firewall.. > > Satish > > On Sun, 19 Mar 2023, Amneet Bhalla wrote: > > > Hi Folks, > > > > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using > > the latest version (v3.18.5). During the configure and make check stage I > > get a request about accepting network connections. The configure and check > > proceeds without my input but the dialog box stays in place. Please see the > > screenshot. I'm wondering if it is benign or something to be concerned > > about? Do I need to accept any network certificate to not see this dialog > > box? > > > > Thanks, > > > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Mar 19 13:00:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 19 Mar 2023 14:00:58 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: On Sun, Mar 19, 2023 at 1:59 PM Amneet Bhalla wrote: > I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is > the configure command line that I used: > > ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg > --with-debugging=1 --download-hypre=1 --with-x=0 > > No, this uses MPI, it just does not build it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. Thanks, Matt > On Sun, Mar 19, 2023 at 10:56 AM Satish Balay wrote: >> I think its due to some of the system calls from MPI. >> >> You can verify this with a '--with-mpi=0' build. >> >> I wonder if there is a way to build mpich or openmpi - that doesn't >> trigger Apple's firewall.. >> >> Satish >> >> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >> >> > Hi Folks, >> > >> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >> using >> > the latest version (v3.18.5). During the configure and make check stage >> I >> > get a request about accepting network connections. The configure and >> check >> > proceeds without my input but the dialog box stays in place. Please see >> the >> > screenshot. I'm wondering if it is benign or something to be concerned >> > about? Do I need to accept any network certificate to not see this >> dialog >> > box? >> > >> > Thanks, >> > >> > >> >> > > -- > --Amneet > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Sun Mar 19 16:25:53 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 14:25:53 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: Yes, this is MPI that is triggering the Apple firewall. If I allow it, it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list).
Any way to suppress it for all executables linked to mpi in the first place? [image: Screenshot 2023-03-19 at 2.19.53 PM.png] On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley wrote: > On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: > >> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is >> the configure command line that I used: >> >> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg >> --with-debugging=1 --download-hypre=1 --with-x=0 >> >> > No, this uses MPI, it just does not built it. Configuring with > --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is > bugging the firewall. > > Thanks, > > Matt > > >> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay wrote: >> >>> I think its due to some of the system calls from MPI. >>> >>> You can verify this with a '--with-mpi=0' build. >>> >>> I wonder if there is a way to build mpich or openmpi - that doesn't >>> trigger Apple's firewall.. >>> >>> Satish >>> >>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>> >>> > Hi Folks, >>> > >>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >>> using >>> > the latest version (v3.18.5). During the configure and make check >>> stage I >>> > get a request about accepting network connections. The configure and >>> check >>> > proceeds without my input but the dialog box stays in place. Please >>> see the >>> > screenshot. I'm wondering if it is benign or something to be concerned >>> > about? Do I need to accept any network certificate to not see this >>> dialog >>> > box? >>> > >>> > Thanks, >>> > >>> > >>> >>> >> >> -- >> --Amneet >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-03-19 at 2.19.53 PM.png Type: image/png Size: 300582 bytes Desc: not available URL: From bsmith at petsc.dev Sun Mar 19 17:51:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 19 Mar 2023 18:51:31 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: ./configure option with-macos-firewall-rules > On Mar 19, 2023, at 5:25 PM, Amneet Bhalla wrote: > > Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? > > > > On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>> >>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>> >> >> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >> >> Thanks, >> >> Matt >> >>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>> I think its due to some of the system calls from MPI. 
>>>> >>>> You can verify this with a '--with-mpi=0' build. >>>> >>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>> >>>> Satish >>>> >>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>> >>>> > Hi Folks, >>>> > >>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>> > the latest version (v3.18.5). During the configure and make check stage I >>>> > get a request about accepting network connections. The configure and check >>>> > proceeds without my input but the dialog box stays in place. Please see the >>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>> > about? Do I need to accept any network certificate to not see this dialog >>>> > box? >>>> > >>>> > Thanks, >>>> > >>>> > >>>> >>> >>> >>> -- >>> --Amneet >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > -- > --Amneet > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Sun Mar 19 19:10:19 2023 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Sun, 19 Mar 2023 17:10:19 -0700 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? On Sun, Mar 19, 2023 at 3:51?PM Barry Smith wrote: > > ./configure option with-macos-firewall-rules > > > On Mar 19, 2023, at 5:25 PM, Amneet Bhalla wrote: > > Yes, this is MPI that is triggering the apple firewall. If I allow it it > gets added to the allowed list (see the screenshot) and it does not trigger > the firewall again. However, this needs to be done for all executables > (there will be several main2d's in the list). Any way to suppress it > for all executables linked to mpi in the first place? > > > > On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: > >> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla >> wrote: >> >>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is >>> the configure command line that I used: >>> >>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg >>> --with-debugging=1 --download-hypre=1 --with-x=0 >>> >>> >> No, this uses MPI, it just does not built it. Configuring with >> --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is >> bugging the firewall. >> >> Thanks, >> >> Matt >> >> >>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay wrote: >>> >>>> I think its due to some of the system calls from MPI. >>>> >>>> You can verify this with a '--with-mpi=0' build. >>>> >>>> I wonder if there is a way to build mpich or openmpi - that doesn't >>>> trigger Apple's firewall.. >>>> >>>> Satish >>>> >>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>> >>>> > Hi Folks, >>>> > >>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm >>>> using >>>> > the latest version (v3.18.5). During the configure and make check >>>> stage I >>>> > get a request about accepting network connections. The configure and >>>> check >>>> > proceeds without my input but the dialog box stays in place. Please >>>> see the >>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>> > about? 
Do I need to accept any network certificate to not see this >>>> dialog >>>> > box? >>>> > >>>> > Thanks, >>>> > >>>> > >>>> >>>> >>> >>> -- >>> --Amneet >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > --Amneet > > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Mar 19 20:45:03 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 19 Mar 2023 21:45:03 -0400 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: I found a bit more information in gmakefile.test which has the magic sauce used by make test to stop the firewall popups while running the test suite. # MACOS FIREWALL HANDLING # - if run with MACOS_FIREWALL=1 # (automatically set in $PETSC_ARCH/lib/petsc/conf/petscvariables if configured --with-macos-firewall-rules), # ensure mpiexec and test executable is on firewall list # ifeq ($(MACOS_FIREWALL),1) FW := /usr/libexec/ApplicationFirewall/socketfilterfw # There is no reliable realpath command in macOS without need for 3rd party tools like homebrew coreutils # Using Python's realpath seems like the most robust way here realpath-py = $(shell $(PYTHON) -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' $(1)) # define macos-firewall-register @APP=$(call realpath-py, $(1)); \ if ! sudo -n true 2>/dev/null; then printf "Asking for sudo password to add new firewall rule for\n $$APP\n"; fi; \ sudo $(FW) --remove $$APP --add $$APP --blockapp $$APP endef endif and below. When building each executable it automatically calls socketfilterfw on that executable so it won't popup. From this I think you can reverse engineer how to turn it off for your executables. Perhaps PETSc's make ex1 etc should also apply this magic sauce, Pierre? > On Mar 19, 2023, at 8:10 PM, Amneet Bhalla wrote: > > This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? > > On Sun, Mar 19, 2023 at 3:51?PM Barry Smith > wrote: >> >> ./configure option with-macos-firewall-rules >> >> >>> On Mar 19, 2023, at 5:25 PM, Amneet Bhalla > wrote: >>> >>> Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? >>> >>> >>> >>> On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >>>> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>>>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>>>> >>>>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>>>> >>>> >>>> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>>>> I think its due to some of the system calls from MPI. 
>>>>>> >>>>>> You can verify this with a '--with-mpi=0' build. >>>>>> >>>>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>>>> >>>>>> Satish >>>>>> >>>>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>>>> >>>>>> > Hi Folks, >>>>>> > >>>>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>>>> > the latest version (v3.18.5). During the configure and make check stage I >>>>>> > get a request about accepting network connections. The configure and check >>>>>> > proceeds without my input but the dialog box stays in place. Please see the >>>>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>>>> > about? Do I need to accept any network certificate to not see this dialog >>>>>> > box? >>>>>> > >>>>>> > Thanks, >>>>>> > >>>>>> > >>>>>> >>>>> >>>>> >>>>> -- >>>>> --Amneet >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> --Amneet >>> >>> >>> >> > > > -- > --Amneet > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Mar 20 01:39:11 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 20 Mar 2023 07:39:11 +0100 Subject: [petsc-users] PETSc build asks for network connections In-Reply-To: References: Message-ID: > On 20 Mar 2023, at 2:45 AM, Barry Smith wrote: > > > I found a bit more information in gmakefile.test which has the magic sauce used by make test to stop the firewall popups while running the test suite. > > # MACOS FIREWALL HANDLING > # - if run with MACOS_FIREWALL=1 > # (automatically set in $PETSC_ARCH/lib/petsc/conf/petscvariables if configured --with-macos-firewall-rules), > # ensure mpiexec and test executable is on firewall list > # > ifeq ($(MACOS_FIREWALL),1) > FW := /usr/libexec/ApplicationFirewall/socketfilterfw > # There is no reliable realpath command in macOS without need for 3rd party tools like homebrew coreutils > # Using Python's realpath seems like the most robust way here > realpath-py = $(shell $(PYTHON) -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' $(1)) > # > define macos-firewall-register > @APP=$(call realpath-py, $(1)); \ > if ! sudo -n true 2>/dev/null; then printf "Asking for sudo password to add new firewall rule for\n $$APP\n"; fi; \ > sudo $(FW) --remove $$APP --add $$APP --blockapp $$APP > endef > endif > > and below. When building each executable it automatically calls socketfilterfw on that executable so it won't popup. > > From this I think you can reverse engineer how to turn it off for your executables. > > Perhaps PETSc's make ex1 etc should also apply this magic sauce, Pierre? This configure option was added in https://gitlab.com/petsc/petsc/-/merge_requests/3131 but it never worked on my machines. I just tried again this morning a make check with MACOS_FIREWALL=1, it?s asking for my password to register MPICH in the firewall, but the popups are still appearing afterwards. That?s why I?ve never used that configure option and why I?m not sure if I can trust this code from makefile.test, but I?m probably being paranoid. 
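For a single user executable (say one of the main2d binaries mentioned earlier in this thread), the registration performed by the gmakefile.test rule quoted above can in principle be reproduced by hand with the same socketfilterfw calls; a sketch, where the path is a placeholder and where, as described in this thread, recent macOS versions may keep showing the popup anyway:

    FW=/usr/libexec/ApplicationFirewall/socketfilterfw
    APP=$(python3 -c 'import os,sys; print(os.path.realpath(sys.argv[1]))' /path/to/main2d)
    sudo $FW --remove "$APP"
    sudo $FW --add "$APP"
    sudo $FW --blockapp "$APP"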
Prior to Ventura, when I was running the test suite, I manually disabled the firewall https://support.apple.com/en-gb/guide/mac-help/mh11783/12.0/mac/12.0 Apple has done yet again Apple things, and even if you disable the firewall on Ventura (https://support.apple.com/en-gb/guide/mac-help/mh11783/13.0/mac/13.0), the popups are still appearing. Right now, I don?t have a solution, except for not using my machine while the test suite runs? I don?t recall whether this has been mentioned by any of the other devs, but this is a completely harmless (though frustrating) message: MPI and/or PETSc cannot be used without an action from the user to allow others to get access to your machine. Thanks, Pierre >> On Mar 19, 2023, at 8:10 PM, Amneet Bhalla wrote: >> >> This helped only during the configure stage, and not during the check stage and during executing the application built on PETSc. Do you think it is because I built mpich locally and not with PETSc? >> >> On Sun, Mar 19, 2023 at 3:51?PM Barry Smith > wrote: >>> >>> ./configure option with-macos-firewall-rules >>> >>> >>>> On Mar 19, 2023, at 5:25 PM, Amneet Bhalla > wrote: >>>> >>>> Yes, this is MPI that is triggering the apple firewall. If I allow it it gets added to the allowed list (see the screenshot) and it does not trigger the firewall again. However, this needs to be done for all executables (there will be several main2d's in the list). Any way to suppress it for all executables linked to mpi in the first place? >>>> >>>> >>>> >>>> On Sun, Mar 19, 2023 at 11:01?AM Matthew Knepley > wrote: >>>>> On Sun, Mar 19, 2023 at 1:59?PM Amneet Bhalla > wrote: >>>>>> I'm building PETSc without mpi (I built mpich v 4.1.1 locally). Here is the configure command line that I used: >>>>>> >>>>>> ./configure --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --download-hypre=1 --with-x=0 >>>>>> >>>>> >>>>> No, this uses MPI, it just does not built it. Configuring with --with-mpi=0 will shut off any use of MPI, which is what Satish thinks is bugging the firewall. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> On Sun, Mar 19, 2023 at 10:56?AM Satish Balay > wrote: >>>>>>> I think its due to some of the system calls from MPI. >>>>>>> >>>>>>> You can verify this with a '--with-mpi=0' build. >>>>>>> >>>>>>> I wonder if there is a way to build mpich or openmpi - that doesn't trigger Apple's firewall.. >>>>>>> >>>>>>> Satish >>>>>>> >>>>>>> On Sun, 19 Mar 2023, Amneet Bhalla wrote: >>>>>>> >>>>>>> > Hi Folks, >>>>>>> > >>>>>>> > I'm trying to build PETSc on MacOS Ventura (Apple M2) with hypre. I'm using >>>>>>> > the latest version (v3.18.5). During the configure and make check stage I >>>>>>> > get a request about accepting network connections. The configure and check >>>>>>> > proceeds without my input but the dialog box stays in place. Please see the >>>>>>> > screenshot. I'm wondering if it is benign or something to be concerned >>>>>>> > about? Do I need to accept any network certificate to not see this dialog >>>>>>> > box? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > >>>>>>> > >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> --Amneet >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> --Amneet >>>> >>>> >>>> >>> >> >> >> -- >> --Amneet >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Mar 20 01:40:51 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 20 Mar 2023 14:40:51 +0800 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. Message-ID: Hi, Hope this email finds you well. I am using Firedrake to solve linear problems, which uses SNES with KSPONLY. I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` makes it return before the solution is updated. Is this the expected behavior? Can I just increase the value of `maxLinearSolveFailures` so that the solution is updated, without introducing other side effects? Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 05:18:30 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 11:18:30 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> Message-ID: <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> I simplified the problem with the initial test I talked about because I thought I had identified the issue, so I will walk you through my whole problem:
- first, the solve doesn't produce the same results, as mentioned
- I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1); I didn't think much of it, but I now realize that with 4 processes, for instance, MUMPS crashes when factorizing
- I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 procs because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, and looking at 100x100 matrices is not really nice... Instead I tried to multiply them by a vector full of ones (and afterwards by the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used
- that's why I tried the toy problem I wrote to you in the first place
I hope it's clearer now. Thank you --- Clément BERGER ENS de Lyon Le 2023-03-17 21:57, Barry Smith a écrit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices.
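As a small illustration of the sub-vector route discussed in the quoted messages below (filling the right-hand side through the VECNEST sub-vectors rather than copying into a hand-ordered array), here is a Fortran sketch; bnest is a placeholder for the nested right-hand side, the loop assumes two blocks, the written values are arbitrary, and the usual PETSc Fortran modules are assumed.

    Vec                  :: bnest, bsub
    PetscScalar, pointer :: xx(:)
    PetscInt             :: iblock, i
    PetscErrorCode       :: ierr

    do iblock = 0, 1
       ! Each sub-vector is an ordinary parallel Vec whose layout matches the
       ! corresponding matrix block, so no manual reordering is needed.
       Call VecNestGetSubVec(bnest,iblock,bsub,ierr)
       Call VecGetArrayF90(bsub,xx,ierr)
       do i = 1, size(xx)
          xx(i) = 1.0E0_wp
       end do
       Call VecRestoreArrayF90(bsub,xx,ierr)
    end do

Comparing MatMult results obtained this way on 1 and 2 ranks for a tiny case should then differ only by the row ordering, as Barry notes above.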
> > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. 
It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. 
However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 20 05:31:01 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 20 Mar 2023 06:31:01 -0400 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: I have no idea, keep on the list. Mark On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com wrote: > Thank you very much, I still have a question about the test code after > vectorization. I did not find the Examples of the sell storage format in > the petsc document. I would like to know which example you use to test the > efficiency of vectorization? > > Mark Adams ?2023?3?16??? 19:40??? > >> >> >> On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com >> wrote: >> >>> Ok, maybe I can try to vectorize this format and make it part of the >>> article. >>> >> >> That would be great, and it would be a good learning experience for you >> and a good way to get exposure. >> See https://petsc.org/release/developers/contributing/ for guidance. >> >> Good luck, >> Mark >> >> >>> >>> Mark Adams ?2023?3?15??? 19:57??? >>> >>>> I don't believe that we have an effort here. It could be a good >>>> opportunity to contribute. >>>> >>>> Mark >>>> >>>> On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com < >>>> ch1057458756 at gmail.com> wrote: >>>> >>>>> I checked the sell.c file and found that this algorithm supports AVX >>>>> vectorization. Will the vectorization support of ARM architecture be added >>>>> in the future? >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Mar 20 06:53:22 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 07:53:22 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > I simplified the problem with the initial test I talked about because I > thought I identified the issue, so I will walk you through my whole problem > : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was > also not consistent with the number of processors used (it is longer with 3 > processes than with 1), I didn't think much of it but I now realize that > for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to > compare them with 1 or 2 proc because I work with a quite specific > geometry, so in order not to fall into some weird particular case I need to > use at least roughly 100 points, so looking at 100x100 matrices is not > really nice...Instead I tried to multiply them by a vector full of one > (after I used the vector v such that v(i)=i). I tried it on two matrices, > and the results didn't depend on the number of procs, but when I tried to > multiply against the nest of these two matrices (a 2x2 block diagonal > nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE for sizes will do this. 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one with something different. It sounds like you have such a small example, but I have missed it. Can you attach this example? Then I can run it, look at the matrices, and see what is different. Thanks, Matt > Thank you > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 21:57, Barry Smith a ?crit : > > > Yes, you would benefit from a VecConvert() to produce a standard vector. > But you should be able to use VecGetArray() on the nest array and on the > standard array and copy the values between the arrays any way you like. You > don't need to do any reordering when you copy. Is that not working and what > are the symptoms (more than just the answers to the linear solve are > different)? Again you can run on one and two MPI processes with a tiny > problem to see if things are not in the correct order in the vectors and > matrices. 
> > Barry > > > On Mar 17, 2023, at 3:22 PM, Berger Clement > wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not > MATNEST), after that if I use a VECNEST for the left and right hanside, I > get an error during the solve procedure, it is removed if I copy my data in > a vector with standard format, I couldn't find any other way > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement > wrote: > >> But this is to properly fill up the VecNest am I right ? Because this one >> is correct, but I can't directly use it in the KSPSolve, I need to copy it >> into a standard vector >> >> > I do not understand what you mean here. You can definitely use a > VecNest in a KSP. > > Thanks, > > Matt > > > >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:39, Barry Smith a ?crit : >> >> >> I think the intention is that you use VecNestGetSubVecs() >> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >> matrices; this decreases the change of a reordering mistake in trying to do >> it by hand in your code. >> >> >> >> On Mar 17, 2023, at 2:35 PM, Berger Clement >> wrote: >> >> That might be it, I didn't find the equivalent of MatConvert for the >> vectors, so when I need to solve my linear system, with my righthandside >> properly computed in nest format, I create a new vector using VecDuplicate, >> and then I copy into it my data using VecGetArrayF90 and copiing each >> element by hand. Does it create an incorrect ordering ? If so how can I get >> the correct one ? >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:27, Barry Smith a ?crit : >> >> >> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >> MatView() to examine the matrices. They will definitely be ordered >> differently but should otherwise be the same. My guess is that the right >> hand side may not have the correct ordering with respect to the matrix >> ordering in parallel. Note also that when the right hand side does have the >> correct ordering the solution will have a different ordering for each >> different number of MPI ranks when printed (but changing the ordering >> should give the same results up to machine precision. >> >> Barry >> >> >> On Mar 17, 2023, at 2:23 PM, Berger Clement >> wrote: >> >> My issue is that it seems to improperly with some step of my process, the >> solve step doesn't provide the same result depending on the number of >> processors I use. I manually tried to multiply one the matrices I defined >> as a nest against a vector, and the result is not the same with e.g. 1 and >> 3 processors. That's why I tried the toy program I wrote in the first >> place, which highlights the misplacement of elements. >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:14, Barry Smith a ?crit : >> >> >> This sounds like a fine use of MATNEST. Now back to the original >> question >> >> >> I want to construct a matrix by blocs, each block having different sizes >> and partially stored by multiple processors. If I am not mistaken, the >> right way to do so is by using the MATNEST type. 
However, the following code >> >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call >> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. >> It seems that it starts by everything owned by the first proc for A and B, >> then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> If I understand correctly it is behaving as expected. It is the same >> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >> rows and columns. >> >> Both matrix blocks are split among the two MPI processes. This is how >> MATNEST works and likely what you want in practice. >> >> On Mar 17, 2023, at 1:19 PM, Berger Clement >> wrote: >> >> I have a matrix with four different blocks (2rows - 2columns). The block >> sizes differ from one another, because they correspond to a different >> physical variable. One of the block has the particularity that it has to be >> updated at each iteration. This update is performed by replacing it with a >> product of multiple matrices that depend on the result of the previous >> iteration. Note that these intermediate matrices are not square (because >> they also correspond to other types of variables), and that they must be >> completely refilled by hand (i.e. they are not the result of some simple >> linear operations). Finally, I use this final block matrix to solve >> multiple linear systems (with different righthand sides), so for now I use >> MUMPS as only the first solve takes time (but I might change it). >> >> Considering this setting, I created each type of variable separately, >> filled the different matrices, and created different nests of vectors / >> matrices for my operations. When the time comes to use KSPSolve, I use >> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >> the few vector data I need from my nests in a regular Vector, I solve, I >> get back my data in my nest and carry on with the operations needed for my >> updates. >> >> Is that clear ? I don't know if I provided too many or not enough details. >> >> Thank you >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 17:34, Barry Smith a ?crit : >> >> >> Perhaps if you provide a brief summary of what you would like to do >> and we may have ideas on how to achieve it. >> >> Barry >> >> Note: that MATNEST does require that all matrices live on all the MPI >> processes within the original communicator. That is if the original >> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >> so effectively it lives only on rank 1 and 2 (though its communicator is >> all three ranks). >> >> On Mar 17, 2023, at 12:14 PM, Berger Clement >> wrote: >> >> It would be possible in the case I showed you but in mine that would >> actually be quite complicated, isn't there any other workaround ? I precise >> that I am not entitled to utilizing the MATNEST format, it's just that I >> think the other ones wouldn't work. 
>> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 15:48, Barry Smith a ?crit : >> >> >> You may be able to mimic what you want by not using PETSC_DECIDE but >> instead computing up front how many rows of each matrix you want stored on >> each MPI process. You can use 0 for on certain MPI processes for certain >> matrices if you don't want any rows of that particular matrix stored on >> that particular MPI process. >> >> Barry >> >> >> On Mar 17, 2023, at 10:10 AM, Berger Clement >> wrote: >> >> Dear all, >> >> I want to construct a matrix by blocs, each block having different sizes >> and partially stored by multiple processors. If I am not mistaken, the >> right way to do so is by using the MATNEST type. However, the following code >> >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >> Call >> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >> Call >> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >> >> does not generate the same matrix depending on the number of processors. >> It seems that it starts by everything owned by the first proc for A and B, >> then goes on to the second proc and so on (I hope I am being clear). >> >> Is it possible to change that ? >> >> Note that I am coding in fortran if that has ay consequence. >> >> Thank you, >> >> Sincerely, >> -- >> Cl?ment BERGER >> ENS de Lyon >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 07:00:47 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 08:00:47 -0400 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. In-Reply-To: References: Message-ID: On Mon, Mar 20, 2023 at 2:41?AM Zongze Yang wrote: > Hi, > > Hope this email finds you well. I am using firedrake to solve linear > problems, which use SNES with KSPONLY. > > I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. > The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` make it > return before the solution is updated. > Yes, this is the intended behavior. We do not guarantee cleanup on errors. > Is this behavior as expected? Can I just increase the value of `maxLinearSolveFailures` > to make the solution updated without introducing other side effects? > Yes, that is right. It will not have other side effects with this SNES type. Thanks, Matt > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Mar 20 07:59:56 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 20 Mar 2023 20:59:56 +0800 Subject: [petsc-users] `snes+ksponly` did not update the solution when ksp failed. 
In-Reply-To: References: Message-ID: Thank you for your clarification. Best wishes, Zongze On Mon, 20 Mar 2023 at 20:00, Matthew Knepley wrote: > On Mon, Mar 20, 2023 at 2:41 AM Zongze Yang wrote: > >> Hi, >> >> Hope this email finds you well. I am using firedrake to solve linear >> problems, which use SNES with KSPONLY. >> >> I found that the solution did not update when the `ksp` failed with DIVERGED_ITS. >> The macro `SNESCheckKSPSolve` called in `SNESSolve_KSPONLY` make it >> return before the solution is updated. >> > > Yes, this is the intended behavior. We do not guarantee cleanup on errors. > > >> Is this behavior as expected? Can I just increase the value of `maxLinearSolveFailures` >> to make the solution updated without introducing other side effects? >> > > Yes, that is right. It will not have other side effects with this SNES > type. > > Thanks, > > Matt > > >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 08:35:38 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 14:35:38 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> Message-ID: <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> 1) Yes, in my program I use PETSC_DETERMINE, but I don't see what the issue is there. From what I understand, it just lets PETSc set the total size from the local sizes provided; am I mistaken? 2) I attached a small script; when I run it with 1 proc the output vector is not the same as when I run it with 2 procs, and I don't know what I should do to make them match. PS: To be clear, I am not trying to point out a bug here; I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression! Thank you, --- Clément BERGER ENS de Lyon Le 2023-03-20 12:53, Matthew Knepley a écrit : > On Mon, Mar 20, 2023 at 6:18 AM Berger Clement wrote: > >> I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : >> >> - first the solve doesn't produce the same results as mentioned >> >> - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing >> >> - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i).
I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used >> >> - that's why I tried the toy problem I wrote to you in the first place >> >> I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. 
My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. 
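
As an illustration of the MatConvert-plus-MUMPS step described just above, a minimal sketch assuming the standard PETSc C API (the nest matrix C and the vectors b and x are placeholder names for this illustration, not code from the thread):

    /* Convert the MATNEST to AIJ so MUMPS can factor it, then do a direct solve. */
    Mat Caij;
    KSP ksp;
    PC  pc;

    PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, Caij, Caij));
    PetscCall(KSPSetType(ksp, KSPPREONLY));                  /* factor once, then back-solve */
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCLU));
    PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));

Note that b and x must use the same global row numbering as Caij, which is the interleaved numbering of the original nest; that numbering is exactly what the rest of this thread turns on.
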
> > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. > > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mainTest.f90 Type: text/x-c Size: 1087 bytes Desc: not available URL: From knepley at gmail.com Mon Mar 20 08:58:59 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 09:58:59 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the > issue there. From what I understand, it just lets PETSc set the total size > from the local sizes provided, am I mistaken ? > > 2) I attached a small script, when I run it with 1 proc the output vector > is not the same as if I run it with 2 procs, I don't know what I should do > to make them match. > > PS : I precise that I am not trying to point out a bug here, I realize > that my usage is wrong somehow, I just can't determine why, sorry if I gave > you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view Mat Object: 1 MPI process type: nest Matrix object: type=nest, rows=2, cols=2 MatNest structure: (0,0) : type=constantdiagonal, rows=4, cols=4 (0,1) : NULL (1,0) : NULL (1,1) : type=constantdiagonal, rows=4, cols=4 Mat Object: 1 MPI process type: seqaij row 0: (0, 2.) row 1: (1, 2.) row 2: (2, 2.) row 3: (3, 2.) row 4: (4, 1.) row 5: (5, 1.) row 6: (6, 1.) row 7: (7, 1.) Vec Object: 1 MPI process type: seq 0. 1. 2. 3. 4. 5. 6. 7. Vec Object: 1 MPI process type: seq 0. 2. 4. 6. 4. 5. 6. 7. This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view Mat Object: 2 MPI processes type: nest Matrix object: type=nest, rows=2, cols=2 MatNest structure: (0,0) : type=constantdiagonal, rows=4, cols=4 (0,1) : NULL (1,0) : NULL (1,1) : type=constantdiagonal, rows=4, cols=4 Mat Object: 2 MPI processes type: mpiaij row 0: (0, 2.) row 1: (1, 2.) row 2: (2, 1.) row 3: (3, 1.) row 4: (4, 2.) row 5: (5, 2.) row 6: (6, 1.) row 7: (7, 1.) Vec Object: 2 MPI processes type: mpi Process [0] 0. 1. 2. 3. Process [1] 4. 5. 6. 7. Vec Object: 2 MPI processes type: mpi Process [0] 0. 2. 2. 3. Process [1] 8. 10. 6. 7. Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
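
As an illustration of that last point, one way to obtain an input vector with exactly that interleaved layout is to let the nest matrix create it. A minimal sketch, assuming the standard PETSc C API and the 2x2 diagonal nest C from the test above (MatCreateVecs() on a MATNEST should hand back nest vectors whose sub-vectors match the blocks):

    Vec      x, y, *sub;
    PetscInt nsub, boffset = 0;

    PetscCall(MatCreateVecs(C, &x, &y));             /* layouts compatible with C */
    PetscCall(VecNestGetSubVecs(x, &nsub, &sub));    /* borrowed sub-vectors      */
    for (PetscInt b = 0; b < nsub; ++b) {
      PetscInt rstart, rend, N;
      PetscCall(VecGetSize(sub[b], &N));
      PetscCall(VecGetOwnershipRange(sub[b], &rstart, &rend));
      for (PetscInt row = rstart; row < rend; ++row) {
        /* v(i) = i in the block-by-block (A then B) ordering, regardless of
           how the nest interleaves the blocks across processes */
        PetscCall(VecSetValue(sub[b], row, (PetscScalar)(row + boffset), INSERT_VALUES));
      }
      PetscCall(VecAssemblyBegin(sub[b]));
      PetscCall(VecAssemblyEnd(sub[b]));
      boffset += N;
    }
    PetscCall(MatMult(C, x, y));                     /* block-wise result is independent of np */

Filling through the sub-vectors sidesteps the interleaving entirely, which is the same advice Barry gives further down the thread.
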
Thanks, Matt > Thank you, > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement > wrote: > >> I simplified the problem with the initial test I talked about because I >> thought I identified the issue, so I will walk you through my whole problem >> : >> >> - first the solve doesn't produce the same results as mentioned >> >> - I noticed that the duration of the factorization step of the matrix was >> also not consistent with the number of processors used (it is longer with 3 >> processes than with 1), I didn't think much of it but I now realize that >> for instance with 4 processes, MUMPS crashes when factorizing >> >> - I thought my matrices were wrong, but it's hard for me to use MatView >> to compare them with 1 or 2 proc because I work with a quite specific >> geometry, so in order not to fall into some weird particular case I need to >> use at least roughly 100 points, so looking at 100x100 matrices is not >> really nice...Instead I tried to multiply them by a vector full of one >> (after I used the vector v such that v(i)=i). I tried it on two matrices, >> and the results didn't depend on the number of procs, but when I tried to >> multiply against the nest of these two matrices (a 2x2 block diagonal >> nest), the result changed depending on the number of processors used >> >> - that's why I tried the toy problem I wrote to you in the first place >> >> I hope it's clearer now. >> > > Unfortunately, it is not clear to me. There is nothing attached to this > email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number > of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the > number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have > something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small > example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and > see what is different. > > Thanks, > > Matt > > >> Thank you >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 21:57, Barry Smith a ?crit : >> >> >> Yes, you would benefit from a VecConvert() to produce a standard >> vector. But you should be able to use VecGetArray() on the nest array and >> on the standard array and copy the values between the arrays any way you >> like. You don't need to do any reordering when you copy. Is that not >> working and what are the symptoms (more than just the answers to the linear >> solve are different)? Again you can run on one and two MPI processes with a >> tiny problem to see if things are not in the correct order in the vectors >> and matrices. 
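
If copying through raw arrays feels error-prone, an alternative sketch (standard PETSc C API assumed) is to ask the nest matrix for the index sets of its blocks and copy block by block; here xnest is a nest vector laid out like the 2x2 nest C, and xaij is a contiguous vector created from the converted matrix, both placeholder names for this illustration:

    IS       rows[2];
    Vec     *sub;
    PetscInt nsub;

    PetscCall(MatNestGetISs(C, rows, NULL));             /* global rows of each block */
    PetscCall(VecNestGetSubVecs(xnest, &nsub, &sub));
    for (PetscInt b = 0; b < nsub; ++b) {
      Vec view;
      PetscCall(VecGetSubVector(xaij, rows[b], &view));  /* rows of block b in xaij   */
      PetscCall(VecCopy(sub[b], view));                  /* nest block -> contiguous  */
      PetscCall(VecRestoreSubVector(xaij, rows[b], &view));
    }

Using VecCopy(view, sub[b]) instead moves data the other way, e.g. to pull a solution back into the nest.
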
>> >> Barry >> >> >> On Mar 17, 2023, at 3:22 PM, Berger Clement >> wrote: >> >> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >> not MATNEST), after that if I use a VECNEST for the left and right hanside, >> I get an error during the solve procedure, it is removed if I copy my data >> in a vector with standard format, I couldn't find any other way >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >> >> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> But this is to properly fill up the VecNest am I right ? Because this >>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>> it into a standard vector >>> >>> >> I do not understand what you mean here. You can definitely use a >> VecNest in a KSP. >> >> Thanks, >> >> Matt >> >> >> >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>> >>> >>> I think the intention is that you use VecNestGetSubVecs() >>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>> matrices; this decreases the change of a reordering mistake in trying to do >>> it by hand in your code. >>> >>> >>> >>> On Mar 17, 2023, at 2:35 PM, Berger Clement >>> wrote: >>> >>> That might be it, I didn't find the equivalent of MatConvert for the >>> vectors, so when I need to solve my linear system, with my righthandside >>> properly computed in nest format, I create a new vector using VecDuplicate, >>> and then I copy into it my data using VecGetArrayF90 and copiing each >>> element by hand. Does it create an incorrect ordering ? If so how can I get >>> the correct one ? >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>> >>> >>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>> MatView() to examine the matrices. They will definitely be ordered >>> differently but should otherwise be the same. My guess is that the right >>> hand side may not have the correct ordering with respect to the matrix >>> ordering in parallel. Note also that when the right hand side does have the >>> correct ordering the solution will have a different ordering for each >>> different number of MPI ranks when printed (but changing the ordering >>> should give the same results up to machine precision. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 2:23 PM, Berger Clement >>> wrote: >>> >>> My issue is that it seems to improperly with some step of my process, >>> the solve step doesn't provide the same result depending on the number of >>> processors I use. I manually tried to multiply one the matrices I defined >>> as a nest against a vector, and the result is not the same with e.g. 1 and >>> 3 processors. That's why I tried the toy program I wrote in the first >>> place, which highlights the misplacement of elements. >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>> >>> >>> This sounds like a fine use of MATNEST. Now back to the original >>> question >>> >>> >>> I want to construct a matrix by blocs, each block having different sizes >>> and partially stored by multiple processors. If I am not mistaken, the >>> right way to do so is by using the MATNEST type. 
However, the following code >>> >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call >>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. >>> It seems that it starts by everything owned by the first proc for A and B, >>> then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> If I understand correctly it is behaving as expected. It is the same >>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>> rows and columns. >>> >>> Both matrix blocks are split among the two MPI processes. This is how >>> MATNEST works and likely what you want in practice. >>> >>> On Mar 17, 2023, at 1:19 PM, Berger Clement >>> wrote: >>> >>> I have a matrix with four different blocks (2rows - 2columns). The block >>> sizes differ from one another, because they correspond to a different >>> physical variable. One of the block has the particularity that it has to be >>> updated at each iteration. This update is performed by replacing it with a >>> product of multiple matrices that depend on the result of the previous >>> iteration. Note that these intermediate matrices are not square (because >>> they also correspond to other types of variables), and that they must be >>> completely refilled by hand (i.e. they are not the result of some simple >>> linear operations). Finally, I use this final block matrix to solve >>> multiple linear systems (with different righthand sides), so for now I use >>> MUMPS as only the first solve takes time (but I might change it). >>> >>> Considering this setting, I created each type of variable separately, >>> filled the different matrices, and created different nests of vectors / >>> matrices for my operations. When the time comes to use KSPSolve, I use >>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>> the few vector data I need from my nests in a regular Vector, I solve, I >>> get back my data in my nest and carry on with the operations needed for my >>> updates. >>> >>> Is that clear ? I don't know if I provided too many or not enough >>> details. >>> >>> Thank you >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>> >>> >>> Perhaps if you provide a brief summary of what you would like to do >>> and we may have ideas on how to achieve it. >>> >>> Barry >>> >>> Note: that MATNEST does require that all matrices live on all the MPI >>> processes within the original communicator. That is if the original >>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>> so effectively it lives only on rank 1 and 2 (though its communicator is >>> all three ranks). >>> >>> On Mar 17, 2023, at 12:14 PM, Berger Clement >>> wrote: >>> >>> It would be possible in the case I showed you but in mine that would >>> actually be quite complicated, isn't there any other workaround ? I precise >>> that I am not entitled to utilizing the MATNEST format, it's just that I >>> think the other ones wouldn't work. 
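
For reference, a minimal sketch of the workaround being discussed here (standard PETSc C API assumed, and it assumes exactly two MPI ranks): giving explicit local sizes instead of PETSC_DECIDE so that all of A lands on rank 0 and all of B on rank 1, in which case the nest's contiguous numbering puts A's rows first and B's rows last, just as in the sequential case:

    PetscMPIInt rank;
    Mat         A, B, C, blocks[4];

    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
    /* local size 4 on the owning rank, 0 elsewhere; the global size stays 4 */
    PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, rank == 0 ? 4 : 0,
                                        rank == 0 ? 4 : 0, 4, 4, 2.0, &A));
    PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, rank == 1 ? 4 : 0,
                                        rank == 1 ? 4 : 0, 4, 4, 1.0, &B));
    blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
    PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C));

The price is load balance: each block then lives entirely on one rank, which is why the interleaved default is usually the better choice in practice.
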
>>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>> >>> >>> You may be able to mimic what you want by not using PETSC_DECIDE but >>> instead computing up front how many rows of each matrix you want stored on >>> each MPI process. You can use 0 for on certain MPI processes for certain >>> matrices if you don't want any rows of that particular matrix stored on >>> that particular MPI process. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 10:10 AM, Berger Clement >>> wrote: >>> >>> Dear all, >>> >>> I want to construct a matrix by blocs, each block having different sizes >>> and partially stored by multiple processors. If I am not mistaken, the >>> right way to do so is by using the MATNEST type. However, the following code >>> >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>> Call >>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>> Call >>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>> >>> does not generate the same matrix depending on the number of processors. >>> It seems that it starts by everything owned by the first proc for A and B, >>> then goes on to the second proc and so on (I hope I am being clear). >>> >>> Is it possible to change that ? >>> >>> Note that I am coding in fortran if that has ay consequence. >>> >>> Thank you, >>> >>> Sincerely, >>> -- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: nestTest.c Type: application/octet-stream Size: 1352 bytes Desc: not available URL: From clement.berger at ens-lyon.fr Mon Mar 20 09:09:57 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 15:09:57 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> Message-ID: <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Ok so this means that if I define my vectors via VecNest (organized as the matrices), everything will be correctly ordered ? How does that behave with MatConvert ? In the sense that if I convert a MatNest to MatAIJ via MatConvert, will the vector as built in the example I showed you work properly ? Thank you ! 
--- Cl?ment BERGER ENS de Lyon Le 2023-03-20 14:58, Matthew Knepley a ?crit : > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > >> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the issue there. From what I understand, it just lets PETSc set the total size from the local sizes provided, am I mistaken ? >> >> 2) I attached a small script, when I run it with 1 proc the output vector is not the same as if I run it with 2 procs, I don't know what I should do to make them match. >> >> PS : I precise that I am not trying to point out a bug here, I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. > 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
> > Thanks, > > Matt > > Thank you, > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > > I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? 
Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). 
The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. 
> > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 09:51:11 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 10:51:11 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 10:09?AM Berger Clement wrote: > Ok so this means that if I define my vectors via VecNest (organized as the > matrices), everything will be correctly ordered ? > Yes. > How does that behave with MatConvert ? In the sense that if I convert a > MatNest to MatAIJ via MatConvert, will the vector as built in the example I > showed you work properly ? > > No. MatConvert just changes the storage format, not the ordering. Thanks, Matt > Thank you ! > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 14:58, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement > wrote: > >> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the >> issue there. From what I understand, it just lets PETSc set the total size >> from the local sizes provided, am I mistaken ? >> >> 2) I attached a small script, when I run it with 1 proc the output vector >> is not the same as if I run it with 2 procs, I don't know what I should do >> to make them match. >> >> PS : I precise that I am not trying to point out a bug here, I realize >> that my usage is wrong somehow, I just can't determine why, sorry if I gave >> you the wrong impression ! >> >> >> I think I can now explain this clearly. Thank you for the nice simple > example. I attach my slightly changed version (I think better in C). Here > is running on one process: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 > ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. 
> 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and > reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 > ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so > each has two rows on process 0 and two rows on process 1. In the MatNest > they are interleaved because we asked for contiguous numbering (by giving > NULL for the IS of global row numbers). If we want to reproduce the same > output, we would need to produce our input vector with the same interleaved > numbering. > > Thanks, > > Matt > >> Thank you, >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-20 12:53, Matthew Knepley a ?crit : >> >> On Mon, Mar 20, 2023 at 6:18?AM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> I simplified the problem with the initial test I talked about because I >>> thought I identified the issue, so I will walk you through my whole problem >>> : >>> >>> - first the solve doesn't produce the same results as mentioned >>> >>> - I noticed that the duration of the factorization step of the matrix >>> was also not consistent with the number of processors used (it is longer >>> with 3 processes than with 1), I didn't think much of it but I now realize >>> that for instance with 4 processes, MUMPS crashes when factorizing >>> >>> - I thought my matrices were wrong, but it's hard for me to use MatView >>> to compare them with 1 or 2 proc because I work with a quite specific >>> geometry, so in order not to fall into some weird particular case I need to >>> use at least roughly 100 points, so looking at 100x100 matrices is not >>> really nice...Instead I tried to multiply them by a vector full of one >>> (after I used the vector v such that v(i)=i). I tried it on two matrices, >>> and the results didn't depend on the number of procs, but when I tried to >>> multiply against the nest of these two matrices (a 2x2 block diagonal >>> nest), the result changed depending on the number of processors used >>> >>> - that's why I tried the toy problem I wrote to you in the first place >>> >>> I hope it's clearer now. >>> >> >> Unfortunately, it is not clear to me. There is nothing attached to this >> email. I will try to describe things from my end. >> >> 1) There are lots of tests. Internally, Nest does not depend on the >> number of processes unless you make it so. This leads >> me to believe that your construction of the matrix changes with the >> number of processes. For example, using PETSC_DETERMINE >> for sizes will do this. 
>> >> 2) In order to understand what you want to achieve, we need to have >> something running in two cases, one with "correct" output and one >> with something different. It sounds like you have such a small >> example, but I have missed it. >> >> Can you attach this example? Then I can run it, look at the matrices, and >> see what is different. >> >> Thanks, >> >> Matt >> >> >>> Thank you >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 21:57, Barry Smith a ?crit : >>> >>> >>> Yes, you would benefit from a VecConvert() to produce a standard >>> vector. But you should be able to use VecGetArray() on the nest array and >>> on the standard array and copy the values between the arrays any way you >>> like. You don't need to do any reordering when you copy. Is that not >>> working and what are the symptoms (more than just the answers to the linear >>> solve are different)? Again you can run on one and two MPI processes with a >>> tiny problem to see if things are not in the correct order in the vectors >>> and matrices. >>> >>> Barry >>> >>> >>> On Mar 17, 2023, at 3:22 PM, Berger Clement >>> wrote: >>> >>> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >>> not MATNEST), after that if I use a VECNEST for the left and right hanside, >>> I get an error during the solve procedure, it is removed if I copy my data >>> in a vector with standard format, I couldn't find any other way >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >>> >>> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >>> clement.berger at ens-lyon.fr> wrote: >>> >>>> But this is to properly fill up the VecNest am I right ? Because this >>>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>>> it into a standard vector >>>> >>>> >>> I do not understand what you mean here. You can definitely use a >>> VecNest in a KSP. >>> >>> Thanks, >>> >>> Matt >>> >>> >>> >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>>> >>>> >>>> I think the intention is that you use VecNestGetSubVecs() >>>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>>> matrices; this decreases the change of a reordering mistake in trying to do >>>> it by hand in your code. >>>> >>>> >>>> >>>> On Mar 17, 2023, at 2:35 PM, Berger Clement >>>> wrote: >>>> >>>> That might be it, I didn't find the equivalent of MatConvert for the >>>> vectors, so when I need to solve my linear system, with my righthandside >>>> properly computed in nest format, I create a new vector using VecDuplicate, >>>> and then I copy into it my data using VecGetArrayF90 and copiing each >>>> element by hand. Does it create an incorrect ordering ? If so how can I get >>>> the correct one ? >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>>> >>>> >>>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>>> MatView() to examine the matrices. They will definitely be ordered >>>> differently but should otherwise be the same. My guess is that the right >>>> hand side may not have the correct ordering with respect to the matrix >>>> ordering in parallel. Note also that when the right hand side does have the >>>> correct ordering the solution will have a different ordering for each >>>> different number of MPI ranks when printed (but changing the ordering >>>> should give the same results up to machine precision. 
>>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 2:23 PM, Berger Clement >>>> wrote: >>>> >>>> My issue is that it seems to improperly with some step of my process, >>>> the solve step doesn't provide the same result depending on the number of >>>> processors I use. I manually tried to multiply one the matrices I defined >>>> as a nest against a vector, and the result is not the same with e.g. 1 and >>>> 3 processors. That's why I tried the toy program I wrote in the first >>>> place, which highlights the misplacement of elements. >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>>> >>>> >>>> This sounds like a fine use of MATNEST. Now back to the original >>>> question >>>> >>>> >>>> I want to construct a matrix by blocs, each block having different >>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>> the right way to do so is by using the MATNEST type. However, the following >>>> code >>>> >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>> Call >>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>> >>>> does not generate the same matrix depending on the number of >>>> processors. It seems that it starts by everything owned by the first proc >>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>> clear). >>>> >>>> Is it possible to change that ? >>>> >>>> If I understand correctly it is behaving as expected. It is the same >>>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>>> rows and columns. >>>> >>>> Both matrix blocks are split among the two MPI processes. This is how >>>> MATNEST works and likely what you want in practice. >>>> >>>> On Mar 17, 2023, at 1:19 PM, Berger Clement >>>> wrote: >>>> >>>> I have a matrix with four different blocks (2rows - 2columns). The >>>> block sizes differ from one another, because they correspond to a different >>>> physical variable. One of the block has the particularity that it has to be >>>> updated at each iteration. This update is performed by replacing it with a >>>> product of multiple matrices that depend on the result of the previous >>>> iteration. Note that these intermediate matrices are not square (because >>>> they also correspond to other types of variables), and that they must be >>>> completely refilled by hand (i.e. they are not the result of some simple >>>> linear operations). Finally, I use this final block matrix to solve >>>> multiple linear systems (with different righthand sides), so for now I use >>>> MUMPS as only the first solve takes time (but I might change it). >>>> >>>> Considering this setting, I created each type of variable separately, >>>> filled the different matrices, and created different nests of vectors / >>>> matrices for my operations. When the time comes to use KSPSolve, I use >>>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>>> the few vector data I need from my nests in a regular Vector, I solve, I >>>> get back my data in my nest and carry on with the operations needed for my >>>> updates. >>>> >>>> Is that clear ? I don't know if I provided too many or not enough >>>> details. 
>>>> >>>> Thank you >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>>> >>>> >>>> Perhaps if you provide a brief summary of what you would like to do >>>> and we may have ideas on how to achieve it. >>>> >>>> Barry >>>> >>>> Note: that MATNEST does require that all matrices live on all the MPI >>>> processes within the original communicator. That is if the original >>>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>>> so effectively it lives only on rank 1 and 2 (though its communicator is >>>> all three ranks). >>>> >>>> On Mar 17, 2023, at 12:14 PM, Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>> It would be possible in the case I showed you but in mine that would >>>> actually be quite complicated, isn't there any other workaround ? I precise >>>> that I am not entitled to utilizing the MATNEST format, it's just that I >>>> think the other ones wouldn't work. >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>>> >>>> >>>> You may be able to mimic what you want by not using PETSC_DECIDE but >>>> instead computing up front how many rows of each matrix you want stored on >>>> each MPI process. You can use 0 for on certain MPI processes for certain >>>> matrices if you don't want any rows of that particular matrix stored on >>>> that particular MPI process. >>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 10:10 AM, Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>> Dear all, >>>> >>>> I want to construct a matrix by blocs, each block having different >>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>> the right way to do so is by using the MATNEST type. However, the following >>>> code >>>> >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>> Call >>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>> Call >>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>> >>>> does not generate the same matrix depending on the number of >>>> processors. It seems that it starts by everything owned by the first proc >>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>> clear). >>>> >>>> Is it possible to change that ? >>>> >>>> Note that I am coding in fortran if that has ay consequence. >>>> >>>> Thank you, >>>> >>>> Sincerely, >>>> -- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Mar 20 11:12:04 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 20 Mar 2023 16:12:04 +0000 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: See https://dl.acm.org/doi/10.1145/3225058.3225100 for more information about SELL. The example used in the paper is src/ts/tutorials/advection-diffusion-reaction/ex5adj.c src/mat/tests/bench_spmv.c provides a driver to test SpMV using matrices stored in binary or matrix market format (from the SuiteSparse benchmark collection). If you would like to dive deeper, check this configurable script to see how the benchmark testing can be automated: src/benchmarks/run_petsc_benchmarks.sh Hong (Mr.) On Mar 20, 2023, at 5:31 AM, Mark Adams wrote: I have no idea, keep on the list. Mark On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com > wrote: Thank you very much, I still have a question about the test code after vectorization. I did not find the Examples of the sell storage format in the petsc document. I would like to know which example you use to test the efficiency of vectorization? Mark Adams > ?2023?3?16??? 19:40??? On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com > wrote: Ok, maybe I can try to vectorize this format and make it part of the article. That would be great, and it would be a good learning experience for you and a good way to get exposure. See https://petsc.org/release/developers/contributing/ for guidance. Good luck, Mark Mark Adams > ?2023?3?15??? 19:57??? I don't believe that we have an effort here. It could be a good opportunity to contribute. Mark On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com > wrote: I checked the sell.c file and found that this algorithm supports AVX vectorization. Will the vectorization support of ARM architecture be added in the future? -------------- next part -------------- An HTML attachment was scrubbed... URL: From clement.berger at ens-lyon.fr Mon Mar 20 11:20:06 2023 From: clement.berger at ens-lyon.fr (Berger Clement) Date: Mon, 20 Mar 2023 17:20:06 +0100 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> Message-ID: <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> That seems to be working fine, thank you ! --- Cl?ment BERGER ENS de Lyon Le 2023-03-20 15:51, Matthew Knepley a ?crit : > On Mon, Mar 20, 2023 at 10:09?AM Berger Clement wrote: > >> Ok so this means that if I define my vectors via VecNest (organized as the matrices), everything will be correctly ordered ? > > Yes. > >> How does that behave with MatConvert ? In the sense that if I convert a MatNest to MatAIJ via MatConvert, will the vector as built in the example I showed you work properly ? > > No. 
MatConvert just changes the storage format, not the ordering. > > Thanks, > > Matt > > Thank you ! > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 14:58, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 9:35?AM Berger Clement wrote: > > 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the issue there. From what I understand, it just lets PETSc set the total size from the local sizes provided, am I mistaken ? > > 2) I attached a small script, when I run it with 1 proc the output vector is not the same as if I run it with 2 procs, I don't know what I should do to make them match. > > PS : I precise that I am not trying to point out a bug here, I realize that my usage is wrong somehow, I just can't determine why, sorry if I gave you the wrong impression ! > > I think I can now explain this clearly. Thank you for the nice simple example. I attach my slightly changed version (I think better in C). Here is running on one process: > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 1 MPI process > type: seqaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 2.) > row 3: (3, 2.) > row 4: (4, 1.) > row 5: (5, 1.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 1 MPI process > type: seq > 0. > 1. > 2. > 3. > 4. > 5. > 6. > 7. > Vec Object: 1 MPI process > type: seq > 0. > 2. > 4. > 6. > 4. > 5. > 6. > 7. > > This looks like what you expect. Doubling the first four rows and reproducing the last four. Now let's run on two processes: > > master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./nestTest -left_view -right_view -nest_view -full_view > Mat Object: 2 MPI processes > type: nest > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : type=constantdiagonal, rows=4, cols=4 > (0,1) : NULL > (1,0) : NULL > (1,1) : type=constantdiagonal, rows=4, cols=4 > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 2.) > row 1: (1, 2.) > row 2: (2, 1.) > row 3: (3, 1.) > row 4: (4, 2.) > row 5: (5, 2.) > row 6: (6, 1.) > row 7: (7, 1.) > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 1. > 2. > 3. > Process [1] > 4. > 5. > 6. > 7. > Vec Object: 2 MPI processes > type: mpi > Process [0] > 0. > 2. > 2. > 3. > Process [1] > 8. > 10. > 6. > 7. > > Let me describe what has changed. The matrices A and B are parallel, so each has two rows on process 0 and two rows on process 1. In the MatNest they are interleaved because we asked for contiguous numbering (by giving NULL for the IS of global row numbers). If we want to reproduce the same output, we would need to produce our input vector with the same interleaved numbering. 
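A minimal sketch of one way to obtain an input vector that already follows this interleaved numbering, assuming C is the MatNest assembled as above (function and variable names are placeholders, not from this thread): ask the matrix itself for compatible vectors and fill them through the nest sub-vectors rather than by global index.

#include <petscmat.h>

/* Sketch: C is assumed to be the MatNest built above. MatCreateVecs() returns
   VECNEST vectors whose parallel layout matches C, so filling the sub-vectors
   avoids reproducing the interleaved numbering by hand. */
static PetscErrorCode CreateAndFillRHS(Mat C, Vec *bout)
{
  Vec      b, *sub;
  PetscInt nb, i, rstart, rend, row;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(C, NULL, &b));      /* b has the same row layout as C */
  PetscCall(VecNestGetSubVecs(b, &nb, &sub)); /* one sub-vector per block row */
  for (i = 0; i < nb; i++) {
    PetscCall(VecGetOwnershipRange(sub[i], &rstart, &rend));
    for (row = rstart; row < rend; row++) PetscCall(VecSetValue(sub[i], row, (PetscScalar)row, INSERT_VALUES));
    PetscCall(VecAssemblyBegin(sub[i]));
    PetscCall(VecAssemblyEnd(sub[i]));
  }
  *bout = b;
  PetscFunctionReturn(0);
}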
> > Thanks, > > Matt > > Thank you, > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-20 12:53, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 6:18?AM Berger Clement wrote: > > I simplified the problem with the initial test I talked about because I thought I identified the issue, so I will walk you through my whole problem : > > - first the solve doesn't produce the same results as mentioned > > - I noticed that the duration of the factorization step of the matrix was also not consistent with the number of processors used (it is longer with 3 processes than with 1), I didn't think much of it but I now realize that for instance with 4 processes, MUMPS crashes when factorizing > > - I thought my matrices were wrong, but it's hard for me to use MatView to compare them with 1 or 2 proc because I work with a quite specific geometry, so in order not to fall into some weird particular case I need to use at least roughly 100 points, so looking at 100x100 matrices is not really nice...Instead I tried to multiply them by a vector full of one (after I used the vector v such that v(i)=i). I tried it on two matrices, and the results didn't depend on the number of procs, but when I tried to multiply against the nest of these two matrices (a 2x2 block diagonal nest), the result changed depending on the number of processors used > > - that's why I tried the toy problem I wrote to you in the first place > > I hope it's clearer now. > > Unfortunately, it is not clear to me. There is nothing attached to this email. I will try to describe things from my end. > > 1) There are lots of tests. Internally, Nest does not depend on the number of processes unless you make it so. This leads > me to believe that your construction of the matrix changes with the number of processes. For example, using PETSC_DETERMINE > for sizes will do this. > > 2) In order to understand what you want to achieve, we need to have something running in two cases, one with "correct" output and one > with something different. It sounds like you have such a small example, but I have missed it. > > Can you attach this example? Then I can run it, look at the matrices, and see what is different. > > Thanks, > > Matt > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 21:57, Barry Smith a ?crit : > Yes, you would benefit from a VecConvert() to produce a standard vector. But you should be able to use VecGetArray() on the nest array and on the standard array and copy the values between the arrays any way you like. You don't need to do any reordering when you copy. Is that not working and what are the symptoms (more than just the answers to the linear solve are different)? Again you can run on one and two MPI processes with a tiny problem to see if things are not in the correct order in the vectors and matrices. > > Barry > > On Mar 17, 2023, at 3:22 PM, Berger Clement wrote: > > To use MUMPS I need to convert my matrix in MATAIJ format (or at least not MATNEST), after that if I use a VECNEST for the left and right hanside, I get an error during the solve procedure, it is removed if I copy my data in a vector with standard format, I couldn't find any other way > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:53, Matthew Knepley a ?crit : > > On Fri, Mar 17, 2023 at 2:53?PM Berger Clement wrote: > > But this is to properly fill up the VecNest am I right ? 
Because this one is correct, but I can't directly use it in the KSPSolve, I need to copy it into a standard vector > > I do not understand what you mean here. You can definitely use a VecNest in a KSP. > > Thanks, > > Matt > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:39, Barry Smith a ?crit : > I think the intention is that you use VecNestGetSubVecs() or VecNestGetSubVec() and fill up the sub-vectors in the same style as the matrices; this decreases the change of a reordering mistake in trying to do it by hand in your code. > > On Mar 17, 2023, at 2:35 PM, Berger Clement wrote: > > That might be it, I didn't find the equivalent of MatConvert for the vectors, so when I need to solve my linear system, with my righthandside properly computed in nest format, I create a new vector using VecDuplicate, and then I copy into it my data using VecGetArrayF90 and copiing each element by hand. Does it create an incorrect ordering ? If so how can I get the correct one ? > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:27, Barry Smith a ?crit : > I would run your code with small sizes on 1, 2, 3 MPI ranks and use MatView() to examine the matrices. They will definitely be ordered differently but should otherwise be the same. My guess is that the right hand side may not have the correct ordering with respect to the matrix ordering in parallel. Note also that when the right hand side does have the correct ordering the solution will have a different ordering for each different number of MPI ranks when printed (but changing the ordering should give the same results up to machine precision. > > Barry > > On Mar 17, 2023, at 2:23 PM, Berger Clement wrote: > > My issue is that it seems to improperly with some step of my process, the solve step doesn't provide the same result depending on the number of processors I use. I manually tried to multiply one the matrices I defined as a nest against a vector, and the result is not the same with e.g. 1 and 3 processors. That's why I tried the toy program I wrote in the first place, which highlights the misplacement of elements. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 19:14, Barry Smith a ?crit : > This sounds like a fine use of MATNEST. Now back to the original question > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? If I understand correctly it is behaving as expected. It is the same matrix on 1 and 2 MPI processes, the only difference is the ordering of the rows and columns. Both matrix blocks are split among the two MPI processes. This is how MATNEST works and likely what you want in practice. > On Mar 17, 2023, at 1:19 PM, Berger Clement wrote: > > I have a matrix with four different blocks (2rows - 2columns). 
The block sizes differ from one another, because they correspond to a different physical variable. One of the block has the particularity that it has to be updated at each iteration. This update is performed by replacing it with a product of multiple matrices that depend on the result of the previous iteration. Note that these intermediate matrices are not square (because they also correspond to other types of variables), and that they must be completely refilled by hand (i.e. they are not the result of some simple linear operations). Finally, I use this final block matrix to solve multiple linear systems (with different righthand sides), so for now I use MUMPS as only the first solve takes time (but I might change it). > > Considering this setting, I created each type of variable separately, filled the different matrices, and created different nests of vectors / matrices for my operations. When the time comes to use KSPSolve, I use MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy the few vector data I need from my nests in a regular Vector, I solve, I get back my data in my nest and carry on with the operations needed for my updates. > > Is that clear ? I don't know if I provided too many or not enough details. > > Thank you > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 17:34, Barry Smith a ?crit : > Perhaps if you provide a brief summary of what you would like to do and we may have ideas on how to achieve it. > > Barry > > Note: that MATNEST does require that all matrices live on all the MPI processes within the original communicator. That is if the original communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST that only lives on ranks 1,2 but you could have it have 0 rows on rank zero so effectively it lives only on rank 1 and 2 (though its communicator is all three ranks). > > On Mar 17, 2023, at 12:14 PM, Berger Clement wrote: > > It would be possible in the case I showed you but in mine that would actually be quite complicated, isn't there any other workaround ? I precise that I am not entitled to utilizing the MATNEST format, it's just that I think the other ones wouldn't work. > > --- > Cl?ment BERGER > ENS de Lyon > > Le 2023-03-17 15:48, Barry Smith a ?crit : > You may be able to mimic what you want by not using PETSC_DECIDE but instead computing up front how many rows of each matrix you want stored on each MPI process. You can use 0 for on certain MPI processes for certain matrices if you don't want any rows of that particular matrix stored on that particular MPI process. > > Barry > > On Mar 17, 2023, at 10:10 AM, Berger Clement wrote: > > Dear all, > > I want to construct a matrix by blocs, each block having different sizes and partially stored by multiple processors. If I am not mistaken, the right way to do so is by using the MATNEST type. However, the following code > > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) > Call MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) > Call MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) > > does not generate the same matrix depending on the number of processors. It seems that it starts by everything owned by the first proc for A and B, then goes on to the second proc and so on (I hope I am being clear). > > Is it possible to change that ? > > Note that I am coding in fortran if that has ay consequence. 
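A small C sketch of the suggestion above, assuming the same 4x4 diagonal blocks and exactly two MPI ranks (all sizes purely illustrative): passing explicit local sizes instead of PETSC_DECIDE lets the caller decide how many rows of each block each rank owns, here all of A on rank 0 and all of B on rank 1, which reproduces a block-after-block layout.

#include <petscmat.h>

/* Sketch for exactly two ranks (run with mpiexec -n 2): explicit local sizes
   instead of PETSC_DECIDE put all of A on rank 0 and all of B on rank 1. */
int main(int argc, char **argv)
{
  Mat         A, B, C, blocks[4];
  PetscMPIInt rank;
  PetscInt    nlocA, nlocB;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  nlocA = (rank == 0) ? 4 : 0; /* rank 0 owns every row/column of A */
  nlocB = (rank == 0) ? 0 : 4; /* rank 1 owns every row/column of B */
  PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, nlocA, nlocA, 4, 4, 2.0, &A));
  PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, nlocB, nlocB, 4, 4, 1.0, &B));
  blocks[0] = A; blocks[1] = NULL; blocks[2] = NULL; blocks[3] = B;
  PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &C));
  PetscCall(MatView(C, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&A));
  PetscCall(MatDestroy(&B));
  PetscCall(MatDestroy(&C));
  PetscCall(PetscFinalize());
  return 0;
}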
> > Thank you, > > Sincerely, > > -- > Cl?ment BERGER > ENS de Lyon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 20 12:03:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 13:03:05 -0400 Subject: [petsc-users] Create a nest not aligned by processors In-Reply-To: <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> References: <3876145D-0CC7-41B8-B96C-E25F615EAC3F@petsc.dev> <193BEA01-43F8-44EC-8AD0-83477F129232@petsc.dev> <82DF1C67-455F-42A4-ADBD-A3DC826B175C@petsc.dev> <26d449ca68c717fecb45a3ec849724e5@ens-lyon.fr> <1349D982-6A5B-44C2-B949-B3B437F381E9@petsc.dev> <2b92d65cd8b6b5786bad3b03ab74eda6@ens-lyon.fr> <0b5419aa1fdbe12f093413352c675782@ens-lyon.fr> <0fe503797c8f73605cebaeaf1924e473@ens-lyon.fr> <17f530ee58cffe33a7f66d5e293d70c9@ens-lyon.fr> <2436f5782a4044d5fbb050c6f3f61c3f@ens-lyon.fr> Message-ID: On Mon, Mar 20, 2023 at 12:20?PM Berger Clement wrote: > That seems to be working fine, thank you ! > Great! Thanks Matt > --- > Cl?ment BERGER > ENS de Lyon > > > Le 2023-03-20 15:51, Matthew Knepley a ?crit : > > On Mon, Mar 20, 2023 at 10:09?AM Berger Clement < > clement.berger at ens-lyon.fr> wrote: > >> Ok so this means that if I define my vectors via VecNest (organized as >> the matrices), everything will be correctly ordered ? >> > Yes. > >> How does that behave with MatConvert ? In the sense that if I convert a >> MatNest to MatAIJ via MatConvert, will the vector as built in the example I >> showed you work properly ? >> >> >> No. MatConvert just changes the storage format, not the ordering. > > Thanks, > > Matt > >> Thank you ! >> --- >> Cl?ment BERGER >> ENS de Lyon >> >> >> Le 2023-03-20 14:58, Matthew Knepley a ?crit : >> >> On Mon, Mar 20, 2023 at 9:35?AM Berger Clement < >> clement.berger at ens-lyon.fr> wrote: >> >>> 1) Yes in my program I use PETSC_DETERMINE, but I don't see what is the >>> issue there. From what I understand, it just lets PETSc set the total size >>> from the local sizes provided, am I mistaken ? >>> >>> 2) I attached a small script, when I run it with 1 proc the output >>> vector is not the same as if I run it with 2 procs, I don't know what I >>> should do to make them match. >>> >>> PS : I precise that I am not trying to point out a bug here, I realize >>> that my usage is wrong somehow, I just can't determine why, sorry if I gave >>> you the wrong impression ! >>> >>> >>> I think I can now explain this clearly. Thank you for the nice simple >> example. I attach my slightly changed version (I think better in C). 
Here >> is running on one process: >> >> master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1 >> ./nestTest -left_view -right_view -nest_view -full_view >> Mat Object: 1 MPI process >> type: nest >> Matrix object: >> type=nest, rows=2, cols=2 >> MatNest structure: >> (0,0) : type=constantdiagonal, rows=4, cols=4 >> (0,1) : NULL >> (1,0) : NULL >> (1,1) : type=constantdiagonal, rows=4, cols=4 >> Mat Object: 1 MPI process >> type: seqaij >> row 0: (0, 2.) >> row 1: (1, 2.) >> row 2: (2, 2.) >> row 3: (3, 2.) >> row 4: (4, 1.) >> row 5: (5, 1.) >> row 6: (6, 1.) >> row 7: (7, 1.) >> Vec Object: 1 MPI process >> type: seq >> 0. >> 1. >> 2. >> 3. >> 4. >> 5. >> 6. >> 7. >> Vec Object: 1 MPI process >> type: seq >> 0. >> 2. >> 4. >> 6. >> 4. >> 5. >> 6. >> 7. >> >> This looks like what you expect. Doubling the first four rows and >> reproducing the last four. Now let's run on two processes: >> >> master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2 >> ./nestTest -left_view -right_view -nest_view -full_view >> Mat Object: 2 MPI processes >> type: nest >> Matrix object: >> type=nest, rows=2, cols=2 >> MatNest structure: >> (0,0) : type=constantdiagonal, rows=4, cols=4 >> (0,1) : NULL >> (1,0) : NULL >> (1,1) : type=constantdiagonal, rows=4, cols=4 >> Mat Object: 2 MPI processes >> type: mpiaij >> row 0: (0, 2.) >> row 1: (1, 2.) >> row 2: (2, 1.) >> row 3: (3, 1.) >> row 4: (4, 2.) >> row 5: (5, 2.) >> row 6: (6, 1.) >> row 7: (7, 1.) >> Vec Object: 2 MPI processes >> type: mpi >> Process [0] >> 0. >> 1. >> 2. >> 3. >> Process [1] >> 4. >> 5. >> 6. >> 7. >> Vec Object: 2 MPI processes >> type: mpi >> Process [0] >> 0. >> 2. >> 2. >> 3. >> Process [1] >> 8. >> 10. >> 6. >> 7. >> >> Let me describe what has changed. The matrices A and B are parallel, so >> each has two rows on process 0 and two rows on process 1. In the MatNest >> they are interleaved because we asked for contiguous numbering (by giving >> NULL for the IS of global row numbers). If we want to reproduce the same >> output, we would need to produce our input vector with the same interleaved >> numbering. >> >> Thanks, >> >> Matt >> >>> Thank you, >>> --- >>> Cl?ment BERGER >>> ENS de Lyon >>> >>> >>> Le 2023-03-20 12:53, Matthew Knepley a ?crit : >>> >>> On Mon, Mar 20, 2023 at 6:18?AM Berger Clement < >>> clement.berger at ens-lyon.fr> wrote: >>> >>>> I simplified the problem with the initial test I talked about because I >>>> thought I identified the issue, so I will walk you through my whole problem >>>> : >>>> >>>> - first the solve doesn't produce the same results as mentioned >>>> >>>> - I noticed that the duration of the factorization step of the matrix >>>> was also not consistent with the number of processors used (it is longer >>>> with 3 processes than with 1), I didn't think much of it but I now realize >>>> that for instance with 4 processes, MUMPS crashes when factorizing >>>> >>>> - I thought my matrices were wrong, but it's hard for me to use MatView >>>> to compare them with 1 or 2 proc because I work with a quite specific >>>> geometry, so in order not to fall into some weird particular case I need to >>>> use at least roughly 100 points, so looking at 100x100 matrices is not >>>> really nice...Instead I tried to multiply them by a vector full of one >>>> (after I used the vector v such that v(i)=i). 
I tried it on two matrices, >>>> and the results didn't depend on the number of procs, but when I tried to >>>> multiply against the nest of these two matrices (a 2x2 block diagonal >>>> nest), the result changed depending on the number of processors used >>>> >>>> - that's why I tried the toy problem I wrote to you in the first place >>>> >>>> I hope it's clearer now. >>>> >>> >>> Unfortunately, it is not clear to me. There is nothing attached to this >>> email. I will try to describe things from my end. >>> >>> 1) There are lots of tests. Internally, Nest does not depend on the >>> number of processes unless you make it so. This leads >>> me to believe that your construction of the matrix changes with the >>> number of processes. For example, using PETSC_DETERMINE >>> for sizes will do this. >>> >>> 2) In order to understand what you want to achieve, we need to have >>> something running in two cases, one with "correct" output and one >>> with something different. It sounds like you have such a small >>> example, but I have missed it. >>> >>> Can you attach this example? Then I can run it, look at the matrices, >>> and see what is different. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thank you >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 21:57, Barry Smith a ?crit : >>>> >>>> >>>> Yes, you would benefit from a VecConvert() to produce a standard >>>> vector. But you should be able to use VecGetArray() on the nest array and >>>> on the standard array and copy the values between the arrays any way you >>>> like. You don't need to do any reordering when you copy. Is that not >>>> working and what are the symptoms (more than just the answers to the linear >>>> solve are different)? Again you can run on one and two MPI processes with a >>>> tiny problem to see if things are not in the correct order in the vectors >>>> and matrices. >>>> >>>> Barry >>>> >>>> >>>> On Mar 17, 2023, at 3:22 PM, Berger Clement >>>> wrote: >>>> >>>> To use MUMPS I need to convert my matrix in MATAIJ format (or at least >>>> not MATNEST), after that if I use a VECNEST for the left and right hanside, >>>> I get an error during the solve procedure, it is removed if I copy my data >>>> in a vector with standard format, I couldn't find any other way >>>> --- >>>> Cl?ment BERGER >>>> ENS de Lyon >>>> >>>> >>>> Le 2023-03-17 19:53, Matthew Knepley a ?crit : >>>> >>>> On Fri, Mar 17, 2023 at 2:53?PM Berger Clement < >>>> clement.berger at ens-lyon.fr> wrote: >>>> >>>>> But this is to properly fill up the VecNest am I right ? Because this >>>>> one is correct, but I can't directly use it in the KSPSolve, I need to copy >>>>> it into a standard vector >>>>> >>>>> >>>> I do not understand what you mean here. You can definitely use a >>>> VecNest in a KSP. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:39, Barry Smith a ?crit : >>>>> >>>>> >>>>> I think the intention is that you use VecNestGetSubVecs() >>>>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the >>>>> matrices; this decreases the change of a reordering mistake in trying to do >>>>> it by hand in your code. 
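A sketch of that copy, under the assumption that the flat vector follows the same interleaved ordering as the MatConvert'ed matrix (C, Caij and bnest are placeholder names, not from this thread): the row index sets that MatNest builds internally can be retrieved with MatNestGetISs() and used to place each sub-vector into the flat vector, instead of copying element by element.

#include <petscmat.h>

/* Sketch: C is the original 2x2 MATNEST, Caij its MatConvert'ed AIJ copy and
   bnest a VECNEST right-hand side; all names are placeholders. The row index
   sets created internally by MatNest map each sub-vector into the interleaved
   global ordering that Caij uses as well. */
static PetscErrorCode NestToFlat(Mat C, Mat Caij, Vec bnest, Vec *bflat)
{
  IS       isrow[2];
  Vec     *sub;
  PetscInt nb, i;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(Caij, NULL, bflat)); /* flat vector with Caij's row layout */
  PetscCall(MatNestGetISs(C, isrow, NULL));    /* global rows of each block */
  PetscCall(VecNestGetSubVecs(bnest, &nb, &sub));
  for (i = 0; i < nb; i++) PetscCall(VecISCopy(*bflat, isrow[i], SCATTER_FORWARD, sub[i])); /* bflat[isrow[i]] = sub[i] */
  PetscFunctionReturn(0);
}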
>>>>> >>>>> >>>>> >>>>> On Mar 17, 2023, at 2:35 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> That might be it, I didn't find the equivalent of MatConvert for the >>>>> vectors, so when I need to solve my linear system, with my righthandside >>>>> properly computed in nest format, I create a new vector using VecDuplicate, >>>>> and then I copy into it my data using VecGetArrayF90 and copiing each >>>>> element by hand. Does it create an incorrect ordering ? If so how can I get >>>>> the correct one ? >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:27, Barry Smith a ?crit : >>>>> >>>>> >>>>> I would run your code with small sizes on 1, 2, 3 MPI ranks and use >>>>> MatView() to examine the matrices. They will definitely be ordered >>>>> differently but should otherwise be the same. My guess is that the right >>>>> hand side may not have the correct ordering with respect to the matrix >>>>> ordering in parallel. Note also that when the right hand side does have the >>>>> correct ordering the solution will have a different ordering for each >>>>> different number of MPI ranks when printed (but changing the ordering >>>>> should give the same results up to machine precision. >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Mar 17, 2023, at 2:23 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> My issue is that it seems to improperly with some step of my process, >>>>> the solve step doesn't provide the same result depending on the number of >>>>> processors I use. I manually tried to multiply one the matrices I defined >>>>> as a nest against a vector, and the result is not the same with e.g. 1 and >>>>> 3 processors. That's why I tried the toy program I wrote in the first >>>>> place, which highlights the misplacement of elements. >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 19:14, Barry Smith a ?crit : >>>>> >>>>> >>>>> This sounds like a fine use of MATNEST. Now back to the original >>>>> question >>>>> >>>>> >>>>> I want to construct a matrix by blocs, each block having different >>>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>>> the right way to do so is by using the MATNEST type. However, the following >>>>> code >>>>> >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>>> Call >>>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>>> >>>>> does not generate the same matrix depending on the number of >>>>> processors. It seems that it starts by everything owned by the first proc >>>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>>> clear). >>>>> >>>>> Is it possible to change that ? >>>>> >>>>> If I understand correctly it is behaving as expected. It is the same >>>>> matrix on 1 and 2 MPI processes, the only difference is the ordering of the >>>>> rows and columns. >>>>> >>>>> Both matrix blocks are split among the two MPI processes. This is >>>>> how MATNEST works and likely what you want in practice. >>>>> >>>>> On Mar 17, 2023, at 1:19 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> I have a matrix with four different blocks (2rows - 2columns). 
The >>>>> block sizes differ from one another, because they correspond to a different >>>>> physical variable. One of the block has the particularity that it has to be >>>>> updated at each iteration. This update is performed by replacing it with a >>>>> product of multiple matrices that depend on the result of the previous >>>>> iteration. Note that these intermediate matrices are not square (because >>>>> they also correspond to other types of variables), and that they must be >>>>> completely refilled by hand (i.e. they are not the result of some simple >>>>> linear operations). Finally, I use this final block matrix to solve >>>>> multiple linear systems (with different righthand sides), so for now I use >>>>> MUMPS as only the first solve takes time (but I might change it). >>>>> >>>>> Considering this setting, I created each type of variable separately, >>>>> filled the different matrices, and created different nests of vectors / >>>>> matrices for my operations. When the time comes to use KSPSolve, I use >>>>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, I also copy >>>>> the few vector data I need from my nests in a regular Vector, I solve, I >>>>> get back my data in my nest and carry on with the operations needed for my >>>>> updates. >>>>> >>>>> Is that clear ? I don't know if I provided too many or not enough >>>>> details. >>>>> >>>>> Thank you >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 17:34, Barry Smith a ?crit : >>>>> >>>>> >>>>> Perhaps if you provide a brief summary of what you would like to do >>>>> and we may have ideas on how to achieve it. >>>>> >>>>> Barry >>>>> >>>>> Note: that MATNEST does require that all matrices live on all the MPI >>>>> processes within the original communicator. That is if the original >>>>> communicator has ranks 0,1, and 2 you cannot have a matrix inside MATNEST >>>>> that only lives on ranks 1,2 but you could have it have 0 rows on rank zero >>>>> so effectively it lives only on rank 1 and 2 (though its communicator is >>>>> all three ranks). >>>>> >>>>> On Mar 17, 2023, at 12:14 PM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> It would be possible in the case I showed you but in mine that would >>>>> actually be quite complicated, isn't there any other workaround ? I precise >>>>> that I am not entitled to utilizing the MATNEST format, it's just that I >>>>> think the other ones wouldn't work. >>>>> --- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>>> Le 2023-03-17 15:48, Barry Smith a ?crit : >>>>> >>>>> >>>>> You may be able to mimic what you want by not using PETSC_DECIDE >>>>> but instead computing up front how many rows of each matrix you want stored >>>>> on each MPI process. You can use 0 for on certain MPI processes for certain >>>>> matrices if you don't want any rows of that particular matrix stored on >>>>> that particular MPI process. >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Mar 17, 2023, at 10:10 AM, Berger Clement < >>>>> clement.berger at ens-lyon.fr> wrote: >>>>> >>>>> Dear all, >>>>> >>>>> I want to construct a matrix by blocs, each block having different >>>>> sizes and partially stored by multiple processors. If I am not mistaken, >>>>> the right way to do so is by using the MATNEST type. 
However, the following >>>>> code >>>>> >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr) >>>>> Call >>>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr) >>>>> Call >>>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr) >>>>> >>>>> does not generate the same matrix depending on the number of >>>>> processors. It seems that it starts by everything owned by the first proc >>>>> for A and B, then goes on to the second proc and so on (I hope I am being >>>>> clear). >>>>> >>>>> Is it possible to change that ? >>>>> >>>>> Note that I am coding in fortran if that has ay consequence. >>>>> >>>>> Thank you, >>>>> >>>>> Sincerely, >>>>> -- >>>>> Cl?ment BERGER >>>>> ENS de Lyon >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ch1057458756 at gmail.com Mon Mar 20 12:04:38 2023 From: ch1057458756 at gmail.com (CaoHao@gmail.com) Date: Tue, 21 Mar 2023 01:04:38 +0800 Subject: [petsc-users] [petsc-maint] Some questions about matrix multiplication in sell format In-Reply-To: References: Message-ID: Thank you very much. I will continue to study these contents. Zhang, Hong ? 2023?3?21? ???00:12??? > See https://dl.acm.org/doi/10.1145/3225058.3225100 for more information > about SELL. > > The example used in the paper > is src/ts/tutorials/advection-diffusion-reaction/ex5adj.c > > src/mat/tests/bench_spmv.c provides a driver to test SpMV using matrices > stored in binary or matrix market format (from the SuiteSparse benchmark > collection). > > If you would like to dive deeper, check this configurable script to see > how the benchmark testing can be automated: > src/benchmarks/run_petsc_benchmarks.sh > > Hong (Mr.) > > On Mar 20, 2023, at 5:31 AM, Mark Adams wrote: > > I have no idea, keep on the list. > Mark > > On Sun, Mar 19, 2023 at 10:13?PM CaoHao at gmail.com > wrote: > >> Thank you very much, I still have a question about the test code after >> vectorization. I did not find the Examples of the sell storage format in >> the petsc document. I would like to know which example you use to test the >> efficiency of vectorization? >> >> Mark Adams ?2023?3?16??? 19:40??? 
>> >>> >>> >>> On Thu, Mar 16, 2023 at 4:18?AM CaoHao at gmail.com >>> wrote: >>> >>>> Ok, maybe I can try to vectorize this format and make it part of the >>>> article. >>>> >>> >>> That would be great, and it would be a good learning experience for you >>> and a good way to get exposure. >>> See https://petsc.org/release/developers/contributing/ for guidance. >>> >>> Good luck, >>> Mark >>> >>> >>>> >>>> Mark Adams ?2023?3?15??? 19:57??? >>>> >>>>> I don't believe that we have an effort here. It could be a good >>>>> opportunity to contribute. >>>>> >>>>> Mark >>>>> >>>>> On Wed, Mar 15, 2023 at 4:54?AM CaoHao at gmail.com < >>>>> ch1057458756 at gmail.com> wrote: >>>>> >>>>>> I checked the sell.c file and found that this algorithm supports AVX >>>>>> vectorization. Will the vectorization support of ARM architecture be added >>>>>> in the future? >>>>>> >>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Mon Mar 20 17:45:04 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Mon, 20 Mar 2023 22:45:04 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: Hi Barry and Mark, Thank you for your responses. I implemented the index sets in my application and it appears to work in serial. Unfortunately I am having some trouble running in parallel. The error I am getting is: [1]PETSC ERROR: Petsc has generated inconsistent data [1]PETSC ERROR: Number of entries found in complement 1000 does not match expected 500 1]PETSC ERROR: #1 ISComplement() at petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 [1]PETSC ERROR: #3 PCSetUp() at petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 [1]PETSC ERROR: #4 KSPSetUp() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 [1]PETSC ERROR: #5 KSPSolve_Private() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 [1]PETSC ERROR: #6 KSPSolve() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 I am testing with two processors and a 2000x2000 matrix. I have two fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. Am I right thinking that I have some issue with my IS layouts? Thank you, Joshua ________________________________ From: Barry Smith Sent: Friday, March 17, 2023 1:22 PM To: Christopher, Joshua Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mar 17, 2023, at 1:26 PM, Christopher, Joshua wrote: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. 
So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry Thank you, Joshua ________________________________ From: Barry Smith > Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! 
Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. 
removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: phi_IS.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: rho_IS.txt URL: From knepley at gmail.com Mon Mar 20 18:16:52 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Mar 2023 19:16:52 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry and Mark, > > Thank you for your responses. I implemented the index sets in my > application and it appears to work in serial. Unfortunately I am having > some trouble running in parallel. The error I am getting is: > [1]PETSC ERROR: Petsc has generated inconsistent data > [1]PETSC ERROR: Number of entries found in complement 1000 does not match > expected 500 > 1]PETSC ERROR: #1 ISComplement() at > petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 > [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at > petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 > [1]PETSC ERROR: #3 PCSetUp() at > petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 > [1]PETSC ERROR: #4 KSPSetUp() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 > [1]PETSC ERROR: #5 KSPSolve_Private() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 > [1]PETSC ERROR: #6 KSPSolve() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 > [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 > > I am testing with two processors and a 2000x2000 matrix. I have two > fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 > for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 > and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. > Am I right thinking that I have some issue with my IS layouts? > I do not understand your explanation. Your matrix is 2000x2000, and I assume split so that proc 0 has rows 0 -- 999 proc 1 has rows 1000 -- 1999 Now, when you call PCFieldSplitSetIS(), each process gives an IS which indicates the dofs _owned by that process_ the contribute to field k. If you do not give unknowns within the global row bounds for that process, the ISComplement() call will not work. Of course, we should check that the entries are not out of bounds when they are submitted. if you want to do it, it would be a cool submission. Thanks, Matt > Thank you, > Joshua > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, March 17, 2023 1:22 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > > As Mark said, yes this sounds fine. > > > My matrix is organized as one block after another. 
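A minimal sketch of that, for the two-field layout described above (all phi rows first, then all rho rows), assuming the matrix uses PETSc's usual contiguous row ownership; the function and variable names are placeholders, not code from this thread. Each rank hands PCFieldSplitSetIS() only the rows it owns.

#include <petscksp.h>

/* Sketch: global ordering is all phi rows [0,nphi) followed by all rho rows
   [nphi,N), and each rank owns the contiguous range [rstart,rend). The ISs
   contain only locally owned rows, as PCFieldSplitSetIS() expects. */
static PetscErrorCode SetupFieldSplit(KSP ksp, Mat A, PetscInt nphi)
{
  PC       pc;
  IS       isphi, isrho;
  PetscInt rstart, rend, lo, hi;

  PetscFunctionBeginUser;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  lo = rstart;                 /* owned piece of the phi block */
  hi = PetscMin(rend, nphi);
  PetscCall(ISCreateStride(PetscObjectComm((PetscObject)A), PetscMax(hi - lo, 0), lo, 1, &isphi));
  lo = PetscMax(rstart, nphi); /* owned piece of the rho block */
  hi = rend;
  PetscCall(ISCreateStride(PetscObjectComm((PetscObject)A), PetscMax(hi - lo, 0), lo, 1, &isrho));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCFIELDSPLIT));
  PetscCall(PCFieldSplitSetIS(pc, "phi", isphi));
  PetscCall(PCFieldSplitSetIS(pc, "rho", isrho));
  PetscCall(ISDestroy(&isphi));
  PetscCall(ISDestroy(&isrho));
  PetscFunctionReturn(0);
}

With named splits like this, the per-field solvers can then be chosen on the command line, for example (only a suggestion, not a recommendation from this thread): -pc_fieldsplit_type multiplicative -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type hypre -fieldsplit_rho_ksp_type gmres -fieldsplit_rho_pc_type hypre.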
> > > When you are running in parallel with MPI, how will you organize the > unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI > process? You will need to take this into account when you build the IS on > each MPI process. > > Barry > > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! 
> Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. 
For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchristopher at anl.gov Tue Mar 21 09:28:29 2023 From: jchristopher at anl.gov (Christopher, Joshua) Date: Tue, 21 Mar 2023 14:28:29 +0000 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: Hi Matt, Sorry for the unclear explanation. My layout is like this: Proc 0: Rows 0--499 and rows 1000--1499 Proc 1: Rows 500-999 and rows 1500-1999 I have two unknowns, rho and phi, both correspond to a contiguous chunk of rows. Phi: Rows 0-999 Rho: Rows 1000-1999 My source data (an OpenFOAM matrix) has the unknowns row-contiguous, which is why my layout is like this. My understanding is that my IS are set up correctly to match this matrix structure, which is why I am uncertain why I am getting the error message. I attached the output of my IS in my previous message. Thank you, Joshua ________________________________ From: Matthew Knepley Sent: Monday, March 20, 2023 6:16 PM To: Christopher, Joshua Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users > wrote: Hi Barry and Mark, Thank you for your responses. I implemented the index sets in my application and it appears to work in serial. Unfortunately I am having some trouble running in parallel. The error I am getting is: [1]PETSC ERROR: Petsc has generated inconsistent data [1]PETSC ERROR: Number of entries found in complement 1000 does not match expected 500 1]PETSC ERROR: #1 ISComplement() at petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 [1]PETSC ERROR: #3 PCSetUp() at petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 [1]PETSC ERROR: #4 KSPSetUp() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 [1]PETSC ERROR: #5 KSPSolve_Private() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 [1]PETSC ERROR: #6 KSPSolve() at petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 I am testing with two processors and a 2000x2000 matrix. I have two fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. Am I right thinking that I have some issue with my IS layouts? I do not understand your explanation. Your matrix is 2000x2000, and I assume split so that proc 0 has rows 0 -- 999 proc 1 has rows 1000 -- 1999 Now, when you call PCFieldSplitSetIS(), each process gives an IS which indicates the dofs _owned by that process_ the contribute to field k. If you do not give unknowns within the global row bounds for that process, the ISComplement() call will not work. Of course, we should check that the entries are not out of bounds when they are submitted. if you want to do it, it would be a cool submission. 
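The out-of-bounds check Matt mentions can also be approximated on the user side before calling PCFieldSplitSetIS(); a rough sketch (the helper name is made up, and it assumes a recent PETSc that provides PetscCheck()):

```c
#include <petscmat.h>
#include <petscis.h>

/* Verify that every entry of an IS lies in this rank's ownership range of A. */
static PetscErrorCode CheckISWithinLocalRows(Mat A, IS is)
{
  PetscInt        rstart, rend, n;
  const PetscInt *idx;

  PetscFunctionBeginUser;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  PetscCall(ISGetLocalSize(is, &n));
  PetscCall(ISGetIndices(is, &idx));
  for (PetscInt i = 0; i < n; i++) {
    PetscCheck(idx[i] >= rstart && idx[i] < rend, PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
               "IS entry %" PetscInt_FMT " is %" PetscInt_FMT ", outside the local rows [%" PetscInt_FMT ", %" PetscInt_FMT ")",
               i, idx[i], rstart, rend);
  }
  PetscCall(ISRestoreIndices(is, &idx));
  PetscFunctionReturn(0);
}
```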
Thanks, Matt Thank you, Joshua ________________________________ From: Barry Smith > Sent: Friday, March 17, 2023 1:22 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: Hi Barry, Thank you for your response. I'm a little confused about the relation between the IS integer values and matrix indices. Fromhttps://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my IS should just contain a list of the rows for each split? For example, if I have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows correspond to the "rho" variable and the last 50 correspond to the "phi" variable. So I should call PCFieldSplitSetIS twice, the first with an IS containing integers 0-49 and the second with integers 49-99? PCFieldSplitSetIS is expecting global row numbers, correct? As Mark said, yes this sounds fine. My matrix is organized as one block after another. When you are running in parallel with MPI, how will you organize the unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI process? You will need to take this into account when you build the IS on each MPI process. Barry Thank you, Joshua ________________________________ From: Barry Smith > Sent: Tuesday, March 14, 2023 1:35 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG You definitely do not need to use a complicated DM to take advantage of PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The first should list all the indices of the degrees of freedom of your first type of variable and the second should list all the rest of the degrees of freedom. Then use https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ Barry Note: PCFIELDSPLIT does not care how you have ordered your degrees of freedom of the two types. You might interlace them or have all the first degree of freedom on an MPI process and then have all the second degree of freedom. This just determines what your IS look like. On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users > wrote: Hello PETSc users, I haven't heard back from the library developer regarding the numbering issue or my questions on using field split operators with their library, so I need to fix this myself. Regarding the natural numbering vs parallel numbering: I haven't figured out what is wrong here. I stepped through in parallel and it looks like each processor is setting up the matrix and calling MatSetValue similar to what is shown in https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. I see that PETSc is recognizing my simple two-processor test from the output ("PetscInitialize_Common(): PETSc successfully started: number of processors = 2"). I'll keep poking at this, however I'm very new to PETSc. When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I see one row per line, and the tuples consists of the column number and value? On the FieldSplit preconditioner, is my understanding here correct: To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I must use DMPlex and set up the chart and covering relations specific to my mesh following here: https://petsc.org/release/docs/manual/dmplex/. I think this may be very time-consuming for me to set up. Currently, I already have a matrix stored in a parallel sparse L-D-U format. 
I am converting into PETSc's sparse parallel AIJ matrix (traversing my matrix and using MatSetValues). The weights for my discretization scheme are already accounted for in the coefficients of my L-D-U matrix. I do have the submatrices in L-D-U format for each of my two equations' coupling with each other. That is, the equivalent of lines 242,251-252,254 of example 28 https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I directly convert my submatrices into PETSc's sub-matrix here, then assemble things together so that the field split preconditioners will work? Alternatively, since my L-D-U matrices already account for the discretization scheme, can I use a simple structured grid DM? Thank you so much for your help! Regards, Joshua ________________________________ From: Pierre Jolivet > Sent: Friday, March 3, 2023 11:45 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol 1E-10: 1) with renumbering via ParMETIS -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps => Linear solve converged due to CONVERGED_RTOL iterations 10 -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve converged due to CONVERGED_RTOL iterations 55 2) without renumbering via ParMETIS -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS iterations 100 -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS iterations 100 Using on outer fieldsplit may help fix this. Thanks, Pierre On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users > wrote: I am solving these equations in the context of electrically-driven fluid flows as that first paper describes. I am using a PIMPLE scheme to advance the fluid equations in time, and my goal is to do a coupled solve of the electric equations similar to what is described in this paper: https://www.sciencedirect.com/science/article/pii/S0045793019302427. They are using the SIMPLE scheme in this paper. My fluid flow should eventually reach steady behavior, and likewise the time derivative in the charge density should trend towards zero. They preferred using BiCGStab with a direct LU preconditioner for solving their electric equations. I tried to test that combination, but my case is halting for unknown reasons in the middle of the PETSc solve. I'll try with more nodes and see if I am running out of memory, but the computer is a little overloaded at the moment so it may take a while to run. I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not appear to be following a parallel numbering, and instead looks like the matrix has natural numbering. When they renumbered the system with ParMETIS they got really fast convergence. I am using PETSc through a library, so I will reach out to the library authors and see if there is an issue in the library. Thank you, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 3:47 PM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Are you solving this as a time-dependent problem? Using an implicit scheme (like backward Euler) for rho ? In ODE language, solving the differential algebraic equation? Is epsilon bounded away from 0? 
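Just above, Joshua describes converting his parallel L-D-U matrix by traversing it and calling MatSetValues(); a bare-bones sketch of that kind of conversion loop is below (names such as nlocal, max_nz_per_row, and GetRowFromLDU() are placeholders for application data, not anything from the thread):

```c
#include <petscmat.h>

/* Placeholder for the application's L-D-U traversal: returns one assembled
   row as global column indices and values. */
extern PetscErrorCode GetRowFromLDU(PetscInt row, PetscInt *ncols, const PetscInt **cols, const PetscScalar **vals);

static PetscErrorCode ConvertLDUToAIJ(PetscInt nlocal, PetscInt max_nz_per_row, Mat *Aout)
{
  Mat      A;
  PetscInt rstart, rend;

  PetscFunctionBeginUser;
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetFromOptions(A)); /* AIJ by default; overridable with -mat_type */
  PetscCall(MatSeqAIJSetPreallocation(A, max_nz_per_row, NULL));
  PetscCall(MatMPIAIJSetPreallocation(A, max_nz_per_row, NULL, max_nz_per_row, NULL));
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (PetscInt row = rstart; row < rend; row++) {
    PetscInt           ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;
    PetscCall(GetRowFromLDU(row, &ncols, &cols, &vals));
    PetscCall(MatSetValues(A, 1, &row, ncols, cols, vals, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  *Aout = A;
  PetscFunctionReturn(0);
}
```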
On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: Hi Barry and Mark, Thank you for looking into my problem. The two equations I am solving with PETSc are equations 6 and 7 from this paper:https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see attached output. For reference, the same matrix took 658 iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a great deal with BoomerAMG! I'll try removing some terms from my solve (e.g. removing the second equation, then making the second equation just the elliptic portion of the equation, etc.) and try with a simpler geometry. I'll keep you updated as I run into troubles with that route. I wasn't aware of Field Split preconditioners, I'll do some reading on them and give them a try as well. Thank you again, Joshua ________________________________ From: Barry Smith > Sent: Thursday, March 2, 2023 7:47 AM To: Christopher, Joshua > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the 5,000,000 unknowns? It is at the high end of problem sizes you can do with direct solvers but is worth comparing with BoomerAMG. You likely want to use more nodes and fewer cores per node with MUMPs to be able to access more memory. If you are needing to solve multiple right hand sides but with the same matrix the factors will be reused resulting in the second and later solves being much faster. I agree with Mark, with iterative solvers you are likely to end up with PCFIELDSPLIT. Barry On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users > wrote: Hello, I am trying to solve the leaky-dielectric model equations with PETSc using a second-order discretization scheme (with limiting to first order as needed) using the finite volume method. The leaky dielectric model is a coupled system of two equations, consisting of a Poisson equation and a convection-diffusion equation. I have tested on small problems with simple geometry (~1000 DoFs) using: -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in parallel with 2 cores, but also previously was able to use successfully use a direct solver in serial to solve this problem. When I scale up to my production problem, I get significantly worse convergence. My production problem has ~3 million DoFs, more complex geometry, and is solved on ~100 cores across two nodes. The boundary conditions change a little because of the geometry, but are of the same classifications (e.g. only Dirichlet and Neumann). On the production case, I am needing 600-4000 iterations to converge. I've attached the output from the first solve that took 658 iterations to converge, using the following output options: -ksp_view_pre -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -ksp_test_null_space My matrix is non-symmetric, the condition number can be around 10e6, and the eigenvalues reported by PETSc have been real and positive (using -ksp_view_eigenvalues). I have tried using other preconditions (superlu, mumps, gamg, mg) but hypre+boomeramg has performed the best so far. 
The literature seems to indicate that AMG is the best approach for solving these equations in a coupled fashion. Do you have any advice on speeding up the convergence of this system? Thank you, Joshua -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Mar 21 09:37:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Mar 2023 10:37:56 -0400 Subject: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG In-Reply-To: References: <523EAD18-437E-4008-A811-4D32317C89AC@joliv.et> <4A1F98D0-658C-47A2-8277-23F97F95F5C1@petsc.dev> <595D8D88-C619-41D7-A427-1C0EFB5C5E44@petsc.dev> Message-ID: On Tue, Mar 21, 2023 at 10:28?AM Christopher, Joshua wrote: > Hi Matt, > > Sorry for the unclear explanation. My layout is like this: > > Proc 0: Rows 0--499 and rows 1000--1499 > Proc 1: Rows 500-999 and rows 1500-1999 > That is not a possible layout in PETSc. This is the source of the misunderstanding. Rows are always contiguous in PETSc. Thanks, Matt > I have two unknowns, rho and phi, both correspond to a contiguous chunk of > rows. > > Phi: Rows 0-999 > Rho: Rows 1000-1999 > > My source data (an OpenFOAM matrix) has the unknowns row-contiguous, which > is why my layout is like this. My understanding is that my IS are set up > correctly to match this matrix structure, which is why I am uncertain why I > am getting the error message. I attached the output of my IS in my previous > message. > > Thank you, > Joshua > ------------------------------ > *From:* Matthew Knepley > *Sent:* Monday, March 20, 2023 6:16 PM > *To:* Christopher, Joshua > *Cc:* Barry Smith ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > On Mon, Mar 20, 2023 at 6:45?PM Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi Barry and Mark, > > Thank you for your responses. I implemented the index sets in my > application and it appears to work in serial. Unfortunately I am having > some trouble running in parallel. The error I am getting is: > [1]PETSC ERROR: Petsc has generated inconsistent data > [1]PETSC ERROR: Number of entries found in complement 1000 does not match > expected 500 > 1]PETSC ERROR: #1 ISComplement() at > petsc-3.16.5/src/vec/is/is/utils/iscoloring.c:837 > [1]PETSC ERROR: #2 PCSetUp_FieldSplit() at > petsc-3.16.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c:882 > [1]PETSC ERROR: #3 PCSetUp() at > petsc-3.16.5/src/ksp/pc/interface/precon.c:1017 > [1]PETSC ERROR: #4 KSPSetUp() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:408 > [1]PETSC ERROR: #5 KSPSolve_Private() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:852 > [1]PETSC ERROR: #6 KSPSolve() at > petsc-3.16.5/src/ksp/ksp/interface/itfunc.c:1086 > [1]PETSC ERROR: #7 solvePetsc() at coupled/coupledSolver.C:612 > > I am testing with two processors and a 2000x2000 matrix. I have two > fields, phi and rho. The matrix has rows 0-999 for phi and rows 1000-1999 > for rho. Proc0 has rows 0-499 and 1000-1499 while proc1 has rows 500-999 > and 1500-1999. I've attached the ASCII printout of the IS for phi and rho. > Am I right thinking that I have some issue with my IS layouts? > > > I do not understand your explanation. 
Your matrix is 2000x2000, and I > assume split so that > > proc 0 has rows 0 -- 999 > proc 1 has rows 1000 -- 1999 > > Now, when you call PCFieldSplitSetIS(), each process gives an IS which > indicates the dofs _owned by that process_ the contribute to field k. If you > do not give unknowns within the global row bounds for that process, the > ISComplement() call will not work. > > Of course, we should check that the entries are not out of bounds when > they are submitted. if you want to do it, it would be a cool submission. > > Thanks, > > Matt > > > Thank you, > Joshua > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, March 17, 2023 1:22 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > On Mar 17, 2023, at 1:26 PM, Christopher, Joshua > wrote: > > Hi Barry, > > Thank you for your response. I'm a little confused about the relation > between the IS integer values and matrix indices. From > https://petsc.org/release/src/snes/tutorials/ex70.c.html it looks like my > IS should just contain a list of the rows for each split? For example, if I > have a 100x100 matrix with two fields, "rho" and "phi", the first 50 rows > correspond to the "rho" variable and the last 50 correspond to the "phi" > variable. So I should call PCFieldSplitSetIS twice, the first with an IS > containing integers 0-49 and the second with integers 49-99? > PCFieldSplitSetIS is expecting global row numbers, correct? > > > As Mark said, yes this sounds fine. > > > My matrix is organized as one block after another. > > > When you are running in parallel with MPI, how will you organize the > unknowns? Will you have 25 of the rho followed by 25 of phi on each MPI > process? You will need to take this into account when you build the IS on > each MPI process. > > Barry > > > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Tuesday, March 14, 2023 1:35 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > You definitely do not need to use a complicated DM to take advantage of > PCFIELDSPLIT. All you need to do is create two IS on each MPI process. The > first should list all the indices of the degrees of freedom of your first > type of variable and the second should list all the rest of the degrees of > freedom. Then use > https://petsc.org/release/docs/manualpages/PC/PCFieldSplitSetIS/ > > Barry > > Note: PCFIELDSPLIT does not care how you have ordered your degrees of > freedom of the two types. You might interlace them or have all the first > degree of freedom on an MPI process and then have all the second degree of > freedom. This just determines what your IS look like. > > > > On Mar 14, 2023, at 1:14 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello PETSc users, > > I haven't heard back from the library developer regarding the numbering > issue or my questions on using field split operators with their library, so > I need to fix this myself. > > Regarding the natural numbering vs parallel numbering: I haven't figured > out what is wrong here. I stepped through in parallel and it looks like > each processor is setting up the matrix and calling MatSetValue similar to > what is shown in > https://petsc.org/release/src/ksp/ksp/tutorials/ex2.c.html. 
I see that > PETSc is recognizing my simple two-processor test from the output > ("PetscInitialize_Common(): PETSc successfully started: number of > processors = 2"). I'll keep poking at this, however I'm very new to PETSc. > When I print the matrix to ASCII using PETSC_VIEWER_DEFAULT, I'm guessing I > see one row per line, and the tuples consists of the column number and > value? > > On the FieldSplit preconditioner, is my understanding here correct: > > To use FieldSplit, I must have a DM. Since I have an unstructured mesh, I > must use DMPlex and set up the chart and covering relations specific to my > mesh following here: https://petsc.org/release/docs/manual/dmplex/. I > think this may be very time-consuming for me to set up. > > Currently, I already have a matrix stored in a parallel sparse L-D-U > format. I am converting into PETSc's sparse parallel AIJ matrix (traversing > my matrix and using MatSetValues). The weights for my discretization scheme > are already accounted for in the coefficients of my L-D-U matrix. I do have > the submatrices in L-D-U format for each of my two equations' coupling with > each other. That is, the equivalent of lines 242,251-252,254 of example 28 > https://petsc.org/release/src/snes/tutorials/ex28.c.html. Could I > directly convert my submatrices into PETSc's sub-matrix here, then assemble > things together so that the field split preconditioners will work? > > Alternatively, since my L-D-U matrices already account for the > discretization scheme, can I use a simple structured grid DM? > > Thank you so much for your help! > Regards, > Joshua > ------------------------------ > *From:* Pierre Jolivet > *Sent:* Friday, March 3, 2023 11:45 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > For full disclosure, with -ksp_pc_side right -ksp_max_it 100 -ksp_rtol > 1E-10: > 1) with renumbering via ParMETIS > -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_mat_solver_type mumps > => Linear solve converged due to CONVERGED_RTOL iterations 10 > -pc_type hypre -pc_hypre_boomeramg_relax_type_down l1-Gauss-Seidel > -pc_hypre_boomeramg_relax_type_up backward-l1-Gauss-Seidel => Linear solve > converged due to CONVERGED_RTOL iterations 55 > 2) without renumbering via ParMETIS > -pc_type bjacobi => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > -pc_type hypre => Linear solve did not converge due to DIVERGED_ITS > iterations 100 > Using on outer fieldsplit may help fix this. > > Thanks, > Pierre > > On 3 Mar 2023, at 6:24 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I am solving these equations in the context of electrically-driven fluid > flows as that first paper describes. I am using a PIMPLE scheme to advance > the fluid equations in time, and my goal is to do a coupled solve of the > electric equations similar to what is described in this paper: > https://www.sciencedirect.com/science/article/pii/S0045793019302427. They > are using the SIMPLE scheme in this paper. My fluid flow should eventually > reach steady behavior, and likewise the time derivative in the charge > density should trend towards zero. They preferred using BiCGStab with a > direct LU preconditioner for solving their electric equations. I tried to > test that combination, but my case is halting for unknown reasons in the > middle of the PETSc solve. 
I'll try with more nodes and see if I am running > out of memory, but the computer is a little overloaded at the moment so it > may take a while to run. > > I sent Pierre Jolivet my matrix and RHS, and they said the matrix does not > appear to be following a parallel numbering, and instead looks like the > matrix has natural numbering. When they renumbered the system with ParMETIS > they got really fast convergence. I am using PETSc through a library, so I > will reach out to the library authors and see if there is an issue in the > library. > > Thank you, > Joshua > ------------------------------ > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 3:47 PM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > > > > > Are you solving this as a time-dependent problem? Using an implicit > scheme (like backward Euler) for rho ? In ODE language, solving the > differential algebraic equation? > > Is epsilon bounded away from 0? > > On Mar 2, 2023, at 4:22 PM, Christopher, Joshua > wrote: > > Hi Barry and Mark, > > Thank you for looking into my problem. The two equations I am solving with > PETSc are equations 6 and 7 from this paper: > https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf > > I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000 > unknowns). To clarify, I did a direct solve with -ksp_type preonly. They > take a very long time, about 30 minutes for MUMPS and 18 minutes for > SuperLU_DIST, see attached output. For reference, the same matrix took 658 > iterations of BoomerAMG and about 20 seconds of walltime. Maybe I am > already getting a great deal with BoomerAMG! > > I'll try removing some terms from my solve (e.g. removing the second > equation, then making the second equation just the elliptic portion of the > equation, etc.) and try with a simpler geometry. I'll keep you updated as I > run into troubles with that route. I wasn't aware of Field Split > preconditioners, I'll do some reading on them and give them a try as well. > > Thank you again, > Joshua > ------------------------------ > > *From:* Barry Smith > *Sent:* Thursday, March 2, 2023 7:47 AM > *To:* Christopher, Joshua > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre > BoomerAMG > > > Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the > 5,000,000 unknowns? It is at the high end of problem sizes you can do with > direct solvers but is worth comparing with BoomerAMG. You likely want to > use more nodes and fewer cores per node with MUMPs to be able to access > more memory. If you are needing to solve multiple right hand sides but with > the same matrix the factors will be reused resulting in the second and > later solves being much faster. > > I agree with Mark, with iterative solvers you are likely to end up with > PCFIELDSPLIT. > > Barry > > > On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > I am trying to solve the leaky-dielectric model equations with PETSc using > a second-order discretization scheme (with limiting to first order as > needed) using the finite volume method. The leaky dielectric model is a > coupled system of two equations, consisting of a Poisson equation and a > convection-diffusion equation. 
I have tested on small problems with simple > geometry (~1000 DoFs) using: > > -ksp_type gmres > -pc_type hypre > -pc_hypre_type boomeramg > > and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this > in parallel with 2 cores, but also previously was able to use successfully > use a direct solver in serial to solve this problem. When I scale up to my > production problem, I get significantly worse convergence. My production > problem has ~3 million DoFs, more complex geometry, and is solved on ~100 > cores across two nodes. The boundary conditions change a little because of > the geometry, but are of the same classifications (e.g. only Dirichlet and > Neumann). On the production case, I am needing 600-4000 iterations to > converge. I've attached the output from the first solve that took 658 > iterations to converge, using the following output options: > > -ksp_view_pre > -ksp_view > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_test_null_space > > My matrix is non-symmetric, the condition number can be around 10e6, and > the eigenvalues reported by PETSc have been real and positive (using > -ksp_view_eigenvalues). > > I have tried using other preconditions (superlu, mumps, gamg, mg) but > hypre+boomeramg has performed the best so far. The literature seems to > indicate that AMG is the best approach for solving these equations in a > coupled fashion. > > Do you have any advice on speeding up the convergence of this system? > > Thank you, > Joshua > > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Fri Mar 24 13:07:08 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Fri, 24 Mar 2023 18:07:08 +0000 Subject: [petsc-users] GAMG failure Message-ID: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> Hi, I am having issue with GAMG for some very ill-conditioned 2D linearized elasticity problems (sharp variation of elastic moduli with thin regions of nearly incompressible material). I use snes_type newtonls, linesearch_type cp, and pc_type gamg without any further options. pc_type Jacobi converges fine (although slowly of course). I am not really surprised that gamg would not converge out of the box, but don?t know where to start to investigate the convergence failure. Can anybody help? Blaise ? Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 
27243 From jed at jedbrown.org Fri Mar 24 13:47:02 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 24 Mar 2023 12:47:02 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> Message-ID: <87y1nmj8bd.fsf@jedbrown.org> You can -pc_gamg_threshold .02 to slow the coarsening and either stronger smoother or increase number of iterations used for estimation (or increase tolerance). I assume your system is SPD and you've set the near-null space. Blaise Bourdin writes: > Hi, > > I am having issue with GAMG for some very ill-conditioned 2D linearized elasticity problems (sharp variation of elastic moduli with thin regions of nearly incompressible material). I use snes_type newtonls, linesearch_type cp, and pc_type gamg without any further options. pc_type Jacobi converges fine (although slowly of course). > > > I am not really surprised that gamg would not converge out of the box, but don?t know where to start to investigate the convergence failure. Can anybody help? > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From mfadams at lbl.gov Fri Mar 24 14:21:08 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 24 Mar 2023 15:21:08 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: <87y1nmj8bd.fsf@jedbrown.org> References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: * Do you set: PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); Do that to get CG Eigen estimates. Outright failure is usually caused by a bad Eigen estimate. -pc_gamg_esteig_ksp_monitor_singular_value Will print out the estimates as its iterating. You can look at that to check that the max has converged. * -pc_gamg_aggressive_coarsening 0 will slow coarsening as well as threshold. * you can run with '-info :pc' and send me the output (grep on GAMG) Mark On Fri, Mar 24, 2023 at 2:47?PM Jed Brown wrote: > You can -pc_gamg_threshold .02 to slow the coarsening and either stronger > smoother or increase number of iterations used for estimation (or increase > tolerance). I assume your system is SPD and you've set the near-null space. > > Blaise Bourdin writes: > > > Hi, > > > > I am having issue with GAMG for some very ill-conditioned 2D linearized > elasticity problems (sharp variation of elastic moduli with thin regions > of nearly incompressible material). I use snes_type newtonls, > linesearch_type cp, and pc_type gamg without any further options. pc_type > Jacobi converges fine (although slowly of course). > > > > > > I am not really surprised that gamg would not converge out of the box, > but don?t know where to start to investigate the convergence failure. Can > anybody help? > > > > Blaise > > > > ? > > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > > Professor, Department of Mathematics & Statistics > > Hamilton Hall room 409A, McMaster University > > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > -------------- next part -------------- An HTML attachment was scrubbed... 
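A compact sketch of the setup Jed and Mark point at above for the elasticity case (not code from the thread; `coords` is assumed to be a Vec of nodal coordinates with the same blocked layout as the solution vector):

```c
#include <petscmat.h>

static PetscErrorCode SetElasticityNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace nullsp;

  PetscFunctionBeginUser;
  /* Rigid-body modes as the near-null space, so GAMG builds sensible coarse spaces */
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nullsp));
  PetscCall(MatSetNearNullSpace(A, nullsp));
  PetscCall(MatNullSpaceDestroy(&nullsp));
  /* Per Mark's note: lets GAMG use CG for its eigenvalue estimates */
  PetscCall(MatSetOption(A, MAT_SPD, PETSC_TRUE));
  PetscCall(MatSetOption(A, MAT_SPD_ETERNAL, PETSC_TRUE));
  PetscFunctionReturn(0);
}
```

With that in place, the options mentioned above (-pc_gamg_threshold, -pc_gamg_aggressive_coarsening 0, -pc_gamg_esteig_ksp_monitor_singular_value, -info :pc) can be used to slow the coarsening and inspect the eigenvalue estimates.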
URL: From daniele.prada85 at gmail.com Mon Mar 27 05:14:58 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Mon, 27 Mar 2023 12:14:58 +0200 Subject: [petsc-users] Using PETSc Testing System Message-ID: Hello everyone, I would like to use the PETSc Testing System for testing a package that I am developing. I have read the PETSc developer documentation and have written some tests using the PETSc Test Description Language. I am going through the files in ${PETSC_DIR}/config but I am not able to make the testing system look into the directory tree of my project. Any suggestions? Thanks in advance Daniele -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Mon Mar 27 09:13:23 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Mon, 27 Mar 2023 14:13:23 +0000 Subject: [petsc-users] DMSwarm documentation Message-ID: Hello, I am writing to you as I am trying to find documentation about a function that would remove several particles (given their index). I was using: DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); But need something to remove several particles at one time. Petsc.org seems to be down and I was wondering if there was any other way to get this kind of information. Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Mon Mar 27 09:14:36 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Mon, 27 Mar 2023 10:14:36 -0400 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: Message-ID: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Our testing framework was pretty much tailor-made for the PETSc src tree and as such has many hard-coded paths and decisions. I?m going to go out on a limb and say you probably won?t get this to work... That being said, one of the ?base? paths that the testing harness uses to initially find tests is the `TESTSRCDIR` variable in `${PETSC_DIR}/gmakefile.test`. It is currently defined as ``` # TESTSRCDIR is always relative to gmakefile.test # This must be before includes mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) TESTSRCDIR := $(dir $(mkfile_path))src ``` You should start by changing this to ``` # TESTSRCDIR is always relative to gmakefile.test # This must be before includes mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) TESTSRCDIR ?= $(dir $(mkfile_path))src ``` That way you could run your tests via ``` $ make test TESTSRCDIR=/path/to/your/src/dir ``` I am sure there are many other modifications you will need to make. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Mar 27, 2023, at 06:14, Daniele Prada wrote: > > Hello everyone, > > I would like to use the PETSc Testing System for testing a package that I am developing. > > I have read the PETSc developer documentation and have written some tests using the PETSc Test Description Language. I am going through the files in ${PETSC_DIR}/config but I am not able to make the testing system look into the directory tree of my project. > > Any suggestions? 
> > Thanks in advance > Daniele From knepley at gmail.com Mon Mar 27 09:37:49 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 10:37:49 -0400 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch wrote: > Our testing framework was pretty much tailor-made for the PETSc src tree > and as such has many hard-coded paths and decisions. I?m going to go out on > a limb and say you probably won?t get this to work... > I think we can help you get this to work. I have wanted to generalize the test framework for a long time. Everything is build by confg/gmakegentest.py and I think we can get away with just changing paths here and everything will work. Thanks! Matt > That being said, one of the ?base? paths that the testing harness uses to > initially find tests is the `TESTSRCDIR` variable in > `${PETSC_DIR}/gmakefile.test`. It is currently defined as > ``` > # TESTSRCDIR is always relative to gmakefile.test > # This must be before includes > mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > TESTSRCDIR := $(dir $(mkfile_path))src > ``` > You should start by changing this to > ``` > # TESTSRCDIR is always relative to gmakefile.test > # This must be before includes > mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > TESTSRCDIR ?= $(dir $(mkfile_path))src > ``` > That way you could run your tests via > ``` > $ make test TESTSRCDIR=/path/to/your/src/dir > ``` > I am sure there are many other modifications you will need to make. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Mar 27, 2023, at 06:14, Daniele Prada > wrote: > > > > Hello everyone, > > > > I would like to use the PETSc Testing System for testing a package that > I am developing. > > > > I have read the PETSc developer documentation and have written some > tests using the PETSc Test Description Language. I am going through the > files in ${PETSC_DIR}/config but I am not able to make the testing system > look into the directory tree of my project. > > > > Any suggestions? > > > > Thanks in advance > > Daniele > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Mar 27 09:51:15 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Mar 2023 10:51:15 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm documentation In-Reply-To: References: Message-ID: <75E0AEDB-D0D0-497A-BB1F-90CD25175382@petsc.dev> petsc.org can be flaky and hang for a few seconds or not respond occasionally but trying again should work. Barry > On Mar 27, 2023, at 10:13 AM, Joauma Marichal wrote: > > Hello, > > I am writing to you as I am trying to find documentation about a function that would remove several particles (given their index). I was using: > DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); > But need something to remove several particles at one time. > > Petsc.org seems to be down and I was wondering if there was any other way to get this kind of information. > > Thanks a lot for your help. > > Best regards, > > Joauma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From facklerpw at ornl.gov Mon Mar 27 12:23:28 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 27 Mar 2023 17:23:28 +0000 Subject: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device. In-Reply-To: References: Message-ID: Junchao, I'm realizing I left you hanging in this email thread. Thank you so much for addressing the problem. I have tested it (successfully) using one process and one GPU. I'm still attempting to test with multiple GPUs (one per task) on another machine. I'll let you know if I see any more trouble. Thanks again, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Tuesday, February 7, 2023 16:26 To: Fackler, Philip Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov ; Blondel, Sophie ; Roth, Philip Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I believe this MR https://gitlab.com/petsc/petsc/-/merge_requests/6030 would fix the problem. It is a fix to petsc/release, but you can cherry-pick it to petsc/main. Could you try that in your case? Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:31 AM Junchao Zhang > wrote: Sorry, no progress. I guess that is because a vector was gotten but not restored (e.g., VecRestoreArray() etc), causing host and device data not synced. Maybe in your code, or in petsc code. After the ECP AM, I will have more time on this bug. Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:00 AM Fackler, Philip > wrote: Any progress on this? Any info/help needed? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Fackler, Philip > Sent: Thursday, December 8, 2022 09:07 To: Junchao Zhang > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Great! Thank you! Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Wednesday, December 7, 2022 18:47 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I could reproduce the error. I need to find a way to debug it. Thanks. /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10 *** 1 failure is detected in the test module "Regression" --Junchao Zhang On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip > wrote: I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run: ./test/system/SystemTester -t System/PSI_1 -- -v? 
(No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang > wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip > wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. 
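Junchao's earlier guess in this thread was a vector "gotten but not restored", which with the CUDA/Kokkos backends leaves the host and device copies out of sync. As a small, self-contained reminder of the intended get/restore pairing (a generic sketch, not code from xolotl or PETSc itself):

```c
#include <petscvec.h>

static PetscErrorCode ScaleAndPeek(Vec x, PetscScalar alpha)
{
  PetscScalar       *w;
  const PetscScalar *r;
  PetscInt           n;

  PetscFunctionBeginUser;
  PetscCall(VecGetLocalSize(x, &n));
  PetscCall(VecGetArray(x, &w)); /* write access: host copy is checked out */
  for (PetscInt i = 0; i < n; i++) w[i] *= alpha;
  PetscCall(VecRestoreArray(x, &w)); /* tells PETSc the host copy changed, so the device copy is refreshed before the next GPU use */

  PetscCall(VecGetArrayRead(x, &r)); /* read-only access avoids marking the vector as modified */
  if (n) PetscCall(PetscPrintf(PETSC_COMM_SELF, "first local entry: %g\n", (double)PetscRealPart(r[0])));
  PetscCall(VecRestoreArrayRead(x, &r));
  PetscFunctionReturn(0);
}
```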
--Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip > wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
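For reference, the per-stage breakdown in a -log_view report like the one above comes from user-registered logging stages (PetscLogStagePush()/PetscLogStagePop(), as the legend notes). Below is a minimal sketch of registering and timing one stage; it is not taken from Xolotl and the names are illustrative.

#include <petsc.h>

int main(int argc, char **argv)
{
  PetscLogStage solve_stage;
  Vec           x;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* A named stage gets its own section in the -log_view event table,
     separating (say) solve cost from setup cost. */
  PetscCall(PetscLogStageRegister("MySolve", &solve_stage));

  PetscCall(VecCreateSeq(PETSC_COMM_SELF, 100, &x));

  PetscCall(PetscLogStagePush(solve_stage));
  PetscCall(VecSet(x, 1.0)); /* stand-in for the real work being profiled */
  PetscCall(PetscLogStagePop());

  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize()); /* the -log_view table is printed here */
  return 0;
}

Running the executable with -log_view (plus -log_view_gpu_times for the GPU columns) produces a report of the same form as the one above.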
________________________________ From: Junchao Zhang > Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Roth, Philip > Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip > wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip > Cc: xolotl-psi-development at lists.sourceforge.net >; petsc-users at mcs.anl.gov >; Blondel, Sophie >; Zhang, Junchao >; Roth, Philip > Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users > wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Mon Mar 27 16:06:11 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Mon, 27 Mar 2023 21:06:11 +0000 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Mar 27 16:32:16 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 27 Mar 2023 15:32:16 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: <87lejhnan3.fsf@jedbrown.org> Try -pc_gamg_reuse_interpolation 0. I thought this was disabled by default, but I see pc_gamg->reuse_prol = PETSC_TRUE in the code. 
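For reference, the option Jed suggests can also be set in code. A minimal sketch, assuming ksp is a KSP whose preconditioner is GAMG; PCGAMGSetReuseInterpolation() is, as far as I know, the call behind -pc_gamg_reuse_interpolation, but treat the snippet as illustrative rather than as the exact fix being proposed.

#include <petscksp.h>

/* Sketch: force GAMG to rebuild its interpolation whenever the matrix changes,
   i.e. the programmatic equivalent of -pc_gamg_reuse_interpolation 0. */
static PetscErrorCode ConfigureGAMGNoReuse(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));
  PetscCall(PCGAMGSetReuseInterpolation(pc, PETSC_FALSE));
  PetscCall(KSPSetFromOptions(ksp)); /* command-line options still take precedence */
  PetscFunctionReturn(0);
}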
Blaise Bourdin writes: > On Mar 24, 2023, at 3:21 PM, Mark Adams wrote: > > * Do you set: > > PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); > > PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); > > Yes > > Do that to get CG Eigen estimates. Outright failure is usually caused by a bad Eigen estimate. > -pc_gamg_esteig_ksp_monitor_singular_value > Will print out the estimates as its iterating. You can look at that to check that the max has converged. > > I just did, and something is off: > I do multiple calls to SNESSolve (staggered scheme for phase-field fracture), but only get informations on the first solve (which is > not the one failing, of course) > Here is what I get: > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 7.636421712860e+01 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min 1.114319928921e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min 5.739351119078e-01 max/min > 2.615528402732e+00 > 3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min 3.263683482596e-01 max/min > 5.038329074347e+00 > 4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min 2.044075812142e-01 max/min > 8.388479789416e+00 > 5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min 1.409327403114e-01 max/min > 1.248878386367e+01 > 6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min 1.023484740555e-01 max/min > 1.751853463603e+01 > 7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min 7.818018001928e-02 max/min > 2.329871248104e+01 > 8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min 6.178572472468e-02 max/min > 3.002538337458e+01 > 9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min 5.008612895206e-02 max/min > 3.800609738466e+01 > 10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min 4.130572033722e-02 max/min > 4.748390475004e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 2.373573910237e+02 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min 1.081192207576e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min 5.768825326129e-01 max/min > 2.333138869267e+00 > 3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min 3.070603564913e-01 max/min > 4.823655974348e+00 > 4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min 1.953487694959e-01 max/min > 7.941563771968e+00 > 5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min 1.313871480574e-01 max/min > 1.221267391199e+01 > 6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min 9.322735730688e-02 max/min > 1.770389646804e+01 > 7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min 6.945419058256e-02 max/min > 2.443687883140e+01 > 8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min 5.319942519758e-02 max/min > 3.265624999621e+01 > 9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min 4.229776586667e-02 max/min > 4.188657656771e+01 > 10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min 3.460455235448e-02 max/min > 5.209818645753e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 1.361990679391e+03 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min 1.086812916769e+00 max/min > 1.000000000000e+00 > 2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min 6.475176340551e-01 max/min > 2.276745994212e+00 > 3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min 4.391851040105e-01 max/min > 3.747963441500e+00 > 4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min 2.972886103795e-01 max/min > 5.921528614526e+00 > 5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min 1.921591413785e-01 max/min > 9.571494061277e+00 > 6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min 1.275920366984e-01 max/min > 1.478981048966e+01 > 7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min 8.853577120467e-02 max/min > 2.166233300122e+01 > 8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min 6.706949937710e-02 max/min > 2.886345093206e+01 > 9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min 5.354310733090e-02 max/min > 3.635059671458e+01 > 10 KSP Residual norm 7.138356892221e+01 % max 1.954382723686e+00 min 4.367661484659e-02 max/min > 4.474666204216e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 3.702300162209e+03 % max 1.000000000000e+00 min 1.000000000000e+00 max/min > 1.000000000000e+00 > 1 KSP Residual norm 1.255008322497e+03 % max 9.938792139169e-01 min 9.938792139169e-01 max/min > 1.000000000000e+00 > 2 KSP Residual norm 6.727201181977e+02 % max 1.297844907149e+00 min 6.478406586220e-01 max/min > 2.003339694532e+00 > 3 KSP Residual norm 5.218419298230e+02 % max 1.435817121668e+00 min 3.868381643086e-01 max/min > 3.711673909512e+00 > 4 KSP Residual norm 4.562548407646e+02 % max 1.507841675332e+00 min 1.835807205925e-01 max/min > 8.213507771759e+00 > 5 KSP Residual norm 3.829651184063e+02 % max 1.544809112105e+00 min 9.645201420491e-02 max/min > 1.601634890510e+01 > 6 KSP Residual norm 2.858162778588e+02 % max 1.571662611009e+00 min 6.326714268751e-02 max/min > 2.484168786904e+01 > 7 KSP Residual norm 2.074805889949e+02 % max 1.587767457742e+00 min 5.145942909400e-02 max/min > 3.085474296347e+01 > 8 KSP Residual norm 1.566220417755e+02 % max 1.597548616381e+00 min 4.650092979233e-02 max/min > 3.435519727274e+01 > 9 KSP Residual norm 1.157894309297e+02 % max 1.603863600136e+00 min 4.344076378399e-02 max/min > 3.692070443585e+01 > 10 KSP Residual norm 8.447209442299e+01 % max 1.608204129656e+00 min 4.123402730882e-02 max/min > 3.900186895670e+01 > Linear Displacement_ solve converged due to CONVERGED_RTOL iterations 14 > > * -pc_gamg_aggressive_coarsening 0 > > will slow coarsening as well as threshold. > > That did not help > > * you can run with '-info :pc' and send me the output (grep on GAMG) > > Let?s try to figure out if the fact that -pc_gamg_esteig_ksp_monitor_singular_value is an indication of a problem first. > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 
27243 From knepley at gmail.com Mon Mar 27 19:36:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 20:36:34 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm documentation In-Reply-To: References: Message-ID: On Mon, Mar 27, 2023 at 10:19?AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Hello, > > > > I am writing to you as I am trying to find documentation about a function > that would remove several particles (given their index). I was using: > > DMSwarmRemovePointAtIndex(*swarm, to_remove[p]); > > But need something to remove several particles at one time. > There are no functions taking a list of points to remove. Thanks, Matt > Petsc.org seems to be down and I was wondering if there was any other way > to get this kind of information. > > > > Thanks a lot for your help. > > Best regards, > > > > Joauma > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Mar 27 19:48:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Mar 2023 20:48:54 -0400 Subject: [petsc-users] Petsc DMLabel Fortran Stub request In-Reply-To: References: Message-ID: On Fri, Jan 6, 2023 at 10:03?AM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Petsc Users > I apologize. I found this email today and it looks like no one answered. > I am trying to use the sequence of > call DMLabelPropagateBegin(synchLabel,sf,ierr) > call > DMLabelPropagatePush(synchLabel,sf,PETSC_NULL_OPTIONS,PETSC_NULL_INTEGER,ierr) > call DMLabelPropagateEnd(synchLabel,sf, ierr) > in fortran. > > I apologize if I messed something up, it appears as if the > DMLabelPropagatePush command doesn't have an appropriate Fortran interface > as I get an undefined reference when it is called. > Yes, it takes a function pointer, and using function pointers with Fortran is not easy, although it can be done. It might be better to create a C function with some default marking and then wrap that. What do you want to do? Thanks, Matt > I would appreciate any assistance. > > As a side note in practice, what is the proper Fortran NULL pointer to use > for void arguments? I used an integer one temporarily to get to the > undefined reference error but I assume it doesn't matter? > > > Sincerely > Nicholas > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Mar 27 20:11:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 27 Mar 2023 21:11:57 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: Yes, the eigen estimates are converging slowly. BTW, have you tried hypre? It is a good solver (lots lots more woman years) These eigen estimates are conceptually simple, but they can lead to problems like this (hypre and an eigen estimate free smoother). 
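For reference, a minimal sketch of trying hypre as suggested above, assuming PETSc was configured with hypre (e.g. --download-hypre); choosing BoomerAMG here is only an example, not a recommendation specific to this problem.

#include <petscksp.h>

/* Sketch: switch an existing KSP from GAMG to hypre BoomerAMG.
   Command-line equivalent: -pc_type hypre -pc_hypre_type boomeramg */
static PetscErrorCode UseBoomerAMG(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCHYPRE));
  PetscCall(PCHYPRESetType(pc, "boomeramg"));
  PetscCall(KSPSetFromOptions(ksp));
  PetscFunctionReturn(0);
}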
But try this (good to have options anyway): -pc_gamg_esteig_ksp_max_it 20 Chevy will scale the estimate that we give by, I think, 5% by default. Maybe 10. You can set that with: -mg_levels_ksp_chebyshev_esteig 0,0.2,0,*1.05* 0.2 is the scaling of the high eigen estimate for the low eigen value in Chebyshev. On Mon, Mar 27, 2023 at 5:06?PM Blaise Bourdin wrote: > > > On Mar 24, 2023, at 3:21 PM, Mark Adams wrote: > > * Do you set: > > PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE)); > > PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE)); > > > Yes > > > Do that to get CG Eigen estimates. Outright failure is usually caused by a > bad Eigen estimate. > -pc_gamg_esteig_ksp_monitor_singular_value > Will print out the estimates as its iterating. You can look at that to > check that the max has converged. > > > I just did, and something is off: > I do multiple calls to SNESSolve (staggered scheme for phase-field > fracture), but only get informations on the first solve (which is not the > one failing, of course) > Here is what I get: > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 7.636421712860e+01 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min > 1.114319928921e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min > 5.739351119078e-01 max/min 2.615528402732e+00 > 3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min > 3.263683482596e-01 max/min 5.038329074347e+00 > 4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min > 2.044075812142e-01 max/min 8.388479789416e+00 > 5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min > 1.409327403114e-01 max/min 1.248878386367e+01 > 6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min > 1.023484740555e-01 max/min 1.751853463603e+01 > 7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min > 7.818018001928e-02 max/min 2.329871248104e+01 > 8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min > 6.178572472468e-02 max/min 3.002538337458e+01 > 9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min > 5.008612895206e-02 max/min 3.800609738466e+01 > 10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min > 4.130572033722e-02 max/min 4.748390475004e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 2.373573910237e+02 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min > 1.081192207576e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min > 5.768825326129e-01 max/min 2.333138869267e+00 > 3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min > 3.070603564913e-01 max/min 4.823655974348e+00 > 4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min > 1.953487694959e-01 max/min 7.941563771968e+00 > 5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min > 1.313871480574e-01 max/min 1.221267391199e+01 > 6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min > 9.322735730688e-02 max/min 1.770389646804e+01 > 7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min > 6.945419058256e-02 max/min 2.443687883140e+01 > 8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min > 5.319942519758e-02 max/min 3.265624999621e+01 > 9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min > 4.229776586667e-02 max/min 4.188657656771e+01 > 10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min > 3.460455235448e-02 max/min 5.209818645753e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. > 0 KSP Residual norm 1.361990679391e+03 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min > 1.086812916769e+00 max/min 1.000000000000e+00 > 2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min > 6.475176340551e-01 max/min 2.276745994212e+00 > 3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min > 4.391851040105e-01 max/min 3.747963441500e+00 > 4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min > 2.972886103795e-01 max/min 5.921528614526e+00 > 5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min > 1.921591413785e-01 max/min 9.571494061277e+00 > 6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min > 1.275920366984e-01 max/min 1.478981048966e+01 > 7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min > 8.853577120467e-02 max/min 2.166233300122e+01 > 8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min > 6.706949937710e-02 max/min 2.886345093206e+01 > 9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min > 5.354310733090e-02 max/min 3.635059671458e+01 > 10 KSP Residual norm 7.138356892221e+01 % max 1.954382723686e+00 min > 4.367661484659e-02 max/min 4.474666204216e+01 > Residual norms for Displacement_pc_gamg_esteig_ solve. 
> 0 KSP Residual norm 3.702300162209e+03 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP Residual norm 1.255008322497e+03 % max 9.938792139169e-01 min > 9.938792139169e-01 max/min 1.000000000000e+00 > 2 KSP Residual norm 6.727201181977e+02 % max 1.297844907149e+00 min > 6.478406586220e-01 max/min 2.003339694532e+00 > 3 KSP Residual norm 5.218419298230e+02 % max 1.435817121668e+00 min > 3.868381643086e-01 max/min 3.711673909512e+00 > 4 KSP Residual norm 4.562548407646e+02 % max 1.507841675332e+00 min > 1.835807205925e-01 max/min 8.213507771759e+00 > 5 KSP Residual norm 3.829651184063e+02 % max 1.544809112105e+00 min > 9.645201420491e-02 max/min 1.601634890510e+01 > 6 KSP Residual norm 2.858162778588e+02 % max 1.571662611009e+00 min > 6.326714268751e-02 max/min 2.484168786904e+01 > 7 KSP Residual norm 2.074805889949e+02 % max 1.587767457742e+00 min > 5.145942909400e-02 max/min 3.085474296347e+01 > 8 KSP Residual norm 1.566220417755e+02 % max 1.597548616381e+00 min > 4.650092979233e-02 max/min 3.435519727274e+01 > 9 KSP Residual norm 1.157894309297e+02 % max 1.603863600136e+00 min > 4.344076378399e-02 max/min 3.692070443585e+01 > 10 KSP Residual norm 8.447209442299e+01 % max 1.608204129656e+00 min > 4.123402730882e-02 max/min 3.900186895670e+01 > Linear Displacement_ solve converged due to CONVERGED_RTOL iterations 14 > > > > > > * -pc_gamg_aggressive_coarsening 0 > > will slow coarsening as well as threshold. > > That did not help > > > * you can run with '-info :pc' and send me the output (grep on GAMG) > > Let?s try to figure out if the fact > that -pc_gamg_esteig_ksp_monitor_singular_value is an indication of a > problem first. > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele.prada85 at gmail.com Tue Mar 28 10:24:15 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Tue, 28 Mar 2023 17:24:15 +0200 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: Dear Matthew, dear Jacob, Thank you very much for your useful remarks. I managed to use the PETSc Testing System by doing as follows: 1. Redefined TESTDIR when running make 2. Used a project tree similar to that of PETSc. For examples, tests for 'package1' are in $MYLIB/src/package1/tests/ 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test Inside gmakefile.test: 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have $MYLIB exported (note: this affects TESTSRCDIR) 5. Redefined variable pkgs as "pkgs := package1" 6. Introduced a few variables to make PETSC_COMPILE.c work: CFLAGS := -I$(MYLIB)/include LDFLAGS = -L$(MYLIB)/lib LDLIBS = -lmylib 7. Changed the call to gmakegentest.py as follows $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src --pkg-pkgs=$(pkgs) 8. Changed the rule $(testexe.c) as follows: $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) $(LDLIBS) 9. 
Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) when calling query_tests.py, for example: TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') What do you think? Best, Daniele On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley wrote: > On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch > wrote: > >> Our testing framework was pretty much tailor-made for the PETSc src tree >> and as such has many hard-coded paths and decisions. I?m going to go out on >> a limb and say you probably won?t get this to work... >> > > I think we can help you get this to work. I have wanted to generalize the > test framework for a long time. Everything is build by > > confg/gmakegentest.py > > and I think we can get away with just changing paths here and everything > will work. > > Thanks! > > Matt > > >> That being said, one of the ?base? paths that the testing harness uses to >> initially find tests is the `TESTSRCDIR` variable in >> `${PETSC_DIR}/gmakefile.test`. It is currently defined as >> ``` >> # TESTSRCDIR is always relative to gmakefile.test >> # This must be before includes >> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >> TESTSRCDIR := $(dir $(mkfile_path))src >> ``` >> You should start by changing this to >> ``` >> # TESTSRCDIR is always relative to gmakefile.test >> # This must be before includes >> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >> TESTSRCDIR ?= $(dir $(mkfile_path))src >> ``` >> That way you could run your tests via >> ``` >> $ make test TESTSRCDIR=/path/to/your/src/dir >> ``` >> I am sure there are many other modifications you will need to make. >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> > On Mar 27, 2023, at 06:14, Daniele Prada >> wrote: >> > >> > Hello everyone, >> > >> > I would like to use the PETSc Testing System for testing a package that >> I am developing. >> > >> > I have read the PETSc developer documentation and have written some >> tests using the PETSc Test Description Language. I am going through the >> files in ${PETSC_DIR}/config but I am not able to make the testing system >> look into the directory tree of my project. >> > >> > Any suggestions? >> > >> > Thanks in advance >> > Daniele >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Lutsko at ulb.be Tue Mar 28 10:32:37 2023 From: Jim.Lutsko at ulb.be (LUTSKO James) Date: Tue, 28 Mar 2023 15:32:37 +0000 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector Message-ID: Hello, I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. 
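For reference, a minimal sketch of seeding a new run with an eigenvector stored by a previous one. In the SLEPc versions I know the call is EPSSetInitialSpace(), which takes an array of Vecs and is presumably what EPSSetInitialSubspace refers to above; the file name and helper function are illustrative only.

#include <slepceps.h>

/* Sketch: load a previously saved eigenvector and use it as the initial space.
   The earlier run would have written it with VecView() into a binary viewer. */
static PetscErrorCode RestartFromSavedVector(EPS eps, Mat A)
{
  Vec         x0;
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(MatCreateVecs(A, &x0, NULL));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "eigvec.dat", FILE_MODE_READ, &viewer));
  PetscCall(VecLoad(x0, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(EPSSetInitialSpace(eps, 1, &x0)); /* one vector is enough here */
  PetscCall(VecDestroy(&x0));

  PetscCall(EPSSetFromOptions(eps)); /* keeps -eps_type jd etc. from the command line */
  PetscCall(EPSSolve(eps));
  PetscFunctionReturn(0);
}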
thanks jim % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be % fax: +32-2-650-5767 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Tue Mar 28 11:38:54 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Tue, 28 Mar 2023 16:38:54 +0000 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Mar 28 12:18:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 Mar 2023 11:18:50 -0600 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: <87v8iklrph.fsf@jedbrown.org> This suite has been good for my solid mechanics solvers. (It's written here as a coarse grid solver because we do matrix-free p-MG first, but you can use it directly.) https://github.com/hypre-space/hypre/issues/601#issuecomment-1069426997 Blaise Bourdin writes: > On Mar 27, 2023, at 9:11 PM, Mark Adams wrote: > > Yes, the eigen estimates are converging slowly. > > BTW, have you tried hypre? It is a good solver (lots lots more woman years) > These eigen estimates are conceptually simple, but they can lead to problems like this (hypre and an eigen estimate free > smoother). > > I just moved from petsc 3.3 to main, so my experience with an old version of hyper has not been very convincing. Strangely > enough, ML has always been the most efficient PC for me. Maybe it?s time to revisit. > That said, I would really like to get decent performances out of gamg. One day, I?d like to be able to account for the special structure > of phase-field fracture in the construction of the coarse space. > > But try this (good to have options anyway): > > -pc_gamg_esteig_ksp_max_it 20 > > Chevy will scale the estimate that we give by, I think, 5% by default. Maybe 10. > You can set that with: > > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 > > 0.2 is the scaling of the high eigen estimate for the low eigen value in Chebyshev. > > Jed?s suggestion of using -pc_gamg_reuse_interpolation 0 worked. I am testing your options at the moment. > > Thanks a lot, > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From jed at jedbrown.org Tue Mar 28 12:27:58 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 Mar 2023 11:27:58 -0600 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> Message-ID: <87sfdolra9.fsf@jedbrown.org> Great that you got it working. We would accept a merge request that made our infrastructure less PETSc-specific so long as it doesn't push more complexity on the end user. That would likely make it easier for you to pull updates in the future. Daniele Prada writes: > Dear Matthew, dear Jacob, > > Thank you very much for your useful remarks. I managed to use the PETSc > Testing System by doing as follows: > > 1. Redefined TESTDIR when running make > 2. Used a project tree similar to that of PETSc. 
For examples, tests for > 'package1' are in $MYLIB/src/package1/tests/ > 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test > > Inside gmakefile.test: > > 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have > $MYLIB exported (note: this affects TESTSRCDIR) > 5. Redefined variable pkgs as "pkgs := package1" > 6. Introduced a few variables to make PETSC_COMPILE.c work: > > CFLAGS := -I$(MYLIB)/include > LDFLAGS = -L$(MYLIB)/lib > LDLIBS = -lmylib > > 7. Changed the call to gmakegentest.py as follows > > $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) > --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src > --pkg-pkgs=$(pkgs) > > 8. Changed the rule $(testexe.c) as follows: > > $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) > $(LDLIBS) > > 9. Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) > when calling query_tests.py, for example: > > TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py > --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) > --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') > > > > What do you think? > > Best, > Daniele > > On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley wrote: > >> On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch >> wrote: >> >>> Our testing framework was pretty much tailor-made for the PETSc src tree >>> and as such has many hard-coded paths and decisions. I?m going to go out on >>> a limb and say you probably won?t get this to work... >>> >> >> I think we can help you get this to work. I have wanted to generalize the >> test framework for a long time. Everything is build by >> >> confg/gmakegentest.py >> >> and I think we can get away with just changing paths here and everything >> will work. >> >> Thanks! >> >> Matt >> >> >>> That being said, one of the ?base? paths that the testing harness uses to >>> initially find tests is the `TESTSRCDIR` variable in >>> `${PETSC_DIR}/gmakefile.test`. It is currently defined as >>> ``` >>> # TESTSRCDIR is always relative to gmakefile.test >>> # This must be before includes >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >>> TESTSRCDIR := $(dir $(mkfile_path))src >>> ``` >>> You should start by changing this to >>> ``` >>> # TESTSRCDIR is always relative to gmakefile.test >>> # This must be before includes >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) >>> TESTSRCDIR ?= $(dir $(mkfile_path))src >>> ``` >>> That way you could run your tests via >>> ``` >>> $ make test TESTSRCDIR=/path/to/your/src/dir >>> ``` >>> I am sure there are many other modifications you will need to make. >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> >>> > On Mar 27, 2023, at 06:14, Daniele Prada >>> wrote: >>> > >>> > Hello everyone, >>> > >>> > I would like to use the PETSc Testing System for testing a package that >>> I am developing. >>> > >>> > I have read the PETSc developer documentation and have written some >>> tests using the PETSc Test Description Language. I am going through the >>> files in ${PETSC_DIR}/config but I am not able to make the testing system >>> look into the directory tree of my project. >>> > >>> > Any suggestions? >>> > >>> > Thanks in advance >>> > Daniele >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> From jroman at dsic.upv.es Tue Mar 28 12:43:33 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 28 Mar 2023 19:43:33 +0200 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: What do you mean that EPSSetInitialSubspace() does not work? Doesn't it improve convergence with respect to not using it? Is your smallest eigenvalue positive? And do you have many eigenvalues close to the smallest one? Jose > El 28 mar 2023, a las 17:32, LUTSKO James via petsc-users escribi?: > > Hello, > I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options > > -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 > > Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. > > thanks > jim > > % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, > % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium > % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be > % fax: +32-2-650-5767 From narnoldm at umich.edu Tue Mar 28 14:28:53 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Tue, 28 Mar 2023 15:28:53 -0400 Subject: [petsc-users] Petsc DMLabel Fortran Stub request In-Reply-To: References: Message-ID: Hi Matthew Thanks for checking in on this. Fortunately, I was able to get the behavior I needed through alternate means so I probably wouldn't investigate doing this further. Sincerely Nicholas On Mon, Mar 27, 2023 at 8:49?PM Matthew Knepley wrote: > On Fri, Jan 6, 2023 at 10:03?AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Petsc Users >> > > I apologize. I found this email today and it looks like no one answered. > > >> I am trying to use the sequence of >> call DMLabelPropagateBegin(synchLabel,sf,ierr) >> call >> DMLabelPropagatePush(synchLabel,sf,PETSC_NULL_OPTIONS,PETSC_NULL_INTEGER,ierr) >> call DMLabelPropagateEnd(synchLabel,sf, ierr) >> in fortran. >> >> I apologize if I messed something up, it appears as if the >> DMLabelPropagatePush command doesn't have an appropriate Fortran interface >> as I get an undefined reference when it is called. >> > > Yes, it takes a function pointer, and using function pointers with Fortran > is not easy, although it can be done. It might be better to create a C > function with some default marking and then wrap that. What do you want to > do? > > Thanks, > > Matt > > >> I would appreciate any assistance. >> >> As a side note in practice, what is the proper Fortran NULL pointer to >> use for void arguments? I used an integer one temporarily to get to the >> undefined reference error but I assume it doesn't matter? >> >> >> Sincerely >> Nicholas >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. 
Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele.prada85 at gmail.com Tue Mar 28 16:51:38 2023 From: daniele.prada85 at gmail.com (Daniele Prada) Date: Tue, 28 Mar 2023 23:51:38 +0200 Subject: [petsc-users] Using PETSc Testing System In-Reply-To: <87sfdolra9.fsf@jedbrown.org> References: <8F636F03-6581-4594-877F-CB0A4AC91EA3@gmail.com> <87sfdolra9.fsf@jedbrown.org> Message-ID: Will do, thanks! On Tue, Mar 28, 2023 at 7:28?PM Jed Brown wrote: > Great that you got it working. We would accept a merge request that made > our infrastructure less PETSc-specific so long as it doesn't push more > complexity on the end user. That would likely make it easier for you to > pull updates in the future. > > Daniele Prada writes: > > > Dear Matthew, dear Jacob, > > > > Thank you very much for your useful remarks. I managed to use the PETSc > > Testing System by doing as follows: > > > > 1. Redefined TESTDIR when running make > > 2. Used a project tree similar to that of PETSc. For examples, tests for > > 'package1' are in $MYLIB/src/package1/tests/ > > 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test > > > > Inside gmakefile.test: > > > > 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have > > $MYLIB exported (note: this affects TESTSRCDIR) > > 5. Redefined variable pkgs as "pkgs := package1" > > 6. Introduced a few variables to make PETSC_COMPILE.c work: > > > > CFLAGS := -I$(MYLIB)/include > > LDFLAGS = -L$(MYLIB)/lib > > LDLIBS = -lmylib > > > > 7. Changed the call to gmakegentest.py as follows > > > > $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR) > > --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src > > --pkg-pkgs=$(pkgs) > > > > 8. Changed the rule $(testexe.c) as follows: > > > > $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB) > > $(LDLIBS) > > > > 9. Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB) > > when calling query_tests.py, for example: > > > > TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py > > --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB) > > --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)') > > > > > > > > What do you think? > > > > Best, > > Daniele > > > > On Mon, Mar 27, 2023 at 4:38?PM Matthew Knepley > wrote: > > > >> On Mon, Mar 27, 2023 at 10:19?AM Jacob Faibussowitsch < > jacob.fai at gmail.com> > >> wrote: > >> > >>> Our testing framework was pretty much tailor-made for the PETSc src > tree > >>> and as such has many hard-coded paths and decisions. I?m going to go > out on > >>> a limb and say you probably won?t get this to work... > >>> > >> > >> I think we can help you get this to work. I have wanted to generalize > the > >> test framework for a long time. Everything is build by > >> > >> confg/gmakegentest.py > >> > >> and I think we can get away with just changing paths here and everything > >> will work. > >> > >> Thanks! > >> > >> Matt > >> > >> > >>> That being said, one of the ?base? paths that the testing harness uses > to > >>> initially find tests is the `TESTSRCDIR` variable in > >>> `${PETSC_DIR}/gmakefile.test`. 
It is currently defined as > >>> ``` > >>> # TESTSRCDIR is always relative to gmakefile.test > >>> # This must be before includes > >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > >>> TESTSRCDIR := $(dir $(mkfile_path))src > >>> ``` > >>> You should start by changing this to > >>> ``` > >>> # TESTSRCDIR is always relative to gmakefile.test > >>> # This must be before includes > >>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) > >>> TESTSRCDIR ?= $(dir $(mkfile_path))src > >>> ``` > >>> That way you could run your tests via > >>> ``` > >>> $ make test TESTSRCDIR=/path/to/your/src/dir > >>> ``` > >>> I am sure there are many other modifications you will need to make. > >>> > >>> Best regards, > >>> > >>> Jacob Faibussowitsch > >>> (Jacob Fai - booss - oh - vitch) > >>> > >>> > On Mar 27, 2023, at 06:14, Daniele Prada > >>> wrote: > >>> > > >>> > Hello everyone, > >>> > > >>> > I would like to use the PETSc Testing System for testing a package > that > >>> I am developing. > >>> > > >>> > I have read the PETSc developer documentation and have written some > >>> tests using the PETSc Test Description Language. I am going through the > >>> files in ${PETSC_DIR}/config but I am not able to make the testing > system > >>> look into the directory tree of my project. > >>> > > >>> > Any suggestions? > >>> > > >>> > Thanks in advance > >>> > Daniele > >>> > >>> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Mar 28 19:46:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 28 Mar 2023 20:46:57 -0400 Subject: [petsc-users] GAMG failure In-Reply-To: References: <1FD5FB62-4111-4376-8126-CFA8E8925620@mcmaster.ca> <87y1nmj8bd.fsf@jedbrown.org> Message-ID: On Tue, Mar 28, 2023 at 12:38?PM Blaise Bourdin wrote: > > > On Mar 27, 2023, at 9:11 PM, Mark Adams wrote: > > Yes, the eigen estimates are converging slowly. > > BTW, have you tried hypre? It is a good solver (lots lots more woman years) > These eigen estimates are conceptually simple, but they can lead to > problems like this (hypre and an eigen estimate free smoother). > > I just moved from petsc 3.3 to main, so my experience with an old version > of hyper has not been very convincing. Strangely enough, ML has always been > the most efficient PC for me. > ML is a good solver. > Maybe it?s time to revisit. > That said, I would really like to get decent performances out of gamg. One > day, I?d like to be able to account for the special structure of > phase-field fracture in the construction of the coarse space. > > > But try this (good to have options anyway): > > -pc_gamg_esteig_ksp_max_it 20 > > Chevy will scale the estimate that we give by, I think, 5% by default. > Maybe 10. > You can set that with: > > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,*1.05* > > 0.2 is the scaling of the high eigen estimate for the low eigen value in > Chebyshev. > > > > Jed?s suggestion of using -pc_gamg_reuse_interpolation 0 worked. > OK, have to admit I am surprised. But I guess with your fracture the matrix/physics/dynamics does change a lot > I am testing your options at the moment. > There are a lot of options and it is cumbersome but they are finite and good to know. 
Glad its working, > > Thanks a lot, > > Blaise > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jim.Lutsko at ulb.be Wed Mar 29 02:58:47 2023 From: Jim.Lutsko at ulb.be (LUTSKO James) Date: Wed, 29 Mar 2023 07:58:47 +0000 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: I withdraw my question. In trying to put together an example to illustrate the problem, I found that EPSSetInitialSubspace() does indeed "work" - in the sense that the eigenvalue, which decreases monotonically during the calculation, picks up from where the previous run left off. I am not sure why I had previously thought it was otherwise. thanks % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be % fax: +32-2-650-5767 ________________________________ From: Jose E. Roman Sent: Tuesday, March 28, 2023 7:43 PM To: LUTSKO James Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Restarting a SLEPC run to refine an eigenvector What do you mean that EPSSetInitialSubspace() does not work? Doesn't it improve convergence with respect to not using it? Is your smallest eigenvalue positive? And do you have many eigenvalues close to the smallest one? Jose > El 28 mar 2023, a las 17:32, LUTSKO James via petsc-users escribi?: > > Hello, > I am a complete newbe so sorry if this is answered elsewhere or ill-posed but .. I am trying to get the smallest eigenvector of a large matrix. Since the calculations are very long, I would like to be able to restart my code if I find that the "converged" eigenvector is not good enough without having to start from scratch. I am currently using the options > > -eps_type jd -eps_monitor -eps_smallest_real -eps_conv_abs -eps_tol 1e-8 -st_ksp_type gmres -st_pc_type jacobi -st_ksp_max_it 40 > > Is there any way to do this? I have tried using EPSSetInitialSubspace with the stored eigenvector but this does not seem to work. > > thanks > jim > > % James F. Lutsko, CNLPCS, Universite Libre de Bruxelles, > % Campus Plaine -- CP231 B-1050 Bruxelles, Belgium > % tel: +32-2-650-5997 email: jlutsko at ulb.ac.be > % fax: +32-2-650-5767 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Mar 29 05:53:06 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 29 Mar 2023 12:53:06 +0200 Subject: [petsc-users] Restarting a SLEPC run to refine an eigenvector In-Reply-To: References: Message-ID: > El 29 mar 2023, a las 9:58, LUTSKO James escribi?: > > I withdraw my question. In trying to put together an example to illustrate the problem, I found thatEPSSetInitialSubspace() does indeed "work" - in the sense that the eigenvalue, which decreases monotonically during the calculation, picks up from where the previous run left off. I am not sure why I had previously thought it was otherwise. > > thanks > Your use case is a typical one that might have slow convergence. In Davidson-type methods it is generally better if you can use a more powerful preconditioner (jacobi is likely not helping). 
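For reference, a hedged sketch of giving the Davidson inner solve a stronger preconditioner than jacobi; GAMG is used purely as an example and assumes the matrix is amenable to it.

#include <slepceps.h>

/* Sketch: reach the inner KSP/PC of a Davidson-type solver through the ST object.
   Command-line equivalent: -st_pc_type gamg instead of -st_pc_type jacobi. */
static PetscErrorCode StrongerInnerPC(EPS eps)
{
  ST  st;
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(EPSGetST(eps, &st));
  PetscCall(STGetKSP(st, &ksp));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));
  PetscCall(EPSSetFromOptions(eps));
  PetscFunctionReturn(0);
}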
If you can afford to factorize the matrix I would suggest using shift-and-invert with target=0 (or whatever is better for your application) with the default solver (Krylov-Schur). But normally factorizing a large Hamiltonian matrix is prohibitive. If you want, send me a sample matrix and I can do some tests. Jose From elias.karabelas at uni-graz.at Thu Mar 30 10:56:37 2023 From: elias.karabelas at uni-graz.at (Karabelas, Elias (elias.karabelas@uni-graz.at)) Date: Thu, 30 Mar 2023 15:56:37 +0000 Subject: [petsc-users] Augmented Linear System Message-ID: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Dear Community, I have a linear system of the form |K B| du ?? f1 ?????? = |C D| dp ?? f2 where K is a big m x m sparsematrix that comes from some FE discretization, B is a coupling matrix (of the form m x 4), C is of the form (4 x m) and D is 4x4. I save B and C as 4 Vecs and D as a 4x4 double array. D might be singular so at the moment I use the following schur-complement approach to solve this system 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B [i, j] = VecDot(C[i], invB[j]) 3) invert S (this seems to be mostly non-singular) to get dp 4) calculate du with dp So counting this, I need 5x to call KSP which can be expensive and I thought of somehow doing the Schur-Complement the other way around, however due to the (possible) singularity of D this seems like a bad idea (even using a pseudoinverse) Two things puzzle me here still A) Can this be done more efficiently? B) In case my above matrix is the Jacobian in a newton method, how do I make sure with any form of Schur Complement approach that I hit the correct residual reduction? Thanks Elias -- Dr. Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! From mfadams at lbl.gov Thu Mar 30 12:41:43 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 30 Mar 2023 13:41:43 -0400 Subject: [petsc-users] Augmented Linear System In-Reply-To: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID: You can lag the update of the Schur complement and use your solver as a preconditioner. If your problems don't change much you might converge fast enough (ie, < 4 iterations with one solve per iteration), but what you have is not bad if the size of your auxiliary, p, space does not grow. Mark On Thu, Mar 30, 2023 at 11:56?AM Karabelas, Elias ( elias.karabelas at uni-graz.at) wrote: > Dear Community, > > I have a linear system of the form > > |K B| du f1 > > = > > |C D| dp f2 > > where K is a big m x m sparsematrix that comes from some FE > discretization, B is a coupling matrix (of the form m x 4), C is of the > form (4 x m) and D is 4x4. > > I save B and C as 4 Vecs and D as a 4x4 double array. 
D might be > singular so at the moment I use the following schur-complement approach > to solve this system > > 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * > B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] > > 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B > [i, j] = VecDot(C[i], invB[j]) > > 3) invert S (this seems to be mostly non-singular) to get dp > > 4) calculate du with dp > > So counting this, I need 5x to call KSP which can be expensive and I > thought of somehow doing the Schur-Complement the other way around, > however due to the (possible) singularity of D this seems like a bad > idea (even using a pseudoinverse) > > Two things puzzle me here still > > A) Can this be done more efficiently? > > B) In case my above matrix is the Jacobian in a newton method, how do I > make sure with any form of Schur Complement approach that I hit the > correct residual reduction? > > Thanks > > Elias > > -- > Dr. Elias Karabelas > Universit?tsassistent | PostDoc > > Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of > Mathematics & Scientific Computing > Universit?t Graz | University of Graz > > Heinrichstra?e 36, 8010 Graz > Tel.: +43/(0)316/380-8546 > E-Mail: elias.karabelas at uni-graz.at > Web: https://ccl.medunigraz.at > > Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! > Please consider the environment before printing this e-mail. Thank you! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Thu Mar 30 22:00:15 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Thu, 30 Mar 2023 22:00:15 -0500 Subject: [petsc-users] DMPlex PETSCViewer for Surface Component In-Reply-To: References: <87pmhje1nz.fsf@jedbrown.org> Message-ID: Hi Matt, this is a follow-up to the previous question: I have created a short code as below to create a sub-dm on the surface, extracted from the original volume dmplex: ------------------------------------- call DMClone(dm_origin, dm_wall, ierr);CHKERRA(ierr) ! label for face sets call DMGetLabel(dm_wall, "Face Sets", label_facesets, ierr);CHKERRA(ierr) call DMPlexLabelComplete(dm_wall, label_facesets, ierr);CHKERRA(ierr) ! label for vertex on surface call DMCreateLabel(dm_wall, "Wall", ierr);CHKERRA(ierr) call DMGetLabel(dm_wall, "Wall", label_surf, ierr);CHKERRA(ierr) call DMPlexGetChart(dm_wall, ist, iend, ierr);CHKERRA(ierr) do i=ist,iend call DMLabelGetValue(label_facesets, i, val, ierr);CHKERRA(ierr) if(val .eq. ID_wall) then call DMLabelSetValue(label_surf, i, ID_wall, ierr);CHKERRA(ierr) endif enddo call DMPlexLabelComplete(dm_wall, label_surf, ierr);CHKERRA(ierr) ! create submesh call DMPlexCreateSubmesh(dm_wall, label_surf, ID_wall, PETSC_TRUE, dm_sub, ierr);CHKERRA(ierr) call DMPlexGetSubpointMap(dm_sub, label_sub, ierr);CHKERRA(ierr) ------------------------------------- It is a bit unclear how to map the vector on each vertex of the original volume dm (dm_wall) into the subdm (dm_sub). The function DMPlexGetSubpointMap() seems to create a subpointMap (label_sub) for this mapping, but it is hard to get an idea how to use that DMLabel for the mapping. Shall I create DMCreateInterpolation()? But it uses Mat and Vec to define mapping rule. Could I ask for any comments? Thanks, Mike > Yes, you can use DMPlexCreateSubmesh() (and friends depending on exactly > what kind of submesh you want). 
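Mike's question above asks how to move the per-vertex values from the volume DM (dm_wall) onto the submesh (dm_sub). As a hedged sketch only (not Matt's answer; it assumes a section with matching dofs has been set up on the shared points of both DMs, that local vectors are used, and a recent PETSc such as 3.19), one way with standard DMPlex calls is:

-------------------------------------
#include <petscdmplex.h>

/* Copy a point-wise field from the volume DM to the submesh created by
   DMPlexCreateSubmesh(). v_wall_loc / v_sub_loc are LOCAL vectors of the
   two DMs; the function name and argument list are illustrative. */
static PetscErrorCode CopyFieldToSubmesh(DM dm_wall, DM dm_sub, Vec v_wall_loc, Vec v_sub_loc)
{
  IS                 subpointIS;
  const PetscInt    *subpoints;
  PetscSection       secWall, secSub;
  const PetscScalar *aw;
  PetscScalar       *as;
  PetscInt           pStart, pEnd, p;

  PetscFunctionBeginUser;
  PetscCall(DMPlexGetSubpointIS(dm_sub, &subpointIS)); /* submesh point -> original point */
  PetscCall(ISGetIndices(subpointIS, &subpoints));
  PetscCall(DMGetLocalSection(dm_wall, &secWall));
  PetscCall(DMGetLocalSection(dm_sub, &secSub));
  PetscCall(DMPlexGetChart(dm_sub, &pStart, &pEnd));
  PetscCall(VecGetArrayRead(v_wall_loc, &aw));
  PetscCall(VecGetArray(v_sub_loc, &as));
  for (p = pStart; p < pEnd; ++p) {
    PetscInt dof, offSub, offWall, d;

    PetscCall(PetscSectionGetDof(secSub, p, &dof));
    if (!dof) continue; /* skip points that carry no unknowns */
    PetscCall(PetscSectionGetOffset(secSub, p, &offSub));
    PetscCall(PetscSectionGetOffset(secWall, subpoints[p - pStart], &offWall));
    for (d = 0; d < dof; ++d) as[offSub + d] = aw[offWall + d];
  }
  PetscCall(VecRestoreArray(v_sub_loc, &as));
  PetscCall(VecRestoreArrayRead(v_wall_loc, &aw));
  PetscCall(ISRestoreIndices(subpointIS, &subpoints));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------------------------

DMPlexGetSubpointIS() returns, for every point of the submesh chart, the corresponding point of the original mesh, so the copy is just a section-offset lookup on both sides; a DMLocalToGlobal() on dm_sub afterwards would assemble the global subvector for viewing.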
This will allow you to create a vector over > only this mesh, and map your volumetric solution to that subvector. Then > you can view the subvector (which pulls in the submesh). > > Thanks, > > Matt > > On Mon, Aug 1, 2022 at 10:59 PM Jed Brown wrote: > >> I would create a sub-DM containing only the part you want to view. >> >> Mike Michell writes: >> >> > Hi, >> > >> > I am a user of DMPlex object in 3D grid topology. Currently the solution >> > field is printed out using viewer PETSCVIEWERVTK in .vtu format. By >> doing >> > that the entire volume components are written in the file. I was >> wondering >> > if there is an option that I can tell PETSc viewer to print out only >> > surface component, instead of the entire volume. >> > >> > Best, >> > Mike >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From gongding at cn.cogenda.com Thu Mar 30 23:14:22 2023 From: gongding at cn.cogenda.com (Gong Ding) Date: Fri, 31 Mar 2023 12:14:22 +0800 Subject: [petsc-users] help: use real and complex petsc together Message-ID:

Dear petsc developer,

We are considering using a complex matrix for the eigenvalue decomposition via SLEPc, while keeping the DC/Tran simulation with real scalars. We can compile each solver (AC, DC, Tran, etc.) as a dynamic library (.so) and load it with dlopen. So, is it possible to load a solver that links to either the real or the complex PETSc, run the simulation, then dlclose it and load the next solver?

Another question: we must keep the MPI communicator in the main code and call PetscInitialize/PetscFinalize inside the .so; it seems PETSc supports this mechanism, right?

Gong Ding

From elias.karabelas at uni-graz.at Fri Mar 31 03:58:31 2023 From: elias.karabelas at uni-graz.at (Karabelas, Elias (elias.karabelas@uni-graz.at)) Date: Fri, 31 Mar 2023 08:58:31 +0000 Subject: [petsc-users] Augmented Linear System In-Reply-To: References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID:

Hi Mark,

thanks for the input, however I didn't quite get what you mean. Maybe I should be a bit more precise about what I want to achieve and why:

So this specific form of block system arises in a biomedical application that colleagues and I published in https://www.sciencedirect.com/science/article/pii/S0045782521004230 (the interesting part is Appendix B.3).

It boils down to a Newton method for solving nonlinear mechanics describing the motion of the human heart, which is coupled on some Neumann surfaces (the surfaces of the inner cavities of each blood pool in the heart) with a pressure that comes from a complicated 0D ODE model that describes cardiovascular physiology. This comes to look like

|F_1(u,p)|   |0|
|F_2(u,p)| = |0|

where F_1 is the residual of nonlinear mechanics plus a nonlinear boundary coupling term and F_2 is a coupling term to the ODE system. In this case u is the displacement and p is the pressure calculated from the ODE model (one for each cavity in the heart, which gives four).

After linearization, we arrive exactly at the aforementioned block system, which we solve at the moment by a Schur complement approach based on K.
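To make the procedure described in this thread concrete, here is a minimal hedged sketch of the 4x4 Schur complement solve (steps 1-4 of the first message), assuming ksp wraps K together with its preconditioner, B and C are stored as 4 Vecs, D and f2 are plain arrays, and Solve4x4() is a user-supplied dense 4x4 solver (an assumption, not a PETSc routine):

-------------------------------------
#include <petscksp.h>

/* user-supplied dense 4x4 solve, e.g. Gaussian elimination with pivoting (assumption) */
extern PetscErrorCode Solve4x4(PetscScalar A[4][4], const PetscScalar b[4], PetscScalar x[4]);

/* Solve [K B; C D] [du; dp] = [f1; f2] via S = C K^{-1} B - D. */
PetscErrorCode SchurSolve(KSP ksp, Vec B[4], Vec C[4], PetscScalar D[4][4],
                          Vec f1, const PetscScalar f2[4], Vec du, PetscScalar dp[4])
{
  Vec         v1, invB[4];
  PetscScalar S[4][4], rhs[4];
  PetscInt    i, j;

  PetscFunctionBeginUser;
  PetscCall(VecDuplicate(f1, &v1));
  PetscCall(KSPSolve(ksp, f1, v1));                /* v1 = K^{-1} f1          (1 solve)  */
  for (j = 0; j < 4; ++j) {
    PetscCall(VecDuplicate(B[j], &invB[j]));
    PetscCall(KSPSolve(ksp, B[j], invB[j]));       /* invB[j] = K^{-1} B[j]   (4 solves) */
  }
  for (i = 0; i < 4; ++i) {
    PetscCall(VecDot(v1, C[i], &rhs[i]));          /* rhs = C K^{-1} f1 - f2 */
    rhs[i] -= f2[i];
    for (j = 0; j < 4; ++j) {
      PetscCall(VecDot(invB[j], C[i], &S[i][j])); /* S = C K^{-1} B - D */
      S[i][j] -= D[i][j];
    }
  }
  PetscCall(Solve4x4(S, rhs, dp));                 /* dp = S^{-1} rhs */
  PetscCall(VecCopy(v1, du));                      /* du = K^{-1} (f1 - B dp), no extra solve */
  for (j = 0; j < 4; ++j) PetscCall(VecAXPY(du, -dp[j], invB[j]));
  for (j = 0; j < 4; ++j) PetscCall(VecDestroy(&invB[j]));
  PetscCall(VecDestroy(&v1));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------------------------

Counting KSPSolve() calls this is exactly the 1 + 4 = 5 solves mentioned above; du is recovered from the already computed K^{-1} f1 and K^{-1} B[j], so step 4 costs no additional solve. Note that with complex scalars VecDot() conjugates its second argument, which would affect the C[i]; with real scalars, as in this application, it is just the dot product.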
However here it gets tricky: The outer newton loop is working based on an inexact newton method with a forcing term from Walker et al. So basically the atol and rtol of the KSP are not constant but vary, so I guess this will influence how well we actually resolve the solution to the schur-complement system (I tried to find some works that can explain how to choose forcing terms in this case but found none). This brought me to think if we can do this the other way around and do a pseudo-inverse of D because it's 4x4 and there is no need for a KSP there. I did a test implementation with a MATSHELL that realizes (K - B D^+ C) and use just K for building an GAMG prec however this fails spectaculary, because D^+ can behave very badly and the other way around I have (C K^-1 B - D) and this behaves more like a singular pertubation of the Matrix C K^-1 B which behaves nicer. So here I stopped investigating because my PETSc expertise is not bad but certainly not good enough to judge which approach would pay off more in terms of runtime (by gut feeling was that building a MATSHELL requires then only one KSP solve vs the other 5). However I'm happy to hear some alternatives that I could persue in order to speed up our current solver strategy or even be able to build a nice MATSHELL. Thanks Elias Am 30.03.23 um 19:41 schrieb Mark Adams: You can lag the update of the Schur complement and use your solver as a preconditioner. If your problems don't change much you might converge fast enough (ie, < 4 iterations with one solve per iteration), but what you have is not bad if the size of your auxiliary, p, space does not grow. Mark On Thu, Mar 30, 2023 at 11:56?AM Karabelas, Elias (elias.karabelas at uni-graz.at) > wrote: Dear Community, I have a linear system of the form |K B| du f1 = |C D| dp f2 where K is a big m x m sparsematrix that comes from some FE discretization, B is a coupling matrix (of the form m x 4), C is of the form (4 x m) and D is 4x4. I save B and C as 4 Vecs and D as a 4x4 double array. D might be singular so at the moment I use the following schur-complement approach to solve this system 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B [i, j] = VecDot(C[i], invB[j]) 3) invert S (this seems to be mostly non-singular) to get dp 4) calculate du with dp So counting this, I need 5x to call KSP which can be expensive and I thought of somehow doing the Schur-Complement the other way around, however due to the (possible) singularity of D this seems like a bad idea (even using a pseudoinverse) Two things puzzle me here still A) Can this be done more efficiently? B) In case my above matrix is the Jacobian in a newton method, how do I make sure with any form of Schur Complement approach that I hit the correct residual reduction? Thanks Elias -- Dr. Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! -- Dr. 
Elias Karabelas Universit?tsassistent | PostDoc Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing Universit?t Graz | University of Graz Heinrichstra?e 36, 8010 Graz Tel.: +43/(0)316/380-8546 E-Mail: elias.karabelas at uni-graz.at Web: https://ccl.medunigraz.at Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! Please consider the environment before printing this e-mail. Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Mar 31 08:01:00 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 31 Mar 2023 15:01:00 +0200 Subject: [petsc-users] help: use real and complex petsc together In-Reply-To: References: Message-ID: <11FB728F-BBCE-46B8-8F08-975E1FB2887F@dsic.upv.es> I don't know enough about dlopen to answer you question. Maybe other people can comment. My suggestion is that you do everything in complex scalars and avoid complications trying to mix real and complex. Regarding the second question, if I am not wrong PetscInitialize/PetscFinalize can be called several times, so it should not be a problem. Jose > El 31 mar 2023, a las 6:14, Gong Ding escribi?: > > Dear petsc developer, > > We are considering use complex matrix for eigen value decomposition via slepc will keep DC/Tran simulation in real world. > > We can, compiler each solver (AC, DC, Tran, etc) as dynamic library (.so), and load them by dlopen. > > So, is it possible, each time we load a solver, which links to real/comples petsc, to do the simulation. After that, we dlclose it and load next solver. > > And another question is, we must keep MPI commnucator in the main code and call PetscInitialize/PetscFinalize in the so, it seems petsc support this mechanism, right? > > > Gong Ding > > From mfadams at lbl.gov Fri Mar 31 08:25:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 31 Mar 2023 09:25:57 -0400 Subject: [petsc-users] Augmented Linear System In-Reply-To: References: <2267f28c-ec43-66b8-43dd-29b4c6288478@uni-graz.at> Message-ID: On Fri, Mar 31, 2023 at 4:58?AM Karabelas, Elias ( elias.karabelas at uni-graz.at) wrote: > Hi Mark, > > thanks for the input, however I didn't quite get what you mean. > > Maybe I should be a bit more precisce what I want to achieve and why: > > So this specific form of block system arises in some biomedical > application that colleagues and me published in > https://www.sciencedirect.com/science/article/pii/S0045782521004230 (the > intersting part is Appendix B.3) > > It boils down to a Newton method for solving nolinear mechanics describing > the motion of the human hear, that is coupled on some Neumann surfaces (the > surfaces of the inner cavities of each bloodpool in the heart) with a > pressure that comes from a complicated 0D ODE model that describes > cardiovascular physiology. This comes to look like > > |F_1(u,p)| | 0 | > | | = | | > |F_2(u,p)| | 0 | > > with F_1 is the residual of nonlinear mechanics plus a nonlinear boundary > coupling term and F_2 is a coupling term to the ODE system. In this case u > is displacement and p is the pressure calculated from the ODE model (one > for each cavity in the heart, which gives four). > > After linearization, we arrive exactly at the aforementioned blocksystem, > that we solve at the moment by a Schurcomplement approach based on K. 
> So using this we can get around without doing a MATSHELL for the > schurcomplement as we can just apply the KSP for K five times in order to > approximate the solution of the schur-complement system. >

So you compute an explicit Schur complement (4 solves) and then the real solve uses 1 more K solve. I think this is pretty good as is. You are lucky with only 4 of these pressure equations. I've actually done this on a problem with 100s of extra equations (surface averaging equations), but the problem was linear and ran for 1000s of time steps, or more, so this huge setup cost was amortized.

> However here it gets tricky: The outer newton loop is working based on an > inexact newton method with a forcing term from Walker et al. So basically > the atol and rtol of the KSP are not constant but vary, so I guess this > will influence how well we actually resolve the solution to the > schur-complement system (I tried to find some works that can explain how to > choose forcing terms in this case but found none). > >

Honestly, Walker is a great guy, but I would not get too hung up on this. I've done a lot of plasticity work long ago and gave up on Walker et al. Others have had the same experience. What is new with your problem is how accurately you want the Schur complement (4) solves done.

> This brought me to think if we can do this the other way around and do a > pseudo-inverse of D because it's 4x4 and there is no need for a KSP there. > I did a test implementation with a MATSHELL that realizes (K - B D^+ C) > and use just K for building an GAMG prec however this fails spectaculary, > because D^+ can behave very badly and the other way around I have (C K^-1 B > - D) and this behaves more like a singular pertubation of the Matrix C K^-1 > B which behaves nicer. So here I stopped investigating because my PETSc > expertise is not bad but certainly not good enough to judge which approach > would pay off more in terms of runtime (by gut feeling was that building a > MATSHELL requires then only one KSP solve vs the other 5). > > However I'm happy to hear some alternatives that I could persue in order > to speed up our current solver strategy or even be able to build a nice > MATSHELL. >

OK, so you have tried what I was alluding to. I don't follow what you did exactly and have not worked it out, but there should be an iteration on the pressure equation with a (lagged) Schur solve as a preconditioner. But with only 4 extra solves in your case, I don't think it is worth it unless you want to write solver papers. And AMG in general really has to have a normal PDE, e.g. the K^-1 solve, and if K is too far away from the Laplacian (or elasticity) then all bets are off. Good luck, Mark

> Thanks > Elias > > > > > Am 30.03.23 um 19:41 schrieb Mark Adams: > > You can lag the update of the Schur complement and use your solver as a > preconditioner. > If your problems don't change much you might converge fast enough (ie, < 4 > iterations with one solve per iteration), but what you have is not bad if > the size of your auxiliary, p, space does not grow. > > Mark > > On Thu, Mar 30, 2023 at 11:56 AM Karabelas, Elias ( > elias.karabelas at uni-graz.at) wrote: >> Dear Community, >> >> I have a linear system of the form >> >> |K B| du f1 >> >> = >> >> |C D| dp f2 >> >> where K is a big m x m sparsematrix that comes from some FE >> discretization, B is a coupling matrix (of the form m x 4), C is of the >> form (4 x m) and D is 4x4. >> >> I save B and C as 4 Vecs and D as a 4x4 double array.
D might be >> singular so at the moment I use the following schur-complement approach >> to solve this system >> >> 1) Calculate the vecs v1 = KSP(K,PrecK) * f1, invB = [ KSP(K, PrecK) * >> B[0], KSP(K, PrecK) * B[1], KSP(K, PrecK) * B[2], KSP(K, PrecK) * B[3] ] >> >> 2) Build the schurcomplement S=[C ^ K^-1 B - D] via VecDots (C ^ K^-1 B >> [i, j] = VecDot(C[i], invB[j]) >> >> 3) invert S (this seems to be mostly non-singular) to get dp >> >> 4) calculate du with dp >> >> So counting this, I need 5x to call KSP which can be expensive and I >> thought of somehow doing the Schur-Complement the other way around, >> however due to the (possible) singularity of D this seems like a bad >> idea (even using a pseudoinverse) >> >> Two things puzzle me here still >> >> A) Can this be done more efficiently? >> >> B) In case my above matrix is the Jacobian in a newton method, how do I >> make sure with any form of Schur Complement approach that I hit the >> correct residual reduction? >> >> Thanks >> >> Elias >> >> -- >> Dr. Elias Karabelas >> Universit?tsassistent | PostDoc >> >> Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of >> Mathematics & Scientific Computing >> Universit?t Graz | University of Graz >> >> Heinrichstra?e 36, 8010 Graz >> Tel.: +43/(0)316/380-8546 >> E-Mail: elias.karabelas at uni-graz.at >> Web: https://ccl.medunigraz.at >> >> Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! >> Please consider the environment before printing this e-mail. Thank you! >> >> > -- > Dr. Elias Karabelas > Universit?tsassistent | PostDoc > > Institut f?r Mathematik & Wissenschaftliches Rechnen | Institute of Mathematics & Scientific Computing > Universit?t Graz | University of Graz > > Heinrichstra?e 36, 8010 Graz > Tel.: +43/(0)316/380-8546 > E-Mail: elias.karabelas at uni-graz.at > Web: https://ccl.medunigraz.at > > Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. Danke! > Please consider the environment before printing this e-mail. Thank you! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Mar 31 12:47:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 31 Mar 2023 13:47:31 -0400 Subject: [petsc-users] PETSc release announcement and reminder to register for the PETSc annual meeting Message-ID: We are pleased to announce the release of PETSc version 3.19.0, now available at https://petsc.org/release/download/ A list of the major changes and updates can be found at https://petsc.org/release/changes/319/ We'd also like to remind everyone to register for the PETSc Annual Meeting June 5-7 in Chicago https://www.eventbrite.com/e/petsc-2023-user-meeting-tickets-494165441137 and to submit an abstract https://docs.google.com/forms/d/e/1FAIpQLSesh47RGVb9YD9F1qu4obXSe1X6fn7vVmjewllePBDxBItfOw/viewform. Information on the meeting is available at https://petsc.org/release/community/meetings/2023/#meeting We recommend upgrading to PETSc 3.19.0 soon. As always, please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov This release includes contributions from Aidan Hamilton Alp Dener Ashish Patel Barry Smith Blaise Bourdin danofinn David Wells DenverCoder9 Duncan Campbell Eric Chamberland Fernando S. 
Pacheco Frederic Vi Heeho Park Hong Zhang Igor Baratta Jacob Faibussowitsch JDBetteridge Jed Brown Jeremy L Thompson Joseph Pusztay Jose Roman Junchao Zhang Justin Chang Koki Sagiyama Lisandro Dalcin Malachi Phillips Mark Adams Mark Lohry Martin Diehl Matthew Knepley Mr. Hong Zhang Nicholas Arnold-Medabalimi Pablo Brubeck Patrick Sanan Pierre Jolivet Ricardo Jesus Richard Tran Mills Samuel Khuvis Satish Balay Shao-Ching Huang Stefano Zampini Suyash Tandon tlanyan Toby Isaac Umberto Zerbinati Vaclav Hapla Yang Zongze

and bug reports/proposed improvements received from

Balin, Riccardo Carlos Alonso Aznarán Laos chiara.puglisi at mumps-tech.com dalcinl Daniel Taller Dave May Don Fernando Philip Fackler Felix Wilms Hom Nath Gharti Hong Zhang Jacob Faibussowitsch Jed Brown Jin Chen Mike Michell Nicholas Arnold-Medabalimi Robert Nourgaliev Pierre Jolivet Richard Katz Sajid Ali Syed Sanjay Govindjee Stephan Köhler Valentin Churavy Victor Eijkhout yangzongze

As always, thanks for your support, Barry