From Elena.Moral.Sanchez at ipp.mpg.de Mon Nov 3 11:39:07 2025 From: Elena.Moral.Sanchez at ipp.mpg.de (Moral Sanchez, Elena) Date: Mon, 3 Nov 2025 17:39:07 +0000 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution Message-ID: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> Hi, I am running CG with a Jacobi preconditioner. I have a monitor function that prints the residual and saves the solution at every iteration. To get the solution at every iteration, I am using the function KSPBuildSolution. I am setting the KSP norm as UNPRECONDITIONED. The convergence test is consistent with the 2-norm of KSPBuildResidual. However this norm does not match the 2-norm of the residual computed explicitly from the solution (obtained with KSPBuildSolution). It also does not match the preconditioned norm. What norm is it computing? When the CG is not preconditioned, the norm of KSPBuildResidual and the norm of the residual computed from the solution match, as I expected. This is KSPView(): KSP Object: 1 MPI process type: cg variant HERMITIAN maximum iterations=100, nonzero initial guess tolerances: relative=1e-08, absolute=1e-08, divergence=10000. left preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: nest rows=524, cols=524 Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 Cheers, Elena Moral S?nchez -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Nov 3 20:01:08 2025 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 3 Nov 2025 21:01:08 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> Message-ID: <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> Elena, I have attached a modification to src/snes/tutorials/ex5.c that adds a monitor routine in the style I think you are suggesting. ? 
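(The attached file is not reproduced in this archive. A minimal sketch of a monitor in this style, written here only for illustration and not necessarily matching the attached ex5.c modification, rebuilds the residual explicitly from KSPBuildSolution(); the routine name MyMonitor is simply suggested by the output below.)

static PetscErrorCode MyMonitor(KSP ksp, PetscInt it, PetscReal rnorm, void *ctx)
{
  Vec       x, b, r;
  Mat       A;
  PetscReal norm;

  PetscFunctionBeginUser;
  PetscCall(KSPBuildSolution(ksp, NULL, &x)); /* current iterate assembled by the KSP; do not destroy */
  PetscCall(KSPGetOperators(ksp, &A, NULL));
  PetscCall(KSPGetRhs(ksp, &b));
  PetscCall(VecDuplicate(b, &r));
  PetscCall(MatMult(A, x, r));    /* r = A x */
  PetscCall(VecAYPX(r, -1.0, b)); /* r = b - A x */
  PetscCall(VecNorm(r, NORM_2, &norm));
  PetscCall(PetscPrintf(PetscObjectComm((PetscObject)ksp), "My monitor %" PetscInt_FMT " %18.12e\n", it, (double)norm));
  PetscCall(VecDestroy(&r));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Such a routine would be registered before KSPSolve() with KSPMonitorSet(ksp, MyMonitor, NULL, NULL).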
Below I cut and paste the beginning of the output from running the command ./ex5 -ksp_type cg -ksp_monitor_true_residual -ksp_norm_type unpreconditioned -pc_type jacobi -da_refine 3 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 My monitor 0 1.265943996096e+00 1 KSP unpreconditioned resid norm 1.030361071579e+00 true resid norm 1.030361071579e+00 ||r(i)||/||b|| 8.139073092933e-01 My monitor 1 1.030361071579e+00 2 KSP unpreconditioned resid norm 7.753237278390e-01 true resid norm 7.753237278390e-01 ||r(i)||/||b|| 6.124470989473e-01 My monitor 2 7.753237278390e-01 3 KSP unpreconditioned resid norm 6.674186105521e-01 true resid norm 6.674186105521e-01 ||r(i)||/||b|| 5.272102183115e-01 My monitor 3 6.674186105521e-01 4 KSP unpreconditioned resid norm 5.745948088398e-01 true resid norm 5.745948088398e-01 ||r(i)||/||b|| 4.538864362181e-01 My monitor 4 5.745948088398e-01 5 KSP unpreconditioned resid norm 5.103132053010e-01 true resid norm 5.103132053010e-01 ||r(i)||/||b|| 4.031088317292e-01 My monitor 5 5.103132053010e-01 6 KSP unpreconditioned resid norm 4.581737850155e-01 true resid norm 4.581737850155e-01 ||r(i)||/||b|| 3.619226335670e-01 My monitor 6 4.581737850155e-01 7 KSP unpreconditioned resid norm 4.202213342980e-01 true resid norm 4.202213342980e-01 ||r(i)||/||b|| 3.319430682509e-01 My monitor 7 4.202213342980e-01 8 KSP unpreconditioned resid norm 3.936600255123e-01 true resid norm 3.936600255123e-01 ||r(i)||/||b|| 3.109616434267e-01 My monitor 8 3.936600255123e-01 9 KSP unpreconditioned resid norm 3.811944420804e-01 true resid norm 3.811944420804e-01 ||r(i)||/||b|| 3.011147754212e-01 My monitor 9 3.811944420804e-01 10 KSP unpreconditioned resid norm 3.851182669108e-01 true resid norm 3.851182669108e-01 ||r(i)||/||b|| 3.042143002363e-01 My monitor 10 3.851182669108e-01 11 KSP unpreconditioned resid norm 4.107620195902e-01 true resid norm 4.107620195902e-01 ||r(i)||/||b|| 3.244709251411e-01 My monitor 11 4.107620195902e-01 12 KSP unpreconditioned resid norm 3.678610761984e-01 true resid norm 3.678610761984e-01 ||r(i)||/||b|| 2.905824249198e-01 My monitor 12 3.678610761984e-01 13 KSP unpreconditioned resid norm 3.891700469761e-01 true resid norm 3.891700469761e-01 ||r(i)||/||b|| 3.074149000083e-01 My monitor 13 3.891700469761e-01 14 KSP unpreconditioned resid norm 4.123002052123e-01 true resid norm 4.123002052123e-01 ||r(i)||/||b|| 3.256859754331e-01 My monitor 14 4.123002052123e-01 15 KSP unpreconditioned resid norm 4.456104079353e-01 true resid norm 4.456104079353e-01 ||r(i)||/||b|| 3.519985159765e-01 My monitor 15 4.456104079353e-01 16 KSP unpreconditioned resid norm 5.125721163597e-01 true resid norm 5.125721163597e-01 ||r(i)||/||b|| 4.048932005999e-01 My monitor 16 5.125721163597e-01 17 KSP unpreconditioned resid norm 4.475156370525e-01 true resid norm 4.475156370525e-01 ||r(i)||/||b|| 3.535035028662e-01 My monitor 17 4.475156370525e-01 18 KSP unpreconditioned resid norm 2.977755423590e-01 true resid norm 2.977755423590e-01 ||r(i)||/||b|| 2.352201545070e-01 My monitor 18 2.977755423590e-01 19 KSP unpreconditioned resid norm 2.317275576684e-01 true resid norm 2.317275576684e-01 ||r(i)||/||b|| 1.830472425186e-01 My monitor 19 2.317275576684e-01 20 KSP unpreconditioned resid norm 2.388542347249e-01 true resid norm 2.388542347249e-01 ||r(i)||/||b|| 1.886767783262e-01 My monitor 20 2.388542347249e-01 21 KSP unpreconditioned resid norm 1.722165062986e-01 true resid norm 1.722165062986e-01 ||r(i)||/||b|| 
1.360380133953e-01 My monitor 21 1.722165062986e-01 22 KSP unpreconditioned resid norm 1.161869442046e-01 true resid norm 1.161869442046e-01 ||r(i)||/||b|| 9.177889745747e-02 My monitor 22 1.161869442046e-01 23 KSP unpreconditioned resid norm 6.594339583731e-02 true resid norm 6.594339583731e-02 ||r(i)||/||b|| 5.209029470549e-02 My monitor 23 6.594339583731e-02 24 KSP unpreconditioned resid norm 4.351679748574e-02 true resid norm 4.351679748574e-02 ||r(i)||/||b|| 3.437497837181e-02 My monitor 24 4.351679748573e-02 25 KSP unpreconditioned resid norm 3.847638846864e-02 true resid norm 3.847638846864e-02 ||r(i)||/||b|| 3.039343650847e-02 My monitor 25 3.847638846864e-02 26 KSP unpreconditioned resid norm 2.063424248358e-02 true resid norm 2.063424248358e-02 ||r(i)||/||b|| 1.629949077306e-02 My monitor 26 2.063424248358e-02 27 KSP unpreconditioned resid norm 1.402462240396e-02 true resid norm 1.402462240396e-02 ||r(i)||/||b|| 1.107839086659e-02 My monitor 27 1.402462240396e-02 28 KSP unpreconditioned resid norm 7.732817953098e-03 true resid norm 7.732817953098e-03 ||r(i)||/||b|| 6.108341267025e-03 My monitor 28 7.732817953099e-03 29 KSP unpreconditioned resid norm 5.109464751004e-03 true resid norm 5.109464751004e-03 ||r(i)||/||b|| 4.036090669698e-03 My monitor 29 5.109464751004e-03 30 KSP unpreconditioned resid norm 2.628714079103e-03 true resid norm 2.628714079103e-03 ||r(i)||/||b|| 2.076485284664e-03 My monitor 30 2.628714079103e-03 31 KSP unpreconditioned resid norm 1.211324322673e-03 true resid norm 1.211324322673e-03 ||r(i)||/||b|| 9.568545894671e-04 My monitor 31 1.211324322673e-03 At iteration 32 we see a slight difference in the reported norms 32 KSP unpreconditioned resid norm 5.638897702485e-04 true resid norm 5.638897702485e-04 ||r(i)||/||b|| 4.454302654678e-04 My monitor 32 5.638897702491e-04 Then they continue to be different with more and more digits 33 KSP unpreconditioned resid norm 2.557920120696e-04 true resid norm 2.557920120696e-04 ||r(i)||/||b|| 2.020563412429e-04 My monitor 33 2.557920120695e-04 34 KSP unpreconditioned resid norm 1.249567288159e-04 true resid norm 1.249567288159e-04 ||r(i)||/||b|| 9.870636394758e-05 My monitor 34 1.249567288156e-04 35 KSP unpreconditioned resid norm 6.554146400697e-05 true resid norm 6.554146400697e-05 ||r(i)||/||b|| 5.177279896194e-05 My monitor 35 6.554146400761e-05 36 KSP unpreconditioned resid norm 3.360138566154e-05 true resid norm 3.360138566154e-05 ||r(i)||/||b|| 2.654255303959e-05 My monitor 36 3.360138566057e-05 37 KSP unpreconditioned resid norm 1.963635751089e-05 true resid norm 1.963635751089e-05 ||r(i)||/||b|| 1.551123712537e-05 My monitor 37 1.963635751179e-05 38 KSP unpreconditioned resid norm 1.111922577034e-05 true resid norm 1.111922577034e-05 ||r(i)||/||b|| 8.783347292320e-06 My monitor 38 1.111922577016e-05 Is this the type of discrepancy you're seeing in your code, or are you seeing enormous differences right off the bat? The discrepancy shown above is normal. It arises because KSPSolve_CG() uses PetscCall(VecAXPY(X, a, P)); /* x <- x + ap */ PetscCall(VecAXPY(R, -a, W)); /* r <- r - aw */ to update the solution and the residual. Where W has been computed further up in the code as A*P. If I change KSPSolve_CG() to instead compute R = b - A X explicitly (attached)? 
then the output becomes 32 KSP unpreconditioned resid norm 5.638897702486e-04 true resid norm 5.638897702486e-04 ||r(i)||/||b|| 4.454302654678e-04 My monitor 32 5.638897702486e-04 33 KSP unpreconditioned resid norm 2.557920120698e-04 true resid norm 2.557920120698e-04 ||r(i)||/||b|| 2.020563412431e-04 My monitor 33 2.557920120698e-04 34 KSP unpreconditioned resid norm 1.249567288163e-04 true resid norm 1.249567288163e-04 ||r(i)||/||b|| 9.870636394784e-05 My monitor 34 1.249567288163e-04 35 KSP unpreconditioned resid norm 6.554146400720e-05 true resid norm 6.554146400720e-05 ||r(i)||/||b|| 5.177279896212e-05 My monitor 35 6.554146400720e-05 36 KSP unpreconditioned resid norm 3.360138566157e-05 true resid norm 3.360138566157e-05 ||r(i)||/||b|| 2.654255303962e-05 My monitor 36 3.360138566157e-05 37 KSP unpreconditioned resid norm 1.963635751099e-05 true resid norm 1.963635751099e-05 ||r(i)||/||b|| 1.551123712545e-05 My monitor 37 1.963635751099e-05 38 KSP unpreconditioned resid norm 1.111922577015e-05 true resid norm 1.111922577015e-05 ||r(i)||/||b|| 8.783347292168e-06 My monitor 38 1.111922577015e-05 Now the residual norm printed by -ksp_monitor and MyMonitor are the same to all digits for all iterations. KSPSolve_CG() uses R = R - a W = R - a (A * P) instead of R = B - A*X because it saves a matrix-vector multiply per iteration (generally, for CG the matrix-vector multiply dominates the solution time). One final note. How come "The convergence test is consistent with the 2-norm of KSPBuildResidual" but not computed explicitly from KSPBuildSolution()? That is because the KSPBuildResidual() routine cheats PETSC_INTERN PetscErrorCode KSPBuildResidual_CG(KSP ksp, Vec t, Vec v, Vec *V) { PetscFunctionBegin; PetscCall(VecCopy(ksp->work[0], v)); *V = v; It knows from the KSPSolve_CG() code where the computed residual is stored and gives you that (slightly incorrect :-) one. Now some Krylov methods do not explicitly store (or even compute) the residual vector so they explicitly compute it with PetscErrorCode KSPBuildResidualDefault(KSP ksp, Vec t, Vec v, Vec *V) { Mat Amat, Pmat; PetscFunctionBegin; if (!ksp->pc) PetscCall(KSPGetPC(ksp, &ksp->pc)); PetscCall(PCGetOperators(ksp->pc, &Amat, &Pmat)); PetscCall(KSPBuildSolution(ksp, t, NULL)); PetscCall(KSP_MatMult(ksp, Amat, t, v)); PetscCall(VecAYPX(v, -1.0, ksp->vec_rhs)); *V = v; PetscFunctionReturn(PETSC_SUCCESS); } Barry This is an interesting phenomenon that often is not discussed in elementary introductions to Krylov methods (or even in advanced discussions), so I think I will write an FAQ that explains it for petsc.org > On Nov 3, 2025, at 12:39?PM, Moral Sanchez, Elena wrote: > > Hi, > I am running CG with a Jacobi preconditioner. I have a monitor function that prints the residual and saves the solution at every iteration. To get the solution at every iteration, I am using the function KSPBuildSolution. I am setting the KSP norm as UNPRECONDITIONED. > > The convergence test is consistent with the 2-norm of KSPBuildResidual. However this norm does not match the 2-norm of the residual computed explicitly from the solution (obtained with KSPBuildSolution). It also does not match the preconditioned norm. What norm is it computing? > > When the CG is not preconditioned, the norm of KSPBuildResidual and the norm of the residual computed from the solution match, as I expected. 
> > This is KSPView(): > > KSP Object: 1 MPI process > type: cg > variant HERMITIAN > maximum iterations=100, nonzero initial guess > tolerances: relative=1e-08, absolute=1e-08, divergence=10000. > left preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: nest > rows=524, cols=524 > Matrix object: > type=nest, rows=3, cols=3 > MatNest structure: > (0,0) : type=mpiaij, rows=176, cols=176 > (0,1) : NULL > (0,2) : NULL > (1,0) : NULL > (1,1) : type=mpiaij, rows=172, cols=172 > (1,2) : NULL > (2,0) : NULL > (2,1) : NULL > (2,2) : type=mpiaij, rows=176, cols=176 > > Cheers, > Elena Moral S?nchez -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex5.c Type: application/octet-stream Size: 38511 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cg.c Type: application/octet-stream Size: 30555 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From prateekgupta1709 at gmail.com Tue Nov 4 02:09:52 2025 From: prateekgupta1709 at gmail.com (prateekgupta1709 at gmail.com) Date: Tue, 4 Nov 2025 13:39:52 +0530 Subject: [petsc-users] Segmentation violation in DMSwarmMigrate(sdm_, PETSC_TRUE) Message-ID: <054F3C46-1719-4A52-9CFF-8CDBB9D46B76@gmail.com> Hi, I am writing a basic particle tracking code with an existing periodic DMDA. All velocity calculations and position updates work fine but migration throws segfault. I have checked and particles are within the periodic box as well. I discovered that there is an internal field for coordinates which has to be defined using its identifier instead of explicitly registering a field. I am not registering any velocity field (using external vectors). Is there something I am missing? Thanks, Prateek From prateekgupta1709 at gmail.com Tue Nov 4 02:45:49 2025 From: prateekgupta1709 at gmail.com (prateekgupta1709 at gmail.com) Date: Tue, 4 Nov 2025 14:15:49 +0530 Subject: [petsc-users] Segmentation violation in DMSwarmMigrate(sdm_, PETSC_TRUE) In-Reply-To: <054F3C46-1719-4A52-9CFF-8CDBB9D46B76@gmail.com> References: <054F3C46-1719-4A52-9CFF-8CDBB9D46B76@gmail.com> Message-ID: <0BFB7586-0108-47EA-98F8-F9EC0ED21A6D@gmail.com> Forgot to add, even with PETSC_FALSE option, I get the same segfault error. > > On 4 Nov 2025, at 1:39?PM, prateekgupta1709 at gmail.com wrote: > > ?Hi, > I am writing a basic particle tracking code with an existing periodic DMDA. All velocity calculations and position updates work fine but migration throws segfault. I have checked and particles are within the periodic box as well. > > I discovered that there is an internal field for coordinates which has to be defined using its identifier instead of explicitly registering a field. I am not registering any velocity field (using external vectors). > > Is there something I am missing? 
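(For reference, a minimal DMSwarm PIC setup against an existing cell DM usually follows the pattern sketched below; this is a generic illustration, not the poster's code. It assumes #include <petscdmswarm.h>, an existing DMDA called dmda, and a local particle count npoints; the "velocity" field is optional and only shown as an example of a user-registered field. The important details are that the DMSWARM_PIC type and the cell DM are set before DMSwarmFinalizeFieldRegister(), and that particle positions live in the built-in DMSwarmPICField_coor field rather than in a user-registered one.)

  DM         sw;
  PetscReal *coor;
  PetscCall(DMCreate(PETSC_COMM_WORLD, &sw));
  PetscCall(DMSetType(sw, DMSWARM));
  PetscCall(DMSetDimension(sw, 3));
  PetscCall(DMSwarmSetType(sw, DMSWARM_PIC));
  PetscCall(DMSwarmSetCellDM(sw, dmda));                                       /* the existing (periodic) DMDA */
  PetscCall(DMSwarmRegisterPetscDatatypeField(sw, "velocity", 3, PETSC_REAL)); /* optional user field */
  PetscCall(DMSwarmFinalizeFieldRegister(sw));
  PetscCall(DMSwarmSetLocalSizes(sw, npoints, 4));
  PetscCall(DMSwarmGetField(sw, DMSwarmPICField_coor, NULL, NULL, (void **)&coor));
  /* ... fill coor[3*p + d] with the position of local particle p ... */
  PetscCall(DMSwarmRestoreField(sw, DMSwarmPICField_coor, NULL, NULL, (void **)&coor));
  PetscCall(DMSwarmMigrate(sw, PETSC_TRUE)); /* move particles to the ranks that own their cells */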
> > Thanks, > Prateek From Elena.Moral.Sanchez at ipp.mpg.de Tue Nov 4 03:06:07 2025 From: Elena.Moral.Sanchez at ipp.mpg.de (Moral Sanchez, Elena) Date: Tue, 4 Nov 2025 09:06:07 +0000 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> Message-ID: Dear Barry, Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags -ksp_monitor_true_residual -ksp_norm_type unpreconditioned this is the output: 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 and this is the output of my own monitor function: Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.5568889644229376 PRECONDITIONED norm: 2.049041078011257 KSPBuildResidual 2-norm: 0.5568889644229299 difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7661983589104541 PRECONDITIONED norm: 2.7387602134717137 KSPBuildResidual 2-norm: 0.2831772665189212 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7050518160900253 PRECONDITIONED norm: 2.421773833445645 KSPBuildResidual 2-norm: 0.18759500941469456 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 Here u is the vector in the KSPSolve. After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. Thanks for the help. Cheers, Elena On 11/4/25 03:01, Barry Smith wrote: 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Elena.Moral.Sanchez at ipp.mpg.de Tue Nov 4 05:09:21 2025 From: Elena.Moral.Sanchez at ipp.mpg.de (Moral Sanchez, Elena) Date: Tue, 4 Nov 2025 11:09:21 +0000 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev>, Message-ID: Dear Barry, I just realized that the operator from KSPGetOperators()[0] does not match my operator. In fact, KSPGetOperators()[0] returns the operator used for the preconditioner. This explains all the issues I reported. 
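(A quick way to confirm which matrices the KSP will actually use, sketched here with the C interface purely for illustration (the petsc4py equivalent is ksp.getOperators()), is to query the KSP after all the setup calls:

  Mat     Amat, Pmat;
  MatType atype, ptype;
  PetscCall(KSPGetOperators(ksp, &Amat, &Pmat));
  PetscCall(MatGetType(Amat, &atype));
  PetscCall(MatGetType(Pmat, &ptype));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Amat type: %s, Pmat type: %s\n", atype, ptype));

Here Amat is the operator applied in the Krylov method and Pmat is the matrix the preconditioner is built from.)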
The way I was setting up the KSP is ksp.setOperators(ksp, lhs, None) precond = PETSc.PC().create(comm=comm) precond.setType(PETSc.PC.Type.JACOBI) precond.setOperators(A=nest_mass_matrix, P=None) precond.setUp() ksp.setPC(precond) ksp.setUp() This turns out to behave as ksp.setOperators(A=nest_mass_matrix, P=nest_mass_matrix) precond = ksp.getPC() precond.setType(PETSc.PC.Type.JACOBI) ksp.setUp() I managed to set up what I want with ksp.setOperators(A=lhs, P=nest_mass_matrix) precond = ksp.getPC() precond.setType(PETSc.PC.Type.JACOBI) ksp.setUp() Here I am solving lhs x = rhs, lhs is a matrix-free operator (python type) and my preconditioner is the diagonal of nest_mass_matrix, which is of type nest. I find this extremely confusing, especially because from the output of KSPView I could not detect the problem. For illustration, after the fix, PCView prints Mat Object: 1 MPI process type: python Python: __main__.LHSOperator Mat Object: 1 MPI process type: nest Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 This brings me to the question: is the Jacobi preconditioner using the operator lhs or the nested operator? I think that the output of PCView and KSPView should be more clear on this. Cheers, Elena ________________________________ From: Moral Sanchez, Elena Sent: 04 November 2025 10:06:07 To: Barry Smith Cc: PETSc Subject: Re: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution Dear Barry, Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags -ksp_monitor_true_residual -ksp_norm_type unpreconditioned this is the output: 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 and this is the output of my own monitor function: Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.5568889644229376 PRECONDITIONED norm: 2.049041078011257 KSPBuildResidual 2-norm: 0.5568889644229299 difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7661983589104541 PRECONDITIONED norm: 2.7387602134717137 KSPBuildResidual 2-norm: 0.2831772665189212 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7050518160900253 PRECONDITIONED norm: 2.421773833445645 KSPBuildResidual 2-norm: 0.18759500941469456 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 Here u is the vector in the KSPSolve. After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? 
By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. Thanks for the help. Cheers, Elena On 11/4/25 03:01, Barry Smith wrote: 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Nov 4 08:36:15 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Nov 2025 09:36:15 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> Message-ID: > On Nov 4, 2025, at 4:06?AM, Moral Sanchez, Elena wrote: > > Dear Barry, > Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags > -ksp_monitor_true_residual -ksp_norm_type unpreconditioned > this is the output: > 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 > 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 > and this is the output of my own monitor function: > Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.5568889644229376 > PRECONDITIONED norm: 2.049041078011257 > KSPBuildResidual 2-norm: 0.5568889644229299 > difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 > > Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7661983589104541 > PRECONDITIONED norm: 2.7387602134717137 > KSPBuildResidual 2-norm: 0.2831772665189212 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 > > Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7050518160900253 > PRECONDITIONED norm: 2.421773833445645 > KSPBuildResidual 2-norm: 0.18759500941469456 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 > Here u is the vector in the KSPSolve. > After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. > Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? > > By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. I thought I attached it to the email. > Thanks for the help. > Cheers, > Elena > > > > > > On 11/4/25 03:01, Barry Smith wrote: >> 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 >> My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Nov 4 08:43:57 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Nov 2025 09:43:57 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> Message-ID: Are you sure your matrix is symmetric, positive definite, and that the sign of all the diagonal entries is the same? You can run with -ksp_view_mat binary -ksp_view_rhs binary. This will produce a file called binaryoutput, you can email that file. Barry > On Nov 4, 2025, at 4:06?AM, Moral Sanchez, Elena wrote: > > Dear Barry, > Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags > -ksp_monitor_true_residual -ksp_norm_type unpreconditioned > this is the output: > 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 > 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 > and this is the output of my own monitor function: > Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.5568889644229376 > PRECONDITIONED norm: 2.049041078011257 > KSPBuildResidual 2-norm: 0.5568889644229299 > difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 > > Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7661983589104541 > PRECONDITIONED norm: 2.7387602134717137 > KSPBuildResidual 2-norm: 0.2831772665189212 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 > > Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7050518160900253 > PRECONDITIONED norm: 2.421773833445645 > KSPBuildResidual 2-norm: 0.18759500941469456 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 > Here u is the vector in the KSPSolve. > After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. > Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? > > By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. > Thanks for the help. > Cheers, > Elena > > > > > > On 11/4/25 03:01, Barry Smith wrote: >> 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 >> My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Nov 4 09:08:21 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Nov 2025 10:08:21 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> Message-ID: <3197B431-68DD-437E-997B-7E8276D1479B@petsc.dev> The preconditioner is always built from the second of the two matrices passed in KSPSetOperators() or PCSetOperators(). In the first case below both matrices are the same (the nest matrix) and so diagonal is extracted from the nest matrix (which is the only matrix). In the second case below the first matrix is custom and the second is the next matrix, again the preconditioner is constructed from the second one. linear system matrix = precond matrix: Mat Object: 1 MPI process type: nest rows=524, cols=524 Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 Mat Object: 1 MPI process type: python Python: __main__.LHSOperator Mat Object: 1 MPI process type: nest Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 The arguments to KSPSetOperators() and PCSetOperators() are stored in the same place (inside the PC) In fact, PetscErrorCode KSPSetOperators(KSP ksp, Mat Amat, Mat Pmat) { .... PetscCall(PCSetOperators(ksp->pc, Amat, Pmat)); so when you call precond.setOperators(A=nest_mass_matrix, P=None) ksp.setPC(precond) you are overwriting the A = lhs that you passed into ksp.setOperators(ksp, lhs, None). Given the names ksp.setOperators() and precond.setOperators() I can see how this can be confusing. It is reasonable to conclude that ksp.setOperators is providing the linear system and that pc.setOperators() is providing the matrix with which to build the preconditioner, but that is not the case. But the code has been this way for 31 years. I am not sure what we can change with the documentation. Perhaps in KSPSetOperators it can say that it sets them into the PC. Barry > On Nov 4, 2025, at 4:06?AM, Moral Sanchez, Elena wrote: > > Dear Barry, > Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. 
With the flags > -ksp_monitor_true_residual -ksp_norm_type unpreconditioned > this is the output: > 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 > 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 > and this is the output of my own monitor function: > Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.5568889644229376 > PRECONDITIONED norm: 2.049041078011257 > KSPBuildResidual 2-norm: 0.5568889644229299 > difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 > > Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7661983589104541 > PRECONDITIONED norm: 2.7387602134717137 > KSPBuildResidual 2-norm: 0.2831772665189212 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 > > Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s > difference KSPBuildSolution and u: 0.0 > UNPRECONDITIONED norm: 0.7050518160900253 > PRECONDITIONED norm: 2.421773833445645 > KSPBuildResidual 2-norm: 0.18759500941469456 > difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 > Here u is the vector in the KSPSolve. > After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. > Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? > > By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. > Thanks for the help. > Cheers, > Elena > > > > > > On 11/4/25 03:01, Barry Smith wrote: >> 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 >> My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Elena.Moral.Sanchez at ipp.mpg.de Tue Nov 4 10:14:00 2025 From: Elena.Moral.Sanchez at ipp.mpg.de (Moral Sanchez, Elena) Date: Tue, 4 Nov 2025 16:14:00 +0000 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: <3197B431-68DD-437E-997B-7E8276D1479B@petsc.dev> References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> , <3197B431-68DD-437E-997B-7E8276D1479B@petsc.dev> Message-ID: <74096301abcb458bbd82fc6adace2e6d@ipp.mpg.de> Dear Barry, Thank you for the clear answer. Indeed, it would be useful to know that they are both stored in PC, that PCSetOperators may overwrite KSPSetOperators and that the order (1 for A, 2 for PC) is the order that appears in KSPView. In any case, it is clear to me now. Thanks for the help! Cheers, Elena ________________________________ From: Barry Smith Sent: 04 November 2025 16:08:21 To: Moral Sanchez, Elena Cc: PETSc Subject: Re: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution The preconditioner is always built from the second of the two matrices passed in KSPSetOperators() or PCSetOperators(). 
In the first case below both matrices are the same (the nest matrix) and so diagonal is extracted from the nest matrix (which is the only matrix). In the second case below the first matrix is custom and the second is the next matrix, again the preconditioner is constructed from the second one. linear system matrix = precond matrix: Mat Object: 1 MPI process type: nest rows=524, cols=524 Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 Mat Object: 1 MPI process type: python Python: __main__.LHSOperator Mat Object: 1 MPI process type: nest Matrix object: type=nest, rows=3, cols=3 MatNest structure: (0,0) : type=mpiaij, rows=176, cols=176 (0,1) : NULL (0,2) : NULL (1,0) : NULL (1,1) : type=mpiaij, rows=172, cols=172 (1,2) : NULL (2,0) : NULL (2,1) : NULL (2,2) : type=mpiaij, rows=176, cols=176 The arguments to KSPSetOperators() and PCSetOperators() are stored in the same place (inside the PC) In fact, PetscErrorCode KSPSetOperators(KSP ksp, Mat Amat, Mat Pmat) { .... PetscCall(PCSetOperators(ksp->pc, Amat, Pmat)); so when you call precond.setOperators(A=nest_mass_matrix, P=None) ksp.setPC(precond) you are overwriting the A = lhs that you passed into ksp.setOperators(ksp, lhs, None). Given the names ksp.setOperators() and precond.setOperators() I can see how this can be confusing. It is reasonable to conclude that ksp.setOperators is providing the linear system and that pc.setOperators() is providing the matrix with which to build the preconditioner, but that is not the case. But the code has been this way for 31 years. I am not sure what we can change with the documentation. Perhaps in KSPSetOperators it can say that it sets them into the PC. Barry On Nov 4, 2025, at 4:06?AM, Moral Sanchez, Elena wrote: Dear Barry, Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags -ksp_monitor_true_residual -ksp_norm_type unpreconditioned this is the output: 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 and this is the output of my own monitor function: Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.5568889644229376 PRECONDITIONED norm: 2.049041078011257 KSPBuildResidual 2-norm: 0.5568889644229299 difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7661983589104541 PRECONDITIONED norm: 2.7387602134717137 KSPBuildResidual 2-norm: 0.2831772665189212 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s difference KSPBuildSolution and u: 0.0 UNPRECONDITIONED norm: 0.7050518160900253 PRECONDITIONED norm: 2.421773833445645 KSPBuildResidual 2-norm: 0.18759500941469456 difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 Here u is the vector in the KSPSolve. After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. 
They are the same when I run the same code without preconditioner. Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. Thanks for the help. Cheers, Elena On 11/4/25 03:01, Barry Smith wrote: 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Nov 4 20:10:36 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Nov 2025 21:10:36 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: <74096301abcb458bbd82fc6adace2e6d@ipp.mpg.de> References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> <3197B431-68DD-437E-997B-7E8276D1479B@petsc.dev> <74096301abcb458bbd82fc6adace2e6d@ipp.mpg.de> Message-ID: <98DD57FA-A301-4B5F-AFFD-041465090438@petsc.dev> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8825__;!!G_uCfscf7eWS!bN5mKdIPW5Qb1phs3E5csHyDe8rTEZ1Wg5FYGcuTnKoIRKJqNhC4ydBAK9k4w-OY6HGXdZdeTCdxXp3afWUypHg$ > On Nov 4, 2025, at 11:14?AM, Moral Sanchez, Elena wrote: > > Dear Barry, > Thank you for the clear answer. Indeed, it would be useful to know that they are both stored in PC, that PCSetOperators may overwrite KSPSetOperators and that the order (1 for A, 2 for PC) is the order that appears in KSPView. > > In any case, it is clear to me now. Thanks for the help! > Cheers, > Elena > From: Barry Smith > > Sent: 04 November 2025 16:08:21 > To: Moral Sanchez, Elena > Cc: PETSc > Subject: Re: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution > > The preconditioner is always built from the second of the two matrices passed in KSPSetOperators() or PCSetOperators(). In the first case below both matrices are the same (the nest matrix) and so diagonal is extracted from the nest matrix (which is the only matrix). In the second case below the first matrix is custom and the second is the next matrix, again the preconditioner is constructed from the second one. > > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: nest > rows=524, cols=524 > Matrix object: > type=nest, rows=3, cols=3 > MatNest structure: > (0,0) : type=mpiaij, rows=176, cols=176 > (0,1) : NULL > (0,2) : NULL > (1,0) : NULL > (1,1) : type=mpiaij, rows=172, cols=172 > (1,2) : NULL > (2,0) : NULL > (2,1) : NULL > (2,2) : type=mpiaij, rows=176, cols=176 > > Mat Object: 1 MPI process > type: python > Python: __main__.LHSOperator > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=3, cols=3 > MatNest structure: > (0,0) : type=mpiaij, rows=176, cols=176 > (0,1) : NULL > (0,2) : NULL > (1,0) : NULL > (1,1) : type=mpiaij, rows=172, cols=172 > (1,2) : NULL > (2,0) : NULL > (2,1) : NULL > (2,2) : type=mpiaij, rows=176, cols=176 > > > The arguments to KSPSetOperators() and PCSetOperators() are stored in the same place (inside the PC) > > In fact, > > PetscErrorCode KSPSetOperators(KSP ksp, Mat Amat, Mat Pmat) > { > .... 
> PetscCall(PCSetOperators(ksp->pc, Amat, Pmat)); > > > > so when you call > > precond.setOperators(A=nest_mass_matrix, P=None) > ksp.setPC(precond) > > you are overwriting the A = lhs that you passed into ksp.setOperators(ksp, lhs, None). > > Given the names ksp.setOperators() and precond.setOperators() I can see how this can be confusing. It is reasonable to conclude that ksp.setOperators is providing the linear system and that pc.setOperators() is providing the matrix with which to build the preconditioner, but that is not the case. > > But the code has been this way for 31 years. I am not sure what we can change with the documentation. Perhaps in KSPSetOperators it can say that it sets them into the PC. > > Barry > > > > >> On Nov 4, 2025, at 4:06?AM, Moral Sanchez, Elena > wrote: >> >> Dear Barry, >> Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags >> -ksp_monitor_true_residual -ksp_norm_type unpreconditioned >> this is the output: >> 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 >> 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 >> and this is the output of my own monitor function: >> Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.5568889644229376 >> PRECONDITIONED norm: 2.049041078011257 >> KSPBuildResidual 2-norm: 0.5568889644229299 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 >> >> Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.7661983589104541 >> PRECONDITIONED norm: 2.7387602134717137 >> KSPBuildResidual 2-norm: 0.2831772665189212 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 >> >> Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.7050518160900253 >> PRECONDITIONED norm: 2.421773833445645 >> KSPBuildResidual 2-norm: 0.18759500941469456 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 >> Here u is the vector in the KSPSolve. >> After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. >> Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? >> >> By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. >> Thanks for the help. >> Cheers, >> Elena >> >> >> >> >> >> On 11/4/25 03:01, Barry Smith wrote: >>> 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 >>> My monitor 0 1.265943996096e+00 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Nov 4 20:58:40 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Nov 2025 21:58:40 -0500 Subject: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution In-Reply-To: <74096301abcb458bbd82fc6adace2e6d@ipp.mpg.de> References: <4a3cebd0f2494a26b49e8b1b19595433@ipp.mpg.de> <3A29735F-9AB3-49CC-9ECB-7576B563949C@petsc.dev> <3197B431-68DD-437E-997B-7E8276D1479B@petsc.dev> <74096301abcb458bbd82fc6adace2e6d@ipp.mpg.de> Message-ID: <946FAEEA-83D0-480C-8BB5-487BBB4F36E0@petsc.dev> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8826__;!!G_uCfscf7eWS!diDfPQOitwJEJCtXeieGgQxBQzXqEaYV-O_91c4jZTPklWudrcBr8bBjZGioRCn5h4NI49X6mRTD-BCf0N0RWKU$ ? Fix terminology for Pmat in KSPView output. (!8826) ? Merge requests ? PETSc / petsc ? GitLab gitlab.com > On Nov 4, 2025, at 11:14?AM, Moral Sanchez, Elena wrote: > > Dear Barry, > Thank you for the clear answer. Indeed, it would be useful to know that they are both stored in PC, that PCSetOperators may overwrite KSPSetOperators and that the order (1 for A, 2 for PC) is the order that appears in KSPView. > > In any case, it is clear to me now. Thanks for the help! > Cheers, > Elena > From: Barry Smith > > Sent: 04 November 2025 16:08:21 > To: Moral Sanchez, Elena > Cc: PETSc > Subject: Re: [petsc-users] norm of KSPBuildResidual does not match norm computed from KSPBuildSolution > > The preconditioner is always built from the second of the two matrices passed in KSPSetOperators() or PCSetOperators(). In the first case below both matrices are the same (the nest matrix) and so diagonal is extracted from the nest matrix (which is the only matrix). In the second case below the first matrix is custom and the second is the next matrix, again the preconditioner is constructed from the second one. > > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: nest > rows=524, cols=524 > Matrix object: > type=nest, rows=3, cols=3 > MatNest structure: > (0,0) : type=mpiaij, rows=176, cols=176 > (0,1) : NULL > (0,2) : NULL > (1,0) : NULL > (1,1) : type=mpiaij, rows=172, cols=172 > (1,2) : NULL > (2,0) : NULL > (2,1) : NULL > (2,2) : type=mpiaij, rows=176, cols=176 > > Mat Object: 1 MPI process > type: python > Python: __main__.LHSOperator > Mat Object: 1 MPI process > type: nest > Matrix object: > type=nest, rows=3, cols=3 > MatNest structure: > (0,0) : type=mpiaij, rows=176, cols=176 > (0,1) : NULL > (0,2) : NULL > (1,0) : NULL > (1,1) : type=mpiaij, rows=172, cols=172 > (1,2) : NULL > (2,0) : NULL > (2,1) : NULL > (2,2) : type=mpiaij, rows=176, cols=176 > > > The arguments to KSPSetOperators() and PCSetOperators() are stored in the same place (inside the PC) > > In fact, > > PetscErrorCode KSPSetOperators(KSP ksp, Mat Amat, Mat Pmat) > { > .... > PetscCall(PCSetOperators(ksp->pc, Amat, Pmat)); > > > > so when you call > > precond.setOperators(A=nest_mass_matrix, P=None) > ksp.setPC(precond) > > you are overwriting the A = lhs that you passed into ksp.setOperators(ksp, lhs, None). > > Given the names ksp.setOperators() and precond.setOperators() I can see how this can be confusing. It is reasonable to conclude that ksp.setOperators is providing the linear system and that pc.setOperators() is providing the matrix with which to build the preconditioner, but that is not the case. > > But the code has been this way for 31 years. I am not sure what we can change with the documentation. Perhaps in KSPSetOperators it can say that it sets them into the PC. 
> > Barry > > > > >> On Nov 4, 2025, at 4:06 AM, Moral Sanchez, Elena > wrote: >> >> Dear Barry, >> Thanks for the fast answer. Unfortunately in my case the discrepancy is huge. With the flags >> -ksp_monitor_true_residual -ksp_norm_type unpreconditioned >> this is the output: >> 0 KSP unpreconditioned resid norm 5.568889644229e-01 true resid norm 5.568889644229e-01 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP unpreconditioned resid norm 2.831772665189e-01 true resid norm 2.831772665189e-01 ||r(i)||/||b|| 5.084986139245e-01 >> 2 KSP unpreconditioned resid norm 1.875950094147e-01 true resid norm 1.875950094147e-01 ||r(i)||/||b|| 3.368625011435e-01 >> and this is the output of my own monitor function: >> Iter 0/10 | res = 5.57e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.5568889644229376 >> PRECONDITIONED norm: 2.049041078011257 >> KSPBuildResidual 2-norm: 0.5568889644229299 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 6.573603152700697e-13 >> >> Iter 1/10 | res = 2.83e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.7661983589104541 >> PRECONDITIONED norm: 2.7387602134717137 >> KSPBuildResidual 2-norm: 0.2831772665189212 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 0.1700718741085172 >> >> Iter 2/10 | res = 1.88e-01/1.00e-08 | 0.0 s >> difference KSPBuildSolution and u: 0.0 >> UNPRECONDITIONED norm: 0.7050518160900253 >> PRECONDITIONED norm: 2.421773833445645 >> KSPBuildResidual 2-norm: 0.18759500941469456 >> difference KSPBuildResidual and b-A(KSPBuildSolution): 0.19327058976599623 >> Here u is the vector in the KSPSolve. >> After the first iteration, the residual computed from KSPBuildSolution and the residual from KSPBuildResidual diverge. They are the same when I run the same code without preconditioner. >> Another observation is that after convergence (wrt. unpreconditioned norm == 2-norm of KSPBuildResidual) the solution with and without preconditioner looks quite different. How is this possible if my preconditioner is SPD? >> >> By the way, where can I find your implementation of "My monitor" in src/snes/tutorials/ex5.c? I tried to look at the Gitlab repository but could not find it. >> Thanks for the help. >> Cheers, >> Elena >> >> >> >> >> >> On 11/4/25 03:01, Barry Smith wrote: >>> 0 KSP unpreconditioned resid norm 1.265943996096e+00 true resid norm 1.265943996096e+00 ||r(i)||/||b|| 1.000000000000e+00 >>> My monitor 0 1.265943996096e+00

From sale987 at live.com Wed Nov 5 08:17:32 2025 From: sale987 at live.com (Samuele Ferri) Date: Wed, 5 Nov 2025 14:17:32 +0000 Subject: [petsc-users] Two SNES on the same DM not working Message-ID: Dear petsc users, in petsc version 3.24, I'm trying to create two snes over the same DM, but with different functions and jacobians. Despite making different calls to SNESSetFunction it happens the second snes uses the same function of the first. Can you help me finding the problem, please?
Here below there is a minimal working example showing the issue:

static char help[] = "Test SNES.\n";
/* the #include targets were lost in the archive; these headers cover the calls below */
#include <petscdm.h>
#include <petscdmda.h>
#include <petscsnes.h>

PetscErrorCode Jac_1(SNES snes, Vec x, Mat J, Mat B, void *){
    PetscFunctionBegin;
    printf("Jac 1\n");
    PetscFunctionReturn(PETSC_SUCCESS);
}

PetscErrorCode Function_1(SNES snes, Vec x, Vec f, void *){
    PetscFunctionBegin;
    printf("Function 1\n");
    PetscFunctionReturn(PETSC_SUCCESS);
}

PetscErrorCode Jac_2(SNES snes, Vec x, Mat J, Mat B, void *){
    PetscFunctionBegin;
    printf("Jac 2\n");
    PetscFunctionReturn(PETSC_SUCCESS);
}

PetscErrorCode Function_2(SNES snes, Vec x, Vec f, void *){
    PetscFunctionBegin;
    printf("Function 2\n");
    PetscFunctionReturn(PETSC_SUCCESS);
}

int main(int argc, char **argv) {

    PetscFunctionBeginUser;
    PetscCall(PetscInitialize(&argc, &argv, NULL, help));

    DM dm;
    PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, 1, 1, NULL, &dm));
    PetscCall(DMSetFromOptions(dm));
    PetscCall(DMSetUp(dm));

    SNES snes1, snes2;
    Vec r1, r2;
    Mat J1, J2;

    PetscCall(DMCreateGlobalVector(dm, &r1));
    PetscCall(DMCreateGlobalVector(dm, &r2));
    PetscCall(DMCreateMatrix(dm, &J1));
    PetscCall(DMCreateMatrix(dm, &J2));

    PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1));
    PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2));
    PetscCall(SNESSetType(snes1, SNESNEWTONLS));
    PetscCall(SNESSetType(snes2, SNESNEWTONLS));
    PetscCall(SNESSetFromOptions(snes1));
    PetscCall(SNESSetFromOptions(snes2));
    PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL));
    PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL));
    PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL));
    PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL));
    PetscCall(SNESSetDM(snes1, dm));
    PetscCall(SNESSetDM(snes2, dm));

    PetscCall(SNESSolve(snes1, NULL, NULL));
    PetscCall(SNESSolve(snes2, NULL, NULL));

    printf("snes1 %p; snes2 %p\n", snes1, snes2);

    SNESFunctionFn *p;
    PetscCall(SNESGetFunction(snes1, NULL, &p, NULL));
    printf("snes1: pointer %p, true function %p\n", *p, Function_1);
    PetscCall(SNESGetFunction(snes2, NULL, &p, NULL));
    printf("snes2: pointer %p, true function %p\n", *p, Function_2);

    PetscCall(PetscFinalize());
    PetscFunctionReturn(PETSC_SUCCESS);
}
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Nov 5 08:47:27 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 5 Nov 2025 09:47:27 -0500 Subject: [petsc-users] Two SNES on the same DM not working In-Reply-To: References: Message-ID: This is not supported. Duplicate your DM. > On Nov 5, 2025, at 9:17 AM, Samuele Ferri wrote: > > Dear petsc users, > > in petsc version 3.24, I'm trying to create two snes over the same DM, but with different functions and jacobians. Despite making different calls to SNESSetFunction it happens the second snes uses the same function of the first. > Can you help me finding the problem, please?
> > Here below there is a minimal working example showing the issue: > > static char help[] = "Test SNES.\n"; > #include > #include > #include > > PetscErrorCode Jac_1(SNES snes, Vec x, Mat J, Mat B, void *){ > PetscFunctionBegin; > printf("Jac 1\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Function_1(SNES snes, Vec x, Vec f, void *){ > PetscFunctionBegin; > printf("Function 1\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Jac_2(SNES snes, Vec x, Mat J, Mat B, void *){ > PetscFunctionBegin; > printf("Jac 2\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Function_2(SNES snes, Vec x, Vec f, void *){ > PetscFunctionBegin; > printf("Function 2\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > int main(int argc, char **argv) { > > PetscFunctionBeginUser; > PetscCall(PetscInitialize(&argc, &argv, NULL, help)); > > DM dm; > PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, 1, 1, NULL, &dm)); > PetscCall(DMSetFromOptions(dm)); > PetscCall(DMSetUp(dm)); > > SNES snes1, snes2; > Vec r1,r2; > Mat J1, J2; > > PetscCall(DMCreateGlobalVector(dm, &r1)); > PetscCall(DMCreateGlobalVector(dm, &r2)) > PetscCall(DMCreateMatrix(dm, &J1)); > PetscCall(DMCreateMatrix(dm, &J2)); > > PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); > PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); > PetscCall(SNESSetType(snes1, SNESNEWTONLS)); > PetscCall(SNESSetType(snes2, SNESNEWTONLS)); > PetscCall(SNESSetFromOptions(snes1)); > PetscCall(SNESSetFromOptions(snes2)); > PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); > PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); > PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); > PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); > PetscCall(SNESSetDM(snes1, dm)); > PetscCall(SNESSetDM(snes2, dm)); > > PetscCall(SNESSolve(snes1, NULL, NULL)); > PetscCall(SNESSolve(snes2, NULL, NULL)); > > printf("snes1 %p; snes2 %p\n", snes1, snes2); > > SNESFunctionFn *p; > PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); > printf("snes1: pointer %p, true function %p\n", *p, Function_1); > PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); > printf("snes2: pointer %p, true function %p\n", *p, Function_2); > > PetscCall(PetscFinalize()); > PetscFunctionReturn(PETSC_SUCCESS); > } -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldo.bonfiglioli at unibas.it Thu Nov 6 00:41:33 2025 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Thu, 6 Nov 2025 07:41:33 +0100 Subject: [petsc-users] Probelm with DMPlexExtractSubMesh Message-ID: <1e9e4725-5274-4cef-a035-bb399deaac55@unibas.it> Dear all, I am having troubles in using DMPlexExtractSubMesh to extract the different strata of the Face Sets of a given mesh. 
When run on the enclosed tetrahedral mesh of the unit cube generated with gmsh > Face Sets: 6 strata with value/size (1 (246), 2 (246), 3 (246), 4 > (246), 5 (242), 6 (242)) > I would expect 246 "points" on stratum 3, but when I DMview the subdm (and plot it) the surface mesh looks incomplete > DM Object: patch_03 1 MPI process > ?type: plex > patch_03 in 2 dimensions: > ?Cells are at height 1 > ?Number of 0-cells per rank: 122 > ?Number of 1-cells per rank: 325 > Number of 2-cells per rank: 204 > Number of 3-cells per rank: 204 [204] > Labels: > celltype: 4 strata with value/size (0 (122), 1 (325), 3 (204), 12 (204)) > depth: 4 strata with value/size (0 (122), 1 (325), 2 (204), 3 (204)) > Cell Sets: 1 strata with value/size (1 (204)) > Face Sets: 1 strata with value/size (3 (204)) > Edge Sets: 2 strata with value/size (1 (8), 5 (8)) > see also patch_03.pdf What am I doing wrong? A simple reproducer (compiles with petsc-3.24.0)?and the gmsh mesh are enclosed. Thanks, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Mechanics Dipartimento di Ingegneria Universita' della Basilicata V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 web:https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!aMKmGG4aim9XcbNSnDyHUkDyhUkQHGZ-u-xX2C-sycYUMmtTij6AwqsQbZPXJSvPp9KUfgwRJK2Ok6Me2BLgO0en1w4QF2fHo7s$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: submesh_xmple.F90 Type: text/x-fortran Size: 3882 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cube6.msh Type: model/mesh Size: 186084 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: patch_03.pdf Type: application/pdf Size: 160538 bytes Desc: not available URL: From sale987 at live.com Thu Nov 6 00:49:12 2025 From: sale987 at live.com (Samuele Ferri) Date: Thu, 6 Nov 2025 06:49:12 +0000 Subject: [petsc-users] R: Two SNES on the same DM not working In-Reply-To: References: Message-ID: Dear Barry, thank you for your reply. Now everything works fine. Best regards Samuele ________________________________ Da: Barry Smith Inviato: mercoled? 5 novembre 2025 15:47 A: Samuele Ferri Cc: petsc-users at mcs.anl.gov Oggetto: Re: [petsc-users] Two SNES on the same DM not working This is not supported. Duplicate your DM. On Nov 5, 2025, at 9:17?AM, Samuele Ferri wrote: Dear petsc users, in petsc version 3.24, I'm trying to create two snes over the same DM, but with different functions and jacobians. Despite making different calls to SNESSetFunction it happens the second snes uses the same function of the first. Can you help me finding the problem, please? 
Here below there is a minimal working example showing the issue: static char help[] = "Test SNES.\n"; #include #include #include PetscErrorCode Jac_1(SNES snes, Vec x, Mat J, Mat B, void *){ PetscFunctionBegin; printf("Jac 1\n"); PetscFunctionReturn(PETSC_SUCCESS); } PetscErrorCode Function_1(SNES snes, Vec x, Vec f, void *){ PetscFunctionBegin; printf("Function 1\n"); PetscFunctionReturn(PETSC_SUCCESS); } PetscErrorCode Jac_2(SNES snes, Vec x, Mat J, Mat B, void *){ PetscFunctionBegin; printf("Jac 2\n"); PetscFunctionReturn(PETSC_SUCCESS); } PetscErrorCode Function_2(SNES snes, Vec x, Vec f, void *){ PetscFunctionBegin; printf("Function 2\n"); PetscFunctionReturn(PETSC_SUCCESS); } int main(int argc, char **argv) { PetscFunctionBeginUser; PetscCall(PetscInitialize(&argc, &argv, NULL, help)); DM dm; PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, 1, 1, NULL, &dm)); PetscCall(DMSetFromOptions(dm)); PetscCall(DMSetUp(dm)); SNES snes1, snes2; Vec r1,r2; Mat J1, J2; PetscCall(DMCreateGlobalVector(dm, &r1)); PetscCall(DMCreateGlobalVector(dm, &r2)) PetscCall(DMCreateMatrix(dm, &J1)); PetscCall(DMCreateMatrix(dm, &J2)); PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); PetscCall(SNESSetType(snes1, SNESNEWTONLS)); PetscCall(SNESSetType(snes2, SNESNEWTONLS)); PetscCall(SNESSetFromOptions(snes1)); PetscCall(SNESSetFromOptions(snes2)); PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); PetscCall(SNESSetDM(snes1, dm)); PetscCall(SNESSetDM(snes2, dm)); PetscCall(SNESSolve(snes1, NULL, NULL)); PetscCall(SNESSolve(snes2, NULL, NULL)); printf("snes1 %p; snes2 %p\n", snes1, snes2); SNESFunctionFn *p; PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); printf("snes1: pointer %p, true function %p\n", *p, Function_1); PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); printf("snes2: pointer %p, true function %p\n", *p, Function_2); PetscCall(PetscFinalize()); PetscFunctionReturn(PETSC_SUCCESS); } -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Thu Nov 6 02:19:14 2025 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Thu, 6 Nov 2025 09:19:14 +0100 Subject: [petsc-users] R: Two SNES on the same DM not working In-Reply-To: References: Message-ID: Dear Barry, ? ? sorry for jumping into this. I am wondering if your reply is related to DMDA or to DM in general. I have at least one code where I do something similar to what Samuele did in his sample code: create a DMPlex, create a section on this DMPlex, create two SNES solving for Vecs defined on that same section and attach to each of them a different SNESFunction and SNESJacobian (one solves a predictor and the other is a corrector). Everything seems fine, but I am wondering if that code is somewhat weak and should be changed by DMCloning the plex as you suggested to Samuele. Thanks ? ? Matteo On 06/11/2025 07:49, Samuele Ferri wrote: > > > sale987 at live.com sembra simile a un utente che in precedenza ti ha > inviato un messaggio di posta elettronica, ma potrebbe non essere lo > stesso. Scopri perch? potrebbe trattarsi di un rischio > > > > Dear Barry, > > thank you for your reply. Now everything works fine. 
> > Best regards > Samuele > ------------------------------------------------------------------------ > *Da:* Barry Smith > *Inviato:* mercoled? 5 novembre 2025 15:47 > *A:* Samuele Ferri > *Cc:* petsc-users at mcs.anl.gov > *Oggetto:* Re: [petsc-users] Two SNES on the same DM not working > > ? ?This is not supported. Duplicate your DM. > >> On Nov 5, 2025, at 9:17?AM, Samuele Ferri wrote: >> >> Dear petsc users, >> >> in petsc version 3.24, I'm trying to create two snes over the same >> DM, but with different functions and jacobians. Despite making >> different calls to SNESSetFunction it happens the second snes uses >> the same function of the first. >> Can you help me finding the problem, please? >> >> Here below there is a minimal working example showing the issue: >> >> static char help[] = "Test SNES.\n"; >> #include >> #include >> #include >> >> PetscErrorCode Jac_1(SNES/snes/, Vec/x/, Mat/J/, Mat/B/, void *){ >> ? ? PetscFunctionBegin; >> ? ? printf("Jac 1\n"); >> ? ? PetscFunctionReturn(PETSC_SUCCESS); >> } >> >> PetscErrorCode Function_1(SNES/snes/, Vec/x/, Vec/f/, void *){ >> ? ? PetscFunctionBegin; >> ? ? printf("Function 1\n"); >> ? ? PetscFunctionReturn(PETSC_SUCCESS); >> } >> >> PetscErrorCode Jac_2(SNES/snes/, Vec/x/, Mat/J/, Mat/B/, void *){ >> ? ? PetscFunctionBegin; >> ? ? printf("Jac 2\n"); >> ? ? PetscFunctionReturn(PETSC_SUCCESS); >> } >> >> PetscErrorCode Function_2(SNES/snes/, Vec/x/, Vec/f/, void *){ >> ? ? PetscFunctionBegin; >> ? ? printf("Function 2\n"); >> ? ? PetscFunctionReturn(PETSC_SUCCESS); >> } >> >> int main(int/argc/, char **/argv/) { >> >> ? ? PetscFunctionBeginUser; >> ? ? PetscCall(PetscInitialize(&/argc/, &/argv/, NULL, help)); >> >> ? ? DM dm; >> ? ? PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, >> 1, 1, NULL, &dm)); >> ? ? PetscCall(DMSetFromOptions(dm)); >> ? ? PetscCall(DMSetUp(dm)); >> >> ? ? SNES snes1, snes2; >> ? ? Vec r1,r2; >> ? ? Mat J1, J2; >> >> ? ? PetscCall(DMCreateGlobalVector(dm, &r1)); >> ? ? PetscCall(DMCreateGlobalVector(dm, &r2)) >> ? ? PetscCall(DMCreateMatrix(dm, &J1)); >> ? ? PetscCall(DMCreateMatrix(dm, &J2)); >> >> ? ? PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); >> ? ? PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); >> ? ? PetscCall(SNESSetType(snes1, SNESNEWTONLS)); >> ? ? PetscCall(SNESSetType(snes2, SNESNEWTONLS)); >> ? ? PetscCall(SNESSetFromOptions(snes1)); >> ? ? PetscCall(SNESSetFromOptions(snes2)); >> ? ? PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); >> ? ? PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); >> ? ? PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); >> ? ? PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); >> ? ? PetscCall(SNESSetDM(snes1, dm)); >> ? ? PetscCall(SNESSetDM(snes2, dm)); >> >> ? ? PetscCall(SNESSolve(snes1, NULL, NULL)); >> ? ? PetscCall(SNESSolve(snes2, NULL, NULL)); >> >> ? ? printf("snes1 %p; snes2 %p\n", snes1, snes2); >> >> ? ? SNESFunctionFn *p; >> ? ? PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); >> ? ? printf("snes1: pointer %p, true function %p\n", *p, Function_1); >> ? ? PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); >> ? ? printf("snes2: pointer %p, true function %p\n", *p, Function_2); >> ? ? PetscCall(PetscFinalize()); >> ? ? PetscFunctionReturn(PETSC_SUCCESS); >> } > -- Prof. Matteo Semplice Universit? degli Studi dell?Insubria Dipartimento di Scienza e Alta Tecnologia ? DiSAT Professore Associato Via Valleggio, 11 ? 22100 Como (CO) ? 
Italia tel.: +39 031 2386316 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Nov 6 03:11:26 2025 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 6 Nov 2025 12:11:26 +0300 Subject: [petsc-users] R: Two SNES on the same DM not working In-Reply-To: References: Message-ID: Matteo eventually (and in some sense counterintuitively) the DM stores the information on the problem, not SNES. See the snippet below to make things more clear SNESSetDM(snes1,dm) SNESSetFunction(snes1,F) SNESSolve(snes1) // Solves F(x)=0 SNESSetDM(snes2,dm) SNESSetFunction(snes2,G) SNESSolve(snes2) // Solves G(x)=0 SNESSolve(snes1) // Solves G(x), not F(x)!! If you have a plex you can call DMClone(dm,dm2) and set a new section on dm2 to be used on snes2 (the mesh won't be duplicated, only the problem dependent part) I guess you can follow the same approach with a DMDA, it should work. If not, you may need to call DMDuplicate on the DMDA. Il giorno gio 6 nov 2025 alle ore 11:19 Matteo Semplice via petsc-users < petsc-users at mcs.anl.gov> ha scritto: > Dear Barry, > > sorry for jumping into this. > > > I am wondering if your reply is related to DMDA or to DM in general. I > have at least one code where I do something similar to what Samuele did in > his sample code: create a DMPlex, create a section on this DMPlex, create > two SNES solving for Vecs defined on that same section and attach to each > of them a different SNESFunction and SNESJacobian (one solves a predictor > and the other is a corrector). Everything seems fine, but I am wondering if > that code is somewhat weak and should be changed by DMCloning the plex as > you suggested to Samuele. > > > Thanks > > Matteo > > > On 06/11/2025 07:49, Samuele Ferri wrote: > > > sale987 at live.com sembra simile a un utente che in precedenza ti ha > inviato un messaggio di posta elettronica, ma potrebbe non essere lo > stesso. Scopri perch? potrebbe trattarsi di un rischio > > > Dear Barry, > > thank you for your reply. Now everything works fine. > > Best regards > Samuele > ------------------------------ > *Da:* Barry Smith > *Inviato:* mercoled? 5 novembre 2025 15:47 > *A:* Samuele Ferri > *Cc:* petsc-users at mcs.anl.gov > > *Oggetto:* Re: [petsc-users] Two SNES on the same DM not working > > > This is not supported. Duplicate your DM. > > On Nov 5, 2025, at 9:17?AM, Samuele Ferri > wrote: > > Dear petsc users, > > in petsc version 3.24, I'm trying to create two snes over the same DM, but > with different functions and jacobians. Despite making different calls to > SNESSetFunction it happens the second snes uses the same function of the > first. > Can you help me finding the problem, please? 
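(The original example is quoted again just below.) As a concrete sketch of the fix Barry and Stefano describe, the example can give each SNES its own DM; the names dm, snes1, snes2, r1, r2, J1 and J2 are those of the posted example, and, per Stefano's caveat, if DMClone() turns out not to be enough for a DMDA the same pattern applies with a second, separately created DMDA:

    DM dm2;
    PetscCall(DMClone(dm, &dm2));       /* shares the mesh, not the attached problem data */
    PetscCall(SNESSetDM(snes1, dm));
    PetscCall(SNESSetDM(snes2, dm2));   /* each solver now stores its callbacks in its own DM */
    PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL));
    PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL));
    PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL));
    PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL));

With the DMs separated, a later SNESSolve(snes1, NULL, NULL) keeps calling Function_1 and Jac_1 even after snes2 has been configured.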
> > Here below there is a minimal working example showing the issue: > > static char help[] = "Test SNES.\n"; > #include > #include > #include > > PetscErrorCode Jac_1(SNES *snes*, Vec *x*, Mat *J*, Mat *B*, void *){ > PetscFunctionBegin; > printf("Jac 1\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Function_1(SNES *snes*, Vec *x*, Vec *f*, void *){ > PetscFunctionBegin; > printf("Function 1\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Jac_2(SNES *snes*, Vec *x*, Mat *J*, Mat *B*, void *){ > PetscFunctionBegin; > printf("Jac 2\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > PetscErrorCode Function_2(SNES *snes*, Vec *x*, Vec *f*, void *){ > PetscFunctionBegin; > printf("Function 2\n"); > PetscFunctionReturn(PETSC_SUCCESS); > } > > int main(int *argc*, char ***argv*) { > > PetscFunctionBeginUser; > PetscCall(PetscInitialize(&*argc*, &*argv*, NULL, help)); > > DM dm; > PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, 1, 1, > NULL, &dm)); > PetscCall(DMSetFromOptions(dm)); > PetscCall(DMSetUp(dm)); > > SNES snes1, snes2; > Vec r1,r2; > Mat J1, J2; > > PetscCall(DMCreateGlobalVector(dm, &r1)); > PetscCall(DMCreateGlobalVector(dm, &r2)) > PetscCall(DMCreateMatrix(dm, &J1)); > PetscCall(DMCreateMatrix(dm, &J2)); > > PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); > PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); > PetscCall(SNESSetType(snes1, SNESNEWTONLS)); > PetscCall(SNESSetType(snes2, SNESNEWTONLS)); > PetscCall(SNESSetFromOptions(snes1)); > PetscCall(SNESSetFromOptions(snes2)); > PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); > PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); > PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); > PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); > PetscCall(SNESSetDM(snes1, dm)); > PetscCall(SNESSetDM(snes2, dm)); > > PetscCall(SNESSolve(snes1, NULL, NULL)); > PetscCall(SNESSolve(snes2, NULL, NULL)); > > printf("snes1 %p; snes2 %p\n", snes1, snes2); > > SNESFunctionFn *p; > PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); > printf("snes1: pointer %p, true function %p\n", *p, Function_1); > PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); > printf("snes2: pointer %p, true function %p\n", *p, Function_2); > > PetscCall(PetscFinalize()); > PetscFunctionReturn(PETSC_SUCCESS); > } > > > -- > Prof. Matteo Semplice > Universit? degli Studi dell?Insubria > Dipartimento di Scienza e Alta Tecnologia ? DiSAT > Professore Associato > Via Valleggio, 11 ? 22100 Como (CO) ? Italia > tel.: +39 031 2386316 > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Thu Nov 6 13:02:05 2025 From: hng.email at gmail.com (Hom Nath Gharti) Date: Thu, 6 Nov 2025 14:02:05 -0500 Subject: [petsc-users] Fortran program compilation hangs with new PETSc version Message-ID: I compiled the latest version of PETSc (3.24.1) and attempted to compile my package, which uses Fortran and MPI. But the compilation of my package hangs forever. I tried it on a different cluster with the same behaviour. The compilation precisely hangs when linking to the PETSc program. It seems to be entering into some sort of infinite loop. This problem did not happen with PETSc 3.22.4. I tried with GCC versions 11 and 12. Any advice would be greatly appreciated. Best, Hom Nath -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Nov 6 19:10:06 2025 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 6 Nov 2025 20:10:06 -0500 Subject: [petsc-users] Two SNES on the same DM not working In-Reply-To: References: Message-ID: <45F50A6B-E163-4750-A970-E7AE504A1B38@petsc.dev> > On Nov 6, 2025, at 4:11?AM, Stefano Zampini wrote: > > Matteo > > eventually (and in some sense counterintuitively) the DM stores the information on the problem, not SNES. "The information on the problem" here means the function you set with SNESSetFunction, SNESSetJacobian etc. (and possibly other stuff, I am not sure). > See the snippet below to make things more clear > > SNESSetDM(snes1,dm) > SNESSetFunction(snes1,F) > SNESSolve(snes1) // Solves F(x)=0 > > SNESSetDM(snes2,dm) > SNESSetFunction(snes2,G) > SNESSolve(snes2) // Solves G(x)=0 > > SNESSolve(snes1) // Solves G(x), not F(x)!! > > If you have a plex you can call DMClone(dm,dm2) and set a new section on dm2 to be used on snes2 (the mesh won't be duplicated, only the problem dependent part) > I guess you can follow the same approach with a DMDA, it should work. If not, you may need to call DMDuplicate on the DMDA. > > > > Il giorno gio 6 nov 2025 alle ore 11:19 Matteo Semplice via petsc-users > ha scritto: >> Dear Barry, >> >> sorry for jumping into this. >> >> >> >> I am wondering if your reply is related to DMDA or to DM in general. I have at least one code where I do something similar to what Samuele did in his sample code: create a DMPlex, create a section on this DMPlex, create two SNES solving for Vecs defined on that same section and attach to each of them a different SNESFunction and SNESJacobian (one solves a predictor and the other is a corrector). Everything seems fine, but I am wondering if that code is somewhat weak and should be changed by DMCloning the plex as you suggested to Samuele. >> >> >> >> Thanks >> >> Matteo >> >> >> >> On 06/11/2025 07:49, Samuele Ferri wrote: >>> >>> >>> sale987 at live.com sembra simile a un utente che in precedenza ti ha inviato un messaggio di posta elettronica, ma potrebbe non essere lo stesso. Scopri perch? potrebbe trattarsi di un rischio >>> Dear Barry, >>> >>> thank you for your reply. Now everything works fine. >>> >>> Best regards >>> Samuele >>> Da: Barry Smith >>> Inviato: mercoled? 5 novembre 2025 15:47 >>> A: Samuele Ferri >>> Cc: petsc-users at mcs.anl.gov >>> Oggetto: Re: [petsc-users] Two SNES on the same DM not working >>> >>> >>> This is not supported. Duplicate your DM. >>> >>>> On Nov 5, 2025, at 9:17?AM, Samuele Ferri wrote: >>>> >>>> Dear petsc users, >>>> >>>> in petsc version 3.24, I'm trying to create two snes over the same DM, but with different functions and jacobians. Despite making different calls to SNESSetFunction it happens the second snes uses the same function of the first. >>>> Can you help me finding the problem, please? 
>>>> >>>> Here below there is a minimal working example showing the issue: >>>> >>>> static char help[] = "Test SNES.\n"; >>>> #include >>>> #include >>>> #include >>>> >>>> PetscErrorCode Jac_1(SNES snes, Vec x, Mat J, Mat B, void *){ >>>> PetscFunctionBegin; >>>> printf("Jac 1\n"); >>>> PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Function_1(SNES snes, Vec x, Vec f, void *){ >>>> PetscFunctionBegin; >>>> printf("Function 1\n"); >>>> PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Jac_2(SNES snes, Vec x, Mat J, Mat B, void *){ >>>> PetscFunctionBegin; >>>> printf("Jac 2\n"); >>>> PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Function_2(SNES snes, Vec x, Vec f, void *){ >>>> PetscFunctionBegin; >>>> printf("Function 2\n"); >>>> PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> int main(int argc, char **argv) { >>>> >>>> PetscFunctionBeginUser; >>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>> >>>> DM dm; >>>> PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, 1, 1, NULL, &dm)); >>>> PetscCall(DMSetFromOptions(dm)); >>>> PetscCall(DMSetUp(dm)); >>>> >>>> SNES snes1, snes2; >>>> Vec r1,r2; >>>> Mat J1, J2; >>>> >>>> PetscCall(DMCreateGlobalVector(dm, &r1)); >>>> PetscCall(DMCreateGlobalVector(dm, &r2)) >>>> PetscCall(DMCreateMatrix(dm, &J1)); >>>> PetscCall(DMCreateMatrix(dm, &J2)); >>>> >>>> PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); >>>> PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); >>>> PetscCall(SNESSetType(snes1, SNESNEWTONLS)); >>>> PetscCall(SNESSetType(snes2, SNESNEWTONLS)); >>>> PetscCall(SNESSetFromOptions(snes1)); >>>> PetscCall(SNESSetFromOptions(snes2)); >>>> PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); >>>> PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); >>>> PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); >>>> PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); >>>> PetscCall(SNESSetDM(snes1, dm)); >>>> PetscCall(SNESSetDM(snes2, dm)); >>>> >>>> PetscCall(SNESSolve(snes1, NULL, NULL)); >>>> PetscCall(SNESSolve(snes2, NULL, NULL)); >>>> >>>> printf("snes1 %p; snes2 %p\n", snes1, snes2); >>>> >>>> SNESFunctionFn *p; >>>> PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); >>>> printf("snes1: pointer %p, true function %p\n", *p, Function_1); >>>> PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); >>>> printf("snes2: pointer %p, true function %p\n", *p, Function_2); >>>> >>>> PetscCall(PetscFinalize()); >>>> PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>> >> -- >> Prof. Matteo Semplice >> Universit? degli Studi dell?Insubria >> Dipartimento di Scienza e Alta Tecnologia ? DiSAT >> Professore Associato >> Via Valleggio, 11 ? 22100 Como (CO) ? Italia >> tel.: +39 031 2386316 > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Fri Nov 7 11:21:17 2025 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Fri, 7 Nov 2025 18:21:17 +0100 Subject: [petsc-users] Two SNES on the same DM not working In-Reply-To: <45F50A6B-E163-4750-A970-E7AE504A1B38@petsc.dev> References: <45F50A6B-E163-4750-A970-E7AE504A1B38@petsc.dev> Message-ID: <81621b5f-a171-4c60-af37-43955406454d@uninsubria.it> Il 07/11/25 02:10, Barry Smith ha scritto: > > >> On Nov 6, 2025, at 4:11?AM, Stefano Zampini >> wrote: >> >> Matteo >> >> eventually (and in some sense counterintuitively) the DM stores the >> information on the problem, not SNES. > > ? 
"The information on the problem" here means the function you set > with SNESSetFunction, SNESSetJacobian etc. (and possibly other stuff, > I am not sure). Oddly, sometimes the two SNES seems to work nevertheless, but?after your anwers I will make sure that I explicitly DMClone when I need more that 1 snes. Matteo > > >> See the snippet below to make things more clear >> >> SNESSetDM(snes1,dm) >> SNESSetFunction(snes1,F) >> SNESSolve(snes1) // Solves F(x)=0 >> >> SNESSetDM(snes2,dm) >> SNESSetFunction(snes2,G) >> SNESSolve(snes2) // Solves G(x)=0 >> >> SNESSolve(snes1) // Solves G(x), not F(x)!! >> >> If you have a plex you can call DMClone(dm,dm2) and set a new section >> on dm2 to be used on snes2 (the mesh won't be duplicated, only?the >> problem dependent part) >> I guess you can follow the same approach with a DMDA, it should work. >> If not, you may need to call DMDuplicate on the DMDA. >> >> >> >> Il giorno gio 6 nov 2025 alle ore 11:19 Matteo Semplice via >> petsc-users ha scritto: >> >> Dear Barry, >> >> ? ? sorry for jumping into this. >> >> >> I am wondering if your reply is related to DMDA or to DM in >> general. I have at least one code where I do something similar to >> what Samuele did in his sample code: create a DMPlex, create a >> section on this DMPlex, create two SNES solving for Vecs defined >> on that same section and attach to each of them a different >> SNESFunction and SNESJacobian (one solves a predictor and the >> other is a corrector). Everything seems fine, but I am wondering >> if that code is somewhat weak and should be changed by DMCloning >> the plex as you suggested to Samuele. >> >> >> Thanks >> >> ? ? Matteo >> >> >> On 06/11/2025 07:49, Samuele Ferri wrote: >>> >>> >>> sale987 at live.com sembra simile a un utente che in precedenza ti >>> ha inviato un messaggio di posta elettronica, ma potrebbe non >>> essere lo stesso. Scopri perch? potrebbe trattarsi di un rischio >>> >>> >>> >>> >>> Dear Barry, >>> >>> thank you for your reply. Now everything works fine. >>> >>> Best regards >>> Samuele >>> ------------------------------------------------------------------------ >>> *Da:* Barry Smith >>> *Inviato:* mercoled? 5 novembre 2025 15:47 >>> *A:* Samuele Ferri >>> *Cc:* petsc-users at mcs.anl.gov >>> >>> *Oggetto:* Re: [petsc-users] Two SNES on the same DM not working >>> >>> ? ?This is not supported. Duplicate your DM. >>> >>>> On Nov 5, 2025, at 9:17?AM, Samuele Ferri >>>> wrote: >>>> >>>> Dear petsc users, >>>> >>>> in petsc version 3.24, I'm trying to create two snes over the >>>> same DM, but with different functions and jacobians. Despite >>>> making different calls to SNESSetFunction it happens the second >>>> snes uses the same function of the first. >>>> Can you help me finding the problem, please? >>>> >>>> Here below there is a minimal working example showing the issue: >>>> >>>> static char help[] = "Test SNES.\n"; >>>> #include >>>> #include >>>> #include >>>> >>>> PetscErrorCode Jac_1(SNES/snes/, Vec/x/, Mat/J/, Mat/B/, void *){ >>>> ? ? PetscFunctionBegin; >>>> ? ? printf("Jac 1\n"); >>>> ? ? PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Function_1(SNES/snes/, Vec/x/, Vec/f/, void *){ >>>> ? ? PetscFunctionBegin; >>>> ? ? printf("Function 1\n"); >>>> ? ? PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Jac_2(SNES/snes/, Vec/x/, Mat/J/, Mat/B/, void *){ >>>> ? ? PetscFunctionBegin; >>>> ? ? printf("Jac 2\n"); >>>> ? ? 
PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> PetscErrorCode Function_2(SNES/snes/, Vec/x/, Vec/f/, void *){ >>>> ? ? PetscFunctionBegin; >>>> ? ? printf("Function 2\n"); >>>> ? ? PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>>> >>>> int main(int/argc/, char **/argv/) { >>>> >>>> ? ? PetscFunctionBeginUser; >>>> ? ? PetscCall(PetscInitialize(&/argc/, &/argv/, NULL, help)); >>>> >>>> ? ? DM dm; >>>> PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 100, >>>> 1, 1, NULL, &dm)); >>>> ? ? PetscCall(DMSetFromOptions(dm)); >>>> ? ? PetscCall(DMSetUp(dm)); >>>> >>>> ? ? SNES snes1, snes2; >>>> ? ? Vec r1,r2; >>>> ? ? Mat J1, J2; >>>> >>>> ? ? PetscCall(DMCreateGlobalVector(dm, &r1)); >>>> ? ? PetscCall(DMCreateGlobalVector(dm, &r2)) >>>> ? ? PetscCall(DMCreateMatrix(dm, &J1)); >>>> ? ? PetscCall(DMCreateMatrix(dm, &J2)); >>>> >>>> ? ? PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes1)); >>>> ? ? PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes2)); >>>> ? ? PetscCall(SNESSetType(snes1, SNESNEWTONLS)); >>>> ? ? PetscCall(SNESSetType(snes2, SNESNEWTONLS)); >>>> ? ? PetscCall(SNESSetFromOptions(snes1)); >>>> ? ? PetscCall(SNESSetFromOptions(snes2)); >>>> ? ? PetscCall(SNESSetFunction(snes1, r1, Function_1, NULL)); >>>> ? ? PetscCall(SNESSetFunction(snes2, r2, Function_2, NULL)); >>>> ? ? PetscCall(SNESSetJacobian(snes1, J1, J1, Jac_1, NULL)); >>>> ? ? PetscCall(SNESSetJacobian(snes2, J2, J2, Jac_2, NULL)); >>>> ? ? PetscCall(SNESSetDM(snes1, dm)); >>>> ? ? PetscCall(SNESSetDM(snes2, dm)); >>>> >>>> ? ? PetscCall(SNESSolve(snes1, NULL, NULL)); >>>> ? ? PetscCall(SNESSolve(snes2, NULL, NULL)); >>>> >>>> ? ? printf("snes1 %p; snes2 %p\n", snes1, snes2); >>>> >>>> ? ? SNESFunctionFn *p; >>>> ? ? PetscCall(SNESGetFunction(snes1, NULL, &p, NULL)); >>>> ? ? printf("snes1: pointer %p, true function %p\n", *p, >>>> Function_1); >>>> ? ? PetscCall(SNESGetFunction(snes2, NULL, &p, NULL)); >>>> ? ? printf("snes2: pointer %p, true function %p\n", *p, >>>> Function_2); >>>> ? ? PetscCall(PetscFinalize()); >>>> ? ? PetscFunctionReturn(PETSC_SUCCESS); >>>> } >>> >> -- >> Prof. Matteo Semplice >> Universit? degli Studi dell?Insubria >> Dipartimento di Scienza e Alta Tecnologia ? DiSAT >> Professore Associato >> Via Valleggio, 11 ? 22100 Como (CO) ? Italia >> tel.: +39 031 2386316 >> >> >> >> -- >> Stefano > -- --- Professore Associato in Analisi Numerica Dipartimento di Scienza e Alta Tecnologia Universit? degli Studi dell'Insubria Via Valleggio, 11 - Como -------------- next part -------------- An HTML attachment was scrubbed... URL: From ctchengben at mail.scut.edu.cn Tue Nov 11 03:43:10 2025 From: ctchengben at mail.scut.edu.cn (=?UTF-8?B?56iL5aWU?=) Date: Tue, 11 Nov 2025 17:43:10 +0800 (GMT+08:00) Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI Message-ID: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> Hello, Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: 1. PETSc: version 3.14.1 2. VS: version 2022 3. MS MPI: download Microsoft MPI v10.1.2 4. 
Cygwin

And the configure options are:

./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2

but it returns an error:

*********************************************************************************************
=============================================================================================
=============================================================================================
          Configuring PARMETIS with CMake; this may take several minutes
=============================================================================================
=============================================================================================
          Compiling and installing PARMETIS; this may take several minutes
=============================================================================================
*********************************************************************************************
  UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
---------------------------------------------------------------------------------------------
  Error running make on PARMETIS
*********************************************************************************************

The configure.log is attached below. So I am writing this email to report my problem and ask for your help.

Looking forward to your reply!

Sincerely,
Cheng.
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: configure.log URL: From 202321009113 at mail.scut.edu.cn Tue Nov 11 03:45:38 2025 From: 202321009113 at mail.scut.edu.cn (=?UTF-8?B?56iL5aWU?=) Date: Tue, 11 Nov 2025 17:45:38 +0800 (GMT+08:00) Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> Message-ID: <692e614a.59c9.19a724e80b4.Coremail.202321009113@mail.scut.edu.cn> Sorry, the PETSc version is 3.24.1. -----Original Message----- From: Cheng Sent: 2025-11-11 17:43:10 (Tuesday) To: petsc-users at mcs.anl.gov Subject: Error in configuring PETSc with Cygwin on Windows by using MS-MPI Hello, Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: 1. PETSc: version 3.14.1 2. VS: version 2022 3. MS MPI: download Microsoft MPI v10.1.2 4.
Cygwin And the compiler option in configuration is: ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 but there return an error: ********************************************************************************************* ============================================================================================= ============================================================================================= Configuring PARMETIS with CMake; this may take several minutes ============================================================================================= ============================================================================================= Compiling and installing PARMETIS; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make on PARMETIS ********************************************************************************************* The configure.log is attached below. So I write this email to report my problem and ask for your help. Looking forward your reply! sinserely, Cheng. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 11 06:35:41 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 Nov 2025 07:35:41 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> Message-ID: On Tue, Nov 11, 2025 at 4:44?AM ?? wrote: > Hello, > Recently I try to install PETSc with Cygwin since I'd like to use PETSc > with Visual Studio on Windows10 plateform.For the sake of clarity, I > firstly list the softwares/packages used below: > 1. PETSc: version 3.14.1 > 2. VS: version 2022 > 3. MS MPI: download Microsoft MPI v10.1.2 > 4. Cygwin > Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. 
This seems to be an incompatibility of ParMetis Windows support and your version: G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M Thanks, Matt > And the compiler option in configuration is: > ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl > --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz > > --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] > > --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] > > --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec > --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz > --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz > > --with-strict-petscerrorcode=0 --with-64-bit-indices > --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 > > > > > > > but there return an error: > > ********************************************************************************************* > > ============================================================================================= > > ============================================================================================= > Configuring PARMETIS with CMake; this may take several > minutes > > ============================================================================================= > > ============================================================================================= > Compiling and installing PARMETIS; this may take several > minutes > > ============================================================================================= > > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > --------------------------------------------------------------------------------------------- > Error running make on PARMETIS > > > > ********************************************************************************************* > > > The configure.log is attached below. > > So I write this email to report my problem and ask for your help. > > Looking forward your reply! > > > sinserely, > Cheng. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dLmtriOEmUVP2A1oc3Mf52cboEA1wjKSpm11szn5VzeEqH4dEZEbvnyoNwoTWleZIFdbzRu6B635UJstTIIb$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Nov 11 09:29:01 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 11 Nov 2025 10:29:01 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> Message-ID: <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> Where/how did you obtain /cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz ? Was it from PETSc ./configure? self.version = '4.0.3' self.versionname = 'PARMETIS_MAJOR_VERSION.PARMETIS_MINOR_VERSION.PARMETIS_SUBMINOR_VERSION' self.gitcommit = 'v'+self.version+'-p9' self.download = ['git://https://bitbucket.org/petsc/pkg-parmetis.git','https://bitbucket.org/petsc/pkg-parmetis/get/'+self.gitcommit+'.tar.gz'] > On Nov 11, 2025, at 7:35?AM, Matthew Knepley wrote: > > On Tue, Nov 11, 2025 at 4:44?AM ?? > wrote: >> Hello, >> Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: >> 1. PETSc: version 3.14.1 >> 2. VS: version 2022 >> 3. MS MPI: download Microsoft MPI v10.1.2 >> 4. Cygwin > > Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. > > This seems to be an incompatibility of ParMetis Windows support and your version: > > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M > > Thanks, > > Matt > >> >> And the compiler option in configuration is: >> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl >> --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz >> --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] >> --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] >> --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec >> --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz >> --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz >> --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 >> >> >> >> >> >> >> but there return an error: >> ********************************************************************************************* >> ============================================================================================= >> ============================================================================================= >> Configuring PARMETIS with CMake; this may take several minutes >> ============================================================================================= >> ============================================================================================= >> Compiling and installing PARMETIS; this may take several minutes >> ============================================================================================= >> 
>> >> ********************************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): >> --------------------------------------------------------------------------------------------- >> Error running make on PARMETIS >> >> >> ********************************************************************************************* >> >> >> >> The configure.log is attached below. >> >> So I write this email to report my problem and ask for your help. >> >> >> Looking forward your reply! >> >> >> sinserely, >> Cheng. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dR4RcpHZAmunWDbmeNsF6mKarUO8DHjbwajZkjXJy-_DKnCMYIt_pdxNJd1ZnSGAKlBTYKkzGncQU7Y1GZWvVy4$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay.anl at fastmail.org Tue Nov 11 11:07:48 2025 From: balay.anl at fastmail.org (Satish Balay) Date: Tue, 11 Nov 2025 11:07:48 -0600 (CST) Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> Message-ID: <12b3e0c6-18a8-dffa-37f9-ac9663101f0d@fastmail.org> Also --download-hdf5 won't work with MS compilers on windows. Satish On Tue, 11 Nov 2025, Barry Smith wrote: > > Where/how did you obtain /cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz ? Was it from PETSc ./configure? > > self.version = '4.0.3' > self.versionname = 'PARMETIS_MAJOR_VERSION.PARMETIS_MINOR_VERSION.PARMETIS_SUBMINOR_VERSION' > self.gitcommit = 'v'+self.version+'-p9' > self.download = ['git://https://bitbucket.org/petsc/pkg-parmetis.git','https://bitbucket.org/petsc/pkg-parmetis/get/'+self.gitcommit+'.tar.gz'] > > > > > On Nov 11, 2025, at 7:35?AM, Matthew Knepley wrote: > > > > On Tue, Nov 11, 2025 at 4:44?AM ?? > wrote: > >> Hello, > >> Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: > >> 1. PETSc: version 3.14.1 > >> 2. VS: version 2022 > >> 3. MS MPI: download Microsoft MPI v10.1.2 > >> 4. Cygwin > > > > Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. 
> > > > This seems to be an incompatibility of ParMetis Windows support and your version: > > > > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M > > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M > > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M > > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M > > > > Thanks, > > > > Matt > > > >> > >> And the compiler option in configuration is: > >> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl > >> --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz > >> --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] > >> --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] > >> --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec > >> --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz > >> --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz > >> --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 > >> > >> > >> > >> > >> > >> > >> but there return an error: > >> ********************************************************************************************* > >> ============================================================================================= > >> ============================================================================================= > >> Configuring PARMETIS with CMake; this may take several minutes > >> ============================================================================================= > >> ============================================================================================= > >> Compiling and installing PARMETIS; this may take several minutes > >> ============================================================================================= > >> > >> > >> ********************************************************************************************* > >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > >> --------------------------------------------------------------------------------------------- > >> Error running make on PARMETIS > >> > >> > >> ********************************************************************************************* > >> > >> > >> > >> The configure.log is attached below. > >> > >> So I write this email to report my problem and ask for your help. > >> > >> > >> Looking forward your reply! > >> > >> > >> sinserely, > >> Cheng. > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> > -- Norbert Wiener > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dR4RcpHZAmunWDbmeNsF6mKarUO8DHjbwajZkjXJy-_DKnCMYIt_pdxNJd1ZnSGAKlBTYKkzGncQU7Y1GZWvVy4$ > > From zhaowenbo.npic at gmail.com Tue Nov 11 19:50:10 2025 From: zhaowenbo.npic at gmail.com (Wenbo Zhao) Date: Wed, 12 Nov 2025 09:50:10 +0800 Subject: [petsc-users] gpu cpu parallel Message-ID: Dear all, We are trying to solve ksp using GPUs. We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which the matrix is created and assembling using COO way provided by PETSc. In this example, the number of CPU is as same as the number of GPU. In our case, computation of the parameters of matrix is performed on CPUs. And the cost of it is expensive, which might take half of total time or even more. We want to use more CPUs to compute parameters in parallel. And a smaller communication domain (such as gpu_comm) for the CPUs corresponding to the GPUs is created. The parameters are computed by all of the CPUs (in MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via MPI. Matrix (type of aijcusparse) is then created and assembled within gpu_comm. Finally, ksp_solve is performed on GPUs. I?m not sure if this approach will work in practice. Are there any comparable examples I can look to for guidance? Best, Wenbo -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Nov 11 21:48:47 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 11 Nov 2025 21:48:47 -0600 Subject: [petsc-users] gpu cpu parallel In-Reply-To: References: Message-ID: Hi, Wenbo, I think your approach should work. But before going this extra step with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, using nvidia's multiple process service (MPS)? If MPS works well, then you can avoid the extra complexity. --Junchao Zhang On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao wrote: > Dear all, > > We are trying to solve ksp using GPUs. > We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which the > matrix is created and assembling using COO way provided by PETSc. In this > example, the number of CPU is as same as the number of GPU. > In our case, computation of the parameters of matrix is performed on CPUs. > And the cost of it is expensive, which might take half of total time or > even more. > > We want to use more CPUs to compute parameters in parallel. And a smaller > communication domain (such as gpu_comm) for the CPUs corresponding to the > GPUs is created. The parameters are computed by all of the CPUs (in > MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via > MPI. Matrix (type of aijcusparse) is then created and assembled within > gpu_comm. Finally, ksp_solve is performed on GPUs. > > I?m not sure if this approach will work in practice. Are there any > comparable examples I can look to for guidance? > > Best, > Wenbo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grantchao2018 at 163.com Wed Nov 12 01:31:35 2025 From: grantchao2018 at 163.com (Grant Chao) Date: Wed, 12 Nov 2025 15:31:35 +0800 (CST) Subject: [petsc-users] gpu cpu parallel In-Reply-To: References: Message-ID: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> Thank you for the suggestion. We have already tried running multiple CPU ranks with a single GPU. However, we observed that as the number of ranks increases, the EPS solver becomes significantly slower. 
We are not sure of the exact cause?could it be due to process access contention, hidden data transfers, or perhaps another reason? We would be very interested to hear your insight on this matter. To avoid this problem, we used the gpu_comm approach mentioned before. During testing, we noticed that the mapping between rank ID and GPU ID seems to be set automatically and is not user-specifiable. For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. We tested possible solutions, such as calling cudaSetDevice() manually to set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 still used GPU 0. We would appreciate your guidance on how to customize this mapping. Thank you for your support. Best wishes, Grant At 2025-11-12 11:48:47, "Junchao Zhang" , said: Hi, Wenbo, I think your approach should work. But before going this extra step with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, using nvidia's multiple process service (MPS)? If MPS works well, then you can avoid the extra complexity. --Junchao Zhang On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao wrote: Dear all, We are trying to solve ksp using GPUs. We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which the matrix is created and assembling using COO way provided by PETSc. In this example, the number of CPU is as same as the number of GPU. In our case, computation of the parameters of matrix is performed on CPUs. And the cost of it is expensive, which might take half of total time or even more. We want to use more CPUs to compute parameters in parallel. And a smaller communication domain (such as gpu_comm) for the CPUs corresponding to the GPUs is created. The parameters are computed by all of the CPUs (in MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via MPI. Matrix (type of aijcusparse) is then created and assembled within gpu_comm. Finally, ksp_solve is performed on GPUs. I?m not sure if this approach will work in practice. Are there any comparable examples I can look to for guidance? Best, Wenbo -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Nov 12 09:58:21 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 12 Nov 2025 09:58:21 -0600 Subject: [petsc-users] gpu cpu parallel In-Reply-To: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> References: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> Message-ID: On Wed, Nov 12, 2025 at 1:31?AM Grant Chao wrote: > > Thank you for the suggestion. > > We have already tried running multiple CPU ranks with a single GPU. > However, we observed that as the number of ranks increases, the EPS solver > becomes significantly slower. We are not sure of the exact cause?could it > be due to process access contention, hidden data transfers, or perhaps > another reason? We would be very interested to hear your insight on this > matter. > Have you started the MPS, see https://urldefense.us/v3/__https://docs.nvidia.com/deploy/mps/index.html*starting-and-stopping-mps-on-linux__;Iw!!G_uCfscf7eWS!fRqGFSTH6neOLcmMT1alt2Uma1K1jVsAm1kXTHrg5nNNe-dVKOn6jIJvkO6q0AKcW9s3WvmnXT3jqrh2NFk1hBiuCBlC$ > > To avoid this problem, we used the gpu_comm approach mentioned before. > During testing, we noticed that the mapping between rank ID and GPU ID > seems to be set automatically and is not user-specifiable. 
> > For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds > ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. > Yes, that is the current round-robin algorithm. Do you want ranks 0,1 on GPU 0, and ranks 2, 3 on GPU 1, and so on? > We tested possible solutions, such as calling cudaSetDevice() manually to > set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 > still used GPU 0. > > We would appreciate your guidance on how to customize this mapping. Thank > you for your support. > > Best wishes, > Grant > > > At 2025-11-12 11:48:47, "Junchao Zhang" , said: > > Hi, Wenbo, > I think your approach should work. But before going this extra step > with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, > using nvidia's multiple process service (MPS)? If MPS works well, then > you can avoid the extra complexity. > > --Junchao Zhang > > > On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao > wrote: > >> Dear all, >> >> We are trying to solve ksp using GPUs. >> We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which >> the matrix is created and assembling using COO way provided by PETSc. In >> this example, the number of CPU is as same as the number of GPU. >> In our case, computation of the parameters of matrix is performed on >> CPUs. And the cost of it is expensive, which might take half of total time >> or even more. >> >> We want to use more CPUs to compute parameters in parallel. And a >> smaller communication domain (such as gpu_comm) for the CPUs corresponding >> to the GPUs is created. The parameters are computed by all of the CPUs (in >> MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via >> MPI. Matrix (type of aijcusparse) is then created and assembled within >> gpu_comm. Finally, ksp_solve is performed on GPUs. >> >> I?m not sure if this approach will work in practice. Are there any >> comparable examples I can look to for guidance? >> >> Best, >> Wenbo >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Wed Nov 12 10:03:50 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Nov 2025 11:03:50 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <46e195ab.5cea.19a779d6908.Coremail.202321009113@mail.scut.edu.cn> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> <46e195ab.5cea.19a779d6908.Coremail.202321009113@mail.scut.edu.cn> Message-ID: G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t' G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t' G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(49): warning C4005: 'INT8_MIN': macro redefinition G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(107): note: see previous definition of 'INT8_MIN' Parmetis has its own definitions for many C standard types, etc in headers\ms_stdint.h that duplicate what is available in stdint.h on Unix systems. Normally, this gets included when __MSC_ is defined instead of stdint.h (in gk_arch.h). But for some reason, with your system it appears that Microsoft's stdint.h is also getting included; presumably brought in through some other system include file since it is only included in one place. $ git grep stdint.h headers/gk_arch.h: #include "ms_stdint.h" headers/gk_arch.h: #include headers/ms_inttypes.h:#include "ms_stdint.h" headers/ms_stdint.h:// ISO C9x compliant stdint.h for Microsoft Visual Studio You have a fairly old VisualStudio, 2022. Can you upgrade to the latest? Let us know if this resolves the problem. Barry > On Nov 12, 2025, at 5:29?AM, ?? <202321009113 at mail.scut.edu.cn> wrote: > > Hi Barry > > Thanks for your reply. > > I check the package parmetis,and the "petsc-pkg-parmetis-45100eac9301.tar.gz" is form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3.tar.gz__;!!G_uCfscf7eWS!anttFLuihC7sv3xitFbNls4Ab1QfxVAGNr1EttbSarqqFMdkXJIg9_aN1RakIYDBWqtKJJM8jYn3SxcuaKW6S2Q$ . So I made a mistake about the package. 
> > Then I download the package form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3-p9__;!!G_uCfscf7eWS!anttFLuihC7sv3xitFbNls4Ab1QfxVAGNr1EttbSarqqFMdkXJIg9_aN1RakIYDBWqtKJJM8jYn3Sxcu4By6gtk$ .tar.gz and it is "petsc-pkg-parmetis-f5e3aab04fd5.tar.gz" > > > > > Then the compiler option in configuration is: > ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices > > > but it still have the same error: > ********************************************************************************************* > ============================================================================================= > ============================================================================================= > Configuring PARMETIS with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing PARMETIS; this may take several minutes > ============================================================================================= > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make on PARMETIS > > > > ********************************************************************************************* > > > > > The new configure.log is attached below. > > So I ask for your help again. > > > Looking forward your reply! > > > sinserely, > Cheng. > > > > > > > > > -----????----- > ???: "Barry Smith" > ????: 2025-11-11 23:29:01 (???) > ???: "Matthew Knepley" > ??: ?? , petsc-users at mcs.anl.gov > ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI > > > Where/how did you obtain /cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz ? Was it from PETSc ./configure? > > self.version = '4.0.3' > self.versionname = 'PARMETIS_MAJOR_VERSION.PARMETIS_MINOR_VERSION.PARMETIS_SUBMINOR_VERSION' > self.gitcommit = 'v'+self.version+'-p9' > self.download = ['git://https://bitbucket.org/petsc/pkg-parmetis.git','https://bitbucket.org/petsc/pkg-parmetis/get/'+self.gitcommit+'.tar.gz'] > > > >> On Nov 11, 2025, at 7:35?AM, Matthew Knepley wrote: >> >> On Tue, Nov 11, 2025 at 4:44?AM ?? > wrote: >>> Hello, >>> Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: >>> 1. PETSc: version 3.14.1 >>> 2. VS: version 2022 >>> 3. MS MPI: download Microsoft MPI v10.1.2 >>> 4. Cygwin >> >> Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. 
>> >> This seems to be an incompatibility of ParMetis Windows support and your version: >> >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M >> >> Thanks, >> >> Matt >> >>> >>> And the compiler option in configuration is: >>> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl >>> --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz >>> --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] >>> --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] >>> --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec >>> --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz >>> --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz >>> --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 >>> >>> >>> >>> >>> >>> >>> but there return an error: >>> ********************************************************************************************* >>> ============================================================================================= >>> ============================================================================================= >>> Configuring PARMETIS with CMake; this may take several minutes >>> ============================================================================================= >>> ============================================================================================= >>> Compiling and installing PARMETIS; this may take several minutes >>> ============================================================================================= >>> >>> >>> ********************************************************************************************* >>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): >>> --------------------------------------------------------------------------------------------- >>> Error running make on PARMETIS >>> >>> >>> ********************************************************************************************* >>> >>> >>> >>> The configure.log is attached below. >>> >>> So I write this email to report my problem and ask for your help. >>> >>> >>> Looking forward your reply! >>> >>> >>> sinserely, >>> Cheng. >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!anttFLuihC7sv3xitFbNls4Ab1QfxVAGNr1EttbSarqqFMdkXJIg9_aN1RakIYDBWqtKJJM8jYn3SxcurPKaHgI$ > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1253211 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Nov 12 10:20:41 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Nov 2025 11:20:41 -0500 Subject: [petsc-users] gpu cpu parallel In-Reply-To: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> References: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> Message-ID: <4E8E7829-A856-495A-ADA3-710C91F8B3EF@petsc.dev> > On Nov 12, 2025, at 2:31?AM, Grant Chao wrote: > > > Thank you for the suggestion. > > We have already tried running multiple CPU ranks with a single GPU. However, we observed that as the number of ranks increases, the EPS solver becomes significantly slower. We are not sure of the exact cause?could it be due to process access contention, hidden data transfers, or perhaps another reason? We would be very interested to hear your insight on this matter. > > To avoid this problem, we used the gpu_comm approach mentioned before. During testing, we noticed that the mapping between rank ID and GPU ID seems to be set automatically and is not user-specifiable. > > For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. > We tested possible solutions, such as calling cudaSetDevice() manually to set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 still used GPU 0. > > We would appreciate your guidance on how to customize this mapping. Thank you for your support. So you have a single compute "node" connected to multiple GPUs? Then the mapping of MPI ranks to GPUs doesn't matter and changing it won't improve the performance. > However, we observed that as the number of ranks increases, the EPS solver becomes significantly slower. Does the number of EPS "iterations" increase? Run with one, two, four and eight MPI ranks (and the same number of "GPUs" (if you only have say four GPUs that is fine, just virtualize them so two different MPI ranks share one) and the option -log_view and send the output. We need to know what is slowing down before trying to find any cure. Barry > > Best wishes, > Grant > > > At 2025-11-12 11:48:47, "Junchao Zhang" , said: > Hi, Wenbo, > I think your approach should work. But before going this extra step with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, using nvidia's multiple process service (MPS)? If MPS works well, then you can avoid the extra complexity. > > --Junchao Zhang > > > On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao > wrote: >> Dear all, >> >> We are trying to solve ksp using GPUs. >> We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which the matrix is created and assembling using COO way provided by PETSc. In this example, the number of CPU is as same as the number of GPU. >> In our case, computation of the parameters of matrix is performed on CPUs. And the cost of it is expensive, which might take half of total time or even more. >> >> We want to use more CPUs to compute parameters in parallel. And a smaller communication domain (such as gpu_comm) for the CPUs corresponding to the GPUs is created. The parameters are computed by all of the CPUs (in MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via MPI. Matrix (type of aijcusparse) is then created and assembled within gpu_comm. Finally, ksp_solve is performed on GPUs. 
>> >> I?m not sure if this approach will work in practice. Are there any comparable examples I can look to for guidance? >> >> Best, >> Wenbo -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Nov 12 15:58:05 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 12 Nov 2025 15:58:05 -0600 Subject: [petsc-users] gpu cpu parallel In-Reply-To: <4E8E7829-A856-495A-ADA3-710C91F8B3EF@petsc.dev> References: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> <4E8E7829-A856-495A-ADA3-710C91F8B3EF@petsc.dev> Message-ID: A common approach is to use CUDA_VISIBLE_DEVICES to manipulate MPI ranks to GPUs mapping, see the section at https://urldefense.us/v3/__https://docs.nersc.gov/jobs/affinity/*gpu-nodes__;Iw!!G_uCfscf7eWS!ags1Nog_0A9TnDudT9S81jm72t1NQYuOCg3--XMIlL4LXQCv-SFhCbQzesjgOxMAaRoyDOeYcqInlCRwOorJ0HFSR5q_$ With OpenMPI, you can use OMPI_COMM_WORLD_LOCAL_RANK in place of SLURM_LOCALID (see https://urldefense.us/v3/__https://docs.open-mpi.org/en/v5.0.x/tuning-apps/environment-var.html__;!!G_uCfscf7eWS!ags1Nog_0A9TnDudT9S81jm72t1NQYuOCg3--XMIlL4LXQCv-SFhCbQzesjgOxMAaRoyDOeYcqInlCRwOorJ0DsDgr-l$ ). For example, with 8 MPI ranks and 4 GPUs per node, the following script will map ranks 0, 1 to GPU 0, ranks 2, 3 to GPU 1. #!/bin/bash # select_gpu_device wrapper script export CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/4))) exec $* On Wed, Nov 12, 2025 at 10:20?AM Barry Smith wrote: > > > On Nov 12, 2025, at 2:31?AM, Grant Chao wrote: > > > Thank you for the suggestion. > > We have already tried running multiple CPU ranks with a single GPU. > However, we observed that as the number of ranks increases, the EPS solver > becomes significantly slower. We are not sure of the exact cause?could it > be due to process access contention, hidden data transfers, or perhaps > another reason? We would be very interested to hear your insight on this > matter. > > To avoid this problem, we used the gpu_comm approach mentioned before. > During testing, we noticed that the mapping between rank ID and GPU ID > seems to be set automatically and is not user-specifiable. > > For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds > ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. > > > > > We tested possible solutions, such as calling cudaSetDevice() manually to > set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 > still used GPU 0. > > We would appreciate your guidance on how to customize this mapping. Thank > you for your support. > > > So you have a single compute "node" connected to multiple GPUs? Then > the mapping of MPI ranks to GPUs doesn't matter and changing it won't > improve the performance. > > However, we observed that as the number of ranks increases, the EPS solver > becomes significantly slower. > > > Does the number of EPS "iterations" increase? Run with one, two, four > and eight MPI ranks (and the same number of "GPUs" (if you only have say > four GPUs that is fine, just virtualize them so two different MPI ranks > share one) and the option -log_view and send the output. We need to know > what is slowing down before trying to find any cure. > > Barry > > > > > > Best wishes, > Grant > > > At 2025-11-12 11:48:47, "Junchao Zhang" , said: > > Hi, Wenbo, > I think your approach should work. But before going this extra step > with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, > using nvidia's multiple process service (MPS)? 
If MPS works well, then > you can avoid the extra complexity. > > --Junchao Zhang > > > On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao > wrote: > >> Dear all, >> >> We are trying to solve ksp using GPUs. >> We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which >> the matrix is created and assembling using COO way provided by PETSc. In >> this example, the number of CPU is as same as the number of GPU. >> In our case, computation of the parameters of matrix is performed on >> CPUs. And the cost of it is expensive, which might take half of total time >> or even more. >> >> We want to use more CPUs to compute parameters in parallel. And a >> smaller communication domain (such as gpu_comm) for the CPUs corresponding >> to the GPUs is created. The parameters are computed by all of the CPUs (in >> MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via >> MPI. Matrix (type of aijcusparse) is then created and assembled within >> gpu_comm. Finally, ksp_solve is performed on GPUs. >> >> I?m not sure if this approach will work in practice. Are there any >> comparable examples I can look to for guidance? >> >> Best, >> Wenbo >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.chapman at mail.utoronto.ca Wed Nov 12 23:08:01 2025 From: benjamin.chapman at mail.utoronto.ca (Benjamin Chapman) Date: Thu, 13 Nov 2025 05:08:01 +0000 Subject: [petsc-users] Inquiry about issue when compiling with clang Message-ID: Hello, We are using PETSc as part of a larger project and are switching from compiling with gcc to using AOCC (clang). However, I am getting an error from my include statements stating that the data type __complex128 is an "unknown type name". The full error log is attached. I found this solution online (FreeFem - PETSc compilation error - libblas.a/liblapack.a cannot be used - FreeFEM installation - FreeFEM), which says to comment out a line in the petscconf.h header file. Is this a safe fix or is there a more elegant way to go about this? Best, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_build_error_messages.txt URL: From jed at jedbrown.org Wed Nov 12 23:59:21 2025 From: jed at jedbrown.org (Jed Brown) Date: Wed, 12 Nov 2025 22:59:21 -0700 Subject: [petsc-users] Inquiry about issue when compiling with clang In-Reply-To: References: Message-ID: <87qzu2qxqe.fsf@jedbrown.org> I'm assuming that you configured and built PETSc with gcc, and now are building a package that depends on PETSc using AOCC? Is it possible to configure PETSc with the same compiler, or to use -std=gnu++17 (any dialect that supports GNU extensions)? Benjamin Chapman via petsc-users writes: > Hello, > > We are using PETSc as part of a larger project and are switching from compiling with gcc to using AOCC (clang). However, I am getting an error from my include statements stating that the data type __complex128 is an "unknown type name". The full error log is attached. > > I found this solution online (FreeFem - PETSc compilation error - libblas.a/liblapack.a cannot be used - FreeFEM installation - FreeFEM), which says to comment out a line in the petscconf.h header file. Is this a safe fix or is there a more elegant way to go about this? 
> > Best, > Ben > [ 64%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/coordinate_system.cpp.o > [ 65%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/FFTW_interface.cpp.o > [ 65%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/fileIO.cpp.o > [ 66%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/lapacke_interface.cpp.o > [ 66%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/petsc_extensions.cpp.o > In file included from /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.cpp:7: > In file included from /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.h:14: > In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscksp.h:6: > In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscpc.h:6: > In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmat.h:6: > In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscvec.h:8: > In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscsys.h:193: > /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmath.h:429:45: error: unknown type name '__complex128' > 429 | PETSC_EXTERN MPI_Datatype MPIU___COMPLEX128 MPIU___COMPLEX128_ATTR_TAG; > | ^ > /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmath.h:424:71: note: expanded from macro 'MPIU___COMPLEX128_ATTR_TAG' > 424 | #define MPIU___COMPLEX128_ATTR_TAG PETSC_ATTRIBUTE_MPI_TYPE_TAG(__complex128) > | ^ > /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.cpp:2570:13: warning: decomposition declarations are a C++17 extension [-Wc++17-extensions] > 2570 | const auto [min, max] = std::minmax_element(std::begin(s), std::end(s)); > | ^~~~~~~~~~ > 1 warning and 1 error generated. > make[2]: *** [src/CMakeFiles/rebel_lib.dir/build.make:132: src/CMakeFiles/rebel_lib.dir/common/petsc_extensions.cpp.o] Error 1 > make[1]: *** [CMakeFiles/Makefile2:772: src/CMakeFiles/rebel_lib.dir/all] Error 2 > make: *** [Makefile:136: all] Error 2 From pierre at joliv.et Thu Nov 13 03:44:02 2025 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 13 Nov 2025 10:44:02 +0100 Subject: [petsc-users] Inquiry about issue when compiling with clang In-Reply-To: <87qzu2qxqe.fsf@jedbrown.org> References: <87qzu2qxqe.fsf@jedbrown.org> Message-ID: <069EBDD7-E501-4076-8EB2-B32EB75C6134@joliv.et> > On 13 Nov 2025, at 6:59?AM, Jed Brown wrote: > > I'm assuming that you configured and built PETSc with gcc, and now are building a package that depends on PETSc using AOCC? Is it possible to configure PETSc with the same compiler, or to use -std=gnu++17 (any dialect that supports GNU extensions)? Do what Jed suggests. But to give you a thorough answer, this FreeFEM post got me to fix some missing code in PETSc https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7868__;!!G_uCfscf7eWS!ZCDn8GMjZjjHKboEmYNh_DysRggGBuwNc5uVk819bJyjAPCkifplTawg0JU1vik-GoaLgcKII1fsCidj5CrSqw$ . So now, the more ?elegant? solution of defining PETSC_SKIP_REAL___FLOAT128 (which was not working back then) should work given that you are using a recent enough PETSc (it seems you are using version 3.21.0, we are at 3.24.1, and the fix is there since 3.21.6). But again, try what Jed suggests first and foremost. 
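For what it's worth, a minimal sketch of that route (assuming a recent enough PETSc; the include and the solver calls below are just placeholders for your own sources):

#define PETSC_SKIP_REAL___FLOAT128 /* must be seen before the first PETSc header */
#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* ... create and use Mat/Vec/KSP exactly as before ... */
  PetscCall(PetscFinalize());
  return 0;
}

Passing -DPETSC_SKIP_REAL___FLOAT128 on the compile line of the dependent project has the same effect and avoids touching the sources at all.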
Thanks, Pierre > Benjamin Chapman via petsc-users writes: > >> Hello, >> >> We are using PETSc as part of a larger project and are switching from compiling with gcc to using AOCC (clang). However, I am getting an error from my include statements stating that the data type __complex128 is an "unknown type name". The full error log is attached. >> >> I found this solution online (FreeFem - PETSc compilation error - libblas.a/liblapack.a cannot be used - FreeFEM installation - FreeFEM), which says to comment out a line in the petscconf.h header file. Is this a safe fix or is there a more elegant way to go about this? >> >> Best, >> Ben >> [ 64%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/coordinate_system.cpp.o >> [ 65%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/FFTW_interface.cpp.o >> [ 65%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/fileIO.cpp.o >> [ 66%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/lapacke_interface.cpp.o >> [ 66%] Building CXX object src/CMakeFiles/rebel_lib.dir/common/petsc_extensions.cpp.o >> In file included from /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.cpp:7: >> In file included from /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.h:14: >> In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscksp.h:6: >> In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscpc.h:6: >> In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmat.h:6: >> In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscvec.h:8: >> In file included from /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscsys.h:193: >> /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmath.h:429:45: error: unknown type name '__complex128' >> 429 | PETSC_EXTERN MPI_Datatype MPIU___COMPLEX128 MPIU___COMPLEX128_ATTR_TAG; >> | ^ >> /mnt/scratch/bchapman/rebel/build/external/builds/petsc-3.21.0/include/petscmath.h:424:71: note: expanded from macro 'MPIU___COMPLEX128_ATTR_TAG' >> 424 | #define MPIU___COMPLEX128_ATTR_TAG PETSC_ATTRIBUTE_MPI_TYPE_TAG(__complex128) >> | ^ >> /mnt/scratch/bchapman/rebel/src/common/petsc_extensions.cpp:2570:13: warning: decomposition declarations are a C++17 extension [-Wc++17-extensions] >> 2570 | const auto [min, max] = std::minmax_element(std::begin(s), std::end(s)); >> | ^~~~~~~~~~ >> 1 warning and 1 error generated. >> make[2]: *** [src/CMakeFiles/rebel_lib.dir/build.make:132: src/CMakeFiles/rebel_lib.dir/common/petsc_extensions.cpp.o] Error 1 >> make[1]: *** [CMakeFiles/Makefile2:772: src/CMakeFiles/rebel_lib.dir/all] Error 2 >> make: *** [Makefile:136: all] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Nov 13 09:05:05 2025 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 13 Nov 2025 10:05:05 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <6de01bb0.5fe2.19a7c6bb95f.Coremail.202321009113@mail.scut.edu.cn> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> <46e195ab.5cea.19a779d6908.Coremail.202321009113@mail.scut.edu.cn> <6de01bb0.5fe2.19a7c6bb95f.Coremail.202321009113@mail.scut.edu.cn> Message-ID: <84886CB9-D3B8-433C-943B-31E85A47C3B3@petsc.dev> Change __attribute__((packed)) to /* __attribute__((packed)) */ in include/petscmath.h and run make again. I think you should install a new version of Microsoft's compilers etc. Barry > On Nov 13, 2025, at 3:53?AM, ?? <202321009113 at mail.scut.edu.cn> wrote: > > Hi Barry > > > > Thanks for your advice. > I use AI help me that change the file on the petsc-3.24.1/arch-mswin-c-opt/externalpackages/petsc-pkg-parmetis-f5e3aab04fd5/headers/gk_arch. > > > The change is from: > > #ifdef __MSC__ > #include "ms_stdint.h" > #include "ms_inttypes.h" > #include "ms_stat.h" > #else > #ifndef SUNOS > #include > #endif > #if !defined(WIN32) && !defined(__MINGW32__) > #include > #endif > #include > #include > #include > #endif > > To: > > #if (defined(__MSC__) || defined(_MSC_VER)) && defined(_MSC_VER) && _MSC_VER < 1900 > #include "ms_stdint.h" > #include "ms_inttypes.h" > #include "ms_stat.h" > #else > #ifndef SUNOS > #include > #endif > #if !defined(WIN32) && !defined(__MINGW32__) && !defined(_MSC_VER) > #include > #endif > #include > #include > #if !defined(_MSC_VER) > #include > #endif > #endif > > > > Then I configure the PETSc: > > ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices > > It seems good, but then I make it > > it have the error: > > > > make[3]: *** [gmakefile:211: arch-mswin-c-opt/obj/src/sys/objects/device/interface/mark_dcontext.o] Error 2 > make[3]: Leaving directory '/cygdrive/g/mypetsc/petsc-3.24.1' > make[2]: *** [/cygdrive/g/mypetsc/petsc-3.24.1/lib/petsc/conf/rules_doc.mk:5: libs] Error 2 > make[2]: Leaving directory '/cygdrive/g/mypetsc/petsc-3.24.1' > **************************ERROR************************************* > Error during compile, check arch-mswin-c-opt/lib/petsc/conf/make.log > Send it and arch-mswin-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:44: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > > > The new configure.log and make.log is attached below. > > I don't know if it is caused by the change I made or the other problems. > > > > So I ask for your help again. > Looking forward your reply! > > > sinserely, > Cheng. > > > > > > > > -----????----- > ???: "Barry Smith" > > ????: 2025-11-13 00:03:50 (???) > ???: ?? 
<202321009113 at mail.scut.edu.cn > > ??: PETSc > > ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI > > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t' > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t' > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(49): warning C4005: 'INT8_MIN': macro redefinition > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(107): note: see previous definition of 'INT8_MIN' > > Parmetis has its own definitions for many C standard types, etc in headers\ms_stdint.h that duplicate what is available in stdint.h on Unix systems. Normally, this gets included when __MSC_ is defined instead of stdint.h (in gk_arch.h). > > But for some reason, with your system it appears that Microsoft's stdint.h is also getting included; presumably brought in through some other system include file since it is only included in one place. > > $ git grep stdint.h > headers/gk_arch.h: #include "ms_stdint.h" > headers/gk_arch.h: #include > headers/ms_inttypes.h:#include "ms_stdint.h" > headers/ms_stdint.h:// ISO C9x compliant stdint.h for Microsoft Visual Studio > > You have a fairly old VisualStudio, 2022. Can you upgrade to the latest? Let us know if this resolves the problem. > > Barry > > > > > > > > > > > On Nov 12, 2025, at 5:29?AM, ?? <202321009113 at mail.scut.edu.cn > wrote: > > Hi Barry > Thanks for your reply. > I check the package parmetis,and the "petsc-pkg-parmetis-45100eac9301.tar.gz" is form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3.tar.gz__;!!G_uCfscf7eWS!fxWXWboQRNUYFGMA0mW58ZDCE6A4aGfOZvzcj0EG2lHsj_174DkztA-YDWKfPXg9WJxjRkZ13WsNy2TXkQuzMkw$ . So I made a mistake about the package. 
> Then I download the package form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!fxWXWboQRNUYFGMA0mW58ZDCE6A4aGfOZvzcj0EG2lHsj_174DkztA-YDWKfPXg9WJxjRkZ13WsNy2TXPQoo3R0$ and it is "petsc-pkg-parmetis-f5e3aab04fd5.tar.gz" > > > Then the compiler option in configuration is: > ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices > > but it still have the same error: > ********************************************************************************************* > ============================================================================================= > ============================================================================================= > Configuring PARMETIS with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing PARMETIS; this may take several minutes > ============================================================================================= > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make on PARMETIS > > > ********************************************************************************************* > > > The new configure.log is attached below. > So I ask for your help again. > Looking forward your reply! > > > sinserely, > Cheng. > > > > > -----????----- > ???: "Barry Smith" > > ????: 2025-11-11 23:29:01 (???) > ???: "Matthew Knepley" > > ??: ?? >, petsc-users at mcs.anl.gov > ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI > > > Where/how did you obtain /cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz ? Was it from PETSc ./configure? > > self.version = '4.0.3' > self.versionname = 'PARMETIS_MAJOR_VERSION.PARMETIS_MINOR_VERSION.PARMETIS_SUBMINOR_VERSION' > self.gitcommit = 'v'+self.version+'-p9' > self.download = ['git://https://bitbucket.org/petsc/pkg-parmetis.git','https://bitbucket.org/petsc/pkg-parmetis/get/'+self.gitcommit+'.tar.gz '] > > > > On Nov 11, 2025, at 7:35?AM, Matthew Knepley > wrote: > > On Tue, Nov 11, 2025 at 4:44?AM ?? > wrote: > Hello, > Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: > 1. PETSc: version 3.14.1 > 2. VS: version 2022 > 3. MS MPI: download Microsoft MPI v10.1.2 > 4. Cygwin > > Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. 
> > This seems to be an incompatibility of ParMetis Windows support and your version: > > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M > G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M > G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M > > Thanks, > > Matt > > And the compiler option in configuration is: > ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl > --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz > --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] > --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] > --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec > --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz > --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz > --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 > > > > > > > but there return an error: > ********************************************************************************************* > ============================================================================================= > ============================================================================================= > Configuring PARMETIS with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing PARMETIS; this may take several minutes > ============================================================================================= > > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running make on PARMETIS > > > ********************************************************************************************* > > > The configure.log is attached below. > So I write this email to report my problem and ask for your help. > Looking forward your reply! > > > sinserely, > Cheng. > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fxWXWboQRNUYFGMA0mW58ZDCE6A4aGfOZvzcj0EG2lHsj_174DkztA-YDWKfPXg9WJxjRkZ13WsNy2TXA-nTYTc$ > > ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2302425 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 17055 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From herbert.owen at bsc.es Thu Nov 13 11:11:04 2025 From: herbert.owen at bsc.es (howen) Date: Thu, 13 Nov 2025 18:11:04 +0100 Subject: [petsc-users] Petsc + nvhpc In-Reply-To: References: <0B70CA06-D787-4D97-8E33-27E71D08BBF0@bsc.es> Message-ID: <2A940B06-46A3-40AB-BD66-835C46970CA1@bsc.es> Dear Junchao, Thank you for response and sorry for taking so long to answer back. I cannot avoid using the nvidia tools. Gfortran is not mature for OpenACC and gives us problems when compiling our code. What I have done to enable using the latest petsc is to create my own C code to call petsc. I have little experience with c and it took me some time, but I can now use petsc 3.24.1 ;) The behaviour remains the same as in my original email . Parallel+GPU gives bad results. CPU(serial and parallel) and GPU serial all work ok and give the same result. I have gone a bit into petsc comparing the CPU and GPU version with 2 mpi. I see that the difference starts in src/ksp/ksp/impls/cg/cg.c L170 PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */ I have printed the vectors R and Z and the norm dp. R is identical on both CPU and GPU; but Z differs. The correct value of dp (for the first time it enters) is 14.3014, while running on the GPU with 2 mpis it gives 14.7493. If you wish I can send you prints I introduced in cg.c The folder with the input files to run the case can be downloaded from https://urldefense.us/v3/__https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLEAh7n_UO$ For submitting the gpu run I use mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json For the cpu run mpirun -np 2 /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json Our code can be downloaded with : git clone --recursive https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab.git__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLEFjsBTIo$ -and the branch I am using with git checkout 140-add-petsc To use exactly the same commit I am using git checkout 09a923c9b57e46b14ae54b935845d50272691ace I am currently using: Currently Loaded Modules: 1) nvidia-hpc-sdk/25.1 2) hdf5/1.14.1-2-nvidia-nvhpcx 3) cmake/3.25.1 I guess/hope similar modules should be available in any supercomputer. To build the cpu version mkdir build_cpu cd build_cpu export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF .. 
make -j 80 I have built petsc myself as follows git clone -b release https://urldefense.us/v3/__https://gitlab.com/petsc/petsc.git__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLELP8U6d0$ petsc cd petsc git checkout v3.24.1 module purge module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 ./configure --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 --with-precision=single --download-hypre CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= --with-shared-libraries=1 --with-mpi=1 --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ --download-ptscotch=yes --download-metis --download-parmetis make all check make install ------------------- For the GPU version when configuring petsc I add : --with-cuda I then change the export PETSC_INSTALL to export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal and repeat all other exports mkdir build_gpu cd build_gpu cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF .. make -j 80 As you can see from the submit instructions the executable is found in sod2d_gitlab/build_gpu/src/app_sod2d/sod2d I hope I have not forgotten anything and my instructions are 'easy' to follow. If you have any issue do not doubt to contact me. The wiki for our code can be found in https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLEA1vqPYk$ Best, Herbert Owen Herbert Owen Senior Researcher, Dpt. Computer Applications in Science and Engineering Barcelona Supercomputing Center (BSC-CNS) Tel: +34 93 413 4038 Skype: herbert.owen https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLEAA5PwtO$ > On 16 Oct 2025, at 18:30, Junchao Zhang wrote: > > Hi, Herbert, > I don't have much experience on OpenACC and PETSc CI doesn't have such tests. Could you avoid using nvfortran and instead use gfortran to compile your Fortran + OpenACC code? If you, then you can use the latest petsc code and make our debugging easier. > Also, could you provide us with a test and instructions to reproduce the problem? > > Thanks! > --Junchao Zhang > > > On Thu, Oct 16, 2025 at 5:07?AM howen via petsc-users > wrote: >> Dear All, >> >> I am interfacing our CFD code (Fortran + OpenACC) to Petsc. >> Since we use OpenACC the natural choice for us is to use Nvidia?s nvhpc compiler. The Gnu compiler does not work well and we do not have access to the Cray compiler. >> >> I already know that the latest version of Petsc does not compile with nvhpc, I am therefore using version 3.21. >> I get good results on the CPU both in serial and parallel (MPI). However, the GPU implementation, that is what we are interested in, only work correctly for the serial version. In parallel, the results are different. Even for a CG solve. 
>> >> I would like to know, if you have experience with the Nvidia compiler. I am particularly interested if you have already observed issues with it. Your opinion on whether to put further effort into trying to find a bug I may have introduced during the interfacing is highly appreciated. >> >> Best, >> >> Herbert Owen >> Senior Researcher, Dpt. Computer Applications in Science and Engineering >> Barcelona Supercomputing Center (BSC-CNS) >> Tel: +34 93 413 4038 >> Skype: herbert.owen >> >> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!Y_YIXIdrN81gDKgNed6V4icL3nN9OG62-ZsdnB1Bkc7iiGAoJ2riwbTzxJMnIROon3mXgiFVLnbH0RTlsXrLEAA5PwtO$ >> >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From grantchao2018 at 163.com Thu Nov 13 11:16:33 2025 From: grantchao2018 at 163.com (Grant Chao) Date: Fri, 14 Nov 2025 01:16:33 +0800 (CST) Subject: [petsc-users] gpu cpu parallel In-Reply-To: References: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> <4E8E7829-A856-495A-ADA3-710C91F8B3EF@petsc.dev> Message-ID: <3d5b96a2.a3e4.19a7e380b1f.Coremail.grantchao2018@163.com> Junchao, We have tried cudaSetDevice. The test code is attached. 8 cpu and 2 gpu are used. And we create a gpu_comm including rank 0 and rank 4. Then we set gpu 0 to rank 0, gpu 1 to rank 1 respectively. After MatSetType, rank 1 is mapped to gpu0 again. The run cmd is mpirun -n 8 ./a.out -eps_type jd -st_ksp_type gmres -st_pc_type none The std out is show below, [Rank 0] using GPU 0, [line 22]. [Rank 1] no computation assigned. [Rank 2] no computation assigned. [Rank 3] no computation assigned. [Rank 4] using GPU 0, [line 22]. [Rank 5] no computation assigned. [Rank 6] no computation assigned. [Rank 7] no computation assigned. [Rank 4] using GPU 1, [line 31] after setdevice. -------- Here set device successfully [Rank 0] using GPU 0, [line 31] after setdevice. [Rank 4] using GPU 1, [line 41] after create A. [Rank 0] using GPU 0, [line 41] after create A. [Rank 0] using GPU 0, [line 45] after set A type. [Rank 4] using GPU 0, [line 45] after set A type. ------ change to 0? [Rank 4] using GPU 0, [line 49] after MatSetUp. [Rank 0] using GPU 0, [line 49] after MatSetUp. [Rank 4] using GPU 0, [line 62] after Mat Assemble. [Rank 0] using GPU 0, [line 62] after Mat Assemble. Smallest eigenvalue = 100.000000 Smallest eigenvalue = 100.000000 BEST, Grant At 2025-11-13 05:58:05, "Junchao Zhang" wrote: A common approach is to use CUDA_VISIBLE_DEVICES to manipulate MPI ranks to GPUs mapping, see the section at https://urldefense.us/v3/__https://docs.nersc.gov/jobs/affinity/*gpu-nodes__;Iw!!G_uCfscf7eWS!Z_gIM7FfeDHQ5dHmPBQcDcmQnG0t6iMrPQU7OgVoGBU_BV3clXDllaQuK7A2zJlgP_o477Up1LHyn0VK4A3ULkoO7PrHMQ$ With OpenMPI, you can use OMPI_COMM_WORLD_LOCAL_RANK in place of SLURM_LOCALID (see https://urldefense.us/v3/__https://docs.open-mpi.org/en/v5.0.x/tuning-apps/environment-var.html__;!!G_uCfscf7eWS!Z_gIM7FfeDHQ5dHmPBQcDcmQnG0t6iMrPQU7OgVoGBU_BV3clXDllaQuK7A2zJlgP_o477Up1LHyn0VK4A3ULkpfQizn9g$ ). For example, with 8 MPI ranks and 4 GPUs per node, the following script will map ranks 0, 1 to GPU 0, ranks 2, 3 to GPU 1. #!/bin/bash # select_gpu_device wrapper script export CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/4))) exec $* On Wed, Nov 12, 2025 at 10:20?AM Barry Smith wrote: On Nov 12, 2025, at 2:31?AM, Grant Chao wrote: Thank you for the suggestion. 
We have already tried running multiple CPU ranks with a single GPU. However, we observed that as the number of ranks increases, the EPS solver becomes significantly slower. We are not sure of the exact cause?could it be due to process access contention, hidden data transfers, or perhaps another reason? We would be very interested to hear your insight on this matter. To avoid this problem, we used the gpu_comm approach mentioned before. During testing, we noticed that the mapping between rank ID and GPU ID seems to be set automatically and is not user-specifiable. For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. We tested possible solutions, such as calling cudaSetDevice() manually to set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 still used GPU 0. We would appreciate your guidance on how to customize this mapping. Thank you for your support. So you have a single compute "node" connected to multiple GPUs? Then the mapping of MPI ranks to GPUs doesn't matter and changing it won't improve the performance. However, we observed that as the number of ranks increases, the EPS solver becomes significantly slower. Does the number of EPS "iterations" increase? Run with one, two, four and eight MPI ranks (and the same number of "GPUs" (if you only have say four GPUs that is fine, just virtualize them so two different MPI ranks share one) and the option -log_view and send the output. We need to know what is slowing down before trying to find any cure. Barry Best wishes, Grant At 2025-11-12 11:48:47, "Junchao Zhang" , said: Hi, Wenbo, I think your approach should work. But before going this extra step with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, using nvidia's multiple process service (MPS)? If MPS works well, then you can avoid the extra complexity. --Junchao Zhang On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao wrote: Dear all, We are trying to solve ksp using GPUs. We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which the matrix is created and assembling using COO way provided by PETSc. In this example, the number of CPU is as same as the number of GPU. In our case, computation of the parameters of matrix is performed on CPUs. And the cost of it is expensive, which might take half of total time or even more. We want to use more CPUs to compute parameters in parallel. And a smaller communication domain (such as gpu_comm) for the CPUs corresponding to the GPUs is created. The parameters are computed by all of the CPUs (in MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via MPI. Matrix (type of aijcusparse) is then created and assembled within gpu_comm. Finally, ksp_solve is performed on GPUs. I?m not sure if this approach will work in practice. Are there any comparable examples I can look to for guidance? Best, Wenbo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_cpu_gpu.cpp Type: text/x-c Size: 3190 bytes Desc: not available URL: From knepley at gmail.com Thu Nov 13 11:23:20 2025 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Nov 2025 12:23:20 -0500 Subject: [petsc-users] Petsc + nvhpc In-Reply-To: <2A940B06-46A3-40AB-BD66-835C46970CA1@bsc.es> References: <0B70CA06-D787-4D97-8E33-27E71D08BBF0@bsc.es> <2A940B06-46A3-40AB-BD66-835C46970CA1@bsc.es> Message-ID: On Thu, Nov 13, 2025 at 12:11?PM howen via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear Junchao, > > Thank you for response and sorry for taking so long to answer back. > I cannot avoid using the nvidia tools. Gfortran is not mature for OpenACC > and gives us problems when compiling our code. > What I have done to enable using the latest petsc is to create my own C > code to call petsc. > I have little experience with c and it took me some time, but I can now > use petsc 3.24.1 ;) > > The behaviour remains the same as in my original email . > Parallel+GPU gives bad results. CPU(serial and parallel) and GPU serial > all work ok and give the same result. > > I have gone a bit into petsc comparing the CPU and GPU version with 2 mpi. > I see that the difference starts in > src/ksp/ksp/impls/cg/cg.c L170 > PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br > */ > I have printed the vectors R and Z and the norm dp. > R is identical on both CPU and GPU; but Z differs. > The correct value of dp (for the first time it enters) is 14.3014, while > running on the GPU with 2 mpis it gives 14.7493. > If you wish I can send you prints I introduced in cg.c > Thank you for all the detail in this report. However, since you see a problem in KSPCG, I believe we can reduce the complexity. You can use -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin and send us those files. Then we can run your system directly using KSP ex10 (and so can you). Thanks, Matt > The folder with the input files to run the case can be downloaded from > https://urldefense.us/v3/__https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq9hQ4klS$ > > > For submitting the gpu run I use > mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh > /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d > ChannelFlowSolverIncomp.json > > For the cpu run > mpirun -np 2 > /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d > ChannelFlowSolverIncomp.json > > Our code can be downloaded with : > git clone --recursive https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab.git__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq_ZmRVRG$ > > > -and the branch I am using with > git checkout 140-add-petsc > > To use exactly the same commit I am using > git checkout 09a923c9b57e46b14ae54b935845d50272691ace > > > I am currently using: Currently Loaded Modules: > 1) nvidia-hpc-sdk/25.1 2) hdf5/1.14.1-2-nvidia-nvhpcx 3) cmake/3.25.1 > I guess/hope similar modules should be available in any supercomputer. 
> > To build the cpu version > mkdir build_cpu > cd build_cpu > > export > PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal > export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH > export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH > export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH > export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH > export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH > > cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF > -DDEBUG_MODE=OFF .. > make -j 80 > > I have built petsc myself as follows > > git clone -b release https://urldefense.us/v3/__https://gitlab.com/petsc/petsc.git__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq58G6Hkk$ > > petsc > cd petsc > git checkout v3.24.1 > module purge > module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 > ./configure > --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc > --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal > --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt > --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 > --with-precision=single --download-hypre > CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= > --with-shared-libraries=1 --with-mpi=1 > --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a > --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include > --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ > --download-ptscotch=yes --download-metis --download-parmetis > make all check > make install > > ------------------- > For the GPU version when configuring petsc I add : --with-cuda > > I then change the export PETSC_INSTALL to > export > PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal > and repeat all other exports > > mkdir build_gpu > cd build_gpu > cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON > -DDEBUG_MODE=OFF .. > make -j 80 > > As you can see from the submit instructions the executable is found in > sod2d_gitlab/build_gpu/src/app_sod2d/sod2d > > I hope I have not forgotten anything and my instructions are 'easy' to > follow. If you have any issue do not doubt to contact me. > The wiki for our code can be found in > https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq4yklS7Y$ > > > Best, > > Herbert Owen > > Herbert Owen > Senior Researcher, Dpt. Computer Applications in Science and Engineering > Barcelona Supercomputing Center (BSC-CNS) > Tel: +34 93 413 4038 > Skype: herbert.owen > > https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq7rqqXKl$ > > > > > > > > > > On 16 Oct 2025, at 18:30, Junchao Zhang wrote: > > Hi, Herbert, > I don't have much experience on OpenACC and PETSc CI doesn't have such > tests. Could you avoid using nvfortran and instead use gfortran to compile > your Fortran + OpenACC code? If you, then you can use the latest petsc > code and make our debugging easier. > Also, could you provide us with a test and instructions to reproduce > the problem? > > Thanks! 
> --Junchao Zhang > > > On Thu, Oct 16, 2025 at 5:07?AM howen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Dear All, >> >> I am interfacing our CFD code (Fortran + OpenACC) to Petsc. >> Since we use OpenACC the natural choice for us is to use Nvidia?s nvhpc >> compiler. The Gnu compiler does not work well and we do not have access to >> the Cray compiler. >> >> I already know that the latest version of Petsc does not compile with >> nvhpc, I am therefore using version 3.21. >> I get good results on the CPU both in serial and parallel (MPI). However, >> the GPU implementation, that is what we are interested in, only work >> correctly for the serial version. In parallel, the results are different. >> Even for a CG solve. >> >> I would like to know, if you have experience with the Nvidia compiler. I >> am particularly interested if you have already observed issues with it. >> Your opinion on whether to put further effort into trying to find a bug I >> may have introduced during the interfacing is highly appreciated. >> >> Best, >> >> Herbert Owen >> Senior Researcher, Dpt. Computer Applications in Science and Engineering >> Barcelona Supercomputing Center (BSC-CNS) >> Tel: +34 93 413 4038 >> Skype: herbert.owen >> >> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq7rqqXKl$ >> >> >> >> >> >> >> >> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bOzafvznhRJly5r11WId0BSmM38vOBG5qnlJMJf02uLM44-t4g7Xm8NCG7h_D7BTAe3ACc19jaFdq3vxkBC_$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Nov 13 12:48:25 2025 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Nov 2025 13:48:25 -0500 Subject: [petsc-users] Probelm with DMPlexExtractSubMesh In-Reply-To: <1e9e4725-5274-4cef-a035-bb399deaac55@unibas.it> References: <1e9e4725-5274-4cef-a035-bb399deaac55@unibas.it> Message-ID: Sorry, I have been traveling. I just got back to this. The problem is that _everything_ that goes in the submesh has to have the same label value. That way you can distinguish exactly what you want in. However, the boundary label has to make decisions about shared edges and vertices. I am attaching a modified code that does what you want by making a separate label for each side. I apologize for the C. I am just not as quick in Fortran. Thanks, Matt On Thu, Nov 6, 2025 at 1:42?AM Aldo Bonfiglioli wrote: > Dear all, > > I am having troubles in using DMPlexExtractSubMesh to extract the > different strata of the Face Sets of a given mesh. 
> > When run on the enclosed tetrahedral mesh of the unit cube generated with > gmsh > > Face Sets: 6 strata with value/size (1 (246), 2 (246), 3 (246), 4 (246), 5 > (242), 6 (242)) > > I would expect 246 "points" on stratum 3, but when I DMview the subdm (and > plot it) the surface mesh looks incomplete > > DM Object: patch_03 1 MPI process > type: plex > patch_03 in 2 dimensions: > Cells are at height 1 > Number of 0-cells per rank: 122 > Number of 1-cells per rank: 325 > Number of 2-cells per rank: 204 > Number of 3-cells per rank: 204 [204] > Labels: > celltype: 4 strata with value/size (0 (122), 1 (325), 3 (204), 12 (204)) > depth: 4 strata with value/size (0 (122), 1 (325), 2 (204), 3 (204)) > Cell Sets: 1 strata with value/size (1 (204)) > Face Sets: 1 strata with value/size (3 (204)) > Edge Sets: 2 strata with value/size (1 (8), 5 (8)) > > see also patch_03.pdf > > What am I doing wrong? > > A simple reproducer (compiles with petsc-3.24.0) and the gmsh mesh are > enclosed. > > Thanks, > > Aldo > > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Mechanics > Dipartimento di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > web: https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!aNqQNIAnqfeL74GBiwHA9seVWu0ove-CSJIwX6f353WAN55As1veo1pVXphJIAAgvQIkWls9Xnm5sW-es9gN$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aNqQNIAnqfeL74GBiwHA9seVWu0ove-CSJIwX6f353WAN55As1veo1pVXphJIAAgvQIkWls9Xnm5sYVo0yux$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex_submesh.c Type: application/octet-stream Size: 2524 bytes Desc: not available URL: From knepley at gmail.com Thu Nov 13 15:27:44 2025 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Nov 2025 16:27:44 -0500 Subject: [petsc-users] How to map global vector to natural vector In-Reply-To: References: Message-ID: Here is the MR: https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8845__;!!G_uCfscf7eWS!ZmoN7SHfAQw89DROdmXJv-lFozEHO_b5_4vXYl19TEES-ofkxjaaeq13Z5aj1-VVMY43qgXqgj5Y1WIH2Mba$ Thanks, Matt On Tue, Oct 21, 2025 at 4:24?PM Matthew Knepley wrote: > I will fix it. > > Thanks, > > Matt > > On Tue, Oct 21, 2025 at 12:09?PM Xu, Donghui via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Dear PETSc Team, >> >> I am working with petsc4py for my model. I had some experience of using >> PETSc in Fortran. In Fortran, I used the following subroutines: >> >> call DMPlexCreateNaturalVector(dm, natural, ierr) >> call DMPlexNaturalToGlobalBegin(dm,natural,X,ierr) >> call DMPlexNaturalToGlobalEnd(dm,natural,X,ierr) >> >> However, I found there are no such interfaces in petsc4py. Can you advise >> me on how to get the global vector in natural order with DMPLEX in petsc4py? >> >> Thanks, >> Donghui >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZmoN7SHfAQw89DROdmXJv-lFozEHO_b5_4vXYl19TEES-ofkxjaaeq13Z5aj1-VVMY43qgXqgj5Y1TFMijNU$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZmoN7SHfAQw89DROdmXJv-lFozEHO_b5_4vXYl19TEES-ofkxjaaeq13Z5aj1-VVMY43qgXqgj5Y1TFMijNU$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From donghui.xu at pnnl.gov Thu Nov 13 15:28:43 2025 From: donghui.xu at pnnl.gov (Xu, Donghui) Date: Thu, 13 Nov 2025 21:28:43 +0000 Subject: [petsc-users] How to map global vector to natural vector In-Reply-To: References: Message-ID: Thank you, Matt! From: Matthew Knepley Date: Thursday, November 13, 2025 at 1:28?PM To: Xu, Donghui Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to map global vector to natural vector Check twice before you click! This email originated from outside PNNL. Here is the MR: https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8845__;!!G_uCfscf7eWS!eigadZs4YANflOBAegy-QTp9ga_gDR04kBT3z1MTqB-hieojq_WyFnV_8kjEYFw4cGI5ugeoekJUtEVOobw-94jwWXM$ Thanks, Matt On Tue, Oct 21, 2025 at 4:24?PM Matthew Knepley > wrote: I will fix it. Thanks, Matt On Tue, Oct 21, 2025 at 12:09?PM Xu, Donghui via petsc-users > wrote: Dear PETSc Team, I am working with petsc4py for my model. I had some experience of using PETSc in Fortran. In Fortran, I used the following subroutines: call DMPlexCreateNaturalVector(dm, natural, ierr) call DMPlexNaturalToGlobalBegin(dm,natural,X,ierr) call DMPlexNaturalToGlobalEnd(dm,natural,X,ierr) However, I found there are no such interfaces in petsc4py. Can you advise me on how to get the global vector in natural order with DMPLEX in petsc4py? Thanks, Donghui -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eigadZs4YANflOBAegy-QTp9ga_gDR04kBT3z1MTqB-hieojq_WyFnV_8kjEYFw4cGI5ugeoekJUtEVOobw-Y8UMmvs$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eigadZs4YANflOBAegy-QTp9ga_gDR04kBT3z1MTqB-hieojq_WyFnV_8kjEYFw4cGI5ugeoekJUtEVOobw-Y8UMmvs$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Nov 13 17:02:20 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 13 Nov 2025 17:02:20 -0600 Subject: [petsc-users] gpu cpu parallel In-Reply-To: <3d5b96a2.a3e4.19a7e380b1f.Coremail.grantchao2018@163.com> References: <1f9310f.309.19a76fa21d9.Coremail.grantchao2018@163.com> <4E8E7829-A856-495A-ADA3-710C91F8B3EF@petsc.dev> <3d5b96a2.a3e4.19a7e380b1f.Coremail.grantchao2018@163.com> Message-ID: Hi, Grant, I could reproduce the issue with your code. I think petsc code has some problems and I created an issue at https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/issues/1826__;!!G_uCfscf7eWS!ZSsk7IMQF7yL-THgMdfh_H3K7F1HUJg38n2dhkaBkJR1IvhSOpfX3c1TZLEL6JDNyCGACV-PEFWtIy-WgsKA8roDoTvm$ . 
Though we should fix it (not sure how for now), I think a much simpler approach is to use CUDA_VISIBLE_DEVICES. For example, if you just want ranks 0, 4 to use GPUs 0, 1 respectively, you can just delete these lines in your example if (global_rank == 0) { cudaSetDevice(0); } else if (global_rank == 4) { cudaSetDevice(1); } Then, instead, just make GPUs 0, 1 visible to ranks 0, 4 respectively upfront, by $ cat set_gpu_device #!/bin/bash # select_gpu_device wrapper script export CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/2))) exec $* $ mpirun -n 8 ./set_gpu_device ./ex0 [Rank 5] no computation assigned. [Rank 6] no computation assigned. [Rank 7] no computation assigned. [Rank 0] using GPU 0, [line 23]. [Rank 0] using GPU 0, [line 32] after setdevice. [Rank 1] no computation assigned. [Rank 2] no computation assigned. [Rank 3] no computation assigned. [Rank 4] using GPU 0, [line 23]. [Rank 4] using GPU 0, [line 32] after setdevice. [Rank 0] using GPU 0, [line 42] after create A. [Rank 4] using GPU 0, [line 42] after create A. [Rank 4] using GPU 0, [line 46] after set A type. [Rank 0] using GPU 0, [line 46] after set A type. [Rank 0] using GPU 0, [line 50] after MatSetUp. [Rank 4] using GPU 0, [line 50] after MatSetUp. [Rank 0] using GPU 0, [line 63] after Mat Assemble. [Rank 4] using GPU 0, [line 63] after Mat Assemble. Smallest eigenvalue = 100.000000 Smallest eigenvalue = 100.000000 Note for rank 4, GPU 0 is actually the physical GPU 1. Let me know if it works. --Junchao Zhang On Thu, Nov 13, 2025 at 11:17?AM Grant Chao wrote: > Junchao, > We have tried cudaSetDevice. > The test code is attached. 8 cpu and 2 gpu are used. And we create a > gpu_comm including rank 0 and rank 4. > Then we set gpu 0 to rank 0, gpu 1 to rank 1 respectively. > After MatSetType, rank 1 is mapped to gpu0 again. > > The run cmd is > mpirun -n 8 ./a.out -eps_type jd -st_ksp_type gmres -st_pc_type none > > The std out is show below, > [Rank 0] using GPU 0, [line 22]. > [Rank 1] no computation assigned. > [Rank 2] no computation assigned. > [Rank 3] no computation assigned. > [Rank 4] using GPU 0, [line 22]. > [Rank 5] no computation assigned. > [Rank 6] no computation assigned. > [Rank 7] no computation assigned. > [Rank 4] using GPU 1, [line 31] after setdevice. -------- Here set > device successfully > [Rank 0] using GPU 0, [line 31] after setdevice. > [Rank 4] using GPU 1, [line 41] after create A. > [Rank 0] using GPU 0, [line 41] after create A. > [Rank 0] using GPU 0, [line 45] after set A type. > [Rank 4] using GPU 0, [line 45] after set A type. ------ change to 0? > [Rank 4] using GPU 0, [line 49] after MatSetUp. > [Rank 0] using GPU 0, [line 49] after MatSetUp. > [Rank 4] using GPU 0, [line 62] after Mat Assemble. > [Rank 0] using GPU 0, [line 62] after Mat Assemble. 
> Smallest eigenvalue = 100.000000 > Smallest eigenvalue = 100.000000 > > BEST, > Grant > > > > > At 2025-11-13 05:58:05, "Junchao Zhang" wrote: > > A common approach is to use CUDA_VISIBLE_DEVICES to manipulate MPI ranks > to GPUs mapping, see the section at > https://urldefense.us/v3/__https://docs.nersc.gov/jobs/affinity/*gpu-nodes__;Iw!!G_uCfscf7eWS!ZSsk7IMQF7yL-THgMdfh_H3K7F1HUJg38n2dhkaBkJR1IvhSOpfX3c1TZLEL6JDNyCGACV-PEFWtIy-WgsKA8pWxGvch$ > > With OpenMPI, you can use OMPI_COMM_WORLD_LOCAL_RANK in place of > SLURM_LOCALID (see > https://urldefense.us/v3/__https://docs.open-mpi.org/en/v5.0.x/tuning-apps/environment-var.html__;!!G_uCfscf7eWS!ZSsk7IMQF7yL-THgMdfh_H3K7F1HUJg38n2dhkaBkJR1IvhSOpfX3c1TZLEL6JDNyCGACV-PEFWtIy-WgsKA8khuXtvj$ ). > For example, with 8 MPI ranks and 4 GPUs per node, the following script > will map ranks 0, 1 to GPU 0, ranks 2, 3 to GPU 1. > > #!/bin/bash > # select_gpu_device wrapper script > export > CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/4))) > exec $* > > On Wed, Nov 12, 2025 at 10:20?AM Barry Smith wrote: > >> >> >> On Nov 12, 2025, at 2:31?AM, Grant Chao wrote: >> >> >> Thank you for the suggestion. >> >> We have already tried running multiple CPU ranks with a single GPU. >> However, we observed that as the number of ranks increases, the EPS solver >> becomes significantly slower. We are not sure of the exact cause?could it >> be due to process access contention, hidden data transfers, or perhaps >> another reason? We would be very interested to hear your insight on this >> matter. >> >> To avoid this problem, we used the gpu_comm approach mentioned before. >> During testing, we noticed that the mapping between rank ID and GPU ID >> seems to be set automatically and is not user-specifiable. >> >> For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds >> ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. >> >> >> >> >> We tested possible solutions, such as calling cudaSetDevice() manually to >> set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 >> still used GPU 0. >> >> We would appreciate your guidance on how to customize this mapping. Thank >> you for your support. >> >> >> So you have a single compute "node" connected to multiple GPUs? Then >> the mapping of MPI ranks to GPUs doesn't matter and changing it won't >> improve the performance. >> > >> However, we observed that as the number of ranks increases, the EPS >> solver becomes significantly slower. >> >> >> Does the number of EPS "iterations" increase? Run with one, two, four >> and eight MPI ranks (and the same number of "GPUs" (if you only have say >> four GPUs that is fine, just virtualize them so two different MPI ranks >> share one) and the option -log_view and send the output. We need to know >> what is slowing down before trying to find any cure. >> >> Barry >> >> >> >> >> >> Best wishes, >> Grant >> >> >> At 2025-11-12 11:48:47, "Junchao Zhang" , said: >> >> Hi, Wenbo, >> I think your approach should work. But before going this extra step >> with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, >> using nvidia's multiple process service (MPS)? If MPS works well, then >> you can avoid the extra complexity. >> >> --Junchao Zhang >> >> >> On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao >> wrote: >> >>> Dear all, >>> >>> We are trying to solve ksp using GPUs. 
>>> We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which >>> the matrix is created and assembling using COO way provided by PETSc. In >>> this example, the number of CPU is as same as the number of GPU. >>> In our case, computation of the parameters of matrix is performed on >>> CPUs. And the cost of it is expensive, which might take half of total time >>> or even more. >>> >>> We want to use more CPUs to compute parameters in parallel. And a >>> smaller communication domain (such as gpu_comm) for the CPUs corresponding >>> to the GPUs is created. The parameters are computed by all of the CPUs (in >>> MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via >>> MPI. Matrix (type of aijcusparse) is then created and assembled within >>> gpu_comm. Finally, ksp_solve is performed on GPUs. >>> >>> I?m not sure if this approach will work in practice. Are there any >>> comparable examples I can look to for guidance? >>> >>> Best, >>> Wenbo >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Nov 13 21:15:35 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 13 Nov 2025 21:15:35 -0600 Subject: [petsc-users] Fwd: Fw: gpu cpu parallel In-Reply-To: <7a48f48d.29317.19a80534c97.Coremail.amarantos@126.com> References: <7a48f48d.29317.19a80534c97.Coremail.amarantos@126.com> Message-ID: Glad to hear it works! --Junchao Zhang ---------- Forwarded message --------- From: Grace Date: Thu, Nov 13, 2025 at 9:05?PM Subject: Fw: [petsc-users] gpu cpu parallel To: junchao.zhang at gmail.com Hello, Junchao, Thank you for your prompt help and the detailed solution. We have tested the approach you suggested, using the set_gpu_device wrapper script to control GPU visibility via CUDA_VISIBLE_DEVICES. It works perfectly and now correctly maps the ranks to the intended GPUs as we desired. We really appreciate your guidance in resolving this issue. Best regards, Grace Gao ---- Forwarded Message ---- >From Grant Chao Date 11/14/2025 08:40 To amarantos at 126.com Cc Subject Fw:Re: Re: [petsc-users] gpu cpu parallel -- sent by my netease email phone version -------- Forward mail content -------- From: "Junchao Zhang" Date: 2025-11-14 07:02:20 To: "Grant Chao" CC: "Barry Smith" ,petsc-users Subject: Re: Re: [petsc-users] gpu cpu parallel Hi, Grant, I could reproduce the issue with your code. I think petsc code has some problems and I created an issue at https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/issues/1826__;!!G_uCfscf7eWS!cbp-7TpeWxYGe77z37dPkFan7mFckYzyKehvf7UVDZ8djmigGrIvIj5PYrJhgumqnq2Gi5vpjoVuOPymh3NEBptx_yOV$ . Though we should fix it (not sure how for now), I think a much simpler approach is to use CUDA_VISIBLE_DEVICES. For example, if you just want ranks 0, 4 to use GPUs 0, 1 respectively, you can just delete these lines in your example if (global_rank == 0) { cudaSetDevice(0); } else if (global_rank == 4) { cudaSetDevice(1); } Then, instead, just make GPUs 0, 1 visible to ranks 0, 4 respectively upfront, by $ cat set_gpu_device #!/bin/bash # select_gpu_device wrapper script export CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/2))) exec $* $ mpirun -n 8 ./set_gpu_device ./ex0 [Rank 5] no computation assigned. [Rank 6] no computation assigned. [Rank 7] no computation assigned. [Rank 0] using GPU 0, [line 23]. [Rank 0] using GPU 0, [line 32] after setdevice. [Rank 1] no computation assigned. [Rank 2] no computation assigned. 
[Rank 3] no computation assigned. [Rank 4] using GPU 0, [line 23]. [Rank 4] using GPU 0, [line 32] after setdevice. [Rank 0] using GPU 0, [line 42] after create A. [Rank 4] using GPU 0, [line 42] after create A. [Rank 4] using GPU 0, [line 46] after set A type. [Rank 0] using GPU 0, [line 46] after set A type. [Rank 0] using GPU 0, [line 50] after MatSetUp. [Rank 4] using GPU 0, [line 50] after MatSetUp. [Rank 0] using GPU 0, [line 63] after Mat Assemble. [Rank 4] using GPU 0, [line 63] after Mat Assemble. Smallest eigenvalue = 100.000000 Smallest eigenvalue = 100.000000 Note for rank 4, GPU 0 is actually the physical GPU 1. Let me know if it works. --Junchao Zhang On Thu, Nov 13, 2025 at 11:17?AM Grant Chao wrote: > Junchao, > We have tried cudaSetDevice. > The test code is attached. 8 cpu and 2 gpu are used. And we create a > gpu_comm including rank 0 and rank 4. > Then we set gpu 0 to rank 0, gpu 1 to rank 1 respectively. > After MatSetType, rank 1 is mapped to gpu0 again. > > The run cmd is > mpirun -n 8 ./a.out -eps_type jd -st_ksp_type gmres -st_pc_type none > > The std out is show below, > [Rank 0] using GPU 0, [line 22]. > [Rank 1] no computation assigned. > [Rank 2] no computation assigned. > [Rank 3] no computation assigned. > [Rank 4] using GPU 0, [line 22]. > [Rank 5] no computation assigned. > [Rank 6] no computation assigned. > [Rank 7] no computation assigned. > [Rank 4] using GPU 1, [line 31] after setdevice. -------- Here set > device successfully > [Rank 0] using GPU 0, [line 31] after setdevice. > [Rank 4] using GPU 1, [line 41] after create A. > [Rank 0] using GPU 0, [line 41] after create A. > [Rank 0] using GPU 0, [line 45] after set A type. > [Rank 4] using GPU 0, [line 45] after set A type. ------ change to 0? > [Rank 4] using GPU 0, [line 49] after MatSetUp. > [Rank 0] using GPU 0, [line 49] after MatSetUp. > [Rank 4] using GPU 0, [line 62] after Mat Assemble. > [Rank 0] using GPU 0, [line 62] after Mat Assemble. > Smallest eigenvalue = 100.000000 > Smallest eigenvalue = 100.000000 > > BEST, > Grant > > > > > At 2025-11-13 05:58:05, "Junchao Zhang" wrote: > > A common approach is to use CUDA_VISIBLE_DEVICES to manipulate MPI ranks > to GPUs mapping, see the section at > https://urldefense.us/v3/__https://docs.nersc.gov/jobs/affinity/*gpu-nodes__;Iw!!G_uCfscf7eWS!cbp-7TpeWxYGe77z37dPkFan7mFckYzyKehvf7UVDZ8djmigGrIvIj5PYrJhgumqnq2Gi5vpjoVuOPymh3NEBtfb0PXl$ > > With OpenMPI, you can use OMPI_COMM_WORLD_LOCAL_RANK in place of > SLURM_LOCALID (see > https://urldefense.us/v3/__https://docs.open-mpi.org/en/v5.0.x/tuning-apps/environment-var.html__;!!G_uCfscf7eWS!cbp-7TpeWxYGe77z37dPkFan7mFckYzyKehvf7UVDZ8djmigGrIvIj5PYrJhgumqnq2Gi5vpjoVuOPymh3NEBgxjjYYZ$ ). > For example, with 8 MPI ranks and 4 GPUs per node, the following script > will map ranks 0, 1 to GPU 0, ranks 2, 3 to GPU 1. > > #!/bin/bash > # select_gpu_device wrapper script > export > CUDA_VISIBLE_DEVICES=$((OMPI_COMM_WORLD_LOCAL_RANK/(OMPI_COMM_WORLD_LOCAL_SIZE/4))) > exec $* > > On Wed, Nov 12, 2025 at 10:20?AM Barry Smith wrote: > >> >> >> On Nov 12, 2025, at 2:31?AM, Grant Chao wrote: >> >> >> Thank you for the suggestion. >> >> We have already tried running multiple CPU ranks with a single GPU. >> However, we observed that as the number of ranks increases, the EPS solver >> becomes significantly slower. We are not sure of the exact cause?could it >> be due to process access contention, hidden data transfers, or perhaps >> another reason? We would be very interested to hear your insight on this >> matter. 
>> >> To avoid this problem, we used the gpu_comm approach mentioned before. >> During testing, we noticed that the mapping between rank ID and GPU ID >> seems to be set automatically and is not user-specifiable. >> >> For example, with 4 GPUs (0-3) and 8 CPU ranks (0-7), the program binds >> ranks 0 and 4 to GPU 0, ranks 1 and 5 to GPU 1, and so on. >> >> >> >> >> We tested possible solutions, such as calling cudaSetDevice() manually to >> set rank 4 to device 1, but it did not work as expected. Ranks 0 and 4 >> still used GPU 0. >> >> We would appreciate your guidance on how to customize this mapping. Thank >> you for your support. >> >> >> So you have a single compute "node" connected to multiple GPUs? Then >> the mapping of MPI ranks to GPUs doesn't matter and changing it won't >> improve the performance. >> > >> However, we observed that as the number of ranks increases, the EPS >> solver becomes significantly slower. >> >> >> Does the number of EPS "iterations" increase? Run with one, two, four >> and eight MPI ranks (and the same number of "GPUs" (if you only have say >> four GPUs that is fine, just virtualize them so two different MPI ranks >> share one) and the option -log_view and send the output. We need to know >> what is slowing down before trying to find any cure. >> >> Barry >> >> >> >> >> >> Best wishes, >> Grant >> >> >> At 2025-11-12 11:48:47, "Junchao Zhang" , said: >> >> Hi, Wenbo, >> I think your approach should work. But before going this extra step >> with gpu_comm, have you tried to map multiple MPI ranks (CPUs) to one GPU, >> using nvidia's multiple process service (MPS)? If MPS works well, then >> you can avoid the extra complexity. >> >> --Junchao Zhang >> >> >> On Tue, Nov 11, 2025 at 7:50?PM Wenbo Zhao >> wrote: >> >>> Dear all, >>> >>> We are trying to solve ksp using GPUs. >>> We found the example, src/ksp/ksp/tutorials/bench_kspsolve.c, in which >>> the matrix is created and assembling using COO way provided by PETSc. In >>> this example, the number of CPU is as same as the number of GPU. >>> In our case, computation of the parameters of matrix is performed on >>> CPUs. And the cost of it is expensive, which might take half of total time >>> or even more. >>> >>> We want to use more CPUs to compute parameters in parallel. And a >>> smaller communication domain (such as gpu_comm) for the CPUs corresponding >>> to the GPUs is created. The parameters are computed by all of the CPUs (in >>> MPI_COMM_WORLD). Then, the parameters are send to gpu_comm related CPUs via >>> MPI. Matrix (type of aijcusparse) is then created and assembled within >>> gpu_comm. Finally, ksp_solve is performed on GPUs. >>> >>> I?m not sure if this approach will work in practice. Are there any >>> comparable examples I can look to for guidance? >>> >>> Best, >>> Wenbo >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From herbert.owen at bsc.es Fri Nov 14 09:08:24 2025 From: herbert.owen at bsc.es (howen) Date: Fri, 14 Nov 2025 16:08:24 +0100 Subject: [petsc-users] Petsc + nvhpc In-Reply-To: References: <0B70CA06-D787-4D97-8E33-27E71D08BBF0@bsc.es> <2A940B06-46A3-40AB-BD66-835C46970CA1@bsc.es> Message-ID: Thank you very much Matthew, I did what you suggested and I also added ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); Now that I can see the matrices I notice that some values differ. I will debug and simplify my code to try to understand where the difference comes from . 
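A minimal sketch of how each run's assembled operator could also be dumped to a PETSc binary file for offline comparison (the helper name, the file name, and where it would be called are only placeholders here, not part of the actual sod2d code):

#include <petscmat.h>

/* Sketch: write a matrix to a binary file so the CPU and GPU runs
   can be dumped under different names and compared offline. */
static PetscErrorCode DumpMatrixBinary(Mat A, const char *path)
{
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), path, FILE_MODE_WRITE, &viewer));
  PetscCall(MatView(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Loading the two files afterwards with MatLoad() and subtracting them with MatAXPY() would then show exactly which entries differ between the two builds.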
As soon as I have a more clear picture I will contact you back. Best, Herbert Owen Senior Researcher, Dpt. Computer Applications in Science and Engineering Barcelona Supercomputing Center (BSC-CNS) Tel: +34 93 413 4038 Skype: herbert.owen https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnGxsT7iF2$ > On 13 Nov 2025, at 18:23, Matthew Knepley wrote: > > On Thu, Nov 13, 2025 at 12:11?PM howen via petsc-users > wrote: >> Dear Junchao, >> >> Thank you for response and sorry for taking so long to answer back. >> I cannot avoid using the nvidia tools. Gfortran is not mature for OpenACC and gives us problems when compiling our code. >> What I have done to enable using the latest petsc is to create my own C code to call petsc. >> I have little experience with c and it took me some time, but I can now use petsc 3.24.1 ;) >> >> The behaviour remains the same as in my original email . >> Parallel+GPU gives bad results. CPU(serial and parallel) and GPU serial all work ok and give the same result. >> >> I have gone a bit into petsc comparing the CPU and GPU version with 2 mpi. >> I see that the difference starts in >> src/ksp/ksp/impls/cg/cg.c L170 >> PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */ >> I have printed the vectors R and Z and the norm dp. >> R is identical on both CPU and GPU; but Z differs. >> The correct value of dp (for the first time it enters) is 14.3014, while running on the GPU with 2 mpis it gives 14.7493. >> If you wish I can send you prints I introduced in cg.c > > Thank you for all the detail in this report. However, since you see a problem in KSPCG, I believe we can reduce the complexity. You can use > > -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin > > and send us those files. Then we can run your system directly using KSP ex10 (and so can you). > > Thanks, > > Matt > >> The folder with the input files to run the case can be downloaded from https://urldefense.us/v3/__https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnG1yKXAMP$ >> >> For submitting the gpu run I use >> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json >> >> For the cpu run >> mpirun -np 2 /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json >> >> Our code can be downloaded with : >> git clone --recursive https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab.git__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnG8xQvHi_$ >> >> -and the branch I am using with >> git checkout 140-add-petsc >> >> To use exactly the same commit I am using >> git checkout 09a923c9b57e46b14ae54b935845d50272691ace >> >> >> I am currently using: Currently Loaded Modules: >> 1) nvidia-hpc-sdk/25.1 2) hdf5/1.14.1-2-nvidia-nvhpcx 3) cmake/3.25.1 >> I guess/hope similar modules should be available in any supercomputer. 
>> >> To build the cpu version >> mkdir build_cpu >> cd build_cpu >> >> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal >> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH >> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH >> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH >> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH >> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH >> >> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF .. >> make -j 80 >> >> I have built petsc myself as follows >> >> git clone -b release https://urldefense.us/v3/__https://gitlab.com/petsc/petsc.git__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnG9OyCmiL$ petsc >> cd petsc >> git checkout v3.24.1 >> module purge >> module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 >> ./configure --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 --with-precision=single --download-hypre CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= --with-shared-libraries=1 --with-mpi=1 --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ --download-ptscotch=yes --download-metis --download-parmetis >> make all check >> make install >> >> ------------------- >> For the GPU version when configuring petsc I add : --with-cuda >> >> I then change the export PETSC_INSTALL to >> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal >> and repeat all other exports >> >> mkdir build_gpu >> cd build_gpu >> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF .. >> make -j 80 >> >> As you can see from the submit instructions the executable is found in sod2d_gitlab/build_gpu/src/app_sod2d/sod2d >> >> I hope I have not forgotten anything and my instructions are 'easy' to follow. If you have any issue do not doubt to contact me. >> The wiki for our code can be found in https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnG49E2dbs$ >> >> Best, >> >> Herbert Owen >> >> Herbert Owen >> Senior Researcher, Dpt. Computer Applications in Science and Engineering >> Barcelona Supercomputing Center (BSC-CNS) >> Tel: +34 93 413 4038 >> Skype: herbert.owen >> >> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnGxsT7iF2$ >> >> >> >> >> >> >> >> >>> On 16 Oct 2025, at 18:30, Junchao Zhang > wrote: >>> >>> Hi, Herbert, >>> I don't have much experience on OpenACC and PETSc CI doesn't have such tests. Could you avoid using nvfortran and instead use gfortran to compile your Fortran + OpenACC code? If you, then you can use the latest petsc code and make our debugging easier. >>> Also, could you provide us with a test and instructions to reproduce the problem? >>> >>> Thanks! 
>>> --Junchao Zhang >>> >>> >>> On Thu, Oct 16, 2025 at 5:07?AM howen via petsc-users > wrote: >>>> Dear All, >>>> >>>> I am interfacing our CFD code (Fortran + OpenACC) to Petsc. >>>> Since we use OpenACC the natural choice for us is to use Nvidia?s nvhpc compiler. The Gnu compiler does not work well and we do not have access to the Cray compiler. >>>> >>>> I already know that the latest version of Petsc does not compile with nvhpc, I am therefore using version 3.21. >>>> I get good results on the CPU both in serial and parallel (MPI). However, the GPU implementation, that is what we are interested in, only work correctly for the serial version. In parallel, the results are different. Even for a CG solve. >>>> >>>> I would like to know, if you have experience with the Nvidia compiler. I am particularly interested if you have already observed issues with it. Your opinion on whether to put further effort into trying to find a bug I may have introduced during the interfacing is highly appreciated. >>>> >>>> Best, >>>> >>>> Herbert Owen >>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering >>>> Barcelona Supercomputing Center (BSC-CNS) >>>> Tel: +34 93 413 4038 >>>> Skype: herbert.owen >>>> >>>> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnGxsT7iF2$ >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dLyvJQfp1uwWG0xt9W0Mel0ZD1L7iEt2qnUs6XfEM-gvuPIwdzmzUE8dkvjDoYsah4-z0d0W6hCI9jZ_17GnG02oLPV3$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Nov 14 09:45:13 2025 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 14 Nov 2025 10:45:13 -0500 Subject: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI In-Reply-To: <2980bc3b.63f0.19a824df26a.Coremail.202321009113@mail.scut.edu.cn> References: <71f7ca4b.59c6.19a724c3c2c.Coremail.ctchengben@mail.scut.edu.cn> <823C2320-52A3-4679-8BB2-26DA296E4ACA@petsc.dev> <46e195ab.5cea.19a779d6908.Coremail.202321009113@mail.scut.edu.cn> <6de01bb0.5fe2.19a7c6bb95f.Coremail.202321009113@mail.scut.edu.cn> <84886CB9-D3B8-433C-943B-31E85A47C3B3@petsc.dev> <2980bc3b.63f0.19a824df26a.Coremail.202321009113@mail.scut.edu.cn> Message-ID: <9A855973-12C8-4A76-B898-A69AD14962D2@petsc.dev> The C preprocessor may be failing on the complicated gymnastics of PETSC_DEPRECATED_FUNCTION My conclusion is PETSc is unbuildable on the version of the Microsoft compilers you are using, and you need to upgrade to the latest Microsoft compilers. The Microsoft C compiler has never been properly standard-compliant (and proudly so), so it can fail on correct C code. Barry > On Nov 14, 2025, at 7:18?AM, ?? <202321009113 at mail.scut.edu.cn> wrote: > >> Hi Barry >> >> >> >> Thanks for your advice. >> I follow your advice and configure and make again. 
>> >> /****************************************** >> ./configure --with-debugging=0 --with-cc="win32fe_cl" --with-fc=0 --with-cxx="win32fe_cl" --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices -CFLAGS='-O2 -MD -wd4996' -CXXFLAGS='-O2 -MD -wd4996' >> *******************************************/ >> >> >> It happen to be another problems when performing the make all. >> >> >> >> The new configure.log and make.log is attached below. >> >> Sorry for bother you so many times but I wish your can help me again. >> >> >> Looking forward your reply! >> >> sinserely, >> Cheng. >> > > > > -----????----- > ???: "Barry Smith" > > ????: 2025-11-13 23:05:05 (???) > ???: ?? <202321009113 at mail.scut.edu.cn > > ??: petsc-users > > ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI > > > Change > > __attribute__((packed)) > > to > > /* __attribute__((packed)) */ > > in include/petscmath.h > > and run make again. > > I think you should install a new version of Microsoft's compilers etc. > > Barry > > >> On Nov 13, 2025, at 3:53?AM, ?? <202321009113 at mail.scut.edu.cn > wrote: >> >> Hi Barry >> >> >> >> Thanks for your advice. >> I use AI help me that change the file on the petsc-3.24.1/arch-mswin-c-opt/externalpackages/petsc-pkg-parmetis-f5e3aab04fd5/headers/gk_arch. 
>> >> >> The change is from: >> >> #ifdef __MSC__ >> #include "ms_stdint.h" >> #include "ms_inttypes.h" >> #include "ms_stat.h" >> #else >> #ifndef SUNOS >> #include >> #endif >> #if !defined(WIN32) && !defined(__MINGW32__) >> #include >> #endif >> #include >> #include >> #include >> #endif >> >> To: >> >> #if (defined(__MSC__) || defined(_MSC_VER)) && defined(_MSC_VER) && _MSC_VER < 1900 >> #include "ms_stdint.h" >> #include "ms_inttypes.h" >> #include "ms_stat.h" >> #else >> #ifndef SUNOS >> #include >> #endif >> #if !defined(WIN32) && !defined(__MINGW32__) && !defined(_MSC_VER) >> #include >> #endif >> #include >> #include >> #if !defined(_MSC_VER) >> #include >> #endif >> #endif >> >> >> >> Then I configure the PETSc: >> >> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices >> >> It seems good, but then I make it >> >> it have the error: >> >> >> >> make[3]: *** [gmakefile:211: arch-mswin-c-opt/obj/src/sys/objects/device/interface/mark_dcontext.o] Error 2 >> make[3]: Leaving directory '/cygdrive/g/mypetsc/petsc-3.24.1' >> make[2]: *** [/cygdrive/g/mypetsc/petsc-3.24.1/lib/petsc/conf/rules_doc.mk:5: libs] Error 2 >> make[2]: Leaving directory '/cygdrive/g/mypetsc/petsc-3.24.1' >> **************************ERROR************************************* >> Error during compile, check arch-mswin-c-opt/lib/petsc/conf/make.log >> Send it and arch-mswin-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov >> ******************************************************************** >> make[1]: *** [makefile:44: all] Error 1 >> make: *** [GNUmakefile:9: all] Error 2 >> >> >> The new configure.log and make.log is attached below. >> >> I don't know if it is caused by the change I made or the other problems. >> >> >> >> So I ask for your help again. >> Looking forward your reply! >> >> >> sinserely, >> Cheng. >> >> >> >> >> >> >> >> -----????----- >> ???: "Barry Smith" > >> ????: 2025-11-13 00:03:50 (???) >> ???: ?? 
<202321009113 at mail.scut.edu.cn > >> ??: PETSc > >> ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI >> >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t' >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t' >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(49): warning C4005: 'INT8_MIN': macro redefinition >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(107): note: see previous definition of 'INT8_MIN' >> >> Parmetis has its own definitions for many C standard types, etc in headers\ms_stdint.h that duplicate what is available in stdint.h on Unix systems. Normally, this gets included when __MSC_ is defined instead of stdint.h (in gk_arch.h). >> >> But for some reason, with your system it appears that Microsoft's stdint.h is also getting included; presumably brought in through some other system include file since it is only included in one place. >> >> $ git grep stdint.h >> headers/gk_arch.h: #include "ms_stdint.h" >> headers/gk_arch.h: #include >> headers/ms_inttypes.h:#include "ms_stdint.h" >> headers/ms_stdint.h:// ISO C9x compliant stdint.h for Microsoft Visual Studio >> >> You have a fairly old VisualStudio, 2022. Can you upgrade to the latest? Let us know if this resolves the problem. >> >> Barry >> >> >> >> >> >> >> >> >> >> >> On Nov 12, 2025, at 5:29?AM, ?? <202321009113 at mail.scut.edu.cn > wrote: >> >> Hi Barry >> Thanks for your reply. >> I check the package parmetis,and the "petsc-pkg-parmetis-45100eac9301.tar.gz" is form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3.tar.gz__;!!G_uCfscf7eWS!dCHHyY-VxDm-ywDHjCTx7TuexZfZSpNvZITmJoKuqThj3NRnYxGB_lEcLPxzGRFkPS8_uyJCqoS_NU4txpTE0xQ$ . So I made a mistake about the package. 
>> Then I download the package form https://urldefense.us/v3/__https://bitbucket.org/petsc/pkg-parmetis/get/v4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dCHHyY-VxDm-ywDHjCTx7TuexZfZSpNvZITmJoKuqThj3NRnYxGB_lEcLPxzGRFkPS8_uyJCqoS_NU4twjES1lo$ and it is "petsc-pkg-parmetis-f5e3aab04fd5.tar.gz" >> >> >> Then the compiler option in configuration is: >> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-f5e3aab04fd5.tar.gz --with-strict-petscerrorcode=0 --with-64-bit-indices >> >> but it still have the same error: >> ********************************************************************************************* >> ============================================================================================= >> ============================================================================================= >> Configuring PARMETIS with CMake; this may take several minutes >> ============================================================================================= >> ============================================================================================= >> Compiling and installing PARMETIS; this may take several minutes >> ============================================================================================= >> >> >> ********************************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): >> --------------------------------------------------------------------------------------------- >> Error running make on PARMETIS >> >> >> ********************************************************************************************* >> >> >> The new configure.log is attached below. >> So I ask for your help again. >> Looking forward your reply! >> >> >> sinserely, >> Cheng. >> >> >> >> >> -----????----- >> ???: "Barry Smith" > >> ????: 2025-11-11 23:29:01 (???) >> ???: "Matthew Knepley" > >> ??: ?? >, petsc-users at mcs.anl.gov >> ??: Re: [petsc-users] Error in configuring PETSc with Cygwin on Windows by using MS-MPI >> >> >> Where/how did you obtain /cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz ? Was it from PETSc ./configure? >> >> self.version = '4.0.3' >> self.versionname = 'PARMETIS_MAJOR_VERSION.PARMETIS_MINOR_VERSION.PARMETIS_SUBMINOR_VERSION' >> self.gitcommit = 'v'+self.version+'-p9' >> self.download = ['git://https://bitbucket.org/petsc/pkg-parmetis.git','https://bitbucket.org/petsc/pkg-parmetis/get/'+self.gitcommit+'.tar.gz '] >> >> >> >> On Nov 11, 2025, at 7:35?AM, Matthew Knepley > wrote: >> >> On Tue, Nov 11, 2025 at 4:44?AM ?? > wrote: >> Hello, >> Recently I try to install PETSc with Cygwin since I'd like to use PETSc with Visual Studio on Windows10 plateform.For the sake of clarity, I firstly list the softwares/packages used below: >> 1. PETSc: version 3.14.1 >> 2. VS: version 2022 >> 3. MS MPI: download Microsoft MPI v10.1.2 >> 4. Cygwin >> >> Quick question: Have you considered installing on WSL? I have had much better luck with that on Windows. 
>> >> This seems to be an incompatibility of ParMetis Windows support and your version: >> >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(37): error C2371: 'int_fast16_t': redefinition; different basic types^M >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(80): note: see declaration of 'int_fast16_t'^M >> G:\VisualStudio2022\VC\Tools\MSVC\14.37.32822\include\stdint.h(41): error C2371: 'uint_fast16_t': redefinition; different basic types^M >> G:\mypetsc\petsc-3.24.1\arch-mswin-c-opt\externalpackages\petsc-pkg-parmetis-f5e3aab04fd5\headers\ms_stdint.h(84): note: see declaration of 'uint_fast16_t'^M >> >> Thanks, >> >> Matt >> >> And the compiler option in configuration is: >> ./configure --with-debugging=0 --with-cc=cl --with-fc=0 --with-cxx=cl >> --download-f2cblaslapack=/cygdrive/g/mypetsc/f2cblaslapack-3.8.0.q2.tar.gz >> --with-mpi-include=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Include,/cygdrive/g/MSmpi/MicrosoftSDKs/Include/x64\] >> --with-mpi-lib=\[/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpifec.lib,/cygdrive/g/MSmpi/MicrosoftSDKs/Lib/x64/msmpi.lib\] >> --with-mpiexec=/cygdrive/g/MSmpi/MicrosoftMPI/Bin/mpiexec >> --download-metis=/cygdrive/g/mypetsc/petsc-pkg-metis-69fb26dd0428.tar.gz >> --download-parmetis=/cygdrive/g/mypetsc/petsc-pkg-parmetis-45100eac9301.tar.gz >> --with-strict-petscerrorcode=0 --with-64-bit-indices --download-hdf5=/cygdrive/g/mypetsc/hdf5-1.14.3-p1.tar.bz2 >> >> >> >> >> >> >> but there return an error: >> ********************************************************************************************* >> ============================================================================================= >> ============================================================================================= >> Configuring PARMETIS with CMake; this may take several minutes >> ============================================================================================= >> ============================================================================================= >> Compiling and installing PARMETIS; this may take several minutes >> ============================================================================================= >> >> >> ********************************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): >> --------------------------------------------------------------------------------------------- >> Error running make on PARMETIS >> >> >> ********************************************************************************************* >> >> >> The configure.log is attached below. >> So I write this email to report my problem and ask for your help. >> Looking forward your reply! >> >> >> sinserely, >> Cheng. >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dCHHyY-VxDm-ywDHjCTx7TuexZfZSpNvZITmJoKuqThj3NRnYxGB_lEcLPxzGRFkPS8_uyJCqoS_NU4tpGoejyM$ >> >> > > > ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1387907 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 137025 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From yin.shi1 at icloud.com Sat Nov 15 08:17:23 2025 From: yin.shi1 at icloud.com (Yin Shi) Date: Sat, 15 Nov 2025 22:17:23 +0800 Subject: [petsc-users] solveBackward in parallel Message-ID: Dear Developers, In short, I need to explicitly use A.solveBackward(b, x) in parallel with petsc4py, where A is a Cholesky factored matrix, but it seems that this is not supported (e.g., for mumps and superlu_dist factorization solver backend). Is it possible to work around this? In detail, the problem I need to solve is to generate a set of correlated random numbers (denoted by a vector, w) from an uncorrelated one (denoted by a vector n). Denote the covariance matrix of n as C (symmetric). One needs to first factorize C, C = L L^T, and then solve the linear system L^T w = n for w in parallel. Is it possible to reformulate this problem for it to be implemented using petsc4py? Thank you! Yin From bsmith at petsc.dev Sat Nov 15 18:59:33 2025 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 15 Nov 2025 19:59:33 -0500 Subject: [petsc-users] solveBackward in parallel In-Reply-To: References: Message-ID: <5B2C79DF-736E-4113-AF9B-D8A40B64C192@petsc.dev> It appears that only MATSOLVERMKL_CPARDISO provides a parallel backward solve currently. The only seperation of forward and backward solves in MUMPS appears to be provided with (from its users manual) A special case is the one where the forward elimination step is performed during factorization (see Subsection 3.8), instead of during the solve phase. This allows accessing the L factors right after they have been computed, with a better locality, and can avoid writing the L factors to disk in an out-of-core context. In this case (forward > On Nov 15, 2025, at 9:17?AM, Yin Shi via petsc-users wrote: > > Dear Developers, > > In short, I need to explicitly use A.solveBackward(b, x) in parallel with petsc4py, where A is a Cholesky factored matrix, but it seems that this is not supported (e.g., for mumps and superlu_dist factorization solver backend). Is it possible to work around this? > > In detail, the problem I need to solve is to generate a set of correlated random numbers (denoted by a vector, w) from an uncorrelated one (denoted by a vector n). Denote the covariance matrix of n as C (symmetric). One needs to first factorize C, C = L L^T, and then solve the linear system L^T w = n for w in parallel. Is it possible to reformulate this problem for it to be implemented using petsc4py? > > Thank you! > Yin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldo.bonfiglioli at unibas.it Mon Nov 17 01:50:59 2025 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Mon, 17 Nov 2025 08:50:59 +0100 Subject: [petsc-users] Probelm with DMPlexExtractSubMesh In-Reply-To: References: <1e9e4725-5274-4cef-a035-bb399deaac55@unibas.it> Message-ID: <13143062-44a5-4855-9929-47bfa668431f@unibas.it> On 11/13/25 19:48, Matthew Knepley wrote: > Sorry, I have been traveling. I just got back to this. > > The problem is that _everything_ that goes in the submesh has to have > the same label value. That way you can distinguish?exactly what?you > want in. However, the boundary label has to make decisions about > shared edges and vertices. 
I am attaching a modified code that does > what you want by making a separate label for each side. > > I apologize for the C. I am just not as quick in Fortran. > > Thanks, > > Matt > > On Thu, Nov 6, 2025 at 1:42 AM Aldo Bonfiglioli > wrote: > > Dear all, > > I am having trouble using DMPlexExtractSubMesh to extract the > different strata of the Face Sets of a given mesh. > > When run on the enclosed tetrahedral mesh of the unit cube > generated with gmsh > >> Face Sets: 6 strata with value/size (1 (246), 2 (246), 3 (246), 4 >> (246), 5 (242), 6 (242)) >> > I would expect 246 "points" on stratum 3, but when I DMView the > subdm (and plot it) the surface mesh looks incomplete > >> DM Object: patch_03 1 MPI process >> type: plex >> patch_03 in 2 dimensions: >> Cells are at height 1 >> Number of 0-cells per rank: 122 >> Number of 1-cells per rank: 325 >> Number of 2-cells per rank: 204 >> Number of 3-cells per rank: 204 [204] >> Labels: >> celltype: 4 strata with value/size (0 (122), 1 (325), 3 (204), 12 >> (204)) >> depth: 4 strata with value/size (0 (122), 1 (325), 2 (204), 3 (204)) >> Cell Sets: 1 strata with value/size (1 (204)) >> Face Sets: 1 strata with value/size (3 (204)) >> Edge Sets: 2 strata with value/size (1 (8), 5 (8)) >> > see also patch_03.pdf > > What am I doing wrong? > > A simple reproducer (compiles with petsc-3.24.0) and the gmsh mesh > are enclosed. > > Thanks, > > Aldo > > -- > Dr. Aldo Bonfiglioli > Associate professor of Fluid Mechanics > Dipartimento di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > web:https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!abF6mWj2v5ivSvUET6QFY34S5Jw6daKMHiS5E9ztz2YbV2jQPr-0WGi09d7IEArZlAwqdLwjjsQeUl2PlNwJMcq6AAnRpMwHc3Q$ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!abF6mWj2v5ivSvUET6QFY34S5Jw6daKMHiS5E9ztz2YbV2jQPr-0WGi09d7IEArZlAwqdLwjjsQeUl2PlNwJMcq6AAnRT1dw3uY$ > Matthew, thank you for providing the working C code. I will be back to you in case I need further advice. Regards, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Mechanics Dipartimento di Ingegneria Universita' della Basilicata V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 web:https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!abF6mWj2v5ivSvUET6QFY34S5Jw6daKMHiS5E9ztz2YbV2jQPr-0WGi09d7IEArZlAwqdLwjjsQeUl2PlNwJMcq6AAnRpMwHc3Q$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From herbert.owen at bsc.es Mon Nov 17 09:40:16 2025 From: herbert.owen at bsc.es (howen) Date: Mon, 17 Nov 2025 16:40:16 +0100 Subject: [petsc-users] Petsc + nvhpc In-Reply-To: References: <0B70CA06-D787-4D97-8E33-27E71D08BBF0@bsc.es> <2A940B06-46A3-40AB-BD66-835C46970CA1@bsc.es> Message-ID: <50E23080-79B1-4EF6-BE7C-9527978EFF8A@bsc.es> Dear Matthew and Junchao, I finally found my error; now everything works fine. I was a bit stuck at some moment and your small comments were very helpful. Thanks!!! Herbert Owen Senior Researcher, Dpt.
Computer Applications in Science and Engineering Barcelona Supercomputing Center (BSC-CNS) Tel: +34 93 413 4038 Skype: herbert.owen https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRfz8esh1$ > On 14 Nov 2025, at 16:08, howen wrote: > > Thank you very much Matthew, > > I did what you suggested and I also added > > ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); > > Now that I can see the matrices I notice that some values differ. I will debug and simplify my code to try to understand where the difference comes from . > > As soon as I have a more clear picture I will contact you back. > > Best, > > > Herbert Owen > Senior Researcher, Dpt. Computer Applications in Science and Engineering > Barcelona Supercomputing Center (BSC-CNS) > Tel: +34 93 413 4038 > Skype: herbert.owen > > https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRfz8esh1$ > > > > > > > > >> On 13 Nov 2025, at 18:23, Matthew Knepley wrote: >> >> On Thu, Nov 13, 2025 at 12:11?PM howen via petsc-users > wrote: >>> Dear Junchao, >>> >>> Thank you for response and sorry for taking so long to answer back. >>> I cannot avoid using the nvidia tools. Gfortran is not mature for OpenACC and gives us problems when compiling our code. >>> What I have done to enable using the latest petsc is to create my own C code to call petsc. >>> I have little experience with c and it took me some time, but I can now use petsc 3.24.1 ;) >>> >>> The behaviour remains the same as in my original email . >>> Parallel+GPU gives bad results. CPU(serial and parallel) and GPU serial all work ok and give the same result. >>> >>> I have gone a bit into petsc comparing the CPU and GPU version with 2 mpi. >>> I see that the difference starts in >>> src/ksp/ksp/impls/cg/cg.c L170 >>> PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */ >>> I have printed the vectors R and Z and the norm dp. >>> R is identical on both CPU and GPU; but Z differs. >>> The correct value of dp (for the first time it enters) is 14.3014, while running on the GPU with 2 mpis it gives 14.7493. >>> If you wish I can send you prints I introduced in cg.c >> >> Thank you for all the detail in this report. However, since you see a problem in KSPCG, I believe we can reduce the complexity. You can use >> >> -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin >> >> and send us those files. Then we can run your system directly using KSP ex10 (and so can you). 
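(A quick way to cross-check the step that differs is to reproduce the Jacobi application z = diag(A)^{-1} r outside of KSPSolve. The sketch below is written in petsc4py only for brevity; the names A, r and z are placeholders for the operator, the residual and the preconditioned vector printed from cg.c, and are not taken from the code above.)

from petsc4py import PETSc

# reference Jacobi application, independent of the KSP internals
d = A.getDiagonal()            # Vec holding diag(A)
z_ref = r.duplicate()
z_ref.pointwiseDivide(r, d)    # z_ref[i] = r[i] / d[i]
z_ref.axpy(-1.0, z)            # z_ref <- z_ref - z
print(z_ref.norm())            # should be near machine precision on CPU and GPU alike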
>> >> Thanks, >> >> Matt >> >>> The folder with the input files to run the case can be downloaded from https://urldefense.us/v3/__https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRSYLlx3K$ >>> >>> For submitting the gpu run I use >>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json >>> >>> For the cpu run >>> mpirun -np 2 /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json >>> >>> Our code can be downloaded with : >>> git clone --recursive https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab.git__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRQCr0eq8$ >>> >>> -and the branch I am using with >>> git checkout 140-add-petsc >>> >>> To use exactly the same commit I am using >>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace >>> >>> >>> I am currently using: Currently Loaded Modules: >>> 1) nvidia-hpc-sdk/25.1 2) hdf5/1.14.1-2-nvidia-nvhpcx 3) cmake/3.25.1 >>> I guess/hope similar modules should be available in any supercomputer. >>> >>> To build the cpu version >>> mkdir build_cpu >>> cd build_cpu >>> >>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal >>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH >>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH >>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH >>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH >>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH >>> >>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF .. >>> make -j 80 >>> >>> I have built petsc myself as follows >>> >>> git clone -b release https://urldefense.us/v3/__https://gitlab.com/petsc/petsc.git__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRZKzWAoJ$ petsc >>> cd petsc >>> git checkout v3.24.1 >>> module purge >>> module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 >>> ./configure --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 --with-precision=single --download-hypre CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= --with-shared-libraries=1 --with-mpi=1 --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ --download-ptscotch=yes --download-metis --download-parmetis >>> make all check >>> make install >>> >>> ------------------- >>> For the GPU version when configuring petsc I add : --with-cuda >>> >>> I then change the export PETSC_INSTALL to >>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal >>> and repeat all other exports >>> >>> mkdir build_gpu >>> cd build_gpu >>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF .. 
>>> make -j 80 >>> >>> As you can see from the submit instructions the executable is found in sod2d_gitlab/build_gpu/src/app_sod2d/sod2d >>> >>> I hope I have not forgotten anything and my instructions are 'easy' to follow. If you have any issue do not doubt to contact me. >>> The wiki for our code can be found in https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRTtC2VEI$ >>> >>> Best, >>> >>> Herbert Owen >>> >>> Herbert Owen >>> Senior Researcher, Dpt. Computer Applications in Science and Engineering >>> Barcelona Supercomputing Center (BSC-CNS) >>> Tel: +34 93 413 4038 >>> Skype: herbert.owen >>> >>> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRfz8esh1$ >>> >>> >>> >>> >>> >>> >>> >>> >>>> On 16 Oct 2025, at 18:30, Junchao Zhang > wrote: >>>> >>>> Hi, Herbert, >>>> I don't have much experience on OpenACC and PETSc CI doesn't have such tests. Could you avoid using nvfortran and instead use gfortran to compile your Fortran + OpenACC code? If you, then you can use the latest petsc code and make our debugging easier. >>>> Also, could you provide us with a test and instructions to reproduce the problem? >>>> >>>> Thanks! >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Oct 16, 2025 at 5:07?AM howen via petsc-users > wrote: >>>>> Dear All, >>>>> >>>>> I am interfacing our CFD code (Fortran + OpenACC) to Petsc. >>>>> Since we use OpenACC the natural choice for us is to use Nvidia?s nvhpc compiler. The Gnu compiler does not work well and we do not have access to the Cray compiler. >>>>> >>>>> I already know that the latest version of Petsc does not compile with nvhpc, I am therefore using version 3.21. >>>>> I get good results on the CPU both in serial and parallel (MPI). However, the GPU implementation, that is what we are interested in, only work correctly for the serial version. In parallel, the results are different. Even for a CG solve. >>>>> >>>>> I would like to know, if you have experience with the Nvidia compiler. I am particularly interested if you have already observed issues with it. Your opinion on whether to put further effort into trying to find a bug I may have introduced during the interfacing is highly appreciated. >>>>> >>>>> Best, >>>>> >>>>> Herbert Owen >>>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering >>>>> Barcelona Supercomputing Center (BSC-CNS) >>>>> Tel: +34 93 413 4038 >>>>> Skype: herbert.owen >>>>> >>>>> https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRfz8esh1$ >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRY11r9Bz$ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian.blauth at itwm.fraunhofer.de Tue Nov 18 10:11:44 2025 From: sebastian.blauth at itwm.fraunhofer.de (Blauth, Sebastian) Date: Tue, 18 Nov 2025 16:11:44 +0000 Subject: [petsc-users] Ordering of DoFs in submatrices with PCFieldsplit Message-ID: Dear PETSc developers and users, I have a question regarding the Fieldsplit preconditioner in PETSc. In particular, I want to know how the submatrices there are created from the parent matrix. The "obvious" way would be to take the DoF indices of the corresponding split and "renumber" them so that the DoFs in the submatrix have the same order as the ones of the parent matrix. I did not find any documentation on this and as it is at least possible that the DoFs are re-ordered, I wanted to ask this question. Obviously, in case the DoFs are re-ordered, how can I get the mapping between the DoFs of the parent and the submatrix? The thing I am wanting to work on is implementing a pressure convection diffusion preconditioner with FEniCS for the incompressible Navier-Stokes equations. The parent matrix is assembled via a mixed FEM and then I use PETSc to solve the system. I want to assemble the corresponding operators on the pressure space from a collapsed (i.e. sub-space of the mixed FEM) function space. However, FEniCS re-orders the DoFs there, but I can get a mapping between the DoFs so this should not be problematic. However, I am not sure if PETSc also does a re-ordering. Thanks a lot in advance and best regards, Sebastian -- Dr. Sebastian Blauth Fraunhofer-Institut f?r Techno- und Wirtschaftsmathematik ITWM Abteilung Transportvorg?nge Fraunhofer-Platz 1, 67663 Kaiserslautern Telefon: +49 631 31600-4968 sebastian.blauth at itwm.fraunhofer.de https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!f_qaoCRxX3prMgl6ev5fvSFQegVfZo84xW9eJTz7uYmLjZiyJFIlm1tlqYrM3LqjOpkEoMrIJZo6J63-23-atPBnJn4et_4R-UvZVnIkaQ0$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 18 10:23:27 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Nov 2025 11:23:27 -0500 Subject: [petsc-users] Ordering of DoFs in submatrices with PCFieldsplit In-Reply-To: References: Message-ID: On Tue, Nov 18, 2025 at 11:12?AM Blauth, Sebastian < sebastian.blauth at itwm.fraunhofer.de> wrote: > Dear PETSc developers and users, > > > > I have a question regarding the Fieldsplit preconditioner in PETSc. In > particular, I want to know how the submatrices there are created from the > parent matrix. The ?obvious? way would be to take the DoF indices of the > corresponding split and ?renumber? them so that the DoFs in the submatrix > have the same order as the ones of the parent matrix. I did not find any > documentation on this and as it is at least possible that the DoFs are > re-ordered, I wanted to ask this question. Obviously, in case the DoFs are > re-ordered, how can I get the mapping between the DoFs of the parent and > the submatrix? > Hi Sebastian, Inside, we call MatCreateSubmatrix(), which takes an IS on each process, and selects those global rows, in the order in which they appear in the IS, into a new parallel matrix. PCFieldsplitSetIS() can be used to specify those IS, so you can control the reordering. Does that make sense? > The thing I am wanting to work on is implementing a pressure convection > diffusion preconditioner with FEniCS for the incompressible Navier-Stokes > equations. > The parent matrix is assembled via a mixed FEM and then I use PETSc to > solve the system. 
I want to assemble the corresponding operators on the > pressure space from a collapsed (i.e. sub-space of the mixed FEM) function > space. However, FEniCS re-orders the DoFs there, but I can get a mapping > between the DoFs so this should not be problematic. However, I am not sure > if PETSc also does a re-ordering. > You can just create an IS with that reordering. What operator are you planning on assembling on the pressure space? Have you seen https://urldefense.us/v3/__https://arxiv.org/abs/1810.03315?__;!!G_uCfscf7eWS!ZFlvrtpVlFuXdYWcwujVNh1WjnSmuEKqsh1s3GCYbyN0_wNsVgBaJo3x-lWG3Iea3iQhp_iniM9QzDSr9iD3$ Thanks, Matt > Thanks a lot in advance and best regards, > > Sebastian > > > > -- > > Dr. Sebastian Blauth > > Fraunhofer-Institut f?r > > Techno- und Wirtschaftsmathematik ITWM > > Abteilung Transportvorg?nge > > Fraunhofer-Platz 1, 67663 Kaiserslautern > > Telefon: +49 631 31600-4968 > > sebastian.blauth at itwm.fraunhofer.de > > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!ZFlvrtpVlFuXdYWcwujVNh1WjnSmuEKqsh1s3GCYbyN0_wNsVgBaJo3x-lWG3Iea3iQhp_iniM9QzNhlmkaU$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZFlvrtpVlFuXdYWcwujVNh1WjnSmuEKqsh1s3GCYbyN0_wNsVgBaJo3x-lWG3Iea3iQhp_iniM9QzGRu26U1$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.blauth at itwm.fraunhofer.de Wed Nov 19 01:03:20 2025 From: sebastian.blauth at itwm.fraunhofer.de (Blauth, Sebastian) Date: Wed, 19 Nov 2025 07:03:20 +0000 Subject: [petsc-users] Ordering of DoFs in submatrices with PCFieldsplit In-Reply-To: References: Message-ID: Dear Matt, thanks for the clarification. Yes, that makes sense. Basically, I use two approaches for defining the splits in my code, see https://urldefense.us/v3/__https://github.com/sblauth/cashocs/blob/46c0d91467d03a4906b7bde29727b45d4bb0d6d2/cashocs/_utils/linalg.py*L245-L287__;Iw!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7Y_93bJVg$ I think the first one, where the IS is defined, then does exactly what I thought it would do. In the second approach, which I need for nested fieldsplits, I use a DMShell with a Section defined analogously - so I guess the same applies here. Well, yes I could just reorder the DoFs for the creation of the submatrices - but I usually don't need these sub-functionspaces and would not want to create them every time. I thought of using MatPermute (https://urldefense.us/v3/__https://petsc.org/release/petsc4py/reference/petsc4py.PETSc.Mat.html*petsc4py.PETSc.Mat.permute__;Iw!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7YwxbMqnE$ ) with the permutation I get from FEniCS - or is there any reason not to do so? And thank you very much for the reference. Yes, I am aware of the paper you sent. However I think the function spaces involved in the method make it more or less infeasible for me - usually using Taylor-Hood elements is already very expensive. I usually use a stabilized P1-P1 discretization or try to get the linear Crouzeix?Raviart with elementwise constant pressure working (for slow flows, this works okay, but as I go to higher Reynolds numbers, things become more problematic). 
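For concreteness, the IS-based splitting mentioned above (the first approach) comes down to only a few petsc4py calls. The following is a minimal sketch, assuming the mixed matrix A, the vectors b and x, and the global index arrays u_dofs and p_dofs already exist; the field names are made up:

from petsc4py import PETSc

comm = PETSc.COMM_WORLD
# global dof indices of each field, listed in the order the sub-blocks should use
is_u = PETSc.IS().createGeneral(u_dofs, comm=comm)
is_p = PETSc.IS().createGeneral(p_dofs, comm=comm)

ksp = PETSc.KSP().create(comm)
ksp.setOperators(A)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.FIELDSPLIT)
# each sub-matrix is extracted with exactly these rows/columns, in exactly this order
pc.setFieldSplitIS(("u", is_u), ("p", is_p))
ksp.setFromOptions()
ksp.solve(b, x)

Since the sub-matrix keeps the rows in the order given in the IS, choosing that order to match the collapsed pressure space avoids a separate permutation step.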
And regarding your question on which operators I am planning to assemble on the pressure space: Basically the pressure mass matrix, pressure convection-diffusion matrix and a pressure Laplacian. If you have any tips for solving the incompressible Navier Stokes equations (steady state) at higher Reynolds numbers I certainly welcome them. I can also go a bit more into detail of what kind of solution approach I am using - if that is appropriate here. Thanks a lot and best regards, Sebastian -- Dr. Sebastian Blauth Fraunhofer-Institut f?r Techno- und Wirtschaftsmathematik ITWM Abteilung Transportvorg?nge Fraunhofer-Platz 1, 67663 Kaiserslautern Telefon: +49 631 31600-4968 sebastian.blauth at itwm.fraunhofer.de https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7YW0PRsVU$ > -----Original Message----- > From: Matthew Knepley > Sent: Tuesday, November 18, 2025 5:23 PM > To: Blauth, Sebastian > Cc: PETSc users list > Subject: Re: [petsc-users] Ordering of DoFs in submatrices with PCFieldsplit > > On Tue, Nov 18, 2025 at 11:12?AM Blauth, Sebastian > > wrote: > > Dear PETSc developers and users, > > > > I have a question regarding the Fieldsplit preconditioner in PETSc. In > particular, I want to know how the submatrices there are created from the parent > matrix. The ?obvious? way would be to take the DoF indices of the corresponding > split and ?renumber? them so that the DoFs in the submatrix have the same order > as the ones of the parent matrix. I did not find any documentation on this and as > it is at least possible that the DoFs are re-ordered, I wanted to ask this question. > Obviously, in case the DoFs are re-ordered, how can I get the mapping between > the DoFs of the parent and the submatrix? > > > Hi Sebastian, > > Inside, we call MatCreateSubmatrix(), which takes an IS on each process, and > selects those global rows, in the order in which they appear in the IS, into a new > parallel matrix. PCFieldsplitSetIS() can be used to specify those IS, so you can > control the reordering. Does that make sense? > > > The thing I am wanting to work on is implementing a pressure convection > diffusion preconditioner with FEniCS for the incompressible Navier-Stokes > equations. > > The parent matrix is assembled via a mixed FEM and then I use PETSc to > solve the system. I want to assemble the corresponding operators on the pressure > space from a collapsed (i.e. sub-space of the mixed FEM) function space. > However, FEniCS re-orders the DoFs there, but I can get a mapping between the > DoFs so this should not be problematic. However, I am not sure if PETSc also does > a re-ordering. > > > You can just create an IS with that reordering. What operator are you planning on > assembling on the pressure space? Have you seen > https://urldefense.us/v3/__https://arxiv.org/abs/1810.03315?__;!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7YE-1UtAk$ > > Thanks, > > Matt > > > Thanks a lot in advance and best regards, > > Sebastian > > > > -- > > Dr. 
Sebastian Blauth > > Fraunhofer-Institut f?r > > Techno- und Wirtschaftsmathematik ITWM > > Abteilung Transportvorg?nge > > Fraunhofer-Platz 1, 67663 Kaiserslautern > > Telefon: +49 631 31600-4968 > > sebastian.blauth at itwm.fraunhofer.de > > > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7YW0PRsVU$ > S!f_qaoCRxX3prMgl6ev5fvSFQegVfZo84xW9eJTz7uYmLjZiyJFIlm1tlqYrM3LqjOpkE > oMrIJZo6J63-23-atPBnJn4et_4R-UvZoWlBpHM$> > > > > > > -- > > What most experimenters take for granted before they begin their experiments is > infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZrZxEenSD9yoVQBgqWHSpGUGp75YsbFopexb0vZKBu8oG5soqUBYoVKVAGETh1eMtV2aO-XjQUFcjY-OdaJUjHL04TxyhunHGM7YQCIJKn8$ > From knepley at gmail.com Wed Nov 19 07:18:50 2025 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Nov 2025 08:18:50 -0500 Subject: [petsc-users] Ordering of DoFs in submatrices with PCFieldsplit In-Reply-To: References: Message-ID: On Wed, Nov 19, 2025 at 2:03?AM Blauth, Sebastian < sebastian.blauth at itwm.fraunhofer.de> wrote: > Dear Matt, > > thanks for the clarification. Yes, that makes sense. Basically, I use two > approaches for defining the splits in my code, see > https://urldefense.us/v3/__https://github.com/sblauth/cashocs/blob/46c0d91467d03a4906b7bde29727b45d4bb0d6d2/cashocs/_utils/linalg.py*L245-L287__;Iw!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vygrOWHgH$ > I think the first one, where the IS is defined, then does exactly what I > thought it would do. In the second approach, which I need for nested > fieldsplits, I use a DMShell with a Section defined analogously - so I > guess the same applies here. > Okay. > Well, yes I could just reorder the DoFs for the creation of the > submatrices - but I usually don't need these sub-functionspaces and would > not want to create them every time. I thought of using MatPermute ( > https://urldefense.us/v3/__https://petsc.org/release/petsc4py/reference/petsc4py.PETSc.Mat.html*petsc4py.PETSc.Mat.permute__;Iw!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vyiWtyehi$ ) > with the permutation I get from FEniCS - or is there any reason not to do > so? > That is more memory movement. I am not understanding why you would not just permute the input defining the nested FS. > And thank you very much for the reference. Yes, I am aware of the paper > you sent. However I think the function spaces involved in the method make > it more or less infeasible for me - usually using Taylor-Hood elements is > already very expensive. I usually use a stabilized P1-P1 discretization or > try to get the linear Crouzeix?Raviart with elementwise constant pressure > working (for slow flows, this works okay, but as I go to higher Reynolds > numbers, things become more problematic). > Oh, yes those spaces are crazy, but not necessary. In https://urldefense.us/v3/__https://arxiv.org/pdf/2107.00820__;!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vysoKtJkS$ , on page 9, you can see that they are able to prove the kernel decomposition property for simple Taylor-Hood. 
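Coming back to the permutation point above: permuting the input that defines the split, rather than the assembled matrix, would amount to building the pressure IS directly in the collapsed-space ordering. A rough petsc4py sketch, in which perm and p_dofs_mixed are hypothetical names for the FEniCS dof map and for the pressure dofs of the mixed space:

from petsc4py import PETSc

# i-th entry of the IS = mixed-space dof of the i-th collapsed-space pressure dof
is_p = PETSc.IS().createGeneral([p_dofs_mixed[perm[i]] for i in range(len(perm))],
                                comm=PETSc.COMM_WORLD)
pc.setFieldSplitIS(("u", is_u), ("p", is_p))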
> And regarding your question on which operators I am planning to assemble > on the pressure space: Basically the pressure mass matrix, pressure > convection-diffusion matrix and a pressure Laplacian. > > If you have any tips for solving the incompressible Navier Stokes > equations (steady state) at higher Reynolds numbers I certainly welcome > them. I can also go a bit more into detail of what kind of solution > approach I am using - if that is appropriate here. > I think that the Augmented Lagrangian strategy from the Stadler paper is currently the best option I know of. Thanks, Matt > Thanks a lot and best regards, > Sebastian > > > -- > Dr. Sebastian Blauth > Fraunhofer-Institut f?r > Techno- und Wirtschaftsmathematik ITWM > Abteilung Transportvorg?nge > Fraunhofer-Platz 1, 67663 Kaiserslautern > Telefon: +49 631 31600-4968 > sebastian.blauth at itwm.fraunhofer.de > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vyu_13JIA$ > > > -----Original Message----- > > From: Matthew Knepley > > Sent: Tuesday, November 18, 2025 5:23 PM > > To: Blauth, Sebastian > > Cc: PETSc users list > > Subject: Re: [petsc-users] Ordering of DoFs in submatrices with > PCFieldsplit > > > > On Tue, Nov 18, 2025 at 11:12?AM Blauth, Sebastian > > > > wrote: > > > > Dear PETSc developers and users, > > > > > > > > I have a question regarding the Fieldsplit preconditioner in > PETSc. In > > particular, I want to know how the submatrices there are created from > the parent > > matrix. The ?obvious? way would be to take the DoF indices of the > corresponding > > split and ?renumber? them so that the DoFs in the submatrix have the > same order > > as the ones of the parent matrix. I did not find any documentation on > this and as > > it is at least possible that the DoFs are re-ordered, I wanted to ask > this question. > > Obviously, in case the DoFs are re-ordered, how can I get the mapping > between > > the DoFs of the parent and the submatrix? > > > > > > Hi Sebastian, > > > > Inside, we call MatCreateSubmatrix(), which takes an IS on each process, > and > > selects those global rows, in the order in which they appear in the IS, > into a new > > parallel matrix. PCFieldsplitSetIS() can be used to specify those IS, so > you can > > control the reordering. Does that make sense? > > > > > > The thing I am wanting to work on is implementing a pressure > convection > > diffusion preconditioner with FEniCS for the incompressible Navier-Stokes > > equations. > > > > The parent matrix is assembled via a mixed FEM and then I use > PETSc to > > solve the system. I want to assemble the corresponding operators on the > pressure > > space from a collapsed (i.e. sub-space of the mixed FEM) function space. > > However, FEniCS re-orders the DoFs there, but I can get a mapping > between the > > DoFs so this should not be problematic. However, I am not sure if PETSc > also does > > a re-ordering. > > > > > > You can just create an IS with that reordering. What operator are you > planning on > > assembling on the pressure space? Have you seen > > https://urldefense.us/v3/__https://arxiv.org/abs/1810.03315?__;!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vyv_7q3wU$ > > > > Thanks, > > > > Matt > > > > > > Thanks a lot in advance and best regards, > > > > Sebastian > > > > > > > > -- > > > > Dr. 
Sebastian Blauth > > > > Fraunhofer-Institut f?r > > > > Techno- und Wirtschaftsmathematik ITWM > > > > Abteilung Transportvorg?nge > > > > Fraunhofer-Platz 1, 67663 Kaiserslautern > > > > Telefon: +49 631 31600-4968 > > > > sebastian.blauth at itwm.fraunhofer.de > > > > > > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de__;!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vyu_13JIA$ > > < > https://urldefense.us/v3/__https://www.itwm.fraunhofer.de/__;!!G_uCfscf7eW > > S!f_qaoCRxX3prMgl6ev5fvSFQegVfZo84xW9eJTz7uYmLjZiyJFIlm1tlqYrM3LqjOpkE > > oMrIJZo6J63-23-atPBnJn4et_4R-UvZoWlBpHM$> > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > experiments is > > infinitely more interesting than any results to which their experiments > lead. > > -- Norbert Wiener > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vypAp6Vm8$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Z3vaW6mH3kTXdhSyLFZ2XcUc1E6hcY4dtjtLWLAzcg4Pj_S8issujZ0Khj24yL9Bb5KfynJ0mj5vypAp6Vm8$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Nov 23 14:09:10 2025 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 23 Nov 2025 15:09:10 -0500 Subject: [petsc-users] solveBackward in parallel In-Reply-To: <7D84E81D-60FF-4316-818C-4D8D0FCA1ACD@icloud.com> References: <5B2C79DF-736E-4113-AF9B-D8A40B64C192@petsc.dev> <7D84E81D-60FF-4316-818C-4D8D0FCA1ACD@icloud.com> Message-ID: I would be stunned and amazed if this worked. Sparse factorization codes use very complicated data structures to store the resulting "factors" and the solves are complicated code that traverse through the "factor" data structures to perform the solve. Barry > On Nov 22, 2025, at 6:58?AM, Yin Shi wrote: > > Thank you very much for your reply. Given this, when using MUMPS in parallel, I can still get the factor matrix (using getFactorMatrix method of a PC object) and use it to do matrix multiplications (e.g., using matMult method of the factor matrix), correct? I also would like to confirm whether the factor matrix returned is really triangular and multiplying it with another matrix gives the intended result. > >> On Nov 16, 2025, at 08:59, Barry Smith wrote: >> >> It appears that only MATSOLVERMKL_CPARDISO provides a parallel backward solve currently. >> >> The only seperation of forward and backward solves in MUMPS appears to be provided with (from its users manual) >> >> A special case is the one >> where the forward elimination step is performed during factorization (see Subsection 3.8), instead of >> during the solve phase. This allows accessing the L factors right after they have been computed, with a >> better locality, and can avoid writing the L factors to disk in an out-of-core context. In this case (forward >> >> >> >>> On Nov 15, 2025, at 9:17?AM, Yin Shi via petsc-users wrote: >>> >>> Dear Developers, >>> >>> In short, I need to explicitly use A.solveBackward(b, x) in parallel with petsc4py, where A is a Cholesky factored matrix, but it seems that this is not supported (e.g., for mumps and superlu_dist factorization solver backend). Is it possible to work around this? 
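For reference, the sequential setting in which the separate triangular solves are available looks roughly like the sketch below (petsc4py, single rank, PETSc's built-in Cholesky). Here C and n are placeholders, the solver-type string "petsc" is an assumption, and exactly which triangular system solveBackward applies for a given factorization should be verified on a small test case:

from petsc4py import PETSc

pc = PETSc.PC().create(PETSc.COMM_SELF)
pc.setOperators(C)                  # C: symmetric positive definite matrix on one rank
pc.setType(PETSc.PC.Type.CHOLESKY)
pc.setFactorSolverType("petsc")     # PETSc's native (sequential-only) Cholesky
pc.setUp()
F = pc.getFactorMatrix()            # factored matrix holding the Cholesky factors

w = C.createVecLeft()
F.solveBackward(n, w)               # intended: solve L^T w = n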
>>> >>> In detail, the problem I need to solve is to generate a set of correlated random numbers (denoted by a vector, w) from an uncorrelated one (denoted by a vector n). Denote the covariance matrix of n as C (symmetric). One needs to first factorize C, C = L L^T, and then solve the linear system L^T w = n for w in parallel. Is it possible to reformulate this problem for it to be implemented using petsc4py? >>> >>> Thank you! >>> Yin >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Nov 23 16:41:33 2025 From: jed at jedbrown.org (Jed Brown) Date: Sun, 23 Nov 2025 15:41:33 -0700 Subject: [petsc-users] 41st Colorado Conference on Iterative and Multigrid Methods, June 2026 Message-ID: <87h5ukjrs2.fsf@jedbrown.org> After many successful Copper Mountain Conferences on Iterative and Multigrid Methods, we have made the difficult decision that it is time to continue the conference series in a new venue. Thus, it is with great pleasure that we announce the 41st Colorado Conference on Iterative and Multigrid Methods, to be held June 21-26, 2026 in Boulder, Colorado, on the CU Campus. We hope to continue the many wonderful traditions of Copper Mountain, with a rich technical problem, opportunities for informal discussion and work time amongst the participants, and a focus on student and early career participants. We are currently finalizing many details for the conference, and information about participation, registration, and lodging will be forthcoming at https://urldefense.us/v3/__https://coloradoconference.github.io/2026/__;!!G_uCfscf7eWS!dHswLmycwzRakxfip9Nnp9U-IWvLvaqe3za8V4FcT8p9hZZSWxi7SQhnV7CiKrk6nkktLJdS7ODcOXoUn20$ . Expect student paper competition deadlines in February, 2026, and abstract and registration deadlines in Spring, 2026. We look forward to welcoming you in Boulder! Jed Brown, University of Colorado at Boulder Rob Falgout, Lawrence Livermore National Laboratory Scott MacLachlan, Memorial University of Newfoundland Luke Olson, University of Illinois Urbana-Champaign From yin.shi1 at icloud.com Sat Nov 22 05:58:57 2025 From: yin.shi1 at icloud.com (Yin Shi) Date: Sat, 22 Nov 2025 19:58:57 +0800 Subject: [petsc-users] solveBackward in parallel In-Reply-To: <5B2C79DF-736E-4113-AF9B-D8A40B64C192@petsc.dev> References: <5B2C79DF-736E-4113-AF9B-D8A40B64C192@petsc.dev> Message-ID: <7D84E81D-60FF-4316-818C-4D8D0FCA1ACD@icloud.com> Thank you very much for your reply. Given this, when using MUMPS in parallel, I can still get the factor matrix (using getFactorMatrix method of a PC object) and use it to do matrix multiplications (e.g., using matMult method of the factor matrix), correct? I also would like to confirm whether the factor matrix returned is really triangular and multiplying it with another matrix gives the intended result. > On Nov 16, 2025, at 08:59, Barry Smith wrote: > > It appears that only MATSOLVERMKL_CPARDISO provides a parallel backward solve currently. > > The only seperation of forward and backward solves in MUMPS appears to be provided with (from its users manual) > > A special case is the one > where the forward elimination step is performed during factorization (see Subsection 3.8), instead of > during the solve phase. This allows accessing the L factors right after they have been computed, with a > better locality, and can avoid writing the L factors to disk in an out-of-core context. 
In this case (forward > > > >> On Nov 15, 2025, at 9:17?AM, Yin Shi via petsc-users wrote: >> >> Dear Developers, >> >> In short, I need to explicitly use A.solveBackward(b, x) in parallel with petsc4py, where A is a Cholesky factored matrix, but it seems that this is not supported (e.g., for mumps and superlu_dist factorization solver backend). Is it possible to work around this? >> >> In detail, the problem I need to solve is to generate a set of correlated random numbers (denoted by a vector, w) from an uncorrelated one (denoted by a vector n). Denote the covariance matrix of n as C (symmetric). One needs to first factorize C, C = L L^T, and then solve the linear system L^T w = n for w in parallel. Is it possible to reformulate this problem for it to be implemented using petsc4py? >> >> Thank you! >> Yin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.chapman at mail.utoronto.ca Wed Nov 26 13:46:35 2025 From: benjamin.chapman at mail.utoronto.ca (Benjamin Chapman) Date: Wed, 26 Nov 2025 19:46:35 +0000 Subject: [petsc-users] Issue using AOCC + AOCL to build PETSc Message-ID: Hello, I am currently trying to build PETSc using the AOCC compiler and using AOCL libraries for BLAS/LAPACK. However, I am running into an issue when it tries to download and construct the MPI library, it cannot find the ".libs/libevent.so" library in libevent (which it downloads). I attached the configure.log file. The reason I'm so confused is because PETSc built successfully when I did AOCC + MKL and gcc + AOCL, but not AOCC + AOCL together. Is it even possible to build PETSc using AOCC + AOCL or is it not designed for that? If so, is there a specific procedure I should follow? Thanks in advance. Best, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure_AOCC_AOCL_add.log Type: application/octet-stream Size: 2060079 bytes Desc: configure_AOCC_AOCL_add.log URL: From rlmackie862 at gmail.com Wed Nov 26 15:26:08 2025 From: rlmackie862 at gmail.com (Randall Mackie) Date: Wed, 26 Nov 2025 13:26:08 -0800 Subject: [petsc-users] [WARNING: UNSCANNABLE EXTRACTION FAILED]GMRES plus BlockJacobi behave differently for seemlingy identical matrices In-Reply-To: References: <877bwkms9h.fsf@mpi-hd.mpg.de> <87o6pwpdpm.fsf@mpi-hd.mpg.de> Message-ID: <574147FD-AC56-4236-B7BA-23AB03D209AC@gmail.com> Hi Pierre, Following up on this email, what are good options to use when trying PCHPDDM for the first time? For example, what did you use here that worked so well? Thanks, Randy M. > On Oct 24, 2025, at 8:14?AM, Pierre Jolivet wrote: > > >> On 24 Oct 2025, at 4:38?PM, Nils Schween wrote: >> >> Thank you very much Pierre! >> >> I was not aware of the fact that the fill-in in the ILU decides about >> its quality. But its clear now. I will just test what level of fill we >> need for our application. > > I?ll note that block Jacobi and ILU are not very efficient solvers in most instances. > I tried much fancier algebraic preconditioners such as BoomerAMG and GAMG on your problem, and they are failing hard out-of-the-box. > Without knowing much more on the problem, it?s difficult to setup. > We also have other more robust preconditioners in PETSc by means of domain decomposition methods. > deal.II is interfaced with PCBDDC (which is also somewhat difficult to tune in a fully algebraic mode) and you could also use PCHPDDM (in fully algebraic mode). 
> On this toy problem, PCHPDDM performs much better in terms of iteration than the simpler PCBJACOBI + (sub) PCILU. > Of course, as we always advise our users, it?s best to do a little bit of literature survey to find the best method for your application, I doubt it?s PCBJACOBI. > If the solver part is not a problem in your application, just carry on with what?s easiest for you. > If you want some precise help on either PCBDDC or PCHPDDM, feel free to get in touch with me in private. > > Thanks, > Pierre > > PCGAMG > Linear A_ solve did not converge due to DIVERGED_ITS iterations 1000 > Linear B_ solve did not converge due to DIVERGED_ITS iterations 1000 > PCHYPRE > Linear A_ solve did not converge due to DIVERGED_NANORINF iterations 0 > Linear B_ solve did not converge due to DIVERGED_NANORINF iterations 0 > PCHPDDM > Linear A_ solve converged due to CONVERGED_RTOL iterations 4 > Linear B_ solve converged due to CONVERGED_RTOL iterations 38 > PCBJACOBI > Linear A_ solve converged due to CONVERGED_RTOL iterations 134 > Linear B_ solve did not converge due to DIVERGED_ITS iterations 1000 > >> Once more thanks, >> Nils >> >> >> Pierre Jolivet writes: >> >>>> On 24 Oct 2025, at 1:52?PM, Nils Schween wrote: >>>> >>>> Dear PETSc users, Dear PETSc developers, >>>> >>>> in our software we are solving a linear system with PETSc using GMRES >>>> in conjunction with a BlockJacobi preconditioner, i.e. the default of >>>> the KSP object. >>>> >>>> We have two versions of the system matrix, say A and B. The difference >>>> between them is the non-zero pattern. The non-zero pattern of matrix B >>>> is a subset of the one of matrix A. Their values should be identical. >>>> >>>> We solve the linear system, using A yields a solution after some >>>> iterations, whereas using B does not converge. >>>> >>>> I created binary files of the two matrices, the right-hand side, and >>>> wrote a small PETSc programm, which loads them and demonstrates the >>>> issue. I attach the files to this email. >>>> >>>> We would like to understand why the solver-preconditioner combination >>>> works in case A and not in case B. Can you help us finding this out? >>>> >>>> To test if the two matrices are identical, I substracted them and >>>> computed the Frobenius norm of the result. It is zero. >>> >>> The default subdomain solver is ILU(0). >>> By definition, this won?t allow fill-in. >>> So when you are not storing the zeros in B, the quality of your PC is much worse. >>> You can check this yourself with -A_ksp_view -B_ksp_view: >>> [?] >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 1., needed 1. >>> Factored matrix follows: >>> Mat Object: (A_) 1 MPI process >>> type: seqaij >>> rows=1664, cols=1664 >>> package used to perform factorization: petsc >>> total: nonzeros=117760, allocated nonzeros=117760 >>> using I-node routines: found 416 nodes, limit used is 5 >>> [?] >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 1., needed 1. >>> Factored matrix follows: >>> Mat Object: (B_) 1 MPI process >>> type: seqaij >>> rows=1664, cols=1664 >>> package used to perform factorization: petsc >>> total: nonzeros=49408, allocated nonzeros=49408 >>> not using I-node routines >>> >>> Check the number of nonzeros of both factored Mat. >>> With -B_pc_factor_levels 3, you?ll get roughly similar convergence speed (and density in the factored Mat of both PC). 
>>> >>> Thanks, >>> Pierre >>> >>>> >>>> To give you more context, we solve a system of partial differential >>>> equations that models astrophysical plasmas. It is essentially a system >>>> of advection-reaction equations. We use a discontinuous Galerkin (dG) >>>> method. Our code relies on the finite element library library deal.ii >>>> and its PETSc interface. The system matrices A and B are the result of >>>> the (dG) discretisation. We GMRES with a BlockJaboci preconditioner, >>>> because we do not know any better. >>>> >>>> I tested the code I sent with PETSc 3.24.0 and 3.19.1 on my workstation, i.e. >>>> Linux home-desktop 6.17.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 12 Oct 2025 12:45:18 +0000 x86_64 GNU/Linux >>>> I use OpenMPI 5.0.8 and I compiled with mpicc, which in my cases use >>>> gcc. >>>> >>>> In case you need more information. Please let me know. >>>> Any help is appreciated. >>>> >>>> Thank you, >>>> Nils >>>> >>>> >>>> -- >>>> Nils Schween >>>> >>>> Phone: +49 6221 516 557 >>>> Mail: nils.schween at mpi-hd.mpg.de >>>> PGP-Key: 4DD3DCC0532EE96DB0C1F8B5368DBFA14CB81849 >>>> >>>> Max Planck Institute for Nuclear Physics >>>> Astrophysical Plasma Theory (APT) >>>> Saupfercheckweg 1, D-69117 Heidelberg >>>> https://urldefense.us/v3/__https://www.mpi-hd.mpg.de/mpi/en/research/scientific-divisions-and-groups/independent-research-groups/apt__;!!G_uCfscf7eWS!YoOlZjX4v-hbz0Oaawvh2Yy3nCbpHcafn1VjON06Or7f-WVrzGzD9SMcky5YJAyzVu62BIfzC5cpshSkkkpKpA$ >> >> -- >> Nils Schween >> PhD Student >> >> Phone: +49 6221 516 557 >> Mail: nils.schween at mpi-hd.mpg.de >> PGP-Key: 4DD3DCC0532EE96DB0C1F8B5368DBFA14CB81849 >> >> Max Planck Institute for Nuclear Physics >> Astrophysical Plasma Theory (APT) >> Saupfercheckweg 1, D-69117 Heidelberg >> https://urldefense.us/v3/__https://www.mpi-hd.mpg.de/mpi/en/research/scientific-divisions-and-groups/independent-research-groups/apt__;!!G_uCfscf7eWS!YoOlZjX4v-hbz0Oaawvh2Yy3nCbpHcafn1VjON06Or7f-WVrzGzD9SMcky5YJAyzVu62BIfzC5cpshSkkkpKpA$ > From bsmith at petsc.dev Wed Nov 26 20:13:20 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 26 Nov 2025 21:13:20 -0500 Subject: [petsc-users] Issue using AOCC + AOCL to build PETSc In-Reply-To: References: Message-ID: Probably not an issue at all but why does --with-blaslapack-dir=/home/bchapman/aocl/5.1.0/gcc end with gcc? > On Nov 26, 2025, at 2:46?PM, Benjamin Chapman via petsc-users wrote: > > Hello, > > I am currently trying to build PETSc using the AOCC compiler and using AOCL libraries for BLAS/LAPACK. However, I am running into an issue when it tries to download and construct the MPI library, it cannot find the ?.libs/libevent.so ? library in libevent (which it downloads). I attached the configure.log file. > > The reason I?m so confused is because PETSc built successfully when I did AOCC + MKL and gcc + AOCL, but not AOCC + AOCL together. > > Is it even possible to build PETSc using AOCC + AOCL or is it not designed for that? If so, is there a specific procedure I should follow? There is no specific reason with PETSc that this should not work. The failure takes place in building OpenMPI which should have nothing to do with aocl. What you set for --with-blaslapack-dir should have no effect on the building of OpenMPI. OpenMPI builds libevent as part of its build process and then uses it. I cannot see a failure in building libevent just that it cannot find it later while working on other parts of OpenMPI. 
Can you delete your PETSC_ARCH directory completely and rerun the ./configure Don't use --with-fortran-kernels=1 it is pretty worthless. Don't use --with-threadsafety unless you know specifically that you need it. If the rerun of ./configure fails you can try a trick. Run without the --with-blaslapack-dir (and let it use the default it finds) then immediately run ./configure again this time with the --with-blaslapack-dir option (it will reuse the OpenMPI that it has already just built and won't rebuild it). > > Thanks in advance. > > Best, > Ben > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Nov 26 23:26:42 2025 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 27 Nov 2025 06:26:42 +0100 Subject: [petsc-users] [WARNING: UNSCANNABLE EXTRACTION FAILED]GMRES plus BlockJacobi behave differently for seemlingy identical matrices In-Reply-To: <574147FD-AC56-4236-B7BA-23AB03D209AC@gmail.com> References: <877bwkms9h.fsf@mpi-hd.mpg.de> <87o6pwpdpm.fsf@mpi-hd.mpg.de> <574147FD-AC56-4236-B7BA-23AB03D209AC@gmail.com> Message-ID: <90A737CA-1392-4CE8-AED8-7677683CBA6F@joliv.et> > On 26 Nov 2025, at 10:26?PM, Randall Mackie wrote: > > Hi Pierre, > > Following up on this email, what are good options to use when trying PCHPDDM for the first time? There is no single answer to this question, especially when solving systems in a purely algebraic manner. Some examples can be found in the repository: $ git grep " hpddm" "*/ex*" > For example, what did you use here that worked so well? What worked well here may totally fail for other problems, so unless you are solving the same problem, then the following may not work, but here is what I get with -options_view. Thanks, Pierre Linear A_ solve converged due to CONVERGED_RTOL iterations 5 Linear B_ solve converged due to CONVERGED_RTOL iterations 39 #PETSc Option Table entries: -A_ksp_converged_reason # (source: command line) -A_ksp_type fgmres # (source: command line) -A_pc_hpddm_coarse_mat_type baij # (source: command line) -A_pc_hpddm_harmonic_overlap 2 # (source: command line) -A_pc_hpddm_levels_1_sub_pc_type lu # (source: command line) -A_pc_hpddm_levels_1_svd_nsv 120 # (source: command line) -A_pc_type hpddm # (source: command line) -B_ksp_converged_reason # (source: command line) -B_ksp_type fgmres # (source: command line) -B_pc_hpddm_coarse_mat_type baij # (source: command line) -B_pc_hpddm_harmonic_overlap 2 # (source: command line) -B_pc_hpddm_levels_1_sub_pc_type lu # (source: command line) -B_pc_hpddm_levels_1_svd_nsv 120 # (source: command line) -B_pc_type hpddm # (source: command line) -matload_block_size 1 # (source: file) -options_view # (source: command line) -vecload_block_size 1 # (source: file) #End of PETSc Option Table entries > Thanks, > > Randy M. > > > >> On Oct 24, 2025, at 8:14?AM, Pierre Jolivet wrote: >> >> >>> On 24 Oct 2025, at 4:38?PM, Nils Schween wrote: >>> >>> Thank you very much Pierre! >>> >>> I was not aware of the fact that the fill-in in the ILU decides about >>> its quality. But its clear now. I will just test what level of fill we >>> need for our application. >> >> I?ll note that block Jacobi and ILU are not very efficient solvers in most instances. >> I tried much fancier algebraic preconditioners such as BoomerAMG and GAMG on your problem, and they are failing hard out-of-the-box. >> Without knowing much more on the problem, it?s difficult to setup. >> We also have other more robust preconditioners in PETSc by means of domain decomposition methods. 
>> deal.II is interfaced with PCBDDC (which is also somewhat difficult to tune in a fully algebraic mode) and you could also use PCHPDDM (in fully algebraic mode). >> On this toy problem, PCHPDDM performs much better in terms of iteration than the simpler PCBJACOBI + (sub) PCILU. >> Of course, as we always advise our users, it?s best to do a little bit of literature survey to find the best method for your application, I doubt it?s PCBJACOBI. >> If the solver part is not a problem in your application, just carry on with what?s easiest for you. >> If you want some precise help on either PCBDDC or PCHPDDM, feel free to get in touch with me in private. >> >> Thanks, >> Pierre >> >> PCGAMG >> Linear A_ solve did not converge due to DIVERGED_ITS iterations 1000 >> Linear B_ solve did not converge due to DIVERGED_ITS iterations 1000 >> PCHYPRE >> Linear A_ solve did not converge due to DIVERGED_NANORINF iterations 0 >> Linear B_ solve did not converge due to DIVERGED_NANORINF iterations 0 >> PCHPDDM >> Linear A_ solve converged due to CONVERGED_RTOL iterations 4 >> Linear B_ solve converged due to CONVERGED_RTOL iterations 38 >> PCBJACOBI >> Linear A_ solve converged due to CONVERGED_RTOL iterations 134 >> Linear B_ solve did not converge due to DIVERGED_ITS iterations 1000 >> >>> Once more thanks, >>> Nils >>> >>> >>> Pierre Jolivet writes: >>> >>>>> On 24 Oct 2025, at 1:52?PM, Nils Schween wrote: >>>>> >>>>> Dear PETSc users, Dear PETSc developers, >>>>> >>>>> in our software we are solving a linear system with PETSc using GMRES >>>>> in conjunction with a BlockJacobi preconditioner, i.e. the default of >>>>> the KSP object. >>>>> >>>>> We have two versions of the system matrix, say A and B. The difference >>>>> between them is the non-zero pattern. The non-zero pattern of matrix B >>>>> is a subset of the one of matrix A. Their values should be identical. >>>>> >>>>> We solve the linear system, using A yields a solution after some >>>>> iterations, whereas using B does not converge. >>>>> >>>>> I created binary files of the two matrices, the right-hand side, and >>>>> wrote a small PETSc programm, which loads them and demonstrates the >>>>> issue. I attach the files to this email. >>>>> >>>>> We would like to understand why the solver-preconditioner combination >>>>> works in case A and not in case B. Can you help us finding this out? >>>>> >>>>> To test if the two matrices are identical, I substracted them and >>>>> computed the Frobenius norm of the result. It is zero. >>>> >>>> The default subdomain solver is ILU(0). >>>> By definition, this won?t allow fill-in. >>>> So when you are not storing the zeros in B, the quality of your PC is much worse. >>>> You can check this yourself with -A_ksp_view -B_ksp_view: >>>> [?] >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. >>>> Factored matrix follows: >>>> Mat Object: (A_) 1 MPI process >>>> type: seqaij >>>> rows=1664, cols=1664 >>>> package used to perform factorization: petsc >>>> total: nonzeros=117760, allocated nonzeros=117760 >>>> using I-node routines: found 416 nodes, limit used is 5 >>>> [?] >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. 
>>>> Factored matrix follows: >>>> Mat Object: (B_) 1 MPI process >>>> type: seqaij >>>> rows=1664, cols=1664 >>>> package used to perform factorization: petsc >>>> total: nonzeros=49408, allocated nonzeros=49408 >>>> not using I-node routines >>>> >>>> Check the number of nonzeros of both factored Mat. >>>> With -B_pc_factor_levels 3, you'll get roughly similar convergence speed (and density in the factored Mat of both PC). >>>> >>>> Thanks, >>>> Pierre >>>> >>>>> >>>>> To give you more context, we solve a system of partial differential >>>>> equations that models astrophysical plasmas. It is essentially a system >>>>> of advection-reaction equations. We use a discontinuous Galerkin (dG) >>>>> method. Our code relies on the finite element library deal.ii >>>>> and its PETSc interface. The system matrices A and B are the result of >>>>> the (dG) discretisation. We use GMRES with a BlockJacobi preconditioner, >>>>> because we do not know any better. >>>>> >>>>> I tested the code I sent with PETSc 3.24.0 and 3.19.1 on my workstation, i.e. >>>>> Linux home-desktop 6.17.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 12 Oct 2025 12:45:18 +0000 x86_64 GNU/Linux >>>>> I use OpenMPI 5.0.8 and I compiled with mpicc, which in my case uses >>>>> gcc. >>>>> >>>>> In case you need more information, please let me know. >>>>> Any help is appreciated. >>>>> >>>>> Thank you, >>>>> Nils >>>>> >>>>> >>>>> -- >>>>> Nils Schween >>>>> >>>>> Phone: +49 6221 516 557 >>>>> Mail: nils.schween at mpi-hd.mpg.de >>>>> PGP-Key: 4DD3DCC0532EE96DB0C1F8B5368DBFA14CB81849 >>>>> >>>>> Max Planck Institute for Nuclear Physics >>>>> Astrophysical Plasma Theory (APT) >>>>> Saupfercheckweg 1, D-69117 Heidelberg >>>>> https://urldefense.us/v3/__https://www.mpi-hd.mpg.de/mpi/en/research/scientific-divisions-and-groups/independent-research-groups/apt__;!!G_uCfscf7eWS!YoOlZjX4v-hbz0Oaawvh2Yy3nCbpHcafn1VjON06Or7f-WVrzGzD9SMcky5YJAyzVu62BIfzC5cpshSkkkpKpA$ >>> >>> -- >>> Nils Schween >>> PhD Student >>> >>> Phone: +49 6221 516 557 >>> Mail: nils.schween at mpi-hd.mpg.de >>> PGP-Key: 4DD3DCC0532EE96DB0C1F8B5368DBFA14CB81849 >>> >>> Max Planck Institute for Nuclear Physics >>> Astrophysical Plasma Theory (APT) >>> Saupfercheckweg 1, D-69117 Heidelberg >>> https://urldefense.us/v3/__https://www.mpi-hd.mpg.de/mpi/en/research/scientific-divisions-and-groups/independent-research-groups/apt__;!!G_uCfscf7eWS!YoOlZjX4v-hbz0Oaawvh2Yy3nCbpHcafn1VjON06Or7f-WVrzGzD9SMcky5YJAyzVu62BIfzC5cpshSkkkpKpA$ >> > From C.J.Berends at uu.nl Wed Nov 26 02:31:06 2025 From: C.J.Berends at uu.nl (Berends, C.J. (Tijn)) Date: Wed, 26 Nov 2025 08:31:06 +0000 Subject: [petsc-users] Bug report: petscds fortran module missing? Message-ID: Dear petsc folks, I am trying to set up a very basic example program in Fortran, using Petsc to solve a simple Poisson problem (d2u/dx2 = f). However, I am running into problems when I try to call PetscDSSetResidual to set the function pointers for the residual. According to the documentation, this function is part of the petscds module, but while the header file (petscds.h) exists in my Petsc include directory, the Fortran module (petscds.mod) does not. Therefore, my code won't compile. I have currently got Petsc installed via Homebrew. I have tried cloning the Petsc git repository and building that, using the configuration option --with-fortran-interfaces=1, but still the petscds.mod file is not there. Am I doing something wrong, or can it be that this particular Fortran interface is just missing?
If you need any additional information from me, please let me know. Kind regards, Tijn Berends dr. C. J. (Tijn) Berends Post-doc (palaeo)glaciology Institute for Marine and Atmospheric research Utrecht (IMAU), Utrecht University, The Netherlands Buys Ballot Building (BBG), Room 6.67 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Nov 27 09:49:33 2025 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 27 Nov 2025 15:49:33 +0000 Subject: [petsc-users] Bug report: petscds fortran module missing? In-Reply-To: References: Message-ID: <09F62CDD-EE2F-47DF-B041-5DB4FF1C7D78@dsic.upv.es> PetscDS is part of DM, so you have to "use petscdm". The example src/dm/impls/plex/tutorials/ex4f90.F90 uses PetscDS. The configure option --with-fortran-interfaces=1 is deprecated, it is no longer needed. Jose > El 26 nov 2025, a las 9:31, Berends, C.J. (Tijn) via petsc-users escribió: > > Dear petsc folks, > > I am trying to set up a very basic example program in Fortran, using Petsc to solve a simple Poisson problem (d2u/dx2 = f). > > However, I am running into problems when I try to call PetscDSSetResidual to set the function pointers for the residual. According to the documentation, this function is part of the petscds module, but while the header file (petscds.h) exists in my Petsc include directory, the Fortran module (petscds.mod) does not. Therefore, my code won't compile. > > I have currently got Petsc installed via Homebrew. I have tried cloning the Petsc git repository and building that, using the configuration option --with-fortran-interfaces=1, but still the petscds.mod file is not there. > > Am I doing something wrong, or can it be that this particular Fortran interface is just missing? > > If you need any additional information from me, please let me know. > > Kind regards, > Tijn Berends > > > > dr. C. J. (Tijn) Berends > Post-doc (palaeo)glaciology > Institute for Marine and Atmospheric research Utrecht (IMAU), Utrecht University, The Netherlands > Buys Ballot Building (BBG), Room 6.67