From lidia.varsh at mail.ioffe.ru Wed Jun 1 12:37:57 2022
From: lidia.varsh at mail.ioffe.ru (Lidia)
Date: Wed, 1 Jun 2022 20:37:57 +0300
Subject: [petsc-users] Sparse linear system solving
In-Reply-To: 
References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru>
Message-ID: <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru>

Dear Matt,

Thank you for the rule of 10,000 variables per process!

We have run ex5 with a 1e4 x 1e4 matrix on our cluster and obtained good performance scaling (see the figure "performance.png": solve time in seconds versus the number of cores). We used the GAMG preconditioner (multithreaded: we added the option "-pc_gamg_use_parallel_coarse_grid_solver") and the GMRES solver, and we assigned one OpenMP thread to every MPI process. ex5 now works well on many MPI processes, but the run uses about 100 GB of RAM.

How can we run ex5 with many OpenMP threads and no MPI? If we just change the run command, the cores are not loaded properly: usually only one core runs at 100% while the others are idle; sometimes all cores run at 100% for about a second and then go idle again for about 30 seconds. Can the preconditioner use many threads, and how do we enable that? The solve time (the time spent in the solver) is now 511 seconds with 60 OpenMP threads, versus 13.19 seconds with 60 MPI processes. The ksp_monitor outputs for both cases (many OpenMP threads and many MPI processes) are attached.

Thank you!

Best,
Lidia

On 31.05.2022 15:21, Matthew Knepley wrote:
> I have looked at the local logs. First, you have run problems of size 12 and 24. As a rule of thumb, you need 10,000 variables per process in order to see good speedup.
>
>   Thanks,
>
>      Matt
>
> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley wrote:
>
> On Tue, May 31, 2022 at 7:39 AM Lidia wrote:
>
> Matt, Mark, thank you very much for your answers!
>
> Now we have run example 5 on our computer cluster and on the local server and still have not seen any performance increase, but for some unclear reason the running times on the local server are much better than on the cluster.
>
> I suspect that you are trying to get speedup without increasing the memory bandwidth:
>
> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>
>   Thanks,
>
>      Matt
>
> Now we will try to run the PETSc example 5 inside a Docker container on our server and see whether the problem is in our environment. I'll send you the results of this test as soon as we get them.
>
> The ksp_monitor outputs for the 5th test with the current local server configuration (for 2 and 4 MPI processes) and for the cluster (for 1 and 3 MPI processes) are attached.
>
> And one more question. Potentially we can use 10 nodes with 96 threads each on our cluster. Which combination of MPI processes and OpenMP threads do you think would be best for example 5?
>
> Thank you!
>
> Best,
> Lidiia
>
> On 31.05.2022 05:42, Mark Adams wrote:
>> And if you see "NO" change in performance, I suspect the solver/matrix is all on one processor.
>> (PETSc does not use threads by default, so threads should not change anything.)
>>
>> As Matt said, it is best to start with a PETSc example that does something like what you want (a parallel linear solve, see src/ksp/ksp/tutorials for examples), and then add your code to it.
>> That way you get the basic infrastructure in place for you, which is pretty obscure to the uninitiated.
>>
>> Mark
>>
>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley wrote:
>>
>> On Mon, May 30, 2022 at 10:12 PM Lidia wrote:
>>
>> Dear colleagues,
>>
>> Is there anyone here who has solved big sparse linear matrices using PETSc?
>>
>> There are lots of publications with this kind of data. Here is one recent one: https://arxiv.org/abs/2204.01722
>>
>> We have found NO performance improvement while using more and more MPI processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone faced this problem? Does anyone know any possible reasons for such behaviour?
>>
>> Solver behavior is dependent on the input matrix. The only general-purpose solvers are direct, but they do not scale linearly and have high memory requirements.
>>
>> Thus, in order to make progress you will have to be specific about your matrices.
>>
>> We use the AMG preconditioner and the GMRES solver from the KSP package, as our matrix is large (from 100 000 to 1e+6 rows and columns), sparse, non-symmetric, and includes both positive and negative values. But the performance problems also exist when using CG solvers with symmetric matrices.
>>
>> There are many PETSc examples, such as example 5 for the Laplacian, that exhibit good scaling with both AMG and GMG.
>>
>> Could anyone help us to set appropriate options for the preconditioner and solver? Now we use default parameters; maybe they are not the best, but we do not know a good combination. Or maybe you could suggest other preconditioner+solver pairs for such tasks?
>>
>> I can provide more information: the matrices that we solve, the C++ code that runs the solve using PETSc, and any statistics obtained from our runs.
>>
>> First, please provide a description of the linear system, and the output of
>>
>>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>
>> for each test case.
>>
>>   Thanks,
>>
>>      Matt
>>
>> Thank you in advance!
>>
>> Best regards,
>> Lidiia Varshavchik,
>> Ioffe Institute, St. Petersburg, Russia
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: performance.png Type: image/png Size: 16053 bytes Desc: not available URL: -------------- next part -------------- [lida at head1 tutorials]$ ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres -pc_gamg_use_parallel_coarse_grid_solver -------------------------------------------------------------------------- WARNING: No preset parameters were found for the device that Open MPI detected: Local host: head1 Device name: i40iw0 Device vendor ID: 0x8086 Device vendor part ID: 14290 Default device parameters will be used, which may result in lower performance. You can edit any of the files specified by the btl_openib_device_param_files MCA parameter to set values for your device. NOTE: You can turn off this warning by setting the MCA parameter btl_openib_warn_no_device_params_found to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- No OpenFabrics connection schemes reported that they were able to be used on a specific port. As such, the openib BTL (OpenFabrics support) will be disabled for this port. Local host: head1 Local device: i40iw0 Local port: 1 CPCs attempted: rdmacm, udcm -------------------------------------------------------------------------- [head1.hpc:274354] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found [head1.hpc:274354] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [head1.hpc:274354] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port 0 KSP Residual norm 2.037538277184e+11 0 KSP preconditioned resid norm 2.037538277184e+11 true resid norm 1.291188079508e+10 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 4.559847344082e+10 1 KSP preconditioned resid norm 4.559847344082e+10 true resid norm 1.145337105566e+10 ||r(i)||/||b|| 8.870412635802e-01 2 KSP Residual norm 1.458580410483e+10 2 KSP preconditioned resid norm 1.458580410483e+10 true resid norm 6.820359295573e+09 ||r(i)||/||b|| 5.282235333346e-01 3 KSP Residual norm 5.133668905377e+09 3 KSP preconditioned resid norm 5.133668905377e+09 true resid norm 3.443273018496e+09 ||r(i)||/||b|| 2.666747837238e-01 4 KSP Residual norm 1.822791754681e+09 4 KSP preconditioned resid norm 1.822791754681e+09 true resid norm 1.429794150530e+09 ||r(i)||/||b|| 1.107347700325e-01 5 KSP Residual norm 6.883552291389e+08 5 KSP preconditioned resid norm 6.883552291389e+08 true resid norm 5.284618300965e+08 ||r(i)||/||b|| 4.092833867378e-02 6 KSP Residual norm 2.738661252083e+08 6 KSP preconditioned resid norm 2.738661252083e+08 true resid norm 2.298184687591e+08 ||r(i)||/||b|| 1.779899244785e-02 7 KSP Residual norm 1.175295112233e+08 7 KSP preconditioned resid norm 1.175295112233e+08 true resid norm 9.785469137958e+07 ||r(i)||/||b|| 7.578655110947e-03 8 KSP Residual norm 4.823372166305e+07 8 KSP preconditioned resid norm 4.823372166305e+07 true resid norm 4.288291058318e+07 ||r(i)||/||b|| 3.321197838159e-03 9 KSP Residual norm 2.019815757215e+07 9 KSP preconditioned resid norm 2.019815757215e+07 true resid norm 1.776678838786e+07 ||r(i)||/||b|| 1.376003129973e-03 10 KSP Residual norm 8.776441360510e+06 10 KSP preconditioned resid norm 8.776441360510e+06 true resid norm 7.333797620917e+06 ||r(i)||/||b|| 5.679883308489e-04 11 KSP Residual norm 3.536170852140e+06 11 KSP preconditioned resid norm 3.536170852140e+06 true resid norm 3.517014965376e+06 ||r(i)||/||b|| 
2.723859537734e-04 12 KSP Residual norm 1.369320429479e+06 12 KSP preconditioned resid norm 1.369320429479e+06 true resid norm 1.434993628816e+06 ||r(i)||/||b|| 1.111374594910e-04 Linear solve converged due to CONVERGED_RTOL iterations 12 time 511.480000 m=10000 n=10000 Norm of error 4.8462e+06, Iterations 12 0 KSP Residual norm 6.828607739124e+09 0 KSP preconditioned resid norm 6.828607739124e+09 true resid norm 2.081798084592e+10 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.592108138342e+08 1 KSP preconditioned resid norm 1.592108138342e+08 true resid norm 1.085557726631e+09 ||r(i)||/||b|| 5.214519768589e-02 2 KSP Residual norm 4.713015543535e+06 2 KSP preconditioned resid norm 4.713015543535e+06 true resid norm 2.310928708753e+07 ||r(i)||/||b|| 1.110063807752e-03 3 KSP Residual norm 3.998043547851e+05 3 KSP preconditioned resid norm 3.998043547851e+05 true resid norm 2.247029256835e+06 ||r(i)||/||b|| 1.079369451565e-04 4 KSP Residual norm 3.507419330164e+04 4 KSP preconditioned resid norm 3.507419330164e+04 true resid norm 2.008185753840e+05 ||r(i)||/||b|| 9.646400237870e-06 Linear solve converged due to CONVERGED_RTOL iterations 4 Norm of error 5.77295e+11, Iterations 4 **************************************** *********************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------- ./ex5 on a named head1.hpc with 1 processor, by lida Wed Jun 1 20:35:41 2022 Using Petsc Release Version 3.17.1, unknown Max Max/Min Avg Total Time (sec): 1.065e+03 1.000 1.065e+03 Objects: 7.090e+02 1.000 7.090e+02 Flops: 3.476e+11 1.000 3.476e+11 3.476e+11 Flops/sec: 3.263e+08 1.000 3.263e+08 3.263e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 3.4957e-04 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 1: Original Solve: 8.2717e+02 77.7% 2.5959e+11 74.7% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 2: Second Solve: 2.3804e+02 22.3% 8.8003e+10 25.3% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage --- Event Stage 1: Original Solve MatMult 460 1.0 1.5530e+02 1.0 1.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 15 30 0 0 0 19 40 0 0 0 676 MatMultAdd 91 1.0 1.7280e+01 1.0 8.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 3 0 0 0 463 MatMultTranspose 91 1.0 2.3679e+01 1.0 8.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 3 3 0 0 0 338 MatSolve 13 1.0 6.4134e-05 1.0 5.85e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 MatLUFactorSym 1 1.0 2.6981e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 1.5310e-05 1.0 7.30e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5 MatConvert 7 1.0 1.2939e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 MatScale 21 1.0 3.5806e+00 1.0 2.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 572 MatResidual 91 1.0 2.5775e+01 1.0 1.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 5 0 0 0 3 7 0 0 0 723 MatAssemblyBegin 36 1.0 1.7278e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 36 1.0 2.1752e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0 MatGetRowIJ 1 1.0 5.9204e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 5.1203e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 7 1.0 3.5076e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0 MatAXPY 7 1.0 8.9612e+00 1.0 1.17e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 13 MatMatMultSym 7 1.0 1.7313e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatMatMultNum 7 1.0 1.0714e+01 1.0 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 163 MatPtAPSymbolic 7 1.0 5.7167e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0 MatPtAPNumeric 7 1.0 5.7185e+01 1.0 7.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 2 0 0 0 7 3 0 0 0 136 MatTrnMatMultSym 1 1.0 3.3972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0 MatGetSymTrans 7 1.0 3.8600e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 82 1.0 2.0620e+01 1.0 2.84e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 11 0 0 0 1378 VecNorm 105 1.0 1.4910e+00 1.0 8.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 3 0 0 0 5476 VecScale 90 1.0 8.9402e-01 1.0 2.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2888 VecCopy 294 1.0 1.4368e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 VecSet 236 1.0 7.5047e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 21 1.0 9.9549e-01 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3047 VecAYPX 559 1.0 1.8945e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 5 0 0 0 709 VecAXPBYCZ 182 1.0 7.3185e+00 1.0 1.52e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 6 0 0 0 2071 VecMAXPY 102 1.0 2.6425e+01 1.0 4.88e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 14 0 0 0 3 19 0 0 0 1845 
VecAssemblyBegin 1 1.0 5.1595e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 1.3132e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 441 1.0 2.1981e+01 1.0 7.34e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 3 3 0 0 0 334 VecNormalize 90 1.0 1.9910e+00 1.0 7.75e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 3 0 0 0 3891 KSPSetUp 16 1.0 7.4611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSolve 1 1.0 2.9415e+02 1.0 2.00e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 58 0 0 0 36 77 0 0 0 680 KSPGMRESOrthog 82 1.0 3.5705e+01 1.0 5.68e+10 1.0 0.0e+00 0.0e+00 0.0e+00 3 16 0 0 0 4 22 0 0 0 1592 PCGAMGGraph_AGG 7 1.0 7.8044e+01 1.0 1.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 9 1 0 0 0 18 PCGAMGCoarse_AGG 7 1.0 1.2952e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 16 0 0 0 0 0 PCGAMGProl_AGG 7 1.0 5.1550e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 6 0 0 0 0 0 PCGAMGPOpt_AGG 7 1.0 1.0284e+02 1.0 4.90e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 14 0 0 0 12 19 0 0 0 476 GAMG: createProl 7 1.0 3.6476e+02 1.0 5.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 34 15 0 0 0 44 19 0 0 0 138 Create Graph 7 1.0 1.2939e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 Filter Graph 7 1.0 6.4028e+01 1.0 1.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 8 1 0 0 0 22 MIS/Agg 7 1.0 3.5146e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0 SA: col data 7 1.0 7.9635e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: frmProl0 7 1.0 4.7866e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 6 0 0 0 0 0 SA: smooth 7 1.0 4.2317e+01 1.0 2.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 5 1 0 0 0 58 GAMG: partLevel 7 1.0 1.1435e+02 1.0 7.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 2 0 0 0 14 3 0 0 0 68 PCGAMG Squ l00 1 1.0 3.3972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0 PCGAMG Gal l00 1 1.0 6.8174e+01 1.0 4.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 1 0 0 0 8 2 0 0 0 65 PCGAMG Opt l00 1 1.0 2.1367e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 58 PCGAMG Gal l01 1 1.0 3.6124e+01 1.0 2.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 4 1 0 0 0 64 PCGAMG Opt l01 1 1.0 5.1169e+00 1.0 3.61e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 70 PCGAMG Gal l02 1 1.0 9.2202e+00 1.0 8.86e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 96 PCGAMG Opt l02 1 1.0 1.3495e+00 1.0 1.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 93 PCGAMG Gal l03 1 1.0 7.2320e-01 1.0 1.13e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 157 PCGAMG Opt l03 1 1.0 1.7474e-01 1.0 1.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 PCGAMG Gal l04 1 1.0 1.0741e-01 1.0 8.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 79 PCGAMG Opt l04 1 1.0 1.9007e-02 1.0 1.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 62 PCGAMG Gal l05 1 1.0 3.0491e-03 1.0 4.43e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 145 PCGAMG Opt l05 1 1.0 8.0217e-04 1.0 6.95e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 PCGAMG Gal l06 1 1.0 1.2688e-04 1.0 1.27e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 100 PCGAMG Opt l06 1 1.0 9.1779e-05 1.0 2.96e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 32 PCSetUp 1 1.0 4.8375e+02 1.0 5.82e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 17 0 0 0 58 22 0 0 0 120 PCApply 13 1.0 1.9865e+02 1.0 1.18e+11 1.0 0.0e+00 0.0e+00 0.0e+00 19 34 0 0 0 24 45 0 0 0 593 --- Event Stage 2: Second Solve MatMult 160 1.0 5.5173e+01 1.0 3.60e+10 1.0 0.0e+00 0.0e+00 0.0e+00 
5 10 0 0 0 23 41 0 0 0 652 MatMultAdd 25 1.0 1.4079e+00 1.0 5.61e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatMultTranspose 25 1.0 1.1887e-02 1.0 5.61e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 47 MatSolve 5 1.0 3.1421e-05 1.0 1.94e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 62 MatLUFactorSym 1 1.0 4.1040e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 2.0536e-05 1.0 8.30e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40 MatConvert 5 1.0 9.8007e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 MatScale 15 1.0 2.0833e+00 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 480 MatResidual 25 1.0 7.8182e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 3 6 0 0 0 640 MatAssemblyBegin 26 1.0 4.7507e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 26 1.0 6.1941e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0 MatGetRowIJ 1 1.0 3.5968e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 3.4700e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 5 1.0 1.7509e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0 MatZeroEntries 1 1.0 4.9738e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 5 1.0 3.0948e+00 1.0 2.54e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatMatMultSym 5 1.0 4.4847e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 MatMatMultNum 5 1.0 3.2775e+00 1.0 2.86e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatPtAPSymbolic 5 1.0 1.2886e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatPtAPNumeric 5 1.0 2.4792e+00 1.0 8.02e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatTrnMatMultSym 1 1.0 3.0547e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 MatGetSymTrans 5 1.0 1.3340e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 54 1.0 9.7819e+00 1.0 1.30e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 4 15 0 0 0 1329 VecNorm 67 1.0 5.1862e-01 1.0 4.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 5 0 0 0 8870 VecScale 60 1.0 4.1836e-01 1.0 1.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 3825 VecCopy 86 1.0 5.7537e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 VecSet 76 1.0 3.2515e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecAXPY 11 1.0 6.2773e-01 1.0 1.40e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 2230 VecAYPX 155 1.0 7.3351e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 3 5 0 0 0 614 VecAXPBYCZ 50 1.0 2.7304e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 6 0 0 0 1831 VecMAXPY 64 1.0 1.0069e+01 1.0 1.78e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 4 20 0 0 0 1768 VecPointwiseMult 155 1.0 9.0886e+00 1.0 3.10e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 4 4 0 0 0 341 VecNormalize 60 1.0 7.8309e-01 1.0 4.80e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 5 0 0 0 6130 KSPSetUp 12 1.0 3.8298e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 KSPSolve 1 1.0 8.0642e+01 1.0 4.81e+10 1.0 0.0e+00 0.0e+00 0.0e+00 8 14 0 0 0 34 55 0 0 0 596 KSPGMRESOrthog 54 1.0 1.7151e+01 1.0 2.60e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 7 0 0 0 7 30 0 0 0 1516 PCGAMGGraph_AGG 5 1.0 2.7212e+01 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 11 1 0 0 0 37 PCGAMGCoarse_AGG 5 1.0 5.3094e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 22 0 0 0 0 0 PCGAMGProl_AGG 5 1.0 6.3314e+00 1.0 0.00e+00 
0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0 PCGAMGPOpt_AGG 5 1.0 5.8035e+01 1.0 3.76e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 11 0 0 0 24 43 0 0 0 648 GAMG: createProl 5 1.0 1.4493e+02 1.0 3.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 61 44 0 0 0 266 Create Graph 5 1.0 9.8007e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 Filter Graph 5 1.0 1.6596e+01 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 1 0 0 0 60 MIS/Agg 5 1.0 1.7585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0 SA: col data 5 1.0 6.9388e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SA: frmProl0 5 1.0 2.9984e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 SA: smooth 5 1.0 1.3478e+01 1.0 4.24e+05 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 6 0 0 0 0 0 GAMG: partLevel 5 1.0 3.7679e+00 1.0 8.02e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 PCGAMG Squ l00 1 1.0 3.0548e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 PCGAMG Gal l00 1 1.0 3.7665e+00 1.0 6.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 PCGAMG Opt l00 1 1.0 7.7613e+00 1.0 2.25e+05 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0 PCGAMG Gal l01 1 1.0 7.6855e-04 1.0 1.19e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 155 PCGAMG Opt l01 1 1.0 6.0377e-04 1.0 4.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 71 PCGAMG Gal l02 1 1.0 3.4782e-04 1.0 3.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 PCGAMG Opt l02 1 1.0 2.2165e-04 1.0 1.33e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 60 PCGAMG Gal l03 1 1.0 1.3189e-04 1.0 1.08e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 82 PCGAMG Opt l03 1 1.0 9.0454e-05 1.0 3.83e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42 PCGAMG Gal l04 1 1.0 6.2573e-05 1.0 3.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53 PCGAMG Opt l04 1 1.0 4.9897e-05 1.0 1.14e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23 PCSetUp 1 1.0 1.5265e+02 1.0 3.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 64 44 0 0 0 253 PCApply 5 1.0 4.8881e+01 1.0 2.90e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 21 33 0 0 0 593 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 1 896 0. --- Event Stage 1: Original Solve Container 7 0 0 0. Matrix 39 23 37853744688 0. Matrix Coarsen 7 7 4704 0. Vector 249 182 42116718600 0. Krylov Solver 16 7 217000 0. Preconditioner 16 7 6496 0. Viewer 1 0 0 0. PetscRandom 7 7 4970 0. Index Set 12 9 8584 0. Distributed Mesh 15 7 35896 0. Star Forest Graph 30 14 16464 0. Discrete System 15 7 7168 0. Weak Form 15 7 4648 0. --- Event Stage 2: Second Solve Container 5 12 7488 0. Matrix 28 44 42665589712 0. Matrix Coarsen 5 5 3360 0. Vector 156 223 48929583824 0. Krylov Solver 11 20 200246 0. Preconditioner 11 20 23064 0. PetscRandom 5 5 3550 0. Index Set 8 11 11264 0. Distributed Mesh 10 18 92304 0. Star Forest Graph 20 36 42336 0. Discrete System 10 18 18432 0. Weak Form 10 18 11952 0. 
======================================================================================================================== Average time to get PetscTime(): 3.03611e-08 #PETSc Option Table entries: -ksp_converged_reason -ksp_monitor -ksp_monitor_true_residual -ksp_type gmres -log_view -m 10000 -pc_gamg_use_parallel_coarse_grid_solver -pc_type gamg #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make ----------------------------------------- Libraries compiled on 2022-05-25 10:03:14 on head1.hpc Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core Using PETSc directory: /home/lida Using PETSc arch: ----------------------------------------- Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 ----------------------------------------- Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include ----------------------------------------- Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl ----------------------------------------- [lida at head1 tutorials]$ -------------- next part -------------- [lida at head1 tutorials]$ export OMP_NUM_THREADS=1 [lida at head1 tutorials]$ mpirun -n 60 ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres 
-pc_gamg_use_parallel_coarse_grid_solver -------------------------------------------------------------------------- There are not enough slots available in the system to satisfy the 60 slots that were requested by the application: ./ex5 Either request fewer slots for your application, or make more slots available for use. A "slot" is the Open MPI term for an allocatable unit where we can launch a process. The number of slots available are defined by the environment in which Open MPI processes are run: 1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided) 2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided) 3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.) 4. If none of a hostfile, the --host command line parameter, or an RM is present, Open MPI defaults to the number of processor cores In all the above cases, if you want Open MPI to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option. Alternatively, you can use the --oversubscribe option to ignore the number of available slots when deciding the number of processes to launch. -------------------------------------------------------------------------- [lida at head1 tutorials]$ mpirun --oversubscribe -n 60 ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres -pc_gamg_use_parallel_coarse_grid_solver -------------------------------------------------------------------------- WARNING: No preset parameters were found for the device that Open MPI detected: Local host: head1 Device name: i40iw0 Device vendor ID: 0x8086 Device vendor part ID: 14290 Default device parameters will be used, which may result in lower performance. You can edit any of the files specified by the btl_openib_device_param_files MCA parameter to set values for your device. NOTE: You can turn off this warning by setting the MCA parameter btl_openib_warn_no_device_params_found to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- No OpenFabrics connection schemes reported that they were able to be used on a specific port. As such, the openib BTL (OpenFabrics support) will be disabled for this port. 
Local host: head1 Local device: i40iw0 Local port: 1 CPCs attempted: rdmacm, udcm -------------------------------------------------------------------------- [head1.hpc:19648] 119 more processes have sent help message help-mpi-btl-openib.txt / no device params found [head1.hpc:19648] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [head1.hpc:19648] 119 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port 0 KSP Residual norm 4.355207026627e+09 0 KSP preconditioned resid norm 4.355207026627e+09 true resid norm 1.823690908212e+09 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.208748211681e+09 1 KSP preconditioned resid norm 1.208748211681e+09 true resid norm 6.155480046246e+08 ||r(i)||/||b|| 3.375286907736e-01 2 KSP Residual norm 4.517088284383e+08 2 KSP preconditioned resid norm 4.517088284383e+08 true resid norm 3.886324242477e+08 ||r(i)||/||b|| 2.131021339732e-01 3 KSP Residual norm 1.603295575511e+08 3 KSP preconditioned resid norm 1.603295575511e+08 true resid norm 1.620131718656e+08 ||r(i)||/||b|| 8.883806523136e-02 4 KSP Residual norm 5.544067857339e+07 4 KSP preconditioned resid norm 5.544067857339e+07 true resid norm 6.120149859376e+07 ||r(i)||/||b|| 3.355914004843e-02 5 KSP Residual norm 1.925294565414e+07 5 KSP preconditioned resid norm 1.925294565414e+07 true resid norm 2.393555310461e+07 ||r(i)||/||b|| 1.312478611196e-02 6 KSP Residual norm 5.919031529729e+06 6 KSP preconditioned resid norm 5.919031529729e+06 true resid norm 6.600310389533e+06 ||r(i)||/||b|| 3.619204526277e-03 7 KSP Residual norm 2.132637762468e+06 7 KSP preconditioned resid norm 2.132637762468e+06 true resid norm 2.301013174864e+06 ||r(i)||/||b|| 1.261734192183e-03 8 KSP Residual norm 7.288135118024e+05 8 KSP preconditioned resid norm 7.288135118024e+05 true resid norm 8.376703989009e+05 ||r(i)||/||b|| 4.593269589318e-04 9 KSP Residual norm 2.618419345570e+05 9 KSP preconditioned resid norm 2.618419345570e+05 true resid norm 2.924464805008e+05 ||r(i)||/||b|| 1.603596745390e-04 10 KSP Residual norm 9.736460918466e+04 10 KSP preconditioned resid norm 9.736460918466e+04 true resid norm 1.093493729815e+05 ||r(i)||/||b|| 5.996047492975e-05 11 KSP Residual norm 3.616464600646e+04 11 KSP preconditioned resid norm 3.616464600646e+04 true resid norm 4.287951581559e+04 ||r(i)||/||b|| 2.351249086262e-05 time 13.250000 m=10000 n=10000 Linear solve converged due to CONVERGED_RTOL iterations 11 time 10.790000 m=10000 n=10000 time 14.340000 m=10000 n=10000 time 11.910000 m=10000 n=10000 time 12.670000 m=10000 n=10000 time 14.900000 m=10000 n=10000 time 13.860000 m=10000 n=10000 time 12.550000 m=10000 n=10000 time 12.610000 m=10000 n=10000 time 11.110000 m=10000 n=10000 time 11.640000 m=10000 n=10000 time 11.980000 m=10000 n=10000 time 14.820000 m=10000 n=10000 time 16.180000 m=10000 n=10000 time 14.330000 m=10000 n=10000 time 13.850000 m=10000 n=10000 time 13.220000 m=10000 n=10000 time 10.220000 m=10000 n=10000 time 12.680000 m=10000 n=10000 time 16.240000 m=10000 n=10000 time 12.490000 m=10000 n=10000 time 16.070000 m=10000 n=10000 time 12.870000 m=10000 n=10000 time 12.170000 m=10000 n=10000 time 15.960000 m=10000 n=10000 time 13.630000 m=10000 n=10000 time 11.530000 m=10000 n=10000 time 13.700000 m=10000 n=10000 time 14.360000 m=10000 n=10000 time 11.690000 m=10000 n=10000 time 13.610000 m=10000 n=10000 time 12.800000 m=10000 n=10000 time 10.350000 m=10000 n=10000 time 14.680000 m=10000 n=10000 time 12.640000 m=10000 n=10000 time 10.860000 
m=10000 n=10000 time 13.650000 m=10000 n=10000 time 14.190000 m=10000 n=10000 time 12.620000 m=10000 n=10000 time 12.860000 m=10000 n=10000 time 13.640000 m=10000 n=10000 time 14.790000 m=10000 n=10000 time 11.720000 m=10000 n=10000 time 13.300000 m=10000 n=10000 time 12.990000 m=10000 n=10000 time 13.100000 m=10000 n=10000 time 14.630000 m=10000 n=10000 time 14.170000 m=10000 n=10000 time 13.830000 m=10000 n=10000 time 12.600000 m=10000 n=10000 time 12.500000 m=10000 n=10000 time 12.050000 m=10000 n=10000 time 13.430000 m=10000 n=10000 time 11.790000 m=10000 n=10000 time 12.900000 m=10000 n=10000 time 11.200000 m=10000 n=10000 time 14.120000 m=10000 n=10000 time 15.230000 m=10000 n=10000 time 14.020000 m=10000 n=10000 time 13.360000 m=10000 n=10000 Norm of error 126115., Iterations 11 0 KSP Residual norm 1.051609452779e+09 0 KSP preconditioned resid norm 1.051609452779e+09 true resid norm 2.150197987965e+09 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.140665240186e+07 1 KSP preconditioned resid norm 1.140665240186e+07 true resid norm 7.877021908575e+07 ||r(i)||/||b|| 3.663393767766e-02 2 KSP Residual norm 9.303562428258e+05 2 KSP preconditioned resid norm 9.303562428258e+05 true resid norm 5.522945877123e+06 ||r(i)||/||b|| 2.568575502366e-03 3 KSP Residual norm 7.562974008642e+04 3 KSP preconditioned resid norm 7.562974008642e+04 true resid norm 4.308267545679e+05 ||r(i)||/||b|| 2.003660858113e-04 4 KSP Residual norm 6.241321855425e+03 4 KSP preconditioned resid norm 6.241321855425e+03 true resid norm 3.569774197924e+04 ||r(i)||/||b|| 1.660207207850e-05 Linear solve converged due to CONVERGED_RTOL iterations 4 Norm of error 9.5902e+09, Iterations 4 **************************************** *********************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------- ./ex5 on a named head1.hpc with 60 processors, by lida Wed Jun 1 20:39:05 2022 Using Petsc Release Version 3.17.1, unknown Max Max/Min Avg Total Time (sec): 8.038e+01 1.001 8.036e+01 Objects: 1.450e+03 1.000 1.450e+03 Flops: 5.486e+09 1.002 5.485e+09 3.291e+11 Flops/sec: 6.827e+07 1.003 6.825e+07 4.095e+09 MPI Msg Count: 3.320e+03 2.712 2.535e+03 1.521e+05 MPI Msg Len (bytes): 5.412e+07 1.926 2.092e+04 3.183e+09 MPI Reductions: 1.547e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.1349e-01 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.000e+00 0.1% 1: Original Solve: 5.8264e+01 72.5% 2.4065e+11 73.1% 1.086e+05 71.4% 2.402e+04 82.0% 9.080e+02 58.7% 2: Second Solve: 2.1880e+01 27.2% 8.8426e+10 26.9% 4.348e+04 28.6% 1.318e+04 18.0% 6.190e+02 40.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage --- Event Stage 1: Original Solve BuildTwoSided 99 1.0 5.4267e+00 1.8 0.00e+00 0.0 5.5e+03 8.0e+00 9.9e+01 5 0 4 0 6 7 0 5 0 11 0 BuildTwoSidedF 59 1.0 4.8310e+00 2.2 0.00e+00 0.0 2.8e+03 7.3e+04 5.9e+01 4 0 2 6 4 6 0 3 8 6 0 MatMult 430 1.0 1.0317e+01 1.3 1.64e+09 1.0 5.2e+04 2.8e+04 7.0e+00 11 30 34 46 0 16 41 48 56 1 9505 MatMultAdd 84 1.0 1.1148e+00 1.8 1.23e+08 1.0 7.7e+03 8.8e+03 0.0e+00 1 2 5 2 0 2 3 7 3 0 6638 MatMultTranspose 84 1.0 1.9097e+00 1.7 1.24e+08 1.0 9.0e+03 8.2e+03 7.0e+00 2 2 6 2 0 2 3 8 3 1 3880 MatSolve 12 1.0 8.0690e-05 3.1 3.36e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 250 MatLUFactorSym 1 1.0 6.1153e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 5.7829e-05 5.0 3.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 MatConvert 8 1.0 7.6993e-01 1.9 0.00e+00 0.0 1.6e+03 1.3e+04 7.0e+00 1 0 1 1 0 1 0 1 1 1 0 MatScale 21 1.0 3.2515e-01 2.0 3.42e+07 1.0 8.1e+02 2.6e+04 0.0e+00 0 1 1 1 0 0 1 1 1 0 6310 MatResidual 84 1.0 1.8980e+00 1.5 2.87e+08 1.0 9.8e+03 2.6e+04 0.0e+00 2 5 6 8 0 3 7 9 10 0 9073 MatAssemblyBegin 117 1.0 4.0990e+00 2.4 0.00e+00 0.0 2.8e+03 7.3e+04 4.0e+01 3 0 2 6 3 5 0 3 8 4 0 MatAssemblyEnd 117 1.0 6.4010e+00 1.4 1.21e+05 2.3 0.0e+00 0.0e+00 1.4e+02 7 0 0 0 9 9 0 0 0 15 1 MatGetRowIJ 1 1.0 5.1278e-0513.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 2.7949e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 4 1.0 1.4416e-01 1.1 0.00e+00 0.0 2.9e+02 1.4e+03 5.6e+01 0 0 0 0 4 0 0 0 0 6 0 MatGetOrdering 1 1.0 1.2932e-02469.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 7 1.0 1.8915e+00 1.2 0.00e+00 0.0 2.0e+04 8.6e+03 9.5e+01 2 0 13 5 6 3 0 19 7 10 0 MatZeroEntries 7 1.0 2.4975e-02 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 7 1.0 1.1375e+00 1.2 1.95e+06 1.0 0.0e+00 0.0e+00 7.0e+00 1 0 0 0 0 2 0 0 0 1 103 MatTranspose 14 1.0 3.0706e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 21 1.0 3.4042e+00 1.1 0.00e+00 0.0 2.4e+03 2.6e+04 6.3e+01 4 0 2 2 4 5 0 2 2 7 0 MatMatMultNum 21 1.0 1.7535e+00 1.6 8.62e+07 1.0 8.1e+02 2.6e+04 7.0e+00 2 2 1 1 0 2 2 1 1 1 2946 MatPtAPSymbolic 7 1.0 3.8899e+00 1.0 0.00e+00 0.0 5.3e+03 4.9e+04 4.9e+01 5 0 3 8 3 7 0 5 10 5 0 MatPtAPNumeric 7 1.0 2.3929e+00 1.1 1.34e+08 1.0 1.9e+03 1.0e+05 3.5e+01 3 2 1 6 2 4 3 2 7 4 3361 MatTrnMatMultSym 1 1.0 5.1090e+00 1.0 0.00e+00 0.0 3.5e+02 1.9e+05 1.1e+01 6 0 0 2 1 9 0 0 3 1 0 MatRedundantMat 1 1.0 2.8010e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 1.0 3.4418e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetLocalMat 22 1.0 1.1056e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetBrAoCol 21 1.0 2.0194e-01 2.8 0.00e+00 0.0 
5.7e+03 4.7e+04 0.0e+00 0 0 4 8 0 0 0 5 10 0 0 MatGetSymTrans 2 1.0 2.8332e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 81 1.0 3.5909e+00 1.5 4.34e+08 1.0 0.0e+00 0.0e+00 8.1e+01 4 8 0 0 5 5 11 0 0 9 7248 VecNorm 103 1.0 2.5986e+00 1.5 1.29e+08 1.0 0.0e+00 0.0e+00 1.0e+02 3 2 0 0 7 4 3 0 0 11 2988 VecScale 89 1.0 1.6507e-01 1.8 4.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15040 VecCopy 272 1.0 5.8546e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecSet 315 1.0 3.5993e-01 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 20 1.0 1.9678e-01 2.2 4.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 14398 VecAYPX 516 1.0 8.4424e-01 1.9 2.07e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 5 0 0 0 14681 VecAXPBYCZ 168 1.0 3.6343e-01 2.1 2.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 6 0 0 0 38501 VecMAXPY 100 1.0 1.6303e+00 1.4 7.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 13 0 0 0 2 18 0 0 0 26841 VecAssemblyBegin 21 1.0 7.7428e-01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 1 0 0 0 1 1 0 0 0 2 0 VecAssemblyEnd 21 1.0 8.0603e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 413 1.0 8.0314e-01 1.5 1.15e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 3 0 0 0 8566 VecScatterBegin 650 1.0 5.5847e-01 2.3 0.00e+00 0.0 7.4e+04 2.4e+04 2.6e+01 1 0 49 57 2 1 0 68 69 3 0 VecScatterEnd 650 1.0 6.7772e+00 2.7 1.47e+05 2.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 8 0 0 0 0 1 VecNormalize 89 1.0 1.7682e+00 1.4 1.24e+08 1.0 0.0e+00 0.0e+00 8.9e+01 2 2 0 0 6 3 3 0 0 10 4212 SFSetGraph 52 1.0 4.5486e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 40 1.0 1.0026e+00 2.2 0.00e+00 0.0 8.3e+03 1.4e+04 4.0e+01 1 0 5 4 3 1 0 8 4 4 0 SFBcastBegin 102 1.0 1.1024e-0214.9 0.00e+00 0.0 1.9e+04 7.7e+03 0.0e+00 0 0 12 5 0 0 0 17 6 0 0 SFBcastEnd 102 1.0 4.4157e-0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 752 1.0 3.4799e-02 8.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 752 1.0 1.5714e-03 2.5 1.47e+05 2.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5420 KSPSetUp 17 1.0 4.5984e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 1 1 0 0 0 2 0 KSPSolve 1 1.0 1.7258e+01 1.0 3.02e+09 1.0 5.9e+04 2.3e+04 5.9e+01 21 55 39 43 4 30 75 55 52 6 10498 KSPGMRESOrthog 81 1.0 4.4162e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 8.1e+01 5 16 0 0 5 6 22 0 0 9 11787 PCGAMGGraph_AGG 7 1.0 6.6074e+00 1.0 2.39e+07 1.0 2.4e+03 1.7e+04 6.3e+01 8 0 2 1 4 11 1 2 2 7 217 PCGAMGCoarse_AGG 7 1.0 9.2609e+00 1.0 0.00e+00 0.0 2.2e+04 1.6e+04 1.2e+02 11 0 14 11 8 16 0 20 14 13 0 PCGAMGProl_AGG 7 1.0 3.7914e+00 1.1 0.00e+00 0.0 3.9e+03 2.2e+04 1.1e+02 5 0 3 3 7 6 0 4 3 12 0 PCGAMGPOpt_AGG 7 1.0 9.7580e+00 1.0 8.12e+08 1.0 1.3e+04 2.4e+04 2.9e+02 12 15 8 10 19 17 20 12 12 32 4991 GAMG: createProl 7 1.0 2.9629e+01 1.0 8.36e+08 1.0 4.1e+04 1.9e+04 5.8e+02 37 15 27 25 37 51 21 38 30 64 1692 Create Graph 7 1.0 7.6994e-01 1.9 0.00e+00 0.0 1.6e+03 1.3e+04 7.0e+00 1 0 1 1 0 1 0 1 1 1 0 Filter Graph 7 1.0 5.9877e+00 1.1 2.39e+07 1.0 8.1e+02 2.6e+04 5.6e+01 7 0 1 1 4 10 1 1 1 6 240 MIS/Agg 7 1.0 1.8917e+00 1.2 0.00e+00 0.0 2.0e+04 8.6e+03 9.5e+01 2 0 13 5 6 3 0 19 7 10 0 SA: col data 7 1.0 8.5757e-01 1.1 0.00e+00 0.0 3.0e+03 2.4e+04 4.8e+01 1 0 2 2 3 1 0 3 3 5 0 SA: frmProl0 7 1.0 2.5043e+00 1.0 0.00e+00 0.0 9.2e+02 1.5e+04 3.5e+01 3 0 1 0 2 4 0 1 1 4 0 SA: smooth 7 1.0 4.9217e+00 1.0 3.62e+07 1.0 3.3e+03 2.6e+04 9.4e+01 6 1 2 3 6 8 1 3 3 10 441 GAMG: partLevel 7 1.0 6.8353e+00 1.0 1.34e+08 1.0 
7.9e+03 5.7e+04 1.9e+02 8 2 5 14 12 12 3 7 17 21 1177 repartition 2 1.0 4.0880e-01 1.0 0.00e+00 0.0 7.0e+02 6.2e+02 1.1e+02 1 0 0 0 7 1 0 1 0 12 0 Invert-Sort 2 1.0 6.3940e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0 Move A 2 1.0 1.2997e-01 1.1 0.00e+00 0.0 2.9e+02 1.4e+03 3.0e+01 0 0 0 0 2 0 0 0 0 3 0 Move P 2 1.0 2.4461e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+01 0 0 0 0 2 0 0 0 0 4 0 PCGAMG Squ l00 1 1.0 5.1090e+00 1.0 0.00e+00 0.0 3.5e+02 1.9e+05 1.1e+01 6 0 0 2 1 9 0 0 3 1 0 PCGAMG Gal l00 1 1.0 3.9555e+00 1.1 7.69e+07 1.0 9.4e+02 1.5e+05 1.3e+01 5 1 1 4 1 7 2 1 5 1 1167 PCGAMG Opt l00 1 1.0 2.4544e+00 1.1 1.67e+07 1.0 4.7e+02 8.0e+04 1.1e+01 3 0 0 1 1 4 0 0 1 1 407 PCGAMG Gal l01 1 1.0 1.6669e+00 1.0 3.97e+07 1.0 9.4e+02 1.8e+05 1.3e+01 2 1 1 5 1 3 1 1 6 1 1428 PCGAMG Opt l01 1 1.0 5.6540e-01 1.0 5.09e+06 1.0 4.7e+02 5.0e+04 1.1e+01 1 0 0 1 1 1 0 0 1 1 540 PCGAMG Gal l02 1 1.0 4.9954e-01 1.0 1.54e+07 1.1 9.4e+02 1.1e+05 1.3e+01 1 0 1 3 1 1 0 1 4 1 1829 PCGAMG Opt l02 1 1.0 2.3346e-01 1.1 1.91e+06 1.0 4.7e+02 3.2e+04 1.1e+01 0 0 0 0 1 0 0 0 1 1 488 PCGAMG Gal l03 1 1.0 2.1878e-01 1.0 2.19e+06 1.4 9.4e+02 3.6e+04 1.2e+01 0 0 1 1 1 0 0 1 1 1 567 PCGAMG Opt l03 1 1.0 1.3572e-01 1.1 2.49e+05 1.1 4.7e+02 1.1e+04 1.0e+01 0 0 0 0 1 0 0 0 0 1 108 PCGAMG Gal l04 1 1.0 1.1188e-01 1.2 1.95e+05 2.2 2.4e+03 3.6e+03 1.2e+01 0 0 2 0 1 0 0 2 0 1 93 PCGAMG Opt l04 1 1.0 1.3686e-01 1.1 2.16e+04 1.7 9.4e+02 1.6e+03 1.0e+01 0 0 1 0 1 0 0 1 0 1 9 PCGAMG Gal l05 1 1.0 3.8335e-03 1.4 3.21e+04 0.0 9.5e+02 5.7e+02 1.2e+01 0 0 1 0 1 0 0 1 0 1 133 PCGAMG Opt l05 1 1.0 1.3979e-02 1.1 4.02e+03 0.0 4.3e+02 2.7e+02 1.0e+01 0 0 0 0 1 0 0 0 0 1 5 PCGAMG Gal l06 1 1.0 2.1886e-03 1.5 8.55e+03 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 4 PCGAMG Opt l06 1 1.0 1.5361e-02 1.0 2.63e+03 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 0 PCSetUp 1 1.0 3.7111e+01 1.0 9.70e+08 1.0 4.9e+04 2.5e+04 8.3e+02 46 18 32 39 54 64 24 45 47 92 1568 PCApply 12 1.0 1.1146e+01 1.1 1.82e+09 1.0 5.7e+04 2.0e+04 2.3e+01 13 33 37 36 1 18 45 52 44 3 9768 --- Event Stage 2: Second Solve BuildTwoSided 73 1.0 1.9759e+00 1.9 0.00e+00 0.0 6.8e+03 8.0e+00 7.3e+01 2 0 4 0 5 7 0 16 0 12 0 BuildTwoSidedF 44 1.0 1.2189e+00 2.0 0.00e+00 0.0 8.6e+03 1.7e+04 4.4e+01 1 0 6 5 3 4 0 20 25 7 0 MatMult 160 1.0 4.2652e+00 1.4 6.02e+08 1.0 1.6e+04 2.3e+04 4.0e+00 5 11 11 12 0 17 41 38 67 1 8474 MatMultAdd 25 1.0 1.8994e-01 3.6 6.17e+05 1.0 2.1e+03 1.7e+02 0.0e+00 0 0 1 0 0 0 0 5 0 0 193 MatMultTranspose 25 1.0 7.2792e-0119.4 6.18e+05 1.0 2.9e+03 1.5e+02 5.0e+00 0 0 2 0 0 1 0 7 0 1 51 MatSolve 5 1.0 3.6131e-05 1.8 1.99e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3305 MatLUFactorSym 1 1.0 9.1382e-05 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 3.8696e-05 5.9 9.09e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1409 MatConvert 6 1.0 4.6379e-01 2.2 0.00e+00 0.0 9.6e+02 1.0e+04 5.0e+00 0 0 1 0 0 1 0 2 2 1 0 MatScale 15 1.0 2.0882e-01 2.2 1.69e+07 1.0 4.8e+02 2.0e+04 0.0e+00 0 0 0 0 0 1 1 1 2 0 4848 MatResidual 25 1.0 6.9403e-01 2.1 8.38e+07 1.0 2.4e+03 2.0e+04 0.0e+00 1 2 2 2 0 2 6 6 8 0 7240 MatAssemblyBegin 87 1.0 8.9302e-01 2.2 0.00e+00 0.0 8.6e+03 1.7e+04 3.0e+01 1 0 6 5 2 3 0 20 25 5 0 MatAssemblyEnd 87 1.0 2.6614e+00 1.3 9.98e+04 1.0 0.0e+00 0.0e+00 1.0e+02 3 0 0 0 7 11 0 0 0 16 2 MatGetRowIJ 1 1.0 1.2436e-021825.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 5.4925e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 1 0 MatCreateSubMat 4 
1.0 3.1275e-01 1.1 0.00e+00 0.0 3.0e+02 2.0e+02 5.6e+01 0 0 0 0 4 1 0 1 0 9 0 MatGetOrdering 1 1.0 1.2514e-02516.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 5 1.0 1.0904e+00 1.8 0.00e+00 0.0 2.9e+03 6.3e+02 1.5e+01 1 0 2 0 1 4 0 7 0 2 0 MatZeroEntries 6 1.0 5.8985e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 5 1.0 4.8172e-01 1.2 2.35e+04 1.0 0.0e+00 0.0e+00 5.0e+00 1 0 0 0 0 2 0 0 0 1 3 MatTranspose 10 1.0 3.2074e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 15 1.0 1.1367e+00 1.0 0.00e+00 0.0 1.4e+03 7.0e+03 4.5e+01 1 0 1 0 3 5 0 3 2 7 0 MatMatMultNum 15 1.0 2.4953e-01 1.2 1.02e+06 1.0 4.8e+02 5.1e+02 5.0e+00 0 0 0 0 0 1 0 1 0 1 241 MatPtAPSymbolic 5 1.0 8.1101e-01 1.1 0.00e+00 0.0 2.9e+03 4.2e+03 3.5e+01 1 0 2 0 2 4 0 7 2 6 0 MatPtAPNumeric 5 1.0 3.9561e-01 1.2 1.58e+06 1.0 9.6e+02 2.2e+03 2.5e+01 0 0 1 0 2 2 0 2 0 4 235 MatTrnMatMultSym 1 1.0 9.9313e-01 1.0 0.00e+00 0.0 3.5e+02 2.2e+03 1.1e+01 1 0 0 0 1 4 0 1 0 2 0 MatRedundantMat 1 1.0 5.4990e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 1 0 MatMPIConcateSeq 1 1.0 5.8523e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetLocalMat 16 1.0 5.0405e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 MatGetBrAoCol 15 1.0 1.0643e-01 2.6 0.00e+00 0.0 3.4e+03 6.5e+03 0.0e+00 0 0 2 1 0 0 0 8 4 0 0 MatGetSymTrans 2 1.0 7.1719e-02 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 54 1.0 1.5885e+00 1.7 2.17e+08 1.0 0.0e+00 0.0e+00 5.4e+01 2 4 0 0 3 6 15 0 0 9 8198 VecNorm 67 1.0 1.6017e+00 1.4 7.67e+07 1.0 0.0e+00 0.0e+00 6.7e+01 2 1 0 0 4 6 5 0 0 11 2875 VecScale 60 1.0 1.0996e-01 1.7 2.67e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 14571 VecCopy 86 1.0 2.5385e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecSet 106 1.0 1.8434e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 11 1.0 9.2483e-02 2.2 2.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 15142 VecAYPX 155 1.0 3.6418e-01 2.3 7.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 5 0 0 0 12379 VecAXPBYCZ 50 1.0 1.6760e-01 2.9 8.35e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 6 0 0 0 29894 VecMAXPY 64 1.0 6.9542e-01 1.6 2.97e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 3 20 0 0 0 25634 VecAssemblyBegin 16 1.0 5.5466e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 2 0 0 0 2 0 VecAssemblyEnd 16 1.0 7.5968e-05 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 155 1.0 3.8336e-01 1.8 5.18e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 4 0 0 0 8103 VecScatterBegin 242 1.0 3.2776e-01 2.7 0.00e+00 0.0 2.5e+04 1.6e+04 1.9e+01 0 0 17 12 1 1 0 58 69 3 0 VecScatterEnd 242 1.0 2.3472e+00 2.9 1.22e+03 3.5 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0 VecNormalize 60 1.0 1.3091e+00 1.6 8.01e+07 1.0 0.0e+00 0.0e+00 6.0e+01 1 1 0 0 4 5 5 0 0 10 3672 SFSetGraph 39 1.0 8.6654e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 29 1.0 8.1719e-01 2.8 0.00e+00 0.0 4.9e+03 2.2e+03 2.9e+01 1 0 3 0 2 3 0 11 2 5 0 SFBcastBegin 20 1.0 2.9462e-04 2.5 0.00e+00 0.0 1.9e+03 7.5e+02 0.0e+00 0 0 1 0 0 0 0 4 0 0 0 SFBcastEnd 20 1.0 2.5962e-01143.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 262 1.0 1.7704e-02100.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 262 1.0 1.3156e-02131.3 1.22e+03 3.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3 KSPSetUp 13 1.0 1.7844e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 
0 0 1 1 0 0 0 2 0 KSPSolve 1 1.0 5.4889e+00 1.0 8.05e+08 1.0 1.6e+04 1.7e+04 3.2e+01 7 15 11 9 2 25 55 37 48 5 8798 KSPGMRESOrthog 54 1.0 1.9637e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 5.4e+01 2 8 0 0 3 7 29 0 0 9 13263 PCGAMGGraph_AGG 5 1.0 2.2872e+00 1.0 1.68e+07 1.0 1.4e+03 1.3e+04 4.5e+01 3 0 1 1 3 10 1 3 3 7 439 PCGAMGCoarse_AGG 5 1.0 3.2731e+00 1.0 0.00e+00 0.0 4.3e+03 9.2e+02 3.7e+01 4 0 3 0 2 15 0 10 1 6 0 PCGAMGProl_AGG 5 1.0 1.2727e+00 1.0 0.00e+00 0.0 2.3e+03 4.5e+02 7.9e+01 2 0 1 0 5 6 0 5 0 13 0 PCGAMGPOpt_AGG 5 1.0 5.7655e+00 1.0 6.29e+08 1.0 7.4e+03 1.4e+04 2.0e+02 7 11 5 3 13 26 43 17 19 33 6544 GAMG: createProl 5 1.0 1.2524e+01 1.0 6.46e+08 1.0 1.5e+04 8.4e+03 3.7e+02 16 12 10 4 24 57 44 36 23 59 3093 Create Graph 5 1.0 4.6379e-01 2.2 0.00e+00 0.0 9.6e+02 1.0e+04 5.0e+00 0 0 1 0 0 1 0 2 2 1 0 Filter Graph 5 1.0 2.0622e+00 1.1 1.68e+07 1.0 4.8e+02 2.0e+04 4.0e+01 2 0 0 0 3 9 1 1 2 6 487 MIS/Agg 5 1.0 1.0905e+00 1.8 0.00e+00 0.0 2.9e+03 6.3e+02 1.5e+01 1 0 2 0 1 4 0 7 0 2 0 SA: col data 5 1.0 5.3167e-01 1.2 0.00e+00 0.0 1.7e+03 5.1e+02 3.4e+01 1 0 1 0 2 2 0 4 0 5 0 SA: frmProl0 5 1.0 5.6108e-01 1.1 0.00e+00 0.0 6.0e+02 2.8e+02 2.5e+01 1 0 0 0 2 2 0 1 0 4 0 SA: smooth 5 1.0 2.0259e+00 1.1 4.34e+05 1.0 1.9e+03 5.4e+03 6.6e+01 2 0 1 0 4 9 0 4 2 11 13 GAMG: partLevel 5 1.0 1.8885e+00 1.0 1.58e+06 1.0 4.5e+03 3.2e+03 1.7e+02 2 0 3 0 11 9 0 10 3 27 49 repartition 2 1.0 7.2098e-01 1.0 0.00e+00 0.0 6.7e+02 1.0e+02 1.1e+02 1 0 0 0 7 3 0 2 0 17 0 Invert-Sort 2 1.0 7.2351e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 2 0 Move A 2 1.0 1.6566e-01 1.2 0.00e+00 0.0 3.0e+02 2.0e+02 3.0e+01 0 0 0 0 2 1 0 1 0 5 0 Move P 2 1.0 1.9534e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+01 0 0 0 0 2 1 0 0 0 5 0 PCGAMG Squ l00 1 1.0 9.9313e-01 1.0 0.00e+00 0.0 3.5e+02 2.2e+03 1.1e+01 1 0 0 0 1 4 0 1 0 2 0 PCGAMG Gal l00 1 1.0 7.0349e-01 1.0 9.22e+05 1.0 9.4e+02 1.2e+04 1.3e+01 1 0 1 0 1 3 0 2 2 2 78 PCGAMG Opt l00 1 1.0 1.1336e+00 1.1 2.00e+05 1.0 4.7e+02 2.1e+04 1.1e+01 1 0 0 0 1 5 0 1 2 2 11 PCGAMG Gal l01 1 1.0 1.3150e-01 1.1 4.76e+05 1.1 9.4e+02 2.1e+03 1.2e+01 0 0 1 0 1 1 0 2 0 2 210 PCGAMG Opt l01 1 1.0 1.3884e-01 1.1 6.11e+04 1.0 4.7e+02 6.0e+02 1.0e+01 0 0 0 0 1 1 0 1 0 2 26 PCGAMG Gal l02 1 1.0 1.7012e-01 1.1 1.76e+05 1.3 9.4e+02 1.2e+03 1.2e+01 0 0 1 0 1 1 0 2 0 2 57 PCGAMG Opt l02 1 1.0 3.7929e-02 1.0 2.31e+04 1.1 4.7e+02 3.9e+02 1.0e+01 0 0 0 0 1 0 0 1 0 2 35 PCGAMG Gal l03 1 1.0 1.2656e-01 1.1 1.65e+04 1.6 9.4e+02 2.8e+02 1.2e+01 0 0 1 0 1 1 0 2 0 2 7 PCGAMG Opt l03 1 1.0 1.6197e-03 1.3 2.75e+03 1.4 4.7e+02 1.4e+02 1.0e+01 0 0 0 0 1 0 0 1 0 2 90 PCGAMG Gal l04 1 1.0 7.2816e-02 1.4 4.91e+03 0.0 6.4e+01 6.3e+01 1.2e+01 0 0 0 0 1 0 0 0 0 2 0 PCGAMG Opt l04 1 1.0 1.1088e-01 1.2 1.50e+03 0.0 3.2e+01 5.5e+01 1.0e+01 0 0 0 0 1 0 0 0 0 2 0 PCSetUp 1 1.0 1.5702e+01 1.0 6.47e+08 1.0 2.0e+04 7.2e+03 5.8e+02 19 12 13 5 38 72 44 46 25 94 2473 PCApply 5 1.0 3.6198e+00 1.2 4.87e+08 1.0 1.5e+04 1.3e+04 1.7e+01 4 9 10 6 1 15 33 35 34 3 8064 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Viewer 1 1 896 0. --- Event Stage 1: Original Solve Container 14 4 2496 0. Matrix 195 108 1577691960 0. Matrix Coarsen 7 7 4704 0. Vector 352 266 734665864 0. Index Set 110 97 704896 0. Star Forest Graph 82 49 63224 0. Krylov Solver 17 7 217000 0. Preconditioner 17 7 6496 0. 
Viewer 1 0 0 0. PetscRandom 7 7 4970 0. Distributed Mesh 15 7 35896 0. Discrete System 15 7 7168 0. Weak Form 15 7 4648 0. --- Event Stage 2: Second Solve Container 10 20 12480 0. Matrix 144 231 1909487728 0. Matrix Coarsen 5 5 3360 0. Vector 237 323 870608184 0. Index Set 88 101 154152 0. Star Forest Graph 59 92 117152 0. Krylov Solver 12 22 203734 0. Preconditioner 12 22 25080 0. PetscRandom 5 5 3550 0. Distributed Mesh 10 18 92304 0. Discrete System 10 18 18432 0. Weak Form 10 18 11952 0. ======================================================================================================================== Average time to get PetscTime(): 3.64147e-08 Average time for MPI_Barrier(): 0.00739469 Average time for zero size MPI_Send(): 0.000168472 #PETSc Option Table entries: -ksp_converged_reason -ksp_monitor -ksp_monitor_true_residual -ksp_type gmres -log_view -m 10000 -pc_gamg_use_parallel_coarse_grid_solver -pc_type gamg #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make ----------------------------------------- Libraries compiled on 2022-05-25 10:03:14 on head1.hpc Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core Using PETSc directory: /home/lida Using PETSc arch: ----------------------------------------- Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 ----------------------------------------- Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include ----------------------------------------- Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 
-Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
[lida at head1 tutorials]$

From bsmith at petsc.dev Wed Jun 1 13:06:02 2022
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 1 Jun 2022 14:06:02 -0400
Subject: [petsc-users] Mat created by DMStag cannot access ghost points
In-Reply-To:
References:
Message-ID: <859FA50E-F3C2-4E54-AEDC-7C8A70D3FCE1@petsc.dev>

   This appears to be a bug in the DMStag/Mat preallocator code. If you add, after the DMCreateMatrix() line in your code,

   PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE));

your code will run correctly.

   Patrick and Matt,

   MatPreallocatorPreallocate_Preallocator() has

   PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, p->nooffproc));

to make the assembly of the stag matrix from the preallocator matrix a little faster, but then it never "undoes" this call. Hence the matrix is left in a state where it will error if someone sets values from a different rank (which they certainly can, using DMStagMatSetValuesStencil()).

   I think you need to clear NO_OFF_PROC at the end of MatPreallocatorPreallocate_Preallocator(): the fact that the preallocation process never needed communication does not mean that when someone puts real values in the matrix they will never use communication; they can put in values any dang way they please.

   I don't know why this bug has not come up before.

   Barry

> On May 31, 2022, at 11:08 PM, Ye Changqing wrote:
>
> Dear all,
>
> [BugReport.c] is a sample code, [BugReportParallel.output] is the output when executing BugReport with mpiexec, [BugReportSerial.output] is the output of the serial execution.
>
> Best,
> Changqing
>
> From: Dave May
> Sent: May 31, 2022 22:55
> To: Ye Changqing
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] Mat created by DMStag cannot access ghost points
>
> On Tue 31. May 2022 at 16:28, Ye Changqing wrote:
> Dear developers of PETSc,
>
> I encountered a problem when using the DMStag module. The program runs perfectly in serial, while errors are thrown in parallel (using mpiexec). Some rows in the Mat cannot be accessed in local processes when looping over all elements in the DMStag. The DM object I used only has one DOF in each element. Hence, I could switch to the DMDA module easily, and the program now is back to normal.
>
> Some snippets are below.
>
> Initialise a DMStag object:
> PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P)));
> Created a Mat:
> PetscCall(DMCreateMatrix(s_ctx->dm_P, A));
> Loop:
> PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, &ny, &nz, &extrax, &extray, &extraz));
> for (ey = starty; ey < starty + ny; ++ey)
>   for (ex = startx; ex < startx + nx; ++ex)
>   {
>     ...
>     PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES)); // The traceback shows the problem is in here.
>   }
>
> In addition to the code or MWE, please forward us the complete stack trace / error thrown to stdout.
>
> Thanks,
> Dave
>
> Best,
> Changqing

-------------- next part --------------
An HTML attachment was scrubbed...
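For reference, a minimal sketch of how the suggested workaround fits around the snippets above (the variable names are taken from the snippet and are illustrative only; this is a sketch, not a tested patch):

   Mat A;
   PetscCall(DMCreateMatrix(s_ctx->dm_P, &A));
   /* Workaround discussed above: re-enable off-process insertions
      that the preallocator left disabled. */
   PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE));

   /* ... fill the matrix as in the loop above, e.g. ... */
   PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES));

   /* Off-process entries are communicated during assembly. */
   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));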
URL: From bsmith at petsc.dev Wed Jun 1 13:08:51 2022 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 1 Jun 2022 14:08:51 -0400 Subject: [petsc-users] Sparse linear system solving In-Reply-To: <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> Message-ID: <8EE5882E-238C-4FFF-8E51-7AA318B225E8@petsc.dev> PETSc is an MPI library. It is not an OpenMP library. Only some external packages that PETSc uses can use OpenMP, things like GAMG will not utilize OpenMP pretty much at all. Barry > On Jun 1, 2022, at 1:37 PM, Lidia wrote: > > Dear Matt, > > Thank you for the rule of 10,000 variables per process! We have run ex.5 with matrix 1e4 x 1e4 at our cluster and got a good performance dynamics (see the figure "performance.png" - dependency of the solving time in seconds on the number of cores). We have used GAMG preconditioner (multithread: we have added the option "-pc_gamg_use_parallel_coarse_grid_solver") and GMRES solver. And we have set one openMP thread to every MPI process. Now the ex.5 is working good on many mpi processes! But the running uses about 100 GB of RAM. > > How we can run ex.5 using many openMP threads without mpi? If we just change the running command, the cores are not loaded normally: usually just one core is loaded in 100 % and others are idle. Sometimes all cores are working in 100 % during 1 second but then again become idle about 30 seconds. Can the preconditioner use many threads and how to activate this option? > > The solving times (the time of the solver work) using 60 openMP threads is 511 seconds now, and while using 60 MPI processes - 13.19 seconds. > > ksp_monitor outs for both cases (many openMP threads or many MPI processes) are attached. > > > > Thank you! > > Best, > Lidia > > On 31.05.2022 15:21, Matthew Knepley wrote: >> I have looked at the local logs. First, you have run problems of size 12 and 24. As a rule of thumb, you need 10,000 >> variables per process in order to see good speedup. >> >> Thanks, >> >> Matt >> >> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley > wrote: >> On Tue, May 31, 2022 at 7:39 AM Lidia > wrote: >> Matt, Mark, thank you much for your answers! >> >> >> >> Now we have run example # 5 on our computer cluster and on the local server and also have not seen any performance increase, but by unclear reason running times on the local server are much better than on the cluster. >> >> I suspect that you are trying to get speedup without increasing the memory bandwidth: >> >> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Thanks, >> >> Matt >> Now we will try to run petsc #5 example inside a docker container on our server and see if the problem is in our environment. I'll write you the results of this test as soon as we get it. >> >> The ksp_monitor outs for the 5th test at the current local server configuration (for 2 and 4 mpi processes) and for the cluster (for 1 and 3 mpi processes) are attached . >> >> >> >> And one more question. Potentially we can use 10 nodes and 96 threads at each node on our cluster. What do you think, which combination of numbers of mpi processes and openmp threads may be the best for the 5th example? >> >> Thank you! 
>> >> >> >> Best, >> Lidiia >> >> On 31.05.2022 05:42, Mark Adams wrote: >>> And if you see "NO" change in performance I suspect the solver/matrix is all on one processor. >>> (PETSc does not use threads by default so threads should not change anything). >>> >>> As Matt said, it is best to start with a PETSc example that does something like what you want (parallel linear solve, see src/ksp/ksp/tutorials for examples), and then add your code to it. >>> That way you get the basic infrastructure in place for you, which is pretty obscure to the uninitiated. >>> >>> Mark >>> >>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley > wrote: >>> On Mon, May 30, 2022 at 10:12 PM Lidia > wrote: >>> Dear colleagues, >>> >>> Is here anyone who have solved big sparse linear matrices using PETSC? >>> >>> There are lots of publications with this kind of data. Here is one recent one: https://arxiv.org/abs/2204.01722 >>> >>> We have found NO performance improvement while using more and more mpi >>> processes (1-2-3) and open-mp threads (from 1 to 72 threads). Did anyone >>> faced to this problem? Does anyone know any possible reasons of such >>> behaviour? >>> >>> Solver behavior is dependent on the input matrix. The only general-purpose solvers >>> are direct, but they do not scale linearly and have high memory requirements. >>> >>> Thus, in order to make progress you will have to be specific about your matrices. >>> >>> We use AMG preconditioner and GMRES solver from KSP package, as our >>> matrix is large (from 100 000 to 1e+6 rows and columns), sparse, >>> non-symmetric and includes both positive and negative values. But >>> performance problems also exist while using CG solvers with symmetric >>> matrices. >>> >>> There are many PETSc examples, such as example 5 for the Laplacian, that exhibit >>> good scaling with both AMG and GMG. >>> >>> Could anyone help us to set appropriate options of the preconditioner >>> and solver? Now we use default parameters, maybe they are not the best, >>> but we do not know a good combination. Or maybe you could suggest any >>> other pairs of preconditioner+solver for such tasks? >>> >>> I can provide more information: the matrices that we solve, c++ script >>> to run solving using petsc and any statistics obtained by our runs. >>> >>> First, please provide a description of the linear system, and the output of >>> >>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view >>> >>> for each test case. >>> >>> Thanks, >>> >>> Matt >>> >>> Thank you in advance! >>> >>> Best regards, >>> Lidiia Varshavchik, >>> Ioffe Institute, St. Petersburg, Russia >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Jun 1 13:14:02 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 Jun 2022 14:14:02 -0400 Subject: [petsc-users] Sparse linear system solving In-Reply-To: <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> Message-ID: On Wed, Jun 1, 2022 at 1:43 PM Lidia wrote: > Dear Matt, > > Thank you for the rule of 10,000 variables per process! We have run ex.5 > with matrix 1e4 x 1e4 at our cluster and got a good performance dynamics > (see the figure "performance.png" - dependency of the solving time in > seconds on the number of cores). We have used GAMG preconditioner > (multithread: we have added the option " > -pc_gamg_use_parallel_coarse_grid_solver") and GMRES solver. And we have > set one openMP thread to every MPI process. Now the ex.5 is working good on > many mpi processes! But the running uses about 100 GB of RAM. > > How we can run ex.5 using many openMP threads without mpi? If we just > change the running command, the cores are not loaded normally: usually just > one core is loaded in 100 % and others are idle. Sometimes all cores are > working in 100 % during 1 second but then again become idle about 30 > seconds. Can the preconditioner use many threads and how to activate this > option? > Maye you could describe what you are trying to accomplish? Threads and processes are not really different, except for memory sharing. However, sharing large complex data structures rarely works. That is why they get partitioned and operate effectively as distributed memory. You would not really save memory by using threads in this instance, if that is your goal. This is detailed in the talks in this session (see 2016 PP Minisymposium on this page https://cse.buffalo.edu/~knepley/relacs.html). Thanks, Matt > The solving times (the time of the solver work) using 60 openMP threads is > 511 seconds now, and while using 60 MPI processes - 13.19 seconds. > > ksp_monitor outs for both cases (many openMP threads or many MPI > processes) are attached. > > > Thank you! > Best, > Lidia > > On 31.05.2022 15:21, Matthew Knepley wrote: > > I have looked at the local logs. First, you have run problems of size 12 > and 24. As a rule of thumb, you need 10,000 > variables per process in order to see good speedup. > > Thanks, > > Matt > > On Tue, May 31, 2022 at 8:19 AM Matthew Knepley wrote: > >> On Tue, May 31, 2022 at 7:39 AM Lidia wrote: >> >>> Matt, Mark, thank you much for your answers! >>> >>> >>> Now we have run example # 5 on our computer cluster and on the local >>> server and also have not seen any performance increase, but by unclear >>> reason running times on the local server are much better than on the >>> cluster. >>> >> I suspect that you are trying to get speedup without increasing the >> memory bandwidth: >> >> >> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Thanks, >> >> Matt >> >>> Now we will try to run petsc #5 example inside a docker container on our >>> server and see if the problem is in our environment. I'll write you the >>> results of this test as soon as we get it. >>> >>> The ksp_monitor outs for the 5th test at the current local server >>> configuration (for 2 and 4 mpi processes) and for the cluster (for 1 and 3 >>> mpi processes) are attached . >>> >>> >>> And one more question. 
Potentially we can use 10 nodes and 96 threads at >>> each node on our cluster. What do you think, which combination of numbers >>> of mpi processes and openmp threads may be the best for the 5th example? >>> >>> Thank you! >>> >>> >>> Best, >>> Lidiia >>> >>> On 31.05.2022 05:42, Mark Adams wrote: >>> >>> And if you see "NO" change in performance I suspect the solver/matrix is >>> all on one processor. >>> (PETSc does not use threads by default so threads should not change >>> anything). >>> >>> As Matt said, it is best to start with a PETSc example that does >>> something like what you want (parallel linear solve, see >>> src/ksp/ksp/tutorials for examples), and then add your code to it. >>> That way you get the basic infrastructure in place for you, which is >>> pretty obscure to the uninitiated. >>> >>> Mark >>> >>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley >>> wrote: >>> >>>> On Mon, May 30, 2022 at 10:12 PM Lidia >>>> wrote: >>>> >>>>> Dear colleagues, >>>>> >>>>> Is here anyone who have solved big sparse linear matrices using PETSC? >>>>> >>>> >>>> There are lots of publications with this kind of data. Here is one >>>> recent one: https://arxiv.org/abs/2204.01722 >>>> >>>> >>>>> We have found NO performance improvement while using more and more mpi >>>>> processes (1-2-3) and open-mp threads (from 1 to 72 threads). Did >>>>> anyone >>>>> faced to this problem? Does anyone know any possible reasons of such >>>>> behaviour? >>>>> >>>> >>>> Solver behavior is dependent on the input matrix. The only >>>> general-purpose solvers >>>> are direct, but they do not scale linearly and have high memory >>>> requirements. >>>> >>>> Thus, in order to make progress you will have to be specific about your >>>> matrices. >>>> >>>> >>>>> We use AMG preconditioner and GMRES solver from KSP package, as our >>>>> matrix is large (from 100 000 to 1e+6 rows and columns), sparse, >>>>> non-symmetric and includes both positive and negative values. But >>>>> performance problems also exist while using CG solvers with symmetric >>>>> matrices. >>>>> >>>> >>>> There are many PETSc examples, such as example 5 for the Laplacian, >>>> that exhibit >>>> good scaling with both AMG and GMG. >>>> >>>> >>>>> Could anyone help us to set appropriate options of the preconditioner >>>>> and solver? Now we use default parameters, maybe they are not the >>>>> best, >>>>> but we do not know a good combination. Or maybe you could suggest any >>>>> other pairs of preconditioner+solver for such tasks? >>>>> >>>>> I can provide more information: the matrices that we solve, c++ script >>>>> to run solving using petsc and any statistics obtained by our runs. >>>>> >>>> >>>> First, please provide a description of the linear system, and the >>>> output of >>>> >>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view >>>> >>>> for each test case. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thank you in advance! >>>>> >>>>> Best regards, >>>>> Lidiia Varshavchik, >>>>> Ioffe Institute, St. Petersburg, Russia >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From badi.hamid at gmail.com Thu Jun 2 04:38:32 2022 From: badi.hamid at gmail.com (hamid badi) Date: Thu, 2 Jun 2022 11:38:32 +0200 Subject: [petsc-users] Petsc with mingw64 Message-ID: Hi, I want to compile petsc with openblas & mumps (sequential) under mingw64. To do so, I compiled openblas and mumps without any problem. But when it comes to petsc, configure can't find my mumps. I use the following configuration options : --with-shared-libraries=1 --with-openmp=1 --with-mpi=0 --with-debugging=0 --with-scalar-type=real --with-x=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-windows-graphics=0 --with-openblas=1 --with-openblas-dir=/mingw64/ --with-mumps=1 --with-mumps-include=~/mumps-git/build/include --with-mumps-lib="-L~/mumps-git/build/lib -lsmumps -ldmumps -lmumps_common -lpord -lmpiseq" --with-mumps-serial=1 i get the following error : UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- --with-mumps-lib=['-L~/mumps-git/build/lib', '-lsmumps', '-ldmumps', '-lmumps_common', '-lpord', '-lmpiseq'] and --with-mumps-include=['~/mumps-git/build/include'] did not work ******************************************************************************* I also tried using --with-mumps-dir=~/mumps-git/build without success. Thanks for helping. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 2 06:33:41 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 2 Jun 2022 07:33:41 -0400 Subject: [petsc-users] Petsc with mingw64 In-Reply-To: References: Message-ID: For any configure error, you need to send configure.log Thanks, Matt On Thu, Jun 2, 2022 at 5:38 AM hamid badi wrote: > Hi, > > I want to compile petsc with openblas & mumps (sequential) under mingw64. > To do so, I compiled openblas and mumps without any problem. But when it > comes to petsc, configure can't find my mumps. 
> > I use the following configuration options : > > --with-shared-libraries=1 > --with-openmp=1 > --with-mpi=0 > --with-debugging=0 > --with-scalar-type=real > --with-x=0 > --COPTFLAGS=-O3 > --CXXOPTFLAGS=-O3 > --FOPTFLAGS=-O3 > --with-windows-graphics=0 > --with-openblas=1 > --with-openblas-dir=/mingw64/ > --with-mumps=1 > --with-mumps-include=~/mumps-git/build/include > --with-mumps-lib="-L~/mumps-git/build/lib -lsmumps -ldmumps -lmumps_common > -lpord -lmpiseq" > --with-mumps-serial=1 > > i get the following error : > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ------------------------------------------------------------------------------- > --with-mumps-lib=['-L~/mumps-git/build/lib', '-lsmumps', '-ldmumps', > '-lmumps_common', '-lpord', '-lmpiseq'] and > --with-mumps-include=['~/mumps-git/build/include'] did not work > > ******************************************************************************* > > I also tried using --with-mumps-dir=~/mumps-git/build without success. > > Thanks for helping. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jun 2 07:54:31 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 2 Jun 2022 07:54:31 -0500 (CDT) Subject: [petsc-users] Petsc with mingw64 In-Reply-To: References: Message-ID: <533147c0-ae85-eb97-2d54-c74caaf7fce4@mcs.anl.gov> On Thu, 2 Jun 2022, hamid badi wrote: > Hi, > > I want to compile petsc with openblas & mumps (sequential) under mingw64. > To do so, I compiled openblas and mumps without any problem. But when it > comes to petsc, configure can't find my mumps. > > I use the following configuration options : > > --with-shared-libraries=1 > --with-openmp=1 > --with-mpi=0 > --with-debugging=0 > --with-scalar-type=real > --with-x=0 > --COPTFLAGS=-O3 > --CXXOPTFLAGS=-O3 > --FOPTFLAGS=-O3 > --with-windows-graphics=0 > --with-openblas=1 > --with-openblas-dir=/mingw64/ The option here should be --with-blaslapack-dir [not --with-openblas=1 --with-openblas-dir=/mingw64/]. But then - the compiler would automatically search this path? If so - avoid specifying this option. > --with-mumps=1 > --with-mumps-include=~/mumps-git/build/include > --with-mumps-lib="-L~/mumps-git/build/lib -lsmumps -ldmumps -lmumps_common use '$HOME' instead of '~' Satish > -lpord -lmpiseq" > --with-mumps-serial=1 > > i get the following error : > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > ------------------------------------------------------------------------------- > --with-mumps-lib=['-L~/mumps-git/build/lib', '-lsmumps', '-ldmumps', > '-lmumps_common', '-lpord', '-lmpiseq'] and > --with-mumps-include=['~/mumps-git/build/include'] did not work > ******************************************************************************* > > I also tried using --with-mumps-dir=~/mumps-git/build without success. > > Thanks for helping. > From patrick.sanan at gmail.com Thu Jun 2 07:59:14 2022 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Thu, 2 Jun 2022 14:59:14 +0200 Subject: [petsc-users] Mat created by DMStag cannot access ghost points In-Reply-To: <859FA50E-F3C2-4E54-AEDC-7C8A70D3FCE1@petsc.dev> References: <859FA50E-F3C2-4E54-AEDC-7C8A70D3FCE1@petsc.dev> Message-ID: Thanks, Barry and Changqing! 
That seems reasonable to me, so I'll make an MR with that change. Am Mi., 1. Juni 2022 um 20:06 Uhr schrieb Barry Smith : > > This appears to be a bug in the DMStag/Mat preallocator code. If you add > after the DMCreateMatrix() line in your code > > PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE)); > > Your code will run correctly. > > Patrick and Matt, > > MatPreallocatorPreallocate_Preallocator() has > > PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, p->nooffproc)); > > to make the assembly of the stag matrix from the preallocator matrix a > little faster, > > but then it never "undoes" this call. Hence the matrix is left in the > state where it will error if someone sets values from a different rank > (which they certainly can using DMStagMatSetValuesStencil(). > > I think you need to clear the NO_OFF_PROC at the end > of MatPreallocatorPreallocate_Preallocator() because just because the > preallocation process never needed communication does not mean that when > someone puts real values in the matrix they will never use communication; > they can put in values any dang way they please. > > I don't know why this bug has not come up before. > > Barry > > > On May 31, 2022, at 11:08 PM, Ye Changqing > wrote: > > Dear all, > > [BugReport.c] is a sample code, [BugReportParallel.output] is the output > when execute BugReport with mpiexec, [BugReportSerial.output] is the output > in serial execution. > > Best, > Changqing > > ------------------------------ > *???:* Dave May > *????:* 2022?5?31? 22:55 > *???:* Ye Changqing > *??:* petsc-users at mcs.anl.gov > *??:* Re: [petsc-users] Mat created by DMStag cannot access ghost points > > > > On Tue 31. May 2022 at 16:28, Ye Changqing > wrote: > > Dear developers of PETSc, > > I encountered a problem when using the DMStag module. The program could be > executed perfectly in serial, while errors are thrown out in parallel > (using mpiexec). Some rows in Mat cannot be accessed in local processes > when looping all elements in DMStag. The DM object I used only has one DOF > in each element. Hence, I could switch to the DMDA module easily, and the > program now is back to normal. > > Some snippets are below. > > Initialise a DMStag object: > PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, > DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, > DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P))); > Created a Mat: > PetscCall(DMCreateMatrix(s_ctx->dm_P, A)); > Loop: > PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, > &ny, &nz, &extrax, &extray, &extraz)); > for (ey = starty; ey < starty + ny; ++ey) > for (ex = startx; ex < startx + nx; ++ex) > { > ... > PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, > &col[0], &val_A[0][0], ADD_VALUES)); // The traceback shows the problem is > in here. > } > > > In addition to the code or MWE, please forward us the complete stack trace > / error thrown to stdout. > > Thanks, > Dave > > > > Best, > Changqing > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 2 08:00:50 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 2 Jun 2022 09:00:50 -0400 Subject: [petsc-users] Mat created by DMStag cannot access ghost points In-Reply-To: References: <859FA50E-F3C2-4E54-AEDC-7C8A70D3FCE1@petsc.dev> Message-ID: On Thu, Jun 2, 2022 at 8:59 AM Patrick Sanan wrote: > Thanks, Barry and Changqing! That seems reasonable to me, so I'll make an > MR with that change. 
> Hi Patrick, In the MR, could you add that option to all places we internally use Preallocator? I think we mean it for those. Thanks, Matt > Am Mi., 1. Juni 2022 um 20:06 Uhr schrieb Barry Smith : > >> >> This appears to be a bug in the DMStag/Mat preallocator code. If you >> add after the DMCreateMatrix() line in your code >> >> PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE)); >> >> Your code will run correctly. >> >> Patrick and Matt, >> >> MatPreallocatorPreallocate_Preallocator() has >> >> PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, p->nooffproc)); >> >> to make the assembly of the stag matrix from the preallocator matrix a >> little faster, >> >> but then it never "undoes" this call. Hence the matrix is left in the >> state where it will error if someone sets values from a different rank >> (which they certainly can using DMStagMatSetValuesStencil(). >> >> I think you need to clear the NO_OFF_PROC at the end >> of MatPreallocatorPreallocate_Preallocator() because just because the >> preallocation process never needed communication does not mean that when >> someone puts real values in the matrix they will never use communication; >> they can put in values any dang way they please. >> >> I don't know why this bug has not come up before. >> >> Barry >> >> >> On May 31, 2022, at 11:08 PM, Ye Changqing >> wrote: >> >> Dear all, >> >> [BugReport.c] is a sample code, [BugReportParallel.output] is the output >> when execute BugReport with mpiexec, [BugReportSerial.output] is the output >> in serial execution. >> >> Best, >> Changqing >> >> ------------------------------ >> *???:* Dave May >> *????:* 2022?5?31? 22:55 >> *???:* Ye Changqing >> *??:* petsc-users at mcs.anl.gov >> *??:* Re: [petsc-users] Mat created by DMStag cannot access ghost points >> >> >> >> On Tue 31. May 2022 at 16:28, Ye Changqing >> wrote: >> >> Dear developers of PETSc, >> >> I encountered a problem when using the DMStag module. The program could >> be executed perfectly in serial, while errors are thrown out in parallel >> (using mpiexec). Some rows in Mat cannot be accessed in local processes >> when looping all elements in DMStag. The DM object I used only has one DOF >> in each element. Hence, I could switch to the DMDA module easily, and the >> program now is back to normal. >> >> Some snippets are below. >> >> Initialise a DMStag object: >> PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, >> DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, >> DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P))); >> Created a Mat: >> PetscCall(DMCreateMatrix(s_ctx->dm_P, A)); >> Loop: >> PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, >> &ny, &nz, &extrax, &extray, &extraz)); >> for (ey = starty; ey < starty + ny; ++ey) >> for (ex = startx; ex < startx + nx; ++ex) >> { >> ... >> PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, >> &col[0], &val_A[0][0], ADD_VALUES)); // The traceback shows the problem is >> in here. >> } >> >> >> In addition to the code or MWE, please forward us the complete stack >> trace / error thrown to stdout. >> >> Thanks, >> Dave >> >> >> >> Best, >> Changqing >> >> >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From bsmith at petsc.dev Thu Jun 2 09:55:17 2022
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 2 Jun 2022 10:55:17 -0400
Subject: [petsc-users] Petsc with mingw64
In-Reply-To:
References:
Message-ID:

   Configure should error with a very helpful message if --with-openblas or --with-openblas-dir are provided on the command line.

> On Jun 2, 2022, at 5:38 AM, hamid badi wrote:
>
> Hi,
>
> I want to compile petsc with openblas & mumps (sequential) under mingw64. To do so, I compiled openblas and mumps without any problem. But when it comes to petsc, configure can't find my mumps.
>
> I use the following configuration options :
>
> --with-shared-libraries=1
> --with-openmp=1
> --with-mpi=0
> --with-debugging=0
> --with-scalar-type=real
> --with-x=0
> --COPTFLAGS=-O3
> --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3
> --with-windows-graphics=0
> --with-openblas=1
> --with-openblas-dir=/mingw64/
> --with-mumps=1
> --with-mumps-include=~/mumps-git/build/include
> --with-mumps-lib="-L~/mumps-git/build/lib -lsmumps -ldmumps -lmumps_common -lpord -lmpiseq"
> --with-mumps-serial=1
>
> i get the following error :
>
> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
> -------------------------------------------------------------------------------
> --with-mumps-lib=['-L~/mumps-git/build/lib', '-lsmumps', '-ldmumps', '-lmumps_common', '-lpord', '-lmpiseq'] and
> --with-mumps-include=['~/mumps-git/build/include'] did not work
> *******************************************************************************
>
> I also tried using --with-mumps-dir=~/mumps-git/build without success.
>
> Thanks for helping.

From lidia.varsh at mail.ioffe.ru Fri Jun 3 05:36:32 2022
From: lidia.varsh at mail.ioffe.ru (Lidia)
Date: Fri, 3 Jun 2022 13:36:32 +0300
Subject: [petsc-users] Sparse linear system solving
In-Reply-To:
References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru>
Message-ID: <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru>

Dear Matt, Barry,

thank you for the information about openMP! Now all processes are loaded well. But we see a strange behaviour of the running times at different iterations, see the description below. Could you please explain the reason and how we can improve it?

We need to quickly solve a big (about 1e6 rows) square sparse non-symmetric matrix many times (about 1e5 times) consecutively. The matrix is constant at every iteration, and the right-hand-side vector B changes slowly (we think its change at every iteration should be less than 0.001 %). So we use the previous solution vector X as the initial guess for the next iteration. An AMG preconditioner and a GMRES solver are used.

We have tested the code using a matrix with 631 000 rows, over 15 consecutive iterations, using the vector X from the previous iterations. The right-hand-side vector B and the matrix A are constant during the whole run. The time of the first iteration is large (about 2 seconds) and quickly decreases over the next iterations (the average time of the last iterations is about 0.00008 s). But some iterations in the middle (# 2 and # 12) take a huge time of 0.999063 seconds (see the figure with the time dynamics attached). This time of 0.999 seconds does not depend on the size of the matrix or on the number of MPI processes, and the jumps also appear if we vary the vector B. Why do these time jumps appear and how can we avoid them?
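For clarity, here is a minimal sketch of such a repeated-solve loop in PETSc (the names A, B, X and nIterations are placeholders; this is only an illustration, not our actual C++ code, and error checking and the update of B are omitted):

   KSP ksp;
   PC  pc;
   PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
   PetscCall(KSPSetOperators(ksp, A, A));                 /* A does not change between solves */
   PetscCall(KSPSetType(ksp, KSPGMRES));
   PetscCall(KSPGetPC(ksp, &pc));
   PetscCall(PCSetType(pc, PCGAMG));
   PetscCall(KSPSetInitialGuessNonzero(ksp, PETSC_TRUE)); /* reuse the previous X as initial guess */
   PetscCall(KSPSetFromOptions(ksp));

   for (PetscInt it = 0; it < nIterations; ++it) {
     /* B would be updated slightly here */
     PetscCall(KSPSolve(ksp, B, X));                      /* X holds the previous solution on entry */
   }
   PetscCall(KSPDestroy(&ksp));

With this structure the GAMG setup should happen only inside the first KSPSolve(), since the operator never changes, which is why the first iteration is expected to be much slower than the following ones.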
The ksp_monitor out for this running (included 15 iterations) using 36 MPI processes and a file with the memory bandwidth information (testSpeed) are also attached. We can provide our C++ script if it is needed. Thanks a lot! Best, Lidiia On 01.06.2022 21:14, Matthew Knepley wrote: > On Wed, Jun 1, 2022 at 1:43 PM Lidia wrote: > > Dear Matt, > > Thank you for the rule of 10,000 variables per process! We have > run ex.5 with matrix 1e4 x 1e4 at our cluster and got a good > performance dynamics (see the figure "performance.png" - > dependency of the solving time in seconds on the number of cores). > We have used GAMG preconditioner (multithread: we have added the > option "-pc_gamg_use_parallel_coarse_grid_solver") and GMRES > solver. And we have set one openMP thread to every MPI process. > Now the ex.5 is working good on many mpi processes! But the > running uses about 100 GB of RAM. > > How we can run ex.5 using many openMP threads without mpi? If we > just change the running command, the cores are not loaded > normally: usually just one core is loaded in 100 % and others are > idle. Sometimes all cores are working in 100 % during 1 second but > then again become idle about 30 seconds. Can the preconditioner > use many threads and how to activate this option? > > > Maye you could describe what you are trying to accomplish? Threads and > processes are not really different, except for memory sharing. > However, sharing large complex data structures rarely works. That is > why they get partitioned and operate effectively as distributed > memory. You would not really save memory by using > threads in this instance, if that is your goal. This is detailed in > the talks in this session (see 2016 PP Minisymposium on this page > https://cse.buffalo.edu/~knepley/relacs.html). > > ? Thanks, > > ? ? ?Matt > > The solving times (the time of the solver work) using 60 openMP > threads is 511 seconds now, and while using 60 MPI processes - > 13.19 seconds. > > ksp_monitor outs for both cases (many openMP threads or many MPI > processes) are attached. > > > Thank you! > > Best, > Lidia > > On 31.05.2022 15:21, Matthew Knepley wrote: >> I have looked at the local logs. First, you have run problems of >> size 12? and 24. As a rule of thumb, you need 10,000 >> variables per process in order to see good speedup. >> >> ? Thanks, >> >> ? ? ?Matt >> >> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley >> wrote: >> >> On Tue, May 31, 2022 at 7:39 AM Lidia >> wrote: >> >> Matt, Mark, thank you much for your answers! >> >> >> Now we have run example # 5 on our computer cluster and >> on the local server and also have not seen any >> performance increase, but by unclear reason running times >> on the local server are much better than on the cluster. >> >> I suspect that you are trying to get speedup without >> increasing the memory bandwidth: >> >> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> ? Thanks, >> >> ? ? ?Matt >> >> Now we will try to run petsc #5 example inside a docker >> container on our server and see if the problem is in our >> environment. I'll write you the results of this test as >> soon as we get it. >> >> The ksp_monitor outs for the 5th test at the current >> local server configuration (for 2 and 4 mpi processes) >> and for the cluster (for 1 and 3 mpi processes) are >> attached . >> >> >> And one more question. Potentially we can use 10 nodes >> and 96 threads at each node on our cluster. 
What do you >> think, which combination of numbers of mpi processes and >> openmp threads may be the best for the 5th example? >> >> Thank you! >> >> >> Best, >> Lidiia >> >> On 31.05.2022 05:42, Mark Adams wrote: >>> And if you see "NO" change in performance I suspect the >>> solver/matrix is all on one processor. >>> (PETSc does not use threads by default so threads should >>> not change anything). >>> >>> As Matt said, it is best to start with a PETSc >>> example?that does something like what you want (parallel >>> linear solve, see src/ksp/ksp/tutorials for examples), >>> and then add your code to it. >>> That way you get the basic infrastructure?in place for >>> you, which is pretty obscure to the uninitiated. >>> >>> Mark >>> >>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley >>> wrote: >>> >>> On Mon, May 30, 2022 at 10:12 PM Lidia >>> wrote: >>> >>> Dear colleagues, >>> >>> Is here anyone who have solved big sparse linear >>> matrices using PETSC? >>> >>> >>> There are lots of publications with this kind of >>> data. Here is one recent one: >>> https://arxiv.org/abs/2204.01722 >>> >>> We have found NO performance improvement while >>> using more and more mpi >>> processes (1-2-3) and open-mp threads (from 1 to >>> 72 threads). Did anyone >>> faced to this problem? Does anyone know any >>> possible reasons of such >>> behaviour? >>> >>> >>> Solver behavior is dependent on the input matrix. >>> The only general-purpose solvers >>> are direct, but they do not scale linearly and have >>> high memory requirements. >>> >>> Thus, in order to make progress you will have to be >>> specific about your matrices. >>> >>> We use AMG preconditioner and GMRES solver from >>> KSP package, as our >>> matrix is large (from 100 000 to 1e+6 rows and >>> columns), sparse, >>> non-symmetric and includes both positive and >>> negative values. But >>> performance problems also exist while using CG >>> solvers with symmetric >>> matrices. >>> >>> >>> There are many PETSc examples, such as example 5 for >>> the Laplacian, that exhibit >>> good scaling with both AMG and GMG. >>> >>> Could anyone help us to set appropriate options >>> of the preconditioner >>> and solver? Now we use default parameters, maybe >>> they are not the best, >>> but we do not know a good combination. Or maybe >>> you could suggest any >>> other pairs of preconditioner+solver for such tasks? >>> >>> I can provide more information: the matrices >>> that we solve, c++ script >>> to run solving using petsc and any statistics >>> obtained by our runs. >>> >>> >>> First, please provide a description of the linear >>> system, and the output of >>> >>> ? -ksp_view -ksp_monitor_true_residual >>> -ksp_converged_reason -log_view >>> >>> for each test case. >>> >>> ? Thanks, >>> >>> ? ? ?Matt >>> >>> Thank you in advance! >>> >>> Best regards, >>> Lidiia Varshavchik, >>> Ioffe Institute, St. Petersburg, Russia >>> >>> >>> >>> -- >>> What most experimenters take for granted before they >>> begin their experiments is infinitely more >>> interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin >> their experiments is infinitely more interesting than any >> results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [lida at head1 build]$ mpirun -n 36 ./petscTest -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -------------------------------------------------------------------------- WARNING: No preset parameters were found for the device that Open MPI detected: Local host: head1 Device name: i40iw0 Device vendor ID: 0x8086 Device vendor part ID: 14290 Default device parameters will be used, which may result in lower performance. You can edit any of the files specified by the btl_openib_device_param_files MCA parameter to set values for your device. NOTE: You can turn off this warning by setting the MCA parameter btl_openib_warn_no_device_params_found to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- No OpenFabrics connection schemes reported that they were able to be used on a specific port. As such, the openib BTL (OpenFabrics support) will be disabled for this port. Local host: head1 Local device: i40iw0 Local port: 1 CPCs attempted: rdmacm, udcm -------------------------------------------------------------------------- Mat size 630834 using block size is 1 5 17524 87620 105144 1 17524 17524 35048 7 17523 122667 140190 21 17523 367989 385512 27 17523 473127 490650 31 17523 543219 560742 2 17524 35048 52572 3 17524 52572 70096 4 17524 70096 87620 0 17524 0 17524 6 17523 105144 122667 8 17523 140190 157713 9 17523 157713 175236 11 17523 192759 210282 12 17523 210282 227805 13 17523 227805 245328 14 17523 245328 262851 20 17523 350466 367989 22 17523 385512 403035 23 17523 403035 420558 25 17523 438081 455604 26 17523 455604 473127 28 17523 490650 508173 30 17523 525696 543219 33 17523 578265 595788 34 17523 595788 613311 35 17523 613311 630834 15 17523 262851 280374 16 17523 280374 297897 17 17523 297897 315420 18 17523 315420 332943 19 17523 332943 350466 24 17523 420558 438081 29 17523 508173 525696 10 17523 175236 192759 32 17523 560742 578265 [head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib.txt / no device params found [head1.hpc:242461] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port Compute with tolerance 0.000010000000000000000818030539 solver is gmres startPC startSolv 0 KSP Residual norm 1.868353493329e+08 0 KSP preconditioned resid norm 1.868353493329e+08 true resid norm 2.165031654579e+06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.132315559206e+08 1 KSP preconditioned resid norm 1.132315559206e+08 true resid norm 6.461246152989e+07 ||r(i)||/||b|| 2.984365673971e+01 2 KSP Residual norm 1.534820972084e+07 2 KSP preconditioned resid norm 1.534820972084e+07 true resid norm 2.426876823961e+07 ||r(i)||/||b|| 
1.120942882672e+01 3 KSP Residual norm 7.539322505186e+06 3 KSP preconditioned resid norm 7.539322505186e+06 true resid norm 1.829739078019e+07 ||r(i)||/||b|| 8.451327139485e+00 4 KSP Residual norm 4.660669278808e+06 4 KSP preconditioned resid norm 4.660669278808e+06 true resid norm 1.744671242073e+07 ||r(i)||/||b|| 8.058409854574e+00 5 KSP Residual norm 3.223391594815e+06 5 KSP preconditioned resid norm 3.223391594815e+06 true resid norm 1.737561446785e+07 ||r(i)||/||b|| 8.025570633618e+00 6 KSP Residual norm 2.240424900880e+06 6 KSP preconditioned resid norm 2.240424900880e+06 true resid norm 1.683362112781e+07 ||r(i)||/||b|| 7.775230949719e+00 7 KSP Residual norm 1.623399472779e+06 7 KSP preconditioned resid norm 1.623399472779e+06 true resid norm 1.624000914301e+07 ||r(i)||/||b|| 7.501049284271e+00 8 KSP Residual norm 1.211518107569e+06 8 KSP preconditioned resid norm 1.211518107569e+06 true resid norm 1.558830757667e+07 ||r(i)||/||b|| 7.200036795627e+00 9 KSP Residual norm 9.642201969240e+05 9 KSP preconditioned resid norm 9.642201969240e+05 true resid norm 1.486473650844e+07 ||r(i)||/||b|| 6.865828717562e+00 10 KSP Residual norm 7.867651557046e+05 10 KSP preconditioned resid norm 7.867651557046e+05 true resid norm 1.396084153269e+07 ||r(i)||/||b|| 6.448331368812e+00 11 KSP Residual norm 7.078405789961e+05 11 KSP preconditioned resid norm 7.078405789961e+05 true resid norm 1.296873719329e+07 ||r(i)||/||b|| 5.990091260724e+00 12 KSP Residual norm 6.335098563709e+05 12 KSP preconditioned resid norm 6.335098563709e+05 true resid norm 1.164201582227e+07 ||r(i)||/||b|| 5.377295892022e+00 13 KSP Residual norm 5.397665070507e+05 13 KSP preconditioned resid norm 5.397665070507e+05 true resid norm 1.042661489959e+07 ||r(i)||/||b|| 4.815917992485e+00 14 KSP Residual norm 4.549629296863e+05 14 KSP preconditioned resid norm 4.549629296863e+05 true resid norm 9.420542232153e+06 ||r(i)||/||b|| 4.351226095114e+00 15 KSP Residual norm 3.627838605442e+05 15 KSP preconditioned resid norm 3.627838605442e+05 true resid norm 8.546289749804e+06 ||r(i)||/||b|| 3.947420229042e+00 16 KSP Residual norm 2.974632184520e+05 16 KSP preconditioned resid norm 2.974632184520e+05 true resid norm 7.707507230485e+06 ||r(i)||/||b|| 3.559997478181e+00 17 KSP Residual norm 2.584437744774e+05 17 KSP preconditioned resid norm 2.584437744774e+05 true resid norm 6.996748201244e+06 ||r(i)||/||b|| 3.231707114510e+00 18 KSP Residual norm 2.172287358399e+05 18 KSP preconditioned resid norm 2.172287358399e+05 true resid norm 6.008578157843e+06 ||r(i)||/||b|| 2.775284206646e+00 19 KSP Residual norm 1.807320553225e+05 19 KSP preconditioned resid norm 1.807320553225e+05 true resid norm 5.166440962968e+06 ||r(i)||/||b|| 2.386311974719e+00 20 KSP Residual norm 1.583700438237e+05 20 KSP preconditioned resid norm 1.583700438237e+05 true resid norm 4.613820989743e+06 ||r(i)||/||b|| 2.131063986978e+00 21 KSP Residual norm 1.413879944302e+05 21 KSP preconditioned resid norm 1.413879944302e+05 true resid norm 4.151504476178e+06 ||r(i)||/||b|| 1.917525994318e+00 22 KSP Residual norm 1.228172205521e+05 22 KSP preconditioned resid norm 1.228172205521e+05 true resid norm 3.630290527838e+06 ||r(i)||/||b|| 1.676784041545e+00 23 KSP Residual norm 1.084793002546e+05 23 KSP preconditioned resid norm 1.084793002546e+05 true resid norm 3.185566371074e+06 ||r(i)||/||b|| 1.471371729986e+00 24 KSP Residual norm 9.520569914833e+04 24 KSP preconditioned resid norm 9.520569914833e+04 true resid norm 2.811378949429e+06 ||r(i)||/||b|| 1.298539420189e+00 25 KSP 
Residual norm 8.331027569193e+04 25 KSP preconditioned resid norm 8.331027569193e+04 true resid norm 2.487128345424e+06 ||r(i)||/||b|| 1.148772277839e+00 26 KSP Residual norm 7.116546817077e+04 26 KSP preconditioned resid norm 7.116546817077e+04 true resid norm 2.128784852233e+06 ||r(i)||/||b|| 9.832580728002e-01 27 KSP Residual norm 6.107201042673e+04 27 KSP preconditioned resid norm 6.107201042673e+04 true resid norm 1.816742057822e+06 ||r(i)||/||b|| 8.391295591358e-01 28 KSP Residual norm 5.407959454186e+04 28 KSP preconditioned resid norm 5.407959454186e+04 true resid norm 1.590698721931e+06 ||r(i)||/||b|| 7.347230783285e-01 29 KSP Residual norm 4.859208455279e+04 29 KSP preconditioned resid norm 4.859208455279e+04 true resid norm 1.405619902078e+06 ||r(i)||/||b|| 6.492375753974e-01 30 KSP Residual norm 4.463327440008e+04 30 KSP preconditioned resid norm 4.463327440008e+04 true resid norm 1.258789113490e+06 ||r(i)||/||b|| 5.814183413104e-01 31 KSP Residual norm 3.927742507325e+04 31 KSP preconditioned resid norm 3.927742507325e+04 true resid norm 1.086402490838e+06 ||r(i)||/||b|| 5.017951994097e-01 32 KSP Residual norm 3.417683630748e+04 32 KSP preconditioned resid norm 3.417683630748e+04 true resid norm 9.566603594382e+05 ||r(i)||/||b|| 4.418689941159e-01 33 KSP Residual norm 3.002775921838e+04 33 KSP preconditioned resid norm 3.002775921838e+04 true resid norm 8.429546731968e+05 ||r(i)||/||b|| 3.893498145460e-01 34 KSP Residual norm 2.622152046131e+04 34 KSP preconditioned resid norm 2.622152046131e+04 true resid norm 7.578781071384e+05 ||r(i)||/||b|| 3.500540537296e-01 35 KSP Residual norm 2.264910466846e+04 35 KSP preconditioned resid norm 2.264910466846e+04 true resid norm 6.684892523160e+05 ||r(i)||/||b|| 3.087665027447e-01 36 KSP Residual norm 1.970721593805e+04 36 KSP preconditioned resid norm 1.970721593805e+04 true resid norm 5.905536805578e+05 ||r(i)||/||b|| 2.727690744422e-01 37 KSP Residual norm 1.666104858674e+04 37 KSP preconditioned resid norm 1.666104858674e+04 true resid norm 5.172223947409e+05 ||r(i)||/||b|| 2.388983060118e-01 38 KSP Residual norm 1.432004409785e+04 38 KSP preconditioned resid norm 1.432004409785e+04 true resid norm 4.593351142808e+05 ||r(i)||/||b|| 2.121609230559e-01 39 KSP Residual norm 1.211549914084e+04 39 KSP preconditioned resid norm 1.211549914084e+04 true resid norm 4.019170298644e+05 ||r(i)||/||b|| 1.856402556583e-01 40 KSP Residual norm 1.061599294842e+04 40 KSP preconditioned resid norm 1.061599294842e+04 true resid norm 3.586589723898e+05 ||r(i)||/||b|| 1.656599207828e-01 41 KSP Residual norm 9.577489574913e+03 41 KSP preconditioned resid norm 9.577489574913e+03 true resid norm 3.221505690964e+05 ||r(i)||/||b|| 1.487971635034e-01 42 KSP Residual norm 8.221576307371e+03 42 KSP preconditioned resid norm 8.221576307371e+03 true resid norm 2.745213067979e+05 ||r(i)||/||b|| 1.267978258965e-01 43 KSP Residual norm 6.898384710028e+03 43 KSP preconditioned resid norm 6.898384710028e+03 true resid norm 2.330710645170e+05 ||r(i)||/||b|| 1.076524973776e-01 44 KSP Residual norm 6.087330352788e+03 44 KSP preconditioned resid norm 6.087330352788e+03 true resid norm 2.058183089407e+05 ||r(i)||/||b|| 9.506480355857e-02 45 KSP Residual norm 5.207144067562e+03 45 KSP preconditioned resid norm 5.207144067562e+03 true resid norm 1.745194864065e+05 ||r(i)||/||b|| 8.060828396546e-02 46 KSP Residual norm 4.556037825199e+03 46 KSP preconditioned resid norm 4.556037825199e+03 true resid norm 1.551715592432e+05 ||r(i)||/||b|| 7.167172771584e-02 47 KSP Residual 
norm 3.856329202278e+03 47 KSP preconditioned resid norm 3.856329202278e+03 true resid norm 1.315660202980e+05 ||r(i)||/||b|| 6.076863588562e-02 48 KSP Residual norm 3.361878313389e+03 48 KSP preconditioned resid norm 3.361878313389e+03 true resid norm 1.147746368397e+05 ||r(i)||/||b|| 5.301291396685e-02 49 KSP Residual norm 2.894852363045e+03 49 KSP preconditioned resid norm 2.894852363045e+03 true resid norm 9.951811967458e+04 ||r(i)||/||b|| 4.596612685273e-02 50 KSP Residual norm 2.576639763678e+03 50 KSP preconditioned resid norm 2.576639763678e+03 true resid norm 8.828512403741e+04 ||r(i)||/||b|| 4.077775207151e-02 51 KSP Residual norm 2.176356645511e+03 51 KSP preconditioned resid norm 2.176356645511e+03 true resid norm 7.535533182060e+04 ||r(i)||/||b|| 3.480564898957e-02 52 KSP Residual norm 1.909590120581e+03 52 KSP preconditioned resid norm 1.909590120581e+03 true resid norm 6.643741378378e+04 ||r(i)||/||b|| 3.068657848177e-02 53 KSP Residual norm 1.625794696835e+03 53 KSP preconditioned resid norm 1.625794696835e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 53 ################################################################################# SOLV gmres iter 0 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 53 time 2.000408 s(2000407820.91200017929077148438 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 1 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000082 s(82206.13200000001234002411 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 2 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.999076 s(999076088.56700003147125244141 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 3 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80689.84000000001105945557 us) ################################################################################# startPC startSolv 0 KSP Residual norm 
1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 4 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000079 s(79139.94299999999930150807 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 5 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000065 s(65399.49300000000948784873 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 6 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(79554.38999999999941792339 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 7 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80431.21900000001187436283 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 8 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80255.19100000000617001206 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 9 Relative error is 1.009408197(min 1.000000000, max 
2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80568.19700000000011641532 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 10 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000078 s(78323.06299999999464489520 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 11 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000072 s(71933.38600000001315493137 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 12 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.999063 s(999063438.25300002098083496094 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 13 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000070 s(69632.13800000000628642738 us) ################################################################################# startPC startSolv 0 KSP Residual norm 1.625794716222e+03 0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02 Linear solve converged due to CONVERGED_RTOL iterations 0 ################################################################################# SOLV gmres iter 14 Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000073 s(73498.46099999999569263309 us) ################################################################################# nohup: appending output to ?nohup.out? 
nohup: failed to run command ?localc?: No such file or directory **************************************** *********************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------- ./petscTest on a named head1.hpc with 36 processors, by lida Fri Jun 3 13:23:29 2022 Using Petsc Release Version 3.17.1, unknown Max Max/Min Avg Total Time (sec): 8.454e+01 1.440 5.941e+01 Objects: 7.030e+02 1.000 7.030e+02 Flops: 1.018e+09 2.522 5.062e+08 1.822e+10 Flops/sec: 1.734e+07 3.633 8.567e+06 3.084e+08 MPI Msg Count: 5.257e+04 1.584 4.249e+04 1.530e+06 MPI Msg Len (bytes): 1.453e+09 14.133 2.343e+04 3.585e+10 MPI Reductions: 7.800e+02 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 5.9406e+01 100.0% 1.8223e+10 100.0% 1.530e+06 100.0% 2.343e+04 100.0% 7.620e+02 97.7% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 75 1.0 3.5652e+0155.4 0.00e+00 0.0 1.9e+04 8.0e+00 7.5e+01 8 0 1 0 10 8 0 1 0 10 0 BuildTwoSidedF 51 1.0 3.5557e+0157.2 0.00e+00 0.0 8.5e+03 3.8e+05 5.1e+01 8 0 1 9 7 8 0 1 9 7 0 MatMult 1503 1.0 2.8036e+00 1.3 6.78e+08 3.8 1.1e+06 2.4e+04 4.0e+00 4 53 74 75 1 4 53 74 75 1 3473 MatMultAdd 328 1.0 2.2706e-01 2.0 2.43e+07 2.3 1.5e+05 3.0e+03 0.0e+00 0 2 10 1 0 0 2 10 1 0 1985 MatMultTranspose 328 1.0 4.6323e-01 2.6 4.98e+07 4.7 1.5e+05 3.0e+03 4.0e+00 0 3 10 1 1 0 3 10 1 1 1090 MatSolve 82 1.0 2.3332e-04 2.0 1.23e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 190 MatLUFactorSym 1 1.0 1.2696e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 2.1874e-05 2.1 1.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 26 MatConvert 5 1.0 3.0352e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00 0 0 0 0 1 0 0 0 0 1 0 MatScale 12 1.0 9.6534e-03 2.1 2.13e+06 4.1 2.8e+03 2.0e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 2795 MatResidual 328 1.0 5.8371e-01 1.2 1.50e+08 4.7 2.3e+05 2.0e+04 0.0e+00 1 10 15 13 0 1 10 15 13 0 3018 MatAssemblyBegin 70 1.0 3.5348e+0142.1 0.00e+00 0.0 8.5e+03 3.8e+05 2.4e+01 9 0 1 9 3 9 0 1 9 3 0 MatAssemblyEnd 70 1.0 2.0657e+00 1.0 7.97e+06573.7 0.0e+00 0.0e+00 8.1e+01 3 0 0 0 10 3 0 0 0 11 7 MatGetRowIJ 1 1.0 4.5262e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMats 1 1.0 3.6275e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 0 MatCreateSubMat 2 1.0 2.2424e-03 1.1 0.00e+00 0.0 3.5e+01 1.2e+03 2.8e+01 0 0 0 0 4 0 0 0 0 4 0 MatGetOrdering 1 1.0 3.3737e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCoarsen 4 1.0 2.7961e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01 0 0 2 4 3 0 0 2 4 3 0 MatZeroEntries 4 1.0 6.2166e-04176.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 1.0507e-02 1.1 2.81e+04 1.6 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 63 MatTranspose 8 1.0 3.0698e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultSym 12 1.0 1.5384e-01 1.2 0.00e+00 0.0 8.5e+03 2.0e+04 3.6e+01 0 0 1 0 5 0 0 1 0 5 0 MatMatMultNum 12 1.0 7.4942e-02 1.3 1.29e+07 8.0 2.8e+03 2.0e+04 4.0e+00 0 1 0 0 1 0 1 0 0 1 1636 MatPtAPSymbolic 4 1.0 5.7252e-01 1.0 0.00e+00 0.0 1.4e+04 4.8e+04 2.8e+01 1 0 1 2 4 1 0 1 2 4 0 MatPtAPNumeric 4 1.0 9.4753e-01 1.0 3.99e+0714.7 3.7e+03 1.0e+05 2.0e+01 2 1 0 1 3 2 1 0 1 3 256 MatTrnMatMultSym 1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01 4 0 0 11 2 4 0 0 11 2 0 MatRedundantMat 1 1.0 3.9176e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 0 MatMPIConcateSeq 1 1.0 3.0197e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetLocalMat 13 1.0 8.8896e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 12 1.0 2.0289e-01 1.8 0.00e+00 0.0 2.0e+04 3.8e+04 0.0e+00 0 0 
1 2 0 0 0 1 2 0 0 VecMDot 93 1.0 2.7982e-01 3.2 5.32e+07 1.0 0.0e+00 0.0e+00 9.3e+01 0 10 0 0 12 0 10 0 0 12 6711 VecNorm 209 1.0 3.7674e-01 4.3 6.40e+06 1.0 0.0e+00 0.0e+00 2.1e+02 0 1 0 0 27 0 1 0 0 27 591 VecScale 112 1.0 5.5057e-04 1.5 1.50e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 91073 VecCopy 1071 1.0 1.4028e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1293 1.0 6.6862e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 72 1.0 1.4381e-03 1.6 2.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 60574 VecAYPX 2036 1.0 2.0018e-02 1.7 1.96e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 23728 VecAXPBYCZ 656 1.0 7.2191e-03 1.7 2.30e+07 1.6 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 74818 VecMAXPY 151 1.0 8.7415e-02 1.3 1.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 43052 VecAssemblyBegin 28 1.0 2.6426e-01100.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.7e+01 0 0 0 0 3 0 0 0 0 4 0 VecAssemblyEnd 28 1.0 4.3833e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 1356 1.0 1.7490e-02 1.5 9.52e+06 1.6 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12767 VecScatterBegin 2341 1.0 5.3816e-01 5.2 0.00e+00 0.0 1.5e+06 2.0e+04 1.6e+01 1 0 95 81 2 1 0 95 81 2 0 VecScatterEnd 2341 1.0 2.4118e+00 1.9 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 23 VecNormalize 112 1.0 7.0541e-02 2.8 4.50e+06 1.1 0.0e+00 0.0e+00 1.1e+02 0 1 0 0 14 0 1 0 0 15 2132 SFSetGraph 31 1.0 2.5626e-0218.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 24 1.0 1.0973e-01 1.6 0.00e+00 0.0 2.9e+04 1.5e+04 2.4e+01 0 0 2 1 3 0 0 2 1 3 0 SFBcastBegin 28 1.0 2.3203e-02 3.2 0.00e+00 0.0 2.5e+04 5.7e+04 0.0e+00 0 0 2 4 0 0 0 2 4 0 0 SFBcastEnd 28 1.0 7.4158e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 2369 1.0 4.4483e-0118.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 SFUnpack 2369 1.0 3.6313e-0271.6 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1497 KSPSetUp 11 1.0 5.1386e-03 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 0 KSPSolve 15 1.0 3.4692e+00 1.0 9.40e+08 2.4 1.4e+06 1.9e+04 2.3e+02 6 95 90 73 30 6 95 90 73 30 4972 KSPGMRESOrthog 93 1.0 3.1573e-01 2.5 1.06e+08 1.0 0.0e+00 0.0e+00 9.3e+01 0 21 0 0 12 0 21 0 0 12 11896 PCGAMGGraph_AGG 4 1.0 3.0876e-01 1.0 1.83e+06 4.7 8.5e+03 1.3e+04 3.6e+01 1 0 1 0 5 1 0 1 0 5 70 PCGAMGCoarse_AGG 4 1.0 2.8281e+00 1.1 0.00e+00 0.0 4.8e+04 1.3e+05 4.7e+01 5 0 3 17 6 5 0 3 17 6 0 PCGAMGProl_AGG 4 1.0 3.2106e-01 1.8 0.00e+00 0.0 1.2e+04 2.4e+04 6.3e+01 0 0 1 1 8 0 0 1 1 8 0 PCGAMGPOpt_AGG 4 1.0 2.6704e-01 1.0 2.82e+07 3.0 4.5e+04 1.8e+04 1.6e+02 0 2 3 2 21 0 2 3 2 22 1589 GAMG: createProl 4 1.0 3.5902e+00 1.0 3.01e+07 3.0 1.1e+05 6.6e+04 3.1e+02 6 2 7 21 40 6 2 7 21 41 124 Create Graph 4 1.0 3.0346e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00 0 0 0 0 1 0 0 0 0 1 0 Filter Graph 4 1.0 2.8303e-01 1.0 1.83e+06 4.7 2.8e+03 2.0e+04 3.2e+01 0 0 0 0 4 0 0 0 0 4 76 MIS/Agg 4 1.0 2.7965e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01 0 0 2 4 3 0 0 2 4 3 0 SA: col data 4 1.0 2.1554e-02 1.2 0.00e+00 0.0 9.2e+03 2.9e+04 2.7e+01 0 0 1 1 3 0 0 1 1 4 0 SA: frmProl0 4 1.0 1.5549e-01 1.0 0.00e+00 0.0 2.6e+03 5.5e+03 2.0e+01 0 0 0 0 3 0 0 0 0 3 0 SA: smooth 4 1.0 1.8642e-01 1.0 2.16e+06 4.0 1.1e+04 2.0e+04 5.2e+01 0 0 1 1 7 0 0 1 1 7 148 GAMG: partLevel 4 1.0 1.5208e+00 1.0 3.99e+0714.7 1.8e+04 5.9e+04 1.0e+02 3 1 1 3 13 3 1 1 3 13 159 repartition 1 1.0 3.5234e-03 1.0 0.00e+00 0.0 8.0e+01 5.3e+02 5.3e+01 0 0 0 0 7 0 0 0 0 7 0 
Invert-Sort 1 1.0 3.3143e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0 Move A 1 1.0 1.1502e-03 1.2 0.00e+00 0.0 3.5e+01 1.2e+03 1.5e+01 0 0 0 0 2 0 0 0 0 2 0 Move P 1 1.0 1.4414e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 2 0 0 0 0 2 0 PCGAMG Squ l00 1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01 4 0 0 11 2 4 0 0 11 2 0 PCGAMG Gal l00 1 1.0 1.2443e+00 1.0 1.06e+07 5.0 9.0e+03 1.1e+05 1.2e+01 2 1 1 3 2 2 1 1 3 2 135 PCGAMG Opt l00 1 1.0 1.4166e-01 1.0 5.88e+05 1.7 4.5e+03 4.8e+04 1.0e+01 0 0 0 1 1 0 0 0 1 1 130 PCGAMG Gal l01 1 1.0 2.5946e-01 1.0 2.64e+07543.5 6.8e+03 1.2e+04 1.2e+01 0 0 0 0 2 0 0 0 0 2 271 PCGAMG Opt l01 1 1.0 2.9430e-02 1.0 1.11e+06444.1 4.9e+03 1.5e+03 1.0e+01 0 0 0 0 1 0 0 0 0 1 90 PCGAMG Gal l02 1 1.0 1.2971e-02 1.0 3.34e+06 0.0 2.1e+03 1.3e+03 1.2e+01 0 0 0 0 2 0 0 0 0 2 343 PCGAMG Opt l02 1 1.0 3.6016e-03 1.0 2.61e+05 0.0 2.0e+03 2.1e+02 1.0e+01 0 0 0 0 1 0 0 0 0 1 97 PCGAMG Gal l03 1 1.0 5.7189e-04 1.1 1.45e+04 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 2 0 0 0 0 2 25 PCGAMG Opt l03 1 1.0 4.1255e-04 1.1 5.02e+03 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 12 PCSetUp 1 1.0 5.1101e+00 1.0 6.99e+07 5.5 1.3e+05 6.5e+04 4.6e+02 9 4 9 24 58 9 4 9 24 60 135 PCApply 82 1.0 2.5729e+00 1.0 7.17e+08 4.0 1.2e+06 1.6e+04 1.4e+01 4 49 80 53 2 4 49 80 53 2 3488 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 8 2 1248 0. Matrix 119 66 25244856 0. Matrix Coarsen 4 4 2688 0. Vector 402 293 26939160 0. Index Set 67 58 711824 0. Star Forest Graph 49 28 36128 0. Krylov Solver 11 4 124000 0. Preconditioner 11 4 3712 0. Viewer 1 0 0 0. PetscRandom 4 4 2840 0. Distributed Mesh 9 4 20512 0. Discrete System 9 4 4096 0. Weak Form 9 4 2656 0. 
======================================================================================================================== Average time to get PetscTime(): 2.86847e-08 Average time for MPI_Barrier(): 1.14387e-05 Average time for zero size MPI_Send(): 3.53196e-06 #PETSc Option Table entries: -ksp_converged_reason -ksp_monitor -ksp_monitor_true_residual -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make ----------------------------------------- Libraries compiled on 2022-05-25 10:03:14 on head1.hpc Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core Using PETSc directory: /home/lida Using PETSc arch: ----------------------------------------- Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 ----------------------------------------- Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include ----------------------------------------- Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl ----------------------------------------- [lida at head1 build]$ -------------- next part -------------- [lida at head1 petsc]$ export OMP_NUM_THREADS=1 [lida at head1 petsc]$ make streams NPMAX=8 2>/dev/null /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -o MPIVersion.o -c -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 
-fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -I/home/lida/Code/petsc/include -I/home/lida/Code/petsc/arch-linux-c-opt/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 `pwd`/MPIVersion.c Running streams with '/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpiexec --oversubscribe ' using 'NPMAX=8' 1 16106.3237 Rate (MB/s) 2 28660.2442 Rate (MB/s) 1.77944 3 42041.2053 Rate (MB/s) 2.61023 4 57109.2439 Rate (MB/s) 3.54577 5 66797.5164 Rate (MB/s) 4.14729 6 79516.0361 Rate (MB/s) 4.93695 7 88664.6509 Rate (MB/s) 5.50497 8 101902.1854 Rate (MB/s) 6.32685 ------------------------------------------------ Unable to open matplotlib to plot speedup Unable to open matplotlib to plot speedup -------------- next part -------------- A non-text attachment was scrubbed... Name: time per iterations.png Type: image/png Size: 15274 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: without # 0,2,12 iterations.png Type: image/png Size: 17069 bytes Desc: not available URL: From mfadams at lbl.gov Fri Jun 3 07:17:41 2022 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 3 Jun 2022 08:17:41 -0400 Subject: [petsc-users] Sparse linear system solving In-Reply-To: <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru> References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru> Message-ID: Your timing data in the first plot seems to have random integers (2,1,1) added to random iterations (0,2,12). Perhaps there is a bug in your test setup? Mark On Fri, Jun 3, 2022 at 6:42 AM Lidia wrote: > Dear Matt, Barry, > > thank you for the information about openMP! > > Now all processes are loaded well. But we see a strange behaviour of > running times at different iterations, see description below. Could you > please explain us the reason and how we can improve it? > > We need to quickly solve a big (about 1e6 rows) square sparse > non-symmetric matrix many times (about 1e5 times) consequently. Matrix is > constant at every iteration, and the right-side vector B is slowly changed > (we think that its change at every iteration should be less then 0.001 %). > So we use every previous solution vector X as an initial guess for the next > iteration. AMG preconditioner and GMRES solver are used. > > We have tested the code using a matrix with 631 000 rows, during 15 > consequent iterations, using vector X from the previous iterations. > Right-side vector B and matrix A are constant during the whole running. The > time of the first iteration is large (about 2 seconds) and is quickly > decreased to the next iterations (average time of last iterations were > about 0.00008 s). But some iterations in the middle (# 2 and # 12) have > huge time - 0.999063 second (see the figure with time dynamics attached). > This time of 0.999 second does not depend on the size of a matrix, on the > number of MPI processes, these time jumps also exist if we vary vector B. > Why these time jumps appear and how we can avoid them? > > The ksp_monitor out for this running (included 15 iterations) using 36 MPI > processes and a file with the memory bandwidth information (testSpeed) are > also attached. We can provide our C++ script if it is needed. > > Thanks a lot! 
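>
> The core of the loop we time is roughly the following sketch (the real
> C++ script is longer; A, B and X come from our own assembly code, the
> options above are set on the command line, and error checking is
> omitted):
>
> #include <petscksp.h>
>
> Mat A;            /* constant 631 000 x 631 000 sparse matrix */
> Vec B, X;         /* right-hand side and solution             */
> KSP ksp;
> PC  pc;
> PetscLogDouble t0, t1;
>
> /* ... assemble A, B and an initial X here ... */
>
> KSPCreate(PETSC_COMM_WORLD, &ksp);
> KSPSetOperators(ksp, A, A);                 /* A never changes          */
> KSPSetType(ksp, KSPGMRES);
> KSPGetPC(ksp, &pc);
> PCSetType(pc, PCGAMG);                      /* AMG preconditioner       */
> KSPSetInitialGuessNonzero(ksp, PETSC_TRUE); /* reuse previous X         */
> KSPSetFromOptions(ksp);
>
> for (int iter = 0; iter < 15; ++iter) {
>   /* in the real code B is updated here (changes < 0.001 %);
>      in this test B stays constant                            */
>   PetscTime(&t0);
>   KSPSolve(ksp, B, X);                      /* X keeps the old solution */
>   PetscTime(&t1);
>   PetscPrintf(PETSC_COMM_WORLD, "SOLV gmres iter %d time %f s\n",
>               iter, (double)(t1 - t0));
> }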
> Best, > Lidiia > > > > On 01.06.2022 21:14, Matthew Knepley wrote: > > On Wed, Jun 1, 2022 at 1:43 PM Lidia wrote: > >> Dear Matt, >> >> Thank you for the rule of 10,000 variables per process! We have run ex.5 >> with matrix 1e4 x 1e4 at our cluster and got a good performance dynamics >> (see the figure "performance.png" - dependency of the solving time in >> seconds on the number of cores). We have used GAMG preconditioner >> (multithread: we have added the option " >> -pc_gamg_use_parallel_coarse_grid_solver") and GMRES solver. And we have >> set one openMP thread to every MPI process. Now the ex.5 is working good on >> many mpi processes! But the running uses about 100 GB of RAM. >> >> How we can run ex.5 using many openMP threads without mpi? If we just >> change the running command, the cores are not loaded normally: usually just >> one core is loaded in 100 % and others are idle. Sometimes all cores are >> working in 100 % during 1 second but then again become idle about 30 >> seconds. Can the preconditioner use many threads and how to activate this >> option? >> > > Maye you could describe what you are trying to accomplish? Threads and > processes are not really different, except for memory sharing. However, > sharing large complex data structures rarely works. That is why they get > partitioned and operate effectively as distributed memory. You would not > really save memory by using > threads in this instance, if that is your goal. This is detailed in the > talks in this session (see 2016 PP Minisymposium on this page > https://cse.buffalo.edu/~knepley/relacs.html). > > Thanks, > > Matt > > >> The solving times (the time of the solver work) using 60 openMP threads >> is 511 seconds now, and while using 60 MPI processes - 13.19 seconds. >> >> ksp_monitor outs for both cases (many openMP threads or many MPI >> processes) are attached. >> >> >> Thank you! >> Best, >> Lidia >> >> On 31.05.2022 15:21, Matthew Knepley wrote: >> >> I have looked at the local logs. First, you have run problems of size 12 >> and 24. As a rule of thumb, you need 10,000 >> variables per process in order to see good speedup. >> >> Thanks, >> >> Matt >> >> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley >> wrote: >> >>> On Tue, May 31, 2022 at 7:39 AM Lidia wrote: >>> >>>> Matt, Mark, thank you much for your answers! >>>> >>>> >>>> Now we have run example # 5 on our computer cluster and on the local >>>> server and also have not seen any performance increase, but by unclear >>>> reason running times on the local server are much better than on the >>>> cluster. >>>> >>> I suspect that you are trying to get speedup without increasing the >>> memory bandwidth: >>> >>> >>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >>> >>> Thanks, >>> >>> Matt >>> >>>> Now we will try to run petsc #5 example inside a docker container on >>>> our server and see if the problem is in our environment. I'll write you the >>>> results of this test as soon as we get it. >>>> >>>> The ksp_monitor outs for the 5th test at the current local server >>>> configuration (for 2 and 4 mpi processes) and for the cluster (for 1 and 3 >>>> mpi processes) are attached . >>>> >>>> >>>> And one more question. Potentially we can use 10 nodes and 96 threads >>>> at each node on our cluster. What do you think, which combination of >>>> numbers of mpi processes and openmp threads may be the best for the 5th >>>> example? >>>> >>>> Thank you! 
>>>> >>>> >>>> Best, >>>> Lidiia >>>> >>>> On 31.05.2022 05:42, Mark Adams wrote: >>>> >>>> And if you see "NO" change in performance I suspect the solver/matrix >>>> is all on one processor. >>>> (PETSc does not use threads by default so threads should not change >>>> anything). >>>> >>>> As Matt said, it is best to start with a PETSc example that does >>>> something like what you want (parallel linear solve, see >>>> src/ksp/ksp/tutorials for examples), and then add your code to it. >>>> That way you get the basic infrastructure in place for you, which is >>>> pretty obscure to the uninitiated. >>>> >>>> Mark >>>> >>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, May 30, 2022 at 10:12 PM Lidia >>>>> wrote: >>>>> >>>>>> Dear colleagues, >>>>>> >>>>>> Is here anyone who have solved big sparse linear matrices using PETSC? >>>>>> >>>>> >>>>> There are lots of publications with this kind of data. Here is one >>>>> recent one: https://arxiv.org/abs/2204.01722 >>>>> >>>>> >>>>>> We have found NO performance improvement while using more and more >>>>>> mpi >>>>>> processes (1-2-3) and open-mp threads (from 1 to 72 threads). Did >>>>>> anyone >>>>>> faced to this problem? Does anyone know any possible reasons of such >>>>>> behaviour? >>>>>> >>>>> >>>>> Solver behavior is dependent on the input matrix. The only >>>>> general-purpose solvers >>>>> are direct, but they do not scale linearly and have high memory >>>>> requirements. >>>>> >>>>> Thus, in order to make progress you will have to be specific about >>>>> your matrices. >>>>> >>>>> >>>>>> We use AMG preconditioner and GMRES solver from KSP package, as our >>>>>> matrix is large (from 100 000 to 1e+6 rows and columns), sparse, >>>>>> non-symmetric and includes both positive and negative values. But >>>>>> performance problems also exist while using CG solvers with symmetric >>>>>> matrices. >>>>>> >>>>> >>>>> There are many PETSc examples, such as example 5 for the Laplacian, >>>>> that exhibit >>>>> good scaling with both AMG and GMG. >>>>> >>>>> >>>>>> Could anyone help us to set appropriate options of the preconditioner >>>>>> and solver? Now we use default parameters, maybe they are not the >>>>>> best, >>>>>> but we do not know a good combination. Or maybe you could suggest any >>>>>> other pairs of preconditioner+solver for such tasks? >>>>>> >>>>>> I can provide more information: the matrices that we solve, c++ >>>>>> script >>>>>> to run solving using petsc and any statistics obtained by our runs. >>>>>> >>>>> >>>>> First, please provide a description of the linear system, and the >>>>> output of >>>>> >>>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view >>>>> >>>>> for each test case. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thank you in advance! >>>>>> >>>>>> Best regards, >>>>>> Lidiia Varshavchik, >>>>>> Ioffe Institute, St. Petersburg, Russia >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 3 07:19:03 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Jun 2022 08:19:03 -0400 Subject: [petsc-users] Sparse linear system solving In-Reply-To: <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru> References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru> Message-ID: On Fri, Jun 3, 2022 at 6:42 AM Lidia wrote: > Dear Matt, Barry, > > thank you for the information about openMP! > > Now all processes are loaded well. But we see a strange behaviour of > running times at different iterations, see description below. Could you > please explain us the reason and how we can improve it? > > We need to quickly solve a big (about 1e6 rows) square sparse > non-symmetric matrix many times (about 1e5 times) consequently. Matrix is > constant at every iteration, and the right-side vector B is slowly changed > (we think that its change at every iteration should be less then 0.001 %). > So we use every previous solution vector X as an initial guess for the next > iteration. AMG preconditioner and GMRES solver are used. > > We have tested the code using a matrix with 631 000 rows, during 15 > consequent iterations, using vector X from the previous iterations. > Right-side vector B and matrix A are constant during the whole running. The > time of the first iteration is large (about 2 seconds) and is quickly > decreased to the next iterations (average time of last iterations were > about 0.00008 s). But some iterations in the middle (# 2 and # 12) have > huge time - 0.999063 second (see the figure with time dynamics attached). > This time of 0.999 second does not depend on the size of a matrix, on the > number of MPI processes, these time jumps also exist if we vary vector B. > Why these time jumps appear and how we can avoid them? > > PETSc is not taking this time. It must come from somewhere else in your code. Notice that no iterations are taken for any subsequent solves, so no operations other than the residual norm check (and preconditioner application) are being performed. Thanks, Matt > The ksp_monitor out for this running (included 15 iterations) using 36 MPI > processes and a file with the memory bandwidth information (testSpeed) are > also attached. We can provide our C++ script if it is needed. > > Thanks a lot! > Best, > Lidiia > > > > On 01.06.2022 21:14, Matthew Knepley wrote: > > On Wed, Jun 1, 2022 at 1:43 PM Lidia wrote: > >> Dear Matt, >> >> Thank you for the rule of 10,000 variables per process! We have run ex.5 >> with matrix 1e4 x 1e4 at our cluster and got a good performance dynamics >> (see the figure "performance.png" - dependency of the solving time in >> seconds on the number of cores). 
We have used GAMG preconditioner >> (multithread: we have added the option " >> -pc_gamg_use_parallel_coarse_grid_solver") and GMRES solver. And we have >> set one openMP thread to every MPI process. Now the ex.5 is working good on >> many mpi processes! But the running uses about 100 GB of RAM. >> >> How we can run ex.5 using many openMP threads without mpi? If we just >> change the running command, the cores are not loaded normally: usually just >> one core is loaded in 100 % and others are idle. Sometimes all cores are >> working in 100 % during 1 second but then again become idle about 30 >> seconds. Can the preconditioner use many threads and how to activate this >> option? >> > > Maye you could describe what you are trying to accomplish? Threads and > processes are not really different, except for memory sharing. However, > sharing large complex data structures rarely works. That is why they get > partitioned and operate effectively as distributed memory. You would not > really save memory by using > threads in this instance, if that is your goal. This is detailed in the > talks in this session (see 2016 PP Minisymposium on this page > https://cse.buffalo.edu/~knepley/relacs.html). > > Thanks, > > Matt > > >> The solving times (the time of the solver work) using 60 openMP threads >> is 511 seconds now, and while using 60 MPI processes - 13.19 seconds. >> >> ksp_monitor outs for both cases (many openMP threads or many MPI >> processes) are attached. >> >> >> Thank you! >> Best, >> Lidia >> >> On 31.05.2022 15:21, Matthew Knepley wrote: >> >> I have looked at the local logs. First, you have run problems of size 12 >> and 24. As a rule of thumb, you need 10,000 >> variables per process in order to see good speedup. >> >> Thanks, >> >> Matt >> >> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley >> wrote: >> >>> On Tue, May 31, 2022 at 7:39 AM Lidia wrote: >>> >>>> Matt, Mark, thank you much for your answers! >>>> >>>> >>>> Now we have run example # 5 on our computer cluster and on the local >>>> server and also have not seen any performance increase, but by unclear >>>> reason running times on the local server are much better than on the >>>> cluster. >>>> >>> I suspect that you are trying to get speedup without increasing the >>> memory bandwidth: >>> >>> >>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >>> >>> Thanks, >>> >>> Matt >>> >>>> Now we will try to run petsc #5 example inside a docker container on >>>> our server and see if the problem is in our environment. I'll write you the >>>> results of this test as soon as we get it. >>>> >>>> The ksp_monitor outs for the 5th test at the current local server >>>> configuration (for 2 and 4 mpi processes) and for the cluster (for 1 and 3 >>>> mpi processes) are attached . >>>> >>>> >>>> And one more question. Potentially we can use 10 nodes and 96 threads >>>> at each node on our cluster. What do you think, which combination of >>>> numbers of mpi processes and openmp threads may be the best for the 5th >>>> example? >>>> >>>> Thank you! >>>> >>>> >>>> Best, >>>> Lidiia >>>> >>>> On 31.05.2022 05:42, Mark Adams wrote: >>>> >>>> And if you see "NO" change in performance I suspect the solver/matrix >>>> is all on one processor. >>>> (PETSc does not use threads by default so threads should not change >>>> anything). 
>>>> >>>> As Matt said, it is best to start with a PETSc example that does >>>> something like what you want (parallel linear solve, see >>>> src/ksp/ksp/tutorials for examples), and then add your code to it. >>>> That way you get the basic infrastructure in place for you, which is >>>> pretty obscure to the uninitiated. >>>> >>>> Mark >>>> >>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Mon, May 30, 2022 at 10:12 PM Lidia >>>>> wrote: >>>>> >>>>>> Dear colleagues, >>>>>> >>>>>> Is here anyone who have solved big sparse linear matrices using PETSC? >>>>>> >>>>> >>>>> There are lots of publications with this kind of data. Here is one >>>>> recent one: https://arxiv.org/abs/2204.01722 >>>>> >>>>> >>>>>> We have found NO performance improvement while using more and more >>>>>> mpi >>>>>> processes (1-2-3) and open-mp threads (from 1 to 72 threads). Did >>>>>> anyone >>>>>> faced to this problem? Does anyone know any possible reasons of such >>>>>> behaviour? >>>>>> >>>>> >>>>> Solver behavior is dependent on the input matrix. The only >>>>> general-purpose solvers >>>>> are direct, but they do not scale linearly and have high memory >>>>> requirements. >>>>> >>>>> Thus, in order to make progress you will have to be specific about >>>>> your matrices. >>>>> >>>>> >>>>>> We use AMG preconditioner and GMRES solver from KSP package, as our >>>>>> matrix is large (from 100 000 to 1e+6 rows and columns), sparse, >>>>>> non-symmetric and includes both positive and negative values. But >>>>>> performance problems also exist while using CG solvers with symmetric >>>>>> matrices. >>>>>> >>>>> >>>>> There are many PETSc examples, such as example 5 for the Laplacian, >>>>> that exhibit >>>>> good scaling with both AMG and GMG. >>>>> >>>>> >>>>>> Could anyone help us to set appropriate options of the preconditioner >>>>>> and solver? Now we use default parameters, maybe they are not the >>>>>> best, >>>>>> but we do not know a good combination. Or maybe you could suggest any >>>>>> other pairs of preconditioner+solver for such tasks? >>>>>> >>>>>> I can provide more information: the matrices that we solve, c++ >>>>>> script >>>>>> to run solving using petsc and any statistics obtained by our runs. >>>>>> >>>>> >>>>> First, please provide a description of the linear system, and the >>>>> output of >>>>> >>>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view >>>>> >>>>> for each test case. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thank you in advance! >>>>>> >>>>>> Best regards, >>>>>> Lidiia Varshavchik, >>>>>> Ioffe Institute, St. Petersburg, Russia >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Arne.Morten.Kvarving at sintef.no Fri Jun 3 08:09:13 2022 From: Arne.Morten.Kvarving at sintef.no (Arne Morten Kvarving) Date: Fri, 3 Jun 2022 13:09:13 +0000 Subject: [petsc-users] MatSchurComplementGetPmat voes Message-ID: Hi! I have a Chorin pressure correction solver with consistent pressure update, i.e. pressure solve is based on the Schur complement E = -A10*ainv(A00)*A01 with A10 = divergence, A00 the mass matrix and A01 the gradient. I have had this implemented with petsc for a long time and it's working fine. However, I've done the schur-complement manually, ie using a MatShell. I now wanted to see if I can implement this using the petsc facilities for the schur-complement, but I get a confusing error when I call MatSchurComplementGetPmat(). ----- Code snippet: MatCreateSchurComplement(m_blocks[0], m_blocks[0], m_blocks[1], m_blocks[2], nullptr, &E_operator); < ... setup the ksp for A00 > MatSchurComplementSetAinvType(E_operator, MAT_SCHUR_COMPLEMENT_AINV_DIAG); MatView(E_operator); MatSchurComplementGetPmat(E_operator, MAT_INITIAL_MATRIX, &E_pc); ----- This yields the output (I cut out the matrix elements): Mat Object: 1 MPI processes type: schurcomplement Schur complement A11 - A10 inv(A00) A01 A11 = 0 A10 Mat Object: 1 MPI processes type: seqaij KSP of A00 KSP Object: 1 MPI processes type: preonly maximum iterations=1000, initial guess is zero tolerances: relative=1e-06, absolute=1e-20, divergence=1e+06 left preconditioning using DEFAULT norm type for convergence test PC Object: 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 1.02768 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=72, cols=72 package used to perform factorization: petsc total: nonzeros=4752, allocated nonzeros=4752 using I-node routines: found 22 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=72, cols=72 total: nonzeros=4624, allocated nonzeros=5184 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 24 nodes, limit used is 5 A01 Mat Object: 1 MPI processes type: seqaij [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Wrong type of object: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.17.2, Jun 02, 2022 [0]PETSC ERROR: ../d/bin/Stokes on a linux-gnu-cxx-opt named akvalung by akva Fri Jun 3 14:48:06 2022 [0]PETSC ERROR: Configure options --with-mpi=0 --with-lapack-lib=-llapack --with-64-bit-indices=0 --with-shared-libraries=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-blas-lib=-lblas --CFLAGS=-fPIC --CXXFLAGS=-fPIC --FFLAGS=-fPIC [0]PETSC ERROR: #1 MatDestroy() at /home/akva/kode/petsc/petsc-3.17.2/src/mat/interface/matrix.c:1235 [0]PETSC ERROR: #2 MatCreateSchurComplementPmat() at /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:763 [0]PETSC ERROR: #3 MatSchurComplementGetPmat_Basic() at /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:785 [0]PETSC ERROR: #4 MatSchurComplementGetPmat() at /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:835 where the errors come from the call call to obtain the preconditioner matrix. I don't see what I've done wrong, as far as I can see it's all following https://petsc.org/release/docs/manualpages/KSP/MatCreateSchurComplement.html#MatCreateSchurComplement MatCreateSchurComplement - Argonne National Laboratory Notes The Schur complement is NOT explicitly formed! Rather, this function returns a virtual Schur complement that can compute the matrix-vector product by using formula S = A11 - A10 A^{-1} A01 for Schur complement S and a KSP solver to approximate the action of A^{-1}.. All four matrices must have the same MPI communicator. petsc.org ? and https://petsc.org/release/docs/manualpages/KSP/MatSchurComplementGetPmat.html#MatSchurComplementGetPmat Looking into the code it seems to try to call MatDestroy() for the Sp matrix but as Sp has not been set up it fails (schurm.c:763) Removing that call as a test, it seems to succeed and I get the same solution as I do with my manual code. I'm sure I have done something stupid but I cannot see what, so any pointers would be appreciated. cheers arnem -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 3 08:43:58 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Jun 2022 09:43:58 -0400 Subject: [petsc-users] MatSchurComplementGetPmat voes In-Reply-To: References: Message-ID: On Fri, Jun 3, 2022 at 9:09 AM Arne Morten Kvarving via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi! > > I have a Chorin pressure correction solver with consistent pressure > update, i.e. > pressure solve is based on the Schur complement > > E = -A10*ainv(A00)*A01 > > with A10 = divergence, A00 the mass matrix and A01 the gradient. > > I have had this implemented with petsc for a long time and it's working > fine. However, I've done the schur-complement manually, ie using a MatShell. > > I now wanted to see if I can implement this using the petsc facilities for > the schur-complement, but I get a confusing error when I call > MatSchurComplementGetPmat(). > > ----- > > Code snippet: > > MatCreateSchurComplement(m_blocks[0], m_blocks[0], m_blocks[1], > m_blocks[2], nullptr, &E_operator); > < ... 
setup the ksp for A00 > > MatSchurComplementSetAinvType(E_operator, MAT_SCHUR_COMPLEMENT_AINV_DIAG); > MatView(E_operator); > MatSchurComplementGetPmat(E_operator, MAT_INITIAL_MATRIX, &E_pc); > > ----- > > This yields the output (I cut out the matrix elements): > Mat Object: 1 MPI processes > type: schurcomplement > Schur complement A11 - A10 inv(A00) A01 > A11 = 0 > A10 > Mat Object: 1 MPI processes > type: seqaij > KSP of A00 > KSP Object: 1 MPI processes > type: preonly > maximum iterations=1000, initial guess is zero > tolerances: relative=1e-06, absolute=1e-20, divergence=1e+06 > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 1.02768 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=72, cols=72 > package used to perform factorization: petsc > total: nonzeros=4752, allocated nonzeros=4752 > using I-node routines: found 22 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=72, cols=72 > total: nonzeros=4624, allocated nonzeros=5184 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 24 nodes, limit used is 5 > A01 > Mat Object: 1 MPI processes > type: seqaij > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Wrong type of object: Parameter # 1 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.17.2, Jun 02, 2022 > [0]PETSC ERROR: ../d/bin/Stokes on a linux-gnu-cxx-opt named akvalung by > akva Fri Jun 3 14:48:06 2022 > [0]PETSC ERROR: Configure options --with-mpi=0 --with-lapack-lib=-llapack > --with-64-bit-indices=0 --with-shared-libraries=0 --COPTFLAGS=-O3 > --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-blas-lib=-lblas --CFLAGS=-fPIC > --CXXFLAGS=-fPIC --FFLAGS=-fPIC > [0]PETSC ERROR: #1 MatDestroy() at > /home/akva/kode/petsc/petsc-3.17.2/src/mat/interface/matrix.c:1235 > [0]PETSC ERROR: #2 MatCreateSchurComplementPmat() at > /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:763 > [0]PETSC ERROR: #3 MatSchurComplementGetPmat_Basic() at > /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:785 > [0]PETSC ERROR: #4 MatSchurComplementGetPmat() at > /home/akva/kode/petsc/petsc-3.17.2/src/ksp/ksp/utils/schurm/schurm.c:835 > > where the errors come from the call call to obtain the preconditioner > matrix. > I don't see what I've done wrong, as far as I can see it's all following > https://petsc.org/release/docs/manualpages/KSP/MatCreateSchurComplement.html#MatCreateSchurComplement > MatCreateSchurComplement - Argonne National Laboratory > > Notes The Schur complement is NOT explicitly formed! Rather, this function > returns a virtual Schur complement that can compute the matrix-vector > product by using formula S = A11 - A10 A^{-1} A01 for Schur complement S > and a KSP solver to approximate the action of A^{-1}.. All four matrices > must have the same MPI communicator. 
> petsc.org > *?* > > and > > https://petsc.org/release/docs/manualpages/KSP/MatSchurComplementGetPmat.html#MatSchurComplementGetPmat > > Looking into the code it seems to try to call MatDestroy() for the Sp > matrix but as Sp has not been set up it fails (schurm.c:763) > Removing that call as a test, it seems to succeed and I get the same > solution as I do > with my manual code. > > I'm sure I have done something stupid but I cannot see what, so any > pointers would be appreciated. > This is not your fault. If the flag is MAT_INITIAL_MATRIX, we are expecting the pointer to be initialized to NULL, but we never state this. I think if you do this, the code will start working. I will fix GetPmat() so that it does this automatically. Thanks, Matt > cheers > > arnem > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Jun 3 09:19:39 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 3 Jun 2022 09:19:39 -0500 (CDT) Subject: [petsc-users] petsc-3.17.2 now available Message-ID: <45e33633-8989-4793-4e82-8fad84be81@mcs.anl.gov> Dear PETSc users, The patch release petsc-3.17.2 is now available for download. http://www.mcs.anl.gov/petsc/download/index.html Satish From jsfaraway at gmail.com Fri Jun 3 11:50:50 2022 From: jsfaraway at gmail.com (jsfaraway) Date: Sat, 4 Jun 2022 00:50:50 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration Message-ID: <4C029833-EC0F-494F-911F-D795375718D9@gmail.com> An HTML attachment was scrubbed... URL: From jsfaraway at gmail.com Fri Jun 3 12:18:55 2022 From: jsfaraway at gmail.com (=?UTF-8?B?UnVuZmVuZyBKaW4=?=) Date: Sat, 4 Jun 2022 01:18:55 +0800 (GMT+08:00) Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration Message-ID: <629a4282.1c69fb81.f0697.c92d@mx.google.com> An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Jun 3 12:37:21 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 3 Jun 2022 19:37:21 +0200 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: <4C029833-EC0F-494F-911F-D795375718D9@gmail.com> References: <4C029833-EC0F-494F-911F-D795375718D9@gmail.com> Message-ID: Convergence depends on distribution of eigenvalues you want to compute. On the other hand, the cost also depends on the time it takes to build the preconditioner. Use -log_view to see the cost of the different steps of the computation. Jose > El 3 jun 2022, a las 18:50, jsfaraway escribi?: > > hello! > > I am trying to use epsgd compute matrix's one smallest eigenvalue. And I find a strang thing. There are two matrix A(900000*900000) and B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B use 22 iterations and 38885s! What could be the reason for this? Or what can I do to find the reason? > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > And there is one difference I can tell is matrix B has many small value, whose absolute value is less than 10-6. Could this be the reason? > > Thank you! 
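>
> The solver part of my code is essentially just the API form of the
> options above, roughly this sketch (load_matrix() stands for my own
> reader and is only a placeholder; error checking omitted):
>
> #include <slepceps.h>
>
> Mat         A;
> EPS         eps;
> PetscInt    its, nconv;
> PetscScalar kr, ki;
>
> /* load_matrix(&A);  fill the sparse matrix here */
>
> EPSCreate(PETSC_COMM_WORLD, &eps);
> EPSSetOperators(eps, A, NULL);
> EPSSetType(eps, EPSGD);                        /* -eps_type gd            */
> EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL); /* -eps_smallest_real      */
> EPSSetDimensions(eps, 3, 300, PETSC_DEFAULT);  /* -eps_nev 3 -eps_ncv 300 */
> EPSSetFromOptions(eps);
> EPSSolve(eps);
> EPSGetIterationNumber(eps, &its);
> EPSGetConverged(eps, &nconv);
> if (nconv > 0) EPSGetEigenpair(eps, 0, &kr, &ki, NULL, NULL);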
> > Runfeng Jin From mi.mike1021 at gmail.com Fri Jun 3 12:39:43 2022 From: mi.mike1021 at gmail.com (Mike Michell) Date: Fri, 3 Jun 2022 12:39:43 -0500 Subject: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange In-Reply-To: References: Message-ID: Thanks for the effort. By any chance, is there any rough timeline for that part can be done? Thanks, Mike > On Tue, May 31, 2022 at 10:26 AM Mike Michell > wrote: > >> Thank you. But, it seems that PetscSFCreateSectionSF() also requires >> petscsf.h file. Which header file I should include to call >> PetscSFCreateSectionSF() from Fortran? >> > > I will have to write a binding. I will send you the MR when I finish. > > THanks, > > Matt > > >> Thanks, >> Mike >> >> >> On Tue, May 31, 2022 at 10:04 AM Mike Michell >>> wrote: >>> >>>> As a follow-up question on your example, is it possible to call >>>> PetscSFCreateRemoteOffsets() from Fortran? >>>> >>>> My code is written in .F90 and in "petsc/finclude/" there is no >>>> petscsf.h so that the code currently cannot find >>>> PetscSFCreateRemoteOffsets(). >>>> >>> >>> I believe if you pass in NULL for remoteOffsets, that function will be >>> called internally. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mike >>>> >>>> >>>> I will also point out that Toby has created a nice example showing how >>>>> to create an SF for halo exchange between local vectors. >>>>> >>>>> https://gitlab.com/petsc/petsc/-/merge_requests/5267 >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> On Sun, May 22, 2022 at 9:47 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Sun, May 22, 2022 at 4:28 PM Mike Michell >>>>>> wrote: >>>>>> >>>>>>> Thanks for the reply. The diagram makes sense and is helpful for >>>>>>> understanding 1D representation. >>>>>>> >>>>>>> However, something is still unclear. From your diagram, the number >>>>>>> of roots per process seems to vary according to run arguments, such as >>>>>>> "-dm_distribute_overlap", because "the number of roots for a DMPlex is the >>>>>>> number of mesh points in the local portion of the mesh (cited from your >>>>>>> answer to my question (1))" will end up change according to that argument. >>>>>>> However, from my mock-up code, number of roots is independent to >>>>>>> -dm_distribute_overlap argument. The summation of "number of roots" through >>>>>>> processes was always equal to number of physical vertex on my mesh, if I >>>>>>> define the section layout on vertex with 1DOF. But in your diagram example, >>>>>>> the summation of "nroots" is larger than the actual number of mesh points, >>>>>>> which is 13. >>>>>>> >>>>>> >>>>>> I do not understand your question. Notice the -dm_distribute_overlap >>>>>> does _not_ change the owned points for any process. It only puts in new >>>>>> leaves, so it also never >>>>>> changes the roots for this way of using the SF. >>>>>> >>>>>> >>>>>>> Also, it is still unclear how to get the size of "roots" from the >>>>>>> PetscSection & PetscSF on distributed DMPlex? >>>>>>> >>>>>> >>>>>> For an SF mapping ghost dofs in a global vector, the number of roots >>>>>> is just the size of the local portion of the vector. >>>>>> >>>>>> >>>>>>> In your diagram, how can you tell your code and make it allocate the >>>>>>> "nroots=7 for P0, nroots=9 for P1, and nroots=7 for P2" arrays before you >>>>>>> call PetscSFBcastBegin/End()? It seems that we need to define arrays having >>>>>>> the size of nroots & nleaves before calling PetscSFBcastBegin/End(). 
>>>>>>> >>>>>> >>>>>> I just want to note that this usage is different from the canonical >>>>>> usage in Plex. It is fine to do this, but this will not match what I do in >>>>>> the library if you look. >>>>>> In Plex, I distinguish two linear spaces: >>>>>> >>>>>> 1) Global space: This is the vector space for the solvers. Each >>>>>> point is uniquely represented and owned by some process >>>>>> >>>>>> 2) Local space: This is the vector space for assembly. Some points >>>>>> are represented multiple times. >>>>>> >>>>>> I create an SF that maps from the global space (roots) to the local >>>>>> space (leaves), and it is called in DMGlobalToLocal() (and >>>>>> associated functions). This >>>>>> is more natural in FEM. You seem to want an SF that maps between >>>>>> global vectors. This will also work. The roots would be the local dofs, and >>>>>> the leaves >>>>>> would be shared dofs. >>>>>> >>>>>> Does this make sense? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Mike >>>>>>> >>>>>>> Here's a diagram of a 1D mesh with overlap and 3 partitions, showing >>>>>>>> what the petscsf data is for each. The number of roots is the number of >>>>>>>> mesh points in the local representation, and the number of leaves is the >>>>>>>> number of mesh points that are duplicates of mesh points on other >>>>>>>> processes. With that in mind, answering your questions >>>>>>>> >>>>>>>> > (1) It seems that the "roots" means the number of vertex not >>>>>>>> considering overlap layer, and "leaves" seems the number of distributed >>>>>>>> vertex for each processor that includes overlap layer. Can you acknowledge >>>>>>>> that this is correct understanding? I have tried to find clearer examples >>>>>>>> from PETSc team's articles relevant to Star Forest, but I am still unclear >>>>>>>> about the exact relation & graphical notation of roots & leaves in SF if >>>>>>>> it's the case of DMPlex solution arrays. >>>>>>>> >>>>>>>> No, the number of roots for a DMPlex is the number of mesh points >>>>>>>> in the local portion of the mesh >>>>>>>> >>>>>>>> > (2) If it is so, there is an issue that I cannot define "root >>>>>>>> data" and "leave data" generally. I am trying to following >>>>>>>> "src/vec/is/sf/tutorials/ex1f.F90", however, in that example, size of roots >>>>>>>> and leaves are predefined as 6. How can I generalize that? Because I can >>>>>>>> get size of leaves using DAG depth(or height), which is equal to number of >>>>>>>> vertices each proc has. But, how can I get the size of my "roots" region >>>>>>>> from SF? Any example about that? This question is connected to how can I >>>>>>>> define "rootdata" for "PetscSFBcastBegin/End()". >>>>>>>> >>>>>>>> Does the diagram help you generalize? >>>>>>>> >>>>>>>> > (3) More importantly, with the attached PetscSection & SF layout, >>>>>>>> my vector is only resolved for the size equal to "number of roots" for each >>>>>>>> proc, but not for the overlapping area(i.e., "leaves"). What I wish to do >>>>>>>> is to exchange (or reduce) the solution data between each proc, in the >>>>>>>> overlapping region. Can I get some advices why my vector does not encompass >>>>>>>> the "leaves" regime? Is there any example doing similar things? >>>>>>>> Going back to my first response: if you use a section to say how >>>>>>>> many pieces of data are associated with each local mesh point, then a >>>>>>>> PetscSF is constructed that requires no more manipulation from you. 
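To make the global/local distinction above concrete, here is a minimal sketch of the canonical update through DMGlobalToLocal()/DMLocalToGlobal() that Matt mentions. It assumes a local section has already been set on the DM; the vector names are illustrative.

  #include <petscdm.h>

  /* Sketch: canonical Plex halo update between the solver (global) and
     assembly (local) spaces. */
  static PetscErrorCode UpdateHalo(DM dm)
  {
    Vec gx, lx;

    PetscFunctionBeginUser;
    PetscCall(DMCreateGlobalVector(dm, &gx)); /* each dof owned by exactly one rank */
    PetscCall(DMCreateLocalVector(dm, &lx));  /* owned dofs plus ghost copies */
    /* ... compute or solve into gx ... */
    PetscCall(DMGlobalToLocalBegin(dm, gx, INSERT_VALUES, lx)); /* fill ghosts: the halo exchange */
    PetscCall(DMGlobalToLocalEnd(dm, gx, INSERT_VALUES, lx));
    /* ... assemble local contributions into lx ... */
    PetscCall(DMLocalToGlobalBegin(dm, lx, ADD_VALUES, gx));    /* accumulate shared contributions */
    PetscCall(DMLocalToGlobalEnd(dm, lx, ADD_VALUES, gx));
    PetscCall(VecDestroy(&lx));
    PetscCall(VecDestroy(&gx));
    PetscFunctionReturn(0);
  }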
>>>>>>>> >>>>>>>> >>>>>>>> On Sun, May 22, 2022 at 10:47 AM Mike Michell < >>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Thank you for the reply. >>>>>>>>> The PetscSection and PetscSF objects are defined as in the >>>>>>>>> attached mock-up code (Q_PetscSF_1.tar). 1-DOF is defined on vertex as my >>>>>>>>> solution is determined on each vertex with 1-DOF from a finite-volume >>>>>>>>> method. >>>>>>>>> >>>>>>>>> As follow up questions: >>>>>>>>> (1) It seems that the "roots" means the number of vertex not >>>>>>>>> considering overlap layer, and "leaves" seems the number of distributed >>>>>>>>> vertex for each processor that includes overlap layer. Can you acknowledge >>>>>>>>> that this is correct understanding? I have tried to find clearer examples >>>>>>>>> from PETSc team's articles relevant to Star Forest, but I am still unclear >>>>>>>>> about the exact relation & graphical notation of roots & leaves in SF if >>>>>>>>> it's the case of DMPlex solution arrays. >>>>>>>>> >>>>>>>>> (2) If it is so, there is an issue that I cannot define "root >>>>>>>>> data" and "leave data" generally. I am trying to following >>>>>>>>> "src/vec/is/sf/tutorials/ex1f.F90", however, in that example, size of roots >>>>>>>>> and leaves are predefined as 6. How can I generalize that? Because I can >>>>>>>>> get size of leaves using DAG depth(or height), which is equal to number of >>>>>>>>> vertices each proc has. But, how can I get the size of my "roots" region >>>>>>>>> from SF? Any example about that? This question is connected to how can I >>>>>>>>> define "rootdata" for "PetscSFBcastBegin/End()". >>>>>>>>> >>>>>>>>> (3) More importantly, with the attached PetscSection & SF layout, >>>>>>>>> my vector is only resolved for the size equal to "number of roots" for each >>>>>>>>> proc, but not for the overlapping area(i.e., "leaves"). What I wish to do >>>>>>>>> is to exchange (or reduce) the solution data between each proc, in the >>>>>>>>> overlapping region. Can I get some advices why my vector does not encompass >>>>>>>>> the "leaves" regime? Is there any example doing similar things? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mike >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Fri, May 20, 2022 at 4:45 PM Mike Michell < >>>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks for the reply. >>>>>>>>>>> >>>>>>>>>>> > "What I want to do is to exchange data (probably just >>>>>>>>>>> MPI_Reduce)" which confuses me, because halo exchange is a point-to-point >>>>>>>>>>> exchange and not a reduction. Can you clarify? >>>>>>>>>>> PetscSFReduceBegin/End seems to be the function that do >>>>>>>>>>> reduction for PetscSF object. What I intended to mention was either >>>>>>>>>>> reduction or exchange, not specifically intended "reduction". >>>>>>>>>>> >>>>>>>>>>> As a follow-up question: >>>>>>>>>>> Assuming that the code has its own local solution arrays (not >>>>>>>>>>> Petsc type), and if the plex's DAG indices belong to the halo region are >>>>>>>>>>> the only information that I want to know (not the detailed section >>>>>>>>>>> description, such as degree of freedom on vertex, cells, etc.). I have >>>>>>>>>>> another PetscSection for printing out my solution. >>>>>>>>>>> Also if I can convert that DAG indices into my local cell/vertex >>>>>>>>>>> index, can I just use the PetscSF object created from DMGetPointSF(), >>>>>>>>>>> instead of "creating PetscSection + DMGetSectionSF()"? 
In other words, can >>>>>>>>>>> I use the PetscSF object declared from DMGetPointSF() for the halo >>>>>>>>>>> communication? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> No, because that point SF will index information by point number. >>>>>>>>>> You would need to build a new SF that indexes your dofs. The steps you would >>>>>>>>>> go through are exactly the same as you would if you just told us >>>>>>>>>> what the Section is that indexes your data. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The PetscSF that is created automatically is the "point sf" ( >>>>>>>>>>>> https://petsc.org/main/docs/manualpages/DM/DMGetPointSF/): it >>>>>>>>>>>> says which mesh points (cells, faces, edges and vertices) are duplicates of >>>>>>>>>>>> others. >>>>>>>>>>>> >>>>>>>>>>>> In a finite volume application we typically want to assign >>>>>>>>>>>> degrees of freedom just to cells: some applications may only have one >>>>>>>>>>>> degree of freedom, others may have multiple. >>>>>>>>>>>> >>>>>>>>>>>> You encode where you want degrees of freedom in a PetscSection >>>>>>>>>>>> and set that as the section for the DM in DMSetLocalSection() ( >>>>>>>>>>>> https://petsc.org/release/docs/manualpages/DM/DMSetLocalSection.html >>>>>>>>>>>> ) >>>>>>>>>>>> >>>>>>>>>>>> (A c example of these steps that sets degrees of freedom for >>>>>>>>>>>> *vertices* instead of cells is `src/dm/impls/plex/tutorials/ex7.c`) >>>>>>>>>>>> >>>>>>>>>>>> After that you can call DMGetSectionSF() ( >>>>>>>>>>>> https://petsc.org/main/docs/manualpages/DM/DMGetSectionSF/) to >>>>>>>>>>>> the the PetscSF that you want for halo exchange: the one for your solution >>>>>>>>>>>> variables. >>>>>>>>>>>> >>>>>>>>>>>> After that, the only calls you typically need in a finite >>>>>>>>>>>> volume code is PetscSFBcastBegin() to start a halo exchange and >>>>>>>>>>>> PetscSFBcastEnd() to complete it. >>>>>>>>>>>> >>>>>>>>>>>> You say >>>>>>>>>>>> >>>>>>>>>>>> > What I want to do is to exchange data (probably just >>>>>>>>>>>> MPI_Reduce) >>>>>>>>>>>> >>>>>>>>>>>> which confuses me, because halo exchange is a point-to-point >>>>>>>>>>>> exchange and not a reduction. Can you clarify? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 20, 2022 at 8:35 PM Mike Michell < >>>>>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Dear PETSc developer team, >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, I am using DMPlex for a finite-volume code and trying to >>>>>>>>>>>>> understand the usage of PetscSF. What is a typical procedure for doing halo >>>>>>>>>>>>> data exchange at parallel boundary using PetscSF object on DMPlex? Is there >>>>>>>>>>>>> any example that I can refer to usage of PetscSF with distributed DMPlex? >>>>>>>>>>>>> >>>>>>>>>>>>> Assuming to use the attached mock-up code and mesh, if I give >>>>>>>>>>>>> "-dm_distribute_overlap 1 -over_dm_view" to run the code, I can see a >>>>>>>>>>>>> PetscSF object is already created, although I have not called >>>>>>>>>>>>> "PetscSFCreate" in the code. How can I import & use that PetscSF already >>>>>>>>>>>>> created by the code to do the halo data exchange? >>>>>>>>>>>>> >>>>>>>>>>>>> What I want to do is to exchange data (probably just >>>>>>>>>>>>> MPI_Reduce) in a parallel boundary region using PetscSF and its functions. >>>>>>>>>>>>> I might need to have an overlapping layer or not. 
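Putting the steps above together, a compact sketch for the finite-volume case (one scalar dof per cell, exchanged through the section SF). This is a sketch only: the per-cell layout is an assumption about the application, the buffers are assumed to have the sizes of the global and local vectors respectively, and the MPI_Op argument of PetscSFBcastBegin() requires a reasonably recent PETSc.

  #include <petscdmplex.h>
  #include <petscsf.h>

  /* Sketch: 1 dof per cell; globaldata is sized like the global vector (owned dofs),
     localdata like the local vector (owned + ghost dofs). Only the ghost (leaf)
     entries of localdata are written by the broadcast. */
  static PetscErrorCode CellHaloExchange(DM dm, const PetscScalar *globaldata, PetscScalar *localdata)
  {
    PetscSection s;
    PetscSF      sf;
    PetscInt     c, cStart, cEnd;

    PetscFunctionBeginUser;
    PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
    PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
    PetscCall(PetscSectionSetChart(s, cStart, cEnd));
    for (c = cStart; c < cEnd; ++c) PetscCall(PetscSectionSetDof(s, c, 1));
    PetscCall(PetscSectionSetUp(s));
    PetscCall(DMSetLocalSection(dm, s));
    PetscCall(PetscSectionDestroy(&s));

    PetscCall(DMGetSectionSF(dm, &sf)); /* maps global dofs (roots) to local dofs (leaves) */
    PetscCall(PetscSFBcastBegin(sf, MPIU_SCALAR, globaldata, localdata, MPI_REPLACE));
    PetscCall(PetscSFBcastEnd(sf, MPIU_SCALAR, globaldata, localdata, MPI_REPLACE));
    PetscFunctionReturn(0);
  }

In practice DMGlobalToLocalBegin/End() on vectors obtained from DMCreateGlobalVector()/DMCreateLocalVector() performs the same exchange without touching raw arrays.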
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Mike >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lidia.varsh at mail.ioffe.ru Mon Jun 6 06:19:37 2022 From: lidia.varsh at mail.ioffe.ru (Lidia) Date: Mon, 6 Jun 2022 14:19:37 +0300 Subject: [petsc-users] Sparse linear system solving In-Reply-To: References: <026d55af-c978-81da-f571-46519e5e6f8e@mail.ioffe.ru> <2e7ebbf1-511a-7055-ff92-131d3bf73f1e@mail.ioffe.ru> <201b6c28-2616-edaa-21dd-ee91a257c432@mail.ioffe.ru> <5475809d-16b1-e6a8-72b8-c4605f321e8e@mail.ioffe.ru> Message-ID: <7db517a0-e541-fb1b-b4b7-9063499bc939@mail.ioffe.ru> Dear colleagues, Thank you much for the help! Now the code seems to be working well! Best, Lidiia On 03.06.2022 15:19, Matthew Knepley wrote: > On Fri, Jun 3, 2022 at 6:42 AM Lidia wrote: > > Dear Matt, Barry, > > thank you for the information about openMP! > > Now all processes are loaded well. But we see a strange behaviour > of running times at different iterations, see description below. > Could you please explain us the reason and how we can improve it? > > We need to quickly solve a big (about 1e6 rows) square sparse > non-symmetric matrix many times (about 1e5 times) consequently. > Matrix is constant at every iteration, and the right-side vector B > is slowly changed (we think that its change at every iteration > should be less then 0.001 %). So we use every previous solution > vector X as an initial guess for the next iteration. AMG > preconditioner and GMRES solver are used. > > We have tested the code using a matrix with 631 000 rows, during > 15 consequent iterations, using vector X from the previous > iterations. Right-side vector B and matrix A are constant during > the whole running. The time of the first iteration is large (about > 2 seconds) and is quickly decreased to the next iterations > (average time of last iterations were about 0.00008 s). But some > iterations in the middle (# 2 and # 12) have huge time - 0.999063 > second (see the figure with time dynamics attached). 
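(An aside on the repeated-solve setup described above: with a constant matrix, reusing one KSP and warm-starting from the previous solution looks roughly like the sketch below. The solver and preconditioner names match the thread; everything else is illustrative and error handling is trimmed.)

  #include <petscksp.h>

  /* Sketch: one KSP reused for many right-hand sides of the same matrix. */
  static PetscErrorCode RepeatedSolve(Mat A, Vec b, Vec x, PetscInt nsteps)
  {
    KSP      ksp;
    PC       pc;
    PetscInt step;

    PetscFunctionBeginUser;
    PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));                  /* matrix is constant over all solves */
    PetscCall(KSPSetType(ksp, KSPGMRES));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCGAMG));                       /* AMG setup happens once, on the first solve */
    PetscCall(KSPSetInitialGuessNonzero(ksp, PETSC_TRUE));  /* keep the previous x as initial guess */
    PetscCall(KSPSetFromOptions(ksp));
    for (step = 0; step < nsteps; ++step) {
      /* ... update b slightly ... */
      PetscCall(KSPSolve(ksp, b, x));                       /* preconditioner is reused, not rebuilt */
    }
    PetscCall(KSPDestroy(&ksp));
    PetscFunctionReturn(0);
  }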
This time of > 0.999 second does not depend on the size of a matrix, on the > number of MPI processes, these time jumps also exist if we vary > vector B. Why these time jumps appear and how we can avoid them? > > > PETSc is not taking this time. It must come from somewhere else in > your code. Notice that no iterations are taken for any subsequent > solves, so no operations other than the residual norm check (and > preconditioner application) are being performed. > > ? Thanks, > > ? ? ?Matt > > The ksp_monitor out for this running (included 15 iterations) > using 36 MPI processes and a file with the memory bandwidth > information (testSpeed) are also attached. We can provide our C++ > script if it is needed. > > Thanks a lot! > > Best, > Lidiia > > > > On 01.06.2022 21:14, Matthew Knepley wrote: >> On Wed, Jun 1, 2022 at 1:43 PM Lidia >> wrote: >> >> Dear Matt, >> >> Thank you for the rule of 10,000 variables per process! We >> have run ex.5 with matrix 1e4 x 1e4 at our cluster and got a >> good performance dynamics (see the figure "performance.png" - >> dependency of the solving time in seconds on the number of >> cores). We have used GAMG preconditioner (multithread: we >> have added the option >> "-pc_gamg_use_parallel_coarse_grid_solver") and GMRES solver. >> And we have set one openMP thread to every MPI process. Now >> the ex.5 is working good on many mpi processes! But the >> running uses about 100 GB of RAM. >> >> How we can run ex.5 using many openMP threads without mpi? If >> we just change the running command, the cores are not loaded >> normally: usually just one core is loaded in 100 % and others >> are idle. Sometimes all cores are working in 100 % during 1 >> second but then again become idle about 30 seconds. Can the >> preconditioner use many threads and how to activate this option? >> >> >> Maye you could describe what you are trying to accomplish? >> Threads and processes are not really different, except for memory >> sharing. However, sharing large complex data structures rarely >> works. That is why they get partitioned and operate effectively >> as distributed memory. You would not really save memory by using >> threads in this instance, if that is your goal. This is detailed >> in the talks in this session (see 2016 PP Minisymposium on this >> page https://cse.buffalo.edu/~knepley/relacs.html). >> >> ? Thanks, >> >> ? ? ?Matt >> >> The solving times (the time of the solver work) using 60 >> openMP threads is 511 seconds now, and while using 60 MPI >> processes - 13.19 seconds. >> >> ksp_monitor outs for both cases (many openMP threads or many >> MPI processes) are attached. >> >> >> Thank you! >> >> Best, >> Lidia >> >> On 31.05.2022 15:21, Matthew Knepley wrote: >>> I have looked at the local logs. First, you have run >>> problems of size 12? and 24. As a rule of thumb, you need >>> 10,000 >>> variables per process in order to see good speedup. >>> >>> ? Thanks, >>> >>> ? ? ?Matt >>> >>> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley >>> wrote: >>> >>> On Tue, May 31, 2022 at 7:39 AM Lidia >>> wrote: >>> >>> Matt, Mark, thank you much for your answers! >>> >>> >>> Now we have run example # 5 on our computer cluster >>> and on the local server and also have not seen any >>> performance increase, but by unclear reason running >>> times on the local server are much better than on >>> the cluster. 
>>> >>> I suspect that you are trying to get speedup without >>> increasing the memory bandwidth: >>> >>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >>> >>> ? Thanks, >>> >>> ? ? ?Matt >>> >>> Now we will try to run petsc #5 example inside a >>> docker container on our server and see if the >>> problem is in our environment. I'll write you the >>> results of this test as soon as we get it. >>> >>> The ksp_monitor outs for the 5th test at the current >>> local server configuration (for 2 and 4 mpi >>> processes) and for the cluster (for 1 and 3 mpi >>> processes) are attached . >>> >>> >>> And one more question. Potentially we can use 10 >>> nodes and 96 threads at each node on our cluster. >>> What do you think, which combination of numbers of >>> mpi processes and openmp threads may be the best for >>> the 5th example? >>> >>> Thank you! >>> >>> >>> Best, >>> Lidiia >>> >>> On 31.05.2022 05:42, Mark Adams wrote: >>>> And if you see "NO" change in performance I suspect >>>> the solver/matrix is all on one processor. >>>> (PETSc does not use threads by default so threads >>>> should not change anything). >>>> >>>> As Matt said, it is best to start with a PETSc >>>> example?that does something like what you want >>>> (parallel linear solve, see src/ksp/ksp/tutorials >>>> for examples), and then add your code to it. >>>> That way you get the basic infrastructure?in place >>>> for you, which is pretty obscure to the uninitiated. >>>> >>>> Mark >>>> >>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley >>>> wrote: >>>> >>>> On Mon, May 30, 2022 at 10:12 PM Lidia >>>> wrote: >>>> >>>> Dear colleagues, >>>> >>>> Is here anyone who have solved big sparse >>>> linear matrices using PETSC? >>>> >>>> >>>> There are lots of publications with this kind >>>> of data. Here is one recent one: >>>> https://arxiv.org/abs/2204.01722 >>>> >>>> We have found NO performance improvement >>>> while using more and more mpi >>>> processes (1-2-3) and open-mp threads (from >>>> 1 to 72 threads). Did anyone >>>> faced to this problem? Does anyone know any >>>> possible reasons of such >>>> behaviour? >>>> >>>> >>>> Solver behavior is dependent on the input >>>> matrix. The only general-purpose solvers >>>> are direct, but they do not scale linearly and >>>> have high memory requirements. >>>> >>>> Thus, in order to make progress you will have >>>> to be specific about your matrices. >>>> >>>> We use AMG preconditioner and GMRES solver >>>> from KSP package, as our >>>> matrix is large (from 100 000 to 1e+6 rows >>>> and columns), sparse, >>>> non-symmetric and includes both positive >>>> and negative values. But >>>> performance problems also exist while using >>>> CG solvers with symmetric >>>> matrices. >>>> >>>> >>>> There are many PETSc examples, such as example >>>> 5 for the Laplacian, that exhibit >>>> good scaling with both AMG and GMG. >>>> >>>> Could anyone help us to set appropriate >>>> options of the preconditioner >>>> and solver? Now we use default parameters, >>>> maybe they are not the best, >>>> but we do not know a good combination. Or >>>> maybe you could suggest any >>>> other pairs of preconditioner+solver for >>>> such tasks? >>>> >>>> I can provide more information: the >>>> matrices that we solve, c++ script >>>> to run solving using petsc and any >>>> statistics obtained by our runs. >>>> >>>> >>>> First, please provide a description of the >>>> linear system, and the output of >>>> >>>> ? 
-ksp_view -ksp_monitor_true_residual >>>> -ksp_converged_reason -log_view >>>> >>>> for each test case. >>>> >>>> ? Thanks, >>>> >>>> ? ? ?Matt >>>> >>>> Thank you in advance! >>>> >>>> Best regards, >>>> Lidiia Varshavchik, >>>> Ioffe Institute, St. Petersburg, Russia >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before >>>> they begin their experiments is infinitely more >>>> interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they >>> begin their experiments is infinitely more interesting >>> than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin >>> their experiments is infinitely more interesting than any >>> results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Jun 7 05:37:07 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 7 Jun 2022 12:37:07 +0200 Subject: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum In-Reply-To: References: <7E80B1DF-0F06-4EFB-99BA-63471F55165D@dsic.upv.es> <62743559-BA54-4828-B0D8-B84111C2E1EA@dsic.upv.es> Message-ID: Lucas, I have tried your matrices. Below are some results with complex scalars and MUMPS using 2 MPI processes. Using shift-and-invert with eps_target=-0.95 I get three eigenvalues (two of them equal to -1), and MUMPS is taking 65 seconds out of 67. Convergence is very fast, nothing can be improved because most of the time is due to MUMPS. Adding a region filter with -rg_type interval -rg_interval_endpoints -.99,1,-.1,.1 the times are essentially the same, but you get rid of the unwanted eigenvalues (-1). This is the best option. If you need to compute many eigenvalues, then you should consider specifying an interval (spectrum slicing), see section 3.4.5 of the manual. But this cannot be used with complex scalars with MUMPS (see the note in the manual). Since your matrices are real and symmetric, I tried it with real scalars, using -eps_interval -.99,1 and in that case I get 33 eigenvalues and MUMPS takes 33 seconds out of the overall 68 seconds (three numerical factorizations are done in this execution). $ mpiexec -n 2 ./ex7 -f1 Areal.mat -f2 Breal.mat -eps_gen_hermitian -st_type sinvert -st_pc_type cholesky -eps_interval -.99,1 -st_mat_mumps_icntl_13 1 -st_mat_mumps_icntl_24 1 Generalized eigenproblem stored in file. Reading REAL matrices from binary files... 
Number of iterations of the method: 3 Number of linear iterations of the method: 0 Solution method: krylovschur Number of requested eigenvalues: 33 Stopping condition: tol=1e-10, maxit=100 Linear eigensolve converged (33 eigenpairs) due to CONVERGED_TOL; iterations 3 ---------------------- -------------------- k ||Ax-kBx||/||kx|| ---------------------- -------------------- -0.698786 4.61016e-14 -0.598058 5.34239e-14 -0.598051 5.53609e-14 -0.380951 7.83403e-14 -0.280707 2.91772e-13 -0.280671 3.86414e-13 -0.273832 2.18507e-13 -0.273792 2.25672e-13 -0.064625 2.71132e-12 -0.064558 2.74757e-12 -0.034888 4.02325e-12 0.138192 1.56285e-12 0.138298 3.58149e-12 0.197123 1.77274e-12 0.197391 1.93185e-12 0.268338 1.09276e-12 0.268416 8.24014e-13 0.363498 9.21471e-13 0.420608 7.18076e-13 0.420669 5.13068e-13 0.523661 1.28491e-12 0.621233 1.07663e-12 0.621648 5.91783e-13 0.662408 4.36285e-13 0.662578 5.11942e-13 0.708328 3.94862e-13 0.708488 3.56613e-13 0.709269 2.73414e-13 0.733286 5.73269e-13 0.733524 4.52308e-13 0.814093 2.5299e-13 0.870087 2.02513e-13 0.870229 3.19166e-13 ---------------------- -------------------- > El 31 may 2022, a las 22:28, Jose E. Roman escribi?: > > Probably MUMPS is taking most of the time... > > If the matrices are not too large, send them to my personal email and I will have a look. > > Jose > > >> El 31 may 2022, a las 22:13, Lucas Banting escribi?: >> >> Thanks for the sharing the article. >> For my application, I think using an interval region to exclude the unneeded eigenvalues will still be faster than forming a larger constrained system. Specifying an interval appears to run in a similar amount of time. >> >> Lucas >> From: Jose E. Roman >> Sent: Tuesday, May 31, 2022 2:08 PM >> To: Lucas Banting >> Cc: PETSc >> Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum >> >> Caution: This message was sent from outside the University of Manitoba. >> >> >> Please respond to the list also. >> >> The problem with EPSSetDeflationSpace() is that it internally orthogonalizes the vectors that you pass in, so it is not viable for thousands of vectors. >> >> You can try implementing any of the alternative schemes described in https://doi.org/10.1002/nla.307 >> >> Another thing you can try is to use a region for filtering, as explained in section 2.6.4 of the users manual. Use a region that excludes -1.0 and you will have more chances to get the wanted eigenvalues faster. But still convergence may be slow. >> >> Jose >> >> >>> El 31 may 2022, a las 20:52, Lucas Banting escribi?: >>> >>> Thanks for the response Jose, >>> >>> There is an analytical solution for these modes actually, however there are thousands of them and they are all sparse. >>> I assume it is a non-trivial thing for EPSSetDeflationSpace() to take something like a MATAIJ as input? >>> >>> Lucas >>> From: Jose E. Roman >>> Sent: Tuesday, May 31, 2022 1:11 PM >>> To: Lucas Banting >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum >>> >>> Caution: This message was sent from outside the University of Manitoba. 
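The region filter used on the command line above (-rg_type interval -rg_interval_endpoints -.99,1,-.1,.1) can also be set in code. A short sketch, with the endpoint values taken from the thread and eps assumed to be an already created EPS:

  RG rg;

  PetscCall(EPSGetRG(eps, &rg));
  PetscCall(RGSetType(rg, RGINTERVAL));
  /* restrict the computation to eigenvalues with real part in [-0.99, 1]
     and imaginary part in [-0.1, 0.1], filtering out the -1 eigenvalues */
  PetscCall(RGIntervalSetEndpoints(rg, -0.99, 1.0, -0.1, 0.1));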
>>> >>> >>> If you know how to cheaply compute a basis of the nullspace of S, then you can try passing it to the solver via EPSSetDeflationSpace()https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDeflationSpace.html >>> >>> Jose >>> >>> >>>> El 31 may 2022, a las 19:28, Lucas Banting escribi?: >>>> >>>> Hello, >>>> >>>> I have a general non hermitian eigenvalue problem arising from the 3D helmholtz equation. >>>> The form of the helmholtz equaton is: >>>> >>>> (S - k^2M)v = lambda k^2 M v >>>> >>>> Where S is the stiffness/curl-curl matrix and M is the mass matrix associated with edge elements used to discretize the problem. >>>> The helmholtz equation creates eigenvalues of -1.0, which I believe are eigenvectors that are part of the null space of the curl-curl operator S. >>>> >>>> For my application, I would like to compute eigenvalues > -1.0, and avoid computation of eigenvalues of -1.0. >>>> I am currently using shift invert ST with mumps LU direct solver. By increasing the shift away from lambda=-1.0. I get faster computation of eigenvectors, and the lambda=-1.0 eigenvectors appear to slow down the computation by about a factor of two. >>>> Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP problem type? >>>> >>>> Regards, >>>> Lucas > From yu1299885905 at outlook.com Tue Jun 7 08:51:32 2022 From: yu1299885905 at outlook.com (wang yuqi) Date: Tue, 7 Jun 2022 13:51:32 +0000 Subject: [petsc-users] PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range Message-ID: Hi, Dear developer: I encountered the following problems when I run my code with PETSC-3.5.2: [46]PETSC ERROR: ------------------------------------------------------------------------ [46]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [46]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [46]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [46]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors Could you please help me to fix this problem? Thank you very much! Best Regards. Yuqi Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 7 09:00:37 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Jun 2022 10:00:37 -0400 Subject: [petsc-users] PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range In-Reply-To: References: Message-ID: On Tue, Jun 7, 2022 at 9:51 AM wang yuqi wrote: > Hi, Dear developer: > > I encountered the following problems when I run my code with PETSC-3.5.2: > > > > [46]PETSC ERROR: > ------------------------------------------------------------------------ > > [46]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > > [46]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [46]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [46]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > > > > Could you please help me to fix this problem? > > It may not be a PETSc problem. Could you run in the debugger and get a stack trace? Thanks, Matt > Thank you very much! > > > > Best Regards. 
> > Yuqi Wang > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jun 7 10:15:49 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 7 Jun 2022 11:15:49 -0400 Subject: [petsc-users] PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range In-Reply-To: References: Message-ID: <7BA74193-101C-4387-B13E-79DEBF52F66F@petsc.dev> That is an extremely old PETSc version. Unless you are using a package that only works with that version (and talk to the package's authors about upgrading) we recommend upgrading to the latest PETSc version. Usually, there is more information in the error message, is there more in the message you can send? > On Jun 7, 2022, at 10:00 AM, Matthew Knepley wrote: > > On Tue, Jun 7, 2022 at 9:51 AM wang yuqi > wrote: > Hi, Dear developer: > > I encountered the following problems when I run my code with PETSC-3.5.2: > > > > [46]PETSC ERROR: ------------------------------------------------------------------------ > > [46]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > [46]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [46]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [46]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > > > Could you please help me to fix this problem? > > > > It may not be a PETSc problem. Could you run in the debugger and get a stack trace? > > Thanks, > > Matt > > Thank you very much! > > > > Best Regards. > > Yuqi Wang > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rthirumalaisam1857 at sdsu.edu Tue Jun 7 17:51:21 2022 From: rthirumalaisam1857 at sdsu.edu (Ramakrishnan Thirumalaisamy) Date: Tue, 7 Jun 2022 15:51:21 -0700 Subject: [petsc-users] How to ignore a one floating point exception and move to the next? Message-ID: Hi everyone, I am using fp_trap to debug the floating-point error in my code. Is there any way I can move from one floating point to next one When I run the code in the debugger with "-fp_trap"? I know that some floating point errors are due to uninitialized variables but those are benign. I want to move to those ones that lead to NANs or division by zero. Thanks, Rama -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jun 7 18:10:15 2022 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 7 Jun 2022 19:10:15 -0400 Subject: [petsc-users] How to ignore a one floating point exception and move to the next? In-Reply-To: References: Message-ID: <7E70E5E5-EBE4-4614-9646-484C6441A619@petsc.dev> PETSc uses the signal handler to catch floating point exceptions when run by default or with -fp_trap. These are hard to recover from and continue. You can run PETSc with -fp_trap off in the debugger but tell the debugger to catch the floating point exceptions. You may be able to continue from those. 
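If the benign exceptions come from an identifiable block of code, another option (a sketch, not something suggested in the thread) is to disable trapping just around that block with PETSc's trap push/pop, so -fp_trap still catches everything else:

  #include <petscsys.h>

  /* Sketch: silence FP traps only around code that is known to raise benign
     exceptions; trapping is restored afterwards. */
  PetscCall(PetscFPTrapPush(PETSC_FP_TRAP_OFF));
  /* ... block with known-benign floating point exceptions ... */
  PetscCall(PetscFPTrapPop());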
Not having uninitialized variables and strange unimportant floating point exceptions in your code is part of good housekeeping and means that when you really need to debug you can be much more efficient in the debugging process. Like trying to find something in a messy room or a well organized room. I recommend you first do the housekeeping rather than try to find ways to avoid doing the housekeeping. Barry > On Jun 7, 2022, at 6:51 PM, Ramakrishnan Thirumalaisamy wrote: > > Hi everyone, > > I am using fp_trap to debug the floating-point error in my code. Is there any way I can move from one floating point to next one When I run the code in the debugger with "-fp_trap"? I know that some floating point errors are due to uninitialized variables but those are benign. I want to move to those ones that lead to NANs or division by zero. > > > > Thanks, > Rama From jacob.fai at gmail.com Tue Jun 7 18:31:13 2022 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 7 Jun 2022 19:31:13 -0400 Subject: [petsc-users] How to ignore a one floating point exception and move to the next? In-Reply-To: <7E70E5E5-EBE4-4614-9646-484C6441A619@petsc.dev> References: <7E70E5E5-EBE4-4614-9646-484C6441A619@petsc.dev> Message-ID: <109E15F6-5F38-4B1F-A4F4-348CBD63C5A6@gmail.com> You can also compile your code (and PETSc) using `-fsanitize=undefined` and run it to detect such errors. Note however that this will most likely also catch/error out on usage of uninitialized variables so your mileage may vary. As Barry notes this kind of stuff is much easier to debug when you don?t have to ignore other errors. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Jun 7, 2022, at 19:10, Barry Smith wrote: > > > PETSc uses the signal handler to catch floating point exceptions when run by default or with -fp_trap. These are hard to recover from and continue. > > You can run PETSc with -fp_trap off in the debugger but tell the debugger to catch the floating point exceptions. You may be able to continue from those. > > Not having uninitialized variables and strange unimportant floating point exceptions in your code is part of good housekeeping and means that when you really need to debug you can be much more efficient in the debugging process. Like trying to find something in a messy room or a well organized room. I recommend you first do the housekeeping rather than try to find ways to avoid doing the housekeeping. > > Barry > > >> On Jun 7, 2022, at 6:51 PM, Ramakrishnan Thirumalaisamy wrote: >> >> Hi everyone, >> >> I am using fp_trap to debug the floating-point error in my code. Is there any way I can move from one floating point to next one When I run the code in the debugger with "-fp_trap"? I know that some floating point errors are due to uninitialized variables but those are benign. I want to move to those ones that lead to NANs or division by zero. >> >> >> >> Thanks, >> Rama > From mi.mike1021 at gmail.com Tue Jun 7 23:14:31 2022 From: mi.mike1021 at gmail.com (Mike Michell) Date: Tue, 7 Jun 2022 23:14:31 -0500 Subject: [petsc-users] Load mesh as DMPlex along with Solution Fields obtained from External Codes Message-ID: Dear PETSc developer team, I am a user of PETSc DMPlex for a finite-volume solver. So far, I have loaded a mesh file made by Gmsh as a DMPlex object without pre-computed solution field. But what if I need to load the mesh as well as solution fields that are computed by other codes sharing the same physical domain, what is a smart way to do that? 
In other words, how can I load a DM object from a mesh file along with a defined solution field? I can think of that; load mesh to a DM object first, then declare a local (or global) vector to read & map the external solution field onto the PETSc data structure. But I can feel that this might not be the best way. Thanks, Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From sami.ben-elhaj-salah at ensma.fr Wed Jun 8 03:57:18 2022 From: sami.ben-elhaj-salah at ensma.fr (Sami BEN ELHAJ SALAH) Date: Wed, 8 Jun 2022 10:57:18 +0200 Subject: [petsc-users] Writing VTK output Message-ID: Dear Petsc Developer team, I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. 1) Algorithm 1 err = SNESSolve(_snes, bc_vec_test, solution); CHKERRABORT(FOX::Parallel::COMM_WORLD,err); PetscViewer vtk; PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); VecView(solution,vtk); PetscViewerDestroy(&vtk); 2) Algorithm 2 err = SNESSolve(_snes, bc_vec_test, solution); CHKERRABORT(FOX::Parallel::COMM_WORLD,err); PetscViewer vtk; PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); PetscViewerSetType(vtk, PETSCVIEWERVTK); PetscViewerFileSetName(vtk, "sol.vtk"); VecView(solution, vtk); PetscViewerDestroy(&vtk); The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? Other information used: - gmsh format 2.2 - Vtk version: 7.1.1 - Petsc version: 3.13/opt Below my two files gmsh and vtk: Gmsh file: $MeshFormat 2.2 0 8 $EndMeshFormat $Nodes 12 1 0.0 10.0 10.0 2 0.0 0.0 10.0 3 0.0 0.0 0.0 4 0.0 10.0 0.0 5 10.0 10.0 10.0 6 10.0 0.0 10.0 7 10.0 0.0 0.0 8 10.0 10.0 0.0 9 20.0 10.0 10.0 10 20.0 0.0 10.0 11 20.0 0.0 0.0 12 20.0 10.0 0.0 $EndNodes $Elements 2 1 5 2 68 60 1 2 3 4 5 6 7 8 2 5 2 68 60 5 6 7 8 9 10 11 12 $EndElements Vtk file : # vtk DataFile Version 2.0 Simplicial Mesh Example ASCII DATASET UNSTRUCTURED_GRID POINTS 12 double 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 0.000000e+00 1.000000e+01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+01 0.000000e+00 1.000000e+01 1.000000e+01 1.000000e+01 1.000000e+01 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 2.000000e+01 1.000000e+01 1.000000e+01 2.000000e+01 0.000000e+00 1.000000e+01 2.000000e+01 0.000000e+00 0.000000e+00 2.000000e+01 1.000000e+01 0.000000e+00 CELLS 2 18 8 0 3 2 1 4 5 6 7 8 4 7 6 5 8 9 10 11 CELL_TYPES 2 12 12 POINT_DATA 12 VECTORS dU_x double 2.754808e-10 -8.653846e-11 -8.653846e-11 2.754808e-10 8.653846e-11 -8.653846e-11 2.754808e-10 8.653846e-11 8.653846e-11 2.754808e-10 -8.653846e-11 8.653846e-11 4.678571e-01 -9.107143e-02 -9.107143e-02 4.678571e-01 9.107143e-02 -9.107143e-02 4.678571e-01 9.107143e-02 9.107143e-02 4.678571e-01 -9.107143e-02 9.107143e-02 1.000000e+00 -7.500000e-02 -7.500000e-02 1.000000e+00 7.500000e-02 -7.500000e-02 1.000000e+00 7.500000e-02 7.500000e-02 1.000000e+00 -7.500000e-02 7.500000e-02 Thank you in advance and have a good day ! Sami, -- Dr. 
Sami BEN ELHAJ SALAH Ing?nieur de Recherche (CNRS) Institut Pprime - ISAE - ENSMA Mobile: 06.62.51.26.74 Email: sami.ben-elhaj-salah at ensma.fr www.samibenelhajsalah.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 8 08:37:14 2022 From: jed at jedbrown.org (Jed Brown) Date: Wed, 08 Jun 2022 07:37:14 -0600 Subject: [petsc-users] Writing VTK output In-Reply-To: References: Message-ID: <87czfje0ol.fsf@jedbrown.org> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? Sami BEN ELHAJ SALAH writes: > Dear Petsc Developer team, > > I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. > > 1) Algorithm 1 > err = SNESSolve(_snes, bc_vec_test, solution); > CHKERRABORT(FOX::Parallel::COMM_WORLD,err); > PetscViewer vtk; > PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); > VecView(solution,vtk); > PetscViewerDestroy(&vtk); > > > 2) Algorithm 2 > err = SNESSolve(_snes, bc_vec_test, solution); > CHKERRABORT(FOX::Parallel::COMM_WORLD,err); > PetscViewer vtk; > PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); > PetscViewerSetType(vtk, PETSCVIEWERVTK); > PetscViewerFileSetName(vtk, "sol.vtk"); > VecView(solution, vtk); > PetscViewerDestroy(&vtk); > > The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
> > Other information used: > - gmsh format 2.2 > - Vtk version: 7.1.1 > - Petsc version: 3.13/opt > > Below my two files gmsh and vtk: > > Gmsh file: > $MeshFormat > 2.2 0 8 > $EndMeshFormat > $Nodes > 12 > 1 0.0 10.0 10.0 > 2 0.0 0.0 10.0 > 3 0.0 0.0 0.0 > 4 0.0 10.0 0.0 > 5 10.0 10.0 10.0 > 6 10.0 0.0 10.0 > 7 10.0 0.0 0.0 > 8 10.0 10.0 0.0 > 9 20.0 10.0 10.0 > 10 20.0 0.0 10.0 > 11 20.0 0.0 0.0 > 12 20.0 10.0 0.0 > $EndNodes > $Elements > 2 > 1 5 2 68 60 1 2 3 4 5 6 7 8 > 2 5 2 68 60 5 6 7 8 9 10 11 12 > $EndElements > > Vtk file : > # vtk DataFile Version 2.0 > Simplicial Mesh Example > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 12 double > 0.000000e+00 1.000000e+01 1.000000e+01 > 0.000000e+00 0.000000e+00 1.000000e+01 > 0.000000e+00 0.000000e+00 0.000000e+00 > 0.000000e+00 1.000000e+01 0.000000e+00 > 1.000000e+01 1.000000e+01 1.000000e+01 > 1.000000e+01 0.000000e+00 1.000000e+01 > 1.000000e+01 0.000000e+00 0.000000e+00 > 1.000000e+01 1.000000e+01 0.000000e+00 > 2.000000e+01 1.000000e+01 1.000000e+01 > 2.000000e+01 0.000000e+00 1.000000e+01 > 2.000000e+01 0.000000e+00 0.000000e+00 > 2.000000e+01 1.000000e+01 0.000000e+00 > CELLS 2 18 > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > CELL_TYPES 2 > 12 > 12 > POINT_DATA 12 > VECTORS dU_x double > 2.754808e-10 -8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 8.653846e-11 > 2.754808e-10 -8.653846e-11 8.653846e-11 > 4.678571e-01 -9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 9.107143e-02 > 4.678571e-01 -9.107143e-02 9.107143e-02 > 1.000000e+00 -7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 7.500000e-02 > 1.000000e+00 -7.500000e-02 7.500000e-02 > > Thank you in advance and have a good day ! > > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com From sami.ben-elhaj-salah at ensma.fr Wed Jun 8 09:14:13 2022 From: sami.ben-elhaj-salah at ensma.fr (Sami BEN ELHAJ SALAH) Date: Wed, 8 Jun 2022 16:14:13 +0200 Subject: [petsc-users] Writing VTK output In-Reply-To: <87czfje0ol.fsf@jedbrown.org> References: <87czfje0ol.fsf@jedbrown.org> Message-ID: Hi Jed, Thank you for your answer. When I use a ??solution.vtu'', I obtain a wrong file. _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4@4@$@@   ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o_?????uP????uP??o_?????uP????uP?? o_?????uP????uP?? o_?????uP????uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? If I understand your answer, to solve my problem, should just upgrade all my software ? Thanks, Sami, -- Dr. Sami BEN ELHAJ SALAH Ing?nieur de Recherche (CNRS) Institut Pprime - ISAE - ENSMA Mobile: 06.62.51.26.74 Email: sami.ben-elhaj-salah at ensma.fr www.samibenelhajsalah.com > Le 8 juin 2022 ? 15:37, Jed Brown a ?crit : > > You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? > > Sami BEN ELHAJ SALAH writes: > >> Dear Petsc Developer team, >> >> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. 
>> >> 1) Algorithm 1 >> err = SNESSolve(_snes, bc_vec_test, solution); >> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >> PetscViewer vtk; >> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >> VecView(solution,vtk); >> PetscViewerDestroy(&vtk); >> >> >> 2) Algorithm 2 >> err = SNESSolve(_snes, bc_vec_test, solution); >> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >> PetscViewer vtk; >> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >> PetscViewerSetType(vtk, PETSCVIEWERVTK); >> PetscViewerFileSetName(vtk, "sol.vtk"); >> VecView(solution, vtk); >> PetscViewerDestroy(&vtk); >> >> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? >> >> Other information used: >> - gmsh format 2.2 >> - Vtk version: 7.1.1 >> - Petsc version: 3.13/opt >> >> Below my two files gmsh and vtk: >> >> Gmsh file: >> $MeshFormat >> 2.2 0 8 >> $EndMeshFormat >> $Nodes >> 12 >> 1 0.0 10.0 10.0 >> 2 0.0 0.0 10.0 >> 3 0.0 0.0 0.0 >> 4 0.0 10.0 0.0 >> 5 10.0 10.0 10.0 >> 6 10.0 0.0 10.0 >> 7 10.0 0.0 0.0 >> 8 10.0 10.0 0.0 >> 9 20.0 10.0 10.0 >> 10 20.0 0.0 10.0 >> 11 20.0 0.0 0.0 >> 12 20.0 10.0 0.0 >> $EndNodes >> $Elements >> 2 >> 1 5 2 68 60 1 2 3 4 5 6 7 8 >> 2 5 2 68 60 5 6 7 8 9 10 11 12 >> $EndElements >> >> Vtk file : >> # vtk DataFile Version 2.0 >> Simplicial Mesh Example >> ASCII >> DATASET UNSTRUCTURED_GRID >> POINTS 12 double >> 0.000000e+00 1.000000e+01 1.000000e+01 >> 0.000000e+00 0.000000e+00 1.000000e+01 >> 0.000000e+00 0.000000e+00 0.000000e+00 >> 0.000000e+00 1.000000e+01 0.000000e+00 >> 1.000000e+01 1.000000e+01 1.000000e+01 >> 1.000000e+01 0.000000e+00 1.000000e+01 >> 1.000000e+01 0.000000e+00 0.000000e+00 >> 1.000000e+01 1.000000e+01 0.000000e+00 >> 2.000000e+01 1.000000e+01 1.000000e+01 >> 2.000000e+01 0.000000e+00 1.000000e+01 >> 2.000000e+01 0.000000e+00 0.000000e+00 >> 2.000000e+01 1.000000e+01 0.000000e+00 >> CELLS 2 18 >> 8 0 3 2 1 4 5 6 7 >> 8 4 7 6 5 8 9 10 11 >> CELL_TYPES 2 >> 12 >> 12 >> POINT_DATA 12 >> VECTORS dU_x double >> 2.754808e-10 -8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 8.653846e-11 >> 2.754808e-10 -8.653846e-11 8.653846e-11 >> 4.678571e-01 -9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 9.107143e-02 >> 4.678571e-01 -9.107143e-02 9.107143e-02 >> 1.000000e+00 -7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 7.500000e-02 >> 1.000000e+00 -7.500000e-02 7.500000e-02 >> >> Thank you in advance and have a good day ! >> >> Sami, >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Jun 8 09:25:51 2022 From: jed at jedbrown.org (Jed Brown) Date: Wed, 08 Jun 2022 08:25:51 -0600 Subject: [petsc-users] Writing VTK output In-Reply-To: References: <87czfje0ol.fsf@jedbrown.org> Message-ID: <875ylbdyfk.fsf@jedbrown.org> Does the file load in paraview? 
When I load your *.msh in a tutorial with -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. -------------- next part -------------- A non-text attachment was scrubbed... Name: sami.vtu Type: model/vnd.vtu Size: 1319 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sami.png Type: image/png Size: 35231 bytes Desc: not available URL: -------------- next part -------------- Sami BEN ELHAJ SALAH writes: > Hi Jed, > > Thank you for your answer. > > When I use a ??solution.vtu'', I obtain a wrong file. > > > > > > > > > > > > > > > > > > > > > > > _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4@4@$@@ >   ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o_?????uP????uP??o_?????uP????uP??o_?????uP????uP??o_?????uP????uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? > > > > > If I understand your answer, to solve my problem, should just upgrade all my software ? > > Thanks, > Sami, > > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > >> Le 8 juin 2022 ? 15:37, Jed Brown a ?crit : >> >> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? >> >> Sami BEN ELHAJ SALAH writes: >> >>> Dear Petsc Developer team, >>> >>> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. >>> >>> 1) Algorithm 1 >>> err = SNESSolve(_snes, bc_vec_test, solution); >>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>> PetscViewer vtk; >>> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >>> VecView(solution,vtk); >>> PetscViewerDestroy(&vtk); >>> >>> >>> 2) Algorithm 2 >>> err = SNESSolve(_snes, bc_vec_test, solution); >>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>> PetscViewer vtk; >>> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >>> PetscViewerSetType(vtk, PETSCVIEWERVTK); >>> PetscViewerFileSetName(vtk, "sol.vtk"); >>> VecView(solution, vtk); >>> PetscViewerDestroy(&vtk); >>> >>> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
>>> >>> Other information used: >>> - gmsh format 2.2 >>> - Vtk version: 7.1.1 >>> - Petsc version: 3.13/opt >>> >>> Below my two files gmsh and vtk: >>> >>> Gmsh file: >>> $MeshFormat >>> 2.2 0 8 >>> $EndMeshFormat >>> $Nodes >>> 12 >>> 1 0.0 10.0 10.0 >>> 2 0.0 0.0 10.0 >>> 3 0.0 0.0 0.0 >>> 4 0.0 10.0 0.0 >>> 5 10.0 10.0 10.0 >>> 6 10.0 0.0 10.0 >>> 7 10.0 0.0 0.0 >>> 8 10.0 10.0 0.0 >>> 9 20.0 10.0 10.0 >>> 10 20.0 0.0 10.0 >>> 11 20.0 0.0 0.0 >>> 12 20.0 10.0 0.0 >>> $EndNodes >>> $Elements >>> 2 >>> 1 5 2 68 60 1 2 3 4 5 6 7 8 >>> 2 5 2 68 60 5 6 7 8 9 10 11 12 >>> $EndElements >>> >>> Vtk file : >>> # vtk DataFile Version 2.0 >>> Simplicial Mesh Example >>> ASCII >>> DATASET UNSTRUCTURED_GRID >>> POINTS 12 double >>> 0.000000e+00 1.000000e+01 1.000000e+01 >>> 0.000000e+00 0.000000e+00 1.000000e+01 >>> 0.000000e+00 0.000000e+00 0.000000e+00 >>> 0.000000e+00 1.000000e+01 0.000000e+00 >>> 1.000000e+01 1.000000e+01 1.000000e+01 >>> 1.000000e+01 0.000000e+00 1.000000e+01 >>> 1.000000e+01 0.000000e+00 0.000000e+00 >>> 1.000000e+01 1.000000e+01 0.000000e+00 >>> 2.000000e+01 1.000000e+01 1.000000e+01 >>> 2.000000e+01 0.000000e+00 1.000000e+01 >>> 2.000000e+01 0.000000e+00 0.000000e+00 >>> 2.000000e+01 1.000000e+01 0.000000e+00 >>> CELLS 2 18 >>> 8 0 3 2 1 4 5 6 7 >>> 8 4 7 6 5 8 9 10 11 >>> CELL_TYPES 2 >>> 12 >>> 12 >>> POINT_DATA 12 >>> VECTORS dU_x double >>> 2.754808e-10 -8.653846e-11 -8.653846e-11 >>> 2.754808e-10 8.653846e-11 -8.653846e-11 >>> 2.754808e-10 8.653846e-11 8.653846e-11 >>> 2.754808e-10 -8.653846e-11 8.653846e-11 >>> 4.678571e-01 -9.107143e-02 -9.107143e-02 >>> 4.678571e-01 9.107143e-02 -9.107143e-02 >>> 4.678571e-01 9.107143e-02 9.107143e-02 >>> 4.678571e-01 -9.107143e-02 9.107143e-02 >>> 1.000000e+00 -7.500000e-02 -7.500000e-02 >>> 1.000000e+00 7.500000e-02 -7.500000e-02 >>> 1.000000e+00 7.500000e-02 7.500000e-02 >>> 1.000000e+00 -7.500000e-02 7.500000e-02 >>> >>> Thank you in advance and have a good day ! >>> >>> Sami, >>> >>> -- >>> Dr. Sami BEN ELHAJ SALAH >>> Ing?nieur de Recherche (CNRS) >>> Institut Pprime - ISAE - ENSMA >>> Mobile: 06.62.51.26.74 >>> Email: sami.ben-elhaj-salah at ensma.fr >>> www.samibenelhajsalah.com From sami.ben-elhaj-salah at ensma.fr Wed Jun 8 10:24:15 2022 From: sami.ben-elhaj-salah at ensma.fr (Sami BEN ELHAJ SALAH) Date: Wed, 8 Jun 2022 17:24:15 +0200 Subject: [petsc-users] Writing VTK output In-Reply-To: <875ylbdyfk.fsf@jedbrown.org> References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> Message-ID: <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> Yes, the file "sami.vtu" is loaded correctly in paraview and I have the good output like you. In my code, I tried with the same command given in your last answer and I still have the wrong .vtu file. I use this: mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt Thanks, Sami, -- Dr. Sami BEN ELHAJ SALAH Ing?nieur de Recherche (CNRS) Institut Pprime - ISAE - ENSMA Mobile: 06.62.51.26.74 Email: sami.ben-elhaj-salah at ensma.fr www.samibenelhajsalah.com > Le 8 juin 2022 ? 16:25, Jed Brown a ?crit : > > Does the file load in paraview? When I load your *.msh in a tutorial with -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. > > > Sami BEN ELHAJ SALAH > writes: > >> Hi Jed, >> >> Thank you for your answer. >> >> When I use a ??solution.vtu'', I obtain a wrong file. 
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ >>   ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o_?????uP????uP??o_?????uP????uP??o_?????uP????uP??o_?????uP????uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? >> >> >> >> >> If I understand your answer, to solve my problem, should just upgrade all my software ? >> >> Thanks, >> Sami, >> >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com > >> >> >> >>> Le 8 juin 2022 ? 15:37, Jed Brown a ?crit : >>> >>> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? >>> >>> Sami BEN ELHAJ SALAH writes: >>> >>>> Dear Petsc Developer team, >>>> >>>> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. >>>> >>>> 1) Algorithm 1 >>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>> PetscViewer vtk; >>>> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >>>> VecView(solution,vtk); >>>> PetscViewerDestroy(&vtk); >>>> >>>> >>>> 2) Algorithm 2 >>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>> PetscViewer vtk; >>>> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >>>> PetscViewerSetType(vtk, PETSCVIEWERVTK); >>>> PetscViewerFileSetName(vtk, "sol.vtk"); >>>> VecView(solution, vtk); >>>> PetscViewerDestroy(&vtk); >>>> >>>> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
>>>> >>>> Other information used: >>>> - gmsh format 2.2 >>>> - Vtk version: 7.1.1 >>>> - Petsc version: 3.13/opt >>>> >>>> Below my two files gmsh and vtk: >>>> >>>> Gmsh file: >>>> $MeshFormat >>>> 2.2 0 8 >>>> $EndMeshFormat >>>> $Nodes >>>> 12 >>>> 1 0.0 10.0 10.0 >>>> 2 0.0 0.0 10.0 >>>> 3 0.0 0.0 0.0 >>>> 4 0.0 10.0 0.0 >>>> 5 10.0 10.0 10.0 >>>> 6 10.0 0.0 10.0 >>>> 7 10.0 0.0 0.0 >>>> 8 10.0 10.0 0.0 >>>> 9 20.0 10.0 10.0 >>>> 10 20.0 0.0 10.0 >>>> 11 20.0 0.0 0.0 >>>> 12 20.0 10.0 0.0 >>>> $EndNodes >>>> $Elements >>>> 2 >>>> 1 5 2 68 60 1 2 3 4 5 6 7 8 >>>> 2 5 2 68 60 5 6 7 8 9 10 11 12 >>>> $EndElements >>>> >>>> Vtk file : >>>> # vtk DataFile Version 2.0 >>>> Simplicial Mesh Example >>>> ASCII >>>> DATASET UNSTRUCTURED_GRID >>>> POINTS 12 double >>>> 0.000000e+00 1.000000e+01 1.000000e+01 >>>> 0.000000e+00 0.000000e+00 1.000000e+01 >>>> 0.000000e+00 0.000000e+00 0.000000e+00 >>>> 0.000000e+00 1.000000e+01 0.000000e+00 >>>> 1.000000e+01 1.000000e+01 1.000000e+01 >>>> 1.000000e+01 0.000000e+00 1.000000e+01 >>>> 1.000000e+01 0.000000e+00 0.000000e+00 >>>> 1.000000e+01 1.000000e+01 0.000000e+00 >>>> 2.000000e+01 1.000000e+01 1.000000e+01 >>>> 2.000000e+01 0.000000e+00 1.000000e+01 >>>> 2.000000e+01 0.000000e+00 0.000000e+00 >>>> 2.000000e+01 1.000000e+01 0.000000e+00 >>>> CELLS 2 18 >>>> 8 0 3 2 1 4 5 6 7 >>>> 8 4 7 6 5 8 9 10 11 >>>> CELL_TYPES 2 >>>> 12 >>>> 12 >>>> POINT_DATA 12 >>>> VECTORS dU_x double >>>> 2.754808e-10 -8.653846e-11 -8.653846e-11 >>>> 2.754808e-10 8.653846e-11 -8.653846e-11 >>>> 2.754808e-10 8.653846e-11 8.653846e-11 >>>> 2.754808e-10 -8.653846e-11 8.653846e-11 >>>> 4.678571e-01 -9.107143e-02 -9.107143e-02 >>>> 4.678571e-01 9.107143e-02 -9.107143e-02 >>>> 4.678571e-01 9.107143e-02 9.107143e-02 >>>> 4.678571e-01 -9.107143e-02 9.107143e-02 >>>> 1.000000e+00 -7.500000e-02 -7.500000e-02 >>>> 1.000000e+00 7.500000e-02 -7.500000e-02 >>>> 1.000000e+00 7.500000e-02 7.500000e-02 >>>> 1.000000e+00 -7.500000e-02 7.500000e-02 >>>> >>>> Thank you in advance and have a good day ! >>>> >>>> Sami, >>>> >>>> -- >>>> Dr. Sami BEN ELHAJ SALAH >>>> Ing?nieur de Recherche (CNRS) >>>> Institut Pprime - ISAE - ENSMA >>>> Mobile: 06.62.51.26.74 >>>> Email: sami.ben-elhaj-salah at ensma.fr >>>> www.samibenelhajsalah.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jun 8 10:57:47 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 Jun 2022 11:57:47 -0400 Subject: [petsc-users] Writing VTK output In-Reply-To: <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> Message-ID: On Wed, Jun 8, 2022 at 11:24 AM Sami BEN ELHAJ SALAH < sami.ben-elhaj-salah at ensma.fr> wrote: > Yes, the file "sami.vtu" is loaded correctly in paraview and I have the > good output like you. > > In my code, I tried with the same command given in your last answer and I > still have the wrong .vtu file. > Hi Sami, What do you mean by wrong? Can you just use the simple procedure: PetscCall(DMCreate(comm, dm)); PetscCall(DMSetType(*dm, DMPLEX)); PetscCall(DMSetFromOptions(*dm)); PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view")); This is the one that works for us. Then we can change it in your code one step at a time until you get what you need. 
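For reference, a minimal self-contained sketch of that procedure (an illustration only, not code from your tree; the executable name "meshcheck" and the run line below are placeholders):

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM dm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(DMCreate(PETSC_COMM_WORLD, &dm));
  PetscCall(DMSetType(dm, DMPLEX));
  PetscCall(DMSetFromOptions(dm));                     /* picks up -dm_plex_filename and other -dm_plex_* options */
  PetscCall(DMViewFromOptions(dm, NULL, "-dm_view"));  /* honors -dm_view, e.g. -dm_view vtk:mesh.vtu */
  PetscCall(DMDestroy(&dm));
  PetscCall(PetscFinalize());
  return 0;
}

Run it as, e.g.,

  mpirun -np 1 ./meshcheck -dm_plex_filename cub_2C3D8_msh.msh -dm_view vtk:mesh.vtu

and once that writes a sane mesh.vtu, the PetscViewerVTKOpen/VecView pair in your code only needs the file name changed to a .vtu suffix to get the modern (non-legacy) format.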
Thanks, Matt > I use this: > mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT > -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor > -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view > vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt > > > Thanks, > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > > > Le 8 juin 2022 ? 16:25, Jed Brown a ?crit : > > Does the file load in paraview? When I load your *.msh in a tutorial with > -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. > > > Sami BEN ELHAJ SALAH writes: > > Hi Jed, > > Thank you for your answer. > > When I use a ??solution.vtu'', I obtain a wrong file. > > > > > > > format="appended" offset="0" /> > > > format="appended" offset="292" /> > format="appended" offset="360" /> > format="appended" offset="372" /> > > > format="appended" offset="378" /> > > > format="appended" offset="390" /> > > > > > _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ > ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o > _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? > uP??b#???????333????333??_#?????? > ?333????333??b#??????(?333??'?333??a#???????333??>?333?? > > > > > If I understand your answer, to solve my problem, should just upgrade all > my software ? > > Thanks, > Sami, > > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com < > https://samiben91.github.io/samibenelhajsalah/index.html> > > > > Le 8 juin 2022 ? 15:37, Jed Brown a ?crit : > > You're using pretty old versions of all software; I'd recommend upgrading. > I recommend choosing the file name "solution.vtu" to use the modern > (non-legacy) format. Does that work for you? > > Sami BEN ELHAJ SALAH writes: > > Dear Petsc Developer team, > > I solved a linear elastic problem in 3D using a DMPLEX. My system is > converging, then I would like to write out my solution vector to a vtk file > where I use unstructured mesh. Currently, I tried two algorithms and I have > the same result. > > 1) Algorithm 1 > err = SNESSolve(_snes, bc_vec_test, solution); > CHKERRABORT(FOX::Parallel::COMM_WORLD,err); > PetscViewer vtk; > > PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); > > VecView(solution,vtk); > PetscViewerDestroy(&vtk); > > > 2) Algorithm 2 > err = SNESSolve(_snes, bc_vec_test, solution); > CHKERRABORT(FOX::Parallel::COMM_WORLD,err); > PetscViewer vtk; > PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); > PetscViewerSetType(vtk, PETSCVIEWERVTK); > PetscViewerFileSetName(vtk, "sol.vtk"); > VecView(solution, vtk); > PetscViewerDestroy(&vtk); > > The result seems correct except for the rotation order of the nodes (see > the red lines on gmsh and vtk file respectively). Then, I visualized my vtk > file with paraview, and I remarked that my geometry is not correct and not > conserved when comparing it with my gmsh file. So, I didn?t understand why > the rotation order of nodes is not conserved when saving my result to a vtk > file? 
> > Other information used: > - gmsh format 2.2 > - Vtk version: 7.1.1 > - Petsc version: 3.13/opt > > Below my two files gmsh and vtk: > > Gmsh file: > $MeshFormat > 2.2 0 8 > $EndMeshFormat > $Nodes > 12 > 1 0.0 10.0 10.0 > 2 0.0 0.0 10.0 > 3 0.0 0.0 0.0 > 4 0.0 10.0 0.0 > 5 10.0 10.0 10.0 > 6 10.0 0.0 10.0 > 7 10.0 0.0 0.0 > 8 10.0 10.0 0.0 > 9 20.0 10.0 10.0 > 10 20.0 0.0 10.0 > 11 20.0 0.0 0.0 > 12 20.0 10.0 0.0 > $EndNodes > $Elements > 2 > 1 5 2 68 60 1 2 3 4 5 6 7 8 > 2 5 2 68 60 5 6 7 8 9 10 11 12 > $EndElements > > Vtk file : > # vtk DataFile Version 2.0 > Simplicial Mesh Example > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 12 double > 0.000000e+00 1.000000e+01 1.000000e+01 > 0.000000e+00 0.000000e+00 1.000000e+01 > 0.000000e+00 0.000000e+00 0.000000e+00 > 0.000000e+00 1.000000e+01 0.000000e+00 > 1.000000e+01 1.000000e+01 1.000000e+01 > 1.000000e+01 0.000000e+00 1.000000e+01 > 1.000000e+01 0.000000e+00 0.000000e+00 > 1.000000e+01 1.000000e+01 0.000000e+00 > 2.000000e+01 1.000000e+01 1.000000e+01 > 2.000000e+01 0.000000e+00 1.000000e+01 > 2.000000e+01 0.000000e+00 0.000000e+00 > 2.000000e+01 1.000000e+01 0.000000e+00 > CELLS 2 18 > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > CELL_TYPES 2 > 12 > 12 > POINT_DATA 12 > VECTORS dU_x double > 2.754808e-10 -8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 8.653846e-11 > 2.754808e-10 -8.653846e-11 8.653846e-11 > 4.678571e-01 -9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 9.107143e-02 > 4.678571e-01 -9.107143e-02 9.107143e-02 > 1.000000e+00 -7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 7.500000e-02 > 1.000000e+00 -7.500000e-02 7.500000e-02 > > Thank you in advance and have a good day ! > > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com < > https://samiben91.github.io/samibenelhajsalah/index.html> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From armand.touminet at protonmail.com Thu Jun 9 10:25:02 2022 From: armand.touminet at protonmail.com (Armand Touminet) Date: Thu, 09 Jun 2022 15:25:02 +0000 Subject: [petsc-users] VecConcatenate in petsc4py Message-ID: Dear Petsc team, I'm trying to implement PDE constrained optimization using TAO from the petsc4py interface. Since my problem has multiple parameter fields to optimize, I need to combine them into a single Vec object to supply to TAO. I've found the VecConcatenate in the C documentation, which appears to do exactly what I need, however this function does not seem to exist in the python interface (or at least I was unable to find it). Is there an other easy way to combine vectors from python? Thanks for your help, Armand Touminet -------------- next part -------------- An HTML attachment was scrubbed... URL: From zjorti at lanl.gov Thu Jun 9 16:19:51 2022 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Thu, 9 Jun 2022 21:19:51 +0000 Subject: [petsc-users] Question about SuperLU Message-ID: Hi, I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. 
At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: a) SuperLU: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist Option a) yields the following error: " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 PC failed due to FACTOR_NUMERIC_ZEROPIVOT " whereas options b) seems to be working well. Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? Many thanks. Best, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Jun 9 16:36:47 2022 From: jed at jedbrown.org (Jed Brown) Date: Thu, 09 Jun 2022 15:36:47 -0600 Subject: [petsc-users] VecConcatenate in petsc4py In-Reply-To: References: Message-ID: <875yl9iknk.fsf@jedbrown.org> You don't want to create a new vector here, but read from (and write to) multiple parts of the same vector. You can use PETSc interfaces for subvectors, or do it with NumPy slices (perhaps more natural and ergonomic, depending on how your code is written). Armand Touminet via petsc-users writes: > Dear Petsc team, > > I'm trying to implement PDE constrained optimization using TAO from the petsc4py interface. > Since my problem has multiple parameter fields to optimize, I need to combine them into a single Vec object to supply to TAO. I've found the VecConcatenate in the C documentation, which appears to do exactly what I need, however this function does not seem to exist in the python interface (or at least I was unable to find it). > Is there an other easy way to combine vectors from python? > > Thanks for your help, > > Armand Touminet From xsli at lbl.gov Thu Jun 9 19:28:13 2022 From: xsli at lbl.gov (Xiaoye S. Li) Date: Thu, 9 Jun 2022 17:28:13 -0700 Subject: [petsc-users] Question about SuperLU In-Reply-To: References: Message-ID: Are you using serial SuperLU, or distributed-memory SuperLU_DIST? What are the algorithm options are you using? 
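(For context, and only as a non-exhaustive sketch: the SuperLU_DIST settings actually in effect are printed by -ksp_view on the corresponding solver, and the usual runtime knobs are options of the form

  -mat_superlu_dist_rowperm            (row permutation; the default is LargeDiag_MC64)
  -mat_superlu_dist_colperm            (column / fill-reducing ordering)
  -mat_superlu_dist_replacetinypivot   (replace tiny pivots during factorization)

prefixed appropriately when the factorization sits inside a fieldsplit. The MATSOLVERSUPERLU_DIST manual page linked later in this thread has the authoritative list.)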
Sherry Li On Thu, Jun 9, 2022 at 2:20 PM Jorti, Zakariae via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and > for the preconditioning part, I am using a FieldSplit preconditioner. At > the last fieldsplit/level, we are left with a {B,V} block that tried to > precondition in 2 different ways: > a) SuperLU: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type > superlu_dist > b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V > and B blocks: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition > selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type > preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type > lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type > superlu_dist > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type > superlu_dist > > Option a) yields the following error: > " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL > iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to > CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve > converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did > not converge due to DIVERGED_PC_FAILED iterations 0 > PC failed due to FACTOR_NUMERIC_ZEROPIVOT " > whereas options b) seems to be working well. > Is it possible that the SuperLU on the {V,B} block uses a reordering that > introduces a zero pivot or could there be another explanation for this > error? > > Many thanks. > Best, > > Zakariae > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsfaraway at gmail.com Fri Jun 10 07:27:06 2022 From: jsfaraway at gmail.com (jsfaraway) Date: Fri, 10 Jun 2022 20:27:06 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <4C029833-EC0F-494F-911F-D795375718D9@gmail.com> Message-ID: <6E7BB1A2-5E08-4C99-93EB-77D14B23B44E@gmail.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_view.txt URL: From jroman at dsic.upv.es Fri Jun 10 07:50:47 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 10 Jun 2022 14:50:47 +0200 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: Message-ID: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> The value -eps_ncv 5000 is huge. Better let SLEPc use the default value. Jose > El 10 jun 2022, a las 14:24, Jin Runfeng escribi?: > > Hello! > I want to acquire the 3 smallest eigenvalue, and attachment is the log view output. I can see epssolve really cost the major time. But I can not see why it cost so much time. Can you see something from it? > > Thank you ! > > Runfeng Jin > > On 6? 4 2022, at 1:37 ??, Jose E. 
Roman wrote: > Convergence depends on distribution of eigenvalues you want to compute. On the other hand, the cost also depends on the time it takes to build the preconditioner. Use -log_view to see the cost of the different steps of the computation. > > Jose > > > > El 3 jun 2022, a las 18:50, jsfaraway escribi?: > > > > hello! > > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. And I find a strang thing. There are two matrix A(900000*900000) and B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B use 22 iterations and 38885s! What could be the reason for this? Or what can I do to find the reason? > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > > And there is one difference I can tell is matrix B has many small value, whose absolute value is less than 10-6. Could this be the reason? > > > > Thank you! > > > > Runfeng Jin > From bsmith at petsc.dev Fri Jun 10 08:32:01 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Jun 2022 09:32:01 -0400 Subject: [petsc-users] Question about SuperLU In-Reply-To: References: Message-ID: It is difficult to tell exactly how the preconditioner is being formed with the information below it looks like in the first case: the original B diagonal block and V diagonal block of the matrix are being factored separately with SuperLU_DIST second case: the B block is factored with SuperLU_DIST and an explicit approximation to a Schur complement of the V block (Schur complement on eliminating the B block) is formed using "Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this part of the preconditioner). My guess is you have a "Stokes"-like problem where the V block is identically 0 so, of course, the SuperLU_DIST will fail on it. But the approximation of the Schur complement onto that block is not singular so SuperLU_DIST has no trouble. If I am wrong and the V block is not identically 0 then it may be singular (or possibly, but less likely just badly order) so that SuperLU_DIST encounters a zero pivot. You can run with -ksp_view_pre to have the KSP print the KSP solver algorithm details BEFORE the linear solve (hence they would get printed despite your failed solve). That would be useful to see exactly what your preconditioner is. You can use -ksp_view_pmat (with appropriate prefix) to display the matrix that is going to be factored. Thus you can quickly verify what V is. If you run with -ksp_error_if_not_converged then the solver will stop exactly when the zero pivot is encountered; this would include some information from SuperLU_DIST which might include the row number etc. Notes on PETSc improvements needed. 1) The man page for KSPCheckSolve() is terribly misleading 2) It would be nice to have a view that displayed the nested fieldsplit preconditioners more clearly > On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users wrote: > > Hi, > > I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. 
At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: > a) SuperLU: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist > b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist > > Option a) yields the following error: > " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 > PC failed due to FACTOR_NUMERIC_ZEROPIVOT " > whereas options b) seems to be working well. > Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? > > Many thanks. > Best, > > Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 10 09:11:11 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Jun 2022 10:11:11 -0400 Subject: [petsc-users] Question about SuperLU In-Reply-To: References: Message-ID: On Thu, Jun 9, 2022 at 5:20 PM Jorti, Zakariae via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and > for the preconditioning part, I am using a FieldSplit preconditioner. 
At > the last fieldsplit/level, we are left with a {B,V} block that tried to > precondition in 2 different ways: > a) SuperLU: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type > superlu_dist > b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V > and B blocks: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition > selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type > preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type > lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type > superlu_dist > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type > superlu_dist > > Option a) yields the following error: > " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL > iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to > CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve > converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did > not converge due to DIVERGED_PC_FAILED iterations 0 > PC failed due to FACTOR_NUMERIC_ZEROPIVOT " > whereas options b) seems to be working well. > Is it possible that the SuperLU on the {V,B} block uses a reordering that > introduces a zero pivot or could there be another explanation for this > error? > I can at least come up with a case where this is true. Suppose you have / A 0 \ \ 0 I / where A is rank deficient, but has a positive diagonal. SuperLU will fail since it is actually singular. However, your Schur complement might work since you use 'selfp' for the Schur preconditioner, and it just extracts the diagonal. Thanks, Matt > Many thanks. > Best, > > Zakariae > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tangqi at msu.edu Fri Jun 10 09:43:43 2022 From: tangqi at msu.edu (Tang, Qi) Date: Fri, 10 Jun 2022 14:43:43 +0000 Subject: [petsc-users] Question about SuperLU Message-ID: <451BCBEE-FCC1-44D2-946F-4AF80403E6A9@msu.edu> ?We use superlu_dist. We have a 2 x 2 block where directly calling suplerlu_dist fails, but a pc based on a fieldsplit Schur complement + superlu_dist on the assembled Schur complement matrix converges. (All the converge criteria are default at this level) I am having a hard time to understand what is going on. The B,V block is of size 240K, so it is also hard to analyze. And the mat is not something we explicitly formed. It is formed by finite difference coloring Jacobian + a few levels of Schur complement. / A 0 \ \ 0 I / Matt, I do not see this can explain why the second pc with superlu on S = A would succeed, if A is not full rank. 
I believe I found somewhere it says petsc?s pclu (or maybe superlu_dist) did reordering and it may introduce 0 pivoting. We are asking because it seems there is something we do not understand from pclu/superlu level. Anyway, is there a way to output the mat before it fails? We have been struggling to do that. We have TSSolve->SNES->FDColoringJacobian->A few levels of fieldsplit->failed Subblock matrix, which we want to analyze. (Sometimes it even happens in the second Newton iteration as the first one works okay.) Qi On Jun 10, 2022, at 8:11 AM, Matthew Knepley wrote: ? On Thu, Jun 9, 2022 at 5:20 PM Jorti, Zakariae via petsc-users > wrote: Hi, I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: a) SuperLU: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist Option a) yields the following error: " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 PC failed due to FACTOR_NUMERIC_ZEROPIVOT " whereas options b) seems to be working well. Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? I can at least come up with a case where this is true. Suppose you have / A 0 \ \ 0 I / where A is rank deficient, but has a positive diagonal. SuperLU will fail since it is actually singular. However, your Schur complement might work since you use 'selfp' for the Schur preconditioner, and it just extracts the diagonal. Thanks, Matt Many thanks. Best, Zakariae -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
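(On the question above of dumping the offending sub-block before the factorization fails: a sketch, built only from options that appear elsewhere in this thread, is to add the prefixed viewing option to the failing inner solve, e.g.

  -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab
  -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged

so the {B,V} matrix handed to that inner solve gets written to the file BVmat and the run stops at the zero pivot. The binary file can then be read into MATLAB with the PetscBinaryRead.m helper shipped in $PETSC_DIR/share/petsc/matlab to check its rank and pivots. The exact prefix depends on the fieldsplit nesting, so the option names here are placeholders to adapt.)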
URL: From patrick.sanan at gmail.com Fri Jun 10 11:54:30 2022 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 10 Jun 2022 18:54:30 +0200 Subject: [petsc-users] Mat created by DMStag cannot access ghost points In-Reply-To: References: <859FA50E-F3C2-4E54-AEDC-7C8A70D3FCE1@petsc.dev> Message-ID: Sorry about the long delay on this. https://gitlab.com/petsc/petsc/-/merge_requests/5329 Am Do., 2. Juni 2022 um 15:01 Uhr schrieb Matthew Knepley : > On Thu, Jun 2, 2022 at 8:59 AM Patrick Sanan > wrote: > >> Thanks, Barry and Changqing! That seems reasonable to me, so I'll make an >> MR with that change. >> > > Hi Patrick, > > In the MR, could you add that option to all places we internally use > Preallocator? I think we mean it for those. > > Thanks, > > Matt > > >> Am Mi., 1. Juni 2022 um 20:06 Uhr schrieb Barry Smith : >> >>> >>> This appears to be a bug in the DMStag/Mat preallocator code. If you >>> add after the DMCreateMatrix() line in your code >>> >>> PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE)); >>> >>> Your code will run correctly. >>> >>> Patrick and Matt, >>> >>> MatPreallocatorPreallocate_Preallocator() has >>> >>> PetscCall(MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, p->nooffproc)); >>> >>> to make the assembly of the stag matrix from the preallocator matrix a >>> little faster, >>> >>> but then it never "undoes" this call. Hence the matrix is left in the >>> state where it will error if someone sets values from a different rank >>> (which they certainly can using DMStagMatSetValuesStencil(). >>> >>> I think you need to clear the NO_OFF_PROC at the end >>> of MatPreallocatorPreallocate_Preallocator() because just because the >>> preallocation process never needed communication does not mean that when >>> someone puts real values in the matrix they will never use communication; >>> they can put in values any dang way they please. >>> >>> I don't know why this bug has not come up before. >>> >>> Barry >>> >>> >>> On May 31, 2022, at 11:08 PM, Ye Changqing >>> wrote: >>> >>> Dear all, >>> >>> [BugReport.c] is a sample code, [BugReportParallel.output] is the output >>> when execute BugReport with mpiexec, [BugReportSerial.output] is the output >>> in serial execution. >>> >>> Best, >>> Changqing >>> >>> ------------------------------ >>> *???:* Dave May >>> *????:* 2022?5?31? 22:55 >>> *???:* Ye Changqing >>> *??:* petsc-users at mcs.anl.gov >>> *??:* Re: [petsc-users] Mat created by DMStag cannot access ghost points >>> >>> >>> >>> On Tue 31. May 2022 at 16:28, Ye Changqing >>> wrote: >>> >>> Dear developers of PETSc, >>> >>> I encountered a problem when using the DMStag module. The program could >>> be executed perfectly in serial, while errors are thrown out in parallel >>> (using mpiexec). Some rows in Mat cannot be accessed in local processes >>> when looping all elements in DMStag. The DM object I used only has one DOF >>> in each element. Hence, I could switch to the DMDA module easily, and the >>> program now is back to normal. >>> >>> Some snippets are below. 
>>> >>> Initialise a DMStag object: >>> PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, >>> DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, >>> DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P))); >>> Created a Mat: >>> PetscCall(DMCreateMatrix(s_ctx->dm_P, A)); >>> Loop: >>> PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, >>> &ny, &nz, &extrax, &extray, &extraz)); >>> for (ey = starty; ey < starty + ny; ++ey) >>> for (ex = startx; ex < startx + nx; ++ex) >>> { >>> ... >>> PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, >>> &col[0], &val_A[0][0], ADD_VALUES)); // The traceback shows the problem is >>> in here. >>> } >>> >>> >>> In addition to the code or MWE, please forward us the complete stack >>> trace / error thrown to stdout. >>> >>> Thanks, >>> Dave >>> >>> >>> >>> Best, >>> Changqing >>> >>> >>> >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 100442268 at alumnos.uc3m.es Fri Jun 10 10:47:38 2022 From: 100442268 at alumnos.uc3m.es (NILTON SANTOS VALDIVIA) Date: Fri, 10 Jun 2022 17:47:38 +0200 Subject: [petsc-users] Load a Sparse Matrix from a MAT file in PETSC Message-ID: Hello, I was trying to load a sparse matrix from a .MAT file (and solve the linear system). But even though, I have extracted the A and b matrix and vector from this suite *https://sparse.tamu.edu/FEMLAB/waveguide3D * and saved the variables (A, b) in a MAT file as PETSC recommends, I couldn't be able to load the matrix. Is there something that I was doing wrong? or do you know a way to load a matrix from the above link? Actually I was using src/ksp/ksp/tutorials/ex10.c.html example to try to load a .MAT file (containing A and b) without success. Best Regards, NILTON SANTOS -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 10 12:57:02 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Jun 2022 13:57:02 -0400 Subject: [petsc-users] Load a Sparse Matrix from a MAT file in PETSC In-Reply-To: References: Message-ID: On Fri, Jun 10, 2022 at 1:15 PM NILTON SANTOS VALDIVIA < 100442268 at alumnos.uc3m.es> wrote: > Hello, > > I was trying to load a sparse matrix from a .MAT file (and solve the > linear system). But even though, I have extracted the A and b matrix and > vector from this suite *https://sparse.tamu.edu/FEMLAB/waveguide3D > * and saved the variables > (A, b) in a MAT file as PETSC recommends, I couldn't be able to load the > matrix. Is there something that I was doing wrong? or do you know a way to > load a matrix from the above link? > > Actually I was using src/ksp/ksp/tutorials/ex10.c.html > example to try > to load a .MAT file (containing A and b) without success. > Do you want to load a Matrix Market format matrix and vector? If so, this is what https://petsc.org/main/src/mat/tests/ex72.c.html does. Thanks, Matt > > Best Regards, > > NILTON SANTOS > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
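(A sketch of how the convert-then-solve workflow discussed in this thread might look on the command line; the file names, process count, and solver choices below are placeholders, and the exact input/output option names of ex72 should be taken from the comment header of ex72.c itself:

  # convert the Matrix Market file to PETSc binary with src/mat/tests/ex72,
  # then solve in parallel with src/ksp/ksp/tutorials/ex10, e.g.
  cd $PETSC_DIR/src/ksp/ksp/tutorials
  make ex10
  mpiexec -n 4 ./ex10 -f0 waveguide3D.petsc -ksp_type gmres -pc_type bjacobi \
      -ksp_monitor_true_residual -ksp_converged_reason -log_view

ex10 loads the matrix, and a right-hand side when one is stored in the same binary file, from the file given with -f0.)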
URL: From 100442268 at alumnos.uc3m.es Fri Jun 10 14:58:48 2022 From: 100442268 at alumnos.uc3m.es (NILTON SANTOS VALDIVIA) Date: Fri, 10 Jun 2022 21:58:48 +0200 Subject: [petsc-users] Load a Sparse Matrix from a MAT file in PETSC In-Reply-To: References: Message-ID: Hello Matthew, Thank you very much for your answer. What I'm trying to do is to solve a linear system using the A and b (matrix and vector) provided in https://sparse.tamu.edu/FEMLAB/waveguide3D (no matter if its a .mat or a .mtx file) and also I want to solve the system with multiple MPI processes, I'm not an expert on it, just starting to understand the procedure. I'd really appreciate if you could help me with this exercise. Best Regards, NILTON SANTOS El vie, 10 jun 2022 a las 19:57, Matthew Knepley () escribi?: > On Fri, Jun 10, 2022 at 1:15 PM NILTON SANTOS VALDIVIA < > 100442268 at alumnos.uc3m.es> wrote: > >> Hello, >> >> I was trying to load a sparse matrix from a .MAT file (and solve the >> linear system). But even though, I have extracted the A and b matrix and >> vector from this suite *https://sparse.tamu.edu/FEMLAB/waveguide3D >> * and saved the variables >> (A, b) in a MAT file as PETSC recommends, I couldn't be able to load the >> matrix. Is there something that I was doing wrong? or do you know a way to >> load a matrix from the above link? >> >> Actually I was using src/ksp/ksp/tutorials/ex10.c.html >> example to >> try to load a .MAT file (containing A and b) without success. >> > > Do you want to load a Matrix Market format matrix and vector? If so, this > is what https://petsc.org/main/src/mat/tests/ex72.c.html does. > > Thanks, > > Matt > > >> >> Best Regards, >> >> NILTON SANTOS >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Fri Jun 10 15:06:22 2022 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Fri, 10 Jun 2022 20:06:22 +0000 Subject: [petsc-users] List of points with dof>0 in a PetscSection Message-ID: Hi, Given a PetscSection, is there an easy way to get a list of point at which the number of dof is >0? For instance, when projecting over a FE space, I?d rather do a loop over such points than do a loop over all points in a DM, get the number of dof, and test if it is >0. Regards, Blaise -- Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From kkhedkar9879 at sdsu.edu Fri Jun 10 15:14:30 2022 From: kkhedkar9879 at sdsu.edu (Kaustubh Khedkar) Date: Fri, 10 Jun 2022 13:14:30 -0700 Subject: [petsc-users] Error with PetscMatlabEngineCreate Message-ID: Hi all, I am using the Petsc?s Matlab engine to run some Matlab scripts from my c++ code. I have been using Petsc?s Github commit: 9babe2dd5ff256baf1aab74d81ff9ed4c6baba0b (HEAD -> master, origin/master, origin/HEAD) Merge: e9b74a6d12 bb2d6e605a Author: Satish Balay > Date: Fri Nov 6 17:46:10 2020 +0000 I used the command: PetscMatlabEngineCreate(PETSC_COMM_SELF, "master", &(mengine)); where, the hostname is master. (Verified by typing hostname in the terminal) Everything was working fine until I updated my PETSc version to 3.17.2. 
Using this version I get error using the command: PetscMatlabEngineEvaluate(mengine, "load_parameters;?); cannot read load_parameters script. where, load_parameters is a Matlab script. When I switch the hostname to NULL from master as: PetscMatlabEngineCreate(PETSC_COMM_SELF, NULL, &(mengine)); Everything starts working fine again. All of this was executed on the same machine. Has anything changed when using the PetscMatlabEngineEvaluate command? Thank you, Kaustubh Khedkar -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Jun 10 15:47:29 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Jun 2022 16:47:29 -0400 Subject: [petsc-users] Error with PetscMatlabEngineCreate In-Reply-To: References: Message-ID: <8091FE26-A225-4461-BC7D-1E5362D65DD6@petsc.dev> Based on your report the issue is likely due to a MATLABPATH issue. The difference between using "master" and NULL is that when "master" is used, PETSc ssh's to "master" to startup the Matlab engine while with NULL it launches the Matlab engine directly from the current process in (presumably) the current directory. When ssh is used it does not have information about the current directory nor would it have any tweaks you have made to your MATLABPATH in your shell. Thus it cannot find the script. Thus when not using NULL you need to make sure that all scripts you plan to launch are findable in the MATLABPATH on the machine you are launching the Matlab engine, maybe by putting the directories in MATLABPATH on in your .bashrc or .profile file or whatever file gets sourced automatically when you ssh to master. Barry I have no explanation why the behavior would change with PETSc versions or Matlab versions but the above should resolve the problem; you may have previously just been "lucky" it could fine the script. > On Jun 10, 2022, at 4:14 PM, Kaustubh Khedkar via petsc-users wrote: > > Hi all, > > I am using the Petsc?s Matlab engine to run some Matlab scripts from my c++ code. > > I have been using Petsc?s Github commit: 9babe2dd5ff256baf1aab74d81ff9ed4c6baba0b > (HEAD -> master, origin/master, origin/HEAD) > Merge: e9b74a6d12 bb2d6e605a > Author: Satish Balay > > Date: Fri Nov 6 17:46:10 2020 +0000 > > I used the command: > > PetscMatlabEngineCreate(PETSC_COMM_SELF, "master", &(mengine)); > > where, the hostname is master. (Verified by typing hostname in the terminal) > > Everything was working fine until I updated my PETSc version to 3.17.2. > Using this version I get error using the command: > > PetscMatlabEngineEvaluate(mengine, "load_parameters;?); > > cannot read load_parameters script. where, load_parameters is a Matlab script. > > When I switch the hostname to NULL from master as: > > PetscMatlabEngineCreate(PETSC_COMM_SELF, NULL, &(mengine)); > > Everything starts working fine again. All of this was executed on the same machine. > > Has anything changed when using the PetscMatlabEngineEvaluate command? > > > Thank you, > Kaustubh Khedkar -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 10 17:14:07 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Jun 2022 18:14:07 -0400 Subject: [petsc-users] Load a Sparse Matrix from a MAT file in PETSC In-Reply-To: References: Message-ID: On Fri, Jun 10, 2022 at 3:59 PM NILTON SANTOS VALDIVIA < 100442268 at alumnos.uc3m.es> wrote: > Hello Matthew, > > Thank you very much for your answer. 
What I'm trying to do is to solve a > linear system using the A and b (matrix and vector) provided in > https://sparse.tamu.edu/FEMLAB/waveguide3D (no matter if its a .mat or a > .mtx file) and also I want to solve the system with multiple MPI processes, > I'm not an expert on it, just starting to understand the procedure. I'd > really appreciate if you could help me with this exercise. > I would: 1) Download the Matrix Market format 2) Use Mat test ex72 to read that matrix + vector and output them in PETSc binary format 3) Use KSP ex10 to read the PETSc binary format and test your solver Thanks, Matt > Best Regards, > > NILTON SANTOS > > > El vie, 10 jun 2022 a las 19:57, Matthew Knepley () > escribi?: > >> On Fri, Jun 10, 2022 at 1:15 PM NILTON SANTOS VALDIVIA < >> 100442268 at alumnos.uc3m.es> wrote: >> >>> Hello, >>> >>> I was trying to load a sparse matrix from a .MAT file (and solve the >>> linear system). But even though, I have extracted the A and b matrix and >>> vector from this suite *https://sparse.tamu.edu/FEMLAB/waveguide3D >>> * and saved the variables >>> (A, b) in a MAT file as PETSC recommends, I couldn't be able to load the >>> matrix. Is there something that I was doing wrong? or do you know a way to >>> load a matrix from the above link? >>> >>> Actually I was using src/ksp/ksp/tutorials/ex10.c.html >>> example to >>> try to load a .MAT file (containing A and b) without success. >>> >> >> Do you want to load a Matrix Market format matrix and vector? If so, this >> is what https://petsc.org/main/src/mat/tests/ex72.c.html does. >> >> Thanks, >> >> Matt >> >> >>> >>> Best Regards, >>> >>> NILTON SANTOS >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zjorti at lanl.gov Fri Jun 10 18:30:59 2022 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Fri, 10 Jun 2022 23:30:59 +0000 Subject: [petsc-users] [EXTERNAL] Re: Question about SuperLU In-Reply-To: References: , Message-ID: Hi, Thank you all for your answers. I have tried your suggestions and here is what I found. Barry you were right about the first case. But in the second case, I am not using a Schur fieldsplit but a multiplicative fieldsplit : -fieldsplit_TEBV_fieldsplit_EBV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_pc_fieldsplit_type multiplicative So, there should not be any Schur complement approximation Sp. 
When I ran a test with -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pre -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged, I got this error: 0 SNES Function norm 6.368031218939e-02 0 KSP Residual norm 6.368031218939e-02 Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 3 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot [0]PETSC ERROR: Zero pivot in row 1658 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-751-g2f43bd9bc3 GIT Date: 2022-01-26 22:34:02 -0600 [0]PETSC ERROR: ./main on a macx named pn2032683.lanl.gov by zjorti Fri Jun 10 16:17:35 2022 [0]PETSC ERROR: Configure options PETSC_ARCH=macx --with-fc=0 --with-mpi-dir=/Users/zjorti/.brew --download-hypre --with-debugging=0 --with-cxx-dialect=C++11 --download-superlu_dist --download-parmetis --download-metis --download-ptscotch --download-cmake Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. I also outputted the BV block directly from the Jacobian matrix. Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. ________________________________ From: Barry Smith Sent: Friday, June 10, 2022 7:32 AM To: Jorti, Zakariae Cc: petsc-users at mcs.anl.gov; Tang, Xianzhu Subject: [EXTERNAL] Re: [petsc-users] Question about SuperLU It is difficult to tell exactly how the preconditioner is being formed with the information below it looks like in the first case: the original B diagonal block and V diagonal block of the matrix are being factored separately with SuperLU_DIST second case: the B block is factored with SuperLU_DIST and an explicit approximation to a Schur complement of the V block (Schur complement on eliminating the B block) is formed using "Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this part of the preconditioner). My guess is you have a "Stokes"-like problem where the V block is identically 0 so, of course, the SuperLU_DIST will fail on it. But the approximation of the Schur complement onto that block is not singular so SuperLU_DIST has no trouble. If I am wrong and the V block is not identically 0 then it may be singular (or possibly, but less likely just badly order) so that SuperLU_DIST encounters a zero pivot. You can run with -ksp_view_pre to have the KSP print the KSP solver algorithm details BEFORE the linear solve (hence they would get printed despite your failed solve). That would be useful to see exactly what your preconditioner is. You can use -ksp_view_pmat (with appropriate prefix) to display the matrix that is going to be factored. Thus you can quickly verify what V is. 
If you run with -ksp_error_if_not_converged then the solver will stop exactly when the zero pivot is encountered; this would include some information from SuperLU_DIST which might include the row number etc. Notes on PETSc improvements needed. 1) The man page for KSPCheckSolve() is terribly misleading 2) It would be nice to have a view that displayed the nested fieldsplit preconditioners more clearly On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users > wrote: Hi, I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: a) SuperLU: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist Option a) yields the following error: " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 PC failed due to FACTOR_NUMERIC_ZEROPIVOT " whereas options b) seems to be working well. Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? Many thanks. Best, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaustubh23593 at gmail.com Fri Jun 10 14:11:33 2022 From: kaustubh23593 at gmail.com (Kaustubh Khedkar) Date: Fri, 10 Jun 2022 12:11:33 -0700 Subject: [petsc-users] Error with PetscMatlabEngineCreate Message-ID: <3A0E58B8-A166-4024-B615-CD353F666C02@gmail.com> Hi all, I am using the Petsc?s Matlab engine to run some Matlab scripts from my c++ code. I have been using Petsc?s Github commit: 9babe2dd5ff256baf1aab74d81ff9ed4c6baba0b (HEAD -> master, origin/master, origin/HEAD) Merge: e9b74a6d12 bb2d6e605a Author: Satish Balay Date: Fri Nov 6 17:46:10 2020 +0000 I used the command: PetscMatlabEngineCreate(PETSC_COMM_SELF, "master", &(mengine)); where, the hostname is master. (Verified by typing hostname in the terminal) Everything was working fine until I updated my PETSc version to 3.17.2. Using this version I get error using the command: PetscMatlabEngineEvaluate(mengine, "load_parameters;?); cannot read load_parameters script. 
where, load_parameters is a Matlab script. When I switch the hostname to NULL from master as: PetscMatlabEngineCreate(PETSC_COMM_SELF, NULL, &(mengine)); Everything starts working fine again. All of this was executed on the same machine. Has anything changed when using the PetscMatlabEngineEvaluate command? Thank you, Kaustubh Khedkar -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokie1372 at gmail.com Fri Jun 10 16:00:30 2022 From: lokie1372 at gmail.com (luciano Hammond Noratto) Date: Fri, 10 Jun 2022 23:00:30 +0200 Subject: [petsc-users] ead a matrix and vector from a file and solve a linear system in parallel Message-ID: Hello, I was trying to read a matrix and vector from here https://sparse.tamu.edu/FEMLAB/waveguide3D and then solve the linear system in parallel without success. I'm new in PETSC, I'd really appreciate it if someone could help me to solve this problem. Luciano -------------- next part -------------- An HTML attachment was scrubbed... URL: From xsli at lbl.gov Fri Jun 10 19:35:01 2022 From: xsli at lbl.gov (Xiaoye S. Li) Date: Fri, 10 Jun 2022 17:35:01 -0700 Subject: [petsc-users] [EXTERNAL] Re: Question about SuperLU In-Reply-To: References: Message-ID: Could that be due to "numerical zero pivot" (caused due to cancellation and underflow)? You can try to force diagonal to be nonzero. Looking at the options page: https://petsc.org/main/docs/manualpages/Mat/MATSOLVERSUPERLU_DIST/ You can enable this one: -mat_superlu_dist_replacetinypivot replace tiny pivots (the default is NO, not to replace tiny pivots, including zero pivots.) Sherry On Fri, Jun 10, 2022 at 4:31 PM Jorti, Zakariae via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > Thank you all for your answers. > I have tried your suggestions and here is what I found. > Barry you were right about the first case. But in the second case, I am > not using a Schur fieldsplit but a multiplicative fieldsplit : > -fieldsplit_TEBV_fieldsplit_EBV_pc_type fieldsplit > -fieldsplit_TEBV_fieldsplit_EBV_pc_fieldsplit_type multiplicative > > So, there should not be any Schur complement approximation Sp. > > When I ran a test with > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pre > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged, I > got this error: > > > 0 SNES Function norm 6.368031218939e-02 > 0 KSP Residual norm 6.368031218939e-02 > Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL > iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to > CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve > converged due to CONVERGED_RTOL iterations 3 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Zero pivot in LU factorization: > https://petsc.org/release/faq/#zeropivot > [0]PETSC ERROR: Zero pivot in row 1658 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-751-g2f43bd9bc3 > GIT Date: 2022-01-26 22:34:02 -0600 > [0]PETSC ERROR: ./main on a macx named pn2032683.lanl.gov by zjorti Fri > Jun 10 16:17:35 2022 > [0]PETSC ERROR: Configure options PETSC_ARCH=macx --with-fc=0 > --with-mpi-dir=/Users/zjorti/.brew --download-hypre --with-debugging=0 > --with-cxx-dialect=C++11 --download-superlu_dist --download-parmetis > --download-metis --download-ptscotch --download-cmake > > > Then I tried this flag > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat > binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. > For Matlab, this is a full rank matrix, and the LU factorization there was > carried out without any issues. > > I also outputted the BV block directly from the Jacobian matrix. > Once again, according to Matlab, it is a full rank matrix and it computes > the LU factorization without any problem. > > > ------------------------------ > *From:* Barry Smith > *Sent:* Friday, June 10, 2022 7:32 AM > *To:* Jorti, Zakariae > *Cc:* petsc-users at mcs.anl.gov; Tang, Xianzhu > *Subject:* [EXTERNAL] Re: [petsc-users] Question about SuperLU > > > It is difficult to tell exactly how the preconditioner is being formed > with the information below it looks like in the > > first case: the original B diagonal block and V diagonal block of the > matrix are being factored separately with SuperLU_DIST > > second case: the B block is factored with SuperLU_DIST and an explicit > approximation to a Schur complement of the V block (Schur complement on > eliminating the B block) is formed using "Preconditioner for the Schur > complement formed from Sp, an assembled approximation to S, which uses > A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this > part of the preconditioner). > > My guess is you have a "Stokes"-like problem where the V block is > identically 0 so, of course, the SuperLU_DIST will fail on it. But the > approximation of the Schur complement onto that block is not singular so > SuperLU_DIST has no trouble. If I am wrong and the V block is not > identically 0 then it may be singular (or possibly, but less likely just > badly order) so that SuperLU_DIST encounters a zero pivot. > > You can run with -ksp_view_pre to have the KSP print the KSP solver > algorithm details BEFORE the linear solve (hence they would get printed > despite your failed solve). That would be useful to see exactly what your > preconditioner is. > > You can use -ksp_view_pmat (with appropriate prefix) to display the > matrix that is going to be factored. Thus you can quickly verify what V is. > > If you run with -ksp_error_if_not_converged then the solver will stop > exactly when the zero pivot is encountered; this would include some > information from SuperLU_DIST which might include the row number etc. > > Notes on PETSc improvements needed. > > 1) The man page for KSPCheckSolve() is terribly misleading > > 2) It would be nice to have a view that displayed the nested fieldsplit > preconditioners more clearly > > > > > > > On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, > > I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and > for the preconditioning part, I am using a FieldSplit preconditioner. 
At > the last fieldsplit/level, we are left with a {B,V} block that tried to > precondition in 2 different ways: > a) SuperLU: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type > superlu_dist > b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V > and B blocks: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition > selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type > preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type > lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type > superlu_dist > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type > superlu_dist > > Option a) yields the following error: > " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL > iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to > CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve > converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did > not converge due to DIVERGED_PC_FAILED iterations 0 > PC failed due to FACTOR_NUMERIC_ZEROPIVOT " > whereas options b) seems to be working well. > Is it possible that the SuperLU on the {V,B} block uses a reordering that > introduces a zero pivot or could there be another explanation for this > error? > > Many thanks. > Best, > > Zakariae > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xsli at lbl.gov Fri Jun 10 19:44:05 2022 From: xsli at lbl.gov (Xiaoye S. Li) Date: Fri, 10 Jun 2022 17:44:05 -0700 Subject: [petsc-users] Question about SuperLU In-Reply-To: <451BCBEE-FCC1-44D2-946F-4AF80403E6A9@msu.edu> References: <451BCBEE-FCC1-44D2-946F-4AF80403E6A9@msu.edu> Message-ID: On Fri, Jun 10, 2022 at 7:43 AM Tang, Qi wrote: > ?We use superlu_dist. > > We have a 2 x 2 block where directly calling suplerlu_dist fails, but a pc > based on a fieldsplit Schur complement + superlu_dist on the assembled > Schur complement matrix converges. (All the converge criteria are default > at this level) > > I am having a hard time to understand what is going on. The B,V block is > of size 240K, so it is also hard to analyze. And the mat is not something > we explicitly formed. It is formed by finite difference coloring Jacobian + > a few levels of Schur complement. > > / A 0 \ > \ 0 I / > > Matt, I do not see this can explain why the second pc with superlu on S = > A would succeed, if A is not full rank. > > I believe I found somewhere it says petsc?s pclu (or maybe superlu_dist) > did reordering and it may introduce 0 pivoting. We are asking because it > seems there is something we do not understand from pclu/superlu level. > If the matrix is non-singular, and you use the RowPerm default option: LargeDiag_MC64, then you won't have zero pivot (in a structural sense), unless numerical cancellation causes the diagonal element underflow, then flush to zero. 
You can try to set ReplaceTinyPivot: -mat_superlu_dist_replacetinypivot replace tiny pivots See my reply in another email. Sherry > Anyway, is there a way to output the mat before it fails? We have been > struggling to do that. We have TSSolve->SNES->FDColoringJacobian->A few > levels of fieldsplit->failed Subblock matrix, which we want to analyze. > (Sometimes it even happens in the second Newton iteration as the first one > works okay.) > > Qi > > > > On Jun 10, 2022, at 8:11 AM, Matthew Knepley wrote: > > ? > On Thu, Jun 9, 2022 at 5:20 PM Jorti, Zakariae via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and >> for the preconditioning part, I am using a FieldSplit preconditioner. At >> the last fieldsplit/level, we are left with a {B,V} block that tried to >> precondition in 2 different ways: >> a) SuperLU: >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type >> superlu_dist >> b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V >> and B blocks: >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition >> selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type >> preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type >> lu >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type >> superlu_dist >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type >> superlu_dist >> >> Option a) yields the following error: >> " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL >> iterations 0 >> Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to >> CONVERGED_RTOL iterations 1 >> Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve >> converged due to CONVERGED_RTOL iterations 5 >> Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did >> not converge due to DIVERGED_PC_FAILED iterations 0 >> PC failed due to FACTOR_NUMERIC_ZEROPIVOT " >> whereas options b) seems to be working well. >> Is it possible that the SuperLU on the {V,B} block uses a reordering that >> introduces a zero pivot or could there be another explanation for this >> error? >> > > I can at least come up with a case where this is true. Suppose you have > > / A 0 \ > \ 0 I / > > where A is rank deficient, but has a positive diagonal. SuperLU will fail > since it is actually singular. However, your Schur complement might work > since you use > 'selfp' for the Schur preconditioner, and it just extracts the diagonal. > > Thanks, > > Matt > > >> Many thanks. >> Best, >> >> Zakariae >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -------------- next part -------------- An HTML attachment was scrubbed... 
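For readers following this digest: in the nested solve above, these SuperLU_DIST options would presumably need the prefix of the BV block's factorization, i.e. the same prefix already used for -pc_factor_mat_solver_type. The exact spellings below are an assumption (the registered names can be confirmed by running with -help and searching for superlu_dist):

    -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_mat_superlu_dist_replacetinypivot
    -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_mat_superlu_dist_rowperm LargeDiag_MC64

The first option replaces tiny (including zero) pivots during the numerical factorization; the second merely spells out the default row permutation that Sherry refers to above.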
URL: From mail2amneet at gmail.com Fri Jun 10 22:51:55 2022 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Fri, 10 Jun 2022 20:51:55 -0700 Subject: [petsc-users] Error with PetscMatlabEngineCreate In-Reply-To: <8091FE26-A225-4461-BC7D-1E5362D65DD6@petsc.dev> References: <8091FE26-A225-4461-BC7D-1E5362D65DD6@petsc.dev> Message-ID: Thanks Barry. Adding absolute paths made it work with "master" hostname. We have added both options in our code just in case the NULL hostname does not work on a different machine. https://github.com/IBAMR/cfd-mpc-wecs/blob/main/main.cpp#L346-L361 On Fri, Jun 10, 2022 at 1:48 PM Barry Smith wrote: > > Based on your report the issue is likely due to a MATLABPATH issue. The > difference between using "master" and NULL is that when "master" is used, > PETSc ssh's to "master" to startup the Matlab engine while with NULL it > launches the Matlab engine directly from the current process in > (presumably) the current directory. > > When ssh is used it does not have information about the current > directory nor would it have any tweaks you have made to your MATLABPATH in > your shell. Thus it cannot find the script. > > Thus when not using NULL you need to make sure that all scripts you > plan to launch are findable in the MATLABPATH on the machine you are > launching the Matlab engine, maybe by putting the directories in MATLABPATH > on in your .bashrc or .profile file or whatever file gets sourced > automatically when you ssh to master. > > Barry > > I have no explanation why the behavior would change with PETSc versions > or Matlab versions but the above should resolve the problem; you may have > previously just been "lucky" it could fine the script. > > > > On Jun 10, 2022, at 4:14 PM, Kaustubh Khedkar via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi all, > > I am using the Petsc?s Matlab engine to run some Matlab scripts from my > c++ code. > > I have been using Petsc?s Github commit: > 9babe2dd5ff256baf1aab74d81ff9ed4c6baba0b > (HEAD -> master, origin/master, origin/HEAD) > Merge: e9b74a6d12 bb2d6e605a > Author: Satish Balay > Date: Fri Nov 6 17:46:10 2020 +0000 > > I used the command: > > PetscMatlabEngineCreate(PETSC_COMM_SELF, "master", &(mengine)); > > where, the hostname is master. (Verified by typing hostname in the > terminal) > > Everything was working fine until I updated my PETSc version to 3.17.2. > Using this version I get error using the command: > > PetscMatlabEngineEvaluate(mengine, "load_parameters;?); > > cannot read load_parameters script. where, load_parameters is a Matlab > script. > > When I switch the hostname to NULL from master as: > > PetscMatlabEngineCreate(PETSC_COMM_SELF, NULL, &(mengine)); > > Everything starts working fine again. All of this was executed on the same > machine. > > Has anything changed when using the PetscMatlabEngineEvaluate command? > > > Thank you, > Kaustubh Khedkar > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... 
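For anyone who finds this thread later, a minimal sketch of the work-around discussed above (the directory /absolute/path/to/scripts is a placeholder, error checking is omitted, and this is not the code from the linked repository):

    PetscMatlabEngine mengine;

    /* Launch MATLAB from the current process (NULL host) so it inherits the
       current working directory rather than a fresh ssh login environment. */
    PetscMatlabEngineCreate(PETSC_COMM_SELF, NULL, &mengine);

    /* Make the script location explicit instead of relying on MATLABPATH. */
    PetscMatlabEngineEvaluate(mengine, "addpath('/absolute/path/to/scripts');");
    PetscMatlabEngineEvaluate(mengine, "load_parameters;");

    PetscMatlabEngineDestroy(&mengine);

If the engine does have to be launched on a remote host such as "master", the equivalent fix is to make the script directory visible in that login environment, for example by adding it to MATLABPATH in the shell startup file that gets sourced on ssh, as Barry describes above.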
URL: From mfadams at lbl.gov Sat Jun 11 09:25:36 2022 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 11 Jun 2022 10:25:36 -0400 Subject: [petsc-users] ead a matrix and vector from a file and solve a linear system in parallel In-Reply-To: References: Message-ID: Others would know more than me but there is a non-complex example in src/mat/tests/ex72.c Mark On Fri, Jun 10, 2022 at 7:52 PM luciano Hammond Noratto wrote: > Hello, > > I was trying to read a matrix and vector from here > https://sparse.tamu.edu/FEMLAB/waveguide3D and then solve the linear > system in parallel without success. I'm new in PETSC, I'd really appreciate > it if someone could help me to solve this problem. > > Luciano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Jun 11 09:45:56 2022 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 11 Jun 2022 10:45:56 -0400 Subject: [petsc-users] [EXTERNAL] Question about SuperLU In-Reply-To: References: Message-ID: <3BD31D76-AC27-47E5-85E2-9759C3ADF98C@petsc.dev> > On Jun 10, 2022, at 7:30 PM, Jorti, Zakariae wrote: > > Hi, > > Thank you all for your answers. > I have tried your suggestions and here is what I found. > Barry you were right about the first case. But in the second case, I am not using a Schur fieldsplit but a multiplicative fieldsplit : -fieldsplit_TEBV_fieldsplit_EBV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_pc_fieldsplit_type multiplicative The previous email indicated > b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: > -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur - which means there is a Schur complement PC inside the multiplicative so my explanation that the Schur complement "saves" the problem by passing into SuperLU_DIST a non-singular matrix that is some approximation to the Schur complement could be true. > Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. > For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. > I also outputted the BV block directly from the Jacobian matrix. > Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. The BV matrix you saved into Matlab is a "block" matrix where the first block is B and the second block V (presumably both the same size). Can you, in Matlab, extract the two blocks separated and examine them (via say spy) and also have Matlab factor each of them separately? In your failed fieldsplit case SuperLU_DIST is factoring each of these matrices separately which could produce a zero pivot that would not occur when the larger matrix (of both blocks) is factored together. Let's see what happens with Matlab's solver. It looks like you are running on one rank? If the above process is not informative this is what you do next. Use PetscBinaryWrite() from Matlab to save each of the two blocks (one for B and one for V) to two files. Then use a simple standalone PETSc code, say src/ksp/ksp/tutorials/ex10.c to read each of the files and use SuperLU_DIST directly on each of the two linear systems. This will, at least to my understanding, result in the exact same SuperLU_DIST solves that you get with the failed use of PCFIELDSPLIT. 
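As a rough illustration of the kind of standalone driver described here (this is only a sketch against a recent PETSc using the PetscCall() macro, not the actual ex10.c source; the file name Bblock.bin and the random right-hand side are placeholders):

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      Vec         b, x;
      KSP         ksp;
      PC          pc;
      PetscViewer viewer;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* Load the block that was saved with PetscBinaryWrite() from Matlab. */
      PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Bblock.bin", FILE_MODE_READ, &viewer));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatLoad(A, viewer));
      PetscCall(PetscViewerDestroy(&viewer));

      /* A random right-hand side is enough to exercise the factorization. */
      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSetRandom(b, NULL));

      /* preonly + LU + SuperLU_DIST mirrors the solver used on the block inside the fieldsplit. */
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetType(ksp, KSPPREONLY));
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCLU));
      PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERSUPERLU_DIST));
      PetscCall(KSPSetFromOptions(ksp));
      PetscCall(KSPSolve(ksp, b, x));

      PetscCall(KSPDestroy(&ksp));
      PetscCall(VecDestroy(&b));
      PetscCall(VecDestroy(&x));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }

The same skeleton, with the right-hand side read via VecLoad() instead of being set randomly, also addresses the earlier question in this digest about reading a matrix and vector from a file and solving in parallel, once the downloaded matrix has been converted to PETSc's binary format.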
If they succeed or fail will be very informative. Barry > > So, there should not be any Schur complement approximation Sp. > > When I ran a test with -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pre -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged, I got this error: > > 0 SNES Function norm 6.368031218939e-02 > 0 KSP Residual norm 6.368031218939e-02 > Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 > Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 > Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 3 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot > [0]PETSC ERROR: Zero pivot in row 1658 > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-751-g2f43bd9bc3 GIT Date: 2022-01-26 22:34:02 -0600 > [0]PETSC ERROR: ./main on a macx named pn2032683.lanl.gov by zjorti Fri Jun 10 16:17:35 2022 > [0]PETSC ERROR: Configure options PETSC_ARCH=macx --with-fc=0 --with-mpi-dir=/Users/zjorti/.brew --download-hypre --with-debugging=0 --with-cxx-dialect=C++11 --download-superlu_dist --download-parmetis --download-metis --download-ptscotch --download-cmake > > > Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. > For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. > I also outputted the BV block directly from the Jacobian matrix. > Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. > > > From: Barry Smith > > Sent: Friday, June 10, 2022 7:32 AM > To: Jorti, Zakariae > Cc: petsc-users at mcs.anl.gov ; Tang, Xianzhu > Subject: [EXTERNAL] Re: [petsc-users] Question about SuperLU > > > It is difficult to tell exactly how the preconditioner is being formed with the information below it looks like in the > > first case: the original B diagonal block and V diagonal block of the matrix are being factored separately with SuperLU_DIST > > second case: the B block is factored with SuperLU_DIST and an explicit approximation to a Schur complement of the V block (Schur complement on eliminating the B block) is formed using "Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this part of the preconditioner). > > My guess is you have a "Stokes"-like problem where the V block is identically 0 so, of course, the SuperLU_DIST will fail on it. But the approximation of the Schur complement onto that block is not singular so SuperLU_DIST has no trouble. If I am wrong and the V block is not identically 0 then it may be singular (or possibly, but less likely just badly order) so that SuperLU_DIST encounters a zero pivot. > > You can run with -ksp_view_pre to have the KSP print the KSP solver algorithm details BEFORE the linear solve (hence they would get printed despite your failed solve). That would be useful to see exactly what your preconditioner is. > > You can use -ksp_view_pmat (with appropriate prefix) to display the matrix that is going to be factored. Thus you can quickly verify what V is. 
> > If you run with -ksp_error_if_not_converged then the solver will stop exactly when the zero pivot is encountered; this would include some information from SuperLU_DIST which might include the row number etc. > > Notes on PETSc improvements needed. > > 1) The man page for KSPCheckSolve() is terribly misleading > > 2) It would be nice to have a view that displayed the nested fieldsplit preconditioners more clearly > > > > > > >> On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users > wrote: >> >> Hi, >> >> I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: >> a) SuperLU: >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist >> b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: >> -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist >> >> Option a) yields the following error: >> " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 >> Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 >> Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 >> Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 >> PC failed due to FACTOR_NUMERIC_ZEROPIVOT " >> whereas options b) seems to be working well. >> Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? >> >> Many thanks. >> Best, >> >> Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sat Jun 11 10:15:07 2022 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 11 Jun 2022 15:15:07 +0000 Subject: [petsc-users] [EXTERNAL] Question about SuperLU In-Reply-To: <3BD31D76-AC27-47E5-85E2-9759C3ADF98C@petsc.dev> References: <3BD31D76-AC27-47E5-85E2-9759C3ADF98C@petsc.dev> Message-ID: If each block is sequential, try replace SuperLU_DIST with SuperLU, which would be more robust. You may also try MUMPS LU. Hong ________________________________ From: petsc-users on behalf of Barry Smith Sent: Saturday, June 11, 2022 9:45 AM To: Jorti, Zakariae Cc: petsc-users at mcs.anl.gov ; Tang, Xianzhu Subject: Re: [petsc-users] [EXTERNAL] Question about SuperLU On Jun 10, 2022, at 7:30 PM, Jorti, Zakariae > wrote: Hi, Thank you all for your answers. I have tried your suggestions and here is what I found. Barry you were right about the first case. 
But in the second case, I am not using a Schur fieldsplit but a multiplicative fieldsplit : -fieldsplit_TEBV_fieldsplit_EBV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_pc_fieldsplit_type multiplicative The previous email indicated b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur - which means there is a Schur complement PC inside the multiplicative so my explanation that the Schur complement "saves" the problem by passing into SuperLU_DIST a non-singular matrix that is some approximation to the Schur complement could be true. Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. I also outputted the BV block directly from the Jacobian matrix. Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. The BV matrix you saved into Matlab is a "block" matrix where the first block is B and the second block V (presumably both the same size). Can you, in Matlab, extract the two blocks separated and examine them (via say spy) and also have Matlab factor each of them separately? In your failed fieldsplit case SuperLU_DIST is factoring each of these matrices separately which could produce a zero pivot that would not occur when the larger matrix (of both blocks) is factored together. Let's see what happens with Matlab's solver. It looks like you are running on one rank? If the above process is not informative this is what you do next. Use PetscBinaryWrite() from Matlab to save each of the two blocks (one for B and one for V) to two files. Then use a simple standalone PETSc code, say src/ksp/ksp/tutorials/ex10.c to read each of the files and use SuperLU_DIST directly on each of the two linear systems. This will, at least to my understanding, result in the exact same SuperLU_DIST solves that you get with the failed use of PCFIELDSPLIT. If they succeed or fail will be very informative. Barry So, there should not be any Schur complement approximation Sp. When I ran a test with -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pre -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged, I got this error: 0 SNES Function norm 6.368031218939e-02 0 KSP Residual norm 6.368031218939e-02 Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 3 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot [0]PETSC ERROR: Zero pivot in row 1658 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-751-g2f43bd9bc3 GIT Date: 2022-01-26 22:34:02 -0600 [0]PETSC ERROR: ./main on a macx named pn2032683.lanl.gov by zjorti Fri Jun 10 16:17:35 2022 [0]PETSC ERROR: Configure options PETSC_ARCH=macx --with-fc=0 --with-mpi-dir=/Users/zjorti/.brew --download-hypre --with-debugging=0 --with-cxx-dialect=C++11 --download-superlu_dist --download-parmetis --download-metis --download-ptscotch --download-cmake Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. I also outputted the BV block directly from the Jacobian matrix. Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. ________________________________ From: Barry Smith > Sent: Friday, June 10, 2022 7:32 AM To: Jorti, Zakariae Cc: petsc-users at mcs.anl.gov; Tang, Xianzhu Subject: [EXTERNAL] Re: [petsc-users] Question about SuperLU It is difficult to tell exactly how the preconditioner is being formed with the information below it looks like in the first case: the original B diagonal block and V diagonal block of the matrix are being factored separately with SuperLU_DIST second case: the B block is factored with SuperLU_DIST and an explicit approximation to a Schur complement of the V block (Schur complement on eliminating the B block) is formed using "Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this part of the preconditioner). My guess is you have a "Stokes"-like problem where the V block is identically 0 so, of course, the SuperLU_DIST will fail on it. But the approximation of the Schur complement onto that block is not singular so SuperLU_DIST has no trouble. If I am wrong and the V block is not identically 0 then it may be singular (or possibly, but less likely just badly order) so that SuperLU_DIST encounters a zero pivot. You can run with -ksp_view_pre to have the KSP print the KSP solver algorithm details BEFORE the linear solve (hence they would get printed despite your failed solve). That would be useful to see exactly what your preconditioner is. You can use -ksp_view_pmat (with appropriate prefix) to display the matrix that is going to be factored. Thus you can quickly verify what V is. If you run with -ksp_error_if_not_converged then the solver will stop exactly when the zero pivot is encountered; this would include some information from SuperLU_DIST which might include the row number etc. Notes on PETSc improvements needed. 1) The man page for KSPCheckSolve() is terribly misleading 2) It would be nice to have a view that displayed the nested fieldsplit preconditioners more clearly On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users > wrote: Hi, I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. 
At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: a) SuperLU: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist Option a) yields the following error: " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 PC failed due to FACTOR_NUMERIC_ZEROPIVOT " whereas options b) seems to be working well. Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? Many thanks. Best, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From tangqi at msu.edu Sat Jun 11 12:39:25 2022 From: tangqi at msu.edu (Tang, Qi) Date: Sat, 11 Jun 2022 17:39:25 +0000 Subject: [petsc-users] [EXTERNAL] Question about SuperLU In-Reply-To: <3BD31D76-AC27-47E5-85E2-9759C3ADF98C@petsc.dev> References: <3BD31D76-AC27-47E5-85E2-9759C3ADF98C@petsc.dev> Message-ID: <1B71D61A-AEED-4420-B93A-AAF786E92C5F@msu.edu> Thanks for explaining. Let me summarize what we found so far. Barry was correct on the fieldsplit comment. * Applying superlu_dist to the entire BV block failed (?together factorization?) * Applying superlu_dist to the selfp version of schur complement for B and the diagonal sub-block for V succeeded (?separate factorization?). The solution looks fine. It is a regularized saddle point problem, so the diagonal blocks are not singular. There is no implementation to switch between two options on our end, so that should exclude any potential bugs in the solver level. The original size of BV block is 240K, so we have to use superlu_dist. We also check its precondition number and it looks fine. Now we downsize the problem so that we can analyze in matlab. The BV size becomes roughly 10K. We checked various things using matlab and it seems the matrix looks fine from matlab. But superlu_dist still runs into zero pivoting error (all default options on superlu_dist). Yes, we should try superlu or mumps, which is a good suggestion. We will also try change the superlu_dist flag as Sherry suggested. Now we know a few detections to do on petsc/matlab. Thanks for all the good suggestions. 
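For reference, acting on Hong's suggestion is a one-option change for that block; the spellings below reuse the BV prefix already shown in this thread and assume PETSc was configured with the corresponding package (for MUMPS, e.g. --download-mumps --download-scalapack):

    -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type mumps
    -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu

The sequential SuperLU entry only applies when that block lives on a single rank, which is the situation Hong describes; the same substitution can be made on the inner ..._fieldsplit_B_ and ..._fieldsplit_V_ prefixes used in option b).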
Qi On Jun 11, 2022, at 8:46 AM, Barry Smith wrote: ? On Jun 10, 2022, at 7:30 PM, Jorti, Zakariae > wrote: Hi, Thank you all for your answers., I have tried your suggestions and here is what I found. Barry you were right about the first case. But in the second case, I am not using a Schur fieldsplit but a multiplicative fieldsplit : -fieldsplit_TEBV_fieldsplit_EBV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_pc_fieldsplit_type multiplicative The previous email indicated b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur - which means there is a Schur complement PC inside the multiplicative so my explanation that the Schur complement "saves" the problem by passing into SuperLU_DIST a non-singular matrix that is some approximation to the Schur complement could be true. Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. I also outputted the BV block directly from the Jacobian matrix. Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. The BV matrix you saved into Matlab is a "block" matrix where the first block is B and the second block V (presumably both the same size). Can you, in Matlab, extract the two blocks separated and examine them (via say spy) and also have Matlab factor each of them separately? In your failed fieldsplit case SuperLU_DIST is factoring each of these matrices separately which could produce a zero pivot that would not occur when the larger matrix (of both blocks) is factored together. Let's see what happens with Matlab's solver. It looks like you are running on one rank? If the above process is not informative this is what you do next. Use PetscBinaryWrite() from Matlab to save each of the two blocks (one for B and one for V) to two files. Then use a simple standalone PETSc code, say src/ksp/ksp/tutorials/ex10.c to read each of the files and use SuperLU_DIST directly on each of the two linear systems. This will, at least to my understanding, result in the exact same SuperLU_DIST solves that you get with the failed use of PCFIELDSPLIT. If they succeed or fail will be very informative. Barry So, there should not be any Schur complement approximation Sp. When I ran a test with -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pre -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_error_if_not_converged, I got this error: 0 SNES Function norm 6.368031218939e-02 0 KSP Residual norm 6.368031218939e-02 Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 3 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot [0]PETSC ERROR: Zero pivot in row 1658 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-751-g2f43bd9bc3 GIT Date: 2022-01-26 22:34:02 -0600 [0]PETSC ERROR: ./main on a macx named pn2032683.lanl.gov by zjorti Fri Jun 10 16:17:35 2022 [0]PETSC ERROR: Configure options PETSC_ARCH=macx --with-fc=0 --with-mpi-dir=/Users/zjorti/.brew --download-hypre --with-debugging=0 --with-cxx-dialect=C++11 --download-superlu_dist --download-parmetis --download-metis --download-ptscotch --download-cmake Then I tried this flag -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_view_pmat binary:BVmat:binary_matlab and checked the resulting matrix in Matlab. For Matlab, this is a full rank matrix, and the LU factorization there was carried out without any issues. I also outputted the BV block directly from the Jacobian matrix. Once again, according to Matlab, it is a full rank matrix and it computes the LU factorization without any problem. ________________________________ From: Barry Smith > Sent: Friday, June 10, 2022 7:32 AM To: Jorti, Zakariae Cc: petsc-users at mcs.anl.gov; Tang, Xianzhu Subject: [EXTERNAL] Re: [petsc-users] Question about SuperLU It is difficult to tell exactly how the preconditioner is being formed with the information below it looks like in the first case: the original B diagonal block and V diagonal block of the matrix are being factored separately with SuperLU_DIST second case: the B block is factored with SuperLU_DIST and an explicit approximation to a Schur complement of the V block (Schur complement on eliminating the B block) is formed using "Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's %sdiagonal's inverse" (this is the printout from a KSPView() for this part of the preconditioner). My guess is you have a "Stokes"-like problem where the V block is identically 0 so, of course, the SuperLU_DIST will fail on it. But the approximation of the Schur complement onto that block is not singular so SuperLU_DIST has no trouble. If I am wrong and the V block is not identically 0 then it may be singular (or possibly, but less likely just badly order) so that SuperLU_DIST encounters a zero pivot. You can run with -ksp_view_pre to have the KSP print the KSP solver algorithm details BEFORE the linear solve (hence they would get printed despite your failed solve). That would be useful to see exactly what your preconditioner is. You can use -ksp_view_pmat (with appropriate prefix) to display the matrix that is going to be factored. Thus you can quickly verify what V is. If you run with -ksp_error_if_not_converged then the solver will stop exactly when the zero pivot is encountered; this would include some information from SuperLU_DIST which might include the row number etc. Notes on PETSc improvements needed. 1) The man page for KSPCheckSolve() is terribly misleading 2) It would be nice to have a view that displayed the nested fieldsplit preconditioners more clearly On Jun 9, 2022, at 5:19 PM, Jorti, Zakariae via petsc-users > wrote: Hi, I am solving non-linear problem that has 5 unknowns {ni, T, E, B, V}, and for the preconditioning part, I am using a FieldSplit preconditioner. 
At the last fieldsplit/level, we are left with a {B,V} block that tried to precondition in 2 different ways: a) SuperLU: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_factor_mat_solver_type superlu_dist b) a Schur-based fieldsplit preconditioner that uses SuperLU for both V and B blocks: -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ksp_type gmres -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_type fieldsplit -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_type schur -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_pc_fieldsplit_schur_precondition selfp -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_B_pc_factor_mat_solver_type superlu_dist -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_ksp_type preonly -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_type lu -fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_fieldsplit_V_pc_factor_mat_solver_type superlu_dist Option a) yields the following error: " Linear fieldsplit_ni_ solve converged due to CONVERGED_ATOL iterations 0 Linear fieldsplit_TEBV_fieldsplit_tau_ solve converged due to CONVERGED_RTOL iterations 1 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_EP_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_TEBV_fieldsplit_EBV_fieldsplit_BV_ solve did not converge due to DIVERGED_PC_FAILED iterations 0 PC failed due to FACTOR_NUMERIC_ZEROPIVOT " whereas options b) seems to be working well. Is it possible that the SuperLU on the {V,B} block uses a reordering that introduces a zero pivot or could there be another explanation for this error? Many thanks. Best, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuelestes91 at gmail.com Sat Jun 11 19:32:20 2022 From: samuelestes91 at gmail.com (Samuel Estes) Date: Sat, 11 Jun 2022 19:32:20 -0500 Subject: [petsc-users] Mat preallocation for adaptive grid Message-ID: Hello, My question concerns preallocation for Mats in adaptive FEM problems. When the grid refines, I destroy the old matrix and create a new one of the appropriate (larger size). When the grid ?un-refines? I just use the same (extra large) matrix and pad the extra unused diagonal entries with 1?s. The problem comes in with the preallocation. I use the MatPreallocator, MatPreallocatorPreallocate() paradigm which requires a specific sparsity pattern. When the grid un-refines, although the total number of nonzeros allocated is (most likely) more than sufficient, the particular sparsity pattern changes which leads to mallocs in the MatSetValues routines and obviously I would like to avoid this. One obvious solution is just to destroy and recreate the matrix any time the grid changes, even if it gets smaller. By just using a new matrix every time, I would avoid this problem although at the cost of having to rebuild the matrix more often than necessary. This is the simplest solution from a programming perspective and probably the one I will go with. I'm just curious if there's an alternative that you would recommend? Basically what I would like to do is to just change the sparsity pattern that is created in the MatPreallocatorPreallocate() routine. 
I'm not sure how it works under the hood, but in principle, it should be possible to keep the memory allocated for the Mat values and just assign them new column numbers and potentially add new nonzeros as well. Is there a convenient way of doing this? One thought I had was to just fill in the MatPreallocator object with the new sparsity pattern of the coarser mesh and then call the MatPreallocatorPreallocate() routine again with the new MatPreallocator matrix. I'm just not sure how exactly that would work since it would have already been called for the FEM matrix for the previous, finer grid. Finally, does this really matter? I imagine the bottleneck (assuming good preallocation) is in the solver so maybe it doesn't make much difference whether or not I reuse the old matrix. In that case, going with option 1 and simply destroying and recreating the matrix would be the way to go just to save myself some time. I hope that my question is clear. If not, please let me know and I will clarify. I am very curious if there's a convenient solution for the second option I mentioned to recycle the allocated memory and redo the sparsity pattern. Thanks! Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 11 19:38:35 2022 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Jun 2022 20:38:35 -0400 Subject: [petsc-users] Mat preallocation for adaptive grid In-Reply-To: References: Message-ID: On Sat, Jun 11, 2022 at 8:32 PM Samuel Estes wrote: > Hello, > > My question concerns preallocation for Mats in adaptive FEM problems. When > the grid refines, I destroy the old matrix and create a new one of the > appropriate (larger size). When the grid ?un-refines? I just use the same > (extra large) matrix and pad the extra unused diagonal entries with 1?s. > The problem comes in with the preallocation. I use the MatPreallocator, > MatPreallocatorPreallocate() paradigm which requires a specific sparsity > pattern. When the grid un-refines, although the total number of nonzeros > allocated is (most likely) more than sufficient, the particular sparsity > pattern changes which leads to mallocs in the MatSetValues routines and > obviously I would like to avoid this. > > One obvious solution is just to destroy and recreate the matrix any time > the grid changes, even if it gets smaller. By just using a new matrix every > time, I would avoid this problem although at the cost of having to rebuild > the matrix more often than necessary. This is the simplest solution from a > programming perspective and probably the one I will go with. > > I'm just curious if there's an alternative that you would recommend? > Basically what I would like to do is to just change the sparsity pattern > that is created in the MatPreallocatorPreallocate() routine. I'm not sure > how it works under the hood, but in principle, it should be possible to > keep the memory allocated for the Mat values and just assign them new > column numbers and potentially add new nonzeros as well. Is there a > convenient way of doing this? One thought I had was to just fill in the > MatPreallocator object with the new sparsity pattern of the coarser mesh > and then call the MatPreallocatorPreallocate() routine again with the new > MatPreallocator matrix. I'm just not sure how exactly that would work since > it would have already been called for the FEM matrix for the previous, > finer grid. > > Finally, does this really matter? 
I imagine the bottleneck (assuming good > preallocation) is in the solver so maybe it doesn't make much difference > whether or not I reuse the old matrix. In that case, going with option 1 > and simply destroying and recreating the matrix would be the way to go just > to save myself some time. > > I hope that my question is clear. If not, please let me know and I will > clarify. I am very curious if there's a convenient solution for the second > option I mentioned to recycle the allocated memory and redo the sparsity > pattern. > I have not run any tests of this kind of thing, so I cannot say definitively. I can say that I consider the reuse of memory a problem to be solved at allocation time. You would hope that a good malloc system would give you back the same memory you just freed when getting rid of the prior matrix, so you would get the speedup you want using your approach. Second, I think the allocation cost is likely to pale in comparison to the cost of writing the matrix itself (passing all those indices and values through the memory bus), and so reuse of the memory is not that important (I think). Thanks, Matt > Thanks! > > Sam > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuelestes91 at gmail.com Sat Jun 11 19:43:06 2022 From: samuelestes91 at gmail.com (Samuel Estes) Date: Sat, 11 Jun 2022 19:43:06 -0500 Subject: [petsc-users] Mat preallocation for adaptive grid In-Reply-To: References: Message-ID: I'm sorry, would you mind clarifying? I think my email was so long and rambling that it's tough for me to understand which part was being answered. On Sat, Jun 11, 2022 at 7:38 PM Matthew Knepley wrote: > On Sat, Jun 11, 2022 at 8:32 PM Samuel Estes > wrote: > >> Hello, >> >> My question concerns preallocation for Mats in adaptive FEM problems. >> When the grid refines, I destroy the old matrix and create a new one of the >> appropriate (larger size). When the grid ?un-refines? I just use the same >> (extra large) matrix and pad the extra unused diagonal entries with 1?s. >> The problem comes in with the preallocation. I use the MatPreallocator, >> MatPreallocatorPreallocate() paradigm which requires a specific sparsity >> pattern. When the grid un-refines, although the total number of nonzeros >> allocated is (most likely) more than sufficient, the particular sparsity >> pattern changes which leads to mallocs in the MatSetValues routines and >> obviously I would like to avoid this. >> >> One obvious solution is just to destroy and recreate the matrix any time >> the grid changes, even if it gets smaller. By just using a new matrix every >> time, I would avoid this problem although at the cost of having to rebuild >> the matrix more often than necessary. This is the simplest solution from a >> programming perspective and probably the one I will go with. >> >> I'm just curious if there's an alternative that you would recommend? >> Basically what I would like to do is to just change the sparsity pattern >> that is created in the MatPreallocatorPreallocate() routine. I'm not sure >> how it works under the hood, but in principle, it should be possible to >> keep the memory allocated for the Mat values and just assign them new >> column numbers and potentially add new nonzeros as well. Is there a >> convenient way of doing this? 
One thought I had was to just fill in the >> MatPreallocator object with the new sparsity pattern of the coarser mesh >> and then call the MatPreallocatorPreallocate() routine again with the new >> MatPreallocator matrix. I'm just not sure how exactly that would work since >> it would have already been called for the FEM matrix for the previous, >> finer grid. >> >> Finally, does this really matter? I imagine the bottleneck (assuming good >> preallocation) is in the solver so maybe it doesn't make much difference >> whether or not I reuse the old matrix. In that case, going with option 1 >> and simply destroying and recreating the matrix would be the way to go just >> to save myself some time. >> >> I hope that my question is clear. If not, please let me know and I will >> clarify. I am very curious if there's a convenient solution for the second >> option I mentioned to recycle the allocated memory and redo the sparsity >> pattern. >> > > I have not run any tests of this kind of thing, so I cannot say > definitively. > > I can say that I consider the reuse of memory a problem to be solved at > allocation time. You would hope that a good malloc system would give > you back the same memory you just freed when getting rid of the prior > matrix, so you would get the speedup you want using your approach. > What do you mean by "your approach"? Do you mean the first option where I just always destroy the matrix? Are you basically saying that when I destroy the old matrix and create a new one, it should just give me the same block of memory that was just freed by the destruction of the previous one? > > Second, I think the allocation cost is likely to pale in comparison to the > cost of writing the matrix itself (passing all those indices and values > through > the memory bus), and so reuse of the memory is not that important (I > think). > This seems to suggest that the best option is just to destroy and recreate and not worry about "re-preallocating". Do I understand that correctly? > > Thanks, > > Matt > > >> Thanks! >> >> Sam >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jun 11 19:54:47 2022 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Jun 2022 20:54:47 -0400 Subject: [petsc-users] Mat preallocation for adaptive grid In-Reply-To: References: Message-ID: On Sat, Jun 11, 2022 at 8:43 PM Samuel Estes wrote: > I'm sorry, would you mind clarifying? I think my email was so long and > rambling that it's tough for me to understand which part was being > answered. > > On Sat, Jun 11, 2022 at 7:38 PM Matthew Knepley wrote: > >> On Sat, Jun 11, 2022 at 8:32 PM Samuel Estes >> wrote: >> >>> Hello, >>> >>> My question concerns preallocation for Mats in adaptive FEM problems. >>> When the grid refines, I destroy the old matrix and create a new one of the >>> appropriate (larger size). When the grid ?un-refines? I just use the same >>> (extra large) matrix and pad the extra unused diagonal entries with 1?s. >>> The problem comes in with the preallocation. I use the MatPreallocator, >>> MatPreallocatorPreallocate() paradigm which requires a specific sparsity >>> pattern. 
When the grid un-refines, although the total number of nonzeros >>> allocated is (most likely) more than sufficient, the particular sparsity >>> pattern changes which leads to mallocs in the MatSetValues routines and >>> obviously I would like to avoid this. >>> >>> One obvious solution is just to destroy and recreate the matrix any time >>> the grid changes, even if it gets smaller. By just using a new matrix every >>> time, I would avoid this problem although at the cost of having to rebuild >>> the matrix more often than necessary. This is the simplest solution from a >>> programming perspective and probably the one I will go with. >>> >>> I'm just curious if there's an alternative that you would recommend? >>> Basically what I would like to do is to just change the sparsity pattern >>> that is created in the MatPreallocatorPreallocate() routine. I'm not sure >>> how it works under the hood, but in principle, it should be possible to >>> keep the memory allocated for the Mat values and just assign them new >>> column numbers and potentially add new nonzeros as well. Is there a >>> convenient way of doing this? One thought I had was to just fill in the >>> MatPreallocator object with the new sparsity pattern of the coarser mesh >>> and then call the MatPreallocatorPreallocate() routine again with the new >>> MatPreallocator matrix. I'm just not sure how exactly that would work since >>> it would have already been called for the FEM matrix for the previous, >>> finer grid. >>> >>> Finally, does this really matter? I imagine the bottleneck (assuming >>> good preallocation) is in the solver so maybe it doesn't make much >>> difference whether or not I reuse the old matrix. In that case, going with >>> option 1 and simply destroying and recreating the matrix would be the way >>> to go just to save myself some time. >>> >>> I hope that my question is clear. If not, please let me know and I will >>> clarify. I am very curious if there's a convenient solution for the second >>> option I mentioned to recycle the allocated memory and redo the sparsity >>> pattern. >>> >> >> I have not run any tests of this kind of thing, so I cannot say >> definitively. >> >> I can say that I consider the reuse of memory a problem to be solved at >> allocation time. You would hope that a good malloc system would give >> you back the same memory you just freed when getting rid of the prior >> matrix, so you would get the speedup you want using your approach. >> > > What do you mean by "your approach"? Do you mean the first option where I > just always destroy the matrix? Are you basically saying that when I > destroy the old matrix and create a new one, it should just give me the > same block of memory that was just freed by the destruction of the previous > one? > Yes. > >> Second, I think the allocation cost is likely to pale in comparison to >> the cost of writing the matrix itself (passing all those indices and values >> through >> the memory bus), and so reuse of the memory is not that important (I >> think). >> > > This seems to suggest that the best option is just to destroy and recreate > and not worry about "re-preallocating". Do I understand that correctly? > Yes. Thanks, Matt > >> Thanks, >> >> Matt >> >> >>> Thanks! >>> >>> Sam >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuelestes91 at gmail.com Sat Jun 11 20:24:19 2022 From: samuelestes91 at gmail.com (Samuel Estes) Date: Sat, 11 Jun 2022 20:24:19 -0500 Subject: [petsc-users] Mat preallocation for adaptive grid In-Reply-To: References: Message-ID: Ok thanks so much for the help! It's nice that it coincides with the easiest option! On Sat, Jun 11, 2022 at 7:54 PM Matthew Knepley wrote: > On Sat, Jun 11, 2022 at 8:43 PM Samuel Estes > wrote: > >> I'm sorry, would you mind clarifying? I think my email was so long and >> rambling that it's tough for me to understand which part was being >> answered. >> >> On Sat, Jun 11, 2022 at 7:38 PM Matthew Knepley >> wrote: >> >>> On Sat, Jun 11, 2022 at 8:32 PM Samuel Estes >>> wrote: >>> >>>> Hello, >>>> >>>> My question concerns preallocation for Mats in adaptive FEM problems. >>>> When the grid refines, I destroy the old matrix and create a new one of the >>>> appropriate (larger size). When the grid ?un-refines? I just use the same >>>> (extra large) matrix and pad the extra unused diagonal entries with 1?s. >>>> The problem comes in with the preallocation. I use the MatPreallocator, >>>> MatPreallocatorPreallocate() paradigm which requires a specific sparsity >>>> pattern. When the grid un-refines, although the total number of nonzeros >>>> allocated is (most likely) more than sufficient, the particular sparsity >>>> pattern changes which leads to mallocs in the MatSetValues routines and >>>> obviously I would like to avoid this. >>>> >>>> One obvious solution is just to destroy and recreate the matrix any >>>> time the grid changes, even if it gets smaller. By just using a new matrix >>>> every time, I would avoid this problem although at the cost of having to >>>> rebuild the matrix more often than necessary. This is the simplest solution >>>> from a programming perspective and probably the one I will go with. >>>> >>>> I'm just curious if there's an alternative that you would recommend? >>>> Basically what I would like to do is to just change the sparsity pattern >>>> that is created in the MatPreallocatorPreallocate() routine. I'm not sure >>>> how it works under the hood, but in principle, it should be possible to >>>> keep the memory allocated for the Mat values and just assign them new >>>> column numbers and potentially add new nonzeros as well. Is there a >>>> convenient way of doing this? One thought I had was to just fill in the >>>> MatPreallocator object with the new sparsity pattern of the coarser mesh >>>> and then call the MatPreallocatorPreallocate() routine again with the new >>>> MatPreallocator matrix. I'm just not sure how exactly that would work since >>>> it would have already been called for the FEM matrix for the previous, >>>> finer grid. >>>> >>>> Finally, does this really matter? I imagine the bottleneck (assuming >>>> good preallocation) is in the solver so maybe it doesn't make much >>>> difference whether or not I reuse the old matrix. In that case, going with >>>> option 1 and simply destroying and recreating the matrix would be the way >>>> to go just to save myself some time. >>>> >>>> I hope that my question is clear. 
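To make the "destroy and recreate on every mesh change" option that was settled on above concrete, here is a rough sketch of what the rebuild step could look like (the element loop is only indicated by a comment, the local size nLocal is a placeholder, and this is not code from this thread):

    #include <petscmat.h>

    /* Rebuild the operator from scratch after every mesh change (refine or un-refine). */
    static PetscErrorCode RebuildOperator(PetscInt nLocal, Mat *A)
    {
      Mat preall;

      PetscFunctionBeginUser;
      PetscCall(MatDestroy(A)); /* no-op on the first call if *A is NULL */

      /* Record the new sparsity pattern with a MATPREALLOCATOR matrix. */
      PetscCall(MatCreate(PETSC_COMM_WORLD, &preall));
      PetscCall(MatSetSizes(preall, nLocal, nLocal, PETSC_DETERMINE, PETSC_DETERMINE));
      PetscCall(MatSetType(preall, MATPREALLOCATOR));
      PetscCall(MatSetUp(preall));
      /* ... the same element loop and MatSetValues() calls as the real assembly ... */
      PetscCall(MatAssemblyBegin(preall, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(preall, MAT_FINAL_ASSEMBLY));

      /* Create the new operator and let the preallocator size its storage. */
      PetscCall(MatCreate(PETSC_COMM_WORLD, A));
      PetscCall(MatSetSizes(*A, nLocal, nLocal, PETSC_DETERMINE, PETSC_DETERMINE));
      PetscCall(MatSetType(*A, MATAIJ));
      PetscCall(MatPreallocatorPreallocate(preall, PETSC_TRUE, *A));
      PetscCall(MatDestroy(&preall));
      PetscFunctionReturn(0);
    }

Because the old matrix is destroyed immediately before the new one is created, a reasonable malloc implementation will usually hand back the same memory, which is the point Matt makes above.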
If not, please let me know and I will >>>> clarify. I am very curious if there's a convenient solution for the second >>>> option I mentioned to recycle the allocated memory and redo the sparsity >>>> pattern. >>>> >>> >>> I have not run any tests of this kind of thing, so I cannot say >>> definitively. >>> >>> I can say that I consider the reuse of memory a problem to be solved at >>> allocation time. You would hope that a good malloc system would give >>> you back the same memory you just freed when getting rid of the prior >>> matrix, so you would get the speedup you want using your approach. >>> >> >> What do you mean by "your approach"? Do you mean the first option where I >> just always destroy the matrix? Are you basically saying that when I >> destroy the old matrix and create a new one, it should just give me the >> same block of memory that was just freed by the destruction of the previous >> one? >> > > Yes. > > >> >>> Second, I think the allocation cost is likely to pale in comparison to >>> the cost of writing the matrix itself (passing all those indices and values >>> through >>> the memory bus), and so reuse of the memory is not that important (I >>> think). >>> >> >> This seems to suggest that the best option is just to destroy and >> recreate and not worry about "re-preallocating". Do I understand that >> correctly? >> > > Yes. > > Thanks, > > Matt > > >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks! >>>> >>>> Sam >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun Jun 12 03:07:59 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 12 Jun 2022 10:07:59 +0200 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> Message-ID: <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> Please always respond to the list. Pay attention to the warnings in the log: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option. # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## With the debugging option the times are not trustworthy, so I suggest repeating the analysis with an optimized build. Jose > El 12 jun 2022, a las 5:41, Runfeng Jin escribi?: > > Hello! > I compare these two matrix solver's log view and find some strange thing. Attachment files are the log view.: > file 1: log of matrix A solver. This is a larger matrix(900,000*900,000) but solved quickly(30s); > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547 , a little different from the matrix B that is mentioned in initial email, but solved much slower too. I use this for a quicker test) but solved much slower(1244s). 
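A quick way to see how the nonzeros are actually distributed is to query each assembled matrix with MatGetInfo on every process; a minimal sketch, where A stands for either matrix:

  MatInfo     info;
  PetscMPIInt rank;
  PetscCall(MatGetInfo(A, MAT_LOCAL, &info));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] nz_used %g  nz_allocated %g  mallocs %g\n",
                                    rank, info.nz_used, info.nz_allocated, info.mallocs));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

If one rank reports a much larger nz_used than the others, the row distribution of the matrix is a likely suspect before anything in the eigensolver itself.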
> > By comparing these two files, I find some thing: > 1) Matrix A has more basis vectors(375) than B(189), but A spent less time on BVCreate(0.349s) than B(296s); > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) > 3) Matrix B distribute much more unbalancedly storage among processors(memory max/min 4365) than A(memory max/min 1.113), but other metrics seems more balanced. > > I don't do prealocation in A, and it is distributed across processors by PETSc. For B , when preallocation I use PetscSplitOwnership to decide which part belongs to local processor, and B is also distributed by PETSc when compute matrix values. > > - Does this mean, for matrix B, too much nonzero elements are stored in single process, and this is why it cost too much more time in solving the matrix and find eigenvalues? If so, are there some better ways to distribute the matrix among processors? > - Or are there any else reasons for this difference in cost time? > > Hope to recieve your reply, thank you! > > Runfeng Jin > > > > Runfeng Jin ?2022?6?11??? 20:33??? > Hello! > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much time. Is there anything else I can do? Attachment is log when use PETSC_DEFAULT for eps_ncv. > > Thank you ! > > Runfeng Jin > > Jose E. Roman ?2022?6?10??? 20:50??? > The value -eps_ncv 5000 is huge. > Better let SLEPc use the default value. > > Jose > > > > El 10 jun 2022, a las 14:24, Jin Runfeng escribi?: > > > > Hello! > > I want to acquire the 3 smallest eigenvalue, and attachment is the log view output. I can see epssolve really cost the major time. But I can not see why it cost so much time. Can you see something from it? > > > > Thank you ! > > > > Runfeng Jin > > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: > > Convergence depends on distribution of eigenvalues you want to compute. On the other hand, the cost also depends on the time it takes to build the preconditioner. Use -log_view to see the cost of the different steps of the computation. > > > > Jose > > > > > > > El 3 jun 2022, a las 18:50, jsfaraway escribi?: > > > > > > hello! > > > > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. And I find a strang thing. There are two matrix A(900000*900000) and B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B use 22 iterations and 38885s! What could be the reason for this? Or what can I do to find the reason? > > > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > > > And there is one difference I can tell is matrix B has many small value, whose absolute value is less than 10-6. Could this be the reason? > > > > > > Thank you! > > > > > > Runfeng Jin > > > > From sami.ben-elhaj-salah at ensma.fr Sun Jun 12 09:48:44 2022 From: sami.ben-elhaj-salah at ensma.fr (Sami BEN ELHAJ SALAH) Date: Sun, 12 Jun 2022 16:48:44 +0200 Subject: [petsc-users] Writing VTK output In-Reply-To: References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> Message-ID: Dear Matthew and Jed, Thank you very much for explaining and your help. I am sorry for my late reply. For me, the .vtu file is wrong when the section seems to be not correct (I mean the raw encoding because when I visualize the .vtu file on paraview, the geometry is not good). The header is OK (see attached file). To generate the vtu file, I use the routine suggested by Matthew and the commande line proposed by Jed (-dm_plex_filename 2C3D8_msh.msh -dm_view vtk:2C3D8_msh.vtu). 
On the other hand, when I use the routine below and write my output to a vtk file and not vtu, the result is ok except the rotation of the elements nodes (the nodes rotation is not good for me and not saved comparing to gmsh file). PetscViewer vtk; PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); VecView(solution,vtk); PetscViewerDestroy(&vtk); I put here an example of a vtk file that I have generated # vtk DataFile Version 2.0 Simplicial Mesh Example ASCII DATASET UNSTRUCTURED_GRID POINTS 12 double 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 0.000000e+00 1.000000e+01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+01 0.000000e+00 1.000000e+01 1.000000e+01 1.000000e+01 1.000000e+01 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 0.000000e+00 1.000000e+01 1.000000e+01 0.000000e+00 2.000000e+01 1.000000e+01 1.000000e+01 2.000000e+01 0.000000e+00 1.000000e+01 2.000000e+01 0.000000e+00 0.000000e+00 2.000000e+01 1.000000e+01 0.000000e+00 CELLS 2 18 8 0 3 2 1 4 5 6 7 8 4 7 6 5 8 9 10 11 CELL_TYPES 2 12 12 POINT_DATA 12 VECTORS dU_x double 2.754808e-10 -8.653846e-11 -8.653846e-11 2.754808e-10 8.653846e-11 -8.653846e-11 2.754808e-10 8.653846e-11 8.653846e-11 2.754808e-10 -8.653846e-11 8.653846e-11 4.678571e-01 -9.107143e-02 -9.107143e-02 4.678571e-01 9.107143e-02 -9.107143e-02 4.678571e-01 9.107143e-02 9.107143e-02 4.678571e-01 -9.107143e-02 9.107143e-02 1.000000e+00 -7.500000e-02 -7.500000e-02 1.000000e+00 7.500000e-02 -7.500000e-02 1.000000e+00 7.500000e-02 7.500000e-02 1.000000e+00 -7.500000e-02 7.500000e-02 To obtain the good geometry, the two lines 8 0 3 2 1 4 5 6 7 8 4 7 6 5 8 9 10 11 Should be like this in order to have a good geometry defined in the gmsh file. 8 0 1 2 3 4 5 6 7 8 4 5 6 7 8 9 10 11 - - - > So I m trying now to compile my code with petsc 3.16, may be it solves the problem of the rotation order of nodes. Thank you and have a good day, Sami, -- Dr. Sami BEN ELHAJ SALAH Ing?nieur de Recherche (CNRS) Institut Pprime - ISAE - ENSMA Mobile: 06.62.51.26.74 Email: sami.ben-elhaj-salah at ensma.fr www.samibenelhajsalah.com > Le 8 juin 2022 ? 17:57, Matthew Knepley a ?crit : > > On Wed, Jun 8, 2022 at 11:24 AM Sami BEN ELHAJ SALAH > wrote: > Yes, the file "sami.vtu" is loaded correctly in paraview and I have the good output like you. > > In my code, I tried with the same command given in your last answer and I still have the wrong .vtu file. > > Hi Sami, > > What do you mean by wrong? > > Can you just use the simple procedure: > > PetscCall(DMCreate(comm, dm)); > PetscCall(DMSetType(*dm, DMPLEX)); > PetscCall(DMSetFromOptions(*dm)); > PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view")); > > This is the one that works for us. Then we can change it in your code one step at a time until you get what you need. > > Thanks, > > Matt > > I use this: > mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt > > > Thanks, > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > >> Le 8 juin 2022 ? 16:25, Jed Brown > a ?crit : >> >> Does the file load in paraview? When I load your *.msh in a tutorial with -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. 
>> >> >> Sami BEN ELHAJ SALAH > writes: >> >>> Hi Jed, >>> >>> Thank you for your answer. >>> >>> When I use a ??solution.vtu'', I obtain a wrong file. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ >>> ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? >>> >>> >>> >>> >>> If I understand your answer, to solve my problem, should just upgrade all my software ? >>> >>> Thanks, >>> Sami, >>> >>> >>> -- >>> Dr. Sami BEN ELHAJ SALAH >>> Ing?nieur de Recherche (CNRS) >>> Institut Pprime - ISAE - ENSMA >>> Mobile: 06.62.51.26.74 >>> Email: sami.ben-elhaj-salah at ensma.fr >>> www.samibenelhajsalah.com > >>> >>> >>> >>>> Le 8 juin 2022 ? 15:37, Jed Brown > a ?crit : >>>> >>>> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? >>>> >>>> Sami BEN ELHAJ SALAH > writes: >>>> >>>>> Dear Petsc Developer team, >>>>> >>>>> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. >>>>> >>>>> 1) Algorithm 1 >>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>> PetscViewer vtk; >>>>> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >>>>> VecView(solution,vtk); >>>>> PetscViewerDestroy(&vtk); >>>>> >>>>> >>>>> 2) Algorithm 2 >>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>> PetscViewer vtk; >>>>> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >>>>> PetscViewerSetType(vtk, PETSCVIEWERVTK); >>>>> PetscViewerFileSetName(vtk, "sol.vtk"); >>>>> VecView(solution, vtk); >>>>> PetscViewerDestroy(&vtk); >>>>> >>>>> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
>>>>> >>>>> Other information used: >>>>> - gmsh format 2.2 >>>>> - Vtk version: 7.1.1 >>>>> - Petsc version: 3.13/opt >>>>> >>>>> Below my two files gmsh and vtk: >>>>> >>>>> Gmsh file: >>>>> $MeshFormat >>>>> 2.2 0 8 >>>>> $EndMeshFormat >>>>> $Nodes >>>>> 12 >>>>> 1 0.0 10.0 10.0 >>>>> 2 0.0 0.0 10.0 >>>>> 3 0.0 0.0 0.0 >>>>> 4 0.0 10.0 0.0 >>>>> 5 10.0 10.0 10.0 >>>>> 6 10.0 0.0 10.0 >>>>> 7 10.0 0.0 0.0 >>>>> 8 10.0 10.0 0.0 >>>>> 9 20.0 10.0 10.0 >>>>> 10 20.0 0.0 10.0 >>>>> 11 20.0 0.0 0.0 >>>>> 12 20.0 10.0 0.0 >>>>> $EndNodes >>>>> $Elements >>>>> 2 >>>>> 1 5 2 68 60 1 2 3 4 5 6 7 8 >>>>> 2 5 2 68 60 5 6 7 8 9 10 11 12 >>>>> $EndElements >>>>> >>>>> Vtk file : >>>>> # vtk DataFile Version 2.0 >>>>> Simplicial Mesh Example >>>>> ASCII >>>>> DATASET UNSTRUCTURED_GRID >>>>> POINTS 12 double >>>>> 0.000000e+00 1.000000e+01 1.000000e+01 >>>>> 0.000000e+00 0.000000e+00 1.000000e+01 >>>>> 0.000000e+00 0.000000e+00 0.000000e+00 >>>>> 0.000000e+00 1.000000e+01 0.000000e+00 >>>>> 1.000000e+01 1.000000e+01 1.000000e+01 >>>>> 1.000000e+01 0.000000e+00 1.000000e+01 >>>>> 1.000000e+01 0.000000e+00 0.000000e+00 >>>>> 1.000000e+01 1.000000e+01 0.000000e+00 >>>>> 2.000000e+01 1.000000e+01 1.000000e+01 >>>>> 2.000000e+01 0.000000e+00 1.000000e+01 >>>>> 2.000000e+01 0.000000e+00 0.000000e+00 >>>>> 2.000000e+01 1.000000e+01 0.000000e+00 >>>>> CELLS 2 18 >>>>> 8 0 3 2 1 4 5 6 7 >>>>> 8 4 7 6 5 8 9 10 11 >>>>> CELL_TYPES 2 >>>>> 12 >>>>> 12 >>>>> POINT_DATA 12 >>>>> VECTORS dU_x double >>>>> 2.754808e-10 -8.653846e-11 -8.653846e-11 >>>>> 2.754808e-10 8.653846e-11 -8.653846e-11 >>>>> 2.754808e-10 8.653846e-11 8.653846e-11 >>>>> 2.754808e-10 -8.653846e-11 8.653846e-11 >>>>> 4.678571e-01 -9.107143e-02 -9.107143e-02 >>>>> 4.678571e-01 9.107143e-02 -9.107143e-02 >>>>> 4.678571e-01 9.107143e-02 9.107143e-02 >>>>> 4.678571e-01 -9.107143e-02 9.107143e-02 >>>>> 1.000000e+00 -7.500000e-02 -7.500000e-02 >>>>> 1.000000e+00 7.500000e-02 -7.500000e-02 >>>>> 1.000000e+00 7.500000e-02 7.500000e-02 >>>>> 1.000000e+00 -7.500000e-02 7.500000e-02 >>>>> >>>>> Thank you in advance and have a good day ! >>>>> >>>>> Sami, >>>>> >>>>> -- >>>>> Dr. Sami BEN ELHAJ SALAH >>>>> Ing?nieur de Recherche (CNRS) >>>>> Institut Pprime - ISAE - ENSMA >>>>> Mobile: 06.62.51.26.74 >>>>> Email: sami.ben-elhaj-salah at ensma.fr >>>>> www.samibenelhajsalah.com > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2C3D8.vtu Type: application/octet-stream Size: 1319 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jun 13 08:18:56 2022 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Jun 2022 09:18:56 -0400 Subject: [petsc-users] Writing VTK output In-Reply-To: References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> Message-ID: Can you just send your GMsh file so I can see what you are asking for? Also, Plex stores hexes with outward normals, but some other programs store them with some inward normals. This should be converted in the output. I can check this if you send your mesh. 
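In the meantime, a minimal standalone driver that exercises the same path (read the mesh named on the command line, then view it in whatever format the option asks for) is just the following sketch, assuming a recent PETSc where PetscCall() is available:

  #include <petscdmplex.h>

  int main(int argc, char **argv)
  {
    DM dm;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(DMCreate(PETSC_COMM_WORLD, &dm));
    PetscCall(DMSetType(dm, DMPLEX));
    PetscCall(DMSetFromOptions(dm));                    /* -dm_plex_filename cub_2C3D8_msh.msh */
    PetscCall(DMViewFromOptions(dm, NULL, "-dm_view")); /* -dm_view vtk:cub_2C3D8_msh.vtu      */
    PetscCall(DMDestroy(&dm));
    PetscCall(PetscFinalize());
    return 0;
  }

Running it with the two options shown in the comments writes the .vtu directly, so it isolates the mesh reading/writing from the rest of your code.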
Thanks, Matt On Sun, Jun 12, 2022 at 10:48 AM Sami BEN ELHAJ SALAH < sami.ben-elhaj-salah at ensma.fr> wrote: > Dear Matthew and Jed, > > Thank you very much for explaining and your help. I am sorry for my late > reply. > For me, the .vtu file is wrong when the section seems to be > not correct (I mean the raw encoding because when I visualize the .vtu file > on paraview, the geometry is not good). The header is OK (see attached > file). To generate the vtu file, I use the routine suggested by Matthew and > the commande line proposed by Jed (-dm_plex_filename 2C3D8_msh.msh -dm_view > vtk:2C3D8_msh.vtu). > > On the other hand, when I use the routine below and write my output to a > vtk file and not vtu, the result is ok except the rotation of the elements > nodes (the nodes rotation is not good for me and not saved comparing to > gmsh file). > > PetscViewer vtk; > PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); > VecView(solution,vtk); > PetscViewerDestroy(&vtk); > > I put here an example of a vtk file that I have generated > > # vtk DataFile Version 2.0 > Simplicial Mesh Example > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 12 double > 0.000000e+00 1.000000e+01 1.000000e+01 > 0.000000e+00 0.000000e+00 1.000000e+01 > 0.000000e+00 0.000000e+00 0.000000e+00 > 0.000000e+00 1.000000e+01 0.000000e+00 > 1.000000e+01 1.000000e+01 1.000000e+01 > 1.000000e+01 0.000000e+00 1.000000e+01 > 1.000000e+01 0.000000e+00 0.000000e+00 > 1.000000e+01 1.000000e+01 0.000000e+00 > 2.000000e+01 1.000000e+01 1.000000e+01 > 2.000000e+01 0.000000e+00 1.000000e+01 > 2.000000e+01 0.000000e+00 0.000000e+00 > 2.000000e+01 1.000000e+01 0.000000e+00 > CELLS 2 18 > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > CELL_TYPES 2 > 12 > 12 > POINT_DATA 12 > VECTORS dU_x double > 2.754808e-10 -8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 8.653846e-11 > 2.754808e-10 -8.653846e-11 8.653846e-11 > 4.678571e-01 -9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 9.107143e-02 > 4.678571e-01 -9.107143e-02 9.107143e-02 > 1.000000e+00 -7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 7.500000e-02 > 1.000000e+00 -7.500000e-02 7.500000e-02 > > > To obtain the good geometry, the two lines > > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > > Should be like this in order to have a good geometry defined in the gmsh > file. > > 8 0 1 2 3 4 5 6 7 > 8 4 5 6 7 8 9 10 11 > > > - - - > So I m trying now to compile my code with petsc 3.16, may be it > solves the problem of the rotation order of nodes. > > Thank you and have a good day, > > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > > > Le 8 juin 2022 ? 17:57, Matthew Knepley a ?crit : > > On Wed, Jun 8, 2022 at 11:24 AM Sami BEN ELHAJ SALAH < > sami.ben-elhaj-salah at ensma.fr> wrote: > >> Yes, the file "sami.vtu" is loaded correctly in paraview and I have the >> good output like you. >> >> In my code, I tried with the same command given in your last answer and I >> still have the wrong .vtu file. >> > > Hi Sami, > > What do you mean by wrong? 
> > Can you just use the simple procedure: > > PetscCall(DMCreate(comm, dm)); > PetscCall(DMSetType(*dm, DMPLEX)); > PetscCall(DMSetFromOptions(*dm)); > PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view")); > > This is the one that works for us. Then we can change it in your code one > step at a time until you get what you need. > > Thanks, > > Matt > > >> I use this: >> mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT >> -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor >> -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view >> vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt >> >> >> Thanks, >> Sami, >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com >> >> >> >> >> Le 8 juin 2022 ? 16:25, Jed Brown a ?crit : >> >> Does the file load in paraview? When I load your *.msh in a tutorial with >> -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. >> >> >> Sami BEN ELHAJ SALAH writes: >> >> Hi Jed, >> >> Thank you for your answer. >> >> When I use a ??solution.vtu'', I obtain a wrong file. >> >> >> >> >> >> >> > format="appended" offset="0" /> >> >> >> > format="appended" offset="292" /> >> > format="appended" offset="360" /> >> > format="appended" offset="372" /> >> >> >> > format="appended" offset="378" /> >> >> >> > format="appended" offset="390" /> >> >> >> >> >> _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ >> ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o >> _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? >> uP??b#???????333????333??_#?????? >> ?333????333??b#??????(?333??'?333??a#???????333??>?333?? >> >> >> >> >> If I understand your answer, to solve my problem, should just upgrade all >> my software ? >> >> Thanks, >> Sami, >> >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com < >> https://samiben91.github.io/samibenelhajsalah/index.html> >> >> >> >> Le 8 juin 2022 ? 15:37, Jed Brown a ?crit : >> >> You're using pretty old versions of all software; I'd recommend >> upgrading. I recommend choosing the file name "solution.vtu" to use the >> modern (non-legacy) format. Does that work for you? >> >> Sami BEN ELHAJ SALAH writes: >> >> Dear Petsc Developer team, >> >> I solved a linear elastic problem in 3D using a DMPLEX. My system is >> converging, then I would like to write out my solution vector to a vtk file >> where I use unstructured mesh. Currently, I tried two algorithms and I have >> the same result. >> >> 1) Algorithm 1 >> err = SNESSolve(_snes, bc_vec_test, solution); >> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >> PetscViewer vtk; >> >> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >> >> VecView(solution,vtk); >> PetscViewerDestroy(&vtk); >> >> >> 2) Algorithm 2 >> err = SNESSolve(_snes, bc_vec_test, solution); >> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >> PetscViewer vtk; >> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >> PetscViewerSetType(vtk, PETSCVIEWERVTK); >> PetscViewerFileSetName(vtk, "sol.vtk"); >> VecView(solution, vtk); >> PetscViewerDestroy(&vtk); >> >> The result seems correct except for the rotation order of the nodes (see >> the red lines on gmsh and vtk file respectively). 
Then, I visualized my vtk >> file with paraview, and I remarked that my geometry is not correct and not >> conserved when comparing it with my gmsh file. So, I didn?t understand why >> the rotation order of nodes is not conserved when saving my result to a vtk >> file? >> >> Other information used: >> - gmsh format 2.2 >> - Vtk version: 7.1.1 >> - Petsc version: 3.13/opt >> >> Below my two files gmsh and vtk: >> >> Gmsh file: >> $MeshFormat >> 2.2 0 8 >> $EndMeshFormat >> $Nodes >> 12 >> 1 0.0 10.0 10.0 >> 2 0.0 0.0 10.0 >> 3 0.0 0.0 0.0 >> 4 0.0 10.0 0.0 >> 5 10.0 10.0 10.0 >> 6 10.0 0.0 10.0 >> 7 10.0 0.0 0.0 >> 8 10.0 10.0 0.0 >> 9 20.0 10.0 10.0 >> 10 20.0 0.0 10.0 >> 11 20.0 0.0 0.0 >> 12 20.0 10.0 0.0 >> $EndNodes >> $Elements >> 2 >> 1 5 2 68 60 1 2 3 4 5 6 7 8 >> 2 5 2 68 60 5 6 7 8 9 10 11 12 >> $EndElements >> >> Vtk file : >> # vtk DataFile Version 2.0 >> Simplicial Mesh Example >> ASCII >> DATASET UNSTRUCTURED_GRID >> POINTS 12 double >> 0.000000e+00 1.000000e+01 1.000000e+01 >> 0.000000e+00 0.000000e+00 1.000000e+01 >> 0.000000e+00 0.000000e+00 0.000000e+00 >> 0.000000e+00 1.000000e+01 0.000000e+00 >> 1.000000e+01 1.000000e+01 1.000000e+01 >> 1.000000e+01 0.000000e+00 1.000000e+01 >> 1.000000e+01 0.000000e+00 0.000000e+00 >> 1.000000e+01 1.000000e+01 0.000000e+00 >> 2.000000e+01 1.000000e+01 1.000000e+01 >> 2.000000e+01 0.000000e+00 1.000000e+01 >> 2.000000e+01 0.000000e+00 0.000000e+00 >> 2.000000e+01 1.000000e+01 0.000000e+00 >> CELLS 2 18 >> 8 0 3 2 1 4 5 6 7 >> 8 4 7 6 5 8 9 10 11 >> CELL_TYPES 2 >> 12 >> 12 >> POINT_DATA 12 >> VECTORS dU_x double >> 2.754808e-10 -8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 8.653846e-11 >> 2.754808e-10 -8.653846e-11 8.653846e-11 >> 4.678571e-01 -9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 9.107143e-02 >> 4.678571e-01 -9.107143e-02 9.107143e-02 >> 1.000000e+00 -7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 7.500000e-02 >> 1.000000e+00 -7.500000e-02 7.500000e-02 >> >> Thank you in advance and have a good day ! >> >> Sami, >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com < >> https://samiben91.github.io/samibenelhajsalah/index.html> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Mon Jun 13 09:50:32 2022 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Mon, 13 Jun 2022 07:50:32 -0700 Subject: [petsc-users] Calling Pytorch and Python within PETSc C/C++ code Message-ID: Hi Guys, Is there a PETSc interface to make calls to Python scripts or libraries (e.g., Pytorch) from a C/C++ code making use of PETSc? If so, are there some examples that I refer to? Thanks, -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tangqi at msu.edu Mon Jun 13 11:20:17 2022 From: tangqi at msu.edu (Tang, Qi) Date: Mon, 13 Jun 2022 16:20:17 +0000 Subject: [petsc-users] [EXTERNAL] Re: Question about SuperLU In-Reply-To: References: Message-ID: <4933ECB8-4977-4E1D-BD2A-87671C542A6F@msu.edu> Sherry, -mat_superlu_dist_replacetinypivot This flag makes superlu_dist back to working for the full VB block as we would think. Thanks again for the suggestion. Does this imply anything for the VB block matrix? Are we just unlucky or does that imply we have tiny terms along the diagonal and the matrix is not very good? (It could be the case since it is a stablized saddle point problem.) Again, we estimate the condition number through petsc and it is reasonable. Qi On Jun 10, 2022, at 6:35 PM, Xiaoye S. Li wrote: -mat_superlu_dist_replacetinypivot -------------- next part -------------- An HTML attachment was scrubbed... URL: From xsli at lbl.gov Mon Jun 13 12:01:20 2022 From: xsli at lbl.gov (Xiaoye S. Li) Date: Mon, 13 Jun 2022 19:01:20 +0200 Subject: [petsc-users] [EXTERNAL] Re: Question about SuperLU In-Reply-To: <4933ECB8-4977-4E1D-BD2A-87671C542A6F@msu.edu> References: <4933ECB8-4977-4E1D-BD2A-87671C542A6F@msu.edu> Message-ID: Can you write down (in matrix notation) what does full VB matrix look like? Sherry On Mon, Jun 13, 2022 at 6:20 PM Tang, Qi wrote: > Sherry, > > -mat_superlu_dist_replacetinypivot > This flag makes superlu_dist back to working for the full VB block as we > would think. Thanks again for the suggestion. > > Does this imply anything for the VB block matrix? Are we just unlucky or > does that imply we have tiny terms along the diagonal and the matrix is not > very good? (It could be the case since it is a stablized saddle point > problem.) Again, we estimate the condition number through petsc and it is > reasonable. > > Qi > > > On Jun 10, 2022, at 6:35 PM, Xiaoye S. Li wrote: > > -mat_superlu_dist_replacetinypivot > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sami.ben-elhaj-salah at ensma.fr Mon Jun 13 12:47:56 2022 From: sami.ben-elhaj-salah at ensma.fr (Sami BEN ELHAJ SALAH) Date: Mon, 13 Jun 2022 19:47:56 +0200 Subject: [petsc-users] Writing VTK output In-Reply-To: References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> Message-ID: <76639150-0C2A-4307-AE0A-A2A68E5C2A80@ensma.fr> Hi Matthew, Please find attached the gmsh file, Thank you in advance ! Sami -- Dr. Sami BEN ELHAJ SALAH Ing?nieur de Recherche (CNRS) Institut Pprime - ISAE - ENSMA Mobile: 06.62.51.26.74 Email: sami.ben-elhaj-salah at ensma.fr www.samibenelhajsalah.com > Le 13 juin 2022 ? 15:18, Matthew Knepley a ?crit : > > Can you just send your GMsh file so I can see what you are asking for? > > Also, Plex stores hexes with outward normals, but some other programs store them with some inward normals. This > should be converted in the output. I can check this if you send your mesh. > > Thanks, > > Matt > > On Sun, Jun 12, 2022 at 10:48 AM Sami BEN ELHAJ SALAH > wrote: > Dear Matthew and Jed, > Thank you very much for explaining and your help. I am sorry for my late reply. > > For me, the .vtu file is wrong when the section seems to be not correct (I mean the raw encoding because when I visualize the .vtu file on paraview, the geometry is not good). The header is OK (see attached file). 
To generate the vtu file, I use the routine suggested by Matthew and the commande line proposed by Jed (-dm_plex_filename 2C3D8_msh.msh -dm_view vtk:2C3D8_msh.vtu). > > On the other hand, when I use the routine below and write my output to a vtk file and not vtu, the result is ok except the rotation of the elements nodes (the nodes rotation is not good for me and not saved comparing to gmsh file). > > PetscViewer vtk; > PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); > VecView(solution,vtk); > PetscViewerDestroy(&vtk); > I put here an example of a vtk file that I have generated > # vtk DataFile Version 2.0 > Simplicial Mesh Example > ASCII > DATASET UNSTRUCTURED_GRID > POINTS 12 double > 0.000000e+00 1.000000e+01 1.000000e+01 > 0.000000e+00 0.000000e+00 1.000000e+01 > 0.000000e+00 0.000000e+00 0.000000e+00 > 0.000000e+00 1.000000e+01 0.000000e+00 > 1.000000e+01 1.000000e+01 1.000000e+01 > 1.000000e+01 0.000000e+00 1.000000e+01 > 1.000000e+01 0.000000e+00 0.000000e+00 > 1.000000e+01 1.000000e+01 0.000000e+00 > 2.000000e+01 1.000000e+01 1.000000e+01 > 2.000000e+01 0.000000e+00 1.000000e+01 > 2.000000e+01 0.000000e+00 0.000000e+00 > 2.000000e+01 1.000000e+01 0.000000e+00 > CELLS 2 18 > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > CELL_TYPES 2 > 12 > 12 > POINT_DATA 12 > VECTORS dU_x double > 2.754808e-10 -8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 -8.653846e-11 > 2.754808e-10 8.653846e-11 8.653846e-11 > 2.754808e-10 -8.653846e-11 8.653846e-11 > 4.678571e-01 -9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 -9.107143e-02 > 4.678571e-01 9.107143e-02 9.107143e-02 > 4.678571e-01 -9.107143e-02 9.107143e-02 > 1.000000e+00 -7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 -7.500000e-02 > 1.000000e+00 7.500000e-02 7.500000e-02 > 1.000000e+00 -7.500000e-02 7.500000e-02 > > To obtain the good geometry, the two lines > 8 0 3 2 1 4 5 6 7 > 8 4 7 6 5 8 9 10 11 > Should be like this in order to have a good geometry defined in the gmsh file. > 8 0 1 2 3 4 5 6 7 > 8 4 5 6 7 8 9 10 11 > > - - - > So I m trying now to compile my code with petsc 3.16, may be it solves the problem of the rotation order of nodes. > > Thank you and have a good day, > > Sami, > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > >> Le 8 juin 2022 ? 17:57, Matthew Knepley > a ?crit : >> >> On Wed, Jun 8, 2022 at 11:24 AM Sami BEN ELHAJ SALAH > wrote: >> Yes, the file "sami.vtu" is loaded correctly in paraview and I have the good output like you. >> >> In my code, I tried with the same command given in your last answer and I still have the wrong .vtu file. >> >> Hi Sami, >> >> What do you mean by wrong? >> >> Can you just use the simple procedure: >> >> PetscCall(DMCreate(comm, dm)); >> PetscCall(DMSetType(*dm, DMPLEX)); >> PetscCall(DMSetFromOptions(*dm)); >> PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view")); >> >> This is the one that works for us. Then we can change it in your code one step at a time until you get what you need. >> >> Thanks, >> >> Matt >> >> I use this: >> mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt >> >> >> Thanks, >> Sami, >> >> -- >> Dr. 
Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com >> >> >> >>> Le 8 juin 2022 ? 16:25, Jed Brown > a ?crit : >>> >>> Does the file load in paraview? When I load your *.msh in a tutorial with -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. >>> >>> >>> Sami BEN ELHAJ SALAH > writes: >>> >>>> Hi Jed, >>>> >>>> Thank you for your answer. >>>> >>>> When I use a ??solution.vtu'', I obtain a wrong file. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ >>>> ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? >>>> >>>> >>>> >>>> >>>> If I understand your answer, to solve my problem, should just upgrade all my software ? >>>> >>>> Thanks, >>>> Sami, >>>> >>>> >>>> -- >>>> Dr. Sami BEN ELHAJ SALAH >>>> Ing?nieur de Recherche (CNRS) >>>> Institut Pprime - ISAE - ENSMA >>>> Mobile: 06.62.51.26.74 >>>> Email: sami.ben-elhaj-salah at ensma.fr >>>> www.samibenelhajsalah.com > >>>> >>>> >>>> >>>>> Le 8 juin 2022 ? 15:37, Jed Brown > a ?crit : >>>>> >>>>> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? >>>>> >>>>> Sami BEN ELHAJ SALAH > writes: >>>>> >>>>>> Dear Petsc Developer team, >>>>>> >>>>>> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. >>>>>> >>>>>> 1) Algorithm 1 >>>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>>> PetscViewer vtk; >>>>>> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >>>>>> VecView(solution,vtk); >>>>>> PetscViewerDestroy(&vtk); >>>>>> >>>>>> >>>>>> 2) Algorithm 2 >>>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>>> PetscViewer vtk; >>>>>> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >>>>>> PetscViewerSetType(vtk, PETSCVIEWERVTK); >>>>>> PetscViewerFileSetName(vtk, "sol.vtk"); >>>>>> VecView(solution, vtk); >>>>>> PetscViewerDestroy(&vtk); >>>>>> >>>>>> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
>>>>>> >>>>>> Other information used: >>>>>> - gmsh format 2.2 >>>>>> - Vtk version: 7.1.1 >>>>>> - Petsc version: 3.13/opt >>>>>> >>>>>> Below my two files gmsh and vtk: >>>>>> >>>>>> Gmsh file: >>>>>> $MeshFormat >>>>>> 2.2 0 8 >>>>>> $EndMeshFormat >>>>>> $Nodes >>>>>> 12 >>>>>> 1 0.0 10.0 10.0 >>>>>> 2 0.0 0.0 10.0 >>>>>> 3 0.0 0.0 0.0 >>>>>> 4 0.0 10.0 0.0 >>>>>> 5 10.0 10.0 10.0 >>>>>> 6 10.0 0.0 10.0 >>>>>> 7 10.0 0.0 0.0 >>>>>> 8 10.0 10.0 0.0 >>>>>> 9 20.0 10.0 10.0 >>>>>> 10 20.0 0.0 10.0 >>>>>> 11 20.0 0.0 0.0 >>>>>> 12 20.0 10.0 0.0 >>>>>> $EndNodes >>>>>> $Elements >>>>>> 2 >>>>>> 1 5 2 68 60 1 2 3 4 5 6 7 8 >>>>>> 2 5 2 68 60 5 6 7 8 9 10 11 12 >>>>>> $EndElements >>>>>> >>>>>> Vtk file : >>>>>> # vtk DataFile Version 2.0 >>>>>> Simplicial Mesh Example >>>>>> ASCII >>>>>> DATASET UNSTRUCTURED_GRID >>>>>> POINTS 12 double >>>>>> 0.000000e+00 1.000000e+01 1.000000e+01 >>>>>> 0.000000e+00 0.000000e+00 1.000000e+01 >>>>>> 0.000000e+00 0.000000e+00 0.000000e+00 >>>>>> 0.000000e+00 1.000000e+01 0.000000e+00 >>>>>> 1.000000e+01 1.000000e+01 1.000000e+01 >>>>>> 1.000000e+01 0.000000e+00 1.000000e+01 >>>>>> 1.000000e+01 0.000000e+00 0.000000e+00 >>>>>> 1.000000e+01 1.000000e+01 0.000000e+00 >>>>>> 2.000000e+01 1.000000e+01 1.000000e+01 >>>>>> 2.000000e+01 0.000000e+00 1.000000e+01 >>>>>> 2.000000e+01 0.000000e+00 0.000000e+00 >>>>>> 2.000000e+01 1.000000e+01 0.000000e+00 >>>>>> CELLS 2 18 >>>>>> 8 0 3 2 1 4 5 6 7 >>>>>> 8 4 7 6 5 8 9 10 11 >>>>>> CELL_TYPES 2 >>>>>> 12 >>>>>> 12 >>>>>> POINT_DATA 12 >>>>>> VECTORS dU_x double >>>>>> 2.754808e-10 -8.653846e-11 -8.653846e-11 >>>>>> 2.754808e-10 8.653846e-11 -8.653846e-11 >>>>>> 2.754808e-10 8.653846e-11 8.653846e-11 >>>>>> 2.754808e-10 -8.653846e-11 8.653846e-11 >>>>>> 4.678571e-01 -9.107143e-02 -9.107143e-02 >>>>>> 4.678571e-01 9.107143e-02 -9.107143e-02 >>>>>> 4.678571e-01 9.107143e-02 9.107143e-02 >>>>>> 4.678571e-01 -9.107143e-02 9.107143e-02 >>>>>> 1.000000e+00 -7.500000e-02 -7.500000e-02 >>>>>> 1.000000e+00 7.500000e-02 -7.500000e-02 >>>>>> 1.000000e+00 7.500000e-02 7.500000e-02 >>>>>> 1.000000e+00 -7.500000e-02 7.500000e-02 >>>>>> >>>>>> Thank you in advance and have a good day ! >>>>>> >>>>>> Sami, >>>>>> >>>>>> -- >>>>>> Dr. Sami BEN ELHAJ SALAH >>>>>> Ing?nieur de Recherche (CNRS) >>>>>> Institut Pprime - ISAE - ENSMA >>>>>> Mobile: 06.62.51.26.74 >>>>>> Email: sami.ben-elhaj-salah at ensma.fr >>>>>> www.samibenelhajsalah.com > >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cub_2C3D8_msh.msh Type: application/octet-stream Size: 343 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Mon Jun 13 12:58:23 2022 From: jed at jedbrown.org (Jed Brown) Date: Mon, 13 Jun 2022 11:58:23 -0600 Subject: [petsc-users] Writing VTK output In-Reply-To: <76639150-0C2A-4307-AE0A-A2A68E5C2A80@ensma.fr> References: <87czfje0ol.fsf@jedbrown.org> <875ylbdyfk.fsf@jedbrown.org> <7A2FB4C2-CA56-4D7D-9162-4574E14032C6@ensma.fr> <76639150-0C2A-4307-AE0A-A2A68E5C2A80@ensma.fr> Message-ID: <87a6agfnsw.fsf@jedbrown.org> This file is corrupted. It ends with $Elements 2 1 5 2 68 60 1 2 3 4 5 6 7 8 2 5 2 68 60 5 6 7 8 9 10 11 12 $EndElements//+ Show "*"; That should be $Elements 2 1 5 2 68 60 1 2 3 4 5 6 7 8 2 5 2 68 60 5 6 7 8 9 10 11 12 $EndElements If you fix it, then you can run $ make $PETSC_ARCH/tests/dm/impls/plex/tutorials/ex7 $ $PETSC_ARCH/tests/dm/impls/plex/tutorials/ex7 -dm_plex_filename ~/dl/cub_2C3D8_msh.msh -dm_view vtk:foo.vtu and open foo.vtu in Paraview. It looks correct. Sami BEN ELHAJ SALAH writes: > Hi Matthew, > Please find attached the gmsh file, > Thank you in advance ! > Sami > > -- > Dr. Sami BEN ELHAJ SALAH > Ing?nieur de Recherche (CNRS) > Institut Pprime - ISAE - ENSMA > Mobile: 06.62.51.26.74 > Email: sami.ben-elhaj-salah at ensma.fr > www.samibenelhajsalah.com > > > >> Le 13 juin 2022 ? 15:18, Matthew Knepley a ?crit : >> >> Can you just send your GMsh file so I can see what you are asking for? >> >> Also, Plex stores hexes with outward normals, but some other programs store them with some inward normals. This >> should be converted in the output. I can check this if you send your mesh. >> >> Thanks, >> >> Matt >> >> On Sun, Jun 12, 2022 at 10:48 AM Sami BEN ELHAJ SALAH > wrote: >> Dear Matthew and Jed, >> Thank you very much for explaining and your help. I am sorry for my late reply. >> >> For me, the .vtu file is wrong when the section seems to be not correct (I mean the raw encoding because when I visualize the .vtu file on paraview, the geometry is not good). The header is OK (see attached file). To generate the vtu file, I use the routine suggested by Matthew and the commande line proposed by Jed (-dm_plex_filename 2C3D8_msh.msh -dm_view vtk:2C3D8_msh.vtu). >> >> On the other hand, when I use the routine below and write my output to a vtk file and not vtu, the result is ok except the rotation of the elements nodes (the nodes rotation is not good for me and not saved comparing to gmsh file). 
>> >> PetscViewer vtk; >> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >> VecView(solution,vtk); >> PetscViewerDestroy(&vtk); >> I put here an example of a vtk file that I have generated >> # vtk DataFile Version 2.0 >> Simplicial Mesh Example >> ASCII >> DATASET UNSTRUCTURED_GRID >> POINTS 12 double >> 0.000000e+00 1.000000e+01 1.000000e+01 >> 0.000000e+00 0.000000e+00 1.000000e+01 >> 0.000000e+00 0.000000e+00 0.000000e+00 >> 0.000000e+00 1.000000e+01 0.000000e+00 >> 1.000000e+01 1.000000e+01 1.000000e+01 >> 1.000000e+01 0.000000e+00 1.000000e+01 >> 1.000000e+01 0.000000e+00 0.000000e+00 >> 1.000000e+01 1.000000e+01 0.000000e+00 >> 2.000000e+01 1.000000e+01 1.000000e+01 >> 2.000000e+01 0.000000e+00 1.000000e+01 >> 2.000000e+01 0.000000e+00 0.000000e+00 >> 2.000000e+01 1.000000e+01 0.000000e+00 >> CELLS 2 18 >> 8 0 3 2 1 4 5 6 7 >> 8 4 7 6 5 8 9 10 11 >> CELL_TYPES 2 >> 12 >> 12 >> POINT_DATA 12 >> VECTORS dU_x double >> 2.754808e-10 -8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 -8.653846e-11 >> 2.754808e-10 8.653846e-11 8.653846e-11 >> 2.754808e-10 -8.653846e-11 8.653846e-11 >> 4.678571e-01 -9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 -9.107143e-02 >> 4.678571e-01 9.107143e-02 9.107143e-02 >> 4.678571e-01 -9.107143e-02 9.107143e-02 >> 1.000000e+00 -7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 -7.500000e-02 >> 1.000000e+00 7.500000e-02 7.500000e-02 >> 1.000000e+00 -7.500000e-02 7.500000e-02 >> >> To obtain the good geometry, the two lines >> 8 0 3 2 1 4 5 6 7 >> 8 4 7 6 5 8 9 10 11 >> Should be like this in order to have a good geometry defined in the gmsh file. >> 8 0 1 2 3 4 5 6 7 >> 8 4 5 6 7 8 9 10 11 >> >> - - - > So I m trying now to compile my code with petsc 3.16, may be it solves the problem of the rotation order of nodes. >> >> Thank you and have a good day, >> >> Sami, >> >> -- >> Dr. Sami BEN ELHAJ SALAH >> Ing?nieur de Recherche (CNRS) >> Institut Pprime - ISAE - ENSMA >> Mobile: 06.62.51.26.74 >> Email: sami.ben-elhaj-salah at ensma.fr >> www.samibenelhajsalah.com >> >> >> >>> Le 8 juin 2022 ? 17:57, Matthew Knepley > a ?crit : >>> >>> On Wed, Jun 8, 2022 at 11:24 AM Sami BEN ELHAJ SALAH > wrote: >>> Yes, the file "sami.vtu" is loaded correctly in paraview and I have the good output like you. >>> >>> In my code, I tried with the same command given in your last answer and I still have the wrong .vtu file. >>> >>> Hi Sami, >>> >>> What do you mean by wrong? >>> >>> Can you just use the simple procedure: >>> >>> PetscCall(DMCreate(comm, dm)); >>> PetscCall(DMSetType(*dm, DMPLEX)); >>> PetscCall(DMSetFromOptions(*dm)); >>> PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view")); >>> >>> This is the one that works for us. Then we can change it in your code one step at a time until you get what you need. >>> >>> Thanks, >>> >>> Matt >>> >>> I use this: >>> mpirun -np 1 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT -snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor -ksp_xmonitor -dm_plex_filename cub_2C3D8_msh.msh -dm_view vtk:cub_2C3D8_msh.vtu cub_8C3D8.fxt >>> >>> >>> Thanks, >>> Sami, >>> >>> -- >>> Dr. Sami BEN ELHAJ SALAH >>> Ing?nieur de Recherche (CNRS) >>> Institut Pprime - ISAE - ENSMA >>> Mobile: 06.62.51.26.74 >>> Email: sami.ben-elhaj-salah at ensma.fr >>> www.samibenelhajsalah.com >>> >>> >>> >>>> Le 8 juin 2022 ? 16:25, Jed Brown > a ?crit : >>>> >>>> Does the file load in paraview? 
When I load your *.msh in a tutorial with -dm_plex_filename sami.msh -dm_view vtk:sami.vtu, I get this good output. >>>> >>>> >>>> Sami BEN ELHAJ SALAH > writes: >>>> >>>>> Hi Jed, >>>>> >>>>> Thank you for your answer. >>>>> >>>>> When I use a ??solution.vtu'', I obtain a wrong file. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _ $@$@$@$@$@$@$@$@$@$@$@$@4@$@$@4@$@4 at 4@$@@ >>>>> ?p?O??=??sT?????sT????p?O??=??sT???=??sT????p?O??=??sT???=??sT???=?p?O??=??sT?????sT???=o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??o _????? uP???? uP??b#???????333????333??_#?????? ?333????333??b#??????(?333??'?333??a#???????333??>?333?? >>>>> >>>>> >>>>> >>>>> >>>>> If I understand your answer, to solve my problem, should just upgrade all my software ? >>>>> >>>>> Thanks, >>>>> Sami, >>>>> >>>>> >>>>> -- >>>>> Dr. Sami BEN ELHAJ SALAH >>>>> Ing?nieur de Recherche (CNRS) >>>>> Institut Pprime - ISAE - ENSMA >>>>> Mobile: 06.62.51.26.74 >>>>> Email: sami.ben-elhaj-salah at ensma.fr >>>>> www.samibenelhajsalah.com > >>>>> >>>>> >>>>> >>>>>> Le 8 juin 2022 ? 15:37, Jed Brown > a ?crit : >>>>>> >>>>>> You're using pretty old versions of all software; I'd recommend upgrading. I recommend choosing the file name "solution.vtu" to use the modern (non-legacy) format. Does that work for you? >>>>>> >>>>>> Sami BEN ELHAJ SALAH > writes: >>>>>> >>>>>>> Dear Petsc Developer team, >>>>>>> >>>>>>> I solved a linear elastic problem in 3D using a DMPLEX. My system is converging, then I would like to write out my solution vector to a vtk file where I use unstructured mesh. Currently, I tried two algorithms and I have the same result. >>>>>>> >>>>>>> 1) Algorithm 1 >>>>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>>>> PetscViewer vtk; >>>>>>> PetscViewerVTKOpen(FOX::Parallel::COMM_WORLD,"solution.vtk",FILE_MODE_WRITE,&vtk); >>>>>>> VecView(solution,vtk); >>>>>>> PetscViewerDestroy(&vtk); >>>>>>> >>>>>>> >>>>>>> 2) Algorithm 2 >>>>>>> err = SNESSolve(_snes, bc_vec_test, solution); >>>>>>> CHKERRABORT(FOX::Parallel::COMM_WORLD,err); >>>>>>> PetscViewer vtk; >>>>>>> PetscViewerCreate(FOX::Parallel::COMM_WORLD, &vtk); >>>>>>> PetscViewerSetType(vtk, PETSCVIEWERVTK); >>>>>>> PetscViewerFileSetName(vtk, "sol.vtk"); >>>>>>> VecView(solution, vtk); >>>>>>> PetscViewerDestroy(&vtk); >>>>>>> >>>>>>> The result seems correct except for the rotation order of the nodes (see the red lines on gmsh and vtk file respectively). Then, I visualized my vtk file with paraview, and I remarked that my geometry is not correct and not conserved when comparing it with my gmsh file. So, I didn?t understand why the rotation order of nodes is not conserved when saving my result to a vtk file? 
>>>>>>> >>>>>>> Other information used: >>>>>>> - gmsh format 2.2 >>>>>>> - Vtk version: 7.1.1 >>>>>>> - Petsc version: 3.13/opt >>>>>>> >>>>>>> Below my two files gmsh and vtk: >>>>>>> >>>>>>> Gmsh file: >>>>>>> $MeshFormat >>>>>>> 2.2 0 8 >>>>>>> $EndMeshFormat >>>>>>> $Nodes >>>>>>> 12 >>>>>>> 1 0.0 10.0 10.0 >>>>>>> 2 0.0 0.0 10.0 >>>>>>> 3 0.0 0.0 0.0 >>>>>>> 4 0.0 10.0 0.0 >>>>>>> 5 10.0 10.0 10.0 >>>>>>> 6 10.0 0.0 10.0 >>>>>>> 7 10.0 0.0 0.0 >>>>>>> 8 10.0 10.0 0.0 >>>>>>> 9 20.0 10.0 10.0 >>>>>>> 10 20.0 0.0 10.0 >>>>>>> 11 20.0 0.0 0.0 >>>>>>> 12 20.0 10.0 0.0 >>>>>>> $EndNodes >>>>>>> $Elements >>>>>>> 2 >>>>>>> 1 5 2 68 60 1 2 3 4 5 6 7 8 >>>>>>> 2 5 2 68 60 5 6 7 8 9 10 11 12 >>>>>>> $EndElements >>>>>>> >>>>>>> Vtk file : >>>>>>> # vtk DataFile Version 2.0 >>>>>>> Simplicial Mesh Example >>>>>>> ASCII >>>>>>> DATASET UNSTRUCTURED_GRID >>>>>>> POINTS 12 double >>>>>>> 0.000000e+00 1.000000e+01 1.000000e+01 >>>>>>> 0.000000e+00 0.000000e+00 1.000000e+01 >>>>>>> 0.000000e+00 0.000000e+00 0.000000e+00 >>>>>>> 0.000000e+00 1.000000e+01 0.000000e+00 >>>>>>> 1.000000e+01 1.000000e+01 1.000000e+01 >>>>>>> 1.000000e+01 0.000000e+00 1.000000e+01 >>>>>>> 1.000000e+01 0.000000e+00 0.000000e+00 >>>>>>> 1.000000e+01 1.000000e+01 0.000000e+00 >>>>>>> 2.000000e+01 1.000000e+01 1.000000e+01 >>>>>>> 2.000000e+01 0.000000e+00 1.000000e+01 >>>>>>> 2.000000e+01 0.000000e+00 0.000000e+00 >>>>>>> 2.000000e+01 1.000000e+01 0.000000e+00 >>>>>>> CELLS 2 18 >>>>>>> 8 0 3 2 1 4 5 6 7 >>>>>>> 8 4 7 6 5 8 9 10 11 >>>>>>> CELL_TYPES 2 >>>>>>> 12 >>>>>>> 12 >>>>>>> POINT_DATA 12 >>>>>>> VECTORS dU_x double >>>>>>> 2.754808e-10 -8.653846e-11 -8.653846e-11 >>>>>>> 2.754808e-10 8.653846e-11 -8.653846e-11 >>>>>>> 2.754808e-10 8.653846e-11 8.653846e-11 >>>>>>> 2.754808e-10 -8.653846e-11 8.653846e-11 >>>>>>> 4.678571e-01 -9.107143e-02 -9.107143e-02 >>>>>>> 4.678571e-01 9.107143e-02 -9.107143e-02 >>>>>>> 4.678571e-01 9.107143e-02 9.107143e-02 >>>>>>> 4.678571e-01 -9.107143e-02 9.107143e-02 >>>>>>> 1.000000e+00 -7.500000e-02 -7.500000e-02 >>>>>>> 1.000000e+00 7.500000e-02 -7.500000e-02 >>>>>>> 1.000000e+00 7.500000e-02 7.500000e-02 >>>>>>> 1.000000e+00 -7.500000e-02 7.500000e-02 >>>>>>> >>>>>>> Thank you in advance and have a good day ! >>>>>>> >>>>>>> Sami, >>>>>>> >>>>>>> -- >>>>>>> Dr. Sami BEN ELHAJ SALAH >>>>>>> Ing?nieur de Recherche (CNRS) >>>>>>> Institut Pprime - ISAE - ENSMA >>>>>>> Mobile: 06.62.51.26.74 >>>>>>> Email: sami.ben-elhaj-salah at ensma.fr >>>>>>> www.samibenelhajsalah.com > >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ From hongzhang at anl.gov Mon Jun 13 15:42:13 2022 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 13 Jun 2022 20:42:13 +0000 Subject: [petsc-users] Calling Pytorch and Python within PETSc C/C++ code In-Reply-To: References: Message-ID: No. It is not common to execute Python scripts or libraries from C/C++ code. If you are looking for ways to use PETSc and PyTorch together, it is best to build your application in Python so that you can use both petsc4py and PyTorch. 
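If the C/C++ code has to stay in charge and you only need a few numbers back from a trained network, the plain CPython embedding API is always an option, but that is outside PETSc and you are responsible for the build flags and error handling yourself. A rough sketch follows; the module name "mymodel" and the function "predict" are placeholders for whatever Python wrapper you write around your PyTorch model, and predict() is assumed to return a plain Python float (error checks omitted):

  #include <Python.h>   /* must come before other headers */
  #include <petscsys.h>

  static PetscErrorCode EvalModel(double x, double *y)
  {
    PyObject *mod, *fn, *arg, *res;

    PetscFunctionBeginUser;
    mod = PyImport_ImportModule("mymodel");       /* import mymodel        */
    fn  = PyObject_GetAttrString(mod, "predict"); /* fn = mymodel.predict  */
    arg = Py_BuildValue("(d)", x);                /* argument tuple (x,)   */
    res = PyObject_CallObject(fn, arg);           /* res = predict(x)      */
    *y  = PyFloat_AsDouble(res);
    Py_XDECREF(res); Py_XDECREF(arg); Py_XDECREF(fn); Py_XDECREF(mod);
    PetscFunctionReturn(0);
  }

  /* call Py_Initialize() once after PetscInitialize(), and Py_Finalize() before PetscFinalize() */

That said, the Python-driven route is still the cleaner way to combine PETSc with PyTorch.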
See the following code for an example: https://github.com/caidao22/pnode Hong (Mr.) On Jun 13, 2022, at 9:50 AM, Amneet Bhalla > wrote: Hi Guys, Is there a PETSc interface to make calls to Python scripts or libraries (e.g., Pytorch) from a C/C++ code making use of PETSc? If so, are there some examples that I refer to? Thanks, -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Jun 13 16:11:28 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 13 Jun 2022 17:11:28 -0400 Subject: [petsc-users] Calling Pytorch and Python within PETSc C/C++ code In-Reply-To: References: Message-ID: <9AFC3AE5-E969-4755-911F-E7731C210E98@petsc.dev> Note that your Python main ptsc4py program can call C/C++ code for some of its computations, so if you have a lot of C/C++ code you do not need to change it all to Python. It is also possible to call Petsc4py (and hence PyTorch) from a C/C++ main but a bit more cumbersome so not recommended. See src/ksp/ksp/tutorials/ex100.c and ex100.py for an example of a C/C++ main that uses petsc4py (in a limited way). > On Jun 13, 2022, at 4:42 PM, Zhang, Hong via petsc-users wrote: > > No. It is not common to execute Python scripts or libraries from C/C++ code. If you are looking for ways to use PETSc and PyTorch together, it is best to build your application in Python so that you can use both petsc4py and PyTorch. See the following code for an example: > https://github.com/caidao22/pnode > > Hong (Mr.) > >> On Jun 13, 2022, at 9:50 AM, Amneet Bhalla > wrote: >> >> >> Hi Guys, >> >> Is there a PETSc interface to make calls to Python scripts or libraries (e.g., Pytorch) from a C/C++ code making use of PETSc? If so, are there some examples that I refer to? >> >> Thanks, >> -- >> --Amneet >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Mon Jun 13 21:46:14 2022 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Mon, 13 Jun 2022 19:46:14 -0700 Subject: [petsc-users] Calling Pytorch and Python within PETSc C/C++ code In-Reply-To: <9AFC3AE5-E969-4755-911F-E7731C210E98@petsc.dev> References: <9AFC3AE5-E969-4755-911F-E7731C210E98@petsc.dev> Message-ID: Thanks for the information. We will check these examples out. Basically we have some trained ANNs that will provide few scalars to the C++ based CFD code. We won?t envision doing too much data transfer between C++ and Python/PyTorch. On Mon, Jun 13, 2022 at 2:11 PM Barry Smith wrote: > > Note that your Python main ptsc4py program can call C/C++ code for some > of its computations, so if you have a lot of C/C++ code you do not need to > change it all to Python. It is also possible to call Petsc4py (and hence > PyTorch) from a C/C++ main but a bit more cumbersome so not recommended. > See src/ksp/ksp/tutorials/ex100.c and ex100.py for an example of a C/C++ > main that uses petsc4py (in a limited way). > > > On Jun 13, 2022, at 4:42 PM, Zhang, Hong via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > No. It is not common to execute Python scripts or libraries from C/C++ > code. If you are looking for ways to use PETSc and PyTorch together, it is > best to build your application in Python so that you can use both petsc4py > and PyTorch. See the following code for an example: > https://github.com/caidao22/pnode > > Hong (Mr.) 
> > On Jun 13, 2022, at 9:50 AM, Amneet Bhalla wrote: > > > Hi Guys, > > Is there a PETSc interface to make calls to Python scripts or libraries > (e.g., Pytorch) from a C/C++ code making use of PETSc? If so, are there > some examples that I refer to? > > Thanks, > -- > --Amneet > > > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsfaraway at gmail.com Wed Jun 15 01:56:00 2022 From: jsfaraway at gmail.com (Runfeng Jin) Date: Wed, 15 Jun 2022 14:56:00 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> Message-ID: Hi! You are right! I try to use a SLEPc and PETSc version with nodebug, and the matrix B's solver time become 99s. But It is still a little higher than matrix A(8s). Same as mentioned before, attachment is log view of no-debug version: file 1: log of matrix A solver. This is a larger matrix(900,000*900,000) but solved quickly(8s); file 2: log of matix B solver. This is a smaller matrix(2,547*2,547) but solved much slower(99s). By comparing these two files, the strang phenomenon still exist: 1) Matrix A has more basis vectors(375) than B(189), but A spent less time on BVCreate(0.6s) than B(32s); 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) 3) In debug version, matrix B distribute much more unbalancedly storage among processors(memory max/min 4365) than A(memory max/min 1.113), but other metrics seems more balanced. And in no-debug version there is no memory information output. The significant difference I can tell is :1) B use preallocation; 2) A's matrix elements are calculated by CPU, while B's matrix elements are calculated by GPU and then transfered to CPU and solved by PETSc in CPU. Does this is a normal result? I mean, the matrix with less non-zero elements and less dimension can cost more epssolve time? Is this due to the structure of matrix? IF so, is there any ways to increase the solve speed? Or this is weired and should be fixed by some ways? Thank you! Runfeng Jin Jose E. Roman ?2022?6?12??? 16:08??? > Please always respond to the list. > > Pay attention to the warnings in the log: > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option. # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > With the debugging option the times are not trustworthy, so I suggest > repeating the analysis with an optimized build. > > Jose > > > > El 12 jun 2022, a las 5:41, Runfeng Jin escribi?: > > > > Hello! > > I compare these two matrix solver's log view and find some strange > thing. Attachment files are the log view.: > > file 1: log of matrix A solver. This is a larger > matrix(900,000*900,000) but solved quickly(30s); > > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547 , > a little different from the matrix B that is mentioned in initial email, > but solved much slower too. I use this for a quicker test) but solved much > slower(1244s). 
> > > > By comparing these two files, I find some thing: > > 1) Matrix A has more basis vectors(375) than B(189), but A spent less > time on BVCreate(0.349s) than B(296s); > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) > > 3) Matrix B distribute much more unbalancedly storage among > processors(memory max/min 4365) than A(memory max/min 1.113), but other > metrics seems more balanced. > > > > I don't do prealocation in A, and it is distributed across processors by > PETSc. For B , when preallocation I use PetscSplitOwnership to decide which > part belongs to local processor, and B is also distributed by PETSc when > compute matrix values. > > > > - Does this mean, for matrix B, too much nonzero elements are stored in > single process, and this is why it cost too much more time in solving the > matrix and find eigenvalues? If so, are there some better ways to > distribute the matrix among processors? > > - Or are there any else reasons for this difference in cost time? > > > > Hope to recieve your reply, thank you! > > > > Runfeng Jin > > > > > > > > Runfeng Jin ?2022?6?11??? 20:33??? > > Hello! > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much time. > Is there anything else I can do? Attachment is log when use PETSC_DEFAULT > for eps_ncv. > > > > Thank you ! > > > > Runfeng Jin > > > > Jose E. Roman ?2022?6?10??? 20:50??? > > The value -eps_ncv 5000 is huge. > > Better let SLEPc use the default value. > > > > Jose > > > > > > > El 10 jun 2022, a las 14:24, Jin Runfeng > escribi?: > > > > > > Hello! > > > I want to acquire the 3 smallest eigenvalue, and attachment is the > log view output. I can see epssolve really cost the major time. But I can > not see why it cost so much time. Can you see something from it? > > > > > > Thank you ! > > > > > > Runfeng Jin > > > > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: > > > Convergence depends on distribution of eigenvalues you want to > compute. On the other hand, the cost also depends on the time it takes to > build the preconditioner. Use -log_view to see the cost of the different > steps of the computation. > > > > > > Jose > > > > > > > > > > El 3 jun 2022, a las 18:50, jsfaraway > escribi?: > > > > > > > > hello! > > > > > > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. > And I find a strang thing. There are two matrix A(900000*900000) and > B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B > use 22 iterations and 38885s! What could be the reason for this? Or what > can I do to find the reason? > > > > > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > > > > And there is one difference I can tell is matrix B has many small > value, whose absolute value is less than 10-6. Could this be the reason? > > > > > > > > Thank you! > > > > > > > > Runfeng Jin > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsfaraway at gmail.com Wed Jun 15 01:58:32 2022 From: jsfaraway at gmail.com (Runfeng Jin) Date: Wed, 15 Jun 2022 14:58:32 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> Message-ID: Sorry ,I miss the attachment. Runfeng Jin Runfeng Jin ?2022?6?15??? 14:56??? > Hi! You are right! I try to use a SLEPc and PETSc version with nodebug, > and the matrix B's solver time become 99s. 
But It is still a little higher > than matrix A(8s). Same as mentioned before, attachment is log view of > no-debug version: > file 1: log of matrix A solver. This is a larger > matrix(900,000*900,000) but solved quickly(8s); > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547) > but solved much slower(99s). > > By comparing these two files, the strang phenomenon still exist: > 1) Matrix A has more basis vectors(375) than B(189), but A spent less time > on BVCreate(0.6s) than B(32s); > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) > 3) In debug version, matrix B distribute much more unbalancedly storage > among processors(memory max/min 4365) than A(memory max/min 1.113), but > other metrics seems more balanced. And in no-debug version there is no > memory information output. > > The significant difference I can tell is :1) B use preallocation; 2) A's > matrix elements are calculated by CPU, while B's matrix elements are > calculated by GPU and then transfered to CPU and solved by PETSc in CPU. > > Does this is a normal result? I mean, the matrix with less non-zero > elements and less dimension can cost more epssolve time? Is this due to the > structure of matrix? IF so, is there any ways to increase the solve speed? > > Or this is weired and should be fixed by some ways? > Thank you! > > Runfeng Jin > > > Jose E. Roman ?2022?6?12??? 16:08??? > >> Please always respond to the list. >> >> Pay attention to the warnings in the log: >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # This code was compiled with a debugging option. # >> # To get timing results run ./configure # >> # using --with-debugging=no, the performance will # >> # be generally two or three times faster. # >> # # >> ########################################################## >> >> With the debugging option the times are not trustworthy, so I suggest >> repeating the analysis with an optimized build. >> >> Jose >> >> >> > El 12 jun 2022, a las 5:41, Runfeng Jin escribi?: >> > >> > Hello! >> > I compare these two matrix solver's log view and find some strange >> thing. Attachment files are the log view.: >> > file 1: log of matrix A solver. This is a larger >> matrix(900,000*900,000) but solved quickly(30s); >> > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547 >> , a little different from the matrix B that is mentioned in initial email, >> but solved much slower too. I use this for a quicker test) but solved much >> slower(1244s). >> > >> > By comparing these two files, I find some thing: >> > 1) Matrix A has more basis vectors(375) than B(189), but A spent less >> time on BVCreate(0.349s) than B(296s); >> > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) >> > 3) Matrix B distribute much more unbalancedly storage among >> processors(memory max/min 4365) than A(memory max/min 1.113), but other >> metrics seems more balanced. >> > >> > I don't do prealocation in A, and it is distributed across processors >> by PETSc. For B , when preallocation I use PetscSplitOwnership to decide >> which part belongs to local processor, and B is also distributed by PETSc >> when compute matrix values. >> > >> > - Does this mean, for matrix B, too much nonzero elements are stored in >> single process, and this is why it cost too much more time in solving the >> matrix and find eigenvalues? If so, are there some better ways to >> distribute the matrix among processors? 
>> > - Or are there any else reasons for this difference in cost time? >> > >> > Hope to recieve your reply, thank you! >> > >> > Runfeng Jin >> > >> > >> > >> > Runfeng Jin ?2022?6?11??? 20:33??? >> > Hello! >> > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much time. >> Is there anything else I can do? Attachment is log when use PETSC_DEFAULT >> for eps_ncv. >> > >> > Thank you ! >> > >> > Runfeng Jin >> > >> > Jose E. Roman ?2022?6?10??? 20:50??? >> > The value -eps_ncv 5000 is huge. >> > Better let SLEPc use the default value. >> > >> > Jose >> > >> > >> > > El 10 jun 2022, a las 14:24, Jin Runfeng >> escribi?: >> > > >> > > Hello! >> > > I want to acquire the 3 smallest eigenvalue, and attachment is the >> log view output. I can see epssolve really cost the major time. But I can >> not see why it cost so much time. Can you see something from it? >> > > >> > > Thank you ! >> > > >> > > Runfeng Jin >> > > >> > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: >> > > Convergence depends on distribution of eigenvalues you want to >> compute. On the other hand, the cost also depends on the time it takes to >> build the preconditioner. Use -log_view to see the cost of the different >> steps of the computation. >> > > >> > > Jose >> > > >> > > >> > > > El 3 jun 2022, a las 18:50, jsfaraway >> escribi?: >> > > > >> > > > hello! >> > > > >> > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. >> And I find a strang thing. There are two matrix A(900000*900000) and >> B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B >> use 22 iterations and 38885s! What could be the reason for this? Or what >> can I do to find the reason? >> > > > >> > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". >> > > > And there is one difference I can tell is matrix B has many small >> value, whose absolute value is less than 10-6. Could this be the reason? >> > > > >> > > > Thank you! >> > > > >> > > > Runfeng Jin >> > > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /public/home/jrf/works/ecMRCI-shaula/MRCI on a named g16r3n07 with 256 processors, by jrf Wed Jun 15 10:04:00 2022 Using Petsc Release Version 3.15.1, Jun 17, 2021 Max Max/Min Avg Total Time (sec): 1.029e+02 1.001 1.028e+02 Objects: 2.011e+03 1.146 1.761e+03 Flop: 1.574e+06 2.099 1.104e+06 2.827e+08 Flop/sec: 1.531e+04 2.099 1.074e+04 2.748e+06 MPI Messages: 3.881e+04 7.920 1.865e+04 4.773e+06 MPI Message Lengths: 1.454e+06 6.190 3.542e+01 1.691e+08 MPI Reductions: 1.791e+03 1.001 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.0285e+02 100.0% 2.8266e+08 100.0% 4.773e+06 100.0% 3.542e+01 100.0% 1.769e+03 98.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 2 1.0 4.0572e-01 2.6 0.00e+00 0.0 3.7e+04 4.0e+00 2.0e+00 0 0 1 0 0 0 0 1 0 0 0 BuildTwoSidedF 1 1.0 2.0986e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00 0 0 1 2 0 0 0 1 2 0 0 MatMult 193 1.0 4.6531e+00 1.1 9.85e+05 4.3 4.7e+06 3.5e+01 1.0e+00 4 48 99 98 0 4 48 99 98 0 29 MatSolve 377 1.0 1.6183e-0288.8 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 982 MatLUFactorNum 1 1.0 5.7322e-05 2.8 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2177 MatILUFactorSym 1 1.0 8.5668e-03793.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 2.1006e-01 2.6 0.00e+00 0.0 2.4e+04 1.1e+02 1.0e+00 0 0 1 2 0 0 0 1 2 0 0 MatAssemblyEnd 1 1.0 3.0272e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 7.0000e-07 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 5.3758e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 98 1.0 9.0806e-0371.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNorm 3 1.0 1.8633e-01 1.8 6.00e+01 1.1 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecCopy 959 1.0 1.4909e-0275.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 387 1.0 9.8578e-04 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 3 1.0 1.6157e-023639.0 6.00e+01 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 VecScatterBegin 196 1.0 3.7129e-01 1.5 0.00e+00 0.0 4.7e+06 3.5e+01 4.0e+00 0 0 99 98 0 0 0 99 98 0 0 VecScatterEnd 196 1.0 4.6171e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 VecSetRandom 3 1.0 3.5271e-0515.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 634 1.0 1.7589e-0265.9 1.20e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 174 VecReduceComm 444 1.0 2.4972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.4e+02 24 0 0 0 25 24 0 0 0 25 0 SFSetGraph 1 1.0 1.1170e-05 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 4 1.0 2.3891e-01 1.3 0.00e+00 0.0 4.9e+04 1.1e+01 1.0e+00 0 0 1 0 0 0 0 1 0 0 0 SFPack 196 1.0 1.2023e-0291.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 196 1.0 4.3491e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 EPSSetUp 1 1.0 9.6815e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01 1 0 0 0 1 1 0 0 0 1 0 EPSSolve 1 1.0 9.9906e+01 1.0 1.56e+06 2.1 4.7e+06 3.5e+01 1.7e+03 97 99 98 97 97 97 99 98 97 99 3 STSetUp 1 1.0 2.8679e-0450.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 STComputeOperatr 1 1.0 2.0985e-04223.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVCreate 194 1.0 3.2437e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.8e+02 31 0 0 0 33 31 0 0 0 33 0 BVCopy 386 1.0 1.7107e-02110.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMultVec 1090 1.0 1.8337e-0221.1 2.22e+05 1.1 
0.0e+00 0.0e+00 0.0e+00 0 20 0 0 0 0 20 0 0 0 3080 BVMultInPlace 224 1.0 1.8273e-0218.4 1.06e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1481 BVDot 319 1.0 1.7687e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10 0 0 18 17 10 0 0 18 2 BVDotVec 392 1.0 2.2083e+01 1.0 6.32e+04 1.1 0.0e+00 0.0e+00 3.9e+02 21 6 0 0 22 21 6 0 0 22 1 BVOrthogonalizeV 190 1.0 1.1538e+01 1.0 1.15e+05 1.1 0.0e+00 0.0e+00 2.0e+02 11 10 0 0 11 11 10 0 0 12 3 BVScale 254 1.0 1.7301e-02125.1 2.54e+03 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 BVSetRandom 3 1.0 3.6330e-0485.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMatProject 255 1.0 1.7707e+01 1.1 1.11e+05 1.1 0.0e+00 0.0e+00 3.2e+02 17 10 0 0 18 17 10 0 0 18 2 DSSolve 82 1.0 5.4953e-0215.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSVectors 380 1.0 9.7683e-0366.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSOther 179 1.0 1.7321e-0239.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.8680e-0574.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 377 1.0 1.8723e-0214.2 7.16e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 849 PCSetUp 2 1.0 8.8937e-0353.4 6.21e+0222.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 PCApply 377 1.0 3.0085e-0213.8 7.23e+04 6.6 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 532 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 745 745 2664464 0. Vector 793 793 1538360 0. Index Set 10 10 10792 0. Star Forest Graph 4 4 5376 0. EPS Solver 1 1 3468 0. Spectral Transform 1 1 908 0. Basis Vectors 195 195 437744 0. Region 1 1 680 0. Direct Solver 1 1 20156 0. Krylov Solver 2 2 3200 0. Preconditioner 2 2 1936 0. PetscRandom 1 1 670 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 4.7e-08 Average time for MPI_Barrier(): 0.0578456 Average time for zero size MPI_Send(): 0.00358668 #PETSc Option Table entries: -eps_gd_blocksize 3 -eps_gd_initial_size 3 -eps_ncv PETSC_DEFAULT -eps_type gd -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no ----------------------------------------- Libraries compiled on 2022-06-14 01:43:59 on login05 Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ----------------------------------------- Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl ----------------------------------------- -------------- next part -------------- 
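The "Average time for MPI_Barrier()" and "Average time for zero size MPI_Send()" lines reported at the end of these logs can also be checked independently of PETSc. A rough mpi4py sketch of such a micro-benchmark follows; this is not the code PETSc itself uses, it only gives a ballpark figure for the same two quantities.

# barrier_send_check.py - rough sketch, assumes mpi4py is installed
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
n = 100                      # repetitions to average over

# Average MPI_Barrier() time
comm.Barrier()
t0 = MPI.Wtime()
for _ in range(n):
    comm.Barrier()
t_barrier = (MPI.Wtime() - t0) / n

# Average zero-size MPI_Send() time between ranks 0 and 1
t_send = 0.0
buf = bytearray(0)
if size > 1:
    if rank == 0:
        t0 = MPI.Wtime()
        for _ in range(n):
            comm.Send([buf, MPI.BYTE], dest=1, tag=7)
        t_send = (MPI.Wtime() - t0) / n
    elif rank == 1:
        for _ in range(n):
            comm.Recv([buf, MPI.BYTE], source=0, tag=7)

if rank == 0:
    print("avg MPI_Barrier: %g s, avg zero-size MPI_Send: %g s" % (t_barrier, t_send))

For comparison, the log above reports about 5.8e-2 s per barrier, while the log below reports about 1.9e-5 s; that gap is what the discussion further down focuses on.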
************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /public/home/jrf/works/qubic/bin/pfci.x on a named h09r4n13 with 192 processors, by jrf Wed Jun 15 12:10:57 2022 Using Petsc Release Version 3.15.1, Jun 17, 2021 Max Max/Min Avg Total Time (sec): 9.703e+02 1.000 9.703e+02 Objects: 2.472e+03 1.000 2.472e+03 Flop: 6.278e+09 1.064 6.012e+09 1.154e+12 Flop/sec: 6.470e+06 1.064 6.196e+06 1.190e+09 MPI Messages: 3.635e+04 1.947 2.755e+04 5.290e+06 MPI Message Lengths: 7.246e+08 1.742 2.052e+04 1.085e+11 MPI Reductions: 2.464e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 9.7032e+02 100.0% 1.1543e+12 100.0% 5.290e+06 100.0% 2.052e+04 100.0% 2.446e+03 99.3% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 2 1.0 1.9883e+029876.1 0.00e+00 0.0 2.1e+04 4.0e+00 2.0e+00 11 0 0 0 0 11 0 0 0 0 0 BuildTwoSidedF 1 1.0 1.9879e+021349804.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11 0 0 0 0 11 0 0 0 0 0 MatMult 247 1.0 2.5963e+00 1.6 1.16e+09 1.2 5.3e+06 2.1e+04 1.0e+00 0 17100100 0 0 17100100 0 77449 MatSolve 479 1.0 3.2541e-01 2.3 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 146312 MatLUFactorNum 1 1.0 4.3923e-02 7.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 41413 MatILUFactorSym 1 1.0 2.5215e-03 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 1.9879e+02654719.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 11 0 0 0 0 11 0 0 0 0 0 MatAssemblyEnd 1 1.0 2.1247e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 8.3000e-07 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.0741e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 244 1.0 2.5375e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNorm 3 1.0 1.3600e-0125.4 2.83e+04 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 40 VecCopy 1214 1.0 6.8012e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 486 1.0 2.4261e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 3 1.0 1.5987e-04 3.8 2.83e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 34009 VecScatterBegin 247 1.0 3.8039e-01 2.2 0.00e+00 0.0 5.3e+06 2.1e+04 1.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 247 1.0 1.3181e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSetRandom 6 1.0 1.3014e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 723 1.0 5.9514e-03 2.1 6.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 220153 VecReduceComm 482 1.0 2.1629e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+02 0 0 0 0 20 0 0 0 0 20 0 SFSetGraph 1 1.0 1.3207e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 1 1.0 8.3540e-02 1.4 0.00e+00 0.0 4.2e+04 5.2e+03 1.0e+00 0 0 1 0 0 0 0 1 0 0 0 SFPack 247 1.0 2.3981e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 247 1.0 1.8351e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 EPSSetUp 1 1.0 1.5565e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+01 0 0 0 0 1 0 0 0 0 1 0 EPSSolve 1 1.0 8.5090e+00 1.0 6.26e+09 1.1 5.2e+06 2.1e+04 2.4e+03 1100 99 99 99 1100 99 99 99 135365 STSetUp 1 1.0 1.3724e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 STComputeOperatr 1 1.0 7.1348e-05 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVCreate 245 1.0 6.2414e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 0 0 0 0 30 0 0 0 0 30 0 BVCopy 488 1.0 1.9780e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMultVec 1210 1.0 
6.7882e-01 1.1 1.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 19 0 0 0 0 19 0 0 0 321786 BVMultInPlace 247 1.0 7.8465e-01 1.6 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 45 0 0 0 0 45 0 0 0 663459 BVDot 718 1.0 1.7888e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02 0 9 0 0 29 0 9 0 0 29 60566 BVDotVec 487 1.0 5.3124e-01 1.2 2.85e+08 1.0 0.0e+00 0.0e+00 4.9e+02 0 5 0 0 20 0 5 0 0 20 102853 BVOrthogonalizeV 244 1.0 5.6093e-01 1.0 5.62e+08 1.0 0.0e+00 0.0e+00 2.5e+02 0 9 0 0 10 0 9 0 0 10 192477 BVScale 482 1.0 2.0062e-03 1.7 2.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 217721 BVSetRandom 6 1.0 1.3480e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMatProject 480 1.0 1.8300e+00 2.0 5.64e+08 1.0 0.0e+00 0.0e+00 7.2e+02 0 9 0 0 29 0 9 0 0 29 59203 DSSolve 242 1.0 2.3012e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSVectors 482 1.0 4.5230e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSOther 485 1.0 2.4384e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 3.5111e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 479 1.0 3.3062e-01 2.2 3.89e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 144006 PCSetUp 2 1.0 4.6721e-02 6.0 2.24e+07 4.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 38932 PCApply 479 1.0 3.7933e-01 2.4 4.11e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 130309 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 1216 1216 91546008 0. Vector 994 994 83668472 0. Index Set 5 5 733636 0. Star Forest Graph 1 1 1224 0. EPS Solver 1 1 13512 0. Spectral Transform 1 1 908 0. Basis Vectors 246 246 785872 0. Region 1 1 680 0. Direct Solver 1 1 3617024 0. Krylov Solver 2 2 3200 0. Preconditioner 2 2 1936 0. PetscRandom 1 1 670 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 5e-08 Average time for MPI_Barrier(): 1.90986e-05 Average time for zero size MPI_Send(): 3.44587e-06 #PETSc Option Table entries: -eps_ncv 300 -eps_nev 3 -eps_smallest_real -eps_tol 1e-10 -eps_type gd -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack=1 --with-blaslapack-dir=/public/software/compiler/intel/oneapi/mkl/2021.3.0 --with-64-bit-blas-indices=0 --with-boost=1 --with-boost-dir=/public/home/jrf/tools/boost_1_73_0/gcc7.3.1 --prefix=/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug --with-valgrind-dir=/public/home/jrf/tools/valgrind --LDFLAGS=-Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath=/opt/rh/devtoolset-7/root/usr/lib --with-64-bit-indices=0 --with-petsc-arch=gcc7.3.1-32indices-nodebug --with-debugging=no ----------------------------------------- Libraries compiled on 2022-06-14 01:43:59 on login05 Machine characteristics: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos Using PETSc directory: /public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ----------------------------------------- Using include paths: -I/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/include -I/public/home/jrf/tools/boost_1_73_0/gcc7.3.1/include -I/public/home/jrf/tools/valgrind/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -L/public/home/jrf/tools/petsc3.15.1/gcc7.3.1-32indices-nodebug/lib -lpetsc -Wl,-rpath,/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -L/public/software/compiler/intel/oneapi/mkl/2021.3.0/lib/intel64 -Wl,-rpath,/opt/hpc/software/mpi/hwloc/lib -L/opt/hpc/software/mpi/hwloc/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/gcc-7.3.1/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -L/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7 -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib64 -L/opt/rh/devtoolset-7/root/usr/lib64 -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/sharp/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/hcoll/lib -Wl,-rpath,/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -L/opt/hpc/software/mpi/hpcx/v2.7.4/ucx_without_rocm/lib -Wl,-rpath,/opt/rh/devtoolset-7/root/usr/lib -L/opt/rh/devtoolset-7/root/usr/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl ----------------------------------------- From jroman at dsic.upv.es Wed Jun 15 03:09:01 2022 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Wed, 15 Jun 2022 10:09:01 +0200 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> Message-ID: <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> You are comparing two different codes on two different machines? Or is it the same machine? with different number of processes and different solver options... If it is the same machine, the performance seems very different: Matrix A: Average time for MPI_Barrier(): 1.90986e-05 Average time for zero size MPI_Send(): 3.44587e-06 Matrix B: Average time for MPI_Barrier(): 0.0578456 Average time for zero size MPI_Send(): 0.00358668 The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, respectively. It's a two orders of magnitude difference. Jose > El 15 jun 2022, a las 8:58, Runfeng Jin escribi?: > > Sorry ,I miss the attachment. > > Runfeng Jin > > Runfeng Jin ?2022?6?15??? 14:56??? > Hi! You are right! I try to use a SLEPc and PETSc version with nodebug, and the matrix B's solver time become 99s. But It is still a little higher than matrix A(8s). Same as mentioned before, attachment is log view of no-debug version: > file 1: log of matrix A solver. This is a larger matrix(900,000*900,000) but solved quickly(8s); > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547) but solved much slower(99s). > > By comparing these two files, the strang phenomenon still exist: > 1) Matrix A has more basis vectors(375) than B(189), but A spent less time on BVCreate(0.6s) than B(32s); > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) > 3) In debug version, matrix B distribute much more unbalancedly storage among processors(memory max/min 4365) than A(memory max/min 1.113), but other metrics seems more balanced. And in no-debug version there is no memory information output. > > The significant difference I can tell is :1) B use preallocation; 2) A's matrix elements are calculated by CPU, while B's matrix elements are calculated by GPU and then transfered to CPU and solved by PETSc in CPU. > > Does this is a normal result? I mean, the matrix with less non-zero elements and less dimension can cost more epssolve time? Is this due to the structure of matrix? IF so, is there any ways to increase the solve speed? > > Or this is weired and should be fixed by some ways? > Thank you! > > Runfeng Jin > > > Jose E. Roman ?2022?6?12??? 16:08??? > Please always respond to the list. > > Pay attention to the warnings in the log: > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option. # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > With the debugging option the times are not trustworthy, so I suggest repeating the analysis with an optimized build. > > Jose > > > > El 12 jun 2022, a las 5:41, Runfeng Jin escribi?: > > > > Hello! > > I compare these two matrix solver's log view and find some strange thing. Attachment files are the log view.: > > file 1: log of matrix A solver. This is a larger matrix(900,000*900,000) but solved quickly(30s); > > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547 , a little different from the matrix B that is mentioned in initial email, but solved much slower too. 
I use this for a quicker test) but solved much slower(1244s). > > > > By comparing these two files, I find some thing: > > 1) Matrix A has more basis vectors(375) than B(189), but A spent less time on BVCreate(0.349s) than B(296s); > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) > > 3) Matrix B distribute much more unbalancedly storage among processors(memory max/min 4365) than A(memory max/min 1.113), but other metrics seems more balanced. > > > > I don't do prealocation in A, and it is distributed across processors by PETSc. For B , when preallocation I use PetscSplitOwnership to decide which part belongs to local processor, and B is also distributed by PETSc when compute matrix values. > > > > - Does this mean, for matrix B, too much nonzero elements are stored in single process, and this is why it cost too much more time in solving the matrix and find eigenvalues? If so, are there some better ways to distribute the matrix among processors? > > - Or are there any else reasons for this difference in cost time? > > > > Hope to recieve your reply, thank you! > > > > Runfeng Jin > > > > > > > > Runfeng Jin ?2022?6?11??? 20:33??? > > Hello! > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much time. Is there anything else I can do? Attachment is log when use PETSC_DEFAULT for eps_ncv. > > > > Thank you ! > > > > Runfeng Jin > > > > Jose E. Roman ?2022?6?10??? 20:50??? > > The value -eps_ncv 5000 is huge. > > Better let SLEPc use the default value. > > > > Jose > > > > > > > El 10 jun 2022, a las 14:24, Jin Runfeng escribi?: > > > > > > Hello! > > > I want to acquire the 3 smallest eigenvalue, and attachment is the log view output. I can see epssolve really cost the major time. But I can not see why it cost so much time. Can you see something from it? > > > > > > Thank you ! > > > > > > Runfeng Jin > > > > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: > > > Convergence depends on distribution of eigenvalues you want to compute. On the other hand, the cost also depends on the time it takes to build the preconditioner. Use -log_view to see the cost of the different steps of the computation. > > > > > > Jose > > > > > > > > > > El 3 jun 2022, a las 18:50, jsfaraway escribi?: > > > > > > > > hello! > > > > > > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. And I find a strang thing. There are two matrix A(900000*900000) and B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B use 22 iterations and 38885s! What could be the reason for this? Or what can I do to find the reason? > > > > > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > > > > And there is one difference I can tell is matrix B has many small value, whose absolute value is less than 10-6. Could this be the reason? > > > > > > > > Thank you! > > > > > > > > Runfeng Jin > > > > > > > > > From jsfaraway at gmail.com Wed Jun 15 03:20:45 2022 From: jsfaraway at gmail.com (Runfeng Jin) Date: Wed, 15 Jun 2022 16:20:45 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> Message-ID: Hi! I use the same machine, same nodes and same processors per nodes. And I test many times, so this seems not an accidental result. But your points do inspire me. 
I use the Global Arrays communicator when solving matrix A, and just MPI_COMM_WORLD for B. On every node, the Global Arrays communicator dedicates one processor to managing communication; maybe this is the reason for the difference in communication speed? I will have a try and respond as soon as I get the result! Runfeng Jin
Jose E. Roman wrote on Jun 15, 2022 at 16:09: > You are comparing two different codes on two different machines? Or is it > the same machine? with different number of processes and different solver > options... > > If it is the same machine, the performance seems very different: > > Matrix A: > Average time for MPI_Barrier(): 1.90986e-05 > Average time for zero size MPI_Send(): 3.44587e-06 > > Matrix B: > Average time for MPI_Barrier(): 0.0578456 > Average time for zero size MPI_Send(): 0.00358668 > > The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, > respectively. It's a two orders of magnitude difference. > > Jose > > > > El 15 jun 2022, a las 8:58, Runfeng Jin escribió: > > > > Sorry ,I miss the attachment. > > > > Runfeng Jin > > > > Runfeng Jin wrote on Jun 15, 2022 at 14:56: > > Hi! You are right! I try to use a SLEPc and PETSc version with nodebug, > and the matrix B's solver time become 99s. But It is still a little higher > than matrix A(8s). Same as mentioned before, attachment is log view of > no-debug version: > > file 1: log of matrix A solver. This is a larger > matrix(900,000*900,000) but solved quickly(8s); > > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547) > but solved much slower(99s). > > > > By comparing these two files, the strang phenomenon still exist: > > 1) Matrix A has more basis vectors(375) than B(189), but A spent less > time on BVCreate(0.6s) than B(32s); > > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) > > 3) In debug version, matrix B distribute much more unbalancedly storage > among processors(memory max/min 4365) than A(memory max/min 1.113), but > other metrics seems more balanced. And in no-debug version there is no > memory information output. > > > > The significant difference I can tell is :1) B use preallocation; 2) A's > matrix elements are calculated by CPU, while B's matrix elements are > calculated by GPU and then transfered to CPU and solved by PETSc in CPU. > > > > Does this is a normal result? I mean, the matrix with less non-zero > elements and less dimension can cost more epssolve time? Is this due to the > structure of matrix? IF so, is there any ways to increase the solve speed? > > > > Or this is weired and should be fixed by some ways? > > Thank you! > > > > Runfeng Jin > > > > > > Jose E. Roman wrote on Jun 12, 2022 at 16:08: > > Please always respond to the list. > > > > Pay attention to the warnings in the log: > > > > ########################################################## > > # # > > # WARNING!!! # > > # # > > # This code was compiled with a debugging option. # > > # To get timing results run ./configure # > > # using --with-debugging=no, the performance will # > > # be generally two or three times faster. # > > # # > > ########################################################## > > > > With the debugging option the times are not trustworthy, so I suggest > repeating the analysis with an optimized build. > > > > Jose > > > > > > > El 12 jun 2022, a las 5:41, Runfeng Jin > escribió: > > > > > > Hello! > > > I compare these two matrix solver's log view and find some strange > thing. Attachment files are the log view.: > > > file 1: log of matrix A solver.
This is a larger > matrix(900,000*900,000) but solved quickly(30s); > > > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547 > , a little different from the matrix B that is mentioned in initial email, > but solved much slower too. I use this for a quicker test) but solved much > slower(1244s). > > > > > > By comparing these two files, I find some thing: > > > 1) Matrix A has more basis vectors(375) than B(189), but A spent less > time on BVCreate(0.349s) than B(296s); > > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) > > > 3) Matrix B distribute much more unbalancedly storage among > processors(memory max/min 4365) than A(memory max/min 1.113), but other > metrics seems more balanced. > > > > > > I don't do prealocation in A, and it is distributed across processors > by PETSc. For B , when preallocation I use PetscSplitOwnership to decide > which part belongs to local processor, and B is also distributed by PETSc > when compute matrix values. > > > > > > - Does this mean, for matrix B, too much nonzero elements are stored > in single process, and this is why it cost too much more time in solving > the matrix and find eigenvalues? If so, are there some better ways to > distribute the matrix among processors? > > > - Or are there any else reasons for this difference in cost time? > > > > > > Hope to recieve your reply, thank you! > > > > > > Runfeng Jin > > > > > > > > > > > > Runfeng Jin ?2022?6?11??? 20:33??? > > > Hello! > > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much time. > Is there anything else I can do? Attachment is log when use PETSC_DEFAULT > for eps_ncv. > > > > > > Thank you ! > > > > > > Runfeng Jin > > > > > > Jose E. Roman ?2022?6?10??? 20:50??? > > > The value -eps_ncv 5000 is huge. > > > Better let SLEPc use the default value. > > > > > > Jose > > > > > > > > > > El 10 jun 2022, a las 14:24, Jin Runfeng > escribi?: > > > > > > > > Hello! > > > > I want to acquire the 3 smallest eigenvalue, and attachment is the > log view output. I can see epssolve really cost the major time. But I can > not see why it cost so much time. Can you see something from it? > > > > > > > > Thank you ! > > > > > > > > Runfeng Jin > > > > > > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: > > > > Convergence depends on distribution of eigenvalues you want to > compute. On the other hand, the cost also depends on the time it takes to > build the preconditioner. Use -log_view to see the cost of the different > steps of the computation. > > > > > > > > Jose > > > > > > > > > > > > > El 3 jun 2022, a las 18:50, jsfaraway > escribi?: > > > > > > > > > > hello! > > > > > > > > > > I am trying to use epsgd compute matrix's one smallest eigenvalue. > And I find a strang thing. There are two matrix A(900000*900000) and > B(90000*90000). While solve A use 371 iterations and only 30.83s, solve B > use 22 iterations and 38885s! What could be the reason for this? Or what > can I do to find the reason? > > > > > > > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". > > > > > And there is one difference I can tell is matrix B has many small > value, whose absolute value is less than 10-6. Could this be the reason? > > > > > > > > > > Thank you! > > > > > > > > > > Runfeng Jin > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Jun 15 06:22:30 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 Jun 2022 07:22:30 -0400 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> Message-ID: On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin wrote: > Hi! > I use the same machine, same nodes and same processors per nodes. And I > test many times, so this seems not an accidental result. But your points do > inspire me. I use Global Array's communicator when solving matrix A, ang > just MPI_COMM_WORLD in B. In every node, Global Array's communicator > make one processor dedicated to manage communicate, maybe this is the > reason for the difference in communicating speed? > > I will have a try and respond as soon as I get the result! > I would ask the sysadmin for that machine. That Barrier time is so high, I would think something is wrong with the switch. Or you are oversubscribing which is causing massive slowdown. Thanks, Matt > Runfeng Jin > > > Jose E. Roman ?2022?6?15??? 16:09??? > >> You are comparing two different codes on two different machines? Or is it >> the same machine? with different number of processes and different solver >> options... >> >> If it is the same machine, the performance seems very different: >> >> Matrix A: >> Average time for MPI_Barrier(): 1.90986e-05 >> Average time for zero size MPI_Send(): 3.44587e-06 >> >> Matrix B: >> Average time for MPI_Barrier(): 0.0578456 >> Average time for zero size MPI_Send(): 0.00358668 >> >> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, >> respectively. It's a two orders of magnitude difference. >> >> Jose >> >> >> > El 15 jun 2022, a las 8:58, Runfeng Jin escribi?: >> > >> > Sorry ,I miss the attachment. >> > >> > Runfeng Jin >> > >> > Runfeng Jin ?2022?6?15??? 14:56??? >> > Hi! You are right! I try to use a SLEPc and PETSc version with >> nodebug, and the matrix B's solver time become 99s. But It is still a >> little higher than matrix A(8s). Same as mentioned before, attachment is >> log view of no-debug version: >> > file 1: log of matrix A solver. This is a larger >> matrix(900,000*900,000) but solved quickly(8s); >> > file 2: log of matix B solver. This is a smaller matrix(2,547*2,547) >> but solved much slower(99s). >> > >> > By comparing these two files, the strang phenomenon still exist: >> > 1) Matrix A has more basis vectors(375) than B(189), but A spent less >> time on BVCreate(0.6s) than B(32s); >> > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) >> > 3) In debug version, matrix B distribute much more unbalancedly storage >> among processors(memory max/min 4365) than A(memory max/min 1.113), but >> other metrics seems more balanced. And in no-debug version there is no >> memory information output. >> > >> > The significant difference I can tell is :1) B use preallocation; 2) >> A's matrix elements are calculated by CPU, while B's matrix elements are >> calculated by GPU and then transfered to CPU and solved by PETSc in CPU. >> > >> > Does this is a normal result? I mean, the matrix with less non-zero >> elements and less dimension can cost more epssolve time? Is this due to the >> structure of matrix? IF so, is there any ways to increase the solve speed? >> > >> > Or this is weired and should be fixed by some ways? >> > Thank you! >> > >> > Runfeng Jin >> > >> > >> > Jose E. 
Roman ?2022?6?12??? 16:08??? >> > Please always respond to the list. >> > >> > Pay attention to the warnings in the log: >> > >> > ########################################################## >> > # # >> > # WARNING!!! # >> > # # >> > # This code was compiled with a debugging option. # >> > # To get timing results run ./configure # >> > # using --with-debugging=no, the performance will # >> > # be generally two or three times faster. # >> > # # >> > ########################################################## >> > >> > With the debugging option the times are not trustworthy, so I suggest >> repeating the analysis with an optimized build. >> > >> > Jose >> > >> > >> > > El 12 jun 2022, a las 5:41, Runfeng Jin >> escribi?: >> > > >> > > Hello! >> > > I compare these two matrix solver's log view and find some strange >> thing. Attachment files are the log view.: >> > > file 1: log of matrix A solver. This is a larger >> matrix(900,000*900,000) but solved quickly(30s); >> > > file 2: log of matix B solver. This is a smaller >> matrix(2,547*2,547 , a little different from the matrix B that is mentioned >> in initial email, but solved much slower too. I use this for a quicker >> test) but solved much slower(1244s). >> > > >> > > By comparing these two files, I find some thing: >> > > 1) Matrix A has more basis vectors(375) than B(189), but A spent less >> time on BVCreate(0.349s) than B(296s); >> > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) >> > > 3) Matrix B distribute much more unbalancedly storage among >> processors(memory max/min 4365) than A(memory max/min 1.113), but other >> metrics seems more balanced. >> > > >> > > I don't do prealocation in A, and it is distributed across processors >> by PETSc. For B , when preallocation I use PetscSplitOwnership to decide >> which part belongs to local processor, and B is also distributed by PETSc >> when compute matrix values. >> > > >> > > - Does this mean, for matrix B, too much nonzero elements are stored >> in single process, and this is why it cost too much more time in solving >> the matrix and find eigenvalues? If so, are there some better ways to >> distribute the matrix among processors? >> > > - Or are there any else reasons for this difference in cost time? >> > > >> > > Hope to recieve your reply, thank you! >> > > >> > > Runfeng Jin >> > > >> > > >> > > >> > > Runfeng Jin ?2022?6?11??? 20:33??? >> > > Hello! >> > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much >> time. Is there anything else I can do? Attachment is log when use >> PETSC_DEFAULT for eps_ncv. >> > > >> > > Thank you ! >> > > >> > > Runfeng Jin >> > > >> > > Jose E. Roman ?2022?6?10??? 20:50??? >> > > The value -eps_ncv 5000 is huge. >> > > Better let SLEPc use the default value. >> > > >> > > Jose >> > > >> > > >> > > > El 10 jun 2022, a las 14:24, Jin Runfeng >> escribi?: >> > > > >> > > > Hello! >> > > > I want to acquire the 3 smallest eigenvalue, and attachment is the >> log view output. I can see epssolve really cost the major time. But I can >> not see why it cost so much time. Can you see something from it? >> > > > >> > > > Thank you ! >> > > > >> > > > Runfeng Jin >> > > > >> > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman wrote: >> > > > Convergence depends on distribution of eigenvalues you want to >> compute. On the other hand, the cost also depends on the time it takes to >> build the preconditioner. Use -log_view to see the cost of the different >> steps of the computation. 
>> > > > >> > > > Jose >> > > > >> > > > >> > > > > El 3 jun 2022, a las 18:50, jsfaraway >> escribi?: >> > > > > >> > > > > hello! >> > > > > >> > > > > I am trying to use epsgd compute matrix's one smallest >> eigenvalue. And I find a strang thing. There are two matrix >> A(900000*900000) and B(90000*90000). While solve A use 371 iterations and >> only 30.83s, solve B use 22 iterations and 38885s! What could be the reason >> for this? Or what can I do to find the reason? >> > > > > >> > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". >> > > > > And there is one difference I can tell is matrix B has many small >> value, whose absolute value is less than 10-6. Could this be the reason? >> > > > > >> > > > > Thank you! >> > > > > >> > > > > Runfeng Jin >> > > > >> > > >> > > >> >> > >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsfaraway at gmail.com Wed Jun 15 20:31:26 2022 From: jsfaraway at gmail.com (Runfeng Jin) Date: Thu, 16 Jun 2022 09:31:26 +0800 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> Message-ID: Hi! Thank you for your reply. I am a little confused about the problem of machine. These two matrices solved in the same cluster, if there are some problems about the machine, why the low performance just happen to the matrix B? And, what is the situation of oversubscribing? Could you give some examples? Thank you! Runfeng Jin Matthew Knepley ?2022?6?15??? 19:22??? > On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin wrote: > >> Hi! >> I use the same machine, same nodes and same processors per nodes. And I >> test many times, so this seems not an accidental result. But your points do >> inspire me. I use Global Array's communicator when solving matrix A, ang >> just MPI_COMM_WORLD in B. In every node, Global Array's communicator >> make one processor dedicated to manage communicate, maybe this is the >> reason for the difference in communicating speed? >> >> I will have a try and respond as soon as I get the result! >> > > I would ask the sysadmin for that machine. That Barrier time is so high, I > would think something is wrong with the switch. Or you are > oversubscribing which is causing massive slowdown. > > Thanks, > > Matt > > >> Runfeng Jin >> >> >> Jose E. Roman ?2022?6?15??? 16:09??? >> >>> You are comparing two different codes on two different machines? Or is >>> it the same machine? with different number of processes and different >>> solver options... >>> >>> If it is the same machine, the performance seems very different: >>> >>> Matrix A: >>> Average time for MPI_Barrier(): 1.90986e-05 >>> Average time for zero size MPI_Send(): 3.44587e-06 >>> >>> Matrix B: >>> Average time for MPI_Barrier(): 0.0578456 >>> Average time for zero size MPI_Send(): 0.00358668 >>> >>> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, >>> respectively. It's a two orders of magnitude difference. >>> >>> Jose >>> >>> >>> > El 15 jun 2022, a las 8:58, Runfeng Jin >>> escribi?: >>> > >>> > Sorry ,I miss the attachment. >>> > >>> > Runfeng Jin >>> > >>> > Runfeng Jin ?2022?6?15??? 14:56??? >>> > Hi! 
You are right! I try to use a SLEPc and PETSc version with >>> nodebug, and the matrix B's solver time become 99s. But It is still a >>> little higher than matrix A(8s). Same as mentioned before, attachment is >>> log view of no-debug version: >>> > file 1: log of matrix A solver. This is a larger >>> matrix(900,000*900,000) but solved quickly(8s); >>> > file 2: log of matix B solver. This is a smaller >>> matrix(2,547*2,547) but solved much slower(99s). >>> > >>> > By comparing these two files, the strang phenomenon still exist: >>> > 1) Matrix A has more basis vectors(375) than B(189), but A spent less >>> time on BVCreate(0.6s) than B(32s); >>> > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) >>> > 3) In debug version, matrix B distribute much more unbalancedly >>> storage among processors(memory max/min 4365) than A(memory max/min 1.113), >>> but other metrics seems more balanced. And in no-debug version there is no >>> memory information output. >>> > >>> > The significant difference I can tell is :1) B use preallocation; 2) >>> A's matrix elements are calculated by CPU, while B's matrix elements are >>> calculated by GPU and then transfered to CPU and solved by PETSc in CPU. >>> > >>> > Does this is a normal result? I mean, the matrix with less non-zero >>> elements and less dimension can cost more epssolve time? Is this due to the >>> structure of matrix? IF so, is there any ways to increase the solve speed? >>> > >>> > Or this is weired and should be fixed by some ways? >>> > Thank you! >>> > >>> > Runfeng Jin >>> > >>> > >>> > Jose E. Roman ?2022?6?12??? 16:08??? >>> > Please always respond to the list. >>> > >>> > Pay attention to the warnings in the log: >>> > >>> > ########################################################## >>> > # # >>> > # WARNING!!! # >>> > # # >>> > # This code was compiled with a debugging option. # >>> > # To get timing results run ./configure # >>> > # using --with-debugging=no, the performance will # >>> > # be generally two or three times faster. # >>> > # # >>> > ########################################################## >>> > >>> > With the debugging option the times are not trustworthy, so I suggest >>> repeating the analysis with an optimized build. >>> > >>> > Jose >>> > >>> > >>> > > El 12 jun 2022, a las 5:41, Runfeng Jin >>> escribi?: >>> > > >>> > > Hello! >>> > > I compare these two matrix solver's log view and find some strange >>> thing. Attachment files are the log view.: >>> > > file 1: log of matrix A solver. This is a larger >>> matrix(900,000*900,000) but solved quickly(30s); >>> > > file 2: log of matix B solver. This is a smaller >>> matrix(2,547*2,547 , a little different from the matrix B that is mentioned >>> in initial email, but solved much slower too. I use this for a quicker >>> test) but solved much slower(1244s). >>> > > >>> > > By comparing these two files, I find some thing: >>> > > 1) Matrix A has more basis vectors(375) than B(189), but A spent >>> less time on BVCreate(0.349s) than B(296s); >>> > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) >>> > > 3) Matrix B distribute much more unbalancedly storage among >>> processors(memory max/min 4365) than A(memory max/min 1.113), but other >>> metrics seems more balanced. >>> > > >>> > > I don't do prealocation in A, and it is distributed across >>> processors by PETSc. For B , when preallocation I use PetscSplitOwnership >>> to decide which part belongs to local processor, and B is also distributed >>> by PETSc when compute matrix values. 
>>> > > >>> > > - Does this mean, for matrix B, too much nonzero elements are stored >>> in single process, and this is why it cost too much more time in solving >>> the matrix and find eigenvalues? If so, are there some better ways to >>> distribute the matrix among processors? >>> > > - Or are there any else reasons for this difference in cost time? >>> > > >>> > > Hope to recieve your reply, thank you! >>> > > >>> > > Runfeng Jin >>> > > >>> > > >>> > > >>> > > Runfeng Jin ?2022?6?11??? 20:33??? >>> > > Hello! >>> > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much >>> time. Is there anything else I can do? Attachment is log when use >>> PETSC_DEFAULT for eps_ncv. >>> > > >>> > > Thank you ! >>> > > >>> > > Runfeng Jin >>> > > >>> > > Jose E. Roman ?2022?6?10??? 20:50??? >>> > > The value -eps_ncv 5000 is huge. >>> > > Better let SLEPc use the default value. >>> > > >>> > > Jose >>> > > >>> > > >>> > > > El 10 jun 2022, a las 14:24, Jin Runfeng >>> escribi?: >>> > > > >>> > > > Hello! >>> > > > I want to acquire the 3 smallest eigenvalue, and attachment is >>> the log view output. I can see epssolve really cost the major time. But I >>> can not see why it cost so much time. Can you see something from it? >>> > > > >>> > > > Thank you ! >>> > > > >>> > > > Runfeng Jin >>> > > > >>> > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman >>> wrote: >>> > > > Convergence depends on distribution of eigenvalues you want to >>> compute. On the other hand, the cost also depends on the time it takes to >>> build the preconditioner. Use -log_view to see the cost of the different >>> steps of the computation. >>> > > > >>> > > > Jose >>> > > > >>> > > > >>> > > > > El 3 jun 2022, a las 18:50, jsfaraway >>> escribi?: >>> > > > > >>> > > > > hello! >>> > > > > >>> > > > > I am trying to use epsgd compute matrix's one smallest >>> eigenvalue. And I find a strang thing. There are two matrix >>> A(900000*900000) and B(90000*90000). While solve A use 371 iterations and >>> only 30.83s, solve B use 22 iterations and 38885s! What could be the reason >>> for this? Or what can I do to find the reason? >>> > > > > >>> > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real ". >>> > > > > And there is one difference I can tell is matrix B has many >>> small value, whose absolute value is less than 10-6. Could this be the >>> reason? >>> > > > > >>> > > > > Thank you! >>> > > > > >>> > > > > Runfeng Jin >>> > > > >>> > > >>> > > >>> >>> > >>> > >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 16 07:12:24 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Jun 2022 08:12:24 -0400 Subject: [petsc-users] SLEPc EPSGD: too much time in single iteration In-Reply-To: References: <0FE9EE1A-2CD4-44A2-ACD0-F6F4A466457A@dsic.upv.es> <873EFD58-3EB4-4FF5-97B7-E2AE8A60D17B@dsic.upv.es> <14570A42-226E-49BB-9074-850258860ACC@dsic.upv.es> Message-ID: On Wed, Jun 15, 2022 at 9:32 PM Runfeng Jin wrote: > Hi! Thank you for your reply. > > I am a little confused about the problem of machine. These two matrices > solved in the same cluster, if there are some problems about the machine, > why the low performance just happen to the matrix B? 
> The performance problem is not related to the matrix B. The MPI_Barrier time on the second run is 1,000x slower. We just run MPI_Barrier() at log output time to get this. It is not part of a solve. It could be that there is a part of the cluster that is broken and your second job got scheduled there. > And, what is the situation of oversubscribing? Could you give some > examples? > Some MPI implementations perform extremely poorly when the number of processes exceeds the number of cores. This is called oversubscription. Thanks, Matt > Thank you! > > Runfeng Jin > > Matthew Knepley ?2022?6?15??? 19:22??? > >> On Wed, Jun 15, 2022 at 4:21 AM Runfeng Jin wrote: >> >>> Hi! >>> I use the same machine, same nodes and same processors per nodes. And I >>> test many times, so this seems not an accidental result. But your points do >>> inspire me. I use Global Array's communicator when solving matrix A, ang >>> just MPI_COMM_WORLD in B. In every node, Global Array's communicator >>> make one processor dedicated to manage communicate, maybe this is the >>> reason for the difference in communicating speed? >>> >>> I will have a try and respond as soon as I get the result! >>> >> >> I would ask the sysadmin for that machine. That Barrier time is so high, >> I would think something is wrong with the switch. Or you are >> oversubscribing which is causing massive slowdown. >> >> Thanks, >> >> Matt >> >> >>> Runfeng Jin >>> >>> >>> Jose E. Roman ?2022?6?15??? 16:09??? >>> >>>> You are comparing two different codes on two different machines? Or is >>>> it the same machine? with different number of processes and different >>>> solver options... >>>> >>>> If it is the same machine, the performance seems very different: >>>> >>>> Matrix A: >>>> Average time for MPI_Barrier(): 1.90986e-05 >>>> Average time for zero size MPI_Send(): 3.44587e-06 >>>> >>>> Matrix B: >>>> Average time for MPI_Barrier(): 0.0578456 >>>> Average time for zero size MPI_Send(): 0.00358668 >>>> >>>> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01, >>>> respectively. It's a two orders of magnitude difference. >>>> >>>> Jose >>>> >>>> >>>> > El 15 jun 2022, a las 8:58, Runfeng Jin >>>> escribi?: >>>> > >>>> > Sorry ,I miss the attachment. >>>> > >>>> > Runfeng Jin >>>> > >>>> > Runfeng Jin ?2022?6?15??? 14:56??? >>>> > Hi! You are right! I try to use a SLEPc and PETSc version with >>>> nodebug, and the matrix B's solver time become 99s. But It is still a >>>> little higher than matrix A(8s). Same as mentioned before, attachment is >>>> log view of no-debug version: >>>> > file 1: log of matrix A solver. This is a larger >>>> matrix(900,000*900,000) but solved quickly(8s); >>>> > file 2: log of matix B solver. This is a smaller >>>> matrix(2,547*2,547) but solved much slower(99s). >>>> > >>>> > By comparing these two files, the strang phenomenon still exist: >>>> > 1) Matrix A has more basis vectors(375) than B(189), but A spent less >>>> time on BVCreate(0.6s) than B(32s); >>>> > 2) Matrix A spent less time on EPSSetup(0.015s) than B(0.9s) >>>> > 3) In debug version, matrix B distribute much more unbalancedly >>>> storage among processors(memory max/min 4365) than A(memory max/min 1.113), >>>> but other metrics seems more balanced. And in no-debug version there is no >>>> memory information output. 
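For what it is worth, the barrier number that -log_view reports can also be measured with a tiny standalone test, run with exactly the same mpiexec command and node placement as the slow job. This is an untested sketch and assumes PETSc 3.17 or later for PetscCall:

#include <petscsys.h>
#include <petsctime.h>

int main(int argc, char **argv)
{
  PetscLogDouble t0, t1;
  PetscInt       i, n = 100;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Barrier(PETSC_COMM_WORLD)); /* warm up */
  PetscCall(PetscTime(&t0));
  for (i = 0; i < n; i++) PetscCallMPI(MPI_Barrier(PETSC_COMM_WORLD));
  PetscCall(PetscTime(&t1));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Average time for MPI_Barrier(): %g\n", (double)((t1 - t0) / n)));
  PetscCall(PetscFinalize());
  return 0;
}

A result near the 1.9e-05 s seen for the matrix A run suggests a healthy interconnect; something near the 0.058 s seen for the matrix B run points at the machine or at oversubscription rather than at the matrix itself.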
>>>> > >>>> > The significant difference I can tell is :1) B use preallocation; 2) >>>> A's matrix elements are calculated by CPU, while B's matrix elements are >>>> calculated by GPU and then transfered to CPU and solved by PETSc in CPU. >>>> > >>>> > Does this is a normal result? I mean, the matrix with less non-zero >>>> elements and less dimension can cost more epssolve time? Is this due to the >>>> structure of matrix? IF so, is there any ways to increase the solve speed? >>>> > >>>> > Or this is weired and should be fixed by some ways? >>>> > Thank you! >>>> > >>>> > Runfeng Jin >>>> > >>>> > >>>> > Jose E. Roman ?2022?6?12??? 16:08??? >>>> > Please always respond to the list. >>>> > >>>> > Pay attention to the warnings in the log: >>>> > >>>> > ########################################################## >>>> > # # >>>> > # WARNING!!! # >>>> > # # >>>> > # This code was compiled with a debugging option. # >>>> > # To get timing results run ./configure # >>>> > # using --with-debugging=no, the performance will # >>>> > # be generally two or three times faster. # >>>> > # # >>>> > ########################################################## >>>> > >>>> > With the debugging option the times are not trustworthy, so I suggest >>>> repeating the analysis with an optimized build. >>>> > >>>> > Jose >>>> > >>>> > >>>> > > El 12 jun 2022, a las 5:41, Runfeng Jin >>>> escribi?: >>>> > > >>>> > > Hello! >>>> > > I compare these two matrix solver's log view and find some strange >>>> thing. Attachment files are the log view.: >>>> > > file 1: log of matrix A solver. This is a larger >>>> matrix(900,000*900,000) but solved quickly(30s); >>>> > > file 2: log of matix B solver. This is a smaller >>>> matrix(2,547*2,547 , a little different from the matrix B that is mentioned >>>> in initial email, but solved much slower too. I use this for a quicker >>>> test) but solved much slower(1244s). >>>> > > >>>> > > By comparing these two files, I find some thing: >>>> > > 1) Matrix A has more basis vectors(375) than B(189), but A spent >>>> less time on BVCreate(0.349s) than B(296s); >>>> > > 2) Matrix A spent less time on EPSSetup(0.031s) than B(10.709s) >>>> > > 3) Matrix B distribute much more unbalancedly storage among >>>> processors(memory max/min 4365) than A(memory max/min 1.113), but other >>>> metrics seems more balanced. >>>> > > >>>> > > I don't do prealocation in A, and it is distributed across >>>> processors by PETSc. For B , when preallocation I use PetscSplitOwnership >>>> to decide which part belongs to local processor, and B is also distributed >>>> by PETSc when compute matrix values. >>>> > > >>>> > > - Does this mean, for matrix B, too much nonzero elements are >>>> stored in single process, and this is why it cost too much more time in >>>> solving the matrix and find eigenvalues? If so, are there some better ways >>>> to distribute the matrix among processors? >>>> > > - Or are there any else reasons for this difference in cost time? >>>> > > >>>> > > Hope to recieve your reply, thank you! >>>> > > >>>> > > Runfeng Jin >>>> > > >>>> > > >>>> > > >>>> > > Runfeng Jin ?2022?6?11??? 20:33??? >>>> > > Hello! >>>> > > I have try ues PETSC_DEFAULT for eps_ncv, but it still cost much >>>> time. Is there anything else I can do? Attachment is log when use >>>> PETSC_DEFAULT for eps_ncv. >>>> > > >>>> > > Thank you ! >>>> > > >>>> > > Runfeng Jin >>>> > > >>>> > > Jose E. Roman ?2022?6?10??? 20:50??? >>>> > > The value -eps_ncv 5000 is huge. >>>> > > Better let SLEPc use the default value. 
>>>> > > >>>> > > Jose >>>> > > >>>> > > >>>> > > > El 10 jun 2022, a las 14:24, Jin Runfeng >>>> escribi?: >>>> > > > >>>> > > > Hello! >>>> > > > I want to acquire the 3 smallest eigenvalue, and attachment is >>>> the log view output. I can see epssolve really cost the major time. But I >>>> can not see why it cost so much time. Can you see something from it? >>>> > > > >>>> > > > Thank you ! >>>> > > > >>>> > > > Runfeng Jin >>>> > > > >>>> > > > On 6? 4 2022, at 1:37 ??, Jose E. Roman >>>> wrote: >>>> > > > Convergence depends on distribution of eigenvalues you want to >>>> compute. On the other hand, the cost also depends on the time it takes to >>>> build the preconditioner. Use -log_view to see the cost of the different >>>> steps of the computation. >>>> > > > >>>> > > > Jose >>>> > > > >>>> > > > >>>> > > > > El 3 jun 2022, a las 18:50, jsfaraway >>>> escribi?: >>>> > > > > >>>> > > > > hello! >>>> > > > > >>>> > > > > I am trying to use epsgd compute matrix's one smallest >>>> eigenvalue. And I find a strang thing. There are two matrix >>>> A(900000*900000) and B(90000*90000). While solve A use 371 iterations and >>>> only 30.83s, solve B use 22 iterations and 38885s! What could be the reason >>>> for this? Or what can I do to find the reason? >>>> > > > > >>>> > > > > I use" -eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real >>>> ". >>>> > > > > And there is one difference I can tell is matrix B has many >>>> small value, whose absolute value is less than 10-6. Could this be the >>>> reason? >>>> > > > > >>>> > > > > Thank you! >>>> > > > > >>>> > > > > Runfeng Jin >>>> > > > >>>> > > >>>> > > >>>> >>>> > >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Thu Jun 16 10:11:32 2022 From: yangzongze at gmail.com (Zongze Yang) Date: Thu, 16 Jun 2022 23:11:32 +0800 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? Message-ID: Hi, if I load a `gmsh` file with second-order elements, the coordinates will be stored in a DG-P2 space. After obtaining the coordinates of a cell, how can I map the coordinates to vertex and edge? Below is some code load the gmsh file, I want to know the relation between `cl` and `cell_coords`. ``` import firedrake as fd import numpy as np # Load gmsh file (2rd) plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') cs, ce = plex.getHeightStratum(0) cdm = plex.getCoordinateDM() csec = dm.getCoordinateSection() coords_gvec = dm.getCoordinates() for i in range(cs, ce): cell_coords = cdm.getVecClosure(csec, coords_gvec, i) print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') cl = dm.getTransitiveClosure(i) print('closure:', cl) break ``` Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test-fd-load-p2-rect.msh Type: application/octet-stream Size: 189254 bytes Desc: not available URL: From knepley at gmail.com Thu Jun 16 10:22:08 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Jun 2022 11:22:08 -0400 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: Message-ID: On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang wrote: > Hi, if I load a `gmsh` file with second-order elements, the coordinates > will be stored in a DG-P2 space. After obtaining the coordinates of a cell, > how can I map the coordinates to vertex and edge? > By default, they are stored as P2, not DG. You can ask for the coordinates of a vertex or an edge directly using https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ by giving the vertex or edge point. You can get all the coordinates on a cell, in the closure order, using https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ Thanks, Matt > Below is some code load the gmsh file, I want to know the relation between > `cl` and `cell_coords`. > > ``` > import firedrake as fd > import numpy as np > > # Load gmsh file (2rd) > plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') > > cs, ce = plex.getHeightStratum(0) > > cdm = plex.getCoordinateDM() > csec = dm.getCoordinateSection() > coords_gvec = dm.getCoordinates() > > for i in range(cs, ce): > cell_coords = cdm.getVecClosure(csec, coords_gvec, i) > print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') > cl = dm.getTransitiveClosure(i) > print('closure:', cl) > break > ``` > > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Thu Jun 16 11:06:22 2022 From: yangzongze at gmail.com (Zongze Yang) Date: Fri, 17 Jun 2022 00:06:22 +0800 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: Message-ID: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> > ? 2022?6?16??23:22?Matthew Knepley ??? > > ? >> On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang wrote: > >> Hi, if I load a `gmsh` file with second-order elements, the coordinates will be stored in a DG-P2 space. After obtaining the coordinates of a cell, how can I map the coordinates to vertex and edge? > > By default, they are stored as P2, not DG. I checked the coordinates vector, and found the dogs only defined on cell other than vertex and edge, so I said they are stored as DG. Then the function DMPlexVecGetClosure seems return the coordinates in lex order. Some code in reading gmsh file reads that 1756: if (isSimplex) continuity = PETSC_FALSE; /* XXX FIXME Requires DMPlexSetClosurePermutationLexicographic() */ 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, dim, coordDim, order, &fe) The continuity is set to false for simplex. Thanks, Zongze > > You can ask for the coordinates of a vertex or an edge directly using > > https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ > > by giving the vertex or edge point. 
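A minimal untested sketch of that call in C is below; the helper name PrintPointCoords is made up here, dm is the DMPlex, p a vertex or edge point, and it assumes the coordinates are stored as a continuous P2 field so that vertices and edges actually carry coordinate dofs:

```
#include <petscdmplex.h>

/* Untested sketch: read the coordinates attached to a single vertex (or
   edge) point p of a DMPlex dm via DMPlexPointLocalRead. */
static PetscErrorCode PrintPointCoords(DM dm, PetscInt p)
{
  DM                 cdm;
  Vec                coords;
  const PetscScalar *a, *xp;
  PetscInt           dim, d;

  PetscFunctionBeginUser;
  PetscCall(DMGetCoordinateDim(dm, &dim));
  PetscCall(DMGetCoordinateDM(dm, &cdm));        /* its local section is the coordinate section */
  PetscCall(DMGetCoordinatesLocal(dm, &coords));
  PetscCall(VecGetArrayRead(coords, &a));
  PetscCall(DMPlexPointLocalRead(cdm, p, a, &xp));
  for (d = 0; d < dim; d++) PetscCall(PetscPrintf(PETSC_COMM_SELF, " %g", (double)PetscRealPart(xp[d])));
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "\n"));
  PetscCall(VecRestoreArrayRead(coords, &a));
  PetscFunctionReturn(0);
}
```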
You can get all the coordinates on a cell, in the closure order, using > > https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ > Thanks, > > Matt > >> Below is some code load the gmsh file, I want to know the relation between `cl` and `cell_coords`. >> >> ``` >> import firedrake as fd >> import numpy as np >> >> # Load gmsh file (2rd) >> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >> >> cs, ce = plex.getHeightStratum(0) >> >> cdm = plex.getCoordinateDM() >> csec = dm.getCoordinateSection() >> coords_gvec = dm.getCoordinates() >> >> for i in range(cs, ce): >> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') >> cl = dm.getTransitiveClosure(i) >> print('closure:', cl) >> break >> ``` >> >> Best wishes, >> Zongze > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 16 12:11:26 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Jun 2022 13:11:26 -0400 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> References: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> Message-ID: On Thu, Jun 16, 2022 at 12:06 PM Zongze Yang wrote: > > > ? 2022?6?16??23:22?Matthew Knepley ??? > > ? > On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang wrote: > >> Hi, if I load a `gmsh` file with second-order elements, the coordinates >> will be stored in a DG-P2 space. After obtaining the coordinates of a cell, >> how can I map the coordinates to vertex and edge? >> > > By default, they are stored as P2, not DG. > > > I checked the coordinates vector, and found the dogs only defined on cell > other than vertex and edge, so I said they are stored as DG. > Then the function DMPlexVecGetClosure > seems return > the coordinates in lex order. > > Some code in reading gmsh file reads that > > > 1756: if (isSimplex) continuity = PETSC_FALSE > ; /* XXX FIXME > Requires DMPlexSetClosurePermutationLexicographic() */ > > > 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, dim, > coordDim, order, &fe) > > > The continuity is set to false for simplex. > Oh, yes. That needs to be fixed. For now, you can just project it to P2 if you want using https://petsc.org/main/docs/manualpages/DM/DMProjectCoordinates/ Thanks, Matt > Thanks, > Zongze > > You can ask for the coordinates of a vertex or an edge directly using > > https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ > > by giving the vertex or edge point. You can get all the coordinates on a > cell, in the closure order, using > > https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ > > Thanks, > > Matt > > >> Below is some code load the gmsh file, I want to know the relation >> between `cl` and `cell_coords`. 
>> >> ``` >> import firedrake as fd >> import numpy as np >> >> # Load gmsh file (2rd) >> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >> >> cs, ce = plex.getHeightStratum(0) >> >> cdm = plex.getCoordinateDM() >> csec = dm.getCoordinateSection() >> coords_gvec = dm.getCoordinates() >> >> for i in range(cs, ce): >> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') >> cl = dm.getTransitiveClosure(i) >> print('closure:', cl) >> break >> ``` >> >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tt73 at njit.edu Thu Jun 16 16:57:23 2022 From: tt73 at njit.edu (tt73) Date: Thu, 16 Jun 2022 17:57:23 -0400 Subject: [petsc-users] Customizing NASM subsnes Message-ID: <62aba746.1c69fb81.7df46.678d@mx.google.com> Hi,?I am using? NASM as the outer solver for a nonlinear problem. For one of the subdomains, I want to run the local solve with a different set of options form the others. Is there any way to set options for each subdomain?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jun 16 17:23:42 2022 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Jun 2022 18:23:42 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: <62aba746.1c69fb81.7df46.678d@mx.google.com> References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: On Thu, Jun 16, 2022 at 5:57 PM tt73 wrote: > > Hi, > > I am using NASM as the outer solver for a nonlinear problem. For one of > the subdomains, I want to run the local solve with a different set of > options form the others. Is there any way to set options for each > subdomain? > I can see two ways: 1) Pull out the subsolver and set it using the API 2) Pull out the subsolver and give it a different prefix Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Fri Jun 17 01:54:03 2022 From: yangzongze at gmail.com (Zongze Yang) Date: Fri, 17 Jun 2022 14:54:03 +0800 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> Message-ID: I tried the projection operation. However, it seems that the projection gives the wrong solution. After projection, the bounding box is changed! See logs below. 
First, I patch the petsc4py by adding `DMProjectCoordinates`: ``` diff --git a/src/binding/petsc4py/src/PETSc/DM.pyx b/src/binding/petsc4py/src/PETSc/DM.pyx index d8a58d183a..dbcdb280f1 100644 --- a/src/binding/petsc4py/src/PETSc/DM.pyx +++ b/src/binding/petsc4py/src/PETSc/DM.pyx @@ -307,6 +307,12 @@ cdef class DM(Object): PetscINCREF(c.obj) return c + def projectCoordinates(self, FE fe=None): + if fe is None: + CHKERR( DMProjectCoordinates(self.dm, NULL) ) + else: + CHKERR( DMProjectCoordinates(self.dm, fe.fe) ) + def getBoundingBox(self): cdef PetscInt i,dim=0 CHKERR( DMGetCoordinateDim(self.dm, &dim) ) diff --git a/src/binding/petsc4py/src/PETSc/petscdm.pxi b/src/binding/petsc4py/src/PETSc/petscdm.pxi index 514b6fa472..c778e39884 100644 --- a/src/binding/petsc4py/src/PETSc/petscdm.pxi +++ b/src/binding/petsc4py/src/PETSc/petscdm.pxi @@ -90,6 +90,7 @@ cdef extern from * nogil: int DMGetCoordinateDim(PetscDM,PetscInt*) int DMSetCoordinateDim(PetscDM,PetscInt) int DMLocalizeCoordinates(PetscDM) + int DMProjectCoordinates(PetscDM, PetscFE) int DMCreateInterpolation(PetscDM,PetscDM,PetscMat*,PetscVec*) int DMCreateInjection(PetscDM,PetscDM,PetscMat*) ``` Then in python, I load a mesh and project the coordinates to P2: ``` import firedrake as fd from firedrake.petsc import PETSc # plex = fd.mesh._from_gmsh('test-fd-load-p2.msh') plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') print('old bbox:', plex.getBoundingBox()) dim = plex.getDimension() # (dim, nc, isSimplex, k, qorder, comm=None) fe_new = PETSc.FE().createLagrange(dim, dim, True, 2, PETSc.DETERMINE) plex.projectCoordinates(fe_new) fe_new.view() print('new bbox:', plex.getBoundingBox()) ``` The output is (The bounding box is changed!) ``` old bbox: ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)) PetscFE Object: P2 1 MPI processes type: basic Basic Finite Element in 3 dimensions with 3 components PetscSpace Object: P2 1 MPI processes type: sum Space in 3 variables with 3 components, size 30 Sum space of 3 concatenated subspaces (all identical) PetscSpace Object: sum component (sumcomp_) 1 MPI processes type: poly Space in 3 variables with 1 components, size 10 Polynomial space of degree 2 PetscDualSpace Object: P2 1 MPI processes type: lagrange Dual space with 3 components, size 30 Continuous Lagrange dual space Quadrature of order 5 on 27 points (dim 3) new bbox: ((-6.530133708576188e-17, 36.30670832662781), (-3.899962995254311e-17, 36.2406171632539), (-8.8036464152166e-17, 36.111577025012224)) ``` By the way, for the original DG coordinates, where can I find the relation of the closure and the order of the dofs for the cell? Thanks! Zongze Matthew Knepley ?2022?6?17??? 01:11??? > On Thu, Jun 16, 2022 at 12:06 PM Zongze Yang wrote: > >> >> >> ? 2022?6?16??23:22?Matthew Knepley ??? >> >> ? >> On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang >> wrote: >> >>> Hi, if I load a `gmsh` file with second-order elements, the coordinates >>> will be stored in a DG-P2 space. After obtaining the coordinates of a cell, >>> how can I map the coordinates to vertex and edge? >>> >> >> By default, they are stored as P2, not DG. >> >> >> I checked the coordinates vector, and found the dogs only defined on cell >> other than vertex and edge, so I said they are stored as DG. >> Then the function DMPlexVecGetClosure >> seems return >> the coordinates in lex order. 
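One untested way to see where the coordinate dofs actually live is to query the coordinate PetscSection directly (in C; dm is assumed to be the loaded DMPlex):

```
/* Untested sketch: print how many coordinate dofs sit on each mesh point,
   by depth. With continuous P2 coordinates, vertices and edges carry dofs;
   with the DG layout described above, only the cells do. */
PetscSection csec;
PetscInt     depth, d, pStart, pEnd, p, dof;

PetscCall(DMGetCoordinateSection(dm, &csec));
PetscCall(DMPlexGetDepth(dm, &depth));
for (d = 0; d <= depth; d++) {
  PetscCall(DMPlexGetDepthStratum(dm, d, &pStart, &pEnd));
  for (p = pStart; p < pEnd; p++) {
    PetscCall(PetscSectionGetDof(csec, p, &dof));
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "depth %d point %d: %d coordinate dofs\n", (int)d, (int)p, (int)dof));
  }
}
```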
>> >> Some code in reading gmsh file reads that >> >> >> 1756: if (isSimplex) continuity = PETSC_FALSE >> ; /* XXX FIXME >> Requires DMPlexSetClosurePermutationLexicographic() */ >> >> >> 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, dim, >> coordDim, order, &fe) >> >> >> The continuity is set to false for simplex. >> > > Oh, yes. That needs to be fixed. For now, you can just project it to P2 if > you want using > > https://petsc.org/main/docs/manualpages/DM/DMProjectCoordinates/ > > Thanks, > > Matt > > >> Thanks, >> Zongze >> >> You can ask for the coordinates of a vertex or an edge directly using >> >> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ >> >> by giving the vertex or edge point. You can get all the coordinates on a >> cell, in the closure order, using >> >> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ >> >> Thanks, >> >> Matt >> >> >>> Below is some code load the gmsh file, I want to know the relation >>> between `cl` and `cell_coords`. >>> >>> ``` >>> import firedrake as fd >>> import numpy as np >>> >>> # Load gmsh file (2rd) >>> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >>> >>> cs, ce = plex.getHeightStratum(0) >>> >>> cdm = plex.getCoordinateDM() >>> csec = dm.getCoordinateSection() >>> coords_gvec = dm.getCoordinates() >>> >>> for i in range(cs, ce): >>> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >>> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') >>> cl = dm.getTransitiveClosure(i) >>> print('closure:', cl) >>> break >>> ``` >>> >>> Best wishes, >>> Zongze >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tt73 at njit.edu Fri Jun 17 08:22:37 2022 From: tt73 at njit.edu (Takahashi, Tadanaga) Date: Fri, 17 Jun 2022 09:22:37 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: I'm having some trouble pulling out the subsolver. I tried to use SNESNASMGetSNES in a loop over each subdomain. However I get an error when I run the code with more than one MPI processors. Here is a snippet from my code: SNES snes, subsnes; PetscMPIInt rank, size; ... ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); ... ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); ierr = SNESSetUp(snes); CHKERRQ(ierr); PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); for (i=0; i wrote: > On Thu, Jun 16, 2022 at 5:57 PM tt73 wrote: > >> >> Hi, >> >> I am using NASM as the outer solver for a nonlinear problem. For one of >> the subdomains, I want to run the local solve with a different set of >> options form the others. Is there any way to set options for each >> subdomain? 
>> > > I can see two ways: > > 1) Pull out the subsolver and set it using the API > > 2) Pull out the subsolver and give it a different prefix > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 17 08:35:09 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Jun 2022 09:35:09 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: On Fri, Jun 17, 2022 at 9:22 AM Takahashi, Tadanaga wrote: > I'm having some trouble pulling out the subsolver. I tried to use > SNESNASMGetSNES in a loop over each subdomain. However I get an error when > I run the code with more than one MPI processors. Here is a snippet from my > code: > > SNES snes, subsnes; > PetscMPIInt rank, size; > ... > ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); > ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); > ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); > ... > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ierr = SNESSetUp(snes); CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); > for (i=0; i PetscPrintf(PETSC_COMM_WORLD, "rank = %d\n",i); > SNESNASMGetSNES(snes,i,&subsnes); > // char prefix[10]; > // sprintf(prefix,"sub_%d_",i); > // SNESSetOptionsPrefix(subsnes,prefix); > } > ... > ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); > > > And, here is the output of the code when I run with 2 MPI procs: > SNESNASMGetSNES() gets the local subsolvers. It seems you only have one per process. You can check https://petsc.org/main/docs/manualpages/SNES/SNESNASMGetNumber/ Notice that your current code will not work because, according to your explanation, you only want to change the prefix on a single rank, so you need to check the rank when you do it. Thanks, Matt > takahashi at ubuntu:~/Desktop/MA-DDM/C/Rectangle$ mpiexec -n 2 ./test1 > Size = 2 > rank = 0 > rank = 1 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: No such subsolver > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.17.1, unknown > [0]PETSC ERROR: ./test1 on a linux-gnu-c-debug named ubuntu by takahashi > Fri Jun 17 06:06:38 2022 > [0]PETSC ERROR: Configure options --with-mpi-dir=/usr --with-fc=0 > [0]PETSC ERROR: #1 SNESNASMGetSNES() at > /home/takahashi/Desktop/petsc/src/snes/impls/nasm/nasm.c:923 > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = RANK 0 PID 976566 RUNNING AT ubuntu > = KILLED BY SIGNAL: 9 (Killed) > > =================================================================================== > > This error doesn't occur when I run this without MPI. However, I tried to > change the prefix of the subdomain to `sub_0_` but I am not able to change > the snes_type using this prefix. Running ./test1 -snes_view -help | grep > sub_0_snes_type prints nothing. 
> > On Thu, Jun 16, 2022 at 6:23 PM Matthew Knepley wrote: > >> On Thu, Jun 16, 2022 at 5:57 PM tt73 wrote: >> >>> >>> Hi, >>> >>> I am using NASM as the outer solver for a nonlinear problem. For one of >>> the subdomains, I want to run the local solve with a different set of >>> options form the others. Is there any way to set options for each >>> subdomain? >>> >> >> I can see two ways: >> >> 1) Pull out the subsolver and set it using the API >> >> 2) Pull out the subsolver and give it a different prefix >> >> Thanks, >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Jun 17 09:00:53 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Jun 2022 10:00:53 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: MPI_Comm_size(PETSC_COMM_WORLD,&size); MPI_Comm_rank(PETSC_COMM_WORLD,&rank); > SNESNASMGetSNES(snes,0,&subsnes); > char prefix[10]; > sprintf(prefix,"sub_%d_",rank); > SNESSetOptionsPrefix(subsnes,prefix); > On Jun 17, 2022, at 9:35 AM, Matthew Knepley wrote: > > On Fri, Jun 17, 2022 at 9:22 AM Takahashi, Tadanaga > wrote: > I'm having some trouble pulling out the subsolver. I tried to use SNESNASMGetSNES in a loop over each subdomain. However I get an error when I run the code with more than one MPI processors. Here is a snippet from my code: > > SNES snes, subsnes; > PetscMPIInt rank, size; > ... > ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); > ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); > ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); > ... > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ierr = SNESSetUp(snes); CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); > for (i=0; i PetscPrintf(PETSC_COMM_WORLD, "rank = %d\n",i); > SNESNASMGetSNES(snes,i,&subsnes); > // char prefix[10]; > // sprintf(prefix,"sub_%d_",i); > // SNESSetOptionsPrefix(subsnes,prefix); > } > ... > ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); > > > And, here is the output of the code when I run with 2 MPI procs: > > SNESNASMGetSNES() gets the local subsolvers. It seems you only have one per process. > You can check https://petsc.org/main/docs/manualpages/SNES/SNESNASMGetNumber/ > > Notice that your current code will not work because, according to your explanation, you only want to change > the prefix on a single rank, so you need to check the rank when you do it. > > Thanks, > > Matt > > takahashi at ubuntu:~/Desktop/MA-DDM/C/Rectangle$ mpiexec -n 2 ./test1 > Size = 2 > rank = 0 > rank = 1 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: No such subsolver > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.17.1, unknown > [0]PETSC ERROR: ./test1 on a linux-gnu-c-debug named ubuntu by takahashi Fri Jun 17 06:06:38 2022 > [0]PETSC ERROR: Configure options --with-mpi-dir=/usr --with-fc=0 > [0]PETSC ERROR: #1 SNESNASMGetSNES() at /home/takahashi/Desktop/petsc/src/snes/impls/nasm/nasm.c:923 > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = RANK 0 PID 976566 RUNNING AT ubuntu > = KILLED BY SIGNAL: 9 (Killed) > =================================================================================== > > This error doesn't occur when I run this without MPI. However, I tried to change the prefix of the subdomain to `sub_0_` but I am not able to change the snes_type using this prefix. Running ./test1 -snes_view -help | grep sub_0_snes_type prints nothing. > > On Thu, Jun 16, 2022 at 6:23 PM Matthew Knepley > wrote: > On Thu, Jun 16, 2022 at 5:57 PM tt73 > wrote: > > Hi, > > I am using NASM as the outer solver for a nonlinear problem. For one of the subdomains, I want to run the local solve with a different set of options form the others. Is there any way to set options for each subdomain? > > I can see two ways: > > 1) Pull out the subsolver and set it using the API > > 2) Pull out the subsolver and give it a different prefix > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tt73 at njit.edu Fri Jun 17 09:47:25 2022 From: tt73 at njit.edu (Takahashi, Tadanaga) Date: Fri, 17 Jun 2022 10:47:25 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: Thank you. I am now able to pull each subsnes, change its snes type through the API, and set a prefix. This is my updated code: SNES snes, subsnes; PetscMPIInt rank, size; ... ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); ... ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); ierr = SNESSetUp(snes); CHKERRQ(ierr); PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); PetscBarrier(NULL); for (i=0; i wrote: > > MPI_Comm_size(PETSC_COMM_WORLD,&size); > MPI_Comm_rank(PETSC_COMM_WORLD,&rank); > > SNESNASMGetSNES(snes,0,&subsnes); >> char prefix[10]; >> sprintf(prefix,"sub_%d_",rank); >> SNESSetOptionsPrefix(subsnes,prefix); >> > > > > On Jun 17, 2022, at 9:35 AM, Matthew Knepley wrote: > > On Fri, Jun 17, 2022 at 9:22 AM Takahashi, Tadanaga wrote: > >> I'm having some trouble pulling out the subsolver. I tried to use >> SNESNASMGetSNES in a loop over each subdomain. However I get an error when >> I run the code with more than one MPI processors. Here is a snippet from my >> code: >> >> SNES snes, subsnes; >> PetscMPIInt rank, size; >> ... >> ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); >> ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); >> ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); >> ... 
>> ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); >> ierr = SNESSetUp(snes); CHKERRQ(ierr); >> PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); >> for (i=0; i> PetscPrintf(PETSC_COMM_WORLD, "rank = %d\n",i); >> SNESNASMGetSNES(snes,i,&subsnes); >> // char prefix[10]; >> // sprintf(prefix,"sub_%d_",i); >> // SNESSetOptionsPrefix(subsnes,prefix); >> } >> ... >> ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); >> >> >> And, here is the output of the code when I run with 2 MPI procs: >> > > SNESNASMGetSNES() gets the local subsolvers. It seems you only have one > per process. > You can check > https://petsc.org/main/docs/manualpages/SNES/SNESNASMGetNumber/ > > Notice that your current code will not work because, according to your > explanation, you only want to change > the prefix on a single rank, so you need to check the rank when you do it. > > Thanks, > > Matt > > >> takahashi at ubuntu:~/Desktop/MA-DDM/C/Rectangle$ mpiexec -n 2 ./test1 >> Size = 2 >> rank = 0 >> rank = 1 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: No such subsolver >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.17.1, unknown >> [0]PETSC ERROR: ./test1 on a linux-gnu-c-debug named ubuntu by takahashi >> Fri Jun 17 06:06:38 2022 >> [0]PETSC ERROR: Configure options --with-mpi-dir=/usr --with-fc=0 >> [0]PETSC ERROR: #1 SNESNASMGetSNES() at >> /home/takahashi/Desktop/petsc/src/snes/impls/nasm/nasm.c:923 >> >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = RANK 0 PID 976566 RUNNING AT ubuntu >> = KILLED BY SIGNAL: 9 (Killed) >> >> =================================================================================== >> >> This error doesn't occur when I run this without MPI. However, I tried to >> change the prefix of the subdomain to `sub_0_` but I am not able to change >> the snes_type using this prefix. Running ./test1 -snes_view -help | grep >> sub_0_snes_type prints nothing. >> >> On Thu, Jun 16, 2022 at 6:23 PM Matthew Knepley >> wrote: >> >>> On Thu, Jun 16, 2022 at 5:57 PM tt73 wrote: >>> >>>> >>>> Hi, >>>> >>>> I am using NASM as the outer solver for a nonlinear problem. For one >>>> of the subdomains, I want to run the local solve with a different set of >>>> options form the others. Is there any way to set options for each >>>> subdomain? >>>> >>> >>> I can see two ways: >>> >>> 1) Pull out the subsolver and set it using the API >>> >>> 2) Pull out the subsolver and give it a different prefix >>> >>> Thanks, >>> >>> Matt >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
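Putting the pieces above together, a per-rank customization of the local NASM subsolver might look like the following untested sketch. It assumes one subdomain per rank, that snes is the NASM solver already set up as in the snippets above, and that PetscCall from PETSc 3.17 is available; whether calling SNESSetFromOptions on the subsolver is what makes the prefixed options take effect is an assumption, not something verified here.

SNES        subsnes;
PetscMPIInt rank, size;
char        prefix[32];

PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &size));
PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
PetscCall(SNESNASMGetSNES(snes, 0, &subsnes));                      /* the one local subdomain */
PetscCall(PetscSNPrintf(prefix, sizeof(prefix), "sub_%d_", rank));
PetscCall(SNESSetOptionsPrefix(subsnes, prefix));
if (rank < size - 1) PetscCall(SNESSetType(subsnes, SNESNEWTONLS)); /* regular subdomains */
else                 PetscCall(SNESSetType(subsnes, SNESFAS));      /* last subdomain */
PetscCall(SNESSetFromOptions(subsnes)); /* assumption: lets -sub_<rank>_... options be read */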
URL: From bsmith at petsc.dev Fri Jun 17 10:02:05 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Jun 2022 11:02:05 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: You do not need the loop over size. Each rank sets options and options prefix for its local objects and never anyone elses. > char prefix[10]; > sprintf(prefix,"sub_%d_",rank); > SNESNASMGetSNES(snes,0,&subsnes); > if (rank SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton for regular domains > } else { > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } To get the prefix to work try calling SNESSetFromOptions(subsnes); immediately after your SNESSetOptionsPrefix(subsnes,prefix); call Matt, it looks like there may be a bug in NASM, except in one particular case, it never calls SNESSetFromOptions() on the subsenses. Barry > On Jun 17, 2022, at 10:47 AM, Takahashi, Tadanaga wrote: > > Thank you. I am now able to pull each subsnes, change its snes type through the API, and set a prefix. This is my updated code: > > SNES snes, subsnes; > PetscMPIInt rank, size; > ... > ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); > ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); > ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); > ... > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ierr = SNESSetUp(snes); CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); > PetscBarrier(NULL); > for (i=0; i char prefix[10]; > sprintf(prefix,"sub_%d_",i); > if(i==rank) { > ierr = SNESNASMGetNumber(snes,&Nd); > printf("rank = %d has %d block(s)\n",i,Nd); > if (i SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton for regular domains > } else { > SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ... > ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); > > However, I still cannot change SNES, KSP, and PC types for individual domains through the command arguments. I checked the subdomains with -snes_view ::ascii_info_detail and it does show that the prefixes are properly changed. It also shows that the SNES type for the last domain was successfully changed. But for some reason, I only have access to the SNES viewer options during runtime. 
For example, if I run mpiexec -n 4 ./test1 -sub_0_ksp_type gmres -help | grep sub_0 I get the output: > > Viewer (-sub_0_snes_convergence_estimate) options: > -sub_0_snes_convergence_estimate ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view_pre) options: > -sub_0_snes_view_pre ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view_pre binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view_pre draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_view_pre socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_view_pre saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_test_jacobian_view) options: > -sub_0_snes_test_jacobian_view ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_test_jacobian_display) options: > -sub_0_snes_test_jacobian_display ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_ksp_converged_reason) options: > -sub_0_ksp_converged_reason ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_converged_reason) options: > -sub_0_snes_converged_reason ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_converged_reason binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_converged_reason 
draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_converged_reason socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_converged_reason saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view) options: > -sub_0_snes_view ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_view socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_view saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view_solution) options: > -sub_0_snes_view_solution ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view_solution binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view_solution draw[:[drawtype][:filename|format]] Draws object (PetscOptionsGetViewer) > -sub_0_snes_view_solution socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) > -sub_0_snes_view_solution saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) > Option left: name:-sub_0_ksp_type value: gmres > > Do you know what could be causing this? > > On Fri, Jun 17, 2022 at 10:00 AM Barry Smith > wrote: > > MPI_Comm_size(PETSC_COMM_WORLD,&size); > MPI_Comm_rank(PETSC_COMM_WORLD,&rank); >> SNESNASMGetSNES(snes,0,&subsnes); >> char prefix[10]; >> sprintf(prefix,"sub_%d_",rank); >> SNESSetOptionsPrefix(subsnes,prefix); > > > >> On Jun 17, 2022, at 9:35 AM, Matthew Knepley > wrote: >> >> On Fri, Jun 17, 2022 at 9:22 AM Takahashi, Tadanaga > wrote: >> I'm having some trouble pulling out the subsolver. I tried to use SNESNASMGetSNES in a loop over each subdomain. However I get an error when I run the code with more than one MPI processors. Here is a snippet from my code: >> >> SNES snes, subsnes; >> PetscMPIInt rank, size; >> ... >> ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); >> ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); >> ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); >> ... >> ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); >> ierr = SNESSetUp(snes); CHKERRQ(ierr); >> PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); >> for (i=0; i> PetscPrintf(PETSC_COMM_WORLD, "rank = %d\n",i); >> SNESNASMGetSNES(snes,i,&subsnes); >> // char prefix[10]; >> // sprintf(prefix,"sub_%d_",i); >> // SNESSetOptionsPrefix(subsnes,prefix); >> } >> ... >> ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); >> >> >> And, here is the output of the code when I run with 2 MPI procs: >> >> SNESNASMGetSNES() gets the local subsolvers. It seems you only have one per process. >> You can check https://petsc.org/main/docs/manualpages/SNES/SNESNASMGetNumber/ >> >> Notice that your current code will not work because, according to your explanation, you only want to change >> the prefix on a single rank, so you need to check the rank when you do it. 
>> >> Thanks, >> >> Matt >> >> takahashi at ubuntu:~/Desktop/MA-DDM/C/Rectangle$ mpiexec -n 2 ./test1 >> Size = 2 >> rank = 0 >> rank = 1 >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: No such subsolver >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.17.1, unknown >> [0]PETSC ERROR: ./test1 on a linux-gnu-c-debug named ubuntu by takahashi Fri Jun 17 06:06:38 2022 >> [0]PETSC ERROR: Configure options --with-mpi-dir=/usr --with-fc=0 >> [0]PETSC ERROR: #1 SNESNASMGetSNES() at /home/takahashi/Desktop/petsc/src/snes/impls/nasm/nasm.c:923 >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = RANK 0 PID 976566 RUNNING AT ubuntu >> = KILLED BY SIGNAL: 9 (Killed) >> =================================================================================== >> >> This error doesn't occur when I run this without MPI. However, I tried to change the prefix of the subdomain to `sub_0_` but I am not able to change the snes_type using this prefix. Running ./test1 -snes_view -help | grep sub_0_snes_type prints nothing. >> >> On Thu, Jun 16, 2022 at 6:23 PM Matthew Knepley > wrote: >> On Thu, Jun 16, 2022 at 5:57 PM tt73 > wrote: >> >> Hi, >> >> I am using NASM as the outer solver for a nonlinear problem. For one of the subdomains, I want to run the local solve with a different set of options form the others. Is there any way to set options for each subdomain? >> >> I can see two ways: >> >> 1) Pull out the subsolver and set it using the API >> >> 2) Pull out the subsolver and give it a different prefix >> >> Thanks, >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tt73 at njit.edu Fri Jun 17 10:12:29 2022 From: tt73 at njit.edu (Takahashi, Tadanaga) Date: Fri, 17 Jun 2022 11:12:29 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: Ahh, I understand now. I got rid of the loop. Adding SNESSetFromOptions(subsnes) right after SNESSetOptionsPrefix(subsnes,prefix) did not fix the issue. On Fri, Jun 17, 2022 at 11:02 AM Barry Smith wrote: > > You do not need the loop over size. Each rank sets options and options > prefix for its local objects and never anyone elses. 
> > char prefix[10]; > > sprintf(prefix,"sub_%d_",rank); > > SNESNASMGetSNES(snes,0,&subsnes); > > if (rank SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton > for regular domains > } else { > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last > domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } > > > To get the prefix to work try calling SNESSetFromOptions(subsnes); > immediately after your SNESSetOptionsPrefix(subsnes,prefix); call > > Matt, it looks like there may be a bug in NASM, except in one > particular case, it never calls SNESSetFromOptions() on the subsenses. > > Barry > > > On Jun 17, 2022, at 10:47 AM, Takahashi, Tadanaga wrote: > > Thank you. I am now able to pull each subsnes, change its snes type > through the API, and set a prefix. This is my updated code: > > SNES snes, subsnes; > PetscMPIInt rank, size; > ... > ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); > ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); > ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); > ... > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ierr = SNESSetUp(snes); CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); > PetscBarrier(NULL); > for (i=0; i char prefix[10]; > sprintf(prefix,"sub_%d_",i); > if(i==rank) { > ierr = SNESNASMGetNumber(snes,&Nd); > printf("rank = %d has %d block(s)\n",i,Nd); > if (i SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton > for regular domains > } else { > SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last > domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ... > ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); > > However, I still cannot change SNES, KSP, and PC types for > individual domains through the command arguments. I checked the subdomains > with -snes_view ::ascii_info_detail and it does show that the prefixes > are properly changed. It also shows that the SNES type for the last domain > was successfully changed. But for some reason, I only have access to the > SNES viewer options during runtime. 
For example, if I run mpiexec -n 4 > ./test1 -sub_0_ksp_type gmres -help | grep sub_0 I get the output: > > Viewer (-sub_0_snes_convergence_estimate) options: > -sub_0_snes_convergence_estimate ascii[:[filename][:[format][:append]]]: > Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate > binary[:[filename][:[format][:append]]]: Saves object to a binary file > (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate draw[:[drawtype][:filename|format]] > Draws object (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate socket[:port]: Pushes object to a Unix > socket (PetscOptionsGetViewer) > -sub_0_snes_convergence_estimate saws[:communicatorname]: Publishes > object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view_pre) options: > -sub_0_snes_view_pre ascii[:[filename][:[format][:append]]]: Prints > object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view_pre binary[:[filename][:[format][:append]]]: Saves > object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view_pre draw[:[drawtype][:filename|format]] Draws object > (PetscOptionsGetViewer) > -sub_0_snes_view_pre socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > -sub_0_snes_view_pre saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > Viewer (-sub_0_snes_test_jacobian_view) options: > -sub_0_snes_test_jacobian_view ascii[:[filename][:[format][:append]]]: > Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view binary[:[filename][:[format][:append]]]: > Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view draw[:[drawtype][:filename|format]] Draws > object (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view socket[:port]: Pushes object to a Unix > socket (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_view saws[:communicatorname]: Publishes object > to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_test_jacobian_display) options: > -sub_0_snes_test_jacobian_display > ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII > file (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display > binary[:[filename][:[format][:append]]]: Saves object to a binary file > (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display draw[:[drawtype][:filename|format]] > Draws object (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display socket[:port]: Pushes object to a Unix > socket (PetscOptionsGetViewer) > -sub_0_snes_test_jacobian_display saws[:communicatorname]: Publishes > object to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_ksp_converged_reason) options: > -sub_0_ksp_converged_reason ascii[:[filename][:[format][:append]]]: > Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason binary[:[filename][:[format][:append]]]: > Saves object to a binary file (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason draw[:[drawtype][:filename|format]] Draws > object (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason socket[:port]: Pushes object to a Unix > socket (PetscOptionsGetViewer) > -sub_0_ksp_converged_reason saws[:communicatorname]: Publishes object to > SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_converged_reason) options: > -sub_0_snes_converged_reason ascii[:[filename][:[format][:append]]]: > Prints object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_converged_reason binary[:[filename][:[format][:append]]]: > Saves object to a binary file 
(PetscOptionsGetViewer) > -sub_0_snes_converged_reason draw[:[drawtype][:filename|format]] Draws > object (PetscOptionsGetViewer) > -sub_0_snes_converged_reason socket[:port]: Pushes object to a Unix > socket (PetscOptionsGetViewer) > -sub_0_snes_converged_reason saws[:communicatorname]: Publishes object > to SAWs (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view) options: > -sub_0_snes_view ascii[:[filename][:[format][:append]]]: Prints object > to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view binary[:[filename][:[format][:append]]]: Saves object > to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view draw[:[drawtype][:filename|format]] Draws object > (PetscOptionsGetViewer) > -sub_0_snes_view socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > -sub_0_snes_view saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > Viewer (-sub_0_snes_view_solution) options: > -sub_0_snes_view_solution ascii[:[filename][:[format][:append]]]: Prints > object to stdout or ASCII file (PetscOptionsGetViewer) > -sub_0_snes_view_solution binary[:[filename][:[format][:append]]]: Saves > object to a binary file (PetscOptionsGetViewer) > -sub_0_snes_view_solution draw[:[drawtype][:filename|format]] Draws > object (PetscOptionsGetViewer) > -sub_0_snes_view_solution socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > -sub_0_snes_view_solution saws[:communicatorname]: Publishes object to > SAWs (PetscOptionsGetViewer) > Option left: name:-sub_0_ksp_type value: gmres > > > Do you know what could be causing this? > > On Fri, Jun 17, 2022 at 10:00 AM Barry Smith wrote: > >> >> MPI_Comm_size(PETSC_COMM_WORLD,&size); >> MPI_Comm_rank(PETSC_COMM_WORLD,&rank); >> >> SNESNASMGetSNES(snes,0,&subsnes); >>> char prefix[10]; >>> sprintf(prefix,"sub_%d_",rank); >>> SNESSetOptionsPrefix(subsnes,prefix); >>> >> >> >> >> On Jun 17, 2022, at 9:35 AM, Matthew Knepley wrote: >> >> On Fri, Jun 17, 2022 at 9:22 AM Takahashi, Tadanaga >> wrote: >> >>> I'm having some trouble pulling out the subsolver. I tried to use >>> SNESNASMGetSNES in a loop over each subdomain. However I get an error when >>> I run the code with more than one MPI processors. Here is a snippet from my >>> code: >>> >>> SNES snes, subsnes; >>> PetscMPIInt rank, size; >>> ... >>> ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); >>> ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); >>> ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); >>> ... >>> ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); >>> ierr = SNESSetUp(snes); CHKERRQ(ierr); >>> PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); >>> for (i=0; i>> PetscPrintf(PETSC_COMM_WORLD, "rank = %d\n",i); >>> SNESNASMGetSNES(snes,i,&subsnes); >>> // char prefix[10]; >>> // sprintf(prefix,"sub_%d_",i); >>> // SNESSetOptionsPrefix(subsnes,prefix); >>> } >>> ... >>> ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); >>> >>> >>> And, here is the output of the code when I run with 2 MPI procs: >>> >> >> SNESNASMGetSNES() gets the local subsolvers. It seems you only have one >> per process. >> You can check >> https://petsc.org/main/docs/manualpages/SNES/SNESNASMGetNumber/ >> >> Notice that your current code will not work because, according to your >> explanation, you only want to change >> the prefix on a single rank, so you need to check the rank when you do it. 
>> >> Thanks, >> >> Matt >> >> >>> takahashi at ubuntu:~/Desktop/MA-DDM/C/Rectangle$ mpiexec -n 2 ./test1 >>> Size = 2 >>> rank = 0 >>> rank = 1 >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Argument out of range >>> [0]PETSC ERROR: No such subsolver >>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.17.1, unknown >>> [0]PETSC ERROR: ./test1 on a linux-gnu-c-debug named ubuntu by takahashi >>> Fri Jun 17 06:06:38 2022 >>> [0]PETSC ERROR: Configure options --with-mpi-dir=/usr --with-fc=0 >>> [0]PETSC ERROR: #1 SNESNASMGetSNES() at >>> /home/takahashi/Desktop/petsc/src/snes/impls/nasm/nasm.c:923 >>> >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = RANK 0 PID 976566 RUNNING AT ubuntu >>> = KILLED BY SIGNAL: 9 (Killed) >>> >>> =================================================================================== >>> >>> This error doesn't occur when I run this without MPI. However, I tried >>> to change the prefix of the subdomain to `sub_0_` but I am not able to >>> change the snes_type using this prefix. Running ./test1 -snes_view >>> -help | grep sub_0_snes_type prints nothing. >>> >>> On Thu, Jun 16, 2022 at 6:23 PM Matthew Knepley >>> wrote: >>> >>>> On Thu, Jun 16, 2022 at 5:57 PM tt73 wrote: >>>> >>>>> >>>>> Hi, >>>>> >>>>> I am using NASM as the outer solver for a nonlinear problem. For one >>>>> of the subdomains, I want to run the local solve with a different set of >>>>> options form the others. Is there any way to set options for each >>>>> subdomain? >>>>> >>>> >>>> I can see two ways: >>>> >>>> 1) Pull out the subsolver and set it using the API >>>> >>>> 2) Pull out the subsolver and give it a different prefix >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfatiac at gmail.com Fri Jun 17 10:21:03 2022 From: dfatiac at gmail.com (Mario Rossi) Date: Fri, 17 Jun 2022 17:21:03 +0200 Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell Message-ID: I need to find the largest eigenvalues (say the first three) of a very large matrix and I am using a combination of PetSc and SLEPc. In particular, I am using a shell matrix. I wrote a "custom" matrix-vector product and everything works fine in serial (one task) mode for a "small" case. For the real case, I need multiple (at least 128) tasks for memory reasons so I need a parallel variant of the custom matrix-vector product. I know exactly how to write the parallel variant (in plain MPI) but I am, somehow, blocked because it is not clear to me what each task receives and what is expected to provide in the parallel matrix-vector product. 
More in detail, with a single task, the function receives the full X vector and is expected to provide the full Y vector resulting from Y=A*X. What does it happen with multiple tasks? If I understand correctly in the matrix shell definition, I can choose to split the matrix into blocks of rows so that the matrix-vector function should compute a block of elements of the vector Y but does it receive only the corresponding subset of the X (input vector)? (this is what I guess happens) and in output, does each task return its subset of elements of Y as if it were the whole array and then PetSc manages all the subsets? Is there anyone who has a working example of a parallel matrix-vector product for matrix shell? Thanks in advance for any help you can provide! Mario i -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jun 17 10:33:30 2022 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Jun 2022 11:33:30 -0400 Subject: [petsc-users] Customizing NASM subsnes In-Reply-To: References: <62aba746.1c69fb81.7df46.678d@mx.google.com> Message-ID: On Fri, Jun 17, 2022 at 11:02 AM Barry Smith wrote: > > You do not need the loop over size. Each rank sets options and options > prefix for its local objects and never anyone elses. > > char prefix[10]; > > sprintf(prefix,"sub_%d_",rank); > > SNESNASMGetSNES(snes,0,&subsnes); > > if (rank SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton > for regular domains > } else { > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last > domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } > > > To get the prefix to work try calling SNESSetFromOptions(subsnes); > immediately after your SNESSetOptionsPrefix(subsnes,prefix); call > > Matt, it looks like there may be a bug in NASM, except in one > particular case, it never calls SNESSetFromOptions() on the subsenses. > I bet Barry is correct. I can fix it, but unfortunately, I leave for a week long conference tomorrow, of which I am an organizer, so I don't think I can do it for a week. Thanks, Matt > Barry > > > On Jun 17, 2022, at 10:47 AM, Takahashi, Tadanaga wrote: > > Thank you. I am now able to pull each subsnes, change its snes type > through the API, and set a prefix. This is my updated code: > > SNES snes, subsnes; > PetscMPIInt rank, size; > ... > ierr = SNESCreate(PETSC_COMM_WORLD,&snes); CHKERRQ(ierr); > ierr = SNESSetType(snes,SNESNASM); CHKERRQ(ierr); > ierr = SNESNASMSetType(snes,PC_ASM_RESTRICT); CHKERRQ(ierr); > ... > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ierr = SNESSetUp(snes); CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD, "Size = %d\n",size); > PetscBarrier(NULL); > for (i=0; i char prefix[10]; > sprintf(prefix,"sub_%d_",i); > if(i==rank) { > ierr = SNESNASMGetNumber(snes,&Nd); > printf("rank = %d has %d block(s)\n",i,Nd); > if (i SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESNEWTONLS); CHKERRQ(ierr); // newton > for regular domains > } else { > SNESNASMGetSNES(snes,0,&subsnes); > SNESSetType(subsnes,SNESFAS); CHKERRQ(ierr); // fas for last > domain > } > SNESSetOptionsPrefix(subsnes,prefix); > } > } > ierr = SNESSetFromOptions(snes); CHKERRQ(ierr); > ... > ierr = SNESSolve(snes,NULL,u_initial); CHKERRQ(ierr); > > However, I still cannot change SNES, KSP, and PC types for > individual domains through the command arguments. I checked the subdomains > with -snes_view ::ascii_info_detail and it does show that the prefixes > are properly changed. 
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jroman at dsic.upv.es  Fri Jun 17 10:33:58 2022
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Fri, 17 Jun 2022 17:33:58 +0200
Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell
In-Reply-To: 
References: 
Message-ID: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es>

You can use VecGetOwnershipRange() to determine the range of global indices corresponding to the local portion of a vector, and VecGetArray() to access the values.
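For instance, a minimal sketch of a shell product that acts only on the locally owned block (the diagonal scaling is just a placeholder operator; a real product would also communicate whatever off-process entries of x it needs, as in the example linked below):

#include <petscmat.h>

PetscErrorCode MyShellMult(Mat A,Vec x,Vec y)
{
  PetscInt           rstart,rend,i;
  const PetscScalar *px;
  PetscScalar       *py;

  PetscFunctionBeginUser;
  PetscCall(VecGetOwnershipRange(x,&rstart,&rend));  /* global rows owned by this process */
  PetscCall(VecGetArrayRead(x,&px));
  PetscCall(VecGetArray(y,&py));
  for (i=0; i<rend-rstart; i++) py[i] = 2.0*px[i];   /* local index i corresponds to global row rstart+i */
  PetscCall(VecRestoreArrayRead(x,&px));
  PetscCall(VecRestoreArray(y,&py));
  PetscFunctionReturn(0);
}

Such a routine is registered with MatCreateShell() and MatShellSetOperation(A,MATOP_MULT,(void(*)(void))MyShellMult).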
In SLEPc, you can assume that X and Y will have the same parallel distribution. For an example of a shell matrix that implements the matrix-vector product in parallel, have a look at this: https://slepc.upv.es/documentation/current/src/nep/tutorials/ex21.c.html It is a simple tridiagonal example, where neighborwise communication is done with two calls to MPI_Sendrecv(). Jose > El 17 jun 2022, a las 17:21, Mario Rossi escribi?: > > I need to find the largest eigenvalues (say the first three) of a very large matrix and I am using > a combination of PetSc and SLEPc. In particular, I am using a shell matrix. I wrote a "custom" > matrix-vector product and everything works fine in serial (one task) mode for a "small" case. > For the real case, I need multiple (at least 128) tasks for memory reasons so I need a parallel variant of the custom matrix-vector product. I know exactly how to write the parallel variant > (in plain MPI) but I am, somehow, blocked because it is not clear to me what each task receives > and what is expected to provide in the parallel matrix-vector product. > More in detail, with a single task, the function receives the full X vector and is expected to provide the full Y vector resulting from Y=A*X. > What does it happen with multiple tasks? If I understand correctly > in the matrix shell definition, I can choose to split the matrix into blocks of rows so that the matrix-vector function should compute a block of elements of the vector Y but does it receive only the corresponding subset of the X (input vector)? (this is what I guess happens) and in output, does > each task return its subset of elements of Y as if it were the whole array and then PetSc manages all the subsets? Is there anyone who has a working example of a parallel matrix-vector product for matrix shell? > Thanks in advance for any help you can provide! > Mario > i > From dfatiac at gmail.com Fri Jun 17 10:56:34 2022 From: dfatiac at gmail.com (Mario Rossi) Date: Fri, 17 Jun 2022 17:56:34 +0200 Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell In-Reply-To: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es> References: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es> Message-ID: Thanks a lot, Jose! I looked at the eps folder (where I found the test8.c that has been my starting point) but I did not look at the nep folder (my fault!) Thanks again, Mario Il giorno ven 17 giu 2022 alle ore 17:34 Jose E. Roman ha scritto: > You can use VecGetOwnershipRange() to determine the range of global > indices corresponding to the local portion of a vector, and VecGetArray() > to access the values. In SLEPc, you can assume that X and Y will have the > same parallel distribution. > > For an example of a shell matrix that implements the matrix-vector product > in parallel, have a look at this: > https://slepc.upv.es/documentation/current/src/nep/tutorials/ex21.c.html > It is a simple tridiagonal example, where neighborwise communication is > done with two calls to MPI_Sendrecv(). > > Jose > > > > El 17 jun 2022, a las 17:21, Mario Rossi escribi?: > > > > I need to find the largest eigenvalues (say the first three) of a very > large matrix and I am using > > a combination of PetSc and SLEPc. In particular, I am using a shell > matrix. I wrote a "custom" > > matrix-vector product and everything works fine in serial (one task) > mode for a "small" case. > > For the real case, I need multiple (at least 128) tasks for memory > reasons so I need a parallel variant of the custom matrix-vector product. 
I > know exactly how to write the parallel variant > > (in plain MPI) but I am, somehow, blocked because it is not clear to me > what each task receives > > and what is expected to provide in the parallel matrix-vector product. > > More in detail, with a single task, the function receives the full X > vector and is expected to provide the full Y vector resulting from Y=A*X. > > What does it happen with multiple tasks? If I understand correctly > > in the matrix shell definition, I can choose to split the matrix into > blocks of rows so that the matrix-vector function should compute a block of > elements of the vector Y but does it receive only the corresponding subset > of the X (input vector)? (this is what I guess happens) and in output, does > > each task return its subset of elements of Y as if it were the whole > array and then PetSc manages all the subsets? Is there anyone who has a > working example of a parallel matrix-vector product for matrix shell? > > Thanks in advance for any help you can provide! > > Mario > > i > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leiyongxiang1205 at gmail.com Fri Jun 17 12:52:34 2022 From: leiyongxiang1205 at gmail.com (Yongxiang Lei) Date: Fri, 17 Jun 2022 18:52:34 +0100 Subject: [petsc-users] Pestc-matlab issue Message-ID: Dear concerns, I met such issues when I confirm that my Matlab installation is finished. I am also sure that the g++ version is given. Could you please check this problem for me? The related information is given as follows. Looking forward to hearing from you Best regards Xiang >> mex -setup cpp MEX configured to use 'g++' for C++ language compilation. ......................................................................................................... [ADS+u2192020 at cos8-25136ecf ~]$ g++ --version g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-13) Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ............................................................................................................. 
export PETSC_DIR=$PWD export PETSC_ARCH=linux-debug ./configure \ --CC=$HOME/sfw/linux/openmpi-4.0.2/bin/mpicc \ --CXX=$HOME/sfw/linux/openmpi-4.0.2/bin/mpicxx \ --FC=$HOME/sfw/linux/openmpi-4.0.2/bin/mpif90 \ --with-debugging=1 \ --download-hypre=1 \ --download-fblaslapack=1 \ --with-x=0\ --with-matlab-dir=/home/ADS/u2192020/Matlab2022/ \ --with-matlab-engine=1 \ --with-matlab-engine-dir=/home/ADS/u2192020/Matlab2022/extern/engines/ ========================================= ========================================= Now to check if the libraries are working do: make PETSC_DIR=/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB PETSC_ARCH=linux-debug check ========================================= [ADS+u2192020 at cos8-25136ecf PETSc-MATLAB]$ make -j4 check Running check examples to verify correct installation Using PETSC_DIR=/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB and PETSC_ARCH=linux-debug gmake[3]: [/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/lib/petsc/conf/rules:320: ex19.PETSc] Error 2 (ignored) *******************Error detected during compile or link!******************* See http://www.mcs.anl.gov/petsc/documentation/faq.html /home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/src/snes/tutorials ex19 ********************************************************************************* /home/ADS/u2192020/sfw/linux/openmpi-4.0.2/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g3 -O0 -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/include -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/include -I/home/ADS/u2192020/Matlab2022/extern/include ex19.c -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/Matlab2022/bin/glnxa64 -L/home/ADS/u2192020/Matlab2022/bin/glnxa64 -Wl,-rpath,/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -L/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -Wl,-rpath,/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -L/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/8 -L/usr/lib/gcc/x86_64-redhat-linux/8 -lpetsc -lHYPRE -lflapack -lfblas -lm -leng -lmex -lmx -lmat -lmwm_dispatcher -lmwopcmodel -lmwservices -lmwopcmodel -lmwm_dispatcher -lmwmpath -lmwopcmodel -lmwservices -lmwopcmodel -lmwservices -lxerces-c -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl -o ex19 //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwopccore.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwfoundation_matlabdata_matlab.so: undefined reference to `std::logic_error::logic_error(std::logic_error&&)@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmex.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmx.so: undefined reference to `std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream()@GLIBCXX_3.4.26' collect2: error: ld returned 1 exit status gmake[4]: *** [: ex19] Error 1 1,5c1,10 < lid velocity = 
0.0016, prandtl # = 1., grashof # = 1. < 0 SNES Function norm 0.0406612 < 1 SNES Function norm 4.12227e-06 < 2 SNES Function norm 6.098e-11 < Number of SNES iterations = 2 --- > -------------------------------------------------------------------------- > mpiexec was unable to launch the specified application as it could not access > or execute an executable: > > Executable: ./ex19 > Node: cos8-25136ecf > > while attempting to start process rank 0. > -------------------------------------------------------------------------- > 2 total processes failed to start /home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/src/snes/tutorials Possible problem with ex19 running with hypre, diffs above ========================================= gmake[3]: [/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/lib/petsc/conf/rules:374: ex5f.PETSc] Error 2 (ignored) *******************Error detected during compile or link!******************* See http://www.mcs.anl.gov/petsc/documentation/faq.html /home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/src/snes/tutorials ex5f ********************************************************* /home/ADS/u2192020/sfw/linux/openmpi-4.0.2/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O0 -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/include -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/include -I/home/ADS/u2192020/Matlab2022/extern/include ex5f.F90 -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/Matlab2022/bin/glnxa64 -L/home/ADS/u2192020/Matlab2022/bin/glnxa64 -Wl,-rpath,/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -L/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -Wl,-rpath,/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -L/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/8 -L/usr/lib/gcc/x86_64-redhat-linux/8 -lpetsc -lHYPRE -lflapack -lfblas -lm -leng -lmex -lmx -lmat -lmwm_dispatcher -lmwopcmodel -lmwservices -lmwopcmodel -lmwm_dispatcher -lmwmpath -lmwopcmodel -lmwservices -lmwopcmodel -lmwservices -lxerces-c -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl -o ex5f //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwopccore.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwfoundation_matlabdata_matlab.so: undefined reference to `std::logic_error::logic_error(std::logic_error&&)@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmex.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmx.so: undefined reference to `std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream()@GLIBCXX_3.4.26' collect2: error: ld returned 1 exit status gmake[4]: *** [/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/lib/petsc/conf/test:43: ex5f] Error 1 gmake[3]: [/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/lib/petsc/conf/rules:320: ex31.PETSc] Error 2 (ignored) *******************Error detected during compile or link!******************* See 
http://www.mcs.anl.gov/petsc/documentation/faq.html /home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/src/vec/vec/tutorials ex31 ********************************************************************************* /home/ADS/u2192020/sfw/linux/openmpi-4.0.2/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g3 -O0 -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/include -I/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/include -I/home/ADS/u2192020/Matlab2022/extern/include ex31.c -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -L/home/ADS/u2192020/sfw/petsc/PETSc-MATLAB/linux-debug/lib -Wl,-rpath,/home/ADS/u2192020/Matlab2022/bin/glnxa64 -L/home/ADS/u2192020/Matlab2022/bin/glnxa64 -Wl,-rpath,/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -L/home/ADS/u2192020/Matlab2022/extern/lib/glnxa64 -Wl,-rpath,/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -L/home/ADS/u2192020/sfw/linux/openmpi-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/8 -L/usr/lib/gcc/x86_64-redhat-linux/8 -lpetsc -lHYPRE -lflapack -lfblas -lm -leng -lmex -lmx -lmat -lmwm_dispatcher -lmwopcmodel -lmwservices -lmwopcmodel -lmwm_dispatcher -lmwmpath -lmwopcmodel -lmwservices -lmwopcmodel -lmwservices -lxerces-c -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl -o ex31 //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwopccore.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' //home/ADS/u2192020/Matlab2022/bin/glnxa64/libmwfoundation_matlabdata_matlab.so: undefined reference to `std::logic_error::logic_error(std::logic_error&&)@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmex.so: undefined reference to `std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream()@GLIBCXX_3.4.26' /home/ADS/u2192020/Matlab2022/bin/glnxa64/libmx.so: undefined reference to `std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream()@GLIBCXX_3.4.26' collect2: error: ld returned 1 exit status gmake[4]: *** [: ex31] Error 1 Completed test examples [ADS+u2192020 at cos8-25136ecf PETSc-MATLAB]$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Jun 17 13:07:19 2022 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Jun 2022 14:07:19 -0400 Subject: [petsc-users] Pestc-matlab issue In-Reply-To: References: Message-ID: <246D1F71-9210-4DEA-B8CC-C999EC7C5B00@petsc.dev> Please send configure.log and make.log to petsc-maint at mcs.anl.gov > On Jun 17, 2022, at 1:52 PM, Yongxiang Lei wrote: > > Dear concerns, > > I met such issues when I confirm that my Matlab installation is finished. I am also sure that the g++ version is given. Could you please check this problem for me? The related information is given as follows. > > Looking forward to hearing from you > Best regards > Xiang > > >> mex -setup cpp > MEX configured to use 'g++' for C++ language compilation. > ......................................................................................................... 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dfatiac at gmail.com Sat Jun 18 01:13:55 2022 From: dfatiac at gmail.com (Mario Rossi) Date: Sat, 18 Jun 2022 08:13:55 +0200 Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell In-Reply-To: References: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es> Message-ID: Dear Jose and Petsc users, I implemented the parallel matrix-vector product and it works meaning that it produces a result but it is different from the result produced with a single task. Obviously, I could be wrong in my implementation but what puzzles me is that the *input *vector (x) to the product is different running with one and two tasks and this is from the very first iteration (so it can not be due to a previous error in the product). I checked that X is different with one and two tasks with the following (naive) code PetscErrorCode MatMult_TM(Mat A,Vec x,Vec y) { void *ctx; PetscInt nx /* ,lo,i,j*/; const PetscScalar *px; PetscScalar *py; MPI_Comm comm; PetscFunctionBeginUser; PetscCall(MatShellGetContext(A,&ctx)); PetscCall(VecGetLocalSize(x,&nx)); PetscCall(PetscObjectGetComm((PetscObject)A,&comm)); // nx = *(int*)ctx; PetscCall(VecGetArrayRead(x,&px)); PetscCall(VecGetArray(y,&py)); for(int i=0;i ha scritto: > Thanks a lot, Jose! > I looked at the eps folder (where I found the test8.c that has been my > starting point) but I did not look at the nep folder (my fault!) > Thanks again, > Mario > > Il giorno ven 17 giu 2022 alle ore 17:34 Jose E. Roman > ha scritto: > >> You can use VecGetOwnershipRange() to determine the range of global >> indices corresponding to the local portion of a vector, and VecGetArray() >> to access the values. In SLEPc, you can assume that X and Y will have the >> same parallel distribution. >> >> For an example of a shell matrix that implements the matrix-vector >> product in parallel, have a look at this: >> https://slepc.upv.es/documentation/current/src/nep/tutorials/ex21.c.html >> It is a simple tridiagonal example, where neighborwise communication is >> done with two calls to MPI_Sendrecv(). >> >> Jose >> >> >> > El 17 jun 2022, a las 17:21, Mario Rossi escribi?: >> > >> > I need to find the largest eigenvalues (say the first three) of a very >> large matrix and I am using >> > a combination of PetSc and SLEPc. In particular, I am using a shell >> matrix. I wrote a "custom" >> > matrix-vector product and everything works fine in serial (one task) >> mode for a "small" case. >> > For the real case, I need multiple (at least 128) tasks for memory >> reasons so I need a parallel variant of the custom matrix-vector product. I >> know exactly how to write the parallel variant >> > (in plain MPI) but I am, somehow, blocked because it is not clear to me >> what each task receives >> > and what is expected to provide in the parallel matrix-vector product. >> > More in detail, with a single task, the function receives the full X >> vector and is expected to provide the full Y vector resulting from Y=A*X. >> > What does it happen with multiple tasks? If I understand correctly >> > in the matrix shell definition, I can choose to split the matrix into >> blocks of rows so that the matrix-vector function should compute a block of >> elements of the vector Y but does it receive only the corresponding subset >> of the X (input vector)? (this is what I guess happens) and in output, does >> > each task return its subset of elements of Y as if it were the whole >> array and then PetSc manages all the subsets? 
Is there anyone who has a >> working example of a parallel matrix-vector product for matrix shell? >> > Thanks in advance for any help you can provide! >> > Mario >> > i >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Jun 18 01:16:08 2022 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 18 Jun 2022 14:16:08 +0800 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> Message-ID: In order to check if I made mistakes in the python code, I try to use c code to show the issue on DMProjectCoordinates. The code and mesh file is attached. If the code is correct, there must be something wrong with `DMProjectCoordinates` or `DMPlexCreateGmshFromFile` for high-order mesh. The command and the output are listed below: (Obviously the bounding box is changed.) ``` $ ./test_gmsh_load_2rd -filename cube-p2.msh -old_fe_view -new_fe_view Old Bounding Box: 0: lo = 0. hi = 1. 1: lo = 0. hi = 1. 2: lo = 0. hi = 1. PetscFE Object: OldCoordinatesFE 1 MPI processes type: basic Basic Finite Element in 3 dimensions with 3 components PetscSpace Object: P2 1 MPI processes type: sum Space in 3 variables with 3 components, size 30 Sum space of 3 concatenated subspaces (all identical) PetscSpace Object: sum component (sumcomp_) 1 MPI processes type: poly Space in 3 variables with 1 components, size 10 Polynomial space of degree 2 PetscDualSpace Object: P2 1 MPI processes type: lagrange Dual space with 3 components, size 30 Discontinuous Lagrange dual space Quadrature of order 5 on 27 points (dim 3) PetscFE Object: NewCoordinatesFE 1 MPI processes type: basic Basic Finite Element in 3 dimensions with 3 components PetscSpace Object: P2 1 MPI processes type: sum Space in 3 variables with 3 components, size 30 Sum space of 3 concatenated subspaces (all identical) PetscSpace Object: sum component (sumcomp_) 1 MPI processes type: poly Space in 3 variables with 1 components, size 10 Polynomial space of degree 2 PetscDualSpace Object: P2 1 MPI processes type: lagrange Dual space with 3 components, size 30 Continuous Lagrange dual space Quadrature of order 5 on 27 points (dim 3) New Bounding Box: 0: lo = 2.5624e-17 hi = 8. 1: lo = -9.23372e-17 hi = 7. 2: lo = 2.72091e-17 hi = 8.5 ``` Thanks, Zongze Zongze Yang ?2022?6?17??? 14:54??? > I tried the projection operation. However, it seems that the projection > gives the wrong solution. After projection, the bounding box is changed! > See logs below. 
> > First, I patch the petsc4py by adding `DMProjectCoordinates`: > ``` > diff --git a/src/binding/petsc4py/src/PETSc/DM.pyx > b/src/binding/petsc4py/src/PETSc/DM.pyx > index d8a58d183a..dbcdb280f1 100644 > --- a/src/binding/petsc4py/src/PETSc/DM.pyx > +++ b/src/binding/petsc4py/src/PETSc/DM.pyx > @@ -307,6 +307,12 @@ cdef class DM(Object): > PetscINCREF(c.obj) > return c > > + def projectCoordinates(self, FE fe=None): > + if fe is None: > + CHKERR( DMProjectCoordinates(self.dm, NULL) ) > + else: > + CHKERR( DMProjectCoordinates(self.dm, fe.fe) ) > + > def getBoundingBox(self): > cdef PetscInt i,dim=0 > CHKERR( DMGetCoordinateDim(self.dm, &dim) ) > diff --git a/src/binding/petsc4py/src/PETSc/petscdm.pxi > b/src/binding/petsc4py/src/PETSc/petscdm.pxi > index 514b6fa472..c778e39884 100644 > --- a/src/binding/petsc4py/src/PETSc/petscdm.pxi > +++ b/src/binding/petsc4py/src/PETSc/petscdm.pxi > @@ -90,6 +90,7 @@ cdef extern from * nogil: > int DMGetCoordinateDim(PetscDM,PetscInt*) > int DMSetCoordinateDim(PetscDM,PetscInt) > int DMLocalizeCoordinates(PetscDM) > + int DMProjectCoordinates(PetscDM, PetscFE) > > int DMCreateInterpolation(PetscDM,PetscDM,PetscMat*,PetscVec*) > int DMCreateInjection(PetscDM,PetscDM,PetscMat*) > ``` > > Then in python, I load a mesh and project the coordinates to P2: > ``` > import firedrake as fd > from firedrake.petsc import PETSc > > # plex = fd.mesh._from_gmsh('test-fd-load-p2.msh') > plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') > print('old bbox:', plex.getBoundingBox()) > > dim = plex.getDimension() > # (dim, nc, isSimplex, k, > qorder, comm=None) > fe_new = PETSc.FE().createLagrange(dim, dim, True, 2, PETSc.DETERMINE) > plex.projectCoordinates(fe_new) > fe_new.view() > > print('new bbox:', plex.getBoundingBox()) > ``` > > The output is (The bounding box is changed!) > ``` > > old bbox: ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)) > PetscFE Object: P2 1 MPI processes > type: basic > Basic Finite Element in 3 dimensions with 3 components > PetscSpace Object: P2 1 MPI processes > type: sum > Space in 3 variables with 3 components, size 30 > Sum space of 3 concatenated subspaces (all identical) > PetscSpace Object: sum component (sumcomp_) 1 MPI processes > type: poly > Space in 3 variables with 1 components, size 10 > Polynomial space of degree 2 > PetscDualSpace Object: P2 1 MPI processes > type: lagrange > Dual space with 3 components, size 30 > Continuous Lagrange dual space > Quadrature of order 5 on 27 points (dim 3) > new bbox: ((-6.530133708576188e-17, 36.30670832662781), (-3.899962995254311e-17, 36.2406171632539), (-8.8036464152166e-17, 36.111577025012224)) > > ``` > > > By the way, for the original DG coordinates, where can I find the relation of the closure and the order of the dofs for the cell? > > > Thanks! > > > Zongze > > > > Matthew Knepley ?2022?6?17??? 01:11??? > >> On Thu, Jun 16, 2022 at 12:06 PM Zongze Yang >> wrote: >> >>> >>> >>> ? 2022?6?16??23:22?Matthew Knepley ??? >>> >>> ? >>> On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang >>> wrote: >>> >>>> Hi, if I load a `gmsh` file with second-order elements, the coordinates >>>> will be stored in a DG-P2 space. After obtaining the coordinates of a cell, >>>> how can I map the coordinates to vertex and edge? >>>> >>> >>> By default, they are stored as P2, not DG. >>> >>> >>> I checked the coordinates vector, and found the dogs only defined on >>> cell other than vertex and edge, so I said they are stored as DG. 
>>> Then the function DMPlexVecGetClosure >>> seems return >>> the coordinates in lex order. >>> >>> Some code in reading gmsh file reads that >>> >>> >>> 1756: if (isSimplex) continuity = PETSC_FALSE >>> ; /* XXX >>> FIXME Requires DMPlexSetClosurePermutationLexicographic() */ >>> >>> >>> 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, >>> dim, coordDim, order, &fe) >>> >>> >>> The continuity is set to false for simplex. >>> >> >> Oh, yes. That needs to be fixed. For now, you can just project it to P2 >> if you want using >> >> https://petsc.org/main/docs/manualpages/DM/DMProjectCoordinates/ >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Zongze >>> >>> You can ask for the coordinates of a vertex or an edge directly using >>> >>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ >>> >>> by giving the vertex or edge point. You can get all the coordinates on a >>> cell, in the closure order, using >>> >>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Below is some code load the gmsh file, I want to know the relation >>>> between `cl` and `cell_coords`. >>>> >>>> ``` >>>> import firedrake as fd >>>> import numpy as np >>>> >>>> # Load gmsh file (2rd) >>>> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >>>> >>>> cs, ce = plex.getHeightStratum(0) >>>> >>>> cdm = plex.getCoordinateDM() >>>> csec = dm.getCoordinateSection() >>>> coords_gvec = dm.getCoordinates() >>>> >>>> for i in range(cs, ce): >>>> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >>>> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, 3])}') >>>> cl = dm.getTransitiveClosure(i) >>>> print('closure:', cl) >>>> break >>>> ``` >>>> >>>> Best wishes, >>>> Zongze >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cube-p2.msh Type: application/octet-stream Size: 5210 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_gmsh_load_2rd.c Type: application/octet-stream Size: 2297 bytes Desc: not available URL: From jroman at dsic.upv.es Sat Jun 18 01:27:25 2022 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 18 Jun 2022 08:27:25 +0200 Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell In-Reply-To: References: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es> Message-ID: <35218C6D-1BA4-4E2D-8A27-8860B6CF3E33@dsic.upv.es> The initial vector of the Krylov method is by default a random vector, which is different when you change the number of processes. To avoid this, you can run with the undocumented option -bv_reproducible_random which will generate the same random initial vector irrespective of the number of processes. Alternatively, set an initial vector in your code with EPSSetInitialSpace(), see e.g. 
https://slepc.upv.es/documentation/current/src/eps/tutorials/ex5.c.html Jose > El 18 jun 2022, a las 8:13, Mario Rossi escribi?: > > Dear Jose and Petsc users, I implemented the parallel matrix-vector product and it works meaning that it produces a result but it is different from the result produced with a single task. > Obviously, I could be wrong in my implementation but what puzzles me is that the input vector (x) to the product is different running with one and two tasks and this is from the very first iteration (so it can not be due to a previous error in the product). > I checked that X is different with one and two tasks with the following (naive) code > PetscErrorCode MatMult_TM(Mat A,Vec x,Vec y) { > void *ctx; > PetscInt nx /* ,lo,i,j*/; > const PetscScalar *px; > PetscScalar *py; > MPI_Comm comm; > PetscFunctionBeginUser; > PetscCall(MatShellGetContext(A,&ctx)); > PetscCall(VecGetLocalSize(x,&nx)); > PetscCall(PetscObjectGetComm((PetscObject)A,&comm)); > > // nx = *(int*)ctx; > PetscCall(VecGetArrayRead(x,&px)); > PetscCall(VecGetArray(y,&py)); > > for(int i=0;i PetscCall(MPI_Barrier(comm)); > exit(0); > ...... > } > > Then I reordered the output obtained with one and two tasks. The first part of the x vector is very similar (but not exactly the same) using one and two tasks but the second part (belonging to the second task) is pretty different > (here "offset" is offset=(n/size)*myrank;) > I create the matrix shell with > PetscCall(MatCreateShell(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,N,N,&n,&A)); > I am sure I am doing something wrong but I don't know what I need to look at. > Thanks in advance! > Mario > > > Il giorno ven 17 giu 2022 alle ore 17:56 Mario Rossi ha scritto: > Thanks a lot, Jose! > I looked at the eps folder (where I found the test8.c that has been my starting point) but I did not look at the nep folder (my fault!) > Thanks again, > Mario > > Il giorno ven 17 giu 2022 alle ore 17:34 Jose E. Roman ha scritto: > You can use VecGetOwnershipRange() to determine the range of global indices corresponding to the local portion of a vector, and VecGetArray() to access the values. In SLEPc, you can assume that X and Y will have the same parallel distribution. > > For an example of a shell matrix that implements the matrix-vector product in parallel, have a look at this: https://slepc.upv.es/documentation/current/src/nep/tutorials/ex21.c.html > It is a simple tridiagonal example, where neighborwise communication is done with two calls to MPI_Sendrecv(). > > Jose > > > > El 17 jun 2022, a las 17:21, Mario Rossi escribi?: > > > > I need to find the largest eigenvalues (say the first three) of a very large matrix and I am using > > a combination of PetSc and SLEPc. In particular, I am using a shell matrix. I wrote a "custom" > > matrix-vector product and everything works fine in serial (one task) mode for a "small" case. > > For the real case, I need multiple (at least 128) tasks for memory reasons so I need a parallel variant of the custom matrix-vector product. I know exactly how to write the parallel variant > > (in plain MPI) but I am, somehow, blocked because it is not clear to me what each task receives > > and what is expected to provide in the parallel matrix-vector product. > > More in detail, with a single task, the function receives the full X vector and is expected to provide the full Y vector resulting from Y=A*X. > > What does it happen with multiple tasks? 
If I understand correctly > > in the matrix shell definition, I can choose to split the matrix into blocks of rows so that the matrix-vector function should compute a block of elements of the vector Y but does it receive only the corresponding subset of the X (input vector)? (this is what I guess happens) and in output, does > > each task return its subset of elements of Y as if it were the whole array and then PetSc manages all the subsets? Is there anyone who has a working example of a parallel matrix-vector product for matrix shell? > > Thanks in advance for any help you can provide! > > Mario > > i > > > From knepley at gmail.com Sat Jun 18 07:02:44 2022 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 18 Jun 2022 08:02:44 -0400 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> Message-ID: On Sat, Jun 18, 2022 at 2:16 AM Zongze Yang wrote: > In order to check if I made mistakes in the python code, I try to use c > code to show the issue on DMProjectCoordinates. The code and mesh file is > attached. > If the code is correct, there must be something wrong with > `DMProjectCoordinates` or `DMPlexCreateGmshFromFile` for high-order mesh. > Something is definitely wrong with high order, periodic simplices from Gmsh. We had not tested that case. I am at a conference and cannot look at it for a week. My suspicion is that the space we make when reading in the Gmsh coordinates does not match the values (wrong order). Thanks, Matt > The command and the output are listed below: (Obviously the bounding box > is changed.) > ``` > $ ./test_gmsh_load_2rd -filename cube-p2.msh -old_fe_view -new_fe_view > Old Bounding Box: > 0: lo = 0. hi = 1. > 1: lo = 0. hi = 1. > 2: lo = 0. hi = 1. > PetscFE Object: OldCoordinatesFE 1 MPI processes > type: basic > Basic Finite Element in 3 dimensions with 3 components > PetscSpace Object: P2 1 MPI processes > type: sum > Space in 3 variables with 3 components, size 30 > Sum space of 3 concatenated subspaces (all identical) > PetscSpace Object: sum component (sumcomp_) 1 MPI processes > type: poly > Space in 3 variables with 1 components, size 10 > Polynomial space of degree 2 > PetscDualSpace Object: P2 1 MPI processes > type: lagrange > Dual space with 3 components, size 30 > Discontinuous Lagrange dual space > Quadrature of order 5 on 27 points (dim 3) > PetscFE Object: NewCoordinatesFE 1 MPI processes > type: basic > Basic Finite Element in 3 dimensions with 3 components > PetscSpace Object: P2 1 MPI processes > type: sum > Space in 3 variables with 3 components, size 30 > Sum space of 3 concatenated subspaces (all identical) > PetscSpace Object: sum component (sumcomp_) 1 MPI processes > type: poly > Space in 3 variables with 1 components, size 10 > Polynomial space of degree 2 > PetscDualSpace Object: P2 1 MPI processes > type: lagrange > Dual space with 3 components, size 30 > Continuous Lagrange dual space > Quadrature of order 5 on 27 points (dim 3) > New Bounding Box: > 0: lo = 2.5624e-17 hi = 8. > 1: lo = -9.23372e-17 hi = 7. > 2: lo = 2.72091e-17 hi = 8.5 > ``` > > Thanks, > Zongze > > Zongze Yang ?2022?6?17??? 14:54??? > >> I tried the projection operation. However, it seems that the projection >> gives the wrong solution. After projection, the bounding box is changed! >> See logs below. 
>> >> First, I patch the petsc4py by adding `DMProjectCoordinates`: >> ``` >> diff --git a/src/binding/petsc4py/src/PETSc/DM.pyx >> b/src/binding/petsc4py/src/PETSc/DM.pyx >> index d8a58d183a..dbcdb280f1 100644 >> --- a/src/binding/petsc4py/src/PETSc/DM.pyx >> +++ b/src/binding/petsc4py/src/PETSc/DM.pyx >> @@ -307,6 +307,12 @@ cdef class DM(Object): >> PetscINCREF(c.obj) >> return c >> >> + def projectCoordinates(self, FE fe=None): >> + if fe is None: >> + CHKERR( DMProjectCoordinates(self.dm, NULL) ) >> + else: >> + CHKERR( DMProjectCoordinates(self.dm, fe.fe) ) >> + >> def getBoundingBox(self): >> cdef PetscInt i,dim=0 >> CHKERR( DMGetCoordinateDim(self.dm, &dim) ) >> diff --git a/src/binding/petsc4py/src/PETSc/petscdm.pxi >> b/src/binding/petsc4py/src/PETSc/petscdm.pxi >> index 514b6fa472..c778e39884 100644 >> --- a/src/binding/petsc4py/src/PETSc/petscdm.pxi >> +++ b/src/binding/petsc4py/src/PETSc/petscdm.pxi >> @@ -90,6 +90,7 @@ cdef extern from * nogil: >> int DMGetCoordinateDim(PetscDM,PetscInt*) >> int DMSetCoordinateDim(PetscDM,PetscInt) >> int DMLocalizeCoordinates(PetscDM) >> + int DMProjectCoordinates(PetscDM, PetscFE) >> >> int DMCreateInterpolation(PetscDM,PetscDM,PetscMat*,PetscVec*) >> int DMCreateInjection(PetscDM,PetscDM,PetscMat*) >> ``` >> >> Then in python, I load a mesh and project the coordinates to P2: >> ``` >> import firedrake as fd >> from firedrake.petsc import PETSc >> >> # plex = fd.mesh._from_gmsh('test-fd-load-p2.msh') >> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >> print('old bbox:', plex.getBoundingBox()) >> >> dim = plex.getDimension() >> # (dim, nc, isSimplex, k, >> qorder, comm=None) >> fe_new = PETSc.FE().createLagrange(dim, dim, True, 2, >> PETSc.DETERMINE) >> plex.projectCoordinates(fe_new) >> fe_new.view() >> >> print('new bbox:', plex.getBoundingBox()) >> ``` >> >> The output is (The bounding box is changed!) >> ``` >> >> old bbox: ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)) >> PetscFE Object: P2 1 MPI processes >> type: basic >> Basic Finite Element in 3 dimensions with 3 components >> PetscSpace Object: P2 1 MPI processes >> type: sum >> Space in 3 variables with 3 components, size 30 >> Sum space of 3 concatenated subspaces (all identical) >> PetscSpace Object: sum component (sumcomp_) 1 MPI processes >> type: poly >> Space in 3 variables with 1 components, size 10 >> Polynomial space of degree 2 >> PetscDualSpace Object: P2 1 MPI processes >> type: lagrange >> Dual space with 3 components, size 30 >> Continuous Lagrange dual space >> Quadrature of order 5 on 27 points (dim 3) >> new bbox: ((-6.530133708576188e-17, 36.30670832662781), (-3.899962995254311e-17, 36.2406171632539), (-8.8036464152166e-17, 36.111577025012224)) >> >> ``` >> >> >> By the way, for the original DG coordinates, where can I find the relation of the closure and the order of the dofs for the cell? >> >> >> Thanks! >> >> >> Zongze >> >> >> >> Matthew Knepley ?2022?6?17??? 01:11??? >> >>> On Thu, Jun 16, 2022 at 12:06 PM Zongze Yang >>> wrote: >>> >>>> >>>> >>>> ? 2022?6?16??23:22?Matthew Knepley ??? >>>> >>>> ? >>>> On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang >>>> wrote: >>>> >>>>> Hi, if I load a `gmsh` file with second-order elements, the >>>>> coordinates will be stored in a DG-P2 space. After obtaining the >>>>> coordinates of a cell, how can I map the coordinates to vertex and edge? >>>>> >>>> >>>> By default, they are stored as P2, not DG. 
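For the closure question discussed here, a short C sketch of reading one cell's coordinate dofs in closure order may help; it assumes an interpolated DMPlex named dm, and the variable names are illustrative:

```c
DM           cdm;
PetscSection csec;
Vec          coords;
PetscScalar *vals = NULL;
PetscInt     cStart, cEnd, nvals;

PetscCall(DMGetCoordinateDM(dm, &cdm));
PetscCall(DMGetCoordinateSection(dm, &csec));
PetscCall(DMGetCoordinatesLocal(dm, &coords));
PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));          /* cells are at height 0 */
PetscCall(DMPlexVecGetClosure(cdm, csec, coords, cStart, &nvals, &vals));
/* vals now holds the nvals coordinate dofs of cell cStart, in closure order */
PetscCall(DMPlexVecRestoreClosure(cdm, csec, coords, cStart, &nvals, &vals));
```

This returns the dofs in closure order, which is exactly the ordering the rest of this thread is trying to relate to the vertex and edge numbering.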
>>>> >>>> >>>> I checked the coordinates vector, and found the dogs only defined on >>>> cell other than vertex and edge, so I said they are stored as DG. >>>> Then the function DMPlexVecGetClosure >>>> seems return >>>> the coordinates in lex order. >>>> >>>> Some code in reading gmsh file reads that >>>> >>>> >>>> 1756: if (isSimplex) continuity = PETSC_FALSE >>>> ; /* XXX >>>> FIXME Requires DMPlexSetClosurePermutationLexicographic() */ >>>> >>>> >>>> 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, >>>> dim, coordDim, order, &fe) >>>> >>>> >>>> The continuity is set to false for simplex. >>>> >>> >>> Oh, yes. That needs to be fixed. For now, you can just project it to P2 >>> if you want using >>> >>> https://petsc.org/main/docs/manualpages/DM/DMProjectCoordinates/ >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Zongze >>>> >>>> You can ask for the coordinates of a vertex or an edge directly using >>>> >>>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ >>>> >>>> by giving the vertex or edge point. You can get all the coordinates on >>>> a cell, in the closure order, using >>>> >>>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Below is some code load the gmsh file, I want to know the relation >>>>> between `cl` and `cell_coords`. >>>>> >>>>> ``` >>>>> import firedrake as fd >>>>> import numpy as np >>>>> >>>>> # Load gmsh file (2rd) >>>>> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >>>>> >>>>> cs, ce = plex.getHeightStratum(0) >>>>> >>>>> cdm = plex.getCoordinateDM() >>>>> csec = dm.getCoordinateSection() >>>>> coords_gvec = dm.getCoordinates() >>>>> >>>>> for i in range(cs, ce): >>>>> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >>>>> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, >>>>> 3])}') >>>>> cl = dm.getTransitiveClosure(i) >>>>> print('closure:', cl) >>>>> break >>>>> ``` >>>>> >>>>> Best wishes, >>>>> Zongze >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Jun 18 07:31:10 2022 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 18 Jun 2022 20:31:10 +0800 Subject: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering? In-Reply-To: References: <2640A1A9-101C-4DFB-BFA4-C64AF231732A@gmail.com> Message-ID: Thank you for your reply. May I ask for some references on the order of the dofs on PETSc's FE Space (especially high order elements)? Thanks, Zongze Matthew Knepley ?2022?6?18??? 20:02??? > On Sat, Jun 18, 2022 at 2:16 AM Zongze Yang wrote: > >> In order to check if I made mistakes in the python code, I try to use c >> code to show the issue on DMProjectCoordinates. 
The code and mesh file is >> attached. >> If the code is correct, there must be something wrong with >> `DMProjectCoordinates` or `DMPlexCreateGmshFromFile` for high-order mesh. >> > > Something is definitely wrong with high order, periodic simplices from > Gmsh. We had not tested that case. I am at a conference and cannot look at > it for a week. > My suspicion is that the space we make when reading in the Gmsh > coordinates does not match the values (wrong order). > > Thanks, > > Matt > > >> The command and the output are listed below: (Obviously the bounding box >> is changed.) >> ``` >> $ ./test_gmsh_load_2rd -filename cube-p2.msh -old_fe_view -new_fe_view >> Old Bounding Box: >> 0: lo = 0. hi = 1. >> 1: lo = 0. hi = 1. >> 2: lo = 0. hi = 1. >> PetscFE Object: OldCoordinatesFE 1 MPI processes >> type: basic >> Basic Finite Element in 3 dimensions with 3 components >> PetscSpace Object: P2 1 MPI processes >> type: sum >> Space in 3 variables with 3 components, size 30 >> Sum space of 3 concatenated subspaces (all identical) >> PetscSpace Object: sum component (sumcomp_) 1 MPI processes >> type: poly >> Space in 3 variables with 1 components, size 10 >> Polynomial space of degree 2 >> PetscDualSpace Object: P2 1 MPI processes >> type: lagrange >> Dual space with 3 components, size 30 >> Discontinuous Lagrange dual space >> Quadrature of order 5 on 27 points (dim 3) >> PetscFE Object: NewCoordinatesFE 1 MPI processes >> type: basic >> Basic Finite Element in 3 dimensions with 3 components >> PetscSpace Object: P2 1 MPI processes >> type: sum >> Space in 3 variables with 3 components, size 30 >> Sum space of 3 concatenated subspaces (all identical) >> PetscSpace Object: sum component (sumcomp_) 1 MPI processes >> type: poly >> Space in 3 variables with 1 components, size 10 >> Polynomial space of degree 2 >> PetscDualSpace Object: P2 1 MPI processes >> type: lagrange >> Dual space with 3 components, size 30 >> Continuous Lagrange dual space >> Quadrature of order 5 on 27 points (dim 3) >> New Bounding Box: >> 0: lo = 2.5624e-17 hi = 8. >> 1: lo = -9.23372e-17 hi = 7. >> 2: lo = 2.72091e-17 hi = 8.5 >> ``` >> >> Thanks, >> Zongze >> >> Zongze Yang ?2022?6?17??? 14:54??? >> >>> I tried the projection operation. However, it seems that the projection >>> gives the wrong solution. After projection, the bounding box is changed! >>> See logs below. 
>>> >>> First, I patch the petsc4py by adding `DMProjectCoordinates`: >>> ``` >>> diff --git a/src/binding/petsc4py/src/PETSc/DM.pyx >>> b/src/binding/petsc4py/src/PETSc/DM.pyx >>> index d8a58d183a..dbcdb280f1 100644 >>> --- a/src/binding/petsc4py/src/PETSc/DM.pyx >>> +++ b/src/binding/petsc4py/src/PETSc/DM.pyx >>> @@ -307,6 +307,12 @@ cdef class DM(Object): >>> PetscINCREF(c.obj) >>> return c >>> >>> + def projectCoordinates(self, FE fe=None): >>> + if fe is None: >>> + CHKERR( DMProjectCoordinates(self.dm, NULL) ) >>> + else: >>> + CHKERR( DMProjectCoordinates(self.dm, fe.fe) ) >>> + >>> def getBoundingBox(self): >>> cdef PetscInt i,dim=0 >>> CHKERR( DMGetCoordinateDim(self.dm, &dim) ) >>> diff --git a/src/binding/petsc4py/src/PETSc/petscdm.pxi >>> b/src/binding/petsc4py/src/PETSc/petscdm.pxi >>> index 514b6fa472..c778e39884 100644 >>> --- a/src/binding/petsc4py/src/PETSc/petscdm.pxi >>> +++ b/src/binding/petsc4py/src/PETSc/petscdm.pxi >>> @@ -90,6 +90,7 @@ cdef extern from * nogil: >>> int DMGetCoordinateDim(PetscDM,PetscInt*) >>> int DMSetCoordinateDim(PetscDM,PetscInt) >>> int DMLocalizeCoordinates(PetscDM) >>> + int DMProjectCoordinates(PetscDM, PetscFE) >>> >>> int DMCreateInterpolation(PetscDM,PetscDM,PetscMat*,PetscVec*) >>> int DMCreateInjection(PetscDM,PetscDM,PetscMat*) >>> ``` >>> >>> Then in python, I load a mesh and project the coordinates to P2: >>> ``` >>> import firedrake as fd >>> from firedrake.petsc import PETSc >>> >>> # plex = fd.mesh._from_gmsh('test-fd-load-p2.msh') >>> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >>> print('old bbox:', plex.getBoundingBox()) >>> >>> dim = plex.getDimension() >>> # (dim, nc, isSimplex, k, >>> qorder, comm=None) >>> fe_new = PETSc.FE().createLagrange(dim, dim, True, 2, >>> PETSc.DETERMINE) >>> plex.projectCoordinates(fe_new) >>> fe_new.view() >>> >>> print('new bbox:', plex.getBoundingBox()) >>> ``` >>> >>> The output is (The bounding box is changed!) >>> ``` >>> >>> old bbox: ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)) >>> PetscFE Object: P2 1 MPI processes >>> type: basic >>> Basic Finite Element in 3 dimensions with 3 components >>> PetscSpace Object: P2 1 MPI processes >>> type: sum >>> Space in 3 variables with 3 components, size 30 >>> Sum space of 3 concatenated subspaces (all identical) >>> PetscSpace Object: sum component (sumcomp_) 1 MPI processes >>> type: poly >>> Space in 3 variables with 1 components, size 10 >>> Polynomial space of degree 2 >>> PetscDualSpace Object: P2 1 MPI processes >>> type: lagrange >>> Dual space with 3 components, size 30 >>> Continuous Lagrange dual space >>> Quadrature of order 5 on 27 points (dim 3) >>> new bbox: ((-6.530133708576188e-17, 36.30670832662781), (-3.899962995254311e-17, 36.2406171632539), (-8.8036464152166e-17, 36.111577025012224)) >>> >>> ``` >>> >>> >>> By the way, for the original DG coordinates, where can I find the relation of the closure and the order of the dofs for the cell? >>> >>> >>> Thanks! >>> >>> >>> Zongze >>> >>> >>> >>> Matthew Knepley ?2022?6?17??? 01:11??? >>> >>>> On Thu, Jun 16, 2022 at 12:06 PM Zongze Yang >>>> wrote: >>>> >>>>> >>>>> >>>>> ? 2022?6?16??23:22?Matthew Knepley ??? >>>>> >>>>> ? >>>>> On Thu, Jun 16, 2022 at 11:11 AM Zongze Yang >>>>> wrote: >>>>> >>>>>> Hi, if I load a `gmsh` file with second-order elements, the >>>>>> coordinates will be stored in a DG-P2 space. After obtaining the >>>>>> coordinates of a cell, how can I map the coordinates to vertex and edge? >>>>>> >>>>> >>>>> By default, they are stored as P2, not DG. 
>>>>> >>>>> >>>>> I checked the coordinates vector, and found the dogs only defined on >>>>> cell other than vertex and edge, so I said they are stored as DG. >>>>> Then the function DMPlexVecGetClosure >>>>> seems return >>>>> the coordinates in lex order. >>>>> >>>>> Some code in reading gmsh file reads that >>>>> >>>>> >>>>> 1756: if (isSimplex) continuity = PETSC_FALSE >>>>> ; /* XXX >>>>> FIXME Requires DMPlexSetClosurePermutationLexicographic() */ >>>>> >>>>> >>>>> 1758: GmshCreateFE(comm, NULL, isSimplex, continuity, nodeType, >>>>> dim, coordDim, order, &fe) >>>>> >>>>> >>>>> The continuity is set to false for simplex. >>>>> >>>> >>>> Oh, yes. That needs to be fixed. For now, you can just project it to P2 >>>> if you want using >>>> >>>> https://petsc.org/main/docs/manualpages/DM/DMProjectCoordinates/ >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Zongze >>>>> >>>>> You can ask for the coordinates of a vertex or an edge directly using >>>>> >>>>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexPointLocalRead/ >>>>> >>>>> by giving the vertex or edge point. You can get all the coordinates on >>>>> a cell, in the closure order, using >>>>> >>>>> https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexVecGetClosure/ >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Below is some code load the gmsh file, I want to know the relation >>>>>> between `cl` and `cell_coords`. >>>>>> >>>>>> ``` >>>>>> import firedrake as fd >>>>>> import numpy as np >>>>>> >>>>>> # Load gmsh file (2rd) >>>>>> plex = fd.mesh._from_gmsh('test-fd-load-p2-rect.msh') >>>>>> >>>>>> cs, ce = plex.getHeightStratum(0) >>>>>> >>>>>> cdm = plex.getCoordinateDM() >>>>>> csec = dm.getCoordinateSection() >>>>>> coords_gvec = dm.getCoordinates() >>>>>> >>>>>> for i in range(cs, ce): >>>>>> cell_coords = cdm.getVecClosure(csec, coords_gvec, i) >>>>>> print(f'coordinates for cell {i} :\n{cell_coords.reshape([-1, >>>>>> 3])}') >>>>>> cl = dm.getTransitiveClosure(i) >>>>>> print('closure:', cl) >>>>>> break >>>>>> ``` >>>>>> >>>>>> Best wishes, >>>>>> Zongze >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfatiac at gmail.com Sat Jun 18 09:30:03 2022 From: dfatiac at gmail.com (Mario Rossi) Date: Sat, 18 Jun 2022 16:30:03 +0200 Subject: [petsc-users] Parallel matrix-vector product with Matrix Shell In-Reply-To: <35218C6D-1BA4-4E2D-8A27-8860B6CF3E33@dsic.upv.es> References: <3AAEAEF6-82A9-478F-BAD9-DC3AE6B0025C@dsic.upv.es> <35218C6D-1BA4-4E2D-8A27-8860B6CF3E33@dsic.upv.es> Message-ID: Thanks again Jose for your prompt and very useful indication. By using that, I could understand where the REAL problem was (and obviously it was my fault). 
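For completeness, a minimal sketch of the row-block MatMult Jose described is below; the kernel my_local_row_product is hypothetical and only marks where the computation goes, it is not my actual code:

```c
#include <petscmat.h>

/* hypothetical kernel: applies global row `row` of A to the locally owned block of x */
extern PetscScalar my_local_row_product(PetscInt row, const PetscScalar *xlocal, PetscInt nxlocal);

PetscErrorCode MatMult_TM(Mat A, Vec x, Vec y)
{
  const PetscScalar *px;
  PetscScalar       *py;
  PetscInt           lo, hi, nx;

  PetscFunctionBeginUser;
  PetscCall(VecGetOwnershipRange(y, &lo, &hi)); /* global rows owned by this rank */
  PetscCall(VecGetLocalSize(x, &nx));           /* x is distributed the same way as y */
  PetscCall(VecGetArrayRead(x, &px));
  PetscCall(VecGetArray(y, &py));
  for (PetscInt i = 0; i < hi - lo; i++) py[i] = my_local_row_product(lo + i, px, nx);
  PetscCall(VecRestoreArrayRead(x, &px));
  PetscCall(VecRestoreArray(y, &py));
  PetscFunctionReturn(0);
}
```

The key point is that px only holds the locally owned block of x, so any off-process entries a row needs have to be gathered explicitly, for example with the MPI_Sendrecv pattern in the nep/ex21 tutorial Jose pointed to.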
Now everything works smoothly and produces the expected result. All the best, Mario Il giorno sab 18 giu 2022 alle ore 08:27 Jose E. Roman ha scritto: > The initial vector of the Krylov method is by default a random vector, > which is different when you change the number of processes. To avoid this, > you can run with the undocumented option -bv_reproducible_random which will > generate the same random initial vector irrespective of the number of > processes. > > Alternatively, set an initial vector in your code with > EPSSetInitialSpace(), see e.g. > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex5.c.html > > Jose > > > > El 18 jun 2022, a las 8:13, Mario Rossi escribi?: > > > > Dear Jose and Petsc users, I implemented the parallel matrix-vector > product and it works meaning that it produces a result but it is different > from the result produced with a single task. > > Obviously, I could be wrong in my implementation but what puzzles me is > that the input vector (x) to the product is different running with one and > two tasks and this is from the very first iteration (so it can not be due > to a previous error in the product). > > I checked that X is different with one and two tasks with the following > (naive) code > > PetscErrorCode MatMult_TM(Mat A,Vec x,Vec y) { > > void *ctx; > > PetscInt nx /* ,lo,i,j*/; > > const PetscScalar *px; > > PetscScalar *py; > > MPI_Comm comm; > > PetscFunctionBeginUser; > > PetscCall(MatShellGetContext(A,&ctx)); > > PetscCall(VecGetLocalSize(x,&nx)); > > PetscCall(PetscObjectGetComm((PetscObject)A,&comm)); > > > > // nx = *(int*)ctx; > > PetscCall(VecGetArrayRead(x,&px)); > > PetscCall(VecGetArray(y,&py)); > > > > for(int i=0;i w[%d]=%f\n",myrank,i+offset,px[i],i+offset,w[i+offset]); } > > PetscCall(MPI_Barrier(comm)); > > exit(0); > > ...... > > } > > > > Then I reordered the output obtained with one and two tasks. The first > part of the x vector is very similar (but not exactly the same) using one > and two tasks but the second part (belonging to the second task) is pretty > different > > (here "offset" is offset=(n/size)*myrank;) > > I create the matrix shell with > > > PetscCall(MatCreateShell(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,N,N,&n,&A)); > > I am sure I am doing something wrong but I don't know what I need to > look at. > > Thanks in advance! > > Mario > > > > > > Il giorno ven 17 giu 2022 alle ore 17:56 Mario Rossi > ha scritto: > > Thanks a lot, Jose! > > I looked at the eps folder (where I found the test8.c that has been my > starting point) but I did not look at the nep folder (my fault!) > > Thanks again, > > Mario > > > > Il giorno ven 17 giu 2022 alle ore 17:34 Jose E. Roman < > jroman at dsic.upv.es> ha scritto: > > You can use VecGetOwnershipRange() to determine the range of global > indices corresponding to the local portion of a vector, and VecGetArray() > to access the values. In SLEPc, you can assume that X and Y will have the > same parallel distribution. > > > > For an example of a shell matrix that implements the matrix-vector > product in parallel, have a look at this: > https://slepc.upv.es/documentation/current/src/nep/tutorials/ex21.c.html > > It is a simple tridiagonal example, where neighborwise communication is > done with two calls to MPI_Sendrecv(). > > > > Jose > > > > > > > El 17 jun 2022, a las 17:21, Mario Rossi escribi?: > > > > > > I need to find the largest eigenvalues (say the first three) of a very > large matrix and I am using > > > a combination of PetSc and SLEPc. 
In particular, I am using a shell > matrix. I wrote a "custom" > > > matrix-vector product and everything works fine in serial (one task) > mode for a "small" case. > > > For the real case, I need multiple (at least 128) tasks for memory > reasons so I need a parallel variant of the custom matrix-vector product. I > know exactly how to write the parallel variant > > > (in plain MPI) but I am, somehow, blocked because it is not clear to > me what each task receives > > > and what is expected to provide in the parallel matrix-vector product. > > > More in detail, with a single task, the function receives the full X > vector and is expected to provide the full Y vector resulting from Y=A*X. > > > What does it happen with multiple tasks? If I understand correctly > > > in the matrix shell definition, I can choose to split the matrix into > blocks of rows so that the matrix-vector function should compute a block of > elements of the vector Y but does it receive only the corresponding subset > of the X (input vector)? (this is what I guess happens) and in output, does > > > each task return its subset of elements of Y as if it were the whole > array and then PetSc manages all the subsets? Is there anyone who has a > working example of a parallel matrix-vector product for matrix shell? > > > Thanks in advance for any help you can provide! > > > Mario > > > i > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kcpkumar33 at gmail.com Sun Jun 19 16:07:43 2022 From: kcpkumar33 at gmail.com (Pavankumar Koratikere) Date: Sun, 19 Jun 2022 17:07:43 -0400 Subject: [petsc-users] PETSc Segmentation Violation error Message-ID: Hello, I am trying to run a script that uses packages that depend on OpenMPI and PETSC (as shown below). mpirun -np 4 python test.py I am getting following error: [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: likely location of problem given in stack below [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [1]PETSC ERROR: INSTEAD the line number of the start of the function [1]PETSC ERROR: is given. [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Signal received [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 [1]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 [1]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 [1]PETSC ERROR: #1 User provided function() at unknown file:0 [1]PETSC ERROR: Checking the memory for corruption. 
[2]PETSC ERROR: ------------------------------------------------------------------------ [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [2]PETSC ERROR: likely location of problem given in stack below [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [2]PETSC ERROR: INSTEAD the line number of the start of the function [2]PETSC ERROR: is given. [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Signal received [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 [2]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 [2]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 [2]PETSC ERROR: #1 User provided function() at unknown file:0 [2]PETSC ERROR: Checking the memory for corruption. [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [3]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [3]PETSC ERROR: likely location of problem given in stack below [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [3]PETSC ERROR: INSTEAD the line number of the start of the function [3]PETSC ERROR: is given. [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Signal received [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [3]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 [3]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 [3]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 [3]PETSC ERROR: #1 User provided function() at unknown file:0 [3]PETSC ERROR: Checking the memory for corruption. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 [0]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 [0]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 [0]PETSC ERROR: #1 User provided function() at unknown file:0 [0]PETSC ERROR: Checking the memory for corruption. I am new to PETSc and I don't really know how to debug this. Any help will be much appreciated! Regards, Pavan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sun Jun 19 20:59:07 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 20 Jun 2022 07:29:07 +0530 (IST) Subject: [petsc-users] PETSc Segmentation Violation error In-Reply-To: References: Message-ID: As the below message indicates - you can try running this code via valgrind or gdb to determine the location of error. https://petsc.org/release/faq/#valgrind i.e mpirun -np 4 valgrind --tool=memcheck python test.py One way to specify option -start_in_debugger is: PETSC_OPTION=-start_in_debugger mpirun -np 4 python test.py Also good to use latest petsc version - currently its 3.17 Satish On Sun, 19 Jun 2022, Pavankumar Koratikere wrote: > Hello, > > I am trying to run a script that uses packages that depend on OpenMPI and > PETSC (as shown below). > > mpirun -np 4 python test.py > > I am getting following error: > > [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [1]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [1]PETSC ERROR: likely location of problem given in stack below > [1]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [1]PETSC ERROR: INSTEAD the line number of the start of the function > [1]PETSC ERROR: is given. 
> [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Signal received > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > [1]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > Jun 18 10:30:04 2022 > [1]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > --with-scalar-type=real --with-debugging=1 > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > --download-metis=yes --download-parmetis=yes > --download-superlu_dist=yes --with-shared-libraries=yes > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > [1]PETSC ERROR: #1 User provided function() at unknown file:0 > [1]PETSC ERROR: Checking the memory for corruption. > [2]PETSC ERROR: > ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Signal received > [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > [2]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > Jun 18 10:30:04 2022 > [2]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > --with-scalar-type=real --with-debugging=1 > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > --download-metis=yes --download-parmetis=yes > --download-superlu_dist=yes --with-shared-libraries=yes > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > [2]PETSC ERROR: #1 User provided function() at unknown file:0 > [2]PETSC ERROR: Checking the memory for corruption. > [3]PETSC ERROR: > ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Signal received > [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [3]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > [3]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > Jun 18 10:30:04 2022 > [3]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > --with-scalar-type=real --with-debugging=1 > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > --download-metis=yes --download-parmetis=yes > --download-superlu_dist=yes --with-shared-libraries=yes > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > [3]PETSC ERROR: #1 User provided function() at unknown file:0 > [3]PETSC ERROR: Checking the memory for corruption. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > [0]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > Jun 18 10:30:04 2022 > [0]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > --with-scalar-type=real --with-debugging=1 > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > --download-metis=yes --download-parmetis=yes > --download-superlu_dist=yes --with-shared-libraries=yes > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > [0]PETSC ERROR: #1 User provided function() at unknown file:0 > [0]PETSC ERROR: Checking the memory for corruption. > > I am new to PETSc and I don't really know how to debug this. Any help > will be much appreciated! > > Regards, > > Pavan. > From jacob.fai at gmail.com Mon Jun 20 07:56:43 2022 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Mon, 20 Jun 2022 08:56:43 -0400 Subject: [petsc-users] PETSc Segmentation Violation error In-Reply-To: References: Message-ID: <9BECDEDA-D113-43A8-B562-68CDBB16357E@gmail.com> Glad everything worked out. (I forgot to reply-all in my initial mail, so the mailing list did not get included, adding it back in now). Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Jun 20, 2022, at 08:54, Pavankumar Koratikere wrote: > > Hello Jacob > > Thanks for your reply! As you mentioned, there was a minor discrepancy in the installation of a package which uses PETSc. I changed the version of python with which the package was configured and everything ran as expected. > > Regards, > Pavan. > > On Sun, Jun 19, 2022 at 9:09 PM Jacob Faibussowitsch wrote: > > [1]PETSC ERROR: #1 User provided function() at unknown file:0 > > The error message indicates that the segmentation violation occurs outside of PETSc. PETSc registers a SIGSEGV signal handler on startup, hence why it is the one to catch this. 
If the error was occurring somewhere within PETSc, or within a user-function called by PETSc then this stack trace would be more complete. > > Without seeing the code you are running we unfortunately cannot pinpoint the problem. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Jun 19, 2022, at 17:07, Pavankumar Koratikere wrote: > > > > Hello, > > I am trying to run a script that uses packages that depend on OpenMPI and PETSC (as shown below). > > > > mpirun -np 4 python test.py > > I am getting following error: > > [1]PETSC ERROR: ------------------------------------------------------------------------ > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [1]PETSC ERROR: or see > > https://petsc.org/release/faq/#valgrind > > > > [1]PETSC ERROR: or try > > http://valgrind.org > > on GNU/linux and Apple Mac OS X to find memory corruption errors > > [1]PETSC ERROR: likely location of problem given in stack below > > [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [1]PETSC ERROR: INSTEAD the line number of the start of the function > > [1]PETSC ERROR: is given. > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [1]PETSC ERROR: Signal received > > [1]PETSC ERROR: See > > https://petsc.org/release/faq/ > > for trouble shooting. > > [1]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [1]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 > > [1]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [1]PETSC ERROR: #1 User provided function() at unknown file:0 > > [1]PETSC ERROR: Checking the memory for corruption. > > [2]PETSC ERROR: ------------------------------------------------------------------------ > > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [2]PETSC ERROR: or see > > https://petsc.org/release/faq/#valgrind > > > > [2]PETSC ERROR: or try > > http://valgrind.org > > on GNU/linux and Apple Mac OS X to find memory corruption errors > > [2]PETSC ERROR: likely location of problem given in stack below > > [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [2]PETSC ERROR: INSTEAD the line number of the start of the function > > [2]PETSC ERROR: is given. > > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [2]PETSC ERROR: Signal received > > [2]PETSC ERROR: See > > https://petsc.org/release/faq/ > > for trouble shooting. 
> > [2]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [2]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 > > [2]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [2]PETSC ERROR: #1 User provided function() at unknown file:0 > > [2]PETSC ERROR: Checking the memory for corruption. > > [3]PETSC ERROR: ------------------------------------------------------------------------ > > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [3]PETSC ERROR: or see > > https://petsc.org/release/faq/#valgrind > > > > [3]PETSC ERROR: or try > > http://valgrind.org > > on GNU/linux and Apple Mac OS X to find memory corruption errors > > [3]PETSC ERROR: likely location of problem given in stack below > > [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [3]PETSC ERROR: INSTEAD the line number of the start of the function > > [3]PETSC ERROR: is given. > > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [3]PETSC ERROR: Signal received > > [3]PETSC ERROR: See > > https://petsc.org/release/faq/ > > for trouble shooting. > > [3]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [3]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 > > [3]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [3]PETSC ERROR: #1 User provided function() at unknown file:0 > > [3]PETSC ERROR: Checking the memory for corruption. > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [0]PETSC ERROR: or see > > https://petsc.org/release/faq/#valgrind > > > > [0]PETSC ERROR: or try > > http://valgrind.org > > on GNU/linux and Apple Mac OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > [0]PETSC ERROR: is given. > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See > > https://petsc.org/release/faq/ > > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [0]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat Jun 18 10:30:04 2022 > > [0]PETSC ERROR: Configure options --PETSC_ARCH=real-debug --with-scalar-type=real --with-debugging=1 --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran --download-metis=yes --download-parmetis=yes --download-superlu_dist=yes --with-shared-libraries=yes --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [0]PETSC ERROR: #1 User provided function() at unknown file:0 > > [0]PETSC ERROR: Checking the memory for corruption. > > > > > > I am new to PETSc and I don't really know how to debug this. Any help will be much appreciated! > > Regards, > > Pavan. > From kcpkumar33 at gmail.com Mon Jun 20 09:22:18 2022 From: kcpkumar33 at gmail.com (Pavankumar Koratikere) Date: Mon, 20 Jun 2022 10:22:18 -0400 Subject: [petsc-users] PETSc Segmentation Violation error In-Reply-To: References: Message-ID: Hello Satish Thanks for your email! There was a minor discrepancy in the installation of a package which uses PETSc. I changed the version of python with which the package was configured and everything ran as expected. The package which I am using requires me to use a specific version of PETSc, so I am using an old version. Regards, Pavan. On Sun, Jun 19, 2022 at 9:59 PM Satish Balay wrote: > As the below message indicates - you can try running this code via > valgrind or gdb to determine the location of error. > > https://petsc.org/release/faq/#valgrind > i.e > mpirun -np 4 valgrind --tool=memcheck python test.py > > One way to specify option -start_in_debugger is: > > PETSC_OPTION=-start_in_debugger mpirun -np 4 python test.py > > Also good to use latest petsc version - currently its 3.17 > > Satish > > On Sun, 19 Jun 2022, Pavankumar Koratikere wrote: > > > Hello, > > > > I am trying to run a script that uses packages that depend on OpenMPI and > > PETSC (as shown below). > > > > mpirun -np 4 python test.py > > > > I am getting following error: > > > > [1]PETSC ERROR: > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [1]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [1]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > OS X to find memory corruption errors > > [1]PETSC ERROR: likely location of problem given in stack below > > [1]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [1]PETSC ERROR: INSTEAD the line number of the start of the > function > > [1]PETSC ERROR: is given. > > [1]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [1]PETSC ERROR: Signal received > > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> > [1]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [1]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > > Jun 18 10:30:04 2022 > > [1]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > > --with-scalar-type=real --with-debugging=1 > > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > > --download-metis=yes --download-parmetis=yes > > --download-superlu_dist=yes --with-shared-libraries=yes > > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [1]PETSC ERROR: #1 User provided function() at unknown file:0 > > [1]PETSC ERROR: Checking the memory for corruption. > > [2]PETSC ERROR: > > ------------------------------------------------------------------------ > > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [2]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > OS X to find memory corruption errors > > [2]PETSC ERROR: likely location of problem given in stack below > > [2]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [2]PETSC ERROR: INSTEAD the line number of the start of the > function > > [2]PETSC ERROR: is given. > > [2]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [2]PETSC ERROR: Signal received > > [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [2]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [2]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > > Jun 18 10:30:04 2022 > > [2]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > > --with-scalar-type=real --with-debugging=1 > > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > > --download-metis=yes --download-parmetis=yes > > --download-superlu_dist=yes --with-shared-libraries=yes > > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [2]PETSC ERROR: #1 User provided function() at unknown file:0 > > [2]PETSC ERROR: Checking the memory for corruption. > > [3]PETSC ERROR: > > ------------------------------------------------------------------------ > > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [3]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [3]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > OS X to find memory corruption errors > > [3]PETSC ERROR: likely location of problem given in stack below > > [3]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [3]PETSC ERROR: INSTEAD the line number of the start of the > function > > [3]PETSC ERROR: is given. > > [3]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [3]PETSC ERROR: Signal received > > [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [3]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > > Jun 18 10:30:04 2022 > > [3]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > > --with-scalar-type=real --with-debugging=1 > > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > > --download-metis=yes --download-parmetis=yes > > --download-superlu_dist=yes --with-shared-libraries=yes > > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [3]PETSC ERROR: #1 User provided function() at unknown file:0 > > [3]PETSC ERROR: Checking the memory for corruption. > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > > [0]PETSC ERROR: is given. > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.15.5, Sep 29, 2021 > > [0]PETSC ERROR: Unknown Name on a real-debug named skynet by pavan Sat > > Jun 18 10:30:04 2022 > > [0]PETSC ERROR: Configure options --PETSC_ARCH=real-debug > > --with-scalar-type=real --with-debugging=1 > > --with-mpi-dir=/home/pavan/packages/openmpi-4.0.7/opt-gfortran > > --download-metis=yes --download-parmetis=yes > > --download-superlu_dist=yes --with-shared-libraries=yes > > --with-fortran-bindings=1 --with-cxx-dialect=C++11 > > [0]PETSC ERROR: #1 User provided function() at unknown file:0 > > [0]PETSC ERROR: Checking the memory for corruption. > > > > I am new to PETSc and I don't really know how to debug this. Any help > > will be much appreciated! > > > > Regards, > > > > Pavan. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.bernigaud at onera.fr Tue Jun 21 10:14:46 2022 From: pierre.bernigaud at onera.fr (Bernigaud Pierre) Date: Tue, 21 Jun 2022 17:14:46 +0200 Subject: [petsc-users] PETSc / AMRex Message-ID: Greetings, I hope you are doing great. We are currently working on parallel solver employing PETSc for the main numerical methods (GMRES, Newton-Krylov method). We would be interested in combining the PETSc solvers with the AMR framework provided by the library AMReX (https://amrex-codes.github.io/amrex/). I know that within the AMReX framework the KSP solvers provided by PETSc can be used, but what about the SNES solvers? More specifically, we are using a DMDA to manage parallel communications during the SNES calculations, and I am wondering how it would behave in a context where the data layout between processors is modified by the AMR code when refining the grid. Would you have any experience on this matter ? 
Is there any collaboration going on between PETsc and AMReX, or would you know of a code using both of them? Respectfully, Pierre Bernigaud From mfadams at lbl.gov Tue Jun 21 11:00:42 2022 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 21 Jun 2022 12:00:42 -0400 Subject: [petsc-users] PETSc / AMRex In-Reply-To: References: Message-ID: Hi Bernigaud, To be clear, you have SNES working with DMDA in AMRex, but without adapting dynamically and you want to know what to do next. Is that right? Mark On Tue, Jun 21, 2022 at 11:46 AM Bernigaud Pierre wrote: > Greetings, > > I hope you are doing great. > > We are currently working on parallel solver employing PETSc for the main > numerical methods (GMRES, Newton-Krylov method). We would be interested > in combining the PETSc solvers with the AMR framework provided by the > library AMReX (https://amrex-codes.github.io/amrex/). I know that within > the AMReX framework the KSP solvers provided by PETSc can be used, but > what about the SNES solvers? More specifically, we are using a DMDA to > manage parallel communications during the SNES calculations, and I am > wondering how it would behave in a context where the data layout between > processors is modified by the AMR code when refining the grid. > > Would you have any experience on this matter ? Is there any > collaboration going on between PETsc and AMReX, or would you know of a > code using both of them? > > Respectfully, > > Pierre Bernigaud > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 21 12:16:34 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Jun 2022 11:16:34 -0600 Subject: [petsc-users] PETSc / AMRex In-Reply-To: References: Message-ID: On Tue, Jun 21, 2022 at 10:01 AM Mark Adams wrote: > Hi Bernigaud, > > To be clear, you have SNES working with DMDA in AMRex, but > without adapting dynamically and you want to know what to do next. > > Is that right? > I will let Mark answer the AMReX question since he is more knowledgeable. I just wanted to note that PETSc has good integration with the p4est ( www.p4est.org) AMR package. We can manage all parallel data and solver integration with it out of the box. Mark also has extensive experience here. Thanks, Matt > Mark > > On Tue, Jun 21, 2022 at 11:46 AM Bernigaud Pierre < > pierre.bernigaud at onera.fr> wrote: > >> Greetings, >> >> I hope you are doing great. >> >> We are currently working on parallel solver employing PETSc for the main >> numerical methods (GMRES, Newton-Krylov method). We would be interested >> in combining the PETSc solvers with the AMR framework provided by the >> library AMReX (https://amrex-codes.github.io/amrex/). I know that within >> the AMReX framework the KSP solvers provided by PETSc can be used, but >> what about the SNES solvers? More specifically, we are using a DMDA to >> manage parallel communications during the SNES calculations, and I am >> wondering how it would behave in a context where the data layout between >> processors is modified by the AMR code when refining the grid. >> >> Would you have any experience on this matter ? Is there any >> collaboration going on between PETsc and AMReX, or would you know of a >> code using both of them? >> >> Respectfully, >> >> Pierre Bernigaud >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From mfadams at lbl.gov Tue Jun 21 12:57:19 2022 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 21 Jun 2022 13:57:19 -0400 Subject: [petsc-users] PETSc / AMRex In-Reply-To: <06d9a338-f327-9724-4485-8cb8529b524c@onera.fr> References: <06d9a338-f327-9724-4485-8cb8529b524c@onera.fr> Message-ID:
(keep on the list, you will need Matt and Toby soon anyway).
So you want to add AMRex to your code.
I think the first thing that you want to do is move your DMDA code into a DMPlex code. You can create a "box" mesh and it is not hard. Others like Matt can give advice on how to get started on that translation. There is a simple step to create a DMForest (p4/8est), which Matt mentioned, from the DMPlex.
Now at this point you can run your current SNES tests and get back to where you started, but AMR is easy now. Or as easy as it gets.
As far as AMRex, well, it's not clear what AMRex does for you at this point. You don't seem to have AMRex code that you want to reuse. If there is some functionality that you need then we can talk about it, or if you have some programmatic reason to use it (e.g., they are paying you) then, again, we can talk about it.
PETSc/p4est and AMRex are similar with different strengths and designs, and you could use both, but that would complicate things.
Hope that helps, Mark
On Tue, Jun 21, 2022 at 1:18 PM Bernigaud Pierre wrote: > Hello Mark, > > We have a working solver employing SNES, to which is attached a DMDA to > handle ghost cells / data sharing between processors for flux evaluation > (using DMGlobalToLocalBegin / DMGlobalToLocalEnd). We are considering > adding an AMReX layer to the solver, but no work has been done yet, as we are > currently evaluating if it would be feasible without too much trouble. > > Our main subject of concern would be to understand how to interface > correctly PETSc (SNES+DMDA) and AMRex, as AMRex also appears to have its > own methods for parallel data management. Hence our inquiry for examples, > just to get a feel for how it would work out. > > Best, > > Pierre > On 21/06/2022 at 18:00, Mark Adams wrote: > > Hi Bernigaud, > > To be clear, you have SNES working with DMDA in AMRex, but > without adapting dynamically and you want to know what to do next. > > Is that right? > > Mark > > > > > On Tue, Jun 21, 2022 at 11:46 AM Bernigaud Pierre < > pierre.bernigaud at onera.fr> wrote: > >> Greetings, >> >> I hope you are doing great. >> >> We are currently working on parallel solver employing PETSc for the main >> numerical methods (GMRES, Newton-Krylov method). We would be interested >> in combining the PETSc solvers with the AMR framework provided by the >> library AMReX (https://amrex-codes.github.io/amrex/). I know that within >> the AMReX framework the KSP solvers provided by PETSc can be used, but >> what about the SNES solvers? More specifically, we are using a DMDA to >> manage parallel communications during the SNES calculations, and I am >> wondering how it would behave in a context where the data layout between >> processors is modified by the AMR code when refining the grid. >> >> Would you have any experience on this matter ? Is there any >> collaboration going on between PETsc and AMReX, or would you know of a >> code using both of them?
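[Editor's note: the following is a minimal sketch, not code from the thread, of the path Mark outlines above: build a DMPlex box mesh in place of the DMDA, convert it to a DMForest (p4est) so it can be adapted later, and attach it to SNES. It assumes a 2D quadrilateral box mesh, a PETSc build configured with --download-p4est, and PETSc 3.17-style PetscCall error handling; the residual/Jacobian setup is left as a placeholder.]

```
#include <petsc.h>

int main(int argc, char **argv)
{
  DM   plex, forest;
  SNES snes;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* Box mesh standing in for the old DMDA; shape is controlled at run time by
     options such as -dm_plex_dim 2 -dm_plex_simplex 0 -dm_plex_box_faces 16,16 */
  PetscCall(DMCreate(PETSC_COMM_WORLD, &plex));
  PetscCall(DMSetType(plex, DMPLEX));
  PetscCall(DMSetFromOptions(plex));
  /* Convert the Plex to a p4est-backed forest (use DMP8EST in 3D) */
  PetscCall(DMConvert(plex, DMP4EST, &forest));
  PetscCall(DMSetFromOptions(forest));
  PetscCall(DMSetUp(forest));
  /* Attach the forest to SNES exactly as the DMDA was attached before */
  PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes));
  PetscCall(SNESSetDM(snes, forest));
  /* ... SNESSetFunction/SNESSetJacobian and SNESSolve as in the existing solver ... */
  PetscCall(SNESDestroy(&snes));
  PetscCall(DMDestroy(&forest));
  PetscCall(DMDestroy(&plex));
  PetscCall(PetscFinalize());
  return 0;
}
```
Once the solver runs on the forest, mesh adaptation is driven through the DMForest interface rather than by hand, so the SNES setup itself does not change.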
>> >> Respectfully, >> >> Pierre Bernigaud >> >> -- > *Pierre Bernigaud* > PhD student > Département multi-physique pour l'énergétique > Modélisation Propulsion Fusée > Tel: +33 1 80 38 62 33 > > > ONERA - The French Aerospace Lab - Centre de Palaiseau > 6, Chemin de la Vauve aux Granges - 91123 PALAISEAU > GPS coordinates: 48.715169, 2.232833 > > Follow us on: www.onera.fr | Twitter > | LinkedIn > | Facebook > | Instagram > > > > Avertissement/disclaimer https://www.onera.fr/en/emails-terms > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: jlgjjjnkhffoclfc.gif Type: image/gif Size: 1041 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dldmcfkmcojhebgb.png Type: image/png Size: 16755 bytes Desc: not available URL:
From pierre.bernigaud at onera.fr Wed Jun 22 12:50:26 2022 From: pierre.bernigaud at onera.fr (Pierre Bernigaud) Date: Wed, 22 Jun 2022 19:50:26 +0200 Subject: [petsc-users] PETSc / AMRex In-Reply-To: References: <06d9a338-f327-9724-4485-8cb8529b524c@onera.fr> Message-ID:
Mark, Thank you for this roadmap. It should be doable to go from a DMDA to a DMPlex code. I wasn't aware of the existence of p4est. From what I've seen, it should fulfil our needs. I will contact you again if we encounter any trouble. Thanks again, Pierre
On 2022-06-21 19:57, Mark Adams wrote: > (keep on the list, you will need Matt and Toby soon anyway). > > So you want to add AMRex to your code. > > I think the first thing that you want to do is move your DMDA code into > a DMPlex code. You can create a "box" mesh and it is not hard. > Others like Matt can give advice on how to get started on that > translation. > There is a simple step to create a DMForest (p4/8est), which Matt > mentioned, from the DMPlex. > > Now at this point you can run your current SNES tests and get back to > where you started, but AMR is easy now. > Or as easy as it gets. > > As far as AMRex, well, it's not clear what AMRex does for you at this > point. > You don't seem to have AMRex code that you want to reuse. > If there is some functionality that you need then we can talk about it, > or if you have some programmatic reason to use it (e.g., they are paying > you) then, again, we can talk about it. > > PETSc/p4est and AMRex are similar with different strengths and designs, > and you could use both, but that would complicate things. > > Hope that helps, > Mark > > On Tue, Jun 21, 2022 at 1:18 PM Bernigaud Pierre > wrote: > > Hello Mark, > > We have a working solver employing SNES, to which is attached a DMDA to > handle ghost cells / data sharing between processors for flux > evaluation (using DMGlobalToLocalBegin / DMGlobalToLocalEnd). We are > considering adding an AMReX layer to the solver, but no work has been > done yet, as we are currently evaluating if it would be feasible > without too much trouble. > > Our main subject of concern would be to understand how to interface > correctly PETSc (SNES+DMDA) and AMRex, as AMRex also appears to have > its own methods for parallel data management. Hence our inquiry for > examples, just to get a feel for how it would work out. > > Best, > > Pierre > > On 21/06/2022 at 18:00, Mark Adams wrote: > Hi Bernigaud, > > To be clear, you have SNES working with DMDA in AMRex, but without > adapting dynamically and you want to know what to do next. > > Is that right?
> > Mark > > On Tue, Jun 21, 2022 at 11:46 AM Bernigaud Pierre > wrote: Greetings, > > I hope you are doing great. > > We are currently working on parallel solver employing PETSc for the main > numerical methods (GMRES, Newton-Krylov method). We would be interested > in combining the PETSc solvers with the AMR framework provided by the > library AMReX (https://amrex-codes.github.io/amrex/). I know that within > the AMReX framework the KSP solvers provided by PETSc can be used, but > what about the SNES solvers? More specifically, we are using a DMDA to > manage parallel communications during the SNES calculations, and I am > wondering how it would behave in a context where the data layout between > processors is modified by the AMR code when refining the grid. > > Would you have any experience on this matter ? Is there any > collaboration going on between PETsc and AMReX, or would you know of a > code using both of them? > > Respectfully, > > Pierre Bernigaud -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: jlgjjjnkhffoclfc.gif Type: image/gif Size: 1041 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dldmcfkmcojhebgb.png Type: image/png Size: 16755 bytes Desc: not available URL:
From FERRANJ2 at my.erau.edu Fri Jun 24 12:52:27 2022 From: FERRANJ2 at my.erau.edu (Ferrand, Jesus A.) Date: Fri, 24 Jun 2022 17:52:27 +0000 Subject: [petsc-users] [EXTERNAL] Re: DMPlex/PetscSF How to determine if local topology is other rank's ghost? In-Reply-To: References: Message-ID:
Toby and Matt: Thank you for your helpful replies. In principle, I have what I need; however, I ran into a bug with PetscSFReduce. When I run the following on the pointSF from a distributed plex (2 MPI ranks on a small mesh):
//==============================================================================================
PetscSFGetGraph(point_sf,&nroots,&nleaves,&ilocal,&iremote);
PetscCalloc2(nleaves,&leafdata,nroots,&rootdata);
/* Code that populates leafdata */
PetscSFReduceBegin(point_sf,MPIU_INT,leafdata, rootdata,MPI_SUM);
PetscSFReduceEnd(point_sf,MPIU_INT,leafdata, rootdata,MPI_SUM);
PetscSFView(point_sf,0);
PetscViewerASCIIPrintf(PETSC_VIEWER_STDOUT_WORLD,"## Reduce Leafdata\n"); //I copied this from a PetscSF example.
PetscIntView(nleaves,leafdata,PETSC_VIEWER_STDOUT_WORLD);
PetscViewerASCIIPrintf(PETSC_VIEWER_STDOUT_WORLD,"## Reduce Rootdata\n");
PetscIntView(nroots,rootdata,PETSC_VIEWER_STDOUT_WORLD);
PetscFree2(leafdata,rootdata);
//==============================================================================================
... I get the following printout:
//======================================
PetscSF Object: 2 MPI processes
type: basic
[0] Number of roots=29, leaves=5, remote ranks=1
[0] 9 <- (1,9)
[0] 11 <- (1,10)
[0] 12 <- (1,13)
[0] 20 <- (1,20)
[0] 27 <- (1,27)
[1] Number of roots=29, leaves=2, remote ranks=1
[1] 14 <- (0,13)
[1] 19 <- (0,18)
MultiSF sort=rank-order
## Reduce Leafdata
[0] 0: 2 2 2 0 0
[1] 0: 3 0
## Reduce Rootdata
[0] 0: 0 0 0 0 0 0 0 0 0 0 0 0 0 -686563120 0 0 0 0 0 0
[0] 20: 0 0 0 0 0 0 0 0 0
[1] 0: 0 0 0 0 0 0 0 0 0 0 0 0 0 128 0 0 0 0 0 0
[1] 20: -527386800 0 0 0 0 0 0 32610 0
//======================================
The good news is that the rootdata on both processors has the correct number of nonzeros after reduction.
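[Editor's aside: the allocation above sizes leafdata by nleaves, which, as Toby Isaac explains further down in the thread, is why the reduced rootdata comes out garbled. A corrected sketch of the same fragment is shown here for reference; it assumes PETSc 3.17-style PetscCall error checking (on 3.16, use ierr = ...; CHKERRQ(ierr); instead).]

```
PetscInt           nroots, nleaves, i;
const PetscInt    *ilocal;
const PetscSFNode *iremote;
PetscInt          *leafdata, *rootdata;

PetscCall(PetscSFGetGraph(point_sf, &nroots, &nleaves, &ilocal, &iremote));
/* For the point SF, leaf indices in ilocal live in the same local index space
   as the roots, so leafdata must be sized by nroots, not nleaves */
PetscCall(PetscCalloc2(nroots, &leafdata, nroots, &rootdata));
for (i = 0; i < nleaves; ++i) leafdata[ilocal ? ilocal[i] : i] = 1; /* mark each leaf point */
PetscCall(PetscSFReduceBegin(point_sf, MPIU_INT, leafdata, rootdata, MPI_SUM));
PetscCall(PetscSFReduceEnd(point_sf, MPIU_INT, leafdata, rootdata, MPI_SUM));
/* rootdata[p] now counts how many remote ranks hold point p as a ghost */
PetscCall(PetscFree2(leafdata, rootdata));
```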
The bad news is that the nonzeros are garbage (like what one gets when a variable isn't initialized). Any ideas as to what could cause this? Could something like a previous call to a PetscSF or DMPlex function do this? I am still using PETSc version 3.16, but I looked at the patch notes of 3.17 and did not see any updates on PetscSFReduce(). ________________________________ From: Matthew Knepley Sent: Wednesday, May 18, 2022 2:09 AM To: Toby Isaac Cc: Ferrand, Jesus A. ; petsc-users at mcs.anl.gov Subject: [EXTERNAL] Re: [petsc-users] DMPlex/PetscSF How to determine if local topology is other rank's ghost? CAUTION: This email originated outside of Embry-Riddle Aeronautical University. Do not click links or open attachments unless you recognize the sender and know the content is safe. On Tue, May 17, 2022 at 6:47 PM Toby Isaac > wrote: A leaf point is attached to a root point (in a star forest there are only leaves and roots), so that means that a root point would be the point that owns a degree of freedom and a leaf point would have a ghost value. For a "point SF" of a DMPlex: - Each process has a local numbering of mesh points (cells + edges + faces + vertices): they are all potential roots, so the number of these is what is returned by `nroots`. - The number of ghost mesh points is `nleaves`. - `ilocal` would be a list of the mesh points that are leaves (using the local numbering). - For each leaf in `ilocal`, `iremote` describes the root it is attached to: which process it belongs to, and its id in *that* process's local numbering. If you're trying to create dof numberings on your own, please consider PetscSectionCreateGlobalSection: . You supply the PetscSF and a PetscSection which says how many dofs there are for each point and whether any have essential boundary conditions, and it computes a global PetscSection that tells you what the global id is for each dof on this process. Toby is exactly right. Also, if you want global numbering of points you can use https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexCreatePointNumbering/ and there is a similar thing for jsut cells or vertices. Thanks, Matt On Tue, May 17, 2022 at 7:26 PM Ferrand, Jesus A. > wrote: Dear PETSc team: I am working with a non-overlapping distributed plex (i.e., when I call DMPlexDistribute(), I input overlap = 0), so only vertices and edges appear as ghosts to the local ranks. For preallocation of a parallel global stiffness matrix for FEA, I want to determine which locally owned vertices are ghosts to another rank. From reading the paper on PetscSF (https://ieeexplore.ieee.org/document/9442258) I think I can answer my question by inspecting the PetscSF returned by DMPlexDistribute() with PetscSFGetGraph(). I am just confused by the root/leaf and ilocal/iremote terminology. I read the manual page on PetscSFGetGraph() (https://petsc.org/release/docs/manualpages/PetscSF/PetscSFGetGraph.html) and that gave me the impression that I need to PetscSFBcast() the point IDs from foreign ranks to the local ones. Is this correct? [https://ieeexplore.ieee.org/assets/img/ieee_logo_smedia_200X200.png] The PetscSF Scalable Communication Layer | IEEE Journals & Magazine | IEEE Xplore PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable for exascale computers that utilize GPUs and other accelerators. 
PetscSF provides a simple application programming interface (API) for managing common communication patterns in scientific computations by using a star ... ieeexplore.ieee.org ? Sincerely: J.A. Ferrand Embry-Riddle Aeronautical University - Daytona Beach FL M.Sc. Aerospace Engineering | May 2022 B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Sigma Gamma Tau Tau Beta Pi Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From toby.isaac at gmail.com Fri Jun 24 13:09:40 2022 From: toby.isaac at gmail.com (Toby Isaac) Date: Fri, 24 Jun 2022 14:09:40 -0400 Subject: [petsc-users] [EXTERNAL] Re: DMPlex/PetscSF How to determine if local topology is other rank's ghost? In-Reply-To: References: Message-ID: > > //====================================== > PetscSF Object: 2 MPI processes > type: basic > [0] Number of roots=29, leaves=5, remote ranks=1 > [0] 9 <- (1,9) > [0] 11 <- (1,10) > [0] 12 <- (1,13) > [0] 20 <- (1,20) > [0] 27 <- (1,27) > [1] Number of roots=29, leaves=2, remote ranks=1 > [1] 14 <- (0,13) > [1] 19 <- (0,18) > MultiSF sort=rank-order > ## Reduce Leafdata > [0] 0: 2 2 2 0 0 > [1] 0: 3 0 > ## Reduce Rootdata > [0] 0: 0 0 0 0 0 0 0 0 0 0 0 0 0 -686563120 0 0 0 0 0 0 > [0] 20: 0 0 0 0 0 0 0 0 0 > [1] 0: 0 0 0 0 0 0 0 0 0 0 0 0 0 128 0 0 0 0 0 0 > [1] 20: -527386800 0 0 0 0 0 0 32610 0 This sf is one where the leaves are numbered as though they are sparsely drawn from a larger vector. For example, `[0] 9 <- (1,9)` means that the leaf with local index 9 on rank 0 has a root at (rank 1, index 9); the next leaf is `[0] 11 <- (1,10)`, meaning it has local index 11 and its root is at (rank 1, index 10). So `PetscSFReduceBegin()` is expecting to read the `leafdata` on rank 0 from indices 9, 11, .... But you have given it a `leafdata` array that is just a contiguous array that's the size of the number of leaves. The index spaces for leaves and roots don't have to be the same, but in the case of the point SF they always are. You should make a change like so: ``` --- PetscCalloc2(nleaves,&leafdata,nroots,&rootdata); +++ PetscCalloc2(nroots,&leafdata,nroots,&rootdata); ``` From jed at jedbrown.org Sun Jun 26 16:28:24 2022 From: jed at jedbrown.org (Jed Brown) Date: Sun, 26 Jun 2022 15:28:24 -0600 Subject: [petsc-users] Load mesh as DMPlex along with Solution Fields obtained from External Codes In-Reply-To: References: Message-ID: <87mtdz8687.fsf@jedbrown.org> (Sorry you didn't get a reply earlier.) That's generally right: DM gives what is basically a distributed "function space". Actual fields in that space are Vecs. You can have one or more Vecs. The implementation of VecLoad will depend on the file format. Mike Michell writes: > Dear PETSc developer team, > > I am a user of PETSc DMPlex for a finite-volume solver. So far, I have > loaded a mesh file made by Gmsh as a DMPlex object without pre-computed > solution field. > But what if I need to load the mesh as well as solution fields that are > computed by other codes sharing the same physical domain, what is a smart > way to do that? In other words, how can I load a DM object from a mesh file > along with a defined solution field? 
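[Editor's note: a small hedged sketch of the load path discussed in this thread, and spelled out in Matthew Knepley's later reply: create the DMPlex from the Gmsh file, describe the data layout with a PetscSection, then VecLoad into a vector with that layout. The file names and the one-dof-per-cell layout are placeholders, and the file must already contain a PETSc binary Vec ordered consistently with this layout.]

```
#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM           dm;
  PetscSection s;
  Vec          u;
  PetscViewer  viewer;
  PetscInt     pStart, pEnd, cStart, cEnd, p;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm)); /* placeholder file */
  /* One dof per cell, i.e. a cell-centered finite-volume field */
  PetscCall(DMPlexGetChart(dm, &pStart, &pEnd));
  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
  PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, &s));
  PetscCall(PetscSectionSetChart(s, pStart, pEnd));
  for (p = cStart; p < cEnd; ++p) PetscCall(PetscSectionSetDof(s, p, 1));
  PetscCall(PetscSectionSetUp(s));
  PetscCall(DMSetLocalSection(dm, s));
  PetscCall(PetscSectionDestroy(&s));
  /* Read a previously saved PETSc binary Vec into this layout */
  PetscCall(DMCreateGlobalVector(dm, &u));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "solution.dat", FILE_MODE_READ, &viewer)); /* placeholder file */
  PetscCall(VecLoad(u, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  /* ... use u ... */
  PetscCall(VecDestroy(&u));
  PetscCall(DMDestroy(&dm));
  PetscCall(PetscFinalize());
  return 0;
}
```
If the external code wrote its data in some other format or ordering, it must first be permuted into this layout, which is exactly the caveat raised in the replies.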
> I can think of that; load mesh to a DM object first, then declare a local > (or global) vector to read & map the external solution field onto the PETSc > data structure. But I can feel that this might not be the best way. > > Thanks, > Mike
From mersoj2 at rpi.edu Mon Jun 27 00:20:37 2022 From: mersoj2 at rpi.edu (Merson, Jacob Simon) Date: Mon, 27 Jun 2022 05:20:37 +0000 Subject: [petsc-users] Fail function evaluation with SNES Message-ID:
Hi All, I'm attempting to use the SNES solver with the finite element method. When I use the trust region or line search algorithms I'm not currently running into any problems and the solution matches a hand-coded newton solver. However, when I use other methods like quasi-newton, or newton-conjugate-gradient I end up with a guess that makes the element Jacobian negative causing issues with the residual (and Jacobian) evaluation. For this circumstance is it possible to set an error code that specifies that the function evaluation has failed and have SNES try a different step? Thanks for the help! Jacob
From bsmith at petsc.dev Mon Jun 27 05:17:00 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Jun 2022 06:17:00 -0400 Subject: [petsc-users] Fail function evaluation with SNES In-Reply-To: References: Message-ID: <6FCDDA3F-60C2-4E95-84CA-B79A41F12983@petsc.dev>
You would call SNESSetFunctionDomainError() or SNESSetJacobianDomainError() from within your function or Jacobian evaluation and then return from the function. This notifies SNES that the step it attempted is not acceptable to your functions. SNES may not be able to recover from its bad step.
The simplest attempt to recover is to have SNES try a shorter step. If the bad steps come from, for example, negative pressures or other non-physical locations of the step you can try using SNESVISetVariableBounds() and friends to tell SNES what steps to avoid. > > If you have particular cases where SNES cannot recover and you can share your code we can investigate improving the handling of this feature in SNES. > > Barry > >> On Jun 27, 2022, at 1:20 AM, Merson, Jacob Simon wrote: >> >> Hi All, >> >> I?m attempting to use the SNES solver with the finite element method. When I use the trust region or line search algorithms I?m not currently running into any problems and the solution matches a hand-coded newton solver. However, when I use other methods like quasi-newton, or newton-conjugate-graduent I end up with a guess that makes the element Jacobian negative causing issues with the residual (and Jacobian) evaluation. >> >> For this circumstance is it possible to set an error code that specifies that the function evaluation has failed and have SNES try a different step? >> >> >> Thanks for the help! >> Jacob > From bsmith at petsc.dev Mon Jun 27 12:40:20 2022 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Jun 2022 13:40:20 -0400 Subject: [petsc-users] [EXTERNAL] Fail function evaluation with SNES In-Reply-To: <318716D4-261D-4A87-862B-52F50F0CB48C@rpi.edu> References: <6FCDDA3F-60C2-4E95-84CA-B79A41F12983@petsc.dev> <318716D4-261D-4A87-862B-52F50F0CB48C@rpi.edu> Message-ID: <2DC8D9AA-C378-46D2-8AE5-788FE541FCAE@petsc.dev> I forgot something, you can also use ``SNESLineSearchSetPreCheck()`` and ``SNESLineSearchSetPostCheck()`` to control properties of the steps selected by `SNES`. > On Jun 27, 2022, at 8:23 AM, Merson, Jacob Simon wrote: > > Wonderful thank you! This is exactly what I was looking for. > > ? > Jacob Merson > >> On Jun 27, 2022, at 6:17 AM, Barry Smith wrote: >> >> ? >> You would call SNESSetFunctionDomainError() or SNESSetJacobianDomainError() from within your function or Jacobian evaluation and then return from the function. This notifies SNES that the step it attempted is not acceptable to your functions. >> >> SNES may not be able to recover from its bad step. The simplest attempt to recover is to have SNES try a shorter step. If the bad steps come from, for example, negative pressures or other non-physical locations of the step you can try using SNESVISetVariableBounds() and friends to tell SNES what steps to avoid. >> >> If you have particular cases where SNES cannot recover and you can share your code we can investigate improving the handling of this feature in SNES. >> >> Barry >> >>> On Jun 27, 2022, at 1:20 AM, Merson, Jacob Simon wrote: >>> >>> Hi All, >>> >>> I?m attempting to use the SNES solver with the finite element method. When I use the trust region or line search algorithms I?m not currently running into any problems and the solution matches a hand-coded newton solver. However, when I use other methods like quasi-newton, or newton-conjugate-graduent I end up with a guess that makes the element Jacobian negative causing issues with the residual (and Jacobian) evaluation. >>> >>> For this circumstance is it possible to set an error code that specifies that the function evaluation has failed and have SNES try a different step? >>> >>> >>> Thanks for the help! 
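[Editor's note: a minimal sketch of the pattern Barry describes above: flag an inadmissible state from inside the residual callback with SNESSetFunctionDomainError() and return without filling the residual. The positivity test is only a stand-in for the user's own check (for example, a negative element Jacobian), and PETSc 3.17-style PetscCall error handling is assumed.]

```
#include <petscsnes.h>

/* Residual callback that refuses to evaluate at a non-physical state */
static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
{
  const PetscScalar *x;
  PetscInt           i, n;
  PetscBool          bad = PETSC_FALSE;

  PetscFunctionBeginUser;
  PetscCall(VecGetLocalSize(X, &n));
  PetscCall(VecGetArrayRead(X, &x));
  for (i = 0; i < n; i++) if (PetscRealPart(x[i]) <= 0.0) { bad = PETSC_TRUE; break; } /* stand-in check */
  PetscCall(VecRestoreArrayRead(X, &x));
  if (bad) {
    PetscCall(SNESSetFunctionDomainError(snes)); /* mark this step as inadmissible */
    PetscFunctionReturn(0);                      /* return without filling F */
  }
  /* ... assemble the residual into F as usual ... */
  PetscFunctionReturn(0);
}
```
The same idea applies to the Jacobian callback with SNESSetJacobianDomainError(), and bounds that should never be crossed can instead be enforced with SNESVISetVariableBounds() as mentioned above.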
>>> Jacob >> From knepley at gmail.com Tue Jun 28 06:46:56 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 28 Jun 2022 07:46:56 -0400 Subject: [petsc-users] Load mesh as DMPlex along with Solution Fields obtained from External Codes In-Reply-To: References: Message-ID: On Wed, Jun 8, 2022 at 12:15 AM Mike Michell wrote: > Dear PETSc developer team, > > I am a user of PETSc DMPlex for a finite-volume solver. So far, I have > loaded a mesh file made by Gmsh as a DMPlex object without pre-computed > solution field. > But what if I need to load the mesh as well as solution fields that are > computed by other codes sharing the same physical domain, what is a smart > way to do that? In other words, how can I load a DM object from a mesh file > along with a defined solution field? > I can think of that; load mesh to a DM object first, then declare a local > (or global) vector to read & map the external solution field onto the PETSc > data structure. But I can feel that this might not be the best way. > Here was my idea for this. PetscSection is an abstraction for laying out data over a DMPlex. In parallel, each local Section lays out local data, and a PetscSF points "ghost" mesh points at the owner. From this we can make a _global_ Section automatically that lays out globally consistent data. Thus, in order to match an external layout, you need to: 1) Match the mesh topology with DMPlex 2) Match the mesh parallel layout with a PetscSF 3) Match the local data layout with a PetscSection (might require specifying a permutation of the mesh points to the section) Then you should be able to load your data with VecLoad(). Let me know if this is unclear or does not work for you. Thanks, Matt > Thanks, > Mike > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Jun 28 12:17:33 2022 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 28 Jun 2022 13:17:33 -0400 Subject: [petsc-users] List of points with dof>0 in a PetscSection In-Reply-To: References: Message-ID: On Fri, Jun 10, 2022 at 4:06 PM Blaise Bourdin wrote: > Hi, > > Given a PetscSection, is there an easy way to get a list of point at which > the number of dof is >0? > For instance, when projecting over a FE space, I?d rather do a loop over > such points than do a loop over all points in a DM, get the number of dof, > and test if it is >0. > We do not have an index like this. There is always a tradeoff between direct indexing (as we have now in Section) and indirect indexing (as you would have if you compressed the indices). For internal uses, the search would never pay off I think. It would not be hard to make this up front if you think it would be beneficial. I have never seen a case where the loop takes much more time than the compressed version. Thanks, Matt > Regards, > Blaise > -- > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From balay at mcs.anl.gov Wed Jun 29 02:46:23 2022 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 29 Jun 2022 13:16:23 +0530 (IST) Subject: [petsc-users] petsc-3.17.3 now available Message-ID:
Dear PETSc users, The patch release petsc-3.17.3 is now available for download. http://www.mcs.anl.gov/petsc/download/index.html Satish
From Jiannan_Tu at uml.edu Wed Jun 29 13:23:15 2022 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Wed, 29 Jun 2022 18:23:15 +0000 Subject: [petsc-users] KPS and linear complex equation system Message-ID:
I have a quick question. PETSc can be configured with complex numbers. Can KSP then be used to solve linear equations with complex numbers, that is, where both the matrix elements and the solutions are complex, directly, without separating real and imaginary parts? Thank you very much. Best Jiannan Tu -------------- next part -------------- An HTML attachment was scrubbed... URL:
From knepley at gmail.com Wed Jun 29 13:58:27 2022 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Jun 2022 14:58:27 -0400 Subject: [petsc-users] KPS and linear complex equation system In-Reply-To: References: Message-ID:
On Wed, Jun 29, 2022 at 2:24 PM Tu, Jiannan wrote: > I have a quick question. PETSc can be configured with complex numbers. Can > KSP then be used to solve linear equations with complex numbers, that is, where both > the matrix elements and the solutions are complex, directly, without separating > real and imaginary parts? > Yes. Not all solvers make sense in this mode, but some do. Thanks, Matt > Thank you very much. > > Best > Jiannan Tu > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Jiannan_Tu at uml.edu Wed Jun 29 22:04:48 2022 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Thu, 30 Jun 2022 03:04:48 +0000 Subject: [petsc-users] KPS and linear complex equation system In-Reply-To: References: Message-ID:
Matt, thank you very much for the reply. Jiannan
From: Matthew Knepley Sent: Wednesday, June 29, 2022 2:58 PM To: Tu, Jiannan Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KPS and linear complex equation system
On Wed, Jun 29, 2022 at 2:24 PM Tu, Jiannan > wrote: I have a quick question. PETSc can be configured with complex numbers. Can KSP then be used to solve linear equations with complex numbers, that is, where both the matrix elements and the solutions are complex, directly, without separating real and imaginary parts? Yes. Not all solvers make sense in this mode, but some do. Thanks, Matt Thank you very much. Best Jiannan Tu -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
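[Editor's note: a tiny sketch to make the complex-scalar usage above concrete. It assumes a PETSc build configured with --with-scalar-type=complex and PETSc 3.17 or later for PetscCall; the 1x1 system is purely illustrative, and a real application assembles its full complex matrix and right-hand side the same way.]

```
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         b, x;
  KSP         ksp;
  PetscMPIInt rank;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  /* 1x1 complex "system": (2+3i) x = 1-1i, to show that entries are complex PetscScalars */
  PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 1, 1, 1, NULL, 0, NULL, &A));
  if (rank == 0) PetscCall(MatSetValue(A, 0, 0, 2.0 + 3.0 * PETSC_i, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0 - 1.0 * PETSC_i)); /* complex right-hand side */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp)); /* e.g. -ksp_type gmres -pc_type jacobi -ksp_monitor */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(VecView(x, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}
```
Running with -ksp_view confirms that the solver operates directly on complex scalars, with no splitting into real and imaginary parts.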