From awalker at student.ethz.ch  Fri May  1 03:07:55 2020
From: awalker at student.ethz.ch (Walker Andreas)
Date: Fri, 1 May 2020 08:07:55 +0000
Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver
In-Reply-To:
References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch>
Message-ID: <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch>

Hi Matthew,

I just ran the same program on a single core; the output of -log_view is below. As far as I can see, most functions, including MatMult(), show speedups of around 50 on 128 cores (for example, EPSSolve takes 2.87e+04 s on one core versus 6.13e+02 s on 128 cores, a speedup of about 47).

Best regards,

Andreas

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May  1 04:03:07 2020
Using Petsc Release Version 3.10.5, Mar, 28, 2019

                         Max       Max/Min     Avg       Total
Time (sec):           3.092e+04      1.000   3.092e+04
Objects:              6.099e+05      1.000   6.099e+05
Flop:                 9.313e+13      1.000   9.313e+13  9.313e+13
Flop/sec:             3.012e+09      1.000   3.012e+09  3.012e+09
MPI Messages:         0.000e+00      0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:      Main Stage: 3.0925e+04 100.0%  9.3134e+13 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 BVMultInPlace 
1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 1 1 584 0. Distributed Mesh 1 1 5184 0. GraphPartitioner 1 1 624 0. Matrix 320 320 3469402576 0. Index Set 53 53 2777932 0. IS L to G Mapping 1 1 249320 0. Section 13 11 7920 0. Star Forest Graph 6 6 4896 0. Discrete System 1 1 936 0. Vector 609405 609405 857220847896 0. Vec Scatter 1 1 704 0. Viewer 22 11 9328 0. EPS Solver 10 10 86360 0. Spectral Transform 10 10 8400 0. Basis Vectors 10 10 530336 0. PetscRandom 10 10 6540 0. Region 10 10 6800 0. Direct Solver 10 10 9838880 0. Krylov Solver 10 10 13920 0. Preconditioner 10 10 10080 0. ======================================================================================================================== Average time to get PetscTime(): 2.50991e-08 #PETSc Option Table entries: -config=benchmark3.json -eps_converged_reason -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 
--with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 ----------------------------------------- Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit Using PETSc arch: ----------------------------------------- Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 ----------------------------------------- Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include 
-I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include ----------------------------------------- Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl ----------------------------------------- Am 30.04.2020 um 17:14 schrieb Matthew Knepley >: On 
Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote:

Hello everyone,

I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations: compared to a single core it is around 1.8 for two cores, but only about 50-60 for 128 cores and perhaps 70-80 for 256 cores.

Some details about my problem:

- The problem is based on meshes with up to 400k degrees of freedom; DMPlex is used to organize them.
- ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries lies in the diagonal blocks (i.e., looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nonzeroes in the other columns).
- The actual matrix from which I need eigenvalues is a 2x2 block matrix, stored as a MATNEST matrix. Each of the four blocks is computed from the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such block is about 174k x 174k, with on average about 40 nonzeroes per row.
- I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues.
- The output of -log_view for the 200k-dof mesh described above, run on 128 cores, is at the end of this mail.

I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core may vary between 2500 and 3000, for example, but I am not sure whether this is the main reason for the poor speedup. I tried to reduce the subspace size, but without effect. I also attempted to use the shift-and-invert spectral transformation, but the MATNEST type prevents this.

Are there any suggestions to improve the speedup further, or is this the maximum speedup that I can expect?

Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup and look at which functions are not speeding up.

  Thanks,

     Matt

Thanks a lot in advance,

Andreas Walker
m&m group
D-MAVT
ETH Zurich

************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS.
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 Using Petsc Release Version 3.10.5, Mar, 28, 2019 Max Max/Min Avg Total Time (sec): 6.209e+02 1.000 6.209e+02 Objects: 6.068e+05 1.001 6.063e+05 Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 MPI Reductions: 1.512e+06 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 
4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 1 1 584 0. Distributed Mesh 6 6 29160 0. GraphPartitioner 2 2 1244 0. Matrix 1104 1104 136615232 0. Index Set 930 930 9125912 0. IS L to G Mapping 3 3 2235608 0. Section 28 26 18720 0. Star Forest Graph 30 30 25632 0. Discrete System 6 6 5616 0. PetscRandom 11 11 7194 0. Vector 604372 604372 8204816368 0. Vec Scatter 203 203 272192 0. 
Viewer 21 10 8480 0. EPS Solver 10 10 86360 0. Spectral Transform 10 10 8400 0. Basis Vectors 10 10 530848 0. Region 10 10 6800 0. Direct Solver 10 10 9838880 0. Krylov Solver 10 10 13920 0. Preconditioner 10 10 10080 0. ======================================================================================================================== Average time to get PetscTime(): 3.49944e-08 Average time for MPI_Barrier(): 5.842e-06 Average time for zero size MPI_Send(): 8.72551e-06 #PETSc Option Table entries: -config=benchmark3.json -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 ----------------------------------------- Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit Using PETSc arch: ----------------------------------------- Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 ----------------------------------------- Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include ----------------------------------------- Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From mbuerkle at web.de  Fri May  1 03:32:53 2020
From: mbuerkle at web.de (Marius Buerkle)
Date: Fri, 1 May 2020 10:32:53 +0200
Subject: [petsc-users] mkl cpardiso iparm 31
Message-ID:

An HTML attachment was scrubbed...

From jroman at dsic.upv.es  Fri May  1 05:08:22 2020
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Fri, 1 May 2020 12:08:22 +0200
Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver
In-Reply-To: <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch>
References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch>
Message-ID:

Comments related to PETSc:

- If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this (see the conversion sketch at the end of this message).

Comments related to SLEPc:

- The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html and the short sketch after this message).

- In my previous comment about multithreaded BLAS, I was referring to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here.

- Regarding the number of iterations: yes, the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there may be significant differences for some problems; that is the rationale of my suggestion. Anyway, in your case the fluctuation does not seem very important.

Jose

> On 1 May 2020, at 10:07, Walker Andreas wrote:
>
> Hi Matthew,
>
> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc.
>
> Best regards,
>
> Andreas
>
> [The remainder of the quoted message, including the full -log_view output already shown above in this thread, has been trimmed.]
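A minimal sketch of the projected-problem limit suggested above, set through the API instead of the command line. It assumes `eps` is the EPS object the application has already created for its MATNEST operator; the wrapper name ConfigureProjectedSize is made up for illustration.

    #include <slepceps.h>

    /* Cap the size of the projected (sequential, dense) problem handled by DSSolve. */
    PetscErrorCode ConfigureProjectedSize(EPS eps)
    {
      PetscErrorCode ierr;

      /* nev = 100 wanted eigenvalues, ncv left to SLEPc, mpd = 64 */
      ierr = EPSSetDimensions(eps, 100, PETSC_DEFAULT, 64);CHKERRQ(ierr);
      /* Equivalent to -eps_nev 100 -eps_mpd 64; options given on the
         command line are still picked up (and take precedence) here. */
      ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
      return 0;
    }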
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > ----------------------------------------- > > >> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : >> >> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote: >> Hello everyone, >> >> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: >> >> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. >> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). >> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. >> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues >> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. >> >> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. >> >> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. >> >> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? >> >> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup >> and look at which functions are not speeding up. >> >> Thanks, >> >> Matt >> >> Thanks a lot in advance, >> >> Andreas Walker >> >> m&m group >> D-MAVT >> ETH Zurich >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 >> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >> Max Max/Min Avg Total >> Time (sec): 6.209e+02 1.000 6.209e+02 >> Objects: 6.068e+05 1.001 6.063e+05 >> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >> MPI Reductions: 1.512e+06 1.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flop: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> AvgLen: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flop in this phase >> %M - percent messages in this phase %L - percent message lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexDistLabels 1 1.0 9.7280e-04 
1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Container 1 1 584 0. >> Distributed Mesh 6 6 29160 0. >> GraphPartitioner 2 2 1244 0. 
>> Matrix 1104 1104 136615232 0. >> Index Set 930 930 9125912 0. >> IS L to G Mapping 3 3 2235608 0. >> Section 28 26 18720 0. >> Star Forest Graph 30 30 25632 0. >> Discrete System 6 6 5616 0. >> PetscRandom 11 11 7194 0. >> Vector 604372 604372 8204816368 0. >> Vec Scatter 203 203 272192 0. >> Viewer 21 10 8480 0. >> EPS Solver 10 10 86360 0. >> Spectral Transform 10 10 8400 0. >> Basis Vectors 10 10 530848 0. >> Region 10 10 6800 0. >> Direct Solver 10 10 9838880 0. >> Krylov Solver 10 10 13920 0. >> Preconditioner 10 10 10080 0. >> ======================================================================================================================== >> Average time to get PetscTime(): 3.49944e-08 >> Average time for MPI_Barrier(): 5.842e-06 >> Average time for zero size MPI_Send(): 8.72551e-06 >> #PETSc Option Table entries: >> -config=benchmark3.json >> -log_view >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 
--with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >> ----------------------------------------- >> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> Using PETSc arch: >> ----------------------------------------- >> >> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> ----------------------------------------- >> >> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> ----------------------------------------- >> >> Using C linker: 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> ----------------------------------------- >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > From jed at jedbrown.org Fri May 1 07:12:58 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 01 May 2020 06:12:58 -0600 Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver In-Reply-To: References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> Message-ID: <87sggjam5x.fsf@jedbrown.org> "Jose E. Roman" writes: > Comments related to PETSc: > > - If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. 
This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this. It is not intrinsic to MatNest, though use of MatNest incurs extra VecScatter costs. If you use MatNest without VecNest, then VecGetSubVector incurs significant cost (including reductions). I suspect it's likely that some SLEPc functionality is not available with VecNest. A better option would be to optimize VecGetSubVector by caching the IS and subvector, at least in the contiguous case. How difficult would it be for you to run with a monolithic matrix instead of MatNest? It would certainly be better at amortizing communication costs. > > Comments related to SLEPc. > > - The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html ) > > - In my previous comment about multithreaded BLAS, I was referring to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here. > > - Regarding the number of iterations, yes, the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there might be significant differences for some problems, that is the rationale of my suggestion. Anyway, in your case the fluctuation does not seem very important. > > Jose > > >> El 1 may 2020, a las 10:07, Walker Andreas escribió: >> >> Hi Matthew, >> >> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc. >> >> Best regards, >> >> Andreas >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May 1 04:03:07 2020 >> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >> Max Max/Min Avg Total >> Time (sec): 3.092e+04 1.000 3.092e+04 >> Objects: 6.099e+05 1.000 6.099e+05 >> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 >> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 >> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flop >> and VecAXPY() for complex vectors of length N --> 8N flop >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flop: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> AvgLen: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flop in this phase >> %M - percent messages in this phase %L - percent message lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 >> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 >> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 >> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 >> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 >> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 >> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 >> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 >> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 
0 0 >> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 >> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 >> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 >> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 >> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 >> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Container 1 1 584 0. >> Distributed Mesh 1 1 5184 0. >> GraphPartitioner 1 1 624 0. >> Matrix 320 320 3469402576 0. >> Index Set 53 53 2777932 0. >> IS L to G Mapping 1 1 249320 0. >> Section 13 11 7920 0. >> Star Forest Graph 6 6 4896 0. >> Discrete System 1 1 936 0. >> Vector 609405 609405 857220847896 0. >> Vec Scatter 1 1 704 0. >> Viewer 22 11 9328 0. >> EPS Solver 10 10 86360 0. >> Spectral Transform 10 10 8400 0. >> Basis Vectors 10 10 530336 0. >> PetscRandom 10 10 6540 0. >> Region 10 10 6800 0. >> Direct Solver 10 10 9838880 0. >> Krylov Solver 10 10 13920 0. >> Preconditioner 10 10 10080 0. 
>> ======================================================================================================================== >> Average time to get PetscTime(): 2.50991e-08 >> #PETSc Option Table entries: >> -config=benchmark3.json >> -eps_converged_reason >> -log_view >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >> ----------------------------------------- >> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> Using PETSc arch: >> ----------------------------------------- >> >> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> ----------------------------------------- >> >> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> ----------------------------------------- >> >> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> ----------------------------------------- >> >> >>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : >>> >>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote: >>> Hello everyone, >>> >>> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: >>> >>> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. >>> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). >>> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. >>> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues >>> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. 
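A minimal sketch, not taken from the original post, of what the suggestions quoted earlier in this thread could look like in code: the four blocks assembled into one monolithic MPIAIJ matrix instead of a MATNEST, and the projected problem size capped with EPSSetDimensions(), which is equivalent to -eps_nev 100 -eps_mpd 64. The block size n, the preallocation counts, the EPS_HEP problem type and the EPS_SMALLEST_REAL selection below are illustrative assumptions, not the poster's actual settings.

#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat            A;            /* monolithic 2n x 2n operator instead of MATNEST */
  EPS            eps;
  PetscInt       n = 174000;   /* assumed per-block size, as described above */
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2*n, 2*n); CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
  /* ~40 nonzeros per row per the description above; the split between diagonal
     and off-diagonal parts is a guess and should be tuned to the real pattern */
  ierr = MatMPIAIJSetPreallocation(A, 40, NULL, 10, NULL); CHKERRQ(ierr);
  /* Insert all four blocks here with MatSetValues(), offsetting the global
     row/column indices of the (1,0), (0,1) and (1,1) blocks by n,
     then assemble once. */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL); CHKERRQ(ierr);    /* standard eigenproblem */
  ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);  /* assuming symmetric */
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL); CHKERRQ(ierr);
  /* nev=100, default ncv, mpd=64: same effect as -eps_nev 100 -eps_mpd 64 */
  ierr = EPSSetDimensions(eps, 100, PETSC_DEFAULT, 64); CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);
  ierr = EPSSolve(eps); CHKERRQ(ierr);

  ierr = EPSDestroy(&eps); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}

With a plain AIJ operator, the shift-and-invert spectral transform (-st_type sinvert) that the MATNEST format was preventing should also become usable.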
>>> >>> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. >>> >>> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. >>> >>> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? >>> >>> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup >>> and look at which functions are not speeding up. >>> >>> Thanks, >>> >>> Matt >>> >>> Thanks a lot in advance, >>> >>> Andreas Walker >>> >>> m&m group >>> D-MAVT >>> ETH Zurich >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >>> >>> Max Max/Min Avg Total >>> Time (sec): 6.209e+02 1.000 6.209e+02 >>> Objects: 6.068e+05 1.001 6.063e+05 >>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >>> MPI Reductions: 1.512e+06 1.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flop: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> AvgLen: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flop in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 
0 >>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Container 1 1 584 0. 
>>> Distributed Mesh 6 6 29160 0. >>> GraphPartitioner 2 2 1244 0. >>> Matrix 1104 1104 136615232 0. >>> Index Set 930 930 9125912 0. >>> IS L to G Mapping 3 3 2235608 0. >>> Section 28 26 18720 0. >>> Star Forest Graph 30 30 25632 0. >>> Discrete System 6 6 5616 0. >>> PetscRandom 11 11 7194 0. >>> Vector 604372 604372 8204816368 0. >>> Vec Scatter 203 203 272192 0. >>> Viewer 21 10 8480 0. >>> EPS Solver 10 10 86360 0. >>> Spectral Transform 10 10 8400 0. >>> Basis Vectors 10 10 530848 0. >>> Region 10 10 6800 0. >>> Direct Solver 10 10 9838880 0. >>> Krylov Solver 10 10 13920 0. >>> Preconditioner 10 10 10080 0. >>> ======================================================================================================================== >>> Average time to get PetscTime(): 3.49944e-08 >>> Average time for MPI_Barrier(): 5.842e-06 >>> Average time for zero size MPI_Send(): 8.72551e-06 >>> #PETSc Option Table entries: >>> -config=benchmark3.json >>> -log_view >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a 
--with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >>> ----------------------------------------- >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >>> Using PETSc arch: >>> ----------------------------------------- >>> >>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> ----------------------------------------- >>> >>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >>> ----------------------------------------- >>> >>> Using C linker: 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >>> ----------------------------------------- >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> From awalker at student.ethz.ch Fri May 1 07:31:20 2020 From: awalker at student.ethz.ch (Walker Andreas) Date: Fri, 1 May 2020 12:31:20 +0000 Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver In-Reply-To: <87sggjam5x.fsf@jedbrown.org> References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> Message-ID: <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch> Hi Jed, Hi Jose, Thank you very much for your suggestions. 
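In code, the two main suggestions amount to roughly the following sketch. Variable names are placeholders, and it assumes that MatConvert() can turn a MATNEST into a monolithic MATAIJ in the installed PETSc release; if not, the four blocks would have to be assembled into one MPIAIJ matrix directly.

#include <slepceps.h>

/* Sketch under the stated assumptions: eps is the existing Krylov-Schur
   solver and N the 2x2 MATNEST operator. */
PetscErrorCode apply_suggestions(EPS eps, Mat N)
{
  Mat            Nmono;
  ST             st;
  PetscErrorCode ierr;

  /* (1) Shrink the projected problem: same effect as -eps_nev 100 -eps_mpd 64 */
  ierr = EPSSetDimensions(eps, 100, PETSC_DEFAULT, 64);CHKERRQ(ierr);

  /* (2) Replace the MATNEST operator by a monolithic AIJ matrix */
  ierr = MatConvert(N, MATAIJ, MAT_INITIAL_MATRIX, &Nmono);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, Nmono, NULL);CHKERRQ(ierr);

  /* With a monolithic matrix, shift-and-invert becomes an option again
     (normally combined with EPSSetTarget() and EPS_TARGET_MAGNITUDE). */
  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetType(st, STSINVERT);CHKERRQ(ierr);
  return 0;
}
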
- I tried reducing the subspace to 64, which indeed reduced the runtime by around 20 percent (sometimes more) for 128 cores. I will check what the effect on the sequential runtime is. - Regarding MatNest, I can just look for the eigenvalues of a submatrix to see how the speedup is affected; I will check that. Replacing the full MatNest with a contiguous matrix is definitely more work but, if it improves the performance, worth the effort (we assume that the program will be reused a lot). - PETSc is configured with MUMPS, OpenBLAS and ScaLAPACK (among others), but I noticed no significant difference compared to when PETSc is configured without them. - The number of iterations required by the solver does not depend on the number of cores. Best regards and many thanks, Andreas Walker > On 01.05.2020 at 14:12, Jed Brown wrote: > > "Jose E. Roman" writes: > >> Comments related to PETSc: >> >> - If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this. > > It is not intrinsic to MatNest, though use of MatNest incurs extra > VecScatter costs. If you use MatNest without VecNest, then > VecGetSubVector incurs significant cost (including reductions). I > suspect it's likely that some SLEPc functionality is not available with > VecNest. A better option would be to optimize VecGetSubVector by > caching the IS and subvector, at least in the contiguous case. > > How difficult would it be for you to run with a monolithic matrix > instead of MatNest? It would certainly be better at amortizing > communication costs. > >> >> Comments related to SLEPc: >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html) >> >> - In my previous comment about multithreaded BLAS, I was referring to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here. >> >> - Regarding the number of iterations, yes, the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there might be significant differences for some problems; that is the rationale of my suggestion. Anyway, in your case the fluctuation does not seem very important. >> >> Jose >> >> >>> On 1 May 2020, at 10:07, Walker Andreas wrote: >>> >>> Hi Matthew, >>> >>> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc. >>> >>> Best regards, >>> >>> Andreas >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.
Use 'enscript -r -fCourier9' to print this document *** >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May 1 04:03:07 2020 >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >>> >>> Max Max/Min Avg Total >>> Time (sec): 3.092e+04 1.000 3.092e+04 >>> Objects: 6.099e+05 1.000 6.099e+05 >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flop: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> AvgLen: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flop in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Container 1 1 584 0. >>> Distributed Mesh 1 1 5184 0. >>> GraphPartitioner 1 1 624 0. >>> Matrix 320 320 3469402576 0. >>> Index Set 53 53 2777932 0. >>> IS L to G Mapping 1 1 249320 0. >>> Section 13 11 7920 0. >>> Star Forest Graph 6 6 4896 0. >>> Discrete System 1 1 936 0. >>> Vector 609405 609405 857220847896 0. >>> Vec Scatter 1 1 704 0. >>> Viewer 22 11 9328 0. >>> EPS Solver 10 10 86360 0. >>> Spectral Transform 10 10 8400 0. >>> Basis Vectors 10 10 530336 0. >>> PetscRandom 10 10 6540 0. >>> Region 10 10 6800 0. >>> Direct Solver 10 10 9838880 0. >>> Krylov Solver 10 10 13920 0. >>> Preconditioner 10 10 10080 0. 
>>> ======================================================================================================================== >>> Average time to get PetscTime(): 2.50991e-08 >>> #PETSc Option Table entries: >>> -config=benchmark3.json >>> -eps_converged_reason >>> -log_view >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >>> ----------------------------------------- >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >>> Using PETSc arch: >>> ----------------------------------------- >>> >>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> ----------------------------------------- >>> >>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >>> ----------------------------------------- >>> >>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >>> ----------------------------------------- >>> >>> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : >>>> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote: >>>> Hello everyone, >>>> >>>> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: >>>> >>>> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. >>>> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). >>>> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. 
>>>> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues >>>> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. >>>> >>>> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. >>>> >>>> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. >>>> >>>> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? >>>> >>>> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup >>>> and look at which functions are not speeding up. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Thanks a lot in advance, >>>> >>>> Andreas Walker >>>> >>>> m&m group >>>> D-MAVT >>>> ETH Zurich >>>> >>>> ************************************************************************************************************************ >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >>>> ************************************************************************************************************************ >>>> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >>>> >>>> Max Max/Min Avg Total >>>> Time (sec): 6.209e+02 1.000 6.209e+02 >>>> Objects: 6.068e+05 1.001 6.063e+05 >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >>>> MPI Reductions: 1.512e+06 1.000 >>>> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>>> and VecAXPY() for complex vectors of length N --> 8N flop >>>> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >>>> >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>>> Phase summary info: >>>> Count: number of times phase was executed >>>> Time and Flop: Max - maximum over all processors >>>> Ratio - ratio of maximum to minimum over all processors >>>> Mess: number of messages sent >>>> AvgLen: average message length (bytes) >>>> Reduct: number of global reductions >>>> Global: entire computation >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>>> %T - percent time in this phase %F - percent flop in this phase >>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>> %R - percent reductions in this phase >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >>>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 
1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> Memory usage is given in bytes: >>>> >>>> Object Type Creations Destructions Memory Descendants' Mem. 
>>>> Reports information only for process 0. >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> Container 1 1 584 0. >>>> Distributed Mesh 6 6 29160 0. >>>> GraphPartitioner 2 2 1244 0. >>>> Matrix 1104 1104 136615232 0. >>>> Index Set 930 930 9125912 0. >>>> IS L to G Mapping 3 3 2235608 0. >>>> Section 28 26 18720 0. >>>> Star Forest Graph 30 30 25632 0. >>>> Discrete System 6 6 5616 0. >>>> PetscRandom 11 11 7194 0. >>>> Vector 604372 604372 8204816368 0. >>>> Vec Scatter 203 203 272192 0. >>>> Viewer 21 10 8480 0. >>>> EPS Solver 10 10 86360 0. >>>> Spectral Transform 10 10 8400 0. >>>> Basis Vectors 10 10 530848 0. >>>> Region 10 10 6800 0. >>>> Direct Solver 10 10 9838880 0. >>>> Krylov Solver 10 10 13920 0. >>>> Preconditioner 10 10 10080 0. >>>> ======================================================================================================================== >>>> Average time to get PetscTime(): 3.49944e-08 >>>> Average time for MPI_Barrier(): 5.842e-06 >>>> Average time for zero size MPI_Send(): 8.72551e-06 >>>> #PETSc Option Table entries: >>>> -config=benchmark3.json >>>> -log_view >>>> #End of PETSc Option Table entries >>>> Compiled without FORTRAN kernels >>>> Compiled with full precision matrices (default) >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include 
--with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >>>> ----------------------------------------- >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >>>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >>>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >>>> Using PETSc arch: >>>> ----------------------------------------- >>>> >>>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >>>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>>> ----------------------------------------- >>>> >>>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include 
-I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >>>> ----------------------------------------- >>>> >>>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >>>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >>>> ----------------------------------------- >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>

From knepley at gmail.com  Fri May  1 07:45:47 2020
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 1 May 2020 08:45:47 -0400
Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver
In-Reply-To: <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch>
References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch>
 <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch>
 <87sggjam5x.fsf@jedbrown.org>
 <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch>
Message-ID: 

On Fri, May 1, 2020 at 8:32 AM Walker Andreas wrote:

> Hi Jed, Hi Jose,
>
> Thank you very much for your suggestions.
>
> - I tried reducing the subspace to 64, which indeed reduced the runtime by
> around 20 percent (sometimes more) for 128 cores. I will check what the
> effect on the sequential runtime is.
> - Regarding MatNest, I can just look for the eigenvalues of a submatrix
> to see how the speedup is affected; I will check that. Replacing the full
> MatNest with a contiguous matrix is definitely more work but, if it
> improves the performance, worth the work (we assume that the program will
> be reused a lot).
> - PETSc is configured with MUMPS, OpenBLAS and ScaLAPACK (among others),
> but I noticed no significant difference compared to when PETSc is
> configured without them.
> - The number of iterations required by the solver does not depend on the
> number of cores.
>
> Best regards and many thanks,
>
Let me just address something from a high level. These operations are not
compute limited (for the most part), but limited by bandwidth. Bandwidth is
allocated by node, not by core, on these machines. That is why it is
important to understand how many nodes you are using, not how many cores.
A useful scaling test would be to fill up a single node (however many cores
fit on one node), and then increase the number of nodes. We would expect
close to linear scaling in that case.

  Thanks,

     Matt

> Andreas Walker
>
>
> Am 01.05.2020 um 14:12 schrieb Jed Brown :
> >
> > "Jose E. Roman" writes:
> >
> >> Comments related to PETSc:
> >>
> >> - If you look at the "Reduct" column you will see that MatMult() is
> doing a lot of global reductions, which is bad for scaling. This is due to
> MATNEST (other Mat types do not do that). I don't know the details of
> MATNEST, maybe Matt can comment on this.
> >
> > It is not intrinsic to MatNest, though use of MatNest incurs extra
> > VecScatter costs. If you use MatNest without VecNest, then
> > VecGetSubVector incurs significant cost (including reductions). I
> > suspect it's likely that some SLEPc functionality is not available with
> > VecNest. A better option would be to optimize VecGetSubVector by
> > caching the IS and subvector, at least in the contiguous case.
> >
> > How difficult would it be for you to run with a monolithic matrix
> > instead of MatNest? It would certainly be better at amortizing
> > communication costs.
> >
> >>
> >> Comments related to SLEPc:
> >>
> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to
> "sequential" computations. In your case they take a non-negligible time
> (around 30 seconds). You can try to reduce this time by reducing the size
> of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see
> https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html
> )
> >>
> >> - In my previous comment about multithreaded BLAS, I was referring to
> configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think
> this is relevant here.
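(A quick illustration of the -eps_nev/-eps_mpd suggestion above: the same cap
on the projected problem can also be set in the application code with
EPSSetDimensions(). This is only a sketch, not part of the poster's solver;
the EPS object `eps` is assumed to have been created and given its operators
elsewhere, and EPS_SMALLEST_REAL stands in for whichever notion of "smallest"
eigenvalues the application actually uses.)

  /* Sketch: request 100 eigenpairs but cap the projected problem at mpd = 64,
     mirroring -eps_nev 100 -eps_mpd 64; ncv is left at its default. */
  ierr = EPSSetDimensions(eps, 100, PETSC_DEFAULT, 64);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);  /* -eps_* options can still override */
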
> >> > >> - Regarding the number of iterations, yes the number of iterations > should be the same for different runs if you keep the same number of > processes, but when you change the number of processes there might be > significant differences for some problems, that is the rationale of my > suggestion. Anyway, in your case the fluctuation does not seem very > important. > >> > >> Jose > >> > >> > >>> El 1 may 2020, a las 10:07, Walker Andreas > escribi?: > >>> > >>> Hi Matthew, > >>> > >>> I just ran the same program on a single core. You can see the output > of -log_view below. As I see it, most functions have speedups of around 50 > for 128 cores, also functions like matmult etc. > >>> > >>> Best regards, > >>> > >>> Andreas > >>> > >>> > ************************************************************************************************************************ > >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > >>> > ************************************************************************************************************************ > >>> > >>> ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > >>> > >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri > May 1 04:03:07 2020 > >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >>> > >>> Max Max/Min Avg Total > >>> Time (sec): 3.092e+04 1.000 3.092e+04 > >>> Objects: 6.099e+05 1.000 6.099e+05 > >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 > >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 > >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > >>> MPI Reductions: 0.000e+00 0.000 > >>> > >>> Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > >>> e.g., VecAXPY() for real vectors of length > N --> 2N flop > >>> and VecAXPY() for complex vectors of length > N --> 8N flop > >>> > >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- > Messages --- -- Message Lengths -- -- Reductions -- > >>> Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > >>> > >>> > ------------------------------------------------------------------------------------------------------------------------ > >>> See the 'Profiling' chapter of the users' manual for details on > interpreting output. > >>> Phase summary info: > >>> Count: number of times phase was executed > >>> Time and Flop: Max - maximum over all processors > >>> Ratio - ratio of maximum to minimum over all > processors > >>> Mess: number of messages sent > >>> AvgLen: average message length (bytes) > >>> Reduct: number of global reductions > >>> Global: entire computation > >>> Stage: stages of a computation. Set stages with PetscLogStagePush() > and PetscLogStagePop(). 
> >>> %T - percent time in this phase %F - percent flop in this > phase > >>> %M - percent messages in this phase %L - percent message > lengths in this phase > >>> %R - percent reductions in this phase > >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time > over all processors) > >>> > ------------------------------------------------------------------------------------------------------------------------ > >>> Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > >>> Max Ratio Max Ratio Max Ratio Mess > AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >>> > ------------------------------------------------------------------------------------------------------------------------ > >>> > >>> --- Event Stage 0: Main Stage > >>> > >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 > >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 > >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 > >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 > >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 > >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 > >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 > >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 > >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 > >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> 
STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 > >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 > >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 > >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 > >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 > >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> > ------------------------------------------------------------------------------------------------------------------------ > >>> > >>> Memory usage is given in bytes: > >>> > >>> Object Type Creations Destructions Memory Descendants' > Mem. > >>> Reports information only for process 0. > >>> > >>> --- Event Stage 0: Main Stage > >>> > >>> Container 1 1 584 0. > >>> Distributed Mesh 1 1 5184 0. > >>> GraphPartitioner 1 1 624 0. > >>> Matrix 320 320 3469402576 0. > >>> Index Set 53 53 2777932 0. > >>> IS L to G Mapping 1 1 249320 0. > >>> Section 13 11 7920 0. > >>> Star Forest Graph 6 6 4896 0. > >>> Discrete System 1 1 936 0. > >>> Vector 609405 609405 857220847896 0. > >>> Vec Scatter 1 1 704 0. > >>> Viewer 22 11 9328 0. > >>> EPS Solver 10 10 86360 0. > >>> Spectral Transform 10 10 8400 0. > >>> Basis Vectors 10 10 530336 0. > >>> PetscRandom 10 10 6540 0. > >>> Region 10 10 6800 0. > >>> Direct Solver 10 10 9838880 0. > >>> Krylov Solver 10 10 13920 0. > >>> Preconditioner 10 10 10080 0. 
> >>> > ======================================================================================================================== > >>> Average time to get PetscTime(): 2.50991e-08 > >>> #PETSc Option Table entries: > >>> -config=benchmark3.json > >>> -eps_converged_reason > >>> -log_view > >>> #End of PETSc Option Table entries > >>> Compiled without FORTRAN kernels > >>> Compiled with full precision matrices (default) > >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >>> Configure options: > --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= > CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" > --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ > --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so > --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C > --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so > --with-scalapack=1 --with-metis=1 > --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk > --with-hdf5=1 > --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 > --with-hypre=1 > --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne > --with-parmetis=1 > --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 > --with-mumps=1 > --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b > --with-trilinos=1 > --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo > --with-fftw=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a > --with-superlu_dist=1 > --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so > 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so > /lib64/librt.so" --with-suitesparse=1 > --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so > --with-zlib=1 > >>> ----------------------------------------- > >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >>> Machine characteristics: > Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >>> Using PETSc directory: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >>> Using PETSc arch: > >>> ----------------------------------------- > >>> > >>> Using C compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >>> Using Fortran compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > > >>> ----------------------------------------- > >>> > >>> Using include paths: > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >>> ----------------------------------------- > >>> > >>> Using C linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >>> Using Fortran linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >>> Using libraries: > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -lpetsc > 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > /lib64/librt.so > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos > -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml > -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib > -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen > -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup > -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan > -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers > -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps > -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd > -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz > -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >>> ----------------------------------------- > >>> > >>> > >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : > >>>> > >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas < > awalker at student.ethz.ch> wrote: > >>>> Hello everyone, > >>>> > >>>> I have used SLEPc successfully on a FEM-related project. Even though > it is very powerful overall, the speedup I measure is a bit below my > expectations. Compared to using a single core, the speedup is for example > around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or > 80 for 256 cores. Some details about my problem: > >>>> > >>>> - The problem is based on meshes with up to 400k degrees of freedom. > DMPlex is used for organizing it. > >>>> - ParMetis is used to partition the mesh. This yields a stiffness > matrix where the vast majority of entries is in the diagonal blocks (i.e. > looking at the rows owned by a core, there is a very dense square-shaped > region around the diagonal and some loosely scattered nozeroes in the other > columns). > >>>> - The actual matrix from which I need eigenvalues is a 2x2 block > matrix, saved as MATNEST - matrix. Each of these four matrices is computed > based on the stiffness matrix and has a similar size and nonzero pattern. 
> For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and > on average about 40 nonzeroes per row. > >>>> - I use the default Krylov-Schur solver and look for the 100 smallest > eigenvalues > >>>> - The output of -log_view for the 200k-dof - mesh described above run > on 128 cores is at the end of this mail. > >>>> > >>>> I noticed that the problem matrices are not perfectly balanced, i.e. > the number of rows per core might vary between 2500 and 3000, for example. > But I am not sure if this is the main reason for the poor speedup. > >>>> > >>>> I tried to reduce the subspace size but without effect. I also > attempted to use the shift-and-invert spectral transformation but the > MATNEST-type prevents this. > >>>> > >>>> Are there any suggestions to improve the speedup further or is this > the maximum speedup that I can expect? > >>>> > >>>> Can you also give us the performance for this problem on one node > using the same number of cores per node? Then we can calculate speedup > >>>> and look at which functions are not speeding up. > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> > >>>> Thanks a lot in advance, > >>>> > >>>> Andreas Walker > >>>> > >>>> m&m group > >>>> D-MAVT > >>>> ETH Zurich > >>>> > >>>> > ************************************************************************************************************************ > >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript > -r -fCourier9' to print this document *** > >>>> > ************************************************************************************************************************ > >>>> > >>>> ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > >>>> > >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu > Apr 30 15:50:22 2020 > >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >>>> > >>>> Max Max/Min Avg Total > >>>> Time (sec): 6.209e+02 1.000 6.209e+02 > >>>> Objects: 6.068e+05 1.001 6.063e+05 > >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 > >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 > >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 > >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 > >>>> MPI Reductions: 1.512e+06 1.000 > >>>> > >>>> Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > >>>> e.g., VecAXPY() for real vectors of length > N --> 2N flop > >>>> and VecAXPY() for complex vectors of > length N --> 8N flop > >>>> > >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- > Messages --- -- Message Lengths -- -- Reductions -- > >>>> Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 > 100.0% 5.029e+02 100.0% 1.512e+06 100.0% > >>>> > >>>> > ------------------------------------------------------------------------------------------------------------------------ > >>>> See the 'Profiling' chapter of the users' manual for details on > interpreting output. > >>>> Phase summary info: > >>>> Count: number of times phase was executed > >>>> Time and Flop: Max - maximum over all processors > >>>> Ratio - ratio of maximum to minimum over all > processors > >>>> Mess: number of messages sent > >>>> AvgLen: average message length (bytes) > >>>> Reduct: number of global reductions > >>>> Global: entire computation > >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() > and PetscLogStagePop(). 
> >>>> %T - percent time in this phase %F - percent flop in > this phase > >>>> %M - percent messages in this phase %L - percent message > lengths in this phase > >>>> %R - percent reductions in this phase > >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time > over all processors) > >>>> > ------------------------------------------------------------------------------------------------------------------------ > >>>> Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > >>>> Max Ratio Max Ratio Max Ratio Mess > AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >>>> > ------------------------------------------------------------------------------------------------------------------------ > >>>> > >>>> --- Event Stage 0: Main Stage > >>>> > >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 > 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 > 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 > 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 > >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 > 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 > >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 > >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 > 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 > 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 > 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 > 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 > >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 > 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 > 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 > >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 > 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 > >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 > 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 > 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 > >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 > 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 > >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 > 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 > 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 > >>>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 > 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 > 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 > 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 > 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> 
DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 > 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 > 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 > 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 > 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 > 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 > 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 > 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 > 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 > 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 > 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 > >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 > >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 > 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 > >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 > 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 > >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 > 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 > >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 > 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 > >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 > 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 > >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 > 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 > >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 > >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 2 0 0 
0 0 2 0 0 0 0 0 > >>>> > ------------------------------------------------------------------------------------------------------------------------ > >>>> > >>>> Memory usage is given in bytes: > >>>> > >>>> Object Type Creations Destructions Memory > Descendants' Mem. > >>>> Reports information only for process 0. > >>>> > >>>> --- Event Stage 0: Main Stage > >>>> > >>>> Container 1 1 584 0. > >>>> Distributed Mesh 6 6 29160 0. > >>>> GraphPartitioner 2 2 1244 0. > >>>> Matrix 1104 1104 136615232 0. > >>>> Index Set 930 930 9125912 0. > >>>> IS L to G Mapping 3 3 2235608 0. > >>>> Section 28 26 18720 0. > >>>> Star Forest Graph 30 30 25632 0. > >>>> Discrete System 6 6 5616 0. > >>>> PetscRandom 11 11 7194 0. > >>>> Vector 604372 604372 8204816368 0. > >>>> Vec Scatter 203 203 272192 0. > >>>> Viewer 21 10 8480 0. > >>>> EPS Solver 10 10 86360 0. > >>>> Spectral Transform 10 10 8400 0. > >>>> Basis Vectors 10 10 530848 0. > >>>> Region 10 10 6800 0. > >>>> Direct Solver 10 10 9838880 0. > >>>> Krylov Solver 10 10 13920 0. > >>>> Preconditioner 10 10 10080 0. > >>>> > ======================================================================================================================== > >>>> Average time to get PetscTime(): 3.49944e-08 > >>>> Average time for MPI_Barrier(): 5.842e-06 > >>>> Average time for zero size MPI_Send(): 8.72551e-06 > >>>> #PETSc Option Table entries: > >>>> -config=benchmark3.json > >>>> -log_view > >>>> #End of PETSc Option Table entries > >>>> Compiled without FORTRAN kernels > >>>> Compiled with full precision matrices (default) > >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >>>> Configure options: > --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= > CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" > --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ > --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so > --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C > --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so > --with-scalapack=1 --with-metis=1 > --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk > --with-hdf5=1 > --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 > --with-hypre=1 > --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne > --with-parmetis=1 > --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 > --with-mumps=1 > 
--with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b > --with-trilinos=1 > --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo > --with-fftw=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a > --with-superlu_dist=1 > --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so > /lib64/librt.so" --with-suitesparse=1 > --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so > --with-zlib=1 > >>>> ----------------------------------------- > >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >>>> Machine characteristics: > Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >>>> Using PETSc directory: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >>>> Using PETSc arch: > >>>> ----------------------------------------- > >>>> > >>>> Using C compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >>>> Using Fortran compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > > >>>> ----------------------------------------- > >>>> > >>>> Using include paths: > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > 
-I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >>>> ----------------------------------------- > >>>> > >>>> Using C linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >>>> Using Fortran linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >>>> Using libraries: > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -lpetsc > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > /lib64/librt.so > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos > -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml > -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib > -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen > -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup > -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan > -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers > -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps > -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd > -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz > -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >>>> 
----------------------------------------- > >>>> > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>> -- Norbert Wiener > >>>> > >>>> https://www.cse.buffalo.edu/~knepley/ > >>> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From shaswat121994 at gmail.com Sat May 2 02:35:22 2020
From: shaswat121994 at gmail.com (Shashwat Tiwari)
Date: Sat, 2 May 2020 13:05:22 +0530
Subject: [petsc-users] Setting values to a Global Vector using Labels
Message-ID: 

Hi,
I am writing a simple code to solve the linear advection equation using a first order cell centered finite volume scheme on 2D unstructured grids using DMPlex. In order to set values to a Global vector (for example, to set the initial value of the solution vector), I am looping over the cells owned by a process (including partition ghost cells) and checking the "vtk" label for each cell, assigned by the DMPlexConstructGhostCells() function, to prevent it from writing to ghost cells. If I don't do this check, the code gives a segmentation fault, which, as far as I understand, is caused by trying to write into the ghost points which do not exist on the Global Vector. Following is the function that I have written to set the initial condition:

PetscErrorCode SetIC(DM dm, Vec U)
{
  PetscErrorCode ierr;
  PetscScalar *u;

  PetscFunctionBegin;
  PetscInt c, cStart, cEnd;               // cells
  PetscReal area, centroid[3], normal[3]; // geometric data
  // get cell stratum owned by processor
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd); CHKERRQ(ierr);
  // get array for U
  ierr = VecGetArray(U, &u);
  // loop over cells and assign values
  for(c=cStart; c<cEnd; c++)
  {
    PetscInt label;
    ierr = DMGetLabelValue(dm, "vtk", c, &label); CHKERRQ(ierr);
    // write into Global vector if the cell is a real cell
    if(label == 1)
    {
      PetscReal X[2]; // cell centroid
      ierr = DMPlexComputeCellGeometryFVM(dm, c, &area, centroid, normal); CHKERRQ(ierr);
      X[0] = centroid[0]; X[1] = centroid[1];
      u[c] = initial_condition(X);
    }
  }
  ierr = VecRestoreArray(U, &u);
  PetscFunctionReturn(0);
}

This gives me the desired output, but, I wanted to ask if there is a better and more efficient way to achieve this, i.e. to write to a global vector, than checking the label for each cell. I am also attaching a sample code which sets the initial condition and writes a corresponding vtk file. Kindly correct me if I am wrong in my understanding and give your suggestions to improve upon this.

Regards,
Shashwat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 5707 bytes
Desc: not available
URL: 
From knepley at gmail.com Sat May 2 06:12:31 2020
From: knepley at gmail.com (Matthew Knepley)
Date: Sat, 2 May 2020 07:12:31 -0400
Subject: [petsc-users] Setting values to a Global Vector using Labels
In-Reply-To: 
References: 
Message-ID: 

On Sat, May 2, 2020 at 3:36 AM Shashwat Tiwari wrote:
> Hi,
> I am writing a simple code to solve the linear advection equation using a
> first order cell centered finite volume scheme on 2D unstructured grids
> using DMPlex. In order to set values to a Global vector (for example, to
> set the initial value of the solution vector), I am looping over the cells
> owned by a process (including partition ghost cells) and checking the "vtk"
> label for each cell, assigned by the DMPlexConstructGhostCells() function,
> to prevent it from writing to ghost cells. If I don't do this check, the
> code gives a segmentation fault, which, as far as I understand, is caused
> by trying to write into the ghost points which do not exist on the Global
> Vector.
> Following is the function that I have written to set the initial
> condition:
>
> PetscErrorCode SetIC(DM dm, Vec U)
> {
> PetscErrorCode ierr;
> PetscScalar *u;
>
> PetscFunctionBegin;
> PetscInt c, cStart, cEnd; // cells
> PetscReal area, centroid[3], normal[3]; // geometric data
> // get cell stratum owned by processor
> ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd); CHKERRQ(ierr);
> // get array for U
> ierr = VecGetArray(U, &u);
> // loop over cells and assign values
> for(c=cStart; c<cEnd; c++)
> {
> PetscInt label;
> ierr = DMGetLabelValue(dm, "vtk", c, &label); CHKERRQ(ierr);
> // write into Global vector if the cell is a real cell
> if(label == 1)
> {
> PetscReal X[2]; // cell centroid
> ierr = DMPlexComputeCellGeometryFVM(dm, c, &area, centroid, normal); CHKERRQ(ierr);
> X[0] = centroid[0]; X[1] = centroid[1];
> u[c] = initial_condition(X);
> }
> }
> ierr = VecRestoreArray(U, &u);
> PetscFunctionReturn(0);
> }
>
> This gives me the desired output, but, I wanted to ask if there is a
> better and more efficient way to achieve this, i.e. to write to a global
> vector, than checking the label for each cell. I am also attaching a sample
> code which sets the initial condition and writes a corresponding vtk file.
> Kindly correct me if I am wrong in my understanding and give your
> suggestions to improve upon this.
>

1) This will work, but I think the intent is to use the "ghost" label to identify ghost cells. It has the same information as "vtk" since you do not want to output those cells for visualization either, but it might be more clear in the code.

2) If I were doing this many times, I would create a new IS with the cells I was looping over, namely [cStart, cEnd) - {ghost cells}. That way you have only a lookup rather than a search. However, if you do it only once or twice, I don't think there is much advantage.

Thanks,

    Matt

> Regards,
> Shashwat

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
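A minimal sketch of the second suggestion above (collect the owned, non-ghost cells into an IS once and reuse it) could look like the following. This is not code from the thread: the helper name BuildRealCellIS is made up, and the only assumption is the "ghost" label created by DMPlexConstructGhostCells().

#include <petscdmplex.h>

/* Sketch: gather the real (non-ghost) cells once so that later loops are a
   plain index lookup instead of a per-cell label query. */
PetscErrorCode BuildRealCellIS(DM dm, IS *cellIS)
{
  PetscErrorCode ierr;
  DMLabel        ghostLabel;
  PetscInt       c, cStart, cEnd, n = 0, *cells;

  PetscFunctionBegin;
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd); CHKERRQ(ierr);
  ierr = DMGetLabel(dm, "ghost", &ghostLabel); CHKERRQ(ierr); /* set by DMPlexConstructGhostCells() */
  ierr = PetscMalloc1(cEnd - cStart, &cells); CHKERRQ(ierr);
  for (c = cStart; c < cEnd; ++c) {
    PetscInt gval = -1;
    if (ghostLabel) { ierr = DMLabelGetValue(ghostLabel, c, &gval); CHKERRQ(ierr); }
    if (gval < 0) cells[n++] = c; /* keep only real cells */
  }
  ierr = ISCreateGeneral(PETSC_COMM_SELF, n, cells, PETSC_OWN_POINTER, cellIS); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A function like SetIC() can then walk this IS with ISGetLocalSize()/ISGetIndices() instead of calling DMGetLabelValue() for every cell.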
From chenzhuotj at gmail.com Sat May 2 23:33:11 2020
From: chenzhuotj at gmail.com (Zhuo Chen)
Date: Sat, 2 May 2020 22:33:11 -0600
Subject: [petsc-users] PCBJACOBI behaves weirdly
Message-ID: 

Dear Petsc users,

My name is Zhuo Chen. I am very new to Petsc.

I have encountered a very weird problem. I try to use Petsc to solve a 2D diffusion problem (https://en.wikipedia.org/wiki/Heat_equation). When I use PCLU, the solution looks very good. Then, I change to PCBJACOBI because I plan to use MPI in the future. If I treat the whole matrix as a single block and set the preconditioner of the "sub-matrix" with PCLU, the solution is exactly the same as the PCLU setting. However, if I increase the number of blocks, the result becomes worse. I attached the solution of a 2D diffusion simulation I have done. The red dots are the analytic solution and the blue circles are the solution from Petsc with 8 blocks. The size of the computational domain is 300*8. The result with PCLU only is very close to the analytic solution so I do not include it here.

Then I implemented the same problem with Matlab with x=gmres(mat,rhs,10,tol,maxit,p), where mat is the same as the one in my Petsc code, and p is the block Jacobi matrix (8 blocks). The result from Matlab is very good, though it is slow. I attach the Matlab result to this email as well.

I would like to paste a part of my Petsc code here as well.

call KSPSetOperators(ksp,A,A,ierr);CHKERRQ(ierr)
call KSPGetPC(ksp,pc,ierr);CHKERRQ(ierr)
tol=1d-6
call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr);CHKERRQ(ierr)
call PCSetType(pc,PCBJACOBI,ierr);CHKERRQ(ierr)
nblks=8
allocate(blks(nblks))
blks=Ntot/nblks
call PCBJacobiSetTotalBlocks(pc,nblks,blks,ierr);CHKERRQ(ierr)
deallocate(blks)
call KSPSetUp(ksp,ierr)
call PCBJacobiGetSubKSP(pc,nlocal,first,PETSC_NULL_KSP,ierr)
allocate(subksp(nlocal))
call PCBJacobiGetSubKSP(pc,nlocal,first,subksp,ierr)
do i=0,nlocal-1
  call KSPGetPC(subksp(i+1),subpc,ierr)
  call PCSetType(subpc,PCLU,ierr); CHKERRA(ierr)
  call KSPSetType(subksp(i+1),KSPGMRES,ierr); CHKERRA(ierr)
  tol=1d-6
  call KSPSetTolerances(subksp(i+1),tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr);CHKERRQ(ierr)
end do
deallocate(subksp)

It would be great if anyone can help me with this issue. Many thanks!

-- 
Zhuo Chen
Department of Physics
University of Alberta
Edmonton Alberta, Canada T6G 2E1
http://www.pas.rochester.edu/~zchen25/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diffusion_petsc.png
Type: image/png
Size: 113735 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diffusion_matlab.png
Type: image/png
Size: 36981 bytes
Desc: not available
URL: 
From balay at mcs.anl.gov Sat May 2 23:50:38 2020
From: balay at mcs.anl.gov (Satish Balay)
Date: Sat, 2 May 2020 23:50:38 -0500 (CDT)
Subject: [petsc-users] petsc-3.13.1.tar.gz now available
Message-ID: 

Dear PETSc users,

The patch release petsc-3.13.1 is now available for download, with change list at 'PETSc-3.13 Changelog'
http://www.mcs.anl.gov/petsc/download/index.html

Satish

From knepley at gmail.com Sun May 3 06:35:30 2020
From: knepley at gmail.com (Matthew Knepley)
Date: Sun, 3 May 2020 07:35:30 -0400
Subject: [petsc-users] PCBJACOBI behaves weirdly
In-Reply-To: 
References: 
Message-ID: 

On Sun, May 3, 2020 at 12:34 AM Zhuo Chen wrote:
> Dear Petsc users,
>
> My name is Zhuo Chen. I am very new to Petsc.
>
> I have encountered a very weird problem. I try to use Petsc to solve a 2D
> diffusion problem (https://en.wikipedia.org/wiki/Heat_equation). When I
> use PCLU, the solution looks very good. Then, I change to PCBJACOBI because
> I plan to use MPI in the future. If I treat the whole matrix as a single
> block and set the preconditioner of the "sub-matrix" with PCLU, the
> solution is exactly the same as the PCLU setting. However, if I increase
> the number of blocks, the result becomes worse. I attached the solution of
> a 2D diffusion simulation I have done. The red dots are the analytic
> solution and the blue circles are the solution from Petsc with 8 blocks.
> The size of the computational domain is 300*8. The result with PCLU only is
> very close to the analytic solution so I do not include it here.
>
> Then I implemented the same problem with Matlab
> with x=gmres(mat,rhs,10,tol,maxit,p), where mat is the same as the one in
> my Petsc code, and p is the block Jacobi matrix (8 blocks). The result from
> Matlab is very good, though it is slow. I attach the Matlab result to this
> email as well.
>

The default tolerance for KSP is 1e-5. If you make it lower, you will get what you want,

  -ksp_rtol 1e-10

However, BJACOBI is a terrible way to solve the diffusion equation. You should use multigrid, for example -pc_type gamg or Hypre/BoomerAMG or ML.

Thanks,

    Matt

> I would like to paste a part of my Petsc code here as well.
>
> call KSPSetOperators(ksp,A,A,ierr);CHKERRQ(ierr)
> call KSPGetPC(ksp,pc,ierr);CHKERRQ(ierr)
> tol=1d-6
> call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr);CHKERRQ(ierr)
> call PCSetType(pc,PCBJACOBI,ierr);CHKERRQ(ierr)
> nblks=8
> allocate(blks(nblks))
> blks=Ntot/nblks
> call PCBJacobiSetTotalBlocks(pc,nblks,blks,ierr);CHKERRQ(ierr)
> deallocate(blks)
> call KSPSetUp(ksp,ierr)
> call PCBJacobiGetSubKSP(pc,nlocal,first,PETSC_NULL_KSP,ierr)
> allocate(subksp(nlocal))
> call PCBJacobiGetSubKSP(pc,nlocal,first,subksp,ierr)
> do i=0,nlocal-1
>   call KSPGetPC(subksp(i+1),subpc,ierr)
>   call PCSetType(subpc,PCLU,ierr); CHKERRA(ierr)
>   call KSPSetType(subksp(i+1),KSPGMRES,ierr); CHKERRA(ierr)
>   tol=1d-6
>   call KSPSetTolerances(subksp(i+1),tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr);CHKERRQ(ierr)
> end do
> deallocate(subksp)
>
> It would be great if anyone can help me with this issue. Many thanks!
>
> --
> Zhuo Chen
> Department of Physics
> University of Alberta
> Edmonton Alberta, Canada T6G 2E1
> http://www.pas.rochester.edu/~zchen25/

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
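A minimal sketch of the settings suggested above (a tighter relative tolerance plus algebraic multigrid instead of block Jacobi), written against PETSc's C interface; the helper name SolveDiffusion and the rhs/x vectors are assumptions rather than code from the thread, and the same calls exist in the Fortran API.

#include <petscksp.h>

/* Sketch: tighten the relative tolerance and use GAMG; KSPSetFromOptions()
   lets command-line options such as -ksp_rtol 1e-10 -pc_type gamg override
   whatever is hard-coded here. */
PetscErrorCode SolveDiffusion(KSP ksp, Vec rhs, Vec x)
{
  PetscErrorCode ierr;
  PC             pc;

  PetscFunctionBegin;
  ierr = KSPSetTolerances(ksp, 1.0e-10, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG); CHKERRQ(ierr); /* algebraic multigrid, -pc_type gamg */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
  ierr = KSPSolve(ksp, rhs, x); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

If KSPSetFromOptions() is called, the same experiment can also be run without recompiling, e.g. with -ksp_rtol 1e-10 -pc_type gamg -ksp_monitor on the command line.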
From awalker at student.ethz.ch Mon May 4 02:51:25 2020
From: awalker at student.ethz.ch (Walker Andreas)
Date: Mon, 4 May 2020 07:51:25 +0000
Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver
In-Reply-To: 
References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch>
Message-ID: <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch>

Hey everyone,

I wanted to give you a short update on this:

- As suggested by Matt, I played around with the distribution of my cores over nodes and changed to using one core per node only.
- After experimenting with a monolithic matrix instead of a MATNEST object, I observed that the monolithic matrix showed better speedup, so I changed to building my problem matrix as a monolithic matrix.
- I keep the subspace for the solver small, which slightly improves the runtimes.

After these changes, I get near-perfect scaling with speedups above 56 for 64 cores (1 core/node) for example. Unfortunately I can't really tell which of the above changes contributed how much to this improvement.

Anyway, thanks everyone for your help!

Best regards and stay healthy

Andreas

On 01.05.2020 at 14:45, Matthew Knepley > wrote:

On Fri, May 1, 2020 at 8:32 AM Walker Andreas > wrote:
Hi Jed, Hi Jose,

Thank you very much for your suggestions.
- I tried reducing the subspace to 64 which indeed reduced the runtime by around 20 percent (sometimes more) for 128 cores. I will check what the effect on the sequential runtime is.
- Regarding MatNest, I can just look for the eigenvalues of a submatrix to see how the speedup is affected; I will check that.
Replacing the full matnest with a contiguous matrix is definitely more work but, if it improves the performance, worth the work (we assume that the program will be reused a lot). - Petsc is configured with mumps, openblas, scalapack (among others). But I noticed no significant difference to when petsc is configured without them. - The number of iterations required by the solver does not depend on the number of cores. Best regards and many thanks, Let me just address something from a high level. These operations are not compute limited (for the most part), but limited by bandwidth. Bandwidth is allocated by node, not by core, on these machines. That is why it important to understand how many nodes you are using, not cores. A useful scaling test would be to fill up a single node (however many cores fit on one node), and then increase the # of nodes. We would expect close to linear scaling in that case. Thanks, Matt Andreas Walker > Am 01.05.2020 um 14:12 schrieb Jed Brown >: > > "Jose E. Roman" > writes: > >> Comments related to PETSc: >> >> - If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this. > > It is not intrinsic to MatNest, though use of MatNest incurs extra > VecScatter costs. If you use MatNest without VecNest, then > VecGetSubVector incurs significant cost (including reductions). I > suspect it's likely that some SLEPc functionality is not available with > VecNest. A better option would be to optimize VecGetSubVector by > caching the IS and subvector, at least in the contiguous case. > > How difficult would it be for you to run with a monolithic matrix > instead of MatNest? It would certainly be better at amortizing > communication costs. > >> >> Comments related to SLEPc. >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html ) >> >> - In my previous comment about multithreaded BLAS, I was refering to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here. >> >> - Regarding the number of iterations, yes the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there might be significant differences for some problems, that is the rationale of my suggestion. Anyway, in your case the fluctuation does not seem very important. >> >> Jose >> >> >>> El 1 may 2020, a las 10:07, Walker Andreas > escribi?: >>> >>> Hi Matthew, >>> >>> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc. >>> >>> Best regards, >>> >>> Andreas >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May 1 04:03:07 2020 >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >>> >>> Max Max/Min Avg Total >>> Time (sec): 3.092e+04 1.000 3.092e+04 >>> Objects: 6.099e+05 1.000 6.099e+05 >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>> and VecAXPY() for complex vectors of length N --> 8N flop >>> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flop: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> AvgLen: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flop in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Container 1 1 584 0. >>> Distributed Mesh 1 1 5184 0. >>> GraphPartitioner 1 1 624 0. >>> Matrix 320 320 3469402576 0. >>> Index Set 53 53 2777932 0. >>> IS L to G Mapping 1 1 249320 0. >>> Section 13 11 7920 0. >>> Star Forest Graph 6 6 4896 0. >>> Discrete System 1 1 936 0. >>> Vector 609405 609405 857220847896 0. >>> Vec Scatter 1 1 704 0. >>> Viewer 22 11 9328 0. >>> EPS Solver 10 10 86360 0. >>> Spectral Transform 10 10 8400 0. >>> Basis Vectors 10 10 530336 0. >>> PetscRandom 10 10 6540 0. >>> Region 10 10 6800 0. >>> Direct Solver 10 10 9838880 0. >>> Krylov Solver 10 10 13920 0. >>> Preconditioner 10 10 10080 0. 
>>> ======================================================================================================================== >>> Average time to get PetscTime(): 2.50991e-08 >>> #PETSc Option Table entries: >>> -config=benchmark3.json >>> -eps_converged_reason >>> -log_view >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >>> ----------------------------------------- >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >>> Using PETSc arch: >>> ----------------------------------------- >>> >>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> ----------------------------------------- >>> >>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >>> ----------------------------------------- >>> >>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >>> ----------------------------------------- >>> >>> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley >: >>>> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas > wrote: >>>> Hello everyone, >>>> >>>> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: >>>> >>>> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. >>>> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). >>>> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. 
>>>> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues >>>> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. >>>> >>>> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. >>>> >>>> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. >>>> >>>> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? >>>> >>>> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup >>>> and look at which functions are not speeding up. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Thanks a lot in advance, >>>> >>>> Andreas Walker >>>> >>>> m&m group >>>> D-MAVT >>>> ETH Zurich >>>> >>>> ************************************************************************************************************************ >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >>>> ************************************************************************************************************************ >>>> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >>>> >>>> Max Max/Min Avg Total >>>> Time (sec): 6.209e+02 1.000 6.209e+02 >>>> Objects: 6.068e+05 1.001 6.063e+05 >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >>>> MPI Reductions: 1.512e+06 1.000 >>>> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop >>>> and VecAXPY() for complex vectors of length N --> 8N flop >>>> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >>>> >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>>> Phase summary info: >>>> Count: number of times phase was executed >>>> Time and Flop: Max - maximum over all processors >>>> Ratio - ratio of maximum to minimum over all processors >>>> Mess: number of messages sent >>>> AvgLen: average message length (bytes) >>>> Reduct: number of global reductions >>>> Global: entire computation >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>>> %T - percent time in this phase %F - percent flop in this phase >>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>> %R - percent reductions in this phase >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >>>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 
1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>> ------------------------------------------------------------------------------------------------------------------------ >>>> >>>> Memory usage is given in bytes: >>>> >>>> Object Type Creations Destructions Memory Descendants' Mem. 
>>>> Reports information only for process 0. >>>> >>>> --- Event Stage 0: Main Stage >>>> >>>> Container 1 1 584 0. >>>> Distributed Mesh 6 6 29160 0. >>>> GraphPartitioner 2 2 1244 0. >>>> Matrix 1104 1104 136615232 0. >>>> Index Set 930 930 9125912 0. >>>> IS L to G Mapping 3 3 2235608 0. >>>> Section 28 26 18720 0. >>>> Star Forest Graph 30 30 25632 0. >>>> Discrete System 6 6 5616 0. >>>> PetscRandom 11 11 7194 0. >>>> Vector 604372 604372 8204816368 0. >>>> Vec Scatter 203 203 272192 0. >>>> Viewer 21 10 8480 0. >>>> EPS Solver 10 10 86360 0. >>>> Spectral Transform 10 10 8400 0. >>>> Basis Vectors 10 10 530848 0. >>>> Region 10 10 6800 0. >>>> Direct Solver 10 10 9838880 0. >>>> Krylov Solver 10 10 13920 0. >>>> Preconditioner 10 10 10080 0. >>>> ======================================================================================================================== >>>> Average time to get PetscTime(): 3.49944e-08 >>>> Average time for MPI_Barrier(): 5.842e-06 >>>> Average time for zero size MPI_Send(): 8.72551e-06 >>>> #PETSc Option Table entries: >>>> -config=benchmark3.json >>>> -log_view >>>> #End of PETSc Option Table entries >>>> Compiled without FORTRAN kernels >>>> Compiled with full precision matrices (default) >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include 
--with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >>>> ----------------------------------------- >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >>>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >>>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >>>> Using PETSc arch: >>>> ----------------------------------------- >>>> >>>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >>>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>>> ----------------------------------------- >>>> >>>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include 
-I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >>>> ----------------------------------------- >>>> >>>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >>>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >>>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >>>> ----------------------------------------- >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon May 4 05:06:35 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 May 2020 06:06:35 -0400 Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver In-Reply-To: <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch> References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch> <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch> Message-ID: On Mon, May 4, 2020 at 3:51 AM Walker Andreas wrote: > Hey everyone, > > I wanted to give you a short update on this: > > - As suggested by Matt, I played around with the distribution of my cores > over nodes and changed to using one core per node only. > - After experimenting with a monolithic matrix instead of a MATNEST > object, I observed that the monolithic showed better speedup and changed to > building my problem matrix as monolithic matrix. > - I keep the subspace for the solver small which slightly improves the > runtimes > > After these changes, I get near-perfect scaling with speedups above 56 for > 64 cores (1 core/node) for example. Unfortunately I can?t really tell which > of the above changes contributed how much to this improvement. > > Anyway, thanks everyone for your help! > Great! I am glad its scaling well. 1) You should be able to up the number of cores per node as large as you want, as long as you evaluate the scaling in terms of nodes., meaning use that as the baseline. 2) You can see how many cores per node are used efficiently by running 'make streams'. 3) We would be really interested in seeing the logs from runs where MATNEST scales worse than AIJ. Maybe we could fix that. Thanks, Matt > Best regards and stay healthy > > Andreas > > Am 01.05.2020 um 14:45 schrieb Matthew Knepley : > > On Fri, May 1, 2020 at 8:32 AM Walker Andreas > wrote: > >> Hi Jed, Hi Jose, >> >> Thank you very much for your suggestions. >> >> - I tried reducing the subspace to 64 which indeed reduced the runtime >> by around 20 percent (sometimes more) for 128 cores. I will check what the >> effect on the sequential runtime is. >> - Regarding MatNest, I can just look for the eigenvalues of a submatrix >> to see how the speedup is affected; I will check that. Replacing the full >> matnest with a contiguous matrix is definitely more work but, if it >> improves the performance, worth the work (we assume that the program will >> be reused a lot). >> - Petsc is configured with mumps, openblas, scalapack (among others). >> But I noticed no significant difference to when petsc is configured without >> them. >> - The number of iterations required by the solver does not depend on the >> number of cores. >> >> Best regards and many thanks, >> > > Let me just address something from a high level. These operations are not > compute limited (for the most part), but limited by > bandwidth. Bandwidth is allocated by node, not by core, on these machines. > That is why it important to understand how many > nodes you are using, not cores. A useful scaling test would be to fill up > a single node (however many cores fit on one node), and > then increase the # of nodes. We would expect close to linear scaling in > that case. > > Thanks, > > Matt > > >> Andreas Walker >> >> > Am 01.05.2020 um 14:12 schrieb Jed Brown : >> > >> > "Jose E. 
Roman" writes: >> > >> >> Comments related to PETSc: >> >> >> >> - If you look at the "Reduct" column you will see that MatMult() is >> doing a lot of global reductions, which is bad for scaling. This is due to >> MATNEST (other Mat types do not do that). I don't know the details of >> MATNEST, maybe Matt can comment on this. >> > >> > It is not intrinsic to MatNest, though use of MatNest incurs extra >> > VecScatter costs. If you use MatNest without VecNest, then >> > VecGetSubVector incurs significant cost (including reductions). I >> > suspect it's likely that some SLEPc functionality is not available with >> > VecNest. A better option would be to optimize VecGetSubVector by >> > caching the IS and subvector, at least in the contiguous case. >> > >> > How difficult would it be for you to run with a monolithic matrix >> > instead of MatNest? It would certainly be better at amortizing >> > communication costs. >> > >> >> >> >> Comments related to SLEPc. >> >> >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to >> "sequential" computations. In your case they take a non-negligible time >> (around 30 seconds). You can try to reduce this time by reducing the size >> of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see >> https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html >> ) >> >> >> >> - In my previous comment about multithreaded BLAS, I was refering to >> configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think >> this is relevant here. >> >> >> >> - Regarding the number of iterations, yes the number of iterations >> should be the same for different runs if you keep the same number of >> processes, but when you change the number of processes there might be >> significant differences for some problems, that is the rationale of my >> suggestion. Anyway, in your case the fluctuation does not seem very >> important. >> >> >> >> Jose >> >> >> >> >> >>> El 1 may 2020, a las 10:07, Walker Andreas >> escribi?: >> >>> >> >>> Hi Matthew, >> >>> >> >>> I just ran the same program on a single core. You can see the output >> of -log_view below. As I see it, most functions have speedups of around 50 >> for 128 cores, also functions like matmult etc. >> >>> >> >>> Best regards, >> >>> >> >>> Andreas >> >>> >> >>> >> ************************************************************************************************************************ >> >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript >> -r -fCourier9' to print this document *** >> >>> >> ************************************************************************************************************************ >> >>> >> >>> ---------------------------------------------- PETSc Performance >> Summary: ---------------------------------------------- >> >>> >> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri >> May 1 04:03:07 2020 >> >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >>> >> >>> Max Max/Min Avg Total >> >>> Time (sec): 3.092e+04 1.000 3.092e+04 >> >>> Objects: 6.099e+05 1.000 6.099e+05 >> >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 >> >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 >> >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> >>> MPI Reductions: 0.000e+00 0.000 >> >>> >> >>> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> >>> e.g., VecAXPY() for real vectors of length >> N --> 2N flop >> >>> and VecAXPY() for complex vectors of >> length N --> 8N flop >> >>> >> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- >> Messages --- -- Message Lengths -- -- Reductions -- >> >>> Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total >> >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >>> >> >>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> >>> Phase summary info: >> >>> Count: number of times phase was executed >> >>> Time and Flop: Max - maximum over all processors >> >>> Ratio - ratio of maximum to minimum over all >> processors >> >>> Mess: number of messages sent >> >>> AvgLen: average message length (bytes) >> >>> Reduct: number of global reductions >> >>> Global: entire computation >> >>> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). 
>> >>> %T - percent time in this phase %F - percent flop in >> this phase >> >>> %M - percent messages in this phase %L - percent message >> lengths in this phase >> >>> %R - percent reductions in this phase >> >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors) >> >>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total >> >>> Max Ratio Max Ratio Max Ratio Mess >> AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>> >> >>> --- Event Stage 0: Main Stage >> >>> >> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 >> 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 >> 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 >> >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 >> >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 >> >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 >> >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 >> >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 >> >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 >> >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >> >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 >> 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 >> >>> STSetUp 10 1.0 9.7291e-05 
1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 >> 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 >> 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 >> >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 >> 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 >> >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 >> 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 >> >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 >> 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 >> >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 >> >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>> >> >>> Memory usage is given in bytes: >> >>> >> >>> Object Type Creations Destructions Memory >> Descendants' Mem. >> >>> Reports information only for process 0. >> >>> >> >>> --- Event Stage 0: Main Stage >> >>> >> >>> Container 1 1 584 0. >> >>> Distributed Mesh 1 1 5184 0. >> >>> GraphPartitioner 1 1 624 0. >> >>> Matrix 320 320 3469402576 0. >> >>> Index Set 53 53 2777932 0. >> >>> IS L to G Mapping 1 1 249320 0. >> >>> Section 13 11 7920 0. >> >>> Star Forest Graph 6 6 4896 0. >> >>> Discrete System 1 1 936 0. >> >>> Vector 609405 609405 857220847896 0. >> >>> Vec Scatter 1 1 704 0. >> >>> Viewer 22 11 9328 0. >> >>> EPS Solver 10 10 86360 0. >> >>> Spectral Transform 10 10 8400 0. >> >>> Basis Vectors 10 10 530336 0. >> >>> PetscRandom 10 10 6540 0. >> >>> Region 10 10 6800 0. >> >>> Direct Solver 10 10 9838880 0. >> >>> Krylov Solver 10 10 13920 0. >> >>> Preconditioner 10 10 10080 0. 
>> >>> >> ======================================================================================================================== >> >>> Average time to get PetscTime(): 2.50991e-08 >> >>> #PETSc Option Table entries: >> >>> -config=benchmark3.json >> >>> -eps_converged_reason >> >>> -log_view >> >>> #End of PETSc Option Table entries >> >>> Compiled without FORTRAN kernels >> >>> Compiled with full precision matrices (default) >> >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> >>> Configure options: >> --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 >> CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= >> CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" >> --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ >> --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> --with-precision=double --with-scalar-type=real --with-shared-libraries=1 >> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= >> CXXOPTFLAGS= >> --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so >> --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C >> --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so >> --with-scalapack=1 --with-metis=1 >> --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk >> --with-hdf5=1 >> --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 >> --with-hypre=1 >> --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne >> --with-parmetis=1 >> --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 >> --with-mumps=1 >> --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b >> --with-trilinos=1 >> --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo >> --with-fftw=0 --with-cxx-dialect=C++11 >> --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include >> --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a >> --with-superlu_dist=1 >> --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include >> --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so >> 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so >> /lib64/librt.so" --with-suitesparse=1 >> --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so >> --with-zlib=1 >> >>> ----------------------------------------- >> >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> >>> Machine characteristics: >> Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> >>> Using PETSc directory: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> >>> Using PETSc arch: >> >>> ----------------------------------------- >> >>> >> >>> Using C compiler: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >> >>> Using Fortran compiler: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >> >>> ----------------------------------------- >> >>> >> >>> Using include paths: >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> >>> ----------------------------------------- >> >>> >> >>> Using C linker: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> >>> Using Fortran linker: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>> Using libraries: >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib >> 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib >> -lpetsc >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib >> /lib64/librt.so >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib >> -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib >> -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 >> 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib >> -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos >> -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml >> -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib >> -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco >> -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac >> -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus >> -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco >> -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac >> -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus >> -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen >> -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup >> -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext >> -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan >> -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm >> -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm >> -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm >> -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm >> -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms >> -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers >> -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps >> -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd >> -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas >> -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz >> -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi >> -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> >>> ----------------------------------------- >> >>> >> >>> >> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : >> >>>> >> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas < >> awalker at student.ethz.ch> wrote: >> >>>> Hello everyone, >> >>>> >> >>>> I have used SLEPc successfully on a FEM-related project. Even though >> it is very powerful overall, the speedup I measure is a bit below my >> expectations. Compared to using a single core, the speedup is for example >> around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or >> 80 for 256 cores. Some details about my problem: >> >>>> >> >>>> - The problem is based on meshes with up to 400k degrees of freedom. >> DMPlex is used for organizing it. >> >>>> - ParMetis is used to partition the mesh. This yields a stiffness >> matrix where the vast majority of entries is in the diagonal blocks (i.e. >> looking at the rows owned by a core, there is a very dense square-shaped >> region around the diagonal and some loosely scattered nozeroes in the other >> columns). 
>> >>>> - The actual matrix from which I need eigenvalues is a 2x2 block >> matrix, saved as MATNEST - matrix. Each of these four matrices is computed >> based on the stiffness matrix and has a similar size and nonzero pattern. >> For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and >> on average about 40 nonzeroes per row. >> >>>> - I use the default Krylov-Schur solver and look for the 100 >> smallest eigenvalues >> >>>> - The output of -log_view for the 200k-dof - mesh described above >> run on 128 cores is at the end of this mail. >> >>>> >> >>>> I noticed that the problem matrices are not perfectly balanced, i.e. >> the number of rows per core might vary between 2500 and 3000, for example. >> But I am not sure if this is the main reason for the poor speedup. >> >>>> >> >>>> I tried to reduce the subspace size but without effect. I also >> attempted to use the shift-and-invert spectral transformation but the >> MATNEST-type prevents this. >> >>>> >> >>>> Are there any suggestions to improve the speedup further or is this >> the maximum speedup that I can expect? >> >>>> >> >>>> Can you also give us the performance for this problem on one node >> using the same number of cores per node? Then we can calculate speedup >> >>>> and look at which functions are not speeding up. >> >>>> >> >>>> Thanks, >> >>>> >> >>>> Matt >> >>>> >> >>>> Thanks a lot in advance, >> >>>> >> >>>> Andreas Walker >> >>>> >> >>>> m&m group >> >>>> D-MAVT >> >>>> ETH Zurich >> >>>> >> >>>> >> ************************************************************************************************************************ >> >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript >> -r -fCourier9' to print this document *** >> >>>> >> ************************************************************************************************************************ >> >>>> >> >>>> ---------------------------------------------- PETSc Performance >> Summary: ---------------------------------------------- >> >>>> >> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu >> Apr 30 15:50:22 2020 >> >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >>>> >> >>>> Max Max/Min Avg Total >> >>>> Time (sec): 6.209e+02 1.000 6.209e+02 >> >>>> Objects: 6.068e+05 1.001 6.063e+05 >> >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >> >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >> >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >> >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >> >>>> MPI Reductions: 1.512e+06 1.000 >> >>>> >> >>>> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> >>>> e.g., VecAXPY() for real vectors of >> length N --> 2N flop >> >>>> and VecAXPY() for complex vectors of >> length N --> 8N flop >> >>>> >> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- >> Messages --- -- Message Lengths -- -- Reductions -- >> >>>> Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total >> >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 >> 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >> >>>> >> >>>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>>> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. 
>> >>>> Phase summary info: >> >>>> Count: number of times phase was executed >> >>>> Time and Flop: Max - maximum over all processors >> >>>> Ratio - ratio of maximum to minimum over all >> processors >> >>>> Mess: number of messages sent >> >>>> AvgLen: average message length (bytes) >> >>>> Reduct: number of global reductions >> >>>> Global: entire computation >> >>>> Stage: stages of a computation. Set stages with >> PetscLogStagePush() and PetscLogStagePop(). >> >>>> %T - percent time in this phase %F - percent flop in >> this phase >> >>>> %M - percent messages in this phase %L - percent message >> lengths in this phase >> >>>> %R - percent reductions in this phase >> >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors) >> >>>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>>> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total >> >>>> Max Ratio Max Ratio Max Ratio Mess >> AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >>>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>>> >> >>>> --- Event Stage 0: Main Stage >> >>>> >> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 >> 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 >> 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 >> 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >> >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 >> 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >> >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >> >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 >> 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 >> 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 >> 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 >> 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >> >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 >> 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 >> 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >> >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 >> 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >> >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 >> 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 >> 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >> >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 >> 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >> >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 >> 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 >> 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >> >>>> MatGetLocalMat 240 
1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 >> 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 >> 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 >> 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 >> 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 >> 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 >> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 >> 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 >> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 >> 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 >> 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 >> 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 >> 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 >> 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 >> 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 >> 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >> >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >> >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 >> 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 >> 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >> >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 >> 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >> >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 >> 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >> >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 >> 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >> >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 >> 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >> >>>> BVOrthogonalizeV 150996 1.0 
4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 >> 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >> >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >> >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 >> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> >>>> >> ------------------------------------------------------------------------------------------------------------------------ >> >>>> >> >>>> Memory usage is given in bytes: >> >>>> >> >>>> Object Type Creations Destructions Memory >> Descendants' Mem. >> >>>> Reports information only for process 0. >> >>>> >> >>>> --- Event Stage 0: Main Stage >> >>>> >> >>>> Container 1 1 584 0. >> >>>> Distributed Mesh 6 6 29160 0. >> >>>> GraphPartitioner 2 2 1244 0. >> >>>> Matrix 1104 1104 136615232 0. >> >>>> Index Set 930 930 9125912 0. >> >>>> IS L to G Mapping 3 3 2235608 0. >> >>>> Section 28 26 18720 0. >> >>>> Star Forest Graph 30 30 25632 0. >> >>>> Discrete System 6 6 5616 0. >> >>>> PetscRandom 11 11 7194 0. >> >>>> Vector 604372 604372 8204816368 0. >> >>>> Vec Scatter 203 203 272192 0. >> >>>> Viewer 21 10 8480 0. >> >>>> EPS Solver 10 10 86360 0. >> >>>> Spectral Transform 10 10 8400 0. >> >>>> Basis Vectors 10 10 530848 0. >> >>>> Region 10 10 6800 0. >> >>>> Direct Solver 10 10 9838880 0. >> >>>> Krylov Solver 10 10 13920 0. >> >>>> Preconditioner 10 10 10080 0. >> >>>> >> ======================================================================================================================== >> >>>> Average time to get PetscTime(): 3.49944e-08 >> >>>> Average time for MPI_Barrier(): 5.842e-06 >> >>>> Average time for zero size MPI_Send(): 8.72551e-06 >> >>>> #PETSc Option Table entries: >> >>>> -config=benchmark3.json >> >>>> -log_view >> >>>> #End of PETSc Option Table entries >> >>>> Compiled without FORTRAN kernels >> >>>> Compiled with full precision matrices (default) >> >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> >>>> Configure options: >> --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 >> CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= >> CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" >> --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ >> --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> --with-precision=double --with-scalar-type=real --with-shared-libraries=1 >> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= >> CXXOPTFLAGS= >> --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so >> --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C >> 
--with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so >> --with-scalapack=1 --with-metis=1 >> --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk >> --with-hdf5=1 >> --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 >> --with-hypre=1 >> --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne >> --with-parmetis=1 >> --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 >> --with-mumps=1 >> --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b >> --with-trilinos=1 >> --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo >> --with-fftw=0 --with-cxx-dialect=C++11 >> --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include >> --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a >> --with-superlu_dist=1 >> --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include >> --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so >> /lib64/librt.so" --with-suitesparse=1 >> --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so >> --with-zlib=1 >> >>>> ----------------------------------------- >> >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> >>>> Machine characteristics: >> Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> >>>> Using PETSc directory: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> >>>> Using PETSc arch: >> >>>> ----------------------------------------- >> >>>> >> >>>> Using C compiler: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> -ftree-vectorize -O2 -march=core-avx2 -fPIC 
-mavx2 >> >>>> Using Fortran compiler: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >> >>>> ----------------------------------------- >> >>>> >> >>>> Using include paths: >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include >> -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> >>>> ----------------------------------------- >> >>>> >> >>>> Using C linker: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> >>>> Using Fortran linker: >> /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>>> Using libraries: >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib >> -lpetsc >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib >> /lib64/librt.so >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib >> 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib >> -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib >> -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 >> -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib >> -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib >> -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos >> -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml >> -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib >> -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco >> -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac >> -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus >> -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco >> -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac >> -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus >> -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen >> -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup >> -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext >> -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan >> -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm >> -lteuchoskokkoscompat 
-lteuchosremainder -lteuchosnumerics -lteuchoscomm >> -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm >> -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm >> -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms >> -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers >> -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps >> -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd >> -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas >> -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz >> -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi >> -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> >>>> ----------------------------------------- >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> >>>> -- Norbert Wiener >> >>>> >> >>>> https://www.cse.buffalo.edu/~knepley/ >> >>> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon May 4 05:12:01 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 4 May 2020 12:12:01 +0200 Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver In-Reply-To: References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch> <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch> Message-ID: <2758D742-5E7D-4AEA-B93E-1592424FA8E2@dsic.upv.es> > El 4 may 2020, a las 12:06, Matthew Knepley escribi?: > > On Mon, May 4, 2020 at 3:51 AM Walker Andreas wrote: > Hey everyone, > > I wanted to give you a short update on this: > > - As suggested by Matt, I played around with the distribution of my cores over nodes and changed to using one core per node only. > - After experimenting with a monolithic matrix instead of a MATNEST object, I observed that the monolithic showed better speedup and changed to building my problem matrix as monolithic matrix. > - I keep the subspace for the solver small which slightly improves the runtimes > > After these changes, I get near-perfect scaling with speedups above 56 for 64 cores (1 core/node) for example. Unfortunately I can?t really tell which of the above changes contributed how much to this improvement. > > Anyway, thanks everyone for your help! > > Great! I am glad its scaling well. > > 1) You should be able to up the number of cores per node as large as you want, as long as you evaluate the scaling in terms of nodes., meaning use that as the baseline. > > 2) You can see how many cores per node are used efficiently by running 'make streams'. > > 3) We would be really interested in seeing the logs from runs where MATNEST scales worse than AIJ. Maybe we could fix that. 
Matt, this last issue is due to SLEPc, I am preparing a fix. > > Thanks, > > Matt > > Best regards and stay healthy > > Andreas > >> Am 01.05.2020 um 14:45 schrieb Matthew Knepley : >> >> On Fri, May 1, 2020 at 8:32 AM Walker Andreas wrote: >> Hi Jed, Hi Jose, >> >> Thank you very much for your suggestions. >> >> - I tried reducing the subspace to 64 which indeed reduced the runtime by around 20 percent (sometimes more) for 128 cores. I will check what the effect on the sequential runtime is. >> - Regarding MatNest, I can just look for the eigenvalues of a submatrix to see how the speedup is affected; I will check that. Replacing the full matnest with a contiguous matrix is definitely more work but, if it improves the performance, worth the work (we assume that the program will be reused a lot). >> - Petsc is configured with mumps, openblas, scalapack (among others). But I noticed no significant difference to when petsc is configured without them. >> - The number of iterations required by the solver does not depend on the number of cores. >> >> Best regards and many thanks, >> >> Let me just address something from a high level. These operations are not compute limited (for the most part), but limited by >> bandwidth. Bandwidth is allocated by node, not by core, on these machines. That is why it important to understand how many >> nodes you are using, not cores. A useful scaling test would be to fill up a single node (however many cores fit on one node), and >> then increase the # of nodes. We would expect close to linear scaling in that case. >> >> Thanks, >> >> Matt >> >> Andreas Walker >> >> > Am 01.05.2020 um 14:12 schrieb Jed Brown : >> > >> > "Jose E. Roman" writes: >> > >> >> Comments related to PETSc: >> >> >> >> - If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this. >> > >> > It is not intrinsic to MatNest, though use of MatNest incurs extra >> > VecScatter costs. If you use MatNest without VecNest, then >> > VecGetSubVector incurs significant cost (including reductions). I >> > suspect it's likely that some SLEPc functionality is not available with >> > VecNest. A better option would be to optimize VecGetSubVector by >> > caching the IS and subvector, at least in the contiguous case. >> > >> > How difficult would it be for you to run with a monolithic matrix >> > instead of MatNest? It would certainly be better at amortizing >> > communication costs. >> > >> >> >> >> Comments related to SLEPc. >> >> >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html ) >> >> >> >> - In my previous comment about multithreaded BLAS, I was refering to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here. >> >> >> >> - Regarding the number of iterations, yes the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there might be significant differences for some problems, that is the rationale of my suggestion. 
Anyway, in your case the fluctuation does not seem very important. >> >> >> >> Jose >> >> >> >> >> >>> El 1 may 2020, a las 10:07, Walker Andreas escribi?: >> >>> >> >>> Hi Matthew, >> >>> >> >>> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc. >> >>> >> >>> Best regards, >> >>> >> >>> Andreas >> >>> >> >>> ************************************************************************************************************************ >> >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >> >>> ************************************************************************************************************************ >> >>> >> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >>> >> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May 1 04:03:07 2020 >> >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >>> >> >>> Max Max/Min Avg Total >> >>> Time (sec): 3.092e+04 1.000 3.092e+04 >> >>> Objects: 6.099e+05 1.000 6.099e+05 >> >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 >> >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 >> >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 >> >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 >> >>> MPI Reductions: 0.000e+00 0.000 >> >>> >> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> >>> e.g., VecAXPY() for real vectors of length N --> 2N flop >> >>> and VecAXPY() for complex vectors of length N --> 8N flop >> >>> >> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >>> >> >>> ------------------------------------------------------------------------------------------------------------------------ >> >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> >>> Phase summary info: >> >>> Count: number of times phase was executed >> >>> Time and Flop: Max - maximum over all processors >> >>> Ratio - ratio of maximum to minimum over all processors >> >>> Mess: number of messages sent >> >>> AvgLen: average message length (bytes) >> >>> Reduct: number of global reductions >> >>> Global: entire computation >> >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>> >>> %T - percent time in this phase %F - percent flop in this phase >> >>> %M - percent messages in this phase %L - percent message lengths in this phase >> >>> %R - percent reductions in this phase >> >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >> >>> ------------------------------------------------------------------------------------------------------------------------ >> >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >>> ------------------------------------------------------------------------------------------------------------------------ >> >>> >> >>> --- Event Stage 0: Main Stage >> >>> >> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 >> >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 >> >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 >> >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 >> >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 >> >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 >> >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 >> >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >> >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 >> >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> STApply 152338 1.0 8.2803e+03 
1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 >> >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 >> >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 >> >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 >> >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 >> >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 >> >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>> ------------------------------------------------------------------------------------------------------------------------ >> >>> >> >>> Memory usage is given in bytes: >> >>> >> >>> Object Type Creations Destructions Memory Descendants' Mem. >> >>> Reports information only for process 0. >> >>> >> >>> --- Event Stage 0: Main Stage >> >>> >> >>> Container 1 1 584 0. >> >>> Distributed Mesh 1 1 5184 0. >> >>> GraphPartitioner 1 1 624 0. >> >>> Matrix 320 320 3469402576 0. >> >>> Index Set 53 53 2777932 0. >> >>> IS L to G Mapping 1 1 249320 0. >> >>> Section 13 11 7920 0. >> >>> Star Forest Graph 6 6 4896 0. >> >>> Discrete System 1 1 936 0. >> >>> Vector 609405 609405 857220847896 0. >> >>> Vec Scatter 1 1 704 0. >> >>> Viewer 22 11 9328 0. >> >>> EPS Solver 10 10 86360 0. >> >>> Spectral Transform 10 10 8400 0. >> >>> Basis Vectors 10 10 530336 0. >> >>> PetscRandom 10 10 6540 0. >> >>> Region 10 10 6800 0. >> >>> Direct Solver 10 10 9838880 0. >> >>> Krylov Solver 10 10 13920 0. >> >>> Preconditioner 10 10 10080 0. 
>> >>> ======================================================================================================================== >> >>> Average time to get PetscTime(): 2.50991e-08 >> >>> #PETSc Option Table entries: >> >>> -config=benchmark3.json >> >>> -eps_converged_reason >> >>> -log_view >> >>> #End of PETSc Option Table entries >> >>> Compiled without FORTRAN kernels >> >>> Compiled with full precision matrices (default) >> >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> >>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >> >>> ----------------------------------------- >> >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> >>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> >>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> >>> Using PETSc arch: >> >>> ----------------------------------------- >> >>> >> >>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >> >>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>> ----------------------------------------- >> >>> >> >>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> >>> ----------------------------------------- >> >>> >> >>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> >>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> >>> ----------------------------------------- >> >>> >> >>> >> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : >> >>>> >> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote: >> >>>> Hello everyone, >> >>>> >> >>>> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: >> >>>> >> >>>> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. >> >>>> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). >> >>>> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. 
>> >>>> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues >> >>>> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. >> >>>> >> >>>> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. >> >>>> >> >>>> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. >> >>>> >> >>>> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? >> >>>> >> >>>> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup >> >>>> and look at which functions are not speeding up. >> >>>> >> >>>> Thanks, >> >>>> >> >>>> Matt >> >>>> >> >>>> Thanks a lot in advance, >> >>>> >> >>>> Andreas Walker >> >>>> >> >>>> m&m group >> >>>> D-MAVT >> >>>> ETH Zurich >> >>>> >> >>>> ************************************************************************************************************************ >> >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >> >>>> ************************************************************************************************************************ >> >>>> >> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >>>> >> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 >> >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 >> >>>> >> >>>> Max Max/Min Avg Total >> >>>> Time (sec): 6.209e+02 1.000 6.209e+02 >> >>>> Objects: 6.068e+05 1.001 6.063e+05 >> >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 >> >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 >> >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 >> >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 >> >>>> MPI Reductions: 1.512e+06 1.000 >> >>>> >> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop >> >>>> and VecAXPY() for complex vectors of length N --> 8N flop >> >>>> >> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- >> >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total >> >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% >> >>>> >> >>>> ------------------------------------------------------------------------------------------------------------------------ >> >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> >>>> Phase summary info: >> >>>> Count: number of times phase was executed >> >>>> Time and Flop: Max - maximum over all processors >> >>>> Ratio - ratio of maximum to minimum over all processors >> >>>> Mess: number of messages sent >> >>>> AvgLen: average message length (bytes) >> >>>> Reduct: number of global reductions >> >>>> Global: entire computation >> >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>> >>>> %T - percent time in this phase %F - percent flop in this phase >> >>>> %M - percent messages in this phase %L - percent message lengths in this phase >> >>>> %R - percent reductions in this phase >> >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) >> >>>> ------------------------------------------------------------------------------------------------------------------------ >> >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total >> >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >>>> ------------------------------------------------------------------------------------------------------------------------ >> >>>> >> >>>> --- Event Stage 0: Main Stage >> >>>> >> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 >> >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 >> >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 >> >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 >> >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 >> >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 >> >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 >> >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 >> >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 >> >>>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 
0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 >> >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 >> >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 >> >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 >> >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 >> >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 >> >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 >> >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 >> >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 >> >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 >> >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> >>>> 
------------------------------------------------------------------------------------------------------------------------ >> >>>> >> >>>> Memory usage is given in bytes: >> >>>> >> >>>> Object Type Creations Destructions Memory Descendants' Mem. >> >>>> Reports information only for process 0. >> >>>> >> >>>> --- Event Stage 0: Main Stage >> >>>> >> >>>> Container 1 1 584 0. >> >>>> Distributed Mesh 6 6 29160 0. >> >>>> GraphPartitioner 2 2 1244 0. >> >>>> Matrix 1104 1104 136615232 0. >> >>>> Index Set 930 930 9125912 0. >> >>>> IS L to G Mapping 3 3 2235608 0. >> >>>> Section 28 26 18720 0. >> >>>> Star Forest Graph 30 30 25632 0. >> >>>> Discrete System 6 6 5616 0. >> >>>> PetscRandom 11 11 7194 0. >> >>>> Vector 604372 604372 8204816368 0. >> >>>> Vec Scatter 203 203 272192 0. >> >>>> Viewer 21 10 8480 0. >> >>>> EPS Solver 10 10 86360 0. >> >>>> Spectral Transform 10 10 8400 0. >> >>>> Basis Vectors 10 10 530848 0. >> >>>> Region 10 10 6800 0. >> >>>> Direct Solver 10 10 9838880 0. >> >>>> Krylov Solver 10 10 13920 0. >> >>>> Preconditioner 10 10 10080 0. >> >>>> ======================================================================================================================== >> >>>> Average time to get PetscTime(): 3.49944e-08 >> >>>> Average time for MPI_Barrier(): 5.842e-06 >> >>>> Average time for zero size MPI_Send(): 8.72551e-06 >> >>>> #PETSc Option Table entries: >> >>>> -config=benchmark3.json >> >>>> -log_view >> >>>> #End of PETSc Option Table entries >> >>>> Compiled without FORTRAN kernels >> >>>> Compiled with full precision matrices (default) >> >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> >>>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 
--with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 >> >>>> ----------------------------------------- >> >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 >> >>>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core >> >>>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit >> >>>> Using PETSc arch: >> >>>> ----------------------------------------- >> >>>> >> >>>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 >> >>>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>>> ----------------------------------------- >> >>>> >> >>>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include 
-I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include >> >>>> ----------------------------------------- >> >>>> >> >>>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc >> >>>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 >> >>>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl >> >>>> ----------------------------------------- >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> >>>> -- Norbert Wiener >> >>>> >> >>>> https://www.cse.buffalo.edu/~knepley/ >> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Mon May 4 05:24:27 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 May 2020 06:24:27 -0400 Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver In-Reply-To: <2758D742-5E7D-4AEA-B93E-1592424FA8E2@dsic.upv.es> References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch> <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch> <2758D742-5E7D-4AEA-B93E-1592424FA8E2@dsic.upv.es> Message-ID: On Mon, May 4, 2020 at 6:12 AM Jose E. Roman wrote: > > > > El 4 may 2020, a las 12:06, Matthew Knepley > escribi?: > > > > On Mon, May 4, 2020 at 3:51 AM Walker Andreas > wrote: > > Hey everyone, > > > > I wanted to give you a short update on this: > > > > - As suggested by Matt, I played around with the distribution of my > cores over nodes and changed to using one core per node only. > > - After experimenting with a monolithic matrix instead of a MATNEST > object, I observed that the monolithic showed better speedup and changed to > building my problem matrix as monolithic matrix. > > - I keep the subspace for the solver small which slightly improves the > runtimes > > > > After these changes, I get near-perfect scaling with speedups above 56 > for 64 cores (1 core/node) for example. Unfortunately I can?t really tell > which of the above changes contributed how much to this improvement. > > > > Anyway, thanks everyone for your help! > > > > Great! I am glad its scaling well. > > > > 1) You should be able to up the number of cores per node as large as you > want, as long as you evaluate the scaling in terms of nodes., meaning use > that as the baseline. > > > > 2) You can see how many cores per node are used efficiently by running > 'make streams'. > > > > 3) We would be really interested in seeing the logs from runs where > MATNEST scales worse than AIJ. Maybe we could fix that. > > Matt, this last issue is due to SLEPc, I am preparing a fix. > Thanks! I am interested to see it. Matt > > > > Thanks, > > > > Matt > > > > Best regards and stay healthy > > > > Andreas > > > >> Am 01.05.2020 um 14:45 schrieb Matthew Knepley : > >> > >> On Fri, May 1, 2020 at 8:32 AM Walker Andreas > wrote: > >> Hi Jed, Hi Jose, > >> > >> Thank you very much for your suggestions. > >> > >> - I tried reducing the subspace to 64 which indeed reduced the runtime > by around 20 percent (sometimes more) for 128 cores. I will check what the > effect on the sequential runtime is. > >> - Regarding MatNest, I can just look for the eigenvalues of a > submatrix to see how the speedup is affected; I will check that. Replacing > the full matnest with a contiguous matrix is definitely more work but, if > it improves the performance, worth the work (we assume that the program > will be reused a lot). > >> - Petsc is configured with mumps, openblas, scalapack (among others). 
> But I noticed no significant difference to when petsc is configured without > them. > >> - The number of iterations required by the solver does not depend on > the number of cores. > >> > >> Best regards and many thanks, > >> > >> Let me just address something from a high level. These operations are > not compute limited (for the most part), but limited by > >> bandwidth. Bandwidth is allocated by node, not by core, on these > machines. That is why it important to understand how many > >> nodes you are using, not cores. A useful scaling test would be to fill > up a single node (however many cores fit on one node), and > >> then increase the # of nodes. We would expect close to linear scaling > in that case. > >> > >> Thanks, > >> > >> Matt > >> > >> Andreas Walker > >> > >> > Am 01.05.2020 um 14:12 schrieb Jed Brown : > >> > > >> > "Jose E. Roman" writes: > >> > > >> >> Comments related to PETSc: > >> >> > >> >> - If you look at the "Reduct" column you will see that MatMult() is > doing a lot of global reductions, which is bad for scaling. This is due to > MATNEST (other Mat types do not do that). I don't know the details of > MATNEST, maybe Matt can comment on this. > >> > > >> > It is not intrinsic to MatNest, though use of MatNest incurs extra > >> > VecScatter costs. If you use MatNest without VecNest, then > >> > VecGetSubVector incurs significant cost (including reductions). I > >> > suspect it's likely that some SLEPc functionality is not available > with > >> > VecNest. A better option would be to optimize VecGetSubVector by > >> > caching the IS and subvector, at least in the contiguous case. > >> > > >> > How difficult would it be for you to run with a monolithic matrix > >> > instead of MatNest? It would certainly be better at amortizing > >> > communication costs. > >> > > >> >> > >> >> Comments related to SLEPc. > >> >> > >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to > "sequential" computations. In your case they take a non-negligible time > (around 30 seconds). You can try to reduce this time by reducing the size > of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see > https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html > ) > >> >> > >> >> - In my previous comment about multithreaded BLAS, I was refering to > configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think > this is relevant here. > >> >> > >> >> - Regarding the number of iterations, yes the number of iterations > should be the same for different runs if you keep the same number of > processes, but when you change the number of processes there might be > significant differences for some problems, that is the rationale of my > suggestion. Anyway, in your case the fluctuation does not seem very > important. > >> >> > >> >> Jose > >> >> > >> >> > >> >>> El 1 may 2020, a las 10:07, Walker Andreas > escribi?: > >> >>> > >> >>> Hi Matthew, > >> >>> > >> >>> I just ran the same program on a single core. You can see the > output of -log_view below. As I see it, most functions have speedups of > around 50 for 128 cores, also functions like matmult etc. > >> >>> > >> >>> Best regards, > >> >>> > >> >>> Andreas > >> >>> > >> >>> > ************************************************************************************************************************ > >> >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript > -r -fCourier9' to print this document *** > >> >>> > ************************************************************************************************************************ > >> >>> > >> >>> ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > >> >>> > >> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri > May 1 04:03:07 2020 > >> >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >> >>> > >> >>> Max Max/Min Avg Total > >> >>> Time (sec): 3.092e+04 1.000 3.092e+04 > >> >>> Objects: 6.099e+05 1.000 6.099e+05 > >> >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 > >> >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 > >> >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > >> >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > >> >>> MPI Reductions: 0.000e+00 0.000 > >> >>> > >> >>> Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > >> >>> e.g., VecAXPY() for real vectors of > length N --> 2N flop > >> >>> and VecAXPY() for complex vectors of > length N --> 8N flop > >> >>> > >> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- > Messages --- -- Message Lengths -- -- Reductions -- > >> >>> Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > >> >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% > 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > >> >>> > >> >>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>> See the 'Profiling' chapter of the users' manual for details on > interpreting output. > >> >>> Phase summary info: > >> >>> Count: number of times phase was executed > >> >>> Time and Flop: Max - maximum over all processors > >> >>> Ratio - ratio of maximum to minimum over all > processors > >> >>> Mess: number of messages sent > >> >>> AvgLen: average message length (bytes) > >> >>> Reduct: number of global reductions > >> >>> Global: entire computation > >> >>> Stage: stages of a computation. Set stages with > PetscLogStagePush() and PetscLogStagePop(). 
> >> >>> %T - percent time in this phase %F - percent flop in > this phase > >> >>> %M - percent messages in this phase %L - percent message > lengths in this phase > >> >>> %R - percent reductions in this phase > >> >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max > time over all processors) > >> >>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>> Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > >> >>> Max Ratio Max Ratio Max Ratio Mess > AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >> >>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>> > >> >>> --- Event Stage 0: Main Stage > >> >>> > >> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >> >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 > >> >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 > >> >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 > >> >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 > >> >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 > >> >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 > >> >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 > >> >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 > >> >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 
0 3246 > >> >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >> >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 > >> >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 > >> >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 > >> >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 > >> >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 > >> >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>> > >> >>> Memory usage is given in bytes: > >> >>> > >> >>> Object Type Creations Destructions Memory > Descendants' Mem. > >> >>> Reports information only for process 0. > >> >>> > >> >>> --- Event Stage 0: Main Stage > >> >>> > >> >>> Container 1 1 584 0. > >> >>> Distributed Mesh 1 1 5184 0. > >> >>> GraphPartitioner 1 1 624 0. > >> >>> Matrix 320 320 3469402576 0. > >> >>> Index Set 53 53 2777932 0. > >> >>> IS L to G Mapping 1 1 249320 0. > >> >>> Section 13 11 7920 0. > >> >>> Star Forest Graph 6 6 4896 0. > >> >>> Discrete System 1 1 936 0. > >> >>> Vector 609405 609405 857220847896 0. > >> >>> Vec Scatter 1 1 704 0. > >> >>> Viewer 22 11 9328 0. > >> >>> EPS Solver 10 10 86360 0. > >> >>> Spectral Transform 10 10 8400 0. > >> >>> Basis Vectors 10 10 530336 0. > >> >>> PetscRandom 10 10 6540 0. > >> >>> Region 10 10 6800 0. > >> >>> Direct Solver 10 10 9838880 0. > >> >>> Krylov Solver 10 10 13920 0. > >> >>> Preconditioner 10 10 10080 0. 
> >> >>> > ======================================================================================================================== > >> >>> Average time to get PetscTime(): 2.50991e-08 > >> >>> #PETSc Option Table entries: > >> >>> -config=benchmark3.json > >> >>> -eps_converged_reason > >> >>> -log_view > >> >>> #End of PETSc Option Table entries > >> >>> Compiled without FORTRAN kernels > >> >>> Compiled with full precision matrices (default) > >> >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >> >>> Configure options: > --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= > CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" > --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ > --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so > --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C > --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so > --with-scalapack=1 --with-metis=1 > --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk > --with-hdf5=1 > --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 > --with-hypre=1 > --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne > --with-parmetis=1 > --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 > --with-mumps=1 > --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b > --with-trilinos=1 > --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo > --with-fftw=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a > --with-superlu_dist=1 > --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so > 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so > /lib64/librt.so" --with-suitesparse=1 > --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so > --with-zlib=1 > >> >>> ----------------------------------------- > >> >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >> >>> Machine characteristics: > Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >> >>> Using PETSc directory: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >> >>> Using PETSc arch: > >> >>> ----------------------------------------- > >> >>> > >> >>> Using C compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >> >>> Using Fortran compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > > >> >>> ----------------------------------------- > >> >>> > >> >>> Using include paths: > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >> >>> ----------------------------------------- > >> >>> > >> >>> Using C linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >> >>> Using Fortran linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>> Using libraries: > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -lpetsc > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > /lib64/librt.so > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > 
-L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos > -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml > -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib > -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen > -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup > -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan > -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers > -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps > -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd > -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz > -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >> >>> ----------------------------------------- > >> >>> > >> >>> > >> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley >: > >> >>>> > >> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas < > awalker at student.ethz.ch> wrote: > >> >>>> Hello everyone, > >> >>>> > >> >>>> I have used SLEPc successfully on a FEM-related project. Even > though it is very powerful overall, the speedup I measure is a bit below my > expectations. Compared to using a single core, the speedup is for example > around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or > 80 for 256 cores. Some details about my problem: > >> >>>> > >> >>>> - The problem is based on meshes with up to 400k degrees of > freedom. DMPlex is used for organizing it. > >> >>>> - ParMetis is used to partition the mesh. This yields a stiffness > matrix where the vast majority of entries is in the diagonal blocks (i.e. > looking at the rows owned by a core, there is a very dense square-shaped > region around the diagonal and some loosely scattered nozeroes in the other > columns). 
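As a small aside on the partitioning point above: one way to quantify how evenly ParMetis has spread the matrix rows over the ranks is to compare the local row counts directly. The helper below is a hypothetical sketch (the name ReportRowImbalance and the reduction over local row counts are not taken from this thread, and it assumes every rank owns at least one row):

#include <petscmat.h>

/* Hypothetical helper: report the spread of local row counts of a parallel Mat. */
PetscErrorCode ReportRowImbalance(Mat A)
{
  PetscErrorCode ierr;
  PetscInt       mlocal,mmin,mmax;
  MPI_Comm       comm;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
  ierr = MatGetLocalSize(A,&mlocal,NULL);CHKERRQ(ierr);
  ierr = MPIU_Allreduce(&mlocal,&mmin,1,MPIU_INT,MPI_MIN,comm);CHKERRQ(ierr);
  ierr = MPIU_Allreduce(&mlocal,&mmax,1,MPIU_INT,MPI_MAX,comm);CHKERRQ(ierr);
  ierr = PetscPrintf(comm,"local rows: min %D  max %D  ratio %g\n",
                     mmin,mmax,(double)mmax/(double)mmin);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Called on the assembled problem matrix, this would print the min/max local row counts, i.e. the kind of 2500 vs 3000 spread mentioned further down in this message.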
> >> >>>> - The actual matrix from which I need eigenvalues is a 2x2 block > matrix, saved as MATNEST - matrix. Each of these four matrices is computed > based on the stiffness matrix and has a similar size and nonzero pattern. > For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and > on average about 40 nonzeroes per row. > >> >>>> - I use the default Krylov-Schur solver and look for the 100 > smallest eigenvalues > >> >>>> - The output of -log_view for the 200k-dof - mesh described above > run on 128 cores is at the end of this mail. > >> >>>> > >> >>>> I noticed that the problem matrices are not perfectly balanced, > i.e. the number of rows per core might vary between 2500 and 3000, for > example. But I am not sure if this is the main reason for the poor speedup. > >> >>>> > >> >>>> I tried to reduce the subspace size but without effect. I also > attempted to use the shift-and-invert spectral transformation but the > MATNEST-type prevents this. > >> >>>> > >> >>>> Are there any suggestions to improve the speedup further or is > this the maximum speedup that I can expect? > >> >>>> > >> >>>> Can you also give us the performance for this problem on one node > using the same number of cores per node? Then we can calculate speedup > >> >>>> and look at which functions are not speeding up. > >> >>>> > >> >>>> Thanks, > >> >>>> > >> >>>> Matt > >> >>>> > >> >>>> Thanks a lot in advance, > >> >>>> > >> >>>> Andreas Walker > >> >>>> > >> >>>> m&m group > >> >>>> D-MAVT > >> >>>> ETH Zurich > >> >>>> > >> >>>> > ************************************************************************************************************************ > >> >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use > 'enscript -r -fCourier9' to print this document *** > >> >>>> > ************************************************************************************************************************ > >> >>>> > >> >>>> ---------------------------------------------- PETSc Performance > Summary: ---------------------------------------------- > >> >>>> > >> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker > Thu Apr 30 15:50:22 2020 > >> >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >> >>>> > >> >>>> Max Max/Min Avg Total > >> >>>> Time (sec): 6.209e+02 1.000 6.209e+02 > >> >>>> Objects: 6.068e+05 1.001 6.063e+05 > >> >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 > >> >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 > >> >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 > >> >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 > >> >>>> MPI Reductions: 1.512e+06 1.000 > >> >>>> > >> >>>> Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > >> >>>> e.g., VecAXPY() for real vectors of > length N --> 2N flop > >> >>>> and VecAXPY() for complex vectors of > length N --> 8N flop > >> >>>> > >> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- > Messages --- -- Message Lengths -- -- Reductions -- > >> >>>> Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > >> >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% > 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% > >> >>>> > >> >>>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>>> See the 'Profiling' chapter of the users' manual for details on > interpreting output. 
> >> >>>> Phase summary info: > >> >>>> Count: number of times phase was executed > >> >>>> Time and Flop: Max - maximum over all processors > >> >>>> Ratio - ratio of maximum to minimum over all > processors > >> >>>> Mess: number of messages sent > >> >>>> AvgLen: average message length (bytes) > >> >>>> Reduct: number of global reductions > >> >>>> Global: entire computation > >> >>>> Stage: stages of a computation. Set stages with > PetscLogStagePush() and PetscLogStagePop(). > >> >>>> %T - percent time in this phase %F - percent flop in > this phase > >> >>>> %M - percent messages in this phase %L - percent message > lengths in this phase > >> >>>> %R - percent reductions in this phase > >> >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max > time over all processors) > >> >>>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>>> Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > >> >>>> Max Ratio Max Ratio Max Ratio Mess > AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >> >>>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>>> > >> >>>> --- Event Stage 0: Main Stage > >> >>>> > >> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 > 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 > 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 > 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 > >> >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 > 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 > >> >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 > >> >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 > 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 > 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 > 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 > 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 > >> >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 > 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 > 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 > >> >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 > 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 > >> >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 > 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 > 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 > >> >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 > 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 > >> >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 > 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 > 1.1e+04 0.0e+00 
0 0 0 0 0 0 0 0 0 0 8425 > >> >>>> MatGetLocalMat 240 1.0 4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 > 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 > 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 > 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 > 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 > 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 > 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 > 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 > 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 > 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 > 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 > 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 > 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 > 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 > 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 > >> >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 > >> >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 > 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 > >> >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 > 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 > >> >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 > 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 > >> >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 > 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 > >> >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 > 
0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 > >> >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 > 0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 > >> >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 > >> >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > >> >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > >> >>>> > ------------------------------------------------------------------------------------------------------------------------ > >> >>>> > >> >>>> Memory usage is given in bytes: > >> >>>> > >> >>>> Object Type Creations Destructions Memory > Descendants' Mem. > >> >>>> Reports information only for process 0. > >> >>>> > >> >>>> --- Event Stage 0: Main Stage > >> >>>> > >> >>>> Container 1 1 584 0. > >> >>>> Distributed Mesh 6 6 29160 0. > >> >>>> GraphPartitioner 2 2 1244 0. > >> >>>> Matrix 1104 1104 136615232 0. > >> >>>> Index Set 930 930 9125912 0. > >> >>>> IS L to G Mapping 3 3 2235608 0. > >> >>>> Section 28 26 18720 0. > >> >>>> Star Forest Graph 30 30 25632 0. > >> >>>> Discrete System 6 6 5616 0. > >> >>>> PetscRandom 11 11 7194 0. > >> >>>> Vector 604372 604372 8204816368 0. > >> >>>> Vec Scatter 203 203 272192 0. > >> >>>> Viewer 21 10 8480 0. > >> >>>> EPS Solver 10 10 86360 0. > >> >>>> Spectral Transform 10 10 8400 0. > >> >>>> Basis Vectors 10 10 530848 0. > >> >>>> Region 10 10 6800 0. > >> >>>> Direct Solver 10 10 9838880 0. > >> >>>> Krylov Solver 10 10 13920 0. > >> >>>> Preconditioner 10 10 10080 0. 
> >> >>>> > ======================================================================================================================== > >> >>>> Average time to get PetscTime(): 3.49944e-08 > >> >>>> Average time for MPI_Barrier(): 5.842e-06 > >> >>>> Average time for zero size MPI_Send(): 8.72551e-06 > >> >>>> #PETSc Option Table entries: > >> >>>> -config=benchmark3.json > >> >>>> -log_view > >> >>>> #End of PETSc Option Table entries > >> >>>> Compiled without FORTRAN kernels > >> >>>> Compiled with full precision matrices (default) > >> >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >> >>>> Configure options: > --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= > CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" > --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ > --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so > --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C > --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so > --with-scalapack=1 --with-metis=1 > --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk > --with-hdf5=1 > --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 > --with-hypre=1 > --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne > --with-parmetis=1 > --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 > --with-mumps=1 > --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b > --with-trilinos=1 > --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo > --with-fftw=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a > --with-superlu_dist=1 > --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so > 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so > /lib64/librt.so" --with-suitesparse=1 > --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so > --with-zlib=1 > >> >>>> ----------------------------------------- > >> >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >> >>>> Machine characteristics: > Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >> >>>> Using PETSc directory: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >> >>>> Using PETSc arch: > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using C compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >> >>>> Using Fortran compiler: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using include paths: > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include > -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using C linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >> >>>> Using Fortran linker: > /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>>> Using libraries: > 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib > -lpetsc > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib > /lib64/librt.so > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib > -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 > -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib > -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos > -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml > -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib > -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco > -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac > -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus > -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen > -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup > -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext > -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan > -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm > -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm > -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms > -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers > -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps > -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd > -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz > -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi > -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> > >> >>>> > >> >>>> -- > >> >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >> >>>> -- Norbert Wiener > >> >>>> > >> >>>> https://www.cse.buffalo.edu/~knepley/ > >> >>> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL:

From jroman at dsic.upv.es Mon May 4 06:03:48 2020
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Mon, 4 May 2020 13:03:48 +0200
Subject: [petsc-users] Performance of SLEPc's Krylov-Schur solver
In-Reply-To: 
References: <86B05A0E-87C4-4B23-AC8B-6C39E6538B84@student.ethz.ch> <58697038-9771-4819-B060-061A3F3F0E91@student.ethz.ch> <87sggjam5x.fsf@jedbrown.org> <294D3E6F-33A1-4208-AD80-87CDA90DF87B@student.ethz.ch> <5EFC5BA9-A229-4FCB-B484-3A90F9213F95@student.ethz.ch> <2758D742-5E7D-4AEA-B93E-1592424FA8E2@dsic.upv.es>
Message-ID: <6D13673B-FAD0-4CE0-B65B-252E04CB071C@dsic.upv.es>

> El 4 may 2020, a las 12:24, Matthew Knepley escribió: > > > On Mon, May 4, 2020 at 6:12 AM Jose E. Roman wrote: > > > > El 4 may 2020, a las 12:06, Matthew Knepley escribió: > > > > On Mon, May 4, 2020 at 3:51 AM Walker Andreas wrote: > > Hey everyone, > > > > I wanted to give you a short update on this: > > > > - As suggested by Matt, I played around with the distribution of my cores over nodes and changed to using one core per node only. > > - After experimenting with a monolithic matrix instead of a MATNEST object, I observed that the monolithic showed better speedup and changed to building my problem matrix as a monolithic matrix. > > - I keep the subspace for the solver small, which slightly improves the runtimes > > > > After these changes, I get near-perfect scaling with speedups above 56 for 64 cores (1 core/node) for example. Unfortunately I can't really tell which of the above changes contributed how much to this improvement. > > > > Anyway, thanks everyone for your help! > > > > Great! I am glad it's scaling well. > > > > 1) You should be able to up the number of cores per node as large as you want, as long as you evaluate the scaling in terms of nodes, meaning use that as the baseline. > > > > 2) You can see how many cores per node are used efficiently by running 'make streams'. > > > > 3) We would be really interested in seeing the logs from runs where MATNEST scales worse than AIJ. Maybe we could fix that. > > Matt, this last issue is due to SLEPc, I am preparing a fix. > > Thanks! I am interested to see it.

Have a look at MR 50: https://gitlab.com/slepc/slepc/-/merge_requests/50
In the test example I have made it seems that the MatNest is not using VecNest internally. How can I modify the example to use VecNest?
Jose

> > Matt > > > > > Thanks, > > > > Matt > > > > Best regards and stay healthy > > > > Andreas > > > >> Am 01.05.2020 um 14:45 schrieb Matthew Knepley : > >> > >> On Fri, May 1, 2020 at 8:32 AM Walker Andreas wrote: > >> Hi Jed, Hi Jose, > >> > >> Thank you very much for your suggestions. > >> > >> - I tried reducing the subspace to 64 which indeed reduced the runtime by around 20 percent (sometimes more) for 128 cores. I will check what the effect on the sequential runtime is. > >> - Regarding MatNest, I can just look for the eigenvalues of a submatrix to see how the speedup is affected; I will check that. Replacing the full matnest with a contiguous matrix is definitely more work but, if it improves the performance, worth the work (we assume that the program will be reused a lot).
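Regarding the VecNest question above: the sketch below shows one way to pair a 2x2 MATNEST with explicitly created VECNEST vectors through VecCreateNest, instead of relying on whatever vector type MatCreateVecs happens to return. This is only an illustration of the public MatCreateNest/VecCreateNest API, not a description of what MR 50 does; the function name MatNestWithVecNest, the block names A00..A11, and the assumption that a NULL index-set argument is accepted here (as it is for MatCreateNest) are assumptions of this sketch.

#include <petscmat.h>

/* Hypothetical sketch: build a 2x2 MATNEST from existing blocks and create
   matching VECNEST work vectors explicitly with VecCreateNest.
   A00..A11 are assumed to be already-assembled parallel matrices with
   compatible layouts. */
PetscErrorCode MatNestWithVecNest(Mat A00,Mat A01,Mat A10,Mat A11,Mat *N,Vec *x,Vec *y)
{
  PetscErrorCode ierr;
  Mat            blocks[4];
  Vec            xs[2],ys[2];

  PetscFunctionBeginUser;
  blocks[0] = A00; blocks[1] = A01; blocks[2] = A10; blocks[3] = A11;
  ierr = MatCreateNest(PetscObjectComm((PetscObject)A00),2,NULL,2,NULL,blocks,N);CHKERRQ(ierr);

  /* Sub-vectors laid out like the columns (xs) and rows (ys) of the diagonal blocks */
  ierr = MatCreateVecs(A00,&xs[0],&ys[0]);CHKERRQ(ierr);
  ierr = MatCreateVecs(A11,&xs[1],&ys[1]);CHKERRQ(ierr);

  /* Explicit VECNEST vectors; NULL index sets are assumed to be accepted here,
     as they are for MatCreateNest */
  ierr = VecCreateNest(PetscObjectComm((PetscObject)A00),2,NULL,xs,x);CHKERRQ(ierr);
  ierr = VecCreateNest(PetscObjectComm((PetscObject)A00),2,NULL,ys,y);CHKERRQ(ierr);

  /* The nest matrix can now be applied to the nest vectors directly */
  ierr = MatMult(*N,*x,*y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}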
> >> - Petsc is configured with mumps, openblas, scalapack (among others). But I noticed no significant difference to when petsc is configured without them. > >> - The number of iterations required by the solver does not depend on the number of cores. > >> > >> Best regards and many thanks, > >> > >> Let me just address something from a high level. These operations are not compute limited (for the most part), but limited by > >> bandwidth. Bandwidth is allocated by node, not by core, on these machines. That is why it important to understand how many > >> nodes you are using, not cores. A useful scaling test would be to fill up a single node (however many cores fit on one node), and > >> then increase the # of nodes. We would expect close to linear scaling in that case. > >> > >> Thanks, > >> > >> Matt > >> > >> Andreas Walker > >> > >> > Am 01.05.2020 um 14:12 schrieb Jed Brown : > >> > > >> > "Jose E. Roman" writes: > >> > > >> >> Comments related to PETSc: > >> >> > >> >> - If you look at the "Reduct" column you will see that MatMult() is doing a lot of global reductions, which is bad for scaling. This is due to MATNEST (other Mat types do not do that). I don't know the details of MATNEST, maybe Matt can comment on this. > >> > > >> > It is not intrinsic to MatNest, though use of MatNest incurs extra > >> > VecScatter costs. If you use MatNest without VecNest, then > >> > VecGetSubVector incurs significant cost (including reductions). I > >> > suspect it's likely that some SLEPc functionality is not available with > >> > VecNest. A better option would be to optimize VecGetSubVector by > >> > caching the IS and subvector, at least in the contiguous case. > >> > > >> > How difficult would it be for you to run with a monolithic matrix > >> > instead of MatNest? It would certainly be better at amortizing > >> > communication costs. > >> > > >> >> > >> >> Comments related to SLEPc. > >> >> > >> >> - The last rows (DSSolve, DSVectors, DSOther) correspond to "sequential" computations. In your case they take a non-negligible time (around 30 seconds). You can try to reduce this time by reducing the size of the projected problem, e.g. running with -eps_nev 100 -eps_mpd 64 (see https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDimensions.html ) > >> >> > >> >> - In my previous comment about multithreaded BLAS, I was refering to configuring PETSc with MKL, OpenBLAS or similar. But anyway, I don't think this is relevant here. > >> >> > >> >> - Regarding the number of iterations, yes the number of iterations should be the same for different runs if you keep the same number of processes, but when you change the number of processes there might be significant differences for some problems, that is the rationale of my suggestion. Anyway, in your case the fluctuation does not seem very important. > >> >> > >> >> Jose > >> >> > >> >> > >> >>> El 1 may 2020, a las 10:07, Walker Andreas escribi?: > >> >>> > >> >>> Hi Matthew, > >> >>> > >> >>> I just ran the same program on a single core. You can see the output of -log_view below. As I see it, most functions have speedups of around 50 for 128 cores, also functions like matmult etc. > >> >>> > >> >>> Best regards, > >> >>> > >> >>> Andreas > >> >>> > >> >>> ************************************************************************************************************************ > >> >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** > >> >>> ************************************************************************************************************************ > >> >>> > >> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > >> >>> > >> >>> ./Solver on a named eu-a6-011-09 with 1 processor, by awalker Fri May 1 04:03:07 2020 > >> >>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >> >>> > >> >>> Max Max/Min Avg Total > >> >>> Time (sec): 3.092e+04 1.000 3.092e+04 > >> >>> Objects: 6.099e+05 1.000 6.099e+05 > >> >>> Flop: 9.313e+13 1.000 9.313e+13 9.313e+13 > >> >>> Flop/sec: 3.012e+09 1.000 3.012e+09 3.012e+09 > >> >>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > >> >>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > >> >>> MPI Reductions: 0.000e+00 0.000 > >> >>> > >> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > >> >>> e.g., VecAXPY() for real vectors of length N --> 2N flop > >> >>> and VecAXPY() for complex vectors of length N --> 8N flop > >> >>> > >> >>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > >> >>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total > >> >>> 0: Main Stage: 3.0925e+04 100.0% 9.3134e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > >> >>> > >> >>> ------------------------------------------------------------------------------------------------------------------------ > >> >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. > >> >>> Phase summary info: > >> >>> Count: number of times phase was executed > >> >>> Time and Flop: Max - maximum over all processors > >> >>> Ratio - ratio of maximum to minimum over all processors > >> >>> Mess: number of messages sent > >> >>> AvgLen: average message length (bytes) > >> >>> Reduct: number of global reductions > >> >>> Global: entire computation > >> >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
> >> >>> %T - percent time in this phase %F - percent flop in this phase > >> >>> %M - percent messages in this phase %L - percent message lengths in this phase > >> >>> %R - percent reductions in this phase > >> >>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > >> >>> ------------------------------------------------------------------------------------------------------------------------ > >> >>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > >> >>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >> >>> ------------------------------------------------------------------------------------------------------------------------ > >> >>> > >> >>> --- Event Stage 0: Main Stage > >> >>> > >> >>> MatMult 152338 1.0 8.2799e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >> >>> MatMultAdd 609352 1.0 8.1229e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 26 9 0 0 0 26 9 0 0 0 1010 > >> >>> MatConvert 30 1.0 1.5797e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatScale 10 1.0 4.7172e-02 1.0 6.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1426 > >> >>> MatAssemblyBegin 516 1.0 2.0695e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatAssemblyEnd 516 1.0 2.8933e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatZeroEntries 2 1.0 3.6038e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatView 10 1.0 2.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatAXPY 40 1.0 3.1595e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMult 60 1.0 1.3723e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90 > >> >>> MatMatMultSym 100 1.0 1.3651e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMultNum 100 1.0 7.5159e+00 1.0 2.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274 > >> >>> MatMatMatMult 40 1.0 1.8674e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89 > >> >>> MatMatMatMultSym 40 1.0 1.1848e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> MatMatMatMultNum 40 1.0 6.8266e+00 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243 > >> >>> MatPtAP 40 1.0 1.9042e+01 1.0 1.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87 > >> >>> MatTrnMatMult 40 1.0 7.7990e+00 1.0 8.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 > >> >>> DMPlexStratify 1 1.0 5.1223e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DMPlexPrealloc 2 1.0 1.5242e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecSet 914053 1.0 1.4929e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecAssemblyBegin 1 1.0 1.3411e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecAssemblyEnd 1 1.0 8.0094e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecScatterBegin 1 1.0 2.6399e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> VecSetRandom 10 1.0 8.6088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> EPSSetUp 10 1.0 2.9988e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> EPSSolve 10 1.0 2.8695e+04 1.0 9.31e+13 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 93100 0 0 0 3246 > >> >>> STSetUp 10 1.0 9.7291e-05 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> STApply 152338 1.0 8.2803e+03 1.0 8.20e+12 1.0 0.0e+00 0.0e+00 0.0e+00 27 9 0 0 0 27 9 0 0 0 990 > >> >>> BVCopy 1814 1.0 1.1076e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> BVMultVec 304639 1.0 9.8281e+03 1.0 3.34e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3397 > >> >>> BVMultInPlace 1824 1.0 7.0999e+02 1.0 1.79e+13 1.0 0.0e+00 0.0e+00 0.0e+00 2 19 0 0 0 2 19 0 0 0 25213 > >> >>> BVDotVec 304639 1.0 9.8037e+03 1.0 3.36e+13 1.0 0.0e+00 0.0e+00 0.0e+00 32 36 0 0 0 32 36 0 0 0 3427 > >> >>> BVOrthogonalizeV 152348 1.0 1.9633e+04 1.0 6.70e+13 1.0 0.0e+00 0.0e+00 0.0e+00 63 72 0 0 0 63 72 0 0 0 3411 > >> >>> BVScale 152348 1.0 3.7888e+01 1.0 5.32e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1403 > >> >>> BVSetRandom 10 1.0 8.6364e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSSolve 1824 1.0 1.7363e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSVectors 2797 1.0 1.2353e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> DSOther 1824 1.0 9.8627e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>> ------------------------------------------------------------------------------------------------------------------------ > >> >>> > >> >>> Memory usage is given in bytes: > >> >>> > >> >>> Object Type Creations Destructions Memory Descendants' Mem. > >> >>> Reports information only for process 0. > >> >>> > >> >>> --- Event Stage 0: Main Stage > >> >>> > >> >>> Container 1 1 584 0. > >> >>> Distributed Mesh 1 1 5184 0. > >> >>> GraphPartitioner 1 1 624 0. > >> >>> Matrix 320 320 3469402576 0. > >> >>> Index Set 53 53 2777932 0. > >> >>> IS L to G Mapping 1 1 249320 0. > >> >>> Section 13 11 7920 0. > >> >>> Star Forest Graph 6 6 4896 0. > >> >>> Discrete System 1 1 936 0. > >> >>> Vector 609405 609405 857220847896 0. > >> >>> Vec Scatter 1 1 704 0. > >> >>> Viewer 22 11 9328 0. > >> >>> EPS Solver 10 10 86360 0. > >> >>> Spectral Transform 10 10 8400 0. > >> >>> Basis Vectors 10 10 530336 0. > >> >>> PetscRandom 10 10 6540 0. > >> >>> Region 10 10 6800 0. > >> >>> Direct Solver 10 10 9838880 0. > >> >>> Krylov Solver 10 10 13920 0. > >> >>> Preconditioner 10 10 10080 0. 
> >> >>> ======================================================================================================================== > >> >>> Average time to get PetscTime(): 2.50991e-08 > >> >>> #PETSc Option Table entries: > >> >>> -config=benchmark3.json > >> >>> -eps_converged_reason > >> >>> -log_view > >> >>> #End of PETSc Option Table entries > >> >>> Compiled without FORTRAN kernels > >> >>> Compiled with full precision matrices (default) > >> >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >> >>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C --with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 > >> >>> ----------------------------------------- > >> >>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >> >>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >> >>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >> >>> Using PETSc arch: > >> >>> ----------------------------------------- > >> >>> > >> >>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >> >>> Using Fortran compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>> ----------------------------------------- > >> >>> > >> >>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >> >>> ----------------------------------------- > >> >>> > >> >>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >> >>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >> >>> ----------------------------------------- > >> >>> > >> >>> > >> >>>> Am 30.04.2020 um 17:14 schrieb Matthew Knepley : > >> >>>> > >> >>>> On Thu, Apr 30, 2020 at 10:55 AM Walker Andreas wrote: > >> >>>> Hello everyone, > >> >>>> > >> >>>> I have used SLEPc successfully on a FEM-related project. Even though it is very powerful overall, the speedup I measure is a bit below my expectations. Compared to using a single core, the speedup is for example around 1.8 for two cores but only maybe 50-60 for 128 cores and maybe 70 or 80 for 256 cores. Some details about my problem: > >> >>>> > >> >>>> - The problem is based on meshes with up to 400k degrees of freedom. DMPlex is used for organizing it. > >> >>>> - ParMetis is used to partition the mesh. This yields a stiffness matrix where the vast majority of entries is in the diagonal blocks (i.e. looking at the rows owned by a core, there is a very dense square-shaped region around the diagonal and some loosely scattered nozeroes in the other columns). > >> >>>> - The actual matrix from which I need eigenvalues is a 2x2 block matrix, saved as MATNEST - matrix. Each of these four matrices is computed based on the stiffness matrix and has a similar size and nonzero pattern. 
For a mesh of 200k dofs, one such matrix has a size of about 174kx174k and on average about 40 nonzeroes per row. > >> >>>> - I use the default Krylov-Schur solver and look for the 100 smallest eigenvalues > >> >>>> - The output of -log_view for the 200k-dof - mesh described above run on 128 cores is at the end of this mail. > >> >>>> > >> >>>> I noticed that the problem matrices are not perfectly balanced, i.e. the number of rows per core might vary between 2500 and 3000, for example. But I am not sure if this is the main reason for the poor speedup. > >> >>>> > >> >>>> I tried to reduce the subspace size but without effect. I also attempted to use the shift-and-invert spectral transformation but the MATNEST-type prevents this. > >> >>>> > >> >>>> Are there any suggestions to improve the speedup further or is this the maximum speedup that I can expect? > >> >>>> > >> >>>> Can you also give us the performance for this problem on one node using the same number of cores per node? Then we can calculate speedup > >> >>>> and look at which functions are not speeding up. > >> >>>> > >> >>>> Thanks, > >> >>>> > >> >>>> Matt > >> >>>> > >> >>>> Thanks a lot in advance, > >> >>>> > >> >>>> Andreas Walker > >> >>>> > >> >>>> m&m group > >> >>>> D-MAVT > >> >>>> ETH Zurich > >> >>>> > >> >>>> ************************************************************************************************************************ > >> >>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** > >> >>>> ************************************************************************************************************************ > >> >>>> > >> >>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > >> >>>> > >> >>>> ./Solver on a named eu-g1-050-2 with 128 processors, by awalker Thu Apr 30 15:50:22 2020 > >> >>>> Using Petsc Release Version 3.10.5, Mar, 28, 2019 > >> >>>> > >> >>>> Max Max/Min Avg Total > >> >>>> Time (sec): 6.209e+02 1.000 6.209e+02 > >> >>>> Objects: 6.068e+05 1.001 6.063e+05 > >> >>>> Flop: 9.230e+11 1.816 7.212e+11 9.231e+13 > >> >>>> Flop/sec: 1.487e+09 1.816 1.161e+09 1.487e+11 > >> >>>> MPI Messages: 1.451e+07 2.999 8.265e+06 1.058e+09 > >> >>>> MPI Message Lengths: 6.062e+09 2.011 5.029e+02 5.321e+11 > >> >>>> MPI Reductions: 1.512e+06 1.000 > >> >>>> > >> >>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > >> >>>> e.g., VecAXPY() for real vectors of length N --> 2N flop > >> >>>> and VecAXPY() for complex vectors of length N --> 8N flop > >> >>>> > >> >>>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > >> >>>> Avg %Total Avg %Total Count %Total Avg %Total Count %Total > >> >>>> 0: Main Stage: 6.2090e+02 100.0% 9.2309e+13 100.0% 1.058e+09 100.0% 5.029e+02 100.0% 1.512e+06 100.0% > >> >>>> > >> >>>> ------------------------------------------------------------------------------------------------------------------------ > >> >>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. 
> >> >>>> Phase summary info: > >> >>>> Count: number of times phase was executed > >> >>>> Time and Flop: Max - maximum over all processors > >> >>>> Ratio - ratio of maximum to minimum over all processors > >> >>>> Mess: number of messages sent > >> >>>> AvgLen: average message length (bytes) > >> >>>> Reduct: number of global reductions > >> >>>> Global: entire computation > >> >>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). > >> >>>> %T - percent time in this phase %F - percent flop in this phase > >> >>>> %M - percent messages in this phase %L - percent message lengths in this phase > >> >>>> %R - percent reductions in this phase > >> >>>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) > >> >>>> ------------------------------------------------------------------------------------------------------------------------ > >> >>>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > >> >>>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > >> >>>> ------------------------------------------------------------------------------------------------------------------------ > >> >>>> > >> >>>> --- Event Stage 0: Main Stage > >> >>>> > >> >>>> BuildTwoSided 20 1.0 2.3249e-01 2.2 0.00e+00 0.0 2.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> BuildTwoSidedF 317 1.0 8.5016e-01 4.8 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMult 150986 1.0 2.1963e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 37007 > >> >>>> MatMultAdd 603944 1.0 1.6209e+02 1.4 8.07e+10 1.8 1.1e+09 5.0e+02 0.0e+00 23 9100100 0 23 9100100 0 50145 > >> >>>> MatConvert 30 1.0 1.6488e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatScale 10 1.0 1.0347e-03 3.9 6.68e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 65036 > >> >>>> MatAssemblyBegin 916 1.0 8.6715e-01 1.4 0.00e+00 0.0 2.1e+04 1.4e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatAssemblyEnd 916 1.0 2.0682e-01 1.1 0.00e+00 0.0 4.7e+05 1.3e+02 1.5e+03 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatZeroEntries 42 1.0 7.2787e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatView 10 1.0 1.4816e+00 1.0 0.00e+00 0.0 6.4e+03 1.3e+05 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatAXPY 40 1.0 1.0752e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTranspose 80 1.0 3.0198e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMatMult 60 1.0 3.0391e-01 1.0 7.82e+06 1.6 3.8e+05 2.8e+02 7.8e+02 0 0 0 0 0 0 0 0 0 0 2711 > >> >>>> MatMatMultSym 60 1.0 2.4238e-01 1.0 0.00e+00 0.0 3.3e+05 2.4e+02 7.2e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatMatMultNum 60 1.0 5.8508e-02 1.0 7.82e+06 1.6 4.7e+04 5.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 14084 > >> >>>> MatPtAP 40 1.0 4.5617e-01 1.0 1.59e+07 1.6 3.3e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 3649 > >> >>>> MatPtAPSymbolic 40 1.0 2.6002e-01 1.0 0.00e+00 0.0 1.7e+05 6.5e+02 2.8e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatPtAPNumeric 40 1.0 1.9293e-01 1.0 1.59e+07 1.6 1.5e+05 1.5e+03 3.2e+02 0 0 0 0 0 0 0 0 0 0 8629 > >> >>>> MatTrnMatMult 40 1.0 2.3801e-01 1.0 6.09e+06 1.8 1.8e+05 1.0e+03 6.4e+02 0 0 0 0 0 0 0 0 0 0 2442 > >> >>>> MatTrnMatMultSym 40 1.0 1.6962e-01 1.0 0.00e+00 0.0 1.7e+05 4.4e+02 6.4e+02 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTrnMatMultNum 40 1.0 6.9000e-02 1.0 6.09e+06 1.8 9.7e+03 1.1e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 8425 > >> >>>> MatGetLocalMat 240 1.0 
4.9149e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatGetBrAoCol 160 1.0 2.0470e-02 1.6 0.00e+00 0.0 3.3e+05 4.1e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> MatTranspose_SeqAIJ_FAST 80 1.0 2.9940e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> Mesh Partition 1 1.0 1.4825e+00 1.0 0.00e+00 0.0 9.8e+04 6.9e+01 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> Mesh Migration 1 1.0 3.6680e-02 1.0 0.00e+00 0.0 1.5e+03 1.4e+04 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistribute 1 1.0 1.5269e+00 1.0 0.00e+00 0.0 1.0e+05 3.5e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistCones 1 1.0 1.8845e-02 1.2 0.00e+00 0.0 1.0e+03 1.7e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistLabels 1 1.0 9.7280e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexDistData 1 1.0 3.1499e-01 1.4 0.00e+00 0.0 9.8e+04 4.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexStratify 2 1.0 9.3421e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DMPlexPrealloc 2 1.0 3.5980e-02 1.0 0.00e+00 0.0 4.0e+04 1.8e+03 3.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFSetGraph 20 1.0 1.6069e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFSetUp 20 1.0 2.8043e-01 1.9 0.00e+00 0.0 6.7e+04 5.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFBcastBegin 25 1.0 3.9653e-02 2.5 0.00e+00 0.0 6.1e+04 4.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFBcastEnd 25 1.0 9.0128e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFReduceBegin 10 1.0 4.3473e-04 5.5 0.00e+00 0.0 7.4e+03 4.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFReduceEnd 10 1.0 5.7962e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFFetchOpBegin 2 1.0 1.6069e-0434.7 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> SFFetchOpEnd 2 1.0 8.9251e-04 2.6 0.00e+00 0.0 1.8e+03 4.4e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecSet 302179 1.0 1.3128e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecAssemblyBegin 1 1.0 1.3844e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecAssemblyEnd 1 1.0 3.4710e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> VecScatterBegin 603945 1.0 2.2874e+01 4.4 0.00e+00 0.0 1.1e+09 5.0e+02 1.0e+00 2 0100100 0 2 0100100 0 0 > >> >>>> VecScatterEnd 603944 1.0 8.2651e+01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 > >> >>>> VecSetRandom 11 1.0 2.7061e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> EPSSetUp 10 1.0 5.0371e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> EPSSolve 10 1.0 6.1329e+02 1.0 9.23e+11 1.8 1.1e+09 5.0e+02 1.5e+06 99100100100100 99100100100100 150509 > >> >>>> STSetUp 10 1.0 2.5475e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> STApply 150986 1.0 2.1997e+02 1.3 8.07e+10 1.8 1.1e+09 5.0e+02 1.2e+06 31 9100100 80 31 9100100 80 36950 > >> >>>> BVCopy 1791 1.0 5.1953e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> BVMultVec 301925 1.0 1.5007e+02 3.1 3.31e+11 1.8 0.0e+00 0.0e+00 0.0e+00 14 36 0 0 0 14 36 0 0 0 220292 > >> >>>> BVMultInPlace 1801 1.0 8.0080e+00 1.8 1.78e+11 1.8 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 2222543 > >> >>>> BVDotVec 301925 1.0 3.2807e+02 1.4 3.33e+11 1.8 0.0e+00 0.0e+00 3.0e+05 47 36 0 0 20 47 36 0 0 20 101409 > >> >>>> BVOrthogonalizeV 150996 1.0 4.0292e+02 1.1 6.64e+11 1.8 0.0e+00 
0.0e+00 3.0e+05 62 72 0 0 20 62 72 0 0 20 164619 > >> >>>> BVScale 150996 1.0 4.1660e-01 3.2 5.27e+08 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126494 > >> >>>> BVSetRandom 10 1.0 2.5061e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DSSolve 1801 1.0 2.0764e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > >> >>>> DSVectors 2779 1.0 1.2691e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >> >>>> DSOther 1801 1.0 1.2944e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > >> >>>> ------------------------------------------------------------------------------------------------------------------------ > >> >>>> > >> >>>> Memory usage is given in bytes: > >> >>>> > >> >>>> Object Type Creations Destructions Memory Descendants' Mem. > >> >>>> Reports information only for process 0. > >> >>>> > >> >>>> --- Event Stage 0: Main Stage > >> >>>> > >> >>>> Container 1 1 584 0. > >> >>>> Distributed Mesh 6 6 29160 0. > >> >>>> GraphPartitioner 2 2 1244 0. > >> >>>> Matrix 1104 1104 136615232 0. > >> >>>> Index Set 930 930 9125912 0. > >> >>>> IS L to G Mapping 3 3 2235608 0. > >> >>>> Section 28 26 18720 0. > >> >>>> Star Forest Graph 30 30 25632 0. > >> >>>> Discrete System 6 6 5616 0. > >> >>>> PetscRandom 11 11 7194 0. > >> >>>> Vector 604372 604372 8204816368 0. > >> >>>> Vec Scatter 203 203 272192 0. > >> >>>> Viewer 21 10 8480 0. > >> >>>> EPS Solver 10 10 86360 0. > >> >>>> Spectral Transform 10 10 8400 0. > >> >>>> Basis Vectors 10 10 530848 0. > >> >>>> Region 10 10 6800 0. > >> >>>> Direct Solver 10 10 9838880 0. > >> >>>> Krylov Solver 10 10 13920 0. > >> >>>> Preconditioner 10 10 10080 0. > >> >>>> ======================================================================================================================== > >> >>>> Average time to get PetscTime(): 3.49944e-08 > >> >>>> Average time for MPI_Barrier(): 5.842e-06 > >> >>>> Average time for zero size MPI_Send(): 8.72551e-06 > >> >>>> #PETSc Option Table entries: > >> >>>> -config=benchmark3.json > >> >>>> -log_view > >> >>>> #End of PETSc Option Table entries > >> >>>> Compiled without FORTRAN kernels > >> >>>> Compiled with full precision matrices (default) > >> >>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > >> >>>> Configure options: --prefix=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" FFLAGS= CXXFLAGS="-ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2" --with-cc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc --with-cxx=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpic++ --with-fc=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib/libopenblas.so --with-x=0 --with-cxx-dialect=C++11 --with-boost=1 --with-clanguage=C 
--with-scalapack-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib/libscalapack.so --with-scalapack=1 --with-metis=1 --with-metis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk --with-hdf5=1 --with-hdf5-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5 --with-hypre=1 --with-hypre-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne --with-parmetis=1 --with-parmetis-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4 --with-mumps=1 --with-mumps-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b --with-trilinos=1 --with-trilinos-dir=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo --with-fftw=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include --with-superlu_dist-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include --with-suitesparse-lib="/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libumfpack.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libklu.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcholmod.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libbtf.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libccolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcolamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libcamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libamd.so /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib/libsuitesparseconfig.so /lib64/librt.so" --with-suitesparse=1 --with-zlib-include=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include --with-zlib-lib=/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib/libz.so --with-zlib=1 > >> >>>> ----------------------------------------- > >> >>>> Libraries compiled on 2020-01-22 15:21:53 on eu-c7-051-02 > >> >>>> Machine characteristics: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core > >> >>>> Using PETSc directory: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit > >> >>>> Using PETSc arch: > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using C compiler: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc -ftree-vectorize -O2 -march=core-avx2 -fPIC -mavx2 > >> >>>> Using Fortran compiler: 
/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using include paths: -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/include -I/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/include > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> Using C linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpicc > >> >>>> Using Fortran linker: /cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/bin/mpif90 > >> >>>> Using libraries: -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/petsc-3.10.5-3czpbqhprn65yalty4o46knmhytixlit/lib -lpetsc -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/trilinos-12.14.1-hcdtxkqirqt6wkui3vkie5qse64payqo/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/mumps-5.1.1-36fzslrywwsg7gxnoxbjbzwuz6o74n6b/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/netlib-scalapack-2.0.2-bq6sqixlc4zwxpfrtbu7jt7twhps5ldv/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/suite-sparse-5.1.0-sk4v2rs7dfpese3zgsyigwtv2w66v2gz/lib /lib64/librt.so -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/superlu-dist-6.1.1-ejpmx43wk4vplnmry5n5njvgqvcvfe6x/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hypre-2.14.0-ly5dmcaty5wx4opqwspvoim6zss6sxne/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openblas-0.2.20-cot3cawsqf4pkxjwzjexaykbwn2ch3ii/lib 
-Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hdf5-1.10.1-sbxt5qlg2pojshva2b6kdflsy64i4rs5/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/parmetis-4.0.3-ik3r6faxeb6uzyywppuc2niuvivwiux4/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/metis-5.1.0-bqbfmcvyqigdaeetkg6fuhdh4eplu3fk/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/zlib-1.2.11-bu2rglshnlxrwc24334r76jr34jm2fxy/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/hwloc-1.11.9-a436y6rdahnn57u6oe6snwemjhcfmrso/lib -Wl,-rpath,/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -L/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-6.3.0/openmpi-3.0.1-k6n5k3l3baqlkdw3w7il7dwb6wilr6r6/lib -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib:/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib64 -Wl,-rpath,/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -L/cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/lib -lmuelu-adapters -lmuelu-interface -lmuelu -lstratimikos -lstratimikosbelos -lstratimikosaztecoo -lstratimikosamesos -lstratimikosml -lstratimikosifpack -lModeLaplace -lanasaziepetra -lanasazi -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lmapvarlib -lsuplib_cpp -lsuplib_c -lsuplib -lsupes -laprepro_lib -lchaco -lio_info_lib -lIonit -lIotr -lIohb -lIogs -lIogn -lIovs -lIopg -lIoexo_fac -lIopx -lIofx -lIoex -lIoss -lnemesis -lexoIIv2for32 -lexodus_for -lexodus -lbelosxpetra -lbelosepetra -lbelos -lml -lifpack -lpamgen_extras -lpamgen -lamesos -lgaleri-xpetra -lgaleri-epetra -laztecoo -lisorropia -lxpetra-sup -lxpetra -lthyraepetraext -lthyraepetra -lthyracore -lthyraepetraext -lthyraepetra -lthyracore -lepetraext -ltrilinosss -ltriutils -lzoltan -lepetra -lsacado -lrtop -lkokkoskernels -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist -lteuchosparser -lteuchoscore -lteuchoskokkoscomm -lteuchoskokkoscompat -lteuchosremainder -lteuchosnumerics -lteuchoscomm -lteuchosparameterlist 
-lteuchosparser -lteuchoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lkokkosalgorithms -lkokkoscontainers -lkokkoscore -lgtest -lpthread -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lHYPRE -lopenblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lparmetis -lmetis -lm -lz -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl > >> >>>> ----------------------------------------- > >> >>>> > >> >>>> > >> >>>> > >> >>>> -- > >> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >> >>>> -- Norbert Wiener > >> >>>> > >> >>>> https://www.cse.buffalo.edu/~knepley/ > >> >>> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From sajidsyed2021 at u.northwestern.edu Mon May 4 09:22:01 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Mon, 4 May 2020 09:22:01 -0500 Subject: [petsc-users] Reuse hypre interpolations between successive KSPOnly solves associated with TSStep ? Message-ID: Hi PETSc-developers, For a linear TS, when solving with -ts_type cn -ksp_type fgmres -pc_type gamg, the flag -pc_gamg_reuse_interpolation can be used to re-use the GAMG interpolation calculated when solving the first time step. Increasing the number of time steps only increases the time spent in application of the precondition and the setup time is constant. Is there an equivalent way to do this with Hypre?s BoomerAMG ? I?m using euclid as the smoother but adding the -pc_hypre_euclid_reuse flag only seems to make it reuse the euclid ILU smoothing within a particular TS time step, i.e. increasing the number of time steps linearly increases the time spent in setting up the preconditioner. Another approach I tried was to use hypre within the PCHMG preconditioner (which I hoped would allow me to use BoomerAMG to computer interpolations while also reusing the interpolations via the -pc_hmg_reuse_interpolation flag.) However, I?m unable to pass the parameter to BoomerAMG when Hypre is set as the inner PC for the HMG preconditioner. I didn?t see any options for doing so when I searched the -help output and neither does setting options via -hmg_inner_pc_hypre_boomeramg_option value work. What am I missing here ? For reference, I?m adding the exact solver options used (and the associated log files) for each of the above cases if it helps. (GAMG was used on the system of equations arising out of the complex PDE but since I was unable to get good convergence rates with GAMG for the real equivalent of the same, I switched to Hypre which performed better. 
Hence I?m looking for a way to set the reuse interpolation feature for this as well.) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_gamg Type: application/octet-stream Size: 24888 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_hmg Type: application/octet-stream Size: 57454 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run_hmg.sh Type: application/octet-stream Size: 772 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run_gamg.sh Type: application/octet-stream Size: 396 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_hypre Type: application/octet-stream Size: 43742 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run_hypre.sh Type: application/octet-stream Size: 662 bytes Desc: not available URL: From junchao.zhang at gmail.com Mon May 4 11:32:30 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 4 May 2020 11:32:30 -0500 Subject: [petsc-users] mkl cpardiso iparm 31 In-Reply-To: References: Message-ID: On Fri, May 1, 2020 at 3:33 AM Marius Buerkle wrote: > Hi, > > Is the option "-mat_mkl_cpardiso_31" to calculate Partial solve and > computing selected components of the solution vectors actually supported > by PETSC? > >From the code, it seems so. > > Marius > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon May 4 17:03:47 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 4 May 2020 18:03:47 -0400 Subject: [petsc-users] Reuse hypre interpolations between successive KSPOnly solves associated with TSStep ? In-Reply-To: References: Message-ID: On Mon, May 4, 2020 at 10:24 AM Sajid Ali wrote: > Hi PETSc-developers, > > For a linear TS, when solving with -ts_type cn -ksp_type fgmres -pc_type > gamg, the flag -pc_gamg_reuse_interpolation can be used to re-use the > GAMG interpolation calculated when solving the first time step. Increasing > the number of time steps only increases the time spent in application of > the precondition and the setup time is constant. > > Is there an equivalent way to do this with Hypre?s BoomerAMG ? > I don't know but you can look at the BoomerAMG documentation and if you find something that does what you want you can look at the PETSc source in /src/ksp/pc/impls/hypre/hypre.c and see the idiom for getting a hypre object and manipulating it. You can then call the Hypre API directly and do whatever you want. > I?m using euclid as the smoother but adding the -pc_hypre_euclid_reuse > flag only seems to make it reuse the euclid ILU smoothing within a > particular TS time step, i.e. increasing the number of time steps linearly > increases the time spent in setting up the preconditioner. > > Another approach I tried was to use hypre within the PCHMG preconditioner > (which I hoped would allow me to use BoomerAMG to computer interpolations > while also reusing the interpolations via the -pc_hmg_reuse_interpolation > flag.) However, I?m unable to pass the parameter to BoomerAMG when Hypre is > set as the inner PC for the HMG preconditioner. 
I didn?t see any options > for doing so when I searched the -help output and neither does setting > options via -hmg_inner_pc_hypre_boomeramg_option value work. What am I > missing here ? > > For reference, I?m adding the exact solver options used (and the > associated log files) for each of the above cases if it helps. (GAMG was > used on the system of equations arising out of the complex PDE but since I > was unable to get good convergence rates with GAMG for the real equivalent > of the same, I switched to Hypre which performed better. Hence I?m looking > for a way to set the reuse interpolation feature for this as well.) > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mukkundsunjii at gmail.com Tue May 5 07:46:48 2020 From: mukkundsunjii at gmail.com (MUKKUND SUNJII) Date: Tue, 5 May 2020 14:46:48 +0200 Subject: [petsc-users] Regarding Valgrind Message-ID: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> Greetings, I have been working on modifying ex11.c by adding source terms to the existing model. In the process, I had introduced some memory corruption errors. The modifications are fairly large, hence it is difficult to single out the problem. In the FAQ page, I saw that we can use the flag -malloc_debug and/or CHKMEMQ statements. However, they don?t seem to help my case. At this point of time, Valgrind is not supported by MAC OS X as I have the newer version of the OS. Any suggestions on how to figure out the source of the problem? Regards, Mukkund From balay at mcs.anl.gov Tue May 5 08:56:50 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 May 2020 08:56:50 -0500 (CDT) Subject: [petsc-users] Regarding Valgrind In-Reply-To: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> References: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> Message-ID: I think its best to get access to a linux box for valgrind - either a remote box - or a VM on your laptop. [or via docker?] Satish On Tue, 5 May 2020, MUKKUND SUNJII wrote: > Greetings, > > I have been working on modifying ex11.c by adding source terms to the existing model. > > In the process, I had introduced some memory corruption errors. The modifications are fairly large, hence it is difficult to single out the problem. > > In the FAQ page, I saw that we can use the flag -malloc_debug and/or CHKMEMQ statements. However, they don?t seem to help my case. > > At this point of time, Valgrind is not supported by MAC OS X as I have the newer version of the OS. Any suggestions on how to figure out the source of the problem? > > Regards, > > Mukkund From jacob.fai at gmail.com Tue May 5 09:53:37 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 5 May 2020 09:53:37 -0500 Subject: [petsc-users] Regarding Valgrind In-Reply-To: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> References: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> Message-ID: Using Docker is certainly the easiest way to do this in my experience! 
You can install docker desktop here https://hub.docker.com/editions/community/docker-ce-desktop-mac/ Then make a docker context file (preferably in an empty directory somewhere) with the following contents: FROM jedbrown/mpich-ccache USER root # PACKAGE INSTALL RUN apt-get -y update RUN apt-get -y upgrade RUN apt-get -y install build-essential RUN apt-get -y install python3 # OPTIONAL RUN apt-get -y install libc6-dbg # VALGRIND BUILD FROM SOURCE WORKDIR / RUN git clone git://sourceware.org/git/valgrind.git WORKDIR /valgrind RUN git pull RUN ./autogen.sh RUN ./configure --with-mpicc=/usr/local/bin/mpicc RUN make -j 5 RUN make install # ENV VARIABLES ENV PETSC_DIR="/petsc" ENV PETSC_ARCH="arch-linux-c-debug? RUN echo 'export PATH="/usr/lib/ccache:$PATH"' >> ~/.bashrc RUN "/usr/sbin/update-ccache-symlinks" WORKDIR /petsc Then simply run these commands from command line: docker build -t valgrinddocker:latest /PATH/TO/DOCKER/FILE docker run -it --rm --cpus 4 -v ${PETSC_DIR}:/petsc:delegated valgrinddocker:latest? This will build and launch docker image called ?valgrinddocker? based on ubuntu with mpi and ccache preinstalled as well as optionally build valgrind from source. It also most importantly mounts your current ${PETSC_DIR} inside the container meaning you don?t have to reinstall PETSc inside docker. Anything you do inside the docker will also be mirrored on your ?local? machine inside ${PETSC_DIR}. You will have to have a separate arch folder for running inside the image however so you will have to go through ./configure again. Alternatively If you don?t want to build valgrind from source and instead use apt-get to install it you may remove the commands regarding building valgrind and installing libc6-dbg. Hope this helps! Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On May 5, 2020, at 7:46 AM, MUKKUND SUNJII wrote: > > Greetings, > > I have been working on modifying ex11.c by adding source terms to the existing model. > > In the process, I had introduced some memory corruption errors. The modifications are fairly large, hence it is difficult to single out the problem. > > In the FAQ page, I saw that we can use the flag -malloc_debug and/or CHKMEMQ statements. However, they don?t seem to help my case. > > At this point of time, Valgrind is not supported by MAC OS X as I have the newer version of the OS. Any suggestions on how to figure out the source of the problem? > > Regards, > > Mukkund -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue May 5 15:39:03 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 5 May 2020 16:39:03 -0400 Subject: [petsc-users] error configuring METIS with OpenMP Message-ID: This works with OpenMP -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 865250 bytes Desc: not available URL: From balay at mcs.anl.gov Tue May 5 16:16:46 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 May 2020 16:16:46 -0500 (CDT) Subject: [petsc-users] error configuring METIS with OpenMP In-Reply-To: References: Message-ID: Sorry - what is the issue here? The attached configure.log looks good to me. 
Satish On Tue, 5 May 2020, Mark Adams wrote: > This works with OpenMP > From mfadams at lbl.gov Tue May 5 16:39:25 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 5 May 2020 17:39:25 -0400 Subject: [petsc-users] error configuring METIS with OpenMP In-Reply-To: References: Message-ID: Sorry, getting my machines mixed up. On Tue, May 5, 2020 at 5:16 PM Satish Balay wrote: > Sorry - what is the issue here? The attached configure.log looks good to > me. > > Satish > > On Tue, 5 May 2020, Mark Adams wrote: > > > This works with OpenMP > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1364489 bytes Desc: not available URL: From balay at mcs.anl.gov Tue May 5 16:57:27 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 May 2020 16:57:27 -0500 (CDT) Subject: [petsc-users] error configuring METIS with OpenMP In-Reply-To: References: Message-ID: Ok - this is on your Mac. >>>>> Executing: /usr/local/Cellar/mpich/3.3.2/bin/mpicc --version stdout: Apple clang version 11.0.3 (clang-1103.0.32.59) <<<<< XCode clang does not support OpenMP as far as I know. Satish On Tue, 5 May 2020, Mark Adams wrote: > Sorry, getting my machines mixed up. > > On Tue, May 5, 2020 at 5:16 PM Satish Balay wrote: > > > Sorry - what is the issue here? The attached configure.log looks good to > > me. > > > > Satish > > > > On Tue, 5 May 2020, Mark Adams wrote: > > > > > This works with OpenMP > > > > > > > > From mfadams at lbl.gov Tue May 5 17:17:39 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 5 May 2020 18:17:39 -0400 Subject: [petsc-users] Error with OpenMP Message-ID: My code seems tob running correctly with threads but I get this error in PetscFinalize. I Looked at this in DDT and got an error in free here: PetscErrorCode PetscStackDestroy(void) { if (PetscStackActive()) { free(petscstack); petscstack = NULL; } return 0; } This error did not happen with one thread. Any ideas? Thanks, *** Error in `./ex11': corrupted size vs. 
prev_size: 0x0000000043fb8070 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x92344)[0x200022e72344] /lib64/libc.so.6(cfree+0xa5c)[0x200022e7a19c] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/spectrum_mpi/mca_pml_pami.so(mca_pml_pami_del_comm+0xc0)[0x2000269f7690] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(+0x4d830)[0x200022c0d830] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(ompi_comm_free+0x244)[0x200022c10354] /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(PMPI_Comm_free+0xb4)[0x200022c5ae54] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x16c65a4)[0x2000017b65a4] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscPartitionerDestroy+0x73c)[0x2000017a750c] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x164c474)[0x20000173c474] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDualSpaceDestroy+0x95c)[0x200001a5a314] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x196919c)[0x200001a5919c] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDualSpaceDestroy+0x750)[0x200001a5a108] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscFEDestroy+0xb1c)[0x200001a89734] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectDereference+0x494)[0x200000241484] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDSDestroy+0x8cc)[0x200001acded0] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMClearDS+0x30c)[0x200001b387ac] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2a30)[0x200001b00508] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x164cb58)[0x20000173cb58] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x149066c)[0x20000158066c] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x1475908)[0x200001565908] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectDereference+0x494)[0x200000241484] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectListDestroy+0x1c8)[0x20000022ebe8] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscHeaderDestroy_Private+0x628)[0x20000023b50c] /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(VecDestroy+0x794)[0x200000b70e50] ./ex11[0x1000f6c8] /lib64/libc.so.6(+0x25200)[0x200022e05200] /lib64/libc.so.6(__libc_start_main+0xc4)[0x200022e053f4] ======= Memory map: ======== 10000000-10020000 r-xp 00000000 
00:30 155767402 /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 10020000-10030000 r--p 00010000 00:30 155767402 /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 10030000-10040000 rw-p 00020000 00:30 155767402 /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 43330000-43800000 rw-p 00000000 00:00 0 [heap] 43800000-43810000 rw-p 00000000 00:00 0 [heap] -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue May 5 17:18:37 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 5 May 2020 18:18:37 -0400 Subject: [petsc-users] error configuring METIS with OpenMP In-Reply-To: References: Message-ID: I use MPICH from Homebrew and specify this is configured. mpicc seems to use XCode now. Do you know how I would get Homebew to not use Xcode? I actually don't need MPI for this so may I should just set the C compiler explicitly to gcc.. Thanks, On Tue, May 5, 2020 at 5:57 PM Satish Balay wrote: > Ok - this is on your Mac. > > >>>>> > Executing: /usr/local/Cellar/mpich/3.3.2/bin/mpicc --version > stdout: > Apple clang version 11.0.3 (clang-1103.0.32.59) > <<<<< > > XCode clang does not support OpenMP as far as I know. > > Satish > > On Tue, 5 May 2020, Mark Adams wrote: > > > Sorry, getting my machines mixed up. > > > > On Tue, May 5, 2020 at 5:16 PM Satish Balay wrote: > > > > > Sorry - what is the issue here? The attached configure.log looks good > to > > > me. > > > > > > Satish > > > > > > On Tue, 5 May 2020, Mark Adams wrote: > > > > > > > This works with OpenMP > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue May 5 17:24:17 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 May 2020 17:24:17 -0500 (CDT) Subject: [petsc-users] error configuring METIS with OpenMP In-Reply-To: References: Message-ID: You can try: ./configure --with-cc=gcc-9 --with-fc=gfortran-9 --with-cxx=g++-9 --download-mpich=1 --with-openmp=1 ..... Or to get a prebuilt mpich with brew compilers: ./configure --with-cc=gcc-9 --with-fc=gfortran-9 --with-cxx=g++-9 --download-mpich=1 --prefix=$HOME/soft/mpich-brew-gcc And later: ./configure --with-mpi-dir=$HOME/soft/mpich-brew-gcc ..... Satish On Tue, 5 May 2020, Mark Adams wrote: > I use MPICH from Homebrew and specify this is configured. mpicc seems to > use XCode now. > > Do you know how I would get Homebew to not use Xcode? > > I actually don't need MPI for this so may I should just set the C compiler > explicitly to gcc.. > Thanks, > > On Tue, May 5, 2020 at 5:57 PM Satish Balay wrote: > > > Ok - this is on your Mac. > > > > >>>>> > > Executing: /usr/local/Cellar/mpich/3.3.2/bin/mpicc --version > > stdout: > > Apple clang version 11.0.3 (clang-1103.0.32.59) > > <<<<< > > > > XCode clang does not support OpenMP as far as I know. > > > > Satish > > > > On Tue, 5 May 2020, Mark Adams wrote: > > > > > Sorry, getting my machines mixed up. > > > > > > On Tue, May 5, 2020 at 5:16 PM Satish Balay wrote: > > > > > > > Sorry - what is the issue here? The attached configure.log looks good > > to > > > > me. 
> > > > > > > > Satish > > > > > > > > On Tue, 5 May 2020, Mark Adams wrote: > > > > > > > > > This works with OpenMP > > > > > > > > > > > > > > > > > > > > > From mfadams at lbl.gov Tue May 5 18:06:37 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 5 May 2020 19:06:37 -0400 Subject: [petsc-users] Error with OpenMP In-Reply-To: References: Message-ID: I found a way to get an error in my code so nevermind. On Tue, May 5, 2020 at 6:17 PM Mark Adams wrote: > My code seems tob running correctly with threads but I get this error in > PetscFinalize. > > I Looked at this in DDT and got an error in free here: > > PetscErrorCode PetscStackDestroy(void) > { > if (PetscStackActive()) { > free(petscstack); > petscstack = NULL; > } > return 0; > } > > This error did not happen with one thread. > > Any ideas? > Thanks, > > *** Error in `./ex11': corrupted size vs. prev_size: 0x0000000043fb8070 *** > ======= Backtrace: ========= > /lib64/libc.so.6(+0x92344)[0x200022e72344] > /lib64/libc.so.6(cfree+0xa5c)[0x200022e7a19c] > > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/spectrum_mpi/mca_pml_pami.so(mca_pml_pami_del_comm+0xc0)[0x2000269f7690] > > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(+0x4d830)[0x200022c0d830] > > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(ompi_comm_free+0x244)[0x200022c10354] > > /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/container/../lib/libmpi_ibm.so.3(PMPI_Comm_free+0xb4)[0x200022c5ae54] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x16c65a4)[0x2000017b65a4] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscPartitionerDestroy+0x73c)[0x2000017a750c] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x164c474)[0x20000173c474] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDualSpaceDestroy+0x95c)[0x200001a5a314] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x196919c)[0x200001a5919c] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDualSpaceDestroy+0x750)[0x200001a5a108] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscFEDestroy+0xb1c)[0x200001a89734] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectDereference+0x494)[0x200000241484] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscDSDestroy+0x8cc)[0x200001acded0] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMClearDS+0x30c)[0x200001b387ac] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2a30)[0x200001b00508] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x164cb58)[0x20000173cb58] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] > > 
/ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x149066c)[0x20000158066c] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(+0x1475908)[0x200001565908] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(DMDestroy+0x2bcc)[0x200001b006a4] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectDereference+0x494)[0x200000241484] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscObjectListDestroy+0x1c8)[0x20000022ebe8] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(PetscHeaderDestroy_Private+0x628)[0x20000023b50c] > > /ccs/home/adams/petsc/arch-summit-dbg-gnu-cuda-omp/lib/libpetsc.so.3.013(VecDestroy+0x794)[0x200000b70e50] > ./ex11[0x1000f6c8] > /lib64/libc.so.6(+0x25200)[0x200022e05200] > /lib64/libc.so.6(__libc_start_main+0xc4)[0x200022e053f4] > ======= Memory map: ======== > 10000000-10020000 r-xp 00000000 00:30 155767402 > /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 > 10020000-10030000 r--p 00010000 00:30 155767402 > /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 > 10030000-10040000 rw-p 00020000 00:30 155767402 > /autofs/nccs-svm1_home1/adams/petsc/src/dm/impls/plex/tutorials/ex11 > 43330000-43800000 rw-p 00000000 00:00 0 > [heap] > 43800000-43810000 rw-p 00000000 00:00 0 > [heap] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yjwu16 at gmail.com Tue May 5 22:04:27 2020 From: yjwu16 at gmail.com (Yingjie Wu) Date: Wed, 6 May 2020 11:04:27 +0800 Subject: [petsc-users] Treatment of piecewise residual function in SNES Message-ID: Dear PETSc developers Hi, I have been using SNES to solve a nonlinear problem recently, but this nonlinear problem is different from the ordinary problem, in which its residual function is a piecewise function. I have the following questions during calculation: 1. As I am not very familiar with the jacobian matrix construction of this piecewise function, I used the - snes_fd. However, the result is not converged in process of calculation. I don't know if it's the piecewise function problem or other errors. 2. For this piecewise function problem, it is actually determination of the residual function according to the current solution vector. Should I determine the piecewise function before each Newton step begins, or add judgment directly to the evaluation function to form the piecewise function? Now I'm adding judgment directly to the evaluation function to form a piecewise function. 3. Are there some special treatment for piecewise residual functions in the SNES? I'm dealing with a water two-phase flow problem, which is difficult to describe in detail because the model is relatively complex. For the first time I have encountered this problem and hope to get some advice or information. Thanks, Yingjie -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 6 05:38:35 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 6 May 2020 06:38:35 -0400 Subject: [petsc-users] Treatment of piecewise residual function in SNES In-Reply-To: References: Message-ID: Do you mean a piecewise smooth function, with a discontinuous derivative, or a piecewise function which is itself discontinuous? 
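(Purely as an illustration of the structure under discussion — a hard branch taken directly inside the residual evaluation versus a smoothed switch — a minimal sketch might look like the following. The residual formulas, the AppCtx, and the smoothing width eps are placeholders invented for this sketch, not anything from the actual two-phase model.)

  #include <petscsnes.h>

  typedef struct { PetscReal eps; } AppCtx;  /* smoothing width; illustrative only */

  PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ptr)
  {
    AppCtx            *user = (AppCtx*)ptr;
    const PetscScalar *x;
    PetscScalar       *f;
    PetscInt          i, n;
    PetscErrorCode    ierr;

    PetscFunctionBeginUser;
    ierr = VecGetLocalSize(X, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecGetArray(F, &f);CHKERRQ(ierr);
    for (i = 0; i < n; i++) {
      /* hard switch: a genuinely discontinuous residual, which both Newton and a
         finite-difference Jacobian (-snes_fd) handle poorly near the jump:
         f[i] = (PetscRealPart(x[i]) > 0.0) ? residualA(x[i]) : residualB(x[i]);    */

      /* smoothed switch: blend the two branches over a width eps so the residual
         stays continuous (and differentiable) across the transition               */
      PetscReal w = 0.5*(1.0 + PetscTanhReal(PetscRealPart(x[i])/user->eps));
      f[i] = w*(x[i]*x[i] - 1.0) + (1.0 - w)*(PetscExpScalar(x[i]) - 2.0);
    }
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

With the hard switch, -snes_fd differences straight across the jump whenever an unknown sits near the transition, which is one common reason such solves stall; regularizing the switch over a small eps, or recasting phase appearance/disappearance as a complementarity problem for the SNES variational-inequality solvers (e.g. SNESVINEWTONRSLS), are among the usual remedies.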
Thanks, Matt On Tue, May 5, 2020 at 11:05 PM Yingjie Wu wrote: > Dear PETSc developers > Hi, > > I have been using SNES to solve a nonlinear problem recently, but this > nonlinear problem is different from the ordinary problem, in which its > residual function is a piecewise function. I have the following questions > during calculation: > > > 1. As I am not very familiar with the jacobian matrix construction of > this piecewise function, I used the - snes_fd. However, the result is not > converged in process of calculation. I don't know if it's the piecewise > function problem or other errors. > 2. For this piecewise function problem, it is actually determination > of the residual function according to the current solution vector. Should I > determine the piecewise function before each Newton step begins, or add > judgment directly to the evaluation function to form the piecewise > function? Now I'm adding judgment directly to the evaluation function to > form a piecewise function. > 3. Are there some special treatment for piecewise residual functions > in the SNES? > > I'm dealing with a water two-phase flow problem, which is difficult to > describe in detail because the model is relatively complex. For the first > time I have encountered this problem and hope to get some advice or > information. > > > Thanks, > > Yingjie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yjwu16 at gmail.com Wed May 6 09:25:26 2020 From: yjwu16 at gmail.com (Yingjie Wu) Date: Wed, 6 May 2020 22:25:26 +0800 Subject: [petsc-users] Treatment of piecewise residual function in SNES In-Reply-To: References: Message-ID: Hi, I think my residual functions are piecewise but discontinuous functions. Although the residual function is discontinuous, there is little difference between the residuals in different segments at the piecewise point. This is the first time I have encountered such a problem, as I described before, I solve the two-phase problem of water, which involves the disappearance and generation of phases, so the conservation equation is piecewised. I looked at some of the papers about the piecewise residual function in Newton method, which refers to the peiecewise smooth function, I do not really understand the different treatment in Newton method between the two kinds of functions. Thank you very much for your reply, I know little about this field and hope to get some suggestions. Thanks, Yingjie Matthew Knepley ?2020?5?6??? ??6:38??? > Do you mean a piecewise smooth function, with a discontinuous derivative, > or a piecewise function which is itself discontinuous? > > Thanks, > > Matt > > On Tue, May 5, 2020 at 11:05 PM Yingjie Wu wrote: > >> Dear PETSc developers >> Hi, >> >> I have been using SNES to solve a nonlinear problem recently, but this >> nonlinear problem is different from the ordinary problem, in which its >> residual function is a piecewise function. I have the following questions >> during calculation: >> >> >> 1. As I am not very familiar with the jacobian matrix construction of >> this piecewise function, I used the - snes_fd. However, the result is not >> converged in process of calculation. I don't know if it's the piecewise >> function problem or other errors. >> 2. 
For this piecewise function problem, it is actually determination >> of the residual function according to the current solution vector. Should I >> determine the piecewise function before each Newton step begins, or add >> judgment directly to the evaluation function to form the piecewise >> function? Now I'm adding judgment directly to the evaluation function to >> form a piecewise function. >> 3. Are there some special treatment for piecewise residual functions >> in the SNES? >> >> I'm dealing with a water two-phase flow problem, which is difficult to >> describe in detail because the model is relatively complex. For the first >> time I have encountered this problem and hope to get some advice or >> information. >> >> >> Thanks, >> >> Yingjie >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 6 09:54:10 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 6 May 2020 10:54:10 -0400 Subject: [petsc-users] Treatment of piecewise residual function in SNES In-Reply-To: References: Message-ID: On Wed, May 6, 2020 at 10:25 AM Yingjie Wu wrote: > Hi, > > I think my residual functions are piecewise but discontinuous functions. > > Although the residual function is discontinuous, there is little > difference between the residuals in different segments at the piecewise > point. This is the first time I have encountered such a problem, as I > described before, I solve the two-phase problem of water, which involves > the disappearance and generation of phases, so the conservation equation is > piecewised. I looked at some of the papers about the piecewise residual > function in Newton method, which refers to the peiecewise smooth function, > I do not really understand the different treatment in Newton method between > the two kinds of functions. > > > Thank you very much for your reply, I know little about this field and > hope to get some suggestions. > I have no idea what to do for discontinuous functions. The value near the discontinuity is not even computable, since within error you cannot tell which side of the jump you might be on. Thanks, Matt > Thanks, > > Yingjie > > Matthew Knepley ?2020?5?6??? ??6:38??? > >> Do you mean a piecewise smooth function, with a discontinuous derivative, >> or a piecewise function which is itself discontinuous? >> >> Thanks, >> >> Matt >> >> On Tue, May 5, 2020 at 11:05 PM Yingjie Wu wrote: >> >>> Dear PETSc developers >>> Hi, >>> >>> I have been using SNES to solve a nonlinear problem recently, but this >>> nonlinear problem is different from the ordinary problem, in which its >>> residual function is a piecewise function. I have the following questions >>> during calculation: >>> >>> >>> 1. As I am not very familiar with the jacobian matrix construction >>> of this piecewise function, I used the - snes_fd. However, the result is >>> not converged in process of calculation. I don't know if it's the piecewise >>> function problem or other errors. >>> 2. For this piecewise function problem, it is actually determination >>> of the residual function according to the current solution vector. Should I >>> determine the piecewise function before each Newton step begins, or add >>> judgment directly to the evaluation function to form the piecewise >>> function? 
Now I'm adding judgment directly to the evaluation function to >>> form a piecewise function. >>> 3. Are there some special treatment for piecewise residual functions >>> in the SNES? >>> >>> I'm dealing with a water two-phase flow problem, which is difficult to >>> describe in detail because the model is relatively complex. For the first >>> time I have encountered this problem and hope to get some advice or >>> information. >>> >>> >>> Thanks, >>> >>> Yingjie >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mukkundsunjii at gmail.com Wed May 6 11:08:03 2020 From: mukkundsunjii at gmail.com (MUKKUND SUNJII) Date: Wed, 6 May 2020 18:08:03 +0200 Subject: [petsc-users] Regarding Valgrind In-Reply-To: References: <9432E3FA-48D3-4561-A657-9DF4BBB8434B@gmail.com> Message-ID: Thank you so much for your suggestion. I will try to install Valgrind through docker. Regards, Mukkund > On 5 May 2020, at 16:53, Jacob Faibussowitsch wrote: > > Using Docker is certainly the easiest way to do this in my experience! You can install docker desktop here https://hub.docker.com/editions/community/docker-ce-desktop-mac/ > > Then make a docker context file (preferably in an empty directory somewhere) with the following contents: > > FROM jedbrown/mpich-ccache > USER root > > # PACKAGE INSTALL > RUN apt-get -y update > RUN apt-get -y upgrade > RUN apt-get -y install build-essential > RUN apt-get -y install python3 > > # OPTIONAL > RUN apt-get -y install libc6-dbg > > # VALGRIND BUILD FROM SOURCE > WORKDIR / > RUN git clone git://sourceware.org/git/valgrind.git > WORKDIR /valgrind > RUN git pull > RUN ./autogen.sh > RUN ./configure --with-mpicc=/usr/local/bin/mpicc > RUN make -j 5 > RUN make install > > # ENV VARIABLES > ENV PETSC_DIR="/petsc" > ENV PETSC_ARCH="arch-linux-c-debug? > RUN echo 'export PATH="/usr/lib/ccache:$PATH"' >> ~/.bashrc > > RUN "/usr/sbin/update-ccache-symlinks" > WORKDIR /petsc > > Then simply run these commands from command line: > > docker build -t valgrinddocker:latest /PATH/TO/DOCKER/FILE > docker run -it --rm --cpus 4 -v ${PETSC_DIR}:/petsc:delegated valgrinddocker:latest? > > This will build and launch docker image called ?valgrinddocker? based on ubuntu with mpi and ccache preinstalled as well as optionally build valgrind from source. It also most importantly mounts your current ${PETSC_DIR} inside the container meaning you don?t have to reinstall PETSc inside docker. Anything you do inside the docker will also be mirrored on your ?local? machine inside ${PETSC_DIR}. You will have to have a separate arch folder for running inside the image however so you will have to go through ./configure again. > > Alternatively If you don?t want to build valgrind from source and instead use apt-get to install it you may remove the commands regarding building valgrind and installing libc6-dbg. > > Hope this helps! 
> > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 > >> On May 5, 2020, at 7:46 AM, MUKKUND SUNJII > wrote: >> >> Greetings, >> >> I have been working on modifying ex11.c by adding source terms to the existing model. >> >> In the process, I had introduced some memory corruption errors. The modifications are fairly large, hence it is difficult to single out the problem. >> >> In the FAQ page, I saw that we can use the flag -malloc_debug and/or CHKMEMQ statements. However, they don?t seem to help my case. >> >> At this point of time, Valgrind is not supported by MAC OS X as I have the newer version of the OS. Any suggestions on how to figure out the source of the problem? >> >> Regards, >> >> Mukkund > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gehammo at sandia.gov Wed May 6 11:26:14 2020 From: gehammo at sandia.gov (Hammond, Glenn E) Date: Wed, 6 May 2020 16:26:14 +0000 Subject: [petsc-users] Alternate mesh partitioners to METIS/ParMETIS Message-ID: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> PETSc Users, We have many PFLOTRAN users outside the US who cannot use METIS/ParMETIS for partitioning unstructured grids due to license restrictions. What alternate mesh partitioners have developers/users successfully employed with PETSc? Thank you, Glenn From jed at jedbrown.org Wed May 6 11:44:50 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 06 May 2020 10:44:50 -0600 Subject: [petsc-users] Alternate mesh partitioners to METIS/ParMETIS In-Reply-To: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> References: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> Message-ID: <87y2q59fnh.fsf@jedbrown.org> METIS is (since 5.0.3) Apache 2.0, and thus should be usable by anyone. The most direct substitute for ParMETIS is PTScotch, which is CeCILL-C (compatible with LGPL). "Hammond, Glenn E via petsc-users" writes: > PETSc Users, > > We have many PFLOTRAN users outside the US who cannot use METIS/ParMETIS for partitioning unstructured grids due to license restrictions. What alternate mesh partitioners have developers/users successfully employed with PETSc? > > Thank you, > > Glenn From tsc109 at arl.psu.edu Wed May 6 14:30:25 2020 From: tsc109 at arl.psu.edu (Thomas S. Chyczewski) Date: Wed, 6 May 2020 19:30:25 +0000 Subject: [petsc-users] Example code of linear solve of a block matrix system in Fortran Message-ID: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> All, I'm relatively new to PETSc and have relied pretty heavily on the example codes included in the distribution to figure out the finer points of using the PETSc library that I couldn't deduce from the manual. One thing I can't figure out is how to solve block matrix systems and I couldn't find an example in Fortran. I'm writing a 2D incompressible CFD solver so I have a 3x3 block Imax*Jmax system. The closest I've come to finding an example is ex19.c in the snes directory, but that is in c and for the nonlinear solver. I have been able to run PETSc but unwrapping the block matrix into a monolithic system. But the manual says "Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box." However, in the FAQs I saw a comment that PETSc scans the AIJ matrices for rows that have the same column layout and can deduce if it's a block system and use the more efficient solvers. 
I also saw in the archives for this email list a thread where it seems workaround for building fields in a Fortran code is discussed ("Back to struct in Fortran to represent field with dof > 1"), so I'm beginning to suspect building a block system in Fortran might not be straight forward. All that being said, my questions: Is there a significant advantage to building the block system as opposed to the analogous monolithic system if PETSc can figure out that it's a block system? Can you confirm that PETSc does figure this out? If there is an advantage to loading the matrix as a block matrix, is there an example Fortran code that builds and solves a linear block system? Thanks, Tom C -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed May 6 14:43:03 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 6 May 2020 14:43:03 -0500 (CDT) Subject: [petsc-users] Example code of linear solve of a block matrix system in Fortran In-Reply-To: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> References: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> Message-ID: What you are looking for is: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateBAIJ.html indoe is optimization for AIJ type - where you might not have a simple block size like 3 that you have. BAIJ will perform better than AIJ/with-inode To verify indoe usage - you can run the code with '-log_info' option - and do a 'grep inode' Satish On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > All, > > I'm relatively new to PETSc and have relied pretty heavily on the example codes included in the distribution to figure out the finer points of using the PETSc library that I couldn't deduce from the manual. One thing I can't figure out is how to solve block matrix systems and I couldn't find an example in Fortran. I'm writing a 2D incompressible CFD solver so I have a 3x3 block Imax*Jmax system. The closest I've come to finding an example is ex19.c in the snes directory, but that is in c and for the nonlinear solver. > > I have been able to run PETSc but unwrapping the block matrix into a monolithic system. But the manual says "Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box." However, in the FAQs I saw a comment that PETSc scans the AIJ matrices for rows that have the same column layout and can deduce if it's a block system and use the more efficient solvers. I also saw in the archives for this email list a thread where it seems workaround for building fields in a Fortran code is discussed ("Back to struct in Fortran to represent field with dof > 1"), so I'm beginning to suspect building a block system in Fortran might not be straight forward. > > All that being said, my questions: > > Is there a significant advantage to building the block system as opposed to the analogous monolithic system if PETSc can figure out that it's a block system? Can you confirm that PETSc does figure this out? > If there is an advantage to loading the matrix as a block matrix, is there an example Fortran code that builds and solves a linear block system? > > Thanks, > Tom C > > From tsc109 at arl.psu.edu Wed May 6 14:59:49 2020 From: tsc109 at arl.psu.edu (Thomas S. 
Chyczewski) Date: Wed, 6 May 2020 19:59:49 +0000 Subject: [petsc-users] [EXTERNAL] Re: Example code of linear solve of a block matrix system in Fortran In-Reply-To: References: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> Message-ID: Thanks Satish. I have seen that page and can create a block matrix. It's not clear to me how to fill it and use it in a Fortran code. -----Original Message----- From: Satish Balay Sent: Wednesday, May 6, 2020 3:43 PM To: Thomas S. Chyczewski Cc: petsc-users at mcs.anl.gov Subject: [EXTERNAL] Re: [petsc-users] Example code of linear solve of a block matrix system in Fortran What you are looking for is: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateBAIJ.html indoe is optimization for AIJ type - where you might not have a simple block size like 3 that you have. BAIJ will perform better than AIJ/with-inode To verify indoe usage - you can run the code with '-log_info' option - and do a 'grep inode' Satish On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > All, > > I'm relatively new to PETSc and have relied pretty heavily on the example codes included in the distribution to figure out the finer points of using the PETSc library that I couldn't deduce from the manual. One thing I can't figure out is how to solve block matrix systems and I couldn't find an example in Fortran. I'm writing a 2D incompressible CFD solver so I have a 3x3 block Imax*Jmax system. The closest I've come to finding an example is ex19.c in the snes directory, but that is in c and for the nonlinear solver. > > I have been able to run PETSc but unwrapping the block matrix into a monolithic system. But the manual says "Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box." However, in the FAQs I saw a comment that PETSc scans the AIJ matrices for rows that have the same column layout and can deduce if it's a block system and use the more efficient solvers. I also saw in the archives for this email list a thread where it seems workaround for building fields in a Fortran code is discussed ("Back to struct in Fortran to represent field with dof > 1"), so I'm beginning to suspect building a block system in Fortran might not be straight forward. > > All that being said, my questions: > > Is there a significant advantage to building the block system as opposed to the analogous monolithic system if PETSc can figure out that it's a block system? Can you confirm that PETSc does figure this out? > If there is an advantage to loading the matrix as a block matrix, is there an example Fortran code that builds and solves a linear block system? > > Thanks, > Tom C > > From balay at mcs.anl.gov Wed May 6 15:05:27 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 6 May 2020 15:05:27 -0500 (CDT) Subject: [petsc-users] [EXTERNAL] Re: Example code of linear solve of a block matrix system in Fortran In-Reply-To: References: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> Message-ID: You can use https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html MatSetValues() will also work - but MatSetValuesBlocked() is more efficient wrt BAIJ Satish On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > Thanks Satish. I have seen that page and can create a block matrix. It's not clear to me how to fill it and use it in a Fortran code. 
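(To make the fill-and-solve sequence concrete, here is a minimal self-contained sketch of the pattern Satish points to: a bs=3 BAIJ matrix assembled one 3x3 block at a time with MatSetValuesBlocked() and handed to KSP. C is shown only for brevity — every routine used below has a matching Fortran interface — and the block-tridiagonal sparsity and the numerical values are placeholders, not anything from this thread.)

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            x, b;
    KSP            ksp;
    PetscInt       bs = 3, nb = 100;            /* e.g. nb = Imax*Jmax cell blocks */
    PetscInt       i, j, Istart, Iend, rstart, rend;
    PetscScalar    diag[9], off[9];             /* one 3x3 block each, row-major */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* block size 3; preallocate 3 diagonal-portion and 2 off-diagonal-portion blocks per block row */
    ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
                         nb*bs, nb*bs, 3, NULL, 2, NULL, &A);CHKERRQ(ierr);

    for (j = 0; j < 9; j++) { diag[j] = (j % 4 == 0) ? 10.0 : 0.5; off[j] = -1.0; }

    ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
    rstart = Istart/bs; rend = Iend/bs;          /* locally owned block rows */
    for (i = rstart; i < rend; i++) {
      ierr = MatSetValuesBlocked(A, 1, &i, 1, &i, diag, INSERT_VALUES);CHKERRQ(ierr);
      if (i > 0)    { j = i-1; ierr = MatSetValuesBlocked(A, 1, &i, 1, &j, off, INSERT_VALUES);CHKERRQ(ierr); }
      if (i < nb-1) { j = i+1; ierr = MatSetValuesBlocked(A, 1, &i, 1, &j, off, INSERT_VALUES);CHKERRQ(ierr); }
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* e.g. -ksp_type gmres -pc_type bjacobi */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

The row and column indices passed to MatSetValuesBlocked() are block indices, and each value array is a full bs*bs dense block (row-oriented by default), which is what makes BAIJ assembly and MatMult cheaper than inserting the same entries one scalar at a time into AIJ.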
> > -----Original Message----- > From: Satish Balay > Sent: Wednesday, May 6, 2020 3:43 PM > To: Thomas S. Chyczewski > Cc: petsc-users at mcs.anl.gov > Subject: [EXTERNAL] Re: [petsc-users] Example code of linear solve of a block matrix system in Fortran > > What you are looking for is: > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateBAIJ.html > > indoe is optimization for AIJ type - where you might not have a simple block size like 3 that you have. > > BAIJ will perform better than AIJ/with-inode > > To verify indoe usage - you can run the code with '-log_info' option - and do a 'grep inode' > > Satish > > On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > > > > All, > > > > I'm relatively new to PETSc and have relied pretty heavily on the example codes included in the distribution to figure out the finer points of using the PETSc library that I couldn't deduce from the manual. One thing I can't figure out is how to solve block matrix systems and I couldn't find an example in Fortran. I'm writing a 2D incompressible CFD solver so I have a 3x3 block Imax*Jmax system. The closest I've come to finding an example is ex19.c in the snes directory, but that is in c and for the nonlinear solver. > > > > I have been able to run PETSc but unwrapping the block matrix into a monolithic system. But the manual says "Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box." However, in the FAQs I saw a comment that PETSc scans the AIJ matrices for rows that have the same column layout and can deduce if it's a block system and use the more efficient solvers. I also saw in the archives for this email list a thread where it seems workaround for building fields in a Fortran code is discussed ("Back to struct in Fortran to represent field with dof > 1"), so I'm beginning to suspect building a block system in Fortran might not be straight forward. > > > > All that being said, my questions: > > > > Is there a significant advantage to building the block system as opposed to the analogous monolithic system if PETSc can figure out that it's a block system? Can you confirm that PETSc does figure this out? > > If there is an advantage to loading the matrix as a block matrix, is there an example Fortran code that builds and solves a linear block system? > > > > Thanks, > > Tom C > > > > > From mbuerkle at web.de Thu May 7 03:14:45 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Thu, 7 May 2020 10:14:45 +0200 Subject: [petsc-users] mkl cpardiso iparm 31 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From gehammo at sandia.gov Thu May 7 09:53:44 2020 From: gehammo at sandia.gov (Hammond, Glenn E) Date: Thu, 7 May 2020 14:53:44 +0000 Subject: [petsc-users] [EXTERNAL] Re: Alternate mesh partitioners to METIS/ParMETIS In-Reply-To: <87y2q59fnh.fsf@jedbrown.org> References: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> <87y2q59fnh.fsf@jedbrown.org> Message-ID: <6114fa01c0894f0c8f7036b499d703ad@ES06AMSNLNT.srn.sandia.gov> Jed, We call MatMeshToCellGraph() to generate the dual matrix. This function relies upon ParMETIS. Can you point me to a similar function that does not require ParMETIS ( e.g. for PTScotch)? 
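(One closely related piece: the partitioning step itself does not need ParMETIS, since PTScotch is exposed through the same MatPartitioning interface as MATPARTITIONINGPTSCOTCH. The sketch below assumes the dual-graph adjacency matrix has already been built by some other means — constructing it with MatMeshToCellGraph() is precisely the ParMETIS-dependent step this thread is about — and the function name is made up for illustration.)

  #include <petscmat.h>

  /* Partition an already-built dual-graph (cell-connectivity) adjacency matrix
     with PTScotch; the resulting IS gives the target rank of each local cell.   */
  PetscErrorCode PartitionDualGraph(Mat adj, IS *cellRanks)
  {
    MatPartitioning part;
    PetscErrorCode  ierr;

    PetscFunctionBeginUser;
    ierr = MatPartitioningCreate(PetscObjectComm((PetscObject)adj), &part);CHKERRQ(ierr);
    ierr = MatPartitioningSetAdjacency(part, adj);CHKERRQ(ierr);
    ierr = MatPartitioningSetType(part, MATPARTITIONINGPTSCOTCH);CHKERRQ(ierr);
    ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);  /* or select with -mat_partitioning_type ptscotch */
    ierr = MatPartitioningApply(part, cellRanks);CHKERRQ(ierr);
    ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

So the remaining ParMETIS dependency in a MatMeshToCellGraph()-based code path is the dual-graph construction itself, not the partitioner.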
Thanks, Glenn > -----Original Message----- > From: Jed Brown > Sent: Wednesday, May 6, 2020 9:45 AM > To: Hammond, Glenn E ; petsc-users at mcs.anl.gov > Subject: [EXTERNAL] Re: [petsc-users] Alternate mesh partitioners to > METIS/ParMETIS > > METIS is (since 5.0.3) Apache 2.0, and thus should be usable by anyone. > > The most direct substitute for ParMETIS is PTScotch, which is CeCILL-C > (compatible with LGPL). > > "Hammond, Glenn E via petsc-users" writes: > > > PETSc Users, > > > > We have many PFLOTRAN users outside the US who cannot use > METIS/ParMETIS for partitioning unstructured grids due to license > restrictions. What alternate mesh partitioners have developers/users > successfully employed with PETSc? > > > > Thank you, > > > > Glenn From dalcinl at gmail.com Thu May 7 10:14:45 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 7 May 2020 18:14:45 +0300 Subject: [petsc-users] [EXTERNAL] Re: Alternate mesh partitioners to METIS/ParMETIS In-Reply-To: <6114fa01c0894f0c8f7036b499d703ad@ES06AMSNLNT.srn.sandia.gov> References: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> <87y2q59fnh.fsf@jedbrown.org> <6114fa01c0894f0c8f7036b499d703ad@ES06AMSNLNT.srn.sandia.gov> Message-ID: On Thu, 7 May 2020 at 17:53, Hammond, Glenn E via petsc-users < petsc-users at mcs.anl.gov> wrote: > Jed, > > We call MatMeshToCellGraph() to generate the dual matrix. This function > relies upon ParMETIS. Can you point me to a similar function that does not > require ParMETIS ( e.g. for PTScotch)? > > Maybe a starting point: DMPlexCreateFromCellList(comm, ..., &dm); PetscSectionCreate(comm, &cellPartSection); DMPlexGetPartitioner(dm, &partitioner); PetscPartitionerSetType(partitioner, PETSCPARTITIONERPTSCOTCH); PetscPartitionerDMPlexPartition(partitioner, dm, NULL, cellPartSection, &cellPart); One minor annoyance is that the first call will need the vertex coordinates. Matthew, any better way? -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Thu May 7 11:42:46 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 7 May 2020 11:42:46 -0500 Subject: [petsc-users] Reuse hypre interpolations between successive KSPOnly solves associated with TSStep ? In-Reply-To: References: Message-ID: Hi Mark, As Victor explained on the Hypre mailing list, setting ksp_reuse_preconditioner flag doesn't have the intended effect because SNES still recomputes the preconditioner at each time step. Setting the flag for -snes_lag_preconditioner to -1 prevents BoomerAMG from recomputing the interpolations at each time step. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu May 7 12:20:47 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 7 May 2020 12:20:47 -0500 Subject: [petsc-users] mkl cpardiso iparm 31 In-Reply-To: References: Message-ID: Marius, You are right. perm is not referenced. I searched and found this, https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/sparse-solver-routines/parallel-direct-sparse-solver-for-clusters-interface/cluster-sparse-solver.html . It says "perm Ignored". 
But from other parts of the document, it seems perm is used. I'm puzzled whether Intel MKL pardiso supports this feature or not. I am thinking about adding MatMkl_CPardisoSetPerm(Mat A, IS perm) or MatMkl_CPardisoSetPerm(Mat A, const PetscInt *perm). But I don't know whether perm(N) is distributed or every mpi rank has the same perm(N). Do you know good Intel MKL pardiso documentation or examples for me to reference? Thank you. --Junchao Zhang On Thu, May 7, 2020 at 3:14 AM Marius Buerkle wrote: > Hi, > > Thanks for the info. But how do I set the values to be calculated. > According to the intel parallel sparse cluster solver manual the entries > have to be defined in the permutation vector (before each call). However, > if I understand what is happening in mkl_cpardiso.c. correctly, perm is > set to 0 during the initialization phase and then not referenced anymore. > Is this correct? How can I specify the necessary entries in perm? > > Best, > Marius > > > > > On Fri, May 1, 2020 at 3:33 AM Marius Buerkle wrote: > >> Hi, >> >> Is the option "-mat_mkl_cpardiso_31" to calculate Partial solve and >> computing selected components of the solution vectors actually supported >> by PETSC? >> > From the code, it seems so. > >> >> Marius >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 7 12:38:26 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 7 May 2020 12:38:26 -0500 (CDT) Subject: [petsc-users] [EXTERNAL] Re: Example code of linear solve of a block matrix system in Fortran In-Reply-To: References: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> Message-ID: Its best to keep the discussion on the list [or petsc-maint] - as others would have answers to some of these qestions. Wrt -info - yeah this is the correct option [my mistake in suggesting -log_info] Wrt understanding performance -log_view should help Satish On Thu, 7 May 2020, Thomas S. Chyczewski wrote: > Satish, > > Thanks for your help. Sorry for not being able to figure it out on my own. I had a little trouble following the discussion in the manual. I got the block version working now and the linear solver time is cut in half compared to the monolithic version for a given set of parameters. I have experimented with a number of solver and preconditioner options as well as solver convergence criteria. But I'm wondering if there are any other parameters I should be playing with. I ask because I also had some trouble following the PCFIELDSPLIT discussion in the manual and I'm wondering if the default is the best option for me. > > The -log_info option isn't available in the Fortran version, so I couldn't check the inode information as you suggested. However, below is the output when I run with the -info option. I know that having no mallocs during MatSetValues is good, but that's about it. 
> > Thanks, > Tom > > [0] PetscGetHostName(): Rejecting domainname, likely is NIS arl19814.(none) > [0] petscinitialize_internal(): (Fortran):PETSc successfully started: procs 1 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS arl19814.(none) > [0] petscinitialize_internal(): Running on machine: arl19814 > [0] PetscCommDuplicate(): Duplicating a communicator 2 2 max tags = 100000000 > [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 374400 X 374400, block size 3; storage space: 11199240 unneeded, 5606640 used > [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0 > [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 5 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 124800) < 0.6. Do not use CompressedRow routines. > [0] PCSetUp(): Setting up PC for first time > [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = 100000000 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > > > -----Original Message----- > From: Satish Balay > Sent: Wednesday, May 6, 2020 4:05 PM > To: Thomas S. Chyczewski > Cc: petsc-users > Subject: Re: [petsc-users] [EXTERNAL] Re: Example code of linear solve of a block matrix system in Fortran > > You can use https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html > > MatSetValues() will also work - but MatSetValuesBlocked() is more efficient wrt BAIJ > > Satish > > On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > > Thanks Satish. I have seen that page and can create a block matrix. It's not clear to me how to fill it and use it in a Fortran code. > > > > -----Original Message----- > > From: Satish Balay > > Sent: Wednesday, May 6, 2020 3:43 PM > > To: Thomas S. Chyczewski > > Cc: petsc-users at mcs.anl.gov > > Subject: [EXTERNAL] Re: [petsc-users] Example code of linear solve of a block matrix system in Fortran > > > > What you are looking for is: > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateBAIJ.html > > > > indoe is optimization for AIJ type - where you might not have a simple block size like 3 that you have. > > > > BAIJ will perform better than AIJ/with-inode > > > > To verify indoe usage - you can run the code with '-log_info' option - and do a 'grep inode' > > > > Satish > > > > On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > > > > > > > All, > > > > > > I'm relatively new to PETSc and have relied pretty heavily on the example codes included in the distribution to figure out the finer points of using the PETSc library that I couldn't deduce from the manual. One thing I can't figure out is how to solve block matrix systems and I couldn't find an example in Fortran. 
I'm writing a 2D incompressible CFD solver so I have a 3x3 block Imax*Jmax system. The closest I've come to finding an example is ex19.c in the snes directory, but that is in c and for the nonlinear solver. > > > > > > I have been able to run PETSc but unwrapping the block matrix into a monolithic system. But the manual says "Block matrices represent an important class of problems in numerical linear algebra and offer the possibility of far more efficient iterative solvers than just treating the entire matrix as black box." However, in the FAQs I saw a comment that PETSc scans the AIJ matrices for rows that have the same column layout and can deduce if it's a block system and use the more efficient solvers. I also saw in the archives for this email list a thread where it seems workaround for building fields in a Fortran code is discussed ("Back to struct in Fortran to represent field with dof > 1"), so I'm beginning to suspect building a block system in Fortran might not be straight forward. > > > > > > All that being said, my questions: > > > > > > Is there a significant advantage to building the block system as opposed to the analogous monolithic system if PETSc can figure out that it's a block system? Can you confirm that PETSc does figure this out? > > > If there is an advantage to loading the matrix as a block matrix, is there an example Fortran code that builds and solves a linear block system? > > > > > > Thanks, > > > Tom C > > > > > > > > > From knepley at gmail.com Thu May 7 12:46:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 May 2020 13:46:02 -0400 Subject: [petsc-users] [EXTERNAL] Re: Alternate mesh partitioners to METIS/ParMETIS In-Reply-To: References: <76e971e81ecf4fc28e5b4000cc194062@ES06AMSNLNT.srn.sandia.gov> <87y2q59fnh.fsf@jedbrown.org> <6114fa01c0894f0c8f7036b499d703ad@ES06AMSNLNT.srn.sandia.gov> Message-ID: On Thu, May 7, 2020 at 11:15 AM Lisandro Dalcin wrote: > On Thu, 7 May 2020 at 17:53, Hammond, Glenn E via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Jed, >> >> We call MatMeshToCellGraph() to generate the dual matrix. This function >> relies upon ParMETIS. Can you point me to a similar function that does not >> require ParMETIS ( e.g. for PTScotch)? >> >> > Maybe a starting point: > > DMPlexCreateFromCellList(comm, ..., &dm); > PetscSectionCreate(comm, &cellPartSection); > DMPlexGetPartitioner(dm, &partitioner); > PetscPartitionerSetType(partitioner, PETSCPARTITIONERPTSCOTCH); > PetscPartitionerDMPlexPartition(partitioner, dm, NULL, cellPartSection, > &cellPart); > > One minor annoyance is that the first call will need the vertex > coordinates. Matthew, any better way? > I can just fix it to allow NULL for coordinates. Matt > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu May 7 13:04:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 May 2020 14:04:02 -0400 Subject: [petsc-users] [EXTERNAL] Re: Example code of linear solve of a block matrix system in Fortran In-Reply-To: References: <91e4f8d137f444edb4542ce2cf09c97d@arl.psu.edu> Message-ID: On Thu, May 7, 2020 at 1:38 PM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > Its best to keep the discussion on the list [or petsc-maint] - as others > would have answers to some of these qestions. > > Wrt -info - yeah this is the correct option [my mistake in suggesting > -log_info] > > Wrt understanding performance -log_view should help > Block preconditioners tend to come down to having a good preconditioner for the Schur complement. There is a voluminous literature here. For example, https://www.sciencedirect.com/science/article/pii/S0021999107004330 Thanks, Matt > Satish > > > On Thu, 7 May 2020, Thomas S. Chyczewski wrote: > > > Satish, > > > > Thanks for your help. Sorry for not being able to figure it out on my > own. I had a little trouble following the discussion in the manual. I got > the block version working now and the linear solver time is cut in half > compared to the monolithic version for a given set of parameters. I have > experimented with a number of solver and preconditioner options as well as > solver convergence criteria. But I'm wondering if there are any other > parameters I should be playing with. I ask because I also had some trouble > following the PCFIELDSPLIT discussion in the manual and I'm wondering if > the default is the best option for me. > > > > The -log_info option isn't available in the Fortran version, so I > couldn't check the inode information as you suggested. However, below is > the output when I run with the -info option. I know that having no mallocs > during MatSetValues is good, but that's about it. > > > > Thanks, > > Tom > > > > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > arl19814.(none) > > [0] petscinitialize_internal(): (Fortran):PETSc successfully started: > procs 1 > > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > arl19814.(none) > > [0] petscinitialize_internal(): Running on machine: arl19814 > > [0] PetscCommDuplicate(): Duplicating a communicator 2 2 max tags = > 100000000 > > [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 374400 X 374400, block size > 3; storage space: 11199240 unneeded, 5606640 used > > [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0 > > [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 5 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 124800) < 0.6. Do not use CompressedRow routines. 
> > [0] PCSetUp(): Setting up PC for first time > > [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = > 100000000 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged > > > > > > > > -----Original Message----- > > From: Satish Balay > > Sent: Wednesday, May 6, 2020 4:05 PM > > To: Thomas S. Chyczewski > > Cc: petsc-users > > Subject: Re: [petsc-users] [EXTERNAL] Re: Example code of linear solve > of a block matrix system in Fortran > > > > You can use > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html > > > > MatSetValues() will also work - but MatSetValuesBlocked() is more > efficient wrt BAIJ > > > > Satish > > > > On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > > > > Thanks Satish. I have seen that page and can create a block matrix. > It's not clear to me how to fill it and use it in a Fortran code. > > > > > > -----Original Message----- > > > From: Satish Balay > > > Sent: Wednesday, May 6, 2020 3:43 PM > > > To: Thomas S. Chyczewski > > > Cc: petsc-users at mcs.anl.gov > > > Subject: [EXTERNAL] Re: [petsc-users] Example code of linear solve of > a block matrix system in Fortran > > > > > > What you are looking for is: > > > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateBAIJ.html > > > > > > indoe is optimization for AIJ type - where you might not have a simple > block size like 3 that you have. > > > > > > BAIJ will perform better than AIJ/with-inode > > > > > > To verify indoe usage - you can run the code with '-log_info' option - > and do a 'grep inode' > > > > > > Satish > > > > > > On Wed, 6 May 2020, Thomas S. Chyczewski wrote: > > > > > > > > > > > All, > > > > > > > > I'm relatively new to PETSc and have relied pretty heavily on the > example codes included in the distribution to figure out the finer points > of using the PETSc library that I couldn't deduce from the manual. One > thing I can't figure out is how to solve block matrix systems and I > couldn't find an example in Fortran. I'm writing a 2D incompressible CFD > solver so I have a 3x3 block Imax*Jmax system. The closest I've come to > finding an example is ex19.c in the snes directory, but that is in c and > for the nonlinear solver. > > > > > > > > I have been able to run PETSc but unwrapping the block matrix into a > monolithic system. But the manual says "Block matrices represent an > important class of problems in numerical linear algebra and offer the > possibility of far more efficient iterative solvers than just treating the > entire matrix as black box." 
However, in the FAQs I saw a comment that > PETSc scans the AIJ matrices for rows that have the same column layout and > can deduce if it's a block system and use the more efficient solvers. I > also saw in the archives for this email list a thread where it seems > workaround for building fields in a Fortran code is discussed ("Back to > struct in Fortran to represent field with dof > 1"), so I'm beginning to > suspect building a block system in Fortran might not be straight forward. > > > > > > > > All that being said, my questions: > > > > > > > > Is there a significant advantage to building the block system as > opposed to the analogous monolithic system if PETSc can figure out that > it's a block system? Can you confirm that PETSc does figure this out? > > > > If there is an advantage to loading the matrix as a block matrix, is > there an example Fortran code that builds and solves a linear block system? > > > > > > > > Thanks, > > > > Tom C > > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Thu May 7 15:36:19 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 7 May 2020 15:36:19 -0500 Subject: [petsc-users] VecGetSubVector Message-ID: Hi, I hope everyone is well. I have a MPI vector, I selected some elements of this vector and for some processes there is not selected elements (I stored the indices elements into an array). So, I wan to to create a MPI vector using these selected elements, the selected elements must be fully distributed in a MPI vector, but I do not figure out how to do it properly. If I use VecGetSubVector, it creates a MPI vector, but in those processes where there is not selected elements, the subvector portion is empty, this means that the resultant vector is unbalanced. If I use: VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx); VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); The vector is still unbalanced. Any suggestion? Kind regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 7 15:44:52 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 May 2020 16:44:52 -0400 Subject: [petsc-users] VecGetSubVector In-Reply-To: References: Message-ID: On Thu, May 7, 2020 at 4:37 PM Emmanuel Ayala wrote: > Hi, I hope everyone is well. > > I have a MPI vector, I selected some elements of this vector and for some > processes there is not selected elements (I stored the indices elements > into an array). So, I wan to to create a MPI vector using these selected > elements, the selected elements must be fully distributed in a MPI vector, > but I do not figure out how to do it properly. > > If I use VecGetSubVector, it creates a MPI vector, but in those processes > where there is not selected elements, the subvector portion is empty, this > means that the resultant vector is unbalanced. > Yes, you cannot use SubVector. > If I use: > > VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx); > VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); > VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); > > The vector is still unbalanced. 
> You have not sent the elements to the places that you really wanted. VecScatter moves the elements that you ask for to the place you tell it. If you want them to be balanced, you must send them to balanced locations using iy. Thanks, Matt > Any suggestion? > > Kind regards. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu May 7 21:56:53 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 7 May 2020 21:56:53 -0500 Subject: [petsc-users] VecGetSubVector In-Reply-To: References: Message-ID: You have to compute the length of the new vector (i.e., total number of selected elements), and then create a VecMPI with this length. PETSc will compute a balanced layout for the vector. Then you create a vecscatter that scatters selected elements from the old vector to the new vector. You may need MPI_Exscan to build correct index mapping. --Junchao Zhang On Thu, May 7, 2020 at 3:37 PM Emmanuel Ayala wrote: > Hi, I hope everyone is well. > > I have a MPI vector, I selected some elements of this vector and for some > processes there is not selected elements (I stored the indices elements > into an array). So, I wan to to create a MPI vector using these selected > elements, the selected elements must be fully distributed in a MPI vector, > but I do not figure out how to do it properly. > > If I use VecGetSubVector, it creates a MPI vector, but in those processes > where there is not selected elements, the subvector portion is empty, this > means that the resultant vector is unbalanced. > > If I use: > > VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx); > VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); > VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); > > The vector is still unbalanced. > > Any suggestion? > > Kind regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Thu May 7 22:32:44 2020 From: juaneah at gmail.com (Emmanuel Ayala) Date: Thu, 7 May 2020 22:32:44 -0500 Subject: [petsc-users] VecGetSubVector In-Reply-To: References: Message-ID: Thanks Matt and Zhang for the answers. That is right, VecScatter will be the solution to my problem. And MPI_Exscan is the function that I was looking for. The only thing that I though, very inefficient, was get the subvector using the IS that I need, and the scatter it with VecScatterCreateToAll. In this way I got all the elements in sequential form and then apply scatter operations based on the ranges of the final Vector ( created to contain the selected elements), it works. For suer I will try MPI_Exscan. Kind regards! El jue., 7 de may. de 2020 a la(s) 21:57, Junchao Zhang ( junchao.zhang at gmail.com) escribi?: > You have to compute the length of the new vector (i.e., total number of > selected elements), and then create a VecMPI with this length. PETSc will > compute a balanced layout for the vector. Then you create a vecscatter that > scatters selected elements from the old vector to the new vector. You may > need MPI_Exscan to build correct index mapping. > > --Junchao Zhang > > > On Thu, May 7, 2020 at 3:37 PM Emmanuel Ayala wrote: > >> Hi, I hope everyone is well. 
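A minimal sketch in C of the recipe Matt and Junchao describe above, under the assumption that x is the original vector, ix holds this rank's selected global indices, and a stride IS is used for the destination (these names and the stride choice are illustrative, not code from this thread):

  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       nsel, offset = 0, Ntot = 0;
  Vec            y;
  IS             iy;
  VecScatter     sctx;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = ISGetLocalSize(ix, &nsel);CHKERRQ(ierr);                  /* selected entries owned here (may be 0) */
  ierr = MPI_Exscan(&nsel, &offset, 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);CHKERRQ(ierr);
  if (!rank) offset = 0;                                           /* MPI_Exscan leaves rank 0's result undefined */
  ierr = MPI_Allreduce(&nsel, &Ntot, 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);CHKERRQ(ierr);

  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, Ntot, &y);CHKERRQ(ierr);   /* PETSc picks a balanced layout */
  ierr = ISCreateStride(PETSC_COMM_WORLD, nsel, offset, 1, &iy);CHKERRQ(ierr);   /* destinations offset..offset+nsel-1 */

  ierr = VecScatterCreate(x, ix, y, iy, &sctx);CHKERRQ(ierr);
  ierr = VecScatterBegin(sctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(sctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&sctx);CHKERRQ(ierr);
  ierr = ISDestroy(&iy);CHKERRQ(ierr);

Because y's local sizes are left to PETSC_DECIDE, ranks that own no selected entries still receive a balanced share of the result.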
>> >> I have a MPI vector, I selected some elements of this vector and for some >> processes there is not selected elements (I stored the indices elements >> into an array). So, I wan to to create a MPI vector using these selected >> elements, the selected elements must be fully distributed in a MPI vector, >> but I do not figure out how to do it properly. >> >> If I use VecGetSubVector, it creates a MPI vector, but in those processes >> where there is not selected elements, the subvector portion is empty, this >> means that the resultant vector is unbalanced. >> >> If I use: >> >> VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx); >> VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); >> VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT VALUES,SCATTER FORWARD); >> >> The vector is still unbalanced. >> >> Any suggestion? >> >> Kind regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Thu May 7 23:56:33 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Fri, 8 May 2020 06:56:33 +0200 Subject: [petsc-users] mkl cpardiso iparm 31 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri May 8 10:44:12 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 8 May 2020 10:44:12 -0500 Subject: [petsc-users] mkl cpardiso iparm 31 In-Reply-To: References: Message-ID: Marius, Thanks for the update. Once you get feedback from Intel, please let us know. If Intel supports it, we can add it. I am new to pardiso, but I think it is doable. --Junchao Zhang On Thu, May 7, 2020 at 11:56 PM Marius Buerkle wrote: > Hi Junchao, > > I contacted intel support regarding this, they told me that this is a typo > in the manual and that iparm[30] is indeed used. However, while it works > for the non-MPI (MKL_PARDISO) version, it does not work, or I could not get > it working, for the cluster sparse solver (MKL_CPARDISO). I reported this > also to intel support but no reply yet. I was also wondering about how > perm(N) is distributed and I don't know at the moment. > > Best, > Marius > > > > Marius, > You are right. perm is not referenced. I searched and found this, > > https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/sparse-solver-routines/parallel-direct-sparse-solver-for-clusters-interface/cluster-sparse-solver.html > . > It says "perm Ignored". But from other parts of the document, it seems > perm is used. I'm puzzled whether Intel MKL pardiso supports this feature > or not. > > I am thinking about adding MatMkl_CPardisoSetPerm(Mat A, IS perm) or > MatMkl_CPardisoSetPerm(Mat A, const PetscInt *perm). But I don't know > whether perm(N) is distributed or every mpi rank has the same perm(N). > Do you know good Intel MKL pardiso documentation or examples for me to > reference? > > Thank you. > --Junchao Zhang > > On Thu, May 7, 2020 at 3:14 AM Marius Buerkle wrote: > >> Hi, >> >> Thanks for the info. But how do I set the values to be calculated. >> According to the intel parallel sparse cluster solver manual the entries >> have to be defined in the permutation vector (before each call). However, >> if I understand what is happening in mkl_cpardiso.c. correctly, perm is >> set to 0 during the initialization phase and then not referenced anymore. >> Is this correct? How can I specify the necessary entries in perm? 
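For reference, a hedged sketch of how the iparm(31) flag itself can be switched on from PETSc; this does not answer the open question above about supplying the perm entries, which the MKL_CPARDISO interface does not appear to expose, and MatMkl_CPardisoSetCntl is assumed to be available in this PETSc version:

  /* From the options database (the route discussed in this thread):
       -pc_type lu -pc_factor_mat_solver_type mkl_cpardiso -mat_mkl_cpardiso_31 1 */

  /* Or programmatically, on the factored matrix obtained from the preconditioner: */
  Mat F;
  ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
  ierr = MatMkl_CPardisoSetCntl(F, 31, 1);CHKERRQ(ierr);   /* icntl 31 maps to iparm[30] */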
>> >> Best, >> Marius >> >> >> >> >> On Fri, May 1, 2020 at 3:33 AM Marius Buerkle wrote: >> >>> Hi, >>> >>> Is the option "-mat_mkl_cpardiso_31" to calculate Partial solve and >>> computing selected components of the solution vectors actually >>> supported by PETSC? >>> >> From the code, it seems so. >> >>> >>> Marius >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashour.msc at gmail.com Sat May 9 14:30:22 2020 From: ashour.msc at gmail.com (Mohammed Ashour) Date: Sat, 9 May 2020 21:30:22 +0200 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source Message-ID: Dear All, I'm using PETSc in conjunction wit PetIGA to solve a 3D phase-field problem on HPC cluster. Given the computational load of the code, I'm running it on 15 nodes with a multithreaded job. Now there is a wall time for the current partition set to 24 hours, afterward, the job will be killed. I have been searching for the possibility of using the Trajectory to resume the run after it being terminated by reloading the dumped binary files from TSSetSaveTrajectory once again into the TS, i.e., state vector and it's time derivative, timestep and time. I have tried using TSTrajectoryGet but I ended up with the following error: [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: TS solver did not save trajectory So, I would like to ask if in theory that would be possible? if so, how can I reload the trajectory from a previously terminated job into a new one? Yours Sincerely. -- *Mohammed Ashour, M.Sc.*PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sat May 9 14:40:07 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sat, 9 May 2020 22:40:07 +0300 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source In-Reply-To: References: Message-ID: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> You do not want to use TSTrajectory for that. The trajectory object is meant to be used for sensitivity analysis to store _all_ the intermediate time steps and the perform backward integration. You should code a TSMonitor that dumps every n time steps and then use the last one to restart the simulation. > On May 9, 2020, at 10:30 PM, Mohammed Ashour wrote: > > Dear All, > I'm using PETSc in conjunction wit PetIGA to solve a 3D phase-field problem on HPC cluster. > Given the computational load of the code, I'm running it on 15 nodes with a multithreaded job. Now there is a wall time for the current partition set to 24 hours, afterward, the job will be killed. > > I have been searching for the possibility of using the Trajectory to resume the run after it being terminated by reloading the dumped binary files from TSSetSaveTrajectory once again into the TS, i.e., state vector and it's time derivative, timestep and time. > > I have tried using TSTrajectoryGet but I ended up with the following error: > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: TS solver did not save trajectory > > So, I would like to ask if in theory that would be possible? if so, how can I reload the trajectory from a previously terminated job into a new one? > > Yours Sincerely. > > -- > Mohammed Ashour, M.Sc. 
> PhD Scholar > Bauhaus-Universit?t Weimar > Institute of Structural Mechanics (ISM) > Marienstra?e 7 > 99423 Weimar, Germany > Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashour.msc at gmail.com Sat May 9 14:47:27 2020 From: ashour.msc at gmail.com (Mohammed Ashour) Date: Sat, 9 May 2020 21:47:27 +0200 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source In-Reply-To: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> References: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> Message-ID: Oh, thanks for the tip. I can do that. Aside from the solution vector, step number, and time, are there any hidden details in TS I need to dump that would be essential for the next run? Thanks in advance Yours Sincerely On Sat, May 9, 2020 at 9:40 PM Stefano Zampini wrote: > You do not want to use TSTrajectory for that. The trajectory object is > meant to be used for sensitivity analysis to store _all_ the intermediate > time steps and the perform backward integration. > > You should code a TSMonitor that dumps every n time steps and then use the > last one to restart the simulation. > > On May 9, 2020, at 10:30 PM, Mohammed Ashour wrote: > > Dear All, > I'm using PETSc in conjunction wit PetIGA to solve a 3D phase-field > problem on HPC cluster. > Given the computational load of the code, I'm running it on 15 nodes with > a multithreaded job. Now there is a wall time for the current partition set > to 24 hours, afterward, the job will be killed. > > I have been searching for the possibility of using the Trajectory to > resume the run after it being terminated by reloading the dumped binary > files from TSSetSaveTrajectory once again into the TS, i.e., state vector > and it's time derivative, timestep and time. > > I have tried using TSTrajectoryGet but I ended up with the following error: > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: TS solver did not save trajectory > > So, I would like to ask if in theory that would be possible? if so, how > can I reload the trajectory from a previously terminated job into a new one? > > Yours Sincerely. > > -- > > *Mohammed Ashour, M.Sc.*PhD Scholar > Bauhaus-Universit?t Weimar > Institute of Structural Mechanics (ISM) > Marienstra?e 7 > 99423 Weimar, Germany > Mobile: +(49) 176 58834667 > > > -- *Mohammed Ashour, M.Sc.*PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sat May 9 15:17:18 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 9 May 2020 23:17:18 +0300 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source In-Reply-To: References: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> Message-ID: On Sat, 9 May 2020 at 22:48, Mohammed Ashour wrote: > Oh, thanks for the tip. > I can do that. Aside from the solution vector, step number, and time, are > there any hidden details in TS I need to dump that would be essential for > the next run? > > This really depends on the timestepper. What TS type are you using? -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... 
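A minimal sketch of the TSMonitor-based checkpointing Stefano suggests above; the interval field, the file naming, and the restart calls are illustrative assumptions, and saving any integrator-internal state (as discussed below for TSALPHA1) is not covered here:

  typedef struct { PetscInt interval; } CheckpointCtx;

  static PetscErrorCode CheckpointMonitor(TS ts, PetscInt step, PetscReal time, Vec u, void *mctx)
  {
    CheckpointCtx *chk = (CheckpointCtx*)mctx;
    PetscViewer    viewer;
    char           fname[PETSC_MAX_PATH_LEN];
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    if (step % chk->interval) PetscFunctionReturn(0);       /* dump every 'interval' accepted steps */
    ierr = PetscSNPrintf(fname, sizeof(fname), "checkpoint-%D.bin", step);CHKERRQ(ierr);
    ierr = PetscViewerBinaryOpen(PetscObjectComm((PetscObject)ts), fname, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = VecView(u, viewer);CHKERRQ(ierr);                /* state vector; 'time' should be recorded too, e.g. in the name */
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Registration before TSSolve() would look like TSMonitorSet(ts, CheckpointMonitor, &chk, NULL). On restart, one would VecLoad() the last dump into the solution vector and reset the integrator with TSSetTime() and TSSetStepNumber() before calling TSSolve() again.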
URL: From ashour.msc at gmail.com Sat May 9 15:44:27 2020 From: ashour.msc at gmail.com (Mohammed Ashour) Date: Sat, 9 May 2020 22:44:27 +0200 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source In-Reply-To: References: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> Message-ID: I'm using TSALPHA1, as in ierr = TSSetType(ts,TSALPHA1); CHKERRQ(ierr); Best Regards. On Sat, May 9, 2020 at 10:17 PM Lisandro Dalcin wrote: > > > On Sat, 9 May 2020 at 22:48, Mohammed Ashour wrote: > >> Oh, thanks for the tip. >> I can do that. Aside from the solution vector, step number, and time, are >> there any hidden details in TS I need to dump that would be essential for >> the next run? >> >> > This really depends on the timestepper. What TS type are you using? > > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ > -- *Mohammed Ashour, M.Sc.*PhD Scholar Bauhaus-Universit?t Weimar Institute of Structural Mechanics (ISM) Marienstra?e 7 99423 Weimar, Germany Mobile: +(49) 176 58834667 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Sun May 10 09:56:53 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sun, 10 May 2020 17:56:53 +0300 Subject: [petsc-users] Resuming analysis by importing Trajectory Data from external source In-Reply-To: References: <564C8CC3-8F1E-4D5C-8C20-E7759276711E@gmail.com> Message-ID: On Sat, 9 May 2020 at 23:44, Mohammed Ashour wrote: > I'm using TSALPHA1, as in ierr = TSSetType(ts,TSALPHA1); CHKERRQ(ierr); > > Well, TSALPHA1 uses some vectors keeping intenal state, as you know, generalized-\alpha is not really a one-step method. That intenal state should be saved and loaded if you want to implement a truly correct checkpoint/restart. But PETSc does not currently provide access to this vector in its public API, so you will have to hack things around. IIRC, just saving and restoring the internal V1 vector is all what you need, maybe also set ts->steprestart to FALSE and properly set the ts->steps counter. Or you can just ignore these detail, and restart the easy way from just the saved solution, at the price of a very small "glitch" at the restart time. But note that this glitch just means that you temporarily change the scheme to reinitialize, and it is exactly the same thing the implementation does at the initial time to initialize the method (as you surely know, the method is also not trully self-starting). The restarting procesure as implemented in TSALPHA1 is not in the literature, it is of my own cooking but based on rather trivial relations performing two steps with dt/2 time step size, the fist of those inner steps using backward-Euler. -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shapero at uw.edu Sun May 10 15:47:15 2020 From: shapero at uw.edu (Daniel R. Shapero) Date: Sun, 10 May 2020 13:47:15 -0700 Subject: [petsc-users] HDF5 viewer + groups Message-ID: I'm trying to improve the checkpointing functionality in the Firedrake library, which currently uses the HDF5 viewer to store PetscVec objects. The problem with this approach is that a lot of context about the stored vector (e.g. 
the mesh and finite element space that it came from) is lost and we'd like to save that too. In order to make a sensible hierarchy inside the file, I'd like to be able to write a DMPlex to a group, say `/meshes/`, within the file rather than at the root `/`. I tried using the `PETScViewerHDF5PushGroup` function; I thought that this will change the current group of the file to whatever name you give and all subsequent dataset writes will go under that group until you call the matching pop. This doesn't seem to be the case and all of the mesh data gets written under `/`. From reading the source code, it looks like the HDF5 writer pushes the group `/topology` to write out the DMPlex cells (and likewise for coordinates etc) which then clobbers whatever I pushed before it instead of concatenating. I've attached a minimal example using petsc4py to demonstrate. I get the same results when I use some extra functionality in Firedrake to ensure that the `/meshes` group is created in the first place, which I can then verify with h5ls. Is there a way to do what I want and if so how? If there isn't, is that because no one has needed this before or is there a fundamental reason why you shouldn't be doing this in the first place? Thanks! Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: plex_to_hdf_test.py Type: text/x-python Size: 387 bytes Desc: not available URL: From knepley at gmail.com Mon May 11 05:23:17 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 11 May 2020 06:23:17 -0400 Subject: [petsc-users] HDF5 viewer + groups In-Reply-To: References: Message-ID: On Sun, May 10, 2020 at 10:03 PM Daniel R. Shapero wrote: > I'm trying to improve the checkpointing functionality in the Firedrake > library, which currently uses the HDF5 viewer to store PetscVec objects. > The problem with this approach is that a lot of context about the stored > vector (e.g. the mesh and finite element space that it came from) is lost > and we'd like to save that too. > > In order to make a sensible hierarchy inside the file, I'd like to be able > to write a DMPlex to a group, say `/meshes/`, within the file > rather than at the root `/`. I tried using the `PETScViewerHDF5PushGroup` > function; I thought that this will change the current group of the file to > whatever name you give and all subsequent dataset writes will go under that > group until you call the matching pop. This doesn't seem to be the case and > all of the mesh data gets written under `/`. From reading the source code, > it looks like the HDF5 writer pushes the group `/topology` to write out the > DMPlex cells (and likewise for coordinates etc) which then clobbers > whatever I pushed before it instead of concatenating. > > I've attached a minimal example using petsc4py to demonstrate. I get the > same results when I use some extra functionality in Firedrake to ensure > that the `/meshes` group is created in the first place, which I can then > verify with h5ls. > > Is there a way to do what I want and if so how? If there isn't, is that > because no one has needed this before or is there a fundamental reason why > you shouldn't be doing this in the first place? > The reason is the impedence mismatch with visualization formats. VTK/ExodusII/etc are all based on a single mesh and multiple vectors in a file, so I copied that design. It would be possible to namespace everything, and fix the XDMF generator to understand that. 
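For reference, a minimal C sketch of what is being attempted here (the file and group names are arbitrary; this mirrors the attached petsc4py test rather than reproducing it). With the writer as it stands, the plex datasets still land under absolute groups such as /topology, which is exactly the behaviour under discussion:

  PetscViewer viewer;
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "checkpoint.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = PetscViewerHDF5PushGroup(viewer, "/meshes/mesh0");CHKERRQ(ierr);  /* intended namespace */
  ierr = DMView(dm, viewer);CHKERRQ(ierr);                                 /* DMPlex mesh data is written here */
  ierr = PetscViewerHDF5PopGroup(viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

Inspecting the file afterwards with h5ls, as Daniel does, shows whether the datasets ended up under /meshes/mesh0 or at the file root.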
Vaclav, should we roll this into the HDF5 parallel reading/writing update? Thanks, Matt > Thanks! > Daniel > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shapero at uw.edu Mon May 11 12:28:28 2020 From: shapero at uw.edu (Daniel R. Shapero) Date: Mon, 11 May 2020 10:28:28 -0700 Subject: [petsc-users] HDF5 viewer + groups In-Reply-To: References: Message-ID: Gotcha, that all makes sense and thanks for the reply! Is adding a namespace mechanism to HDF5 viewers worthwhile from your end? I totally understand if not and we can probably design around that from the Firedrake side, e.g. restricting to only one mesh per checkpoint file. On Mon, May 11, 2020 at 3:24 AM Matthew Knepley wrote: > On Sun, May 10, 2020 at 10:03 PM Daniel R. Shapero wrote: > >> I'm trying to improve the checkpointing functionality in the Firedrake >> library, which currently uses the HDF5 viewer to store PetscVec objects. >> The problem with this approach is that a lot of context about the stored >> vector (e.g. the mesh and finite element space that it came from) is lost >> and we'd like to save that too. >> >> In order to make a sensible hierarchy inside the file, I'd like to be >> able to write a DMPlex to a group, say `/meshes/`, within the >> file rather than at the root `/`. I tried using the >> `PETScViewerHDF5PushGroup` function; I thought that this will change the >> current group of the file to whatever name you give and all subsequent >> dataset writes will go under that group until you call the matching pop. >> This doesn't seem to be the case and all of the mesh data gets written >> under `/`. From reading the source code, it looks like the HDF5 writer >> pushes the group `/topology` to write out the DMPlex cells (and likewise >> for coordinates etc) which then clobbers whatever I pushed before it >> instead of concatenating. >> >> I've attached a minimal example using petsc4py to demonstrate. I get the >> same results when I use some extra functionality in Firedrake to ensure >> that the `/meshes` group is created in the first place, which I can then >> verify with h5ls. >> >> Is there a way to do what I want and if so how? If there isn't, is that >> because no one has needed this before or is there a fundamental reason why >> you shouldn't be doing this in the first place? >> > > The reason is the impedence mismatch with visualization formats. > VTK/ExodusII/etc are all based on a single mesh and multiple vectors in a > file, so I > copied that design. It would be possible to namespace everything, and fix > the XDMF generator to understand that. > > Vaclav, should we roll this into the HDF5 parallel reading/writing update? > > Thanks, > > Matt > > >> Thanks! >> Daniel >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 11 12:35:11 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 11 May 2020 13:35:11 -0400 Subject: [petsc-users] HDF5 viewer + groups In-Reply-To: References: Message-ID: On Mon, May 11, 2020 at 1:29 PM Daniel R. 
Shapero wrote: > Gotcha, that all makes sense and thanks for the reply! > > Is adding a namespace mechanism to HDF5 viewers worthwhile from your end? > I totally understand if not and we can probably design around that from the > Firedrake side, e.g. restricting to only one mesh per checkpoint file. > I don't think it's such a huge deal. We are redoing things anyway, so its just another requirement. Thanks, Matt > On Mon, May 11, 2020 at 3:24 AM Matthew Knepley wrote: > >> On Sun, May 10, 2020 at 10:03 PM Daniel R. Shapero >> wrote: >> >>> I'm trying to improve the checkpointing functionality in the Firedrake >>> library, which currently uses the HDF5 viewer to store PetscVec objects. >>> The problem with this approach is that a lot of context about the stored >>> vector (e.g. the mesh and finite element space that it came from) is lost >>> and we'd like to save that too. >>> >>> In order to make a sensible hierarchy inside the file, I'd like to be >>> able to write a DMPlex to a group, say `/meshes/`, within the >>> file rather than at the root `/`. I tried using the >>> `PETScViewerHDF5PushGroup` function; I thought that this will change the >>> current group of the file to whatever name you give and all subsequent >>> dataset writes will go under that group until you call the matching pop. >>> This doesn't seem to be the case and all of the mesh data gets written >>> under `/`. From reading the source code, it looks like the HDF5 writer >>> pushes the group `/topology` to write out the DMPlex cells (and likewise >>> for coordinates etc) which then clobbers whatever I pushed before it >>> instead of concatenating. >>> >>> I've attached a minimal example using petsc4py to demonstrate. I get the >>> same results when I use some extra functionality in Firedrake to ensure >>> that the `/meshes` group is created in the first place, which I can then >>> verify with h5ls. >>> >>> Is there a way to do what I want and if so how? If there isn't, is that >>> because no one has needed this before or is there a fundamental reason why >>> you shouldn't be doing this in the first place? >>> >> >> The reason is the impedence mismatch with visualization formats. >> VTK/ExodusII/etc are all based on a single mesh and multiple vectors in a >> file, so I >> copied that design. It would be possible to namespace everything, and fix >> the XDMF generator to understand that. >> >> Vaclav, should we roll this into the HDF5 parallel reading/writing update? >> >> Thanks, >> >> Matt >> >> >>> Thanks! >>> Daniel >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaclav.hapla at erdw.ethz.ch Mon May 11 15:55:52 2020 From: vaclav.hapla at erdw.ethz.ch (Hapla Vaclav) Date: Mon, 11 May 2020 20:55:52 +0000 Subject: [petsc-users] HDF5 viewer + groups In-Reply-To: References: Message-ID: On 11 May 2020, at 19:35, Matthew Knepley > wrote: On Mon, May 11, 2020 at 1:29 PM Daniel R. Shapero > wrote: Gotcha, that all makes sense and thanks for the reply! Is adding a namespace mechanism to HDF5 viewers worthwhile from your end? 
I totally understand if not and we can probably design around that from the Firedrake side, e.g. restricting to only one mesh per checkpoint file. I don't think it's such a huge deal. We are redoing things anyway, so its just another requirement. Yes, this is a totally reasonable requirement. We are going to change the default naming and custom naming will be definitely possible as well. It's in my backlog but should get to that soon! See also https://gitlab.com/petsc/petsc/-/issues/553 Thanks, Vaclav Thanks, Matt On Mon, May 11, 2020 at 3:24 AM Matthew Knepley > wrote: On Sun, May 10, 2020 at 10:03 PM Daniel R. Shapero > wrote: I'm trying to improve the checkpointing functionality in the Firedrake library, which currently uses the HDF5 viewer to store PetscVec objects. The problem with this approach is that a lot of context about the stored vector (e.g. the mesh and finite element space that it came from) is lost and we'd like to save that too. In order to make a sensible hierarchy inside the file, I'd like to be able to write a DMPlex to a group, say `/meshes/`, within the file rather than at the root `/`. I tried using the `PETScViewerHDF5PushGroup` function; I thought that this will change the current group of the file to whatever name you give and all subsequent dataset writes will go under that group until you call the matching pop. This doesn't seem to be the case and all of the mesh data gets written under `/`. From reading the source code, it looks like the HDF5 writer pushes the group `/topology` to write out the DMPlex cells (and likewise for coordinates etc) which then clobbers whatever I pushed before it instead of concatenating. I've attached a minimal example using petsc4py to demonstrate. I get the same results when I use some extra functionality in Firedrake to ensure that the `/meshes` group is created in the first place, which I can then verify with h5ls. Is there a way to do what I want and if so how? If there isn't, is that because no one has needed this before or is there a fundamental reason why you shouldn't be doing this in the first place? The reason is the impedence mismatch with visualization formats. VTK/ExodusII/etc are all based on a single mesh and multiple vectors in a file, so I copied that design. It would be possible to namespace everything, and fix the XDMF generator to understand that. Vaclav, should we roll this into the HDF5 parallel reading/writing update? Thanks, Matt Thanks! Daniel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shaswat121994 at gmail.com Tue May 12 03:19:50 2020 From: shaswat121994 at gmail.com (Shashwat Tiwari) Date: Tue, 12 May 2020 13:49:50 +0530 Subject: [petsc-users] Linear Reconstruction of Gradients Message-ID: Hi, I am trying to write a second order upwind finite volume scheme on unstructured grid using DMPlex. 
I am performing gradient reconstruction using the "DMPlexReconstructGradientsFVM" function as well as a "ComputeGradient" function that I have written myself, in which, I loop over faces and add the gradient contribution to neighboring cells of the face (similar to what is done in "DMPlexReconstructGradients_Internal") . The results don't seem correct when using the "DMPlexReconstructGradientsFVM" function where as when using my own function, the second order is achieved. To investigate, I looked at the gradients computed by these two functions, and noticed that "DMPlexReconstructGradientsFVM" assigns a zero value to the gradient of cells with less than three neighbors, for example, boundary cells (which I assume is because leastsquares requires over-constrained system of equations and hence, atleast three neighbors), where as, I am not considering this factor in my "ComputeGradient" function and adding the contribution to neighboring cells for all interior and partition faces. Other than this, both the functions seem to compute similar values for gradients on all the other cells and I suspect, zero values for certain cells might be causing the problem. I wanted to know if I am missing out on some crucial step in the code which would enable the "DMPlexReconstructGradientsFVM" function to compute gradients for cells with lesser number of neighbors, and also if there is a way to augment the leastsquares stencil, i.e. to use, say, vertex neighbors instead of face neighbors to reconstruct gradients. Also, kindly let me know if my analysis of the problem is wrong and if there might be some other issue. I am attaching my code for your reference. You can find the use of both "DMPlexReconstructGradientsFVM" function and "ComputeGradient" function (just uncomment the one you want to use) inside "ComputeResidual" function. I am also attaching the Gmsh file for the mesh I am using. You can run the code as follows: mpiexec -n ./convect -mesh square_tri.msh Please let me know if you need anything else. Thanks and Regards, Shashwat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: convect.c Type: text/x-csrc Size: 26290 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: square_tri.geo Type: application/octet-stream Size: 376 bytes Desc: not available URL: From knepley at gmail.com Tue May 12 12:58:28 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 12 May 2020 13:58:28 -0400 Subject: [petsc-users] Linear Reconstruction of Gradients In-Reply-To: References: Message-ID: On Tue, May 12, 2020 at 4:21 AM Shashwat Tiwari wrote: > Hi, > I am trying to write a second order upwind finite volume scheme on > unstructured grid using DMPlex. I am performing gradient reconstruction > using the "DMPlexReconstructGradientsFVM" function as well as a > "ComputeGradient" function that I have written myself, in which, I loop > over faces and add the gradient contribution to neighboring cells of the > face (similar to what is done in "DMPlexReconstructGradients_Internal") . > The results don't seem correct when using the > "DMPlexReconstructGradientsFVM" function where as when using my own > function, the second order is achieved. 
To investigate, I looked at the > gradients computed by these two functions, and noticed that > "DMPlexReconstructGradientsFVM" assigns a zero value to the gradient of > cells with less than three neighbors, for example, boundary cells (which I > assume is because leastsquares requires over-constrained system of > equations and hence, atleast three neighbors), where as, I am not > considering this factor in my "ComputeGradient" function and adding the > contribution to neighboring cells for all interior and partition faces. > Other than this, both the functions seem to compute similar values for > gradients on all the other cells and I suspect, zero values for certain > cells might be causing the problem. I wanted to know if I am missing out on > some crucial step in the code which would enable the > "DMPlexReconstructGradientsFVM" function to compute gradients for cells > with lesser number of neighbors, and also if there is a way to augment the > leastsquares stencil, i.e. to use, say, vertex neighbors instead of face > neighbors to reconstruct gradients. Also, kindly let me know if my analysis > of the problem is wrong and if there might be some other issue. > I am sure you are right. It should be possible to do exactly as you say in DMPlexReconstructGradientsFVM(). I must have just messed it up in the initial implementtion. If you want to make a simple MR, or send your patch, I can help get it integrated. Thanks! Matt > I am attaching my code for your reference. You can find the use of both > "DMPlexReconstructGradientsFVM" function and "ComputeGradient" function > (just uncomment the one you want to use) inside "ComputeResidual" function. > I am also attaching the Gmsh file for the mesh I am using. You can run the > code as follows: > mpiexec -n ./convect -mesh square_tri.msh > > Please let me know if you need anything else. > > Thanks and Regards, > Shashwat > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexlindsay239 at gmail.com Tue May 12 14:11:19 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Tue, 12 May 2020 12:11:19 -0700 Subject: [petsc-users] make check failed with intel 2019 compilers Message-ID: The parallel make check target (ex19) fails with the error below after configuring/building with intel 2019 mpi compilers (mpiicc,mpiicpc,mpiifort). Any attempt to run valgrind or to attach to a debugger fails with `mpiexec: Error: unknown option "-pmi_args"`. I've attached configure.log. Does anyone have any ideas off the top of their head? We're trying to link MOOSE with a project that refuses to use a toolchain other than intel's. I'm currently trying to figure out whether the MPI implementation matters (e.g. can I use mpich/openmpi), but for now I'm operating under the assumption that I need to use the intel MPI implementation. lindad at lemhi2:/scratch/lindad/moose/petsc/src/snes/examples/tutorials((detached from 7c25e2d))$ mpiexec -np 2 ./ex19 lid velocity = 0.0625, prandtl # = 1., grashof # = 1. 
[0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MPIPetsc_Type_unwrap line 38 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c [0]PETSC ERROR: [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: likely location of problem given in stack below [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [1]PETSC ERROR: INSTEAD the line number of the start of the function [1]PETSC ERROR: is given. [1]PETSC ERROR: [1] MPIPetsc_Type_unwrap line 38 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c [1]PETSC ERROR: [0] MPIPetsc_Type_compare line 71 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c [0]PETSC ERROR: [0] PetscSFPackGetInUse line 514 /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd_Basic line 305 /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd line 1335 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c [0]PETSC ERROR: [0] VecScatterEnd_SF line 83 /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c [0]PETSC ERROR: [1] MPIPetsc_Type_compare line 71 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c [1]PETSC ERROR: [1] PetscSFPackGetInUse line 514 /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c [0] VecScatterEnd line 145 /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c [0]PETSC ERROR: [0] DMGlobalToLocalEnd_DA line 25 /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c [0]PETSC ERROR: [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd_Basic line 305 /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd line 1335 /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c [0] DMGlobalToLocalEnd line 2368 /scratch/lindad/moose/petsc/src/dm/interface/dm.c [0]PETSC ERROR: [0] SNESComputeFunction_DMDA line 67 /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd_SF line 83 /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c [0] MatFDColoringApply_AIJ line 180 /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c [0]PETSC ERROR: [0] MatFDColoringApply line 610 /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c [0]PETSC ERROR: 
[1]PETSC ERROR: [1] VecScatterEnd line 145 /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c [1]PETSC ERROR: [1] DMGlobalToLocalEnd_DA line 25 /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c [0] SNESComputeJacobian_DMDA line 153 /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c [0]PETSC ERROR: [0] SNES user Jacobian function line 2678 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [0]PETSC ERROR: [1]PETSC ERROR: [1] DMGlobalToLocalEnd line 2368 /scratch/lindad/moose/petsc/src/dm/interface/dm.c [1]PETSC ERROR: [1] SNESComputeFunction_DMDA line 67 /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c [0] SNESComputeJacobian line 2637 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: [1]PETSC ERROR: [1] MatFDColoringApply_AIJ line 180 /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c [1]PETSC ERROR: [1] MatFDColoringApply line 610 /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c [1]PETSC ERROR: [1] SNESComputeJacobian_DMDA line 153 /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c [1]PETSC ERROR: [0] SNESSolve line 4366 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [0]PETSC ERROR: [0] main line 108 ex19.c [1] SNES user Jacobian function line 2678 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1] SNESComputeJacobian line 2637 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [1]PETSC ERROR: [1] SNESSolve_NEWTONLS line 144 /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c [1]PETSC ERROR: [0]PETSC ERROR: Signal received [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: [1] SNESSolve line 4366 /scratch/lindad/moose/petsc/src/snes/interface/snes.c [1]PETSC ERROR: [1] main line 108 ex19.c Petsc Release Version 3.12.4, unknown [0]PETSC ERROR: ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 [0]PETSC ERROR: [1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=git://https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes [0]PETSC ERROR: #1 User provided function() line 0 in unknown file --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Signal received [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Release Version 3.12.4, unknown [1]PETSC ERROR: Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 [1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=git:// https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes [1]PETSC ERROR: #1 User provided function() line 0 in unknown file [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: Abort(59) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1 [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [1]PETSC ERROR: likely location of problem given in stack below [1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 2685382 bytes Desc: not available URL: From knepley at gmail.com Tue May 12 14:22:05 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 12 May 2020 15:22:05 -0400 Subject: [petsc-users] make check failed with intel 2019 compilers In-Reply-To: References: Message-ID: On Tue, May 12, 2020 at 3:13 PM Alexander Lindsay wrote: > The parallel make check target (ex19) fails with the error below after > configuring/building with intel 2019 mpi compilers > (mpiicc,mpiicpc,mpiifort). Any attempt to run valgrind or to attach to a > debugger fails with `mpiexec: Error: unknown option "-pmi_args"`. I've > attached configure.log. Does anyone have any ideas off the top of their > head? We're trying to link MOOSE with a project that refuses to use a > toolchain other than intel's. I'm currently trying to figure out whether > the MPI implementation matters (e.g. can I use mpich/openmpi), but for now > I'm operating under the assumption that I need to use the intel MPI > implementation. > There have been a _lot_ of bugs in the 2019 MPI for some reason. Is it at all possible to rollback? If not, is this somewhere we can run? 
Thanks, Matt > lindad at lemhi2:/scratch/lindad/moose/petsc/src/snes/examples/tutorials((detached > from 7c25e2d))$ mpiexec -np 2 ./ex19 > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MPIPetsc_Type_unwrap line 38 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > [0]PETSC ERROR: [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [1]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [1]PETSC ERROR: likely location of problem given in stack below > [1]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [1]PETSC ERROR: INSTEAD the line number of the start of the function > [1]PETSC ERROR: is given. 
> [1]PETSC ERROR: [1] MPIPetsc_Type_unwrap line 38 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > [1]PETSC ERROR: [0] MPIPetsc_Type_compare line 71 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > [0]PETSC ERROR: [0] PetscSFPackGetInUse line 514 > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd_Basic line 305 > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd line 1335 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > [0]PETSC ERROR: [0] VecScatterEnd_SF line 83 > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > [0]PETSC ERROR: [1] MPIPetsc_Type_compare line 71 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > [1]PETSC ERROR: [1] PetscSFPackGetInUse line 514 > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > [0] VecScatterEnd line 145 > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > [0]PETSC ERROR: [0] DMGlobalToLocalEnd_DA line 25 > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > [0]PETSC ERROR: [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd_Basic line 305 > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd line 1335 > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > [0] DMGlobalToLocalEnd line 2368 > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > [0]PETSC ERROR: [0] SNESComputeFunction_DMDA line 67 > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd_SF line 83 > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > [0] MatFDColoringApply_AIJ line 180 > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > [0]PETSC ERROR: [0] MatFDColoringApply line 610 > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd line 145 > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > [1]PETSC ERROR: [1] DMGlobalToLocalEnd_DA line 25 > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > [0] SNESComputeJacobian_DMDA line 153 > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > [0]PETSC ERROR: [0] SNES user Jacobian function line 2678 > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: [1]PETSC ERROR: [1] DMGlobalToLocalEnd line 2368 > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > [1]PETSC ERROR: [1] SNESComputeFunction_DMDA line 67 > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > [0] SNESComputeJacobian line 2637 > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatFDColoringApply_AIJ line 180 > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > [1]PETSC ERROR: [1] MatFDColoringApply line 610 > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > [1]PETSC ERROR: [1] SNESComputeJacobian_DMDA line 153 > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > [1]PETSC ERROR: [0] SNESSolve line 4366 > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: [0] main line 108 ex19.c > [1] SNES user Jacobian function line 2678 > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1] SNESComputeJacobian line 2637 > 
/scratch/lindad/moose/petsc/src/snes/interface/snes.c > [1]PETSC ERROR: [1] SNESSolve_NEWTONLS line 144 > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > [1]PETSC ERROR: [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: [1] SNESSolve line 4366 > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > [1]PETSC ERROR: [1] main line 108 ex19.c > Petsc Release Version 3.12.4, unknown > [0]PETSC ERROR: ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 > 12:54:11 2020 > [0]PETSC ERROR: [1]PETSC ERROR: Configure options --download-hypre=1 > --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 > --download-metis=1 --download-ptscotch=1 --download-parmetis=1 > --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 > --download-slepc=git://https://gitlab.com/slepc/slepc.git > --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 > --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc > --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Signal received > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.12.4, unknown > [1]PETSC ERROR: Abort(59) on node 0 (rank 0 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 > [1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no > --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 > --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 > --download-mumps=1 --download-scalapack=1 --download-slepc=git:// > https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b > --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 > --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-debugging=yes > [1]PETSC ERROR: #1 User provided function() line 0 in unknown file > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: Abort(59) on node 1 (rank 1 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 1 > [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [1]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [1]PETSC ERROR: likely location of problem given in stack below > [1]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > 
------------------------------------ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue May 12 14:45:21 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 12 May 2020 14:45:21 -0500 (CDT) Subject: [petsc-users] make check failed with intel 2019 compilers In-Reply-To: References: Message-ID: On Tue, 12 May 2020, Matthew Knepley wrote: > On Tue, May 12, 2020 at 3:13 PM Alexander Lindsay > wrote: > > > The parallel make check target (ex19) fails with the error below after > > configuring/building with intel 2019 mpi compilers > > (mpiicc,mpiicpc,mpiifort). Any attempt to run valgrind or to attach to a > > debugger fails with `mpiexec: Error: unknown option "-pmi_args"`. I've > > attached configure.log. Does anyone have any ideas off the top of their > > head? We're trying to link MOOSE with a project that refuses to use a > > toolchain other than intel's. I'm currently trying to figure out whether > > the MPI implementation matters (e.g. can I use mpich/openmpi), but for now > > I'm operating under the assumption that I need to use the intel MPI > > implementation. > > > > There have been a _lot_ of bugs in the 2019 MPI for some reason. Is it at > all possible to rollback? > > If not, is this somewhere we can run? We have this compiler/mpi [19u3] on our KNL box. I've had weird issues with it - so we still use 18u2 on it. Satish > > Thanks, > > Matt > > > > lindad at lemhi2:/scratch/lindad/moose/petsc/src/snes/examples/tutorials((detached > > from 7c25e2d))$ mpiexec -np 2 ./ex19 > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [0]PETSC ERROR: or see > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > > X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > [0]PETSC ERROR: is given. 
> > [0]PETSC ERROR: [0] MPIPetsc_Type_unwrap line 38 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > [0]PETSC ERROR: [1]PETSC ERROR: > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [1]PETSC ERROR: or see > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > > X to find memory corruption errors > > [1]PETSC ERROR: likely location of problem given in stack below > > [1]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > available, > > [1]PETSC ERROR: INSTEAD the line number of the start of the function > > [1]PETSC ERROR: is given. > > [1]PETSC ERROR: [1] MPIPetsc_Type_unwrap line 38 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > [1]PETSC ERROR: [0] MPIPetsc_Type_compare line 71 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > [0]PETSC ERROR: [0] PetscSFPackGetInUse line 514 > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd_Basic line 305 > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd line 1335 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > > [0]PETSC ERROR: [0] VecScatterEnd_SF line 83 > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > > [0]PETSC ERROR: [1] MPIPetsc_Type_compare line 71 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > [1]PETSC ERROR: [1] PetscSFPackGetInUse line 514 > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > > [0] VecScatterEnd line 145 > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > > [0]PETSC ERROR: [0] DMGlobalToLocalEnd_DA line 25 > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > > [0]PETSC ERROR: [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd_Basic line 305 > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > > [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd line 1335 > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > > [0] DMGlobalToLocalEnd line 2368 > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > > [0]PETSC ERROR: [0] SNESComputeFunction_DMDA line 67 > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd_SF line 83 > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > > [0] MatFDColoringApply_AIJ line 180 > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > > [0]PETSC ERROR: [0] MatFDColoringApply line 610 > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd line 145 > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > > [1]PETSC ERROR: [1] DMGlobalToLocalEnd_DA line 25 > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > > [0] SNESComputeJacobian_DMDA line 153 > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > [0]PETSC ERROR: [0] SNES user Jacobian function line 2678 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: [1]PETSC ERROR: [1] DMGlobalToLocalEnd line 2368 > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > > [1]PETSC 
ERROR: [1] SNESComputeFunction_DMDA line 67 > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > [0] SNESComputeJacobian line 2637 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatFDColoringApply_AIJ line 180 > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > > [1]PETSC ERROR: [1] MatFDColoringApply line 610 > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > > [1]PETSC ERROR: [1] SNESComputeJacobian_DMDA line 153 > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > [1]PETSC ERROR: [0] SNESSolve line 4366 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: [0] main line 108 ex19.c > > [1] SNES user Jacobian function line 2678 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [1] SNESComputeJacobian line 2637 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [1]PETSC ERROR: [1] SNESSolve_NEWTONLS line 144 > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > > [1]PETSC ERROR: [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. > > [0]PETSC ERROR: [1] SNESSolve line 4366 > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > [1]PETSC ERROR: [1] main line 108 ex19.c > > Petsc Release Version 3.12.4, unknown > > [0]PETSC ERROR: ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 > > 12:54:11 2020 > > [0]PETSC ERROR: [1]PETSC ERROR: Configure options --download-hypre=1 > > --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 > > --download-metis=1 --download-ptscotch=1 --download-parmetis=1 > > --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 > > --download-slepc=git://https://gitlab.com/slepc/slepc.git > > --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 > > --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc > > --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > --------------------- Error Message > > -------------------------------------------------------------- > > [1]PETSC ERROR: Signal received > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble shooting. 
> > [1]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > [1]PETSC ERROR: Abort(59) on node 0 (rank 0 in comm 0): application called > > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 > > [1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no > > --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 > > --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 > > --download-mumps=1 --download-scalapack=1 --download-slepc=git:// > > https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b > > --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 > > --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > > --with-debugging=yes > > [1]PETSC ERROR: #1 User provided function() line 0 in unknown file > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > > batch system) has told this process to end > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [0]PETSC ERROR: Abort(59) on node 1 (rank 1 in comm 0): application called > > MPI_Abort(MPI_COMM_WORLD, 59) - process 1 > > [1]PETSC ERROR: > > ------------------------------------------------------------------------ > > [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > > batch system) has told this process to end > > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [1]PETSC ERROR: or see > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > > X to find memory corruption errors > > [1]PETSC ERROR: likely location of problem given in stack below > > [1]PETSC ERROR: or see > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > > X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > > > > From alexlindsay239 at gmail.com Tue May 12 15:44:22 2020 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Tue, 12 May 2020 13:44:22 -0700 Subject: [petsc-users] make check failed with intel 2019 compilers In-Reply-To: References: Message-ID: Ok, this is good to know. Yea we'll probably just roll back then. Thanks! On Tue, May 12, 2020 at 12:45 PM Satish Balay wrote: > On Tue, 12 May 2020, Matthew Knepley wrote: > > > On Tue, May 12, 2020 at 3:13 PM Alexander Lindsay < > alexlindsay239 at gmail.com> > > wrote: > > > > > The parallel make check target (ex19) fails with the error below after > > > configuring/building with intel 2019 mpi compilers > > > (mpiicc,mpiicpc,mpiifort). Any attempt to run valgrind or to attach to > a > > > debugger fails with `mpiexec: Error: unknown option "-pmi_args"`. I've > > > attached configure.log. Does anyone have any ideas off the top of their > > > head? We're trying to link MOOSE with a project that refuses to use a > > > toolchain other than intel's. I'm currently trying to figure out > whether > > > the MPI implementation matters (e.g. can I use mpich/openmpi), but for > now > > > I'm operating under the assumption that I need to use the intel MPI > > > implementation. 
> > > > > > > There have been a _lot_ of bugs in the 2019 MPI for some reason. Is it at > > all possible to rollback? > > > > If not, is this somewhere we can run? > > We have this compiler/mpi [19u3] on our KNL box. I've had weird issues > with it - so we still use 18u2 on it. > > Satish > > > > > Thanks, > > > > Matt > > > > > > > lindad at lemhi2 > :/scratch/lindad/moose/petsc/src/snes/examples/tutorials((detached > > > from 7c25e2d))$ mpiexec -np 2 ./ex19 > > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > [0]PETSC ERROR: > > > > ------------------------------------------------------------------------ > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > > probably memory access out of range > > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [0]PETSC ERROR: or see > > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS > > > X to find memory corruption errors > > > [0]PETSC ERROR: likely location of problem given in stack below > > > [0]PETSC ERROR: --------------------- Stack Frames > > > ------------------------------------ > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > > available, > > > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > > > [0]PETSC ERROR: is given. > > > [0]PETSC ERROR: [0] MPIPetsc_Type_unwrap line 38 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > > [0]PETSC ERROR: [1]PETSC ERROR: > > > > ------------------------------------------------------------------------ > > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > > probably memory access out of range > > > [1]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [1]PETSC ERROR: or see > > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS > > > X to find memory corruption errors > > > [1]PETSC ERROR: likely location of problem given in stack below > > > [1]PETSC ERROR: --------------------- Stack Frames > > > ------------------------------------ > > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > > available, > > > [1]PETSC ERROR: INSTEAD the line number of the start of the > function > > > [1]PETSC ERROR: is given. 
> > > [1]PETSC ERROR: [1] MPIPetsc_Type_unwrap line 38 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > > [1]PETSC ERROR: [0] MPIPetsc_Type_compare line 71 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > > [0]PETSC ERROR: [0] PetscSFPackGetInUse line 514 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd_Basic line 305 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd line 1335 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > > > [0]PETSC ERROR: [0] VecScatterEnd_SF line 83 > > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > > > [0]PETSC ERROR: [1] MPIPetsc_Type_compare line 71 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c > > > [1]PETSC ERROR: [1] PetscSFPackGetInUse line 514 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c > > > [0] VecScatterEnd line 145 > > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > > > [0]PETSC ERROR: [0] DMGlobalToLocalEnd_DA line 25 > > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd_Basic line 305 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c > > > [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd line 1335 > > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c > > > [0] DMGlobalToLocalEnd line 2368 > > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > > > [0]PETSC ERROR: [0] SNESComputeFunction_DMDA line 67 > > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd_SF line 83 > > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c > > > [0] MatFDColoringApply_AIJ line 180 > > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > > > [0]PETSC ERROR: [0] MatFDColoringApply line 610 > > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd line 145 > > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c > > > [1]PETSC ERROR: [1] DMGlobalToLocalEnd_DA line 25 > > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c > > > [0] SNESComputeJacobian_DMDA line 153 > > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > > [0]PETSC ERROR: [0] SNES user Jacobian function line 2678 > > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] DMGlobalToLocalEnd line 2368 > > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c > > > [1]PETSC ERROR: [1] SNESComputeFunction_DMDA line 67 > > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > > [0] SNESComputeJacobian line 2637 > > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 > > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatFDColoringApply_AIJ line 180 > > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c > > > [1]PETSC ERROR: [1] MatFDColoringApply line 610 > > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c > > > [1]PETSC ERROR: [1] SNESComputeJacobian_DMDA line 153 > > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c > > > [1]PETSC ERROR: [0] SNESSolve line 4366 > > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: [0] main line 108 ex19.c > > > [1] SNES user Jacobian function line 2678 > > 
> /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [1] SNESComputeJacobian line 2637 > > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [1]PETSC ERROR: [1] SNESSolve_NEWTONLS line 144 > > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c > > > [1]PETSC ERROR: [0]PETSC ERROR: Signal received > > > [0]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble shooting. > > > [0]PETSC ERROR: [1] SNESSolve line 4366 > > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c > > > [1]PETSC ERROR: [1] main line 108 ex19.c > > > Petsc Release Version 3.12.4, unknown > > > [0]PETSC ERROR: ./ex19 on a arch-moose named lemhi2 by lindad Tue May > 12 > > > 12:54:11 2020 > > > [0]PETSC ERROR: [1]PETSC ERROR: Configure options --download-hypre=1 > > > --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 > > > --download-metis=1 --download-ptscotch=1 --download-parmetis=1 > > > --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 > > > --download-slepc=git://https://gitlab.com/slepc/slepc.git > > > --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 > > > --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc > > > --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes > > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > > --------------------- Error Message > > > -------------------------------------------------------------- > > > [1]PETSC ERROR: Signal received > > > [1]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble shooting. > > > [1]PETSC ERROR: Petsc Release Version 3.12.4, unknown > > > [1]PETSC ERROR: Abort(59) on node 0 (rank 0 in comm 0): application > called > > > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 > > > [1]PETSC ERROR: Configure options --download-hypre=1 > --with-debugging=no > > > --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 > > > --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 > > > --download-mumps=1 --download-scalapack=1 --download-slepc=git:// > > > https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b > > > --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 > > > --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > > > --with-debugging=yes > > > [1]PETSC ERROR: #1 User provided function() line 0 in unknown file > > > [0]PETSC ERROR: > > > > ------------------------------------------------------------------------ > > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > > > batch system) has told this process to end > > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [0]PETSC ERROR: Abort(59) on node 1 (rank 1 in comm 0): application > called > > > MPI_Abort(MPI_COMM_WORLD, 59) - process 1 > > > [1]PETSC ERROR: > > > > ------------------------------------------------------------------------ > > > [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > > > batch system) has told this process to end > > > [1]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [1]PETSC ERROR: or see > > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [1]PETSC ERROR: or try 
http://valgrind.org on GNU/linux and Apple Mac > OS > > > X to find memory corruption errors > > > [1]PETSC ERROR: likely location of problem given in stack below > > > [1]PETSC ERROR: or see > > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS > > > X to find memory corruption errors > > > [0]PETSC ERROR: likely location of problem given in stack below > > > [0]PETSC ERROR: --------------------- Stack Frames > > > ------------------------------------ > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Wed May 13 08:25:26 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 13 May 2020 13:25:26 +0000 Subject: [petsc-users] make check failed with intel 2019 compilers In-Reply-To: References: Message-ID: <16B98E57-F3D9-4636-9ED4-264BCA69E8D8@tacc.utexas.edu> On , 2020May12, at 14:45, Satish Balay via petsc-users > wrote: We have this compiler/mpi [19u3] on our KNL box. I've had weird issues with it - so we still use 18u2 on it. Yeah, with 19 don?t use anything short of update 6. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed May 13 15:33:51 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 13 May 2020 15:33:51 -0500 Subject: [petsc-users] make check failed with intel 2019 compilers In-Reply-To: References: Message-ID: Alexander, I reproduced the error with Intel MPI 2019.3.199 and I can confirm it is because Intel MPI_Type_get_envelope() is wrong. --Junchao Zhang On Tue, May 12, 2020 at 3:45 PM Alexander Lindsay wrote: > Ok, this is good to know. Yea we'll probably just roll back then. Thanks! > > On Tue, May 12, 2020 at 12:45 PM Satish Balay wrote: > >> On Tue, 12 May 2020, Matthew Knepley wrote: >> >> > On Tue, May 12, 2020 at 3:13 PM Alexander Lindsay < >> alexlindsay239 at gmail.com> >> > wrote: >> > >> > > The parallel make check target (ex19) fails with the error below after >> > > configuring/building with intel 2019 mpi compilers >> > > (mpiicc,mpiicpc,mpiifort). Any attempt to run valgrind or to attach >> to a >> > > debugger fails with `mpiexec: Error: unknown option "-pmi_args"`. I've >> > > attached configure.log. Does anyone have any ideas off the top of >> their >> > > head? We're trying to link MOOSE with a project that refuses to use a >> > > toolchain other than intel's. I'm currently trying to figure out >> whether >> > > the MPI implementation matters (e.g. can I use mpich/openmpi), but >> for now >> > > I'm operating under the assumption that I need to use the intel MPI >> > > implementation. >> > > >> > >> > There have been a _lot_ of bugs in the 2019 MPI for some reason. Is it >> at >> > all possible to rollback? >> > >> > If not, is this somewhere we can run? >> >> We have this compiler/mpi [19u3] on our KNL box. I've had weird issues >> with it - so we still use 18u2 on it. >> >> Satish >> >> > >> > Thanks, >> > >> > Matt >> > >> > >> > > lindad at lemhi2 >> :/scratch/lindad/moose/petsc/src/snes/examples/tutorials((detached >> > > from 7c25e2d))$ mpiexec -np 2 ./ex19 >> > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. 
>> > > [0]PETSC ERROR: >> > > >> ------------------------------------------------------------------------ >> > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> > > probably memory access out of range >> > > [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > > [0]PETSC ERROR: or see >> > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >> Mac OS >> > > X to find memory corruption errors >> > > [0]PETSC ERROR: likely location of problem given in stack below >> > > [0]PETSC ERROR: --------------------- Stack Frames >> > > ------------------------------------ >> > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> > > available, >> > > [0]PETSC ERROR: INSTEAD the line number of the start of the >> function >> > > [0]PETSC ERROR: is given. >> > > [0]PETSC ERROR: [0] MPIPetsc_Type_unwrap line 38 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: >> > > >> ------------------------------------------------------------------------ >> > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> > > probably memory access out of range >> > > [1]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > > [1]PETSC ERROR: or see >> > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >> Mac OS >> > > X to find memory corruption errors >> > > [1]PETSC ERROR: likely location of problem given in stack below >> > > [1]PETSC ERROR: --------------------- Stack Frames >> > > ------------------------------------ >> > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> > > available, >> > > [1]PETSC ERROR: INSTEAD the line number of the start of the >> function >> > > [1]PETSC ERROR: is given. 
>> > > [1]PETSC ERROR: [1] MPIPetsc_Type_unwrap line 38 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c >> > > [1]PETSC ERROR: [0] MPIPetsc_Type_compare line 71 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c >> > > [0]PETSC ERROR: [0] PetscSFPackGetInUse line 514 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c >> > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd_Basic line 305 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c >> > > [0]PETSC ERROR: [0] PetscSFBcastAndOpEnd line 1335 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c >> > > [0]PETSC ERROR: [0] VecScatterEnd_SF line 83 >> > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c >> > > [0]PETSC ERROR: [1] MPIPetsc_Type_compare line 71 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sftype.c >> > > [1]PETSC ERROR: [1] PetscSFPackGetInUse line 514 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfpack.c >> > > [0] VecScatterEnd line 145 >> > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c >> > > [0]PETSC ERROR: [0] DMGlobalToLocalEnd_DA line 25 >> > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd_Basic line >> 305 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/impls/basic/sfbasic.c >> > > [1]PETSC ERROR: [1] PetscSFBcastAndOpEnd line 1335 >> > > /scratch/lindad/moose/petsc/src/vec/is/sf/interface/sf.c >> > > [0] DMGlobalToLocalEnd line 2368 >> > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c >> > > [0]PETSC ERROR: [0] SNESComputeFunction_DMDA line 67 >> > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd_SF line 83 >> > > /scratch/lindad/moose/petsc/src/vec/vscat/impls/sf/vscatsf.c >> > > [0] MatFDColoringApply_AIJ line 180 >> > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c >> > > [0]PETSC ERROR: [0] MatFDColoringApply line 610 >> > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: [1] VecScatterEnd line 145 >> > > /scratch/lindad/moose/petsc/src/vec/vscat/interface/vscatfce.c >> > > [1]PETSC ERROR: [1] DMGlobalToLocalEnd_DA line 25 >> > > /scratch/lindad/moose/petsc/src/dm/impls/da/dagtol.c >> > > [0] SNESComputeJacobian_DMDA line 153 >> > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c >> > > [0]PETSC ERROR: [0] SNES user Jacobian function line 2678 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: [1] DMGlobalToLocalEnd line 2368 >> > > /scratch/lindad/moose/petsc/src/dm/interface/dm.c >> > > [1]PETSC ERROR: [1] SNESComputeFunction_DMDA line 67 >> > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c >> > > [0] SNESComputeJacobian line 2637 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 >> > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c >> > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatFDColoringApply_AIJ line 180 >> > > /scratch/lindad/moose/petsc/src/mat/impls/aij/mpi/fdmpiaij.c >> > > [1]PETSC ERROR: [1] MatFDColoringApply line 610 >> > > /scratch/lindad/moose/petsc/src/mat/matfd/fdmatrix.c >> > > [1]PETSC ERROR: [1] SNESComputeJacobian_DMDA line 153 >> > > /scratch/lindad/moose/petsc/src/snes/utils/dmdasnes.c >> > > [1]PETSC ERROR: [0] SNESSolve line 4366 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [0]PETSC ERROR: [0] main line 
108 ex19.c >> > > [1] SNES user Jacobian function line 2678 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message >> > > -------------------------------------------------------------- >> > > [1] SNESComputeJacobian line 2637 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [1]PETSC ERROR: [1] SNESSolve_NEWTONLS line 144 >> > > /scratch/lindad/moose/petsc/src/snes/impls/ls/ls.c >> > > [1]PETSC ERROR: [0]PETSC ERROR: Signal received >> > > [0]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> > > for trouble shooting. >> > > [0]PETSC ERROR: [1] SNESSolve line 4366 >> > > /scratch/lindad/moose/petsc/src/snes/interface/snes.c >> > > [1]PETSC ERROR: [1] main line 108 ex19.c >> > > Petsc Release Version 3.12.4, unknown >> > > [0]PETSC ERROR: ./ex19 on a arch-moose named lemhi2 by lindad Tue May >> 12 >> > > 12:54:11 2020 >> > > [0]PETSC ERROR: [1]PETSC ERROR: Configure options --download-hypre=1 >> > > --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >> > > --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >> > > --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >> > > --download-slepc=git://https://gitlab.com/slepc/slepc.git >> > > --download-slepc-commit= 59ff81b --with-mpi=1 --with-cxx-dialect=C++11 >> > > --with-fortran-bindings=0 --with-sowing=0 --with-cc=mpiicc >> > > --with-cxx=mpiicpc --with-fc=mpiifort --with-debugging=yes >> > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >> > > --------------------- Error Message >> > > -------------------------------------------------------------- >> > > [1]PETSC ERROR: Signal received >> > > [1]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> > > for trouble shooting. 
>> > > [1]PETSC ERROR: Petsc Release Version 3.12.4, unknown >> > > [1]PETSC ERROR: Abort(59) on node 0 (rank 0 in comm 0): application >> called >> > > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > > ./ex19 on a arch-moose named lemhi2 by lindad Tue May 12 12:54:11 2020 >> > > [1]PETSC ERROR: Configure options --download-hypre=1 >> --with-debugging=no >> > > --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 >> > > --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 >> > > --download-mumps=1 --download-scalapack=1 --download-slepc=git:// >> > > https://gitlab.com/slepc/slepc.git --download-slepc-commit= 59ff81b >> > > --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 >> > > --with-sowing=0 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort >> > > --with-debugging=yes >> > > [1]PETSC ERROR: #1 User provided function() line 0 in unknown file >> > > [0]PETSC ERROR: >> > > >> ------------------------------------------------------------------------ >> > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or >> the >> > > batch system) has told this process to end >> > > [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > > [0]PETSC ERROR: Abort(59) on node 1 (rank 1 in comm 0): application >> called >> > > MPI_Abort(MPI_COMM_WORLD, 59) - process 1 >> > > [1]PETSC ERROR: >> > > >> ------------------------------------------------------------------------ >> > > [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or >> the >> > > batch system) has told this process to end >> > > [1]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > > [1]PETSC ERROR: or see >> > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >> Mac OS >> > > X to find memory corruption errors >> > > [1]PETSC ERROR: likely location of problem given in stack below >> > > [1]PETSC ERROR: or see >> > > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >> Mac OS >> > > X to find memory corruption errors >> > > [0]PETSC ERROR: likely location of problem given in stack below >> > > [0]PETSC ERROR: --------------------- Stack Frames >> > > ------------------------------------ >> > > >> > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Fri May 15 10:16:29 2020 From: zakaryah at gmail.com (zakaryah .) Date: Fri, 15 May 2020 11:16:29 -0400 Subject: [petsc-users] DMCreateRestriction with DMDA Message-ID: Hi everyone, I am wondering about how to use PETSc to create a restriction on a structured mesh. The documentation for DMCreateRestriction says: For DMDA objects this only works for "uniform refinement", that is the > refined mesh was obtained DMRefine() or the coarse mesh was obtained by > DMCoarsen(). > If the fine DMDA is dmf, shouldn't I be able to do the following? DMCoarsen(dmf,comm,&dmc); > DMCreateRestriction(dmc,dmf,&restriction); > When I try this, I get the error > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: DM type da does not implement DMCreateRestriction > It seems to me it shouldn't be too hard to get the restriction for a uniform coarsening, so I'm sure I'm missing something. 
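In case it clarifies what I am after, here is a minimal sketch of the calls (assuming the fine DMDA dmf already exists; error checking omitted), together with an interpolation-based fallback that is only a guess on my part; I do not know whether applying the transpose of the interpolation is the intended way to get a restriction for a DMDA:

DM  dmc;
Mat R, Interp;
Vec scale, fineVec, coarseVec;

/* current attempt: the second call is where the error appears */
DMCoarsen(dmf, PETSC_COMM_WORLD, &dmc);
DMCreateRestriction(dmc, dmf, &R);

/* guessed fallback: build the coarse-to-fine interpolation and apply its
   transpose action as a restriction */
DMCreateInterpolation(dmc, dmf, &Interp, &scale);
DMGetGlobalVector(dmf, &fineVec);
DMGetGlobalVector(dmc, &coarseVec);
MatRestrict(Interp, fineVec, coarseVec);   /* coarseVec = Interp^T * fineVec (the scale vector may also be needed) */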
Thanks in advance for your help! Cheers, Zak -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 15 10:55:58 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 15 May 2020 11:55:58 -0400 Subject: [petsc-users] DMCreateRestriction with DMDA In-Reply-To: References: Message-ID: On Fri, May 15, 2020 at 11:17 AM zakaryah . wrote: > Hi everyone, > > I am wondering about how to use PETSc to create a restriction on a > structured mesh. The documentation for DMCreateRestriction says: > > For DMDA objects this only works for "uniform refinement", that is the >> refined mesh was obtained DMRefine() or the coarse mesh was obtained by >> DMCoarsen(). >> > > If the fine DMDA is dmf, shouldn't I be able to do the following? > > DMCoarsen(dmf,comm,&dmc); >> DMCreateRestriction(dmc,dmf,&restriction); >> > > When I try this, I get the error > >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: DM type da does not implement DMCreateRestriction >> > > It seems to me it shouldn't be too hard to get the restriction for a > uniform coarsening, so I'm sure I'm missing something. Thanks in advance > for your help! > I believe we have only implemented DMCreateInterpolation(). Thanks, Matt > Cheers, Zak > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Fri May 15 10:55:53 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Fri, 15 May 2020 17:55:53 +0200 Subject: [petsc-users] *****SPAM*****PCGASMSetSubdomains fortran interface Message-ID: An HTML attachment was scrubbed... URL: From lbllm2018 at hotmail.com Sun May 10 00:17:34 2020 From: lbllm2018 at hotmail.com (Bin Liu) Date: Sun, 10 May 2020 05:17:34 +0000 Subject: [petsc-users] *****SPAM*****support of complex number Message-ID: Dear all, I did not specifically configure my petsc installation with complex number support, e.g., -with-scalar-type = complex. How do I check if my installation of petsc support complex number in PetscScalar? Does PETSc support complex number by default during configuration without specifically including "-with-scalar-type = complex"? Best -------------- next part -------------- An HTML attachment was scrubbed... URL: From shaswat121994 at gmail.com Mon May 18 01:48:30 2020 From: shaswat121994 at gmail.com (Shashwat Tiwari) Date: Mon, 18 May 2020 12:18:30 +0530 Subject: [petsc-users] Cell-Cell adjacency in DMPlex Message-ID: Hi, I am writing a second order finite volume scheme on 2D unstructured grids using DMPlex. For this, I need a vertex neighbor stencil for the cells (i.e. stencil of the cells which share atleast one vertex with the given cell). I am trying to achieve this using two of the Petsc functions, "DMPlexCreateNeighborCSR" and "DMPlexGetAdjacency" and using "DMSetBasicAdjacency" function to set whether to use closure or not in order to get face neighbors and vertex neighbors. 
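As far as I understand the DMSetAdjacency/DMSetBasicAdjacency documentation, the two flags map roughly to the following adjacency patterns (this is only my reading of the docs, so please correct me if it is wrong):

/* face neighbors only (FVM-style adjacency through faces) */
DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE);   /* useCone = TRUE, useClosure = FALSE */

/* cells sharing at least one vertex (FEM-style adjacency through the closure) */
DMSetBasicAdjacency(dm, PETSC_FALSE, PETSC_TRUE);   /* useCone = FALSE, useClosure = TRUE */

/* extended FVM adjacency (both cone and closure) */
DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_TRUE);

In the test below I keep useCone = PETSC_TRUE and only toggle useClosure.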
Here is a simple code I have written to test these functions out: int main(int argc, char **argv) { PetscErrorCode ierr; DM dm, dmDist = NULL; PetscBool interpolate = PETSC_TRUE, status; PetscBool useCone = PETSC_TRUE, useClosure = PETSC_TRUE; PetscInt overlap = 1; PetscMPIInt rank, size; char filename[PETSC_MAX_PATH_LEN]; ierr = PetscInitialize(&argc, &argv, (char*)0, help); CHKERRQ(ierr); MPI_Comm_rank(PETSC_COMM_WORLD, &rank); MPI_Comm_size(PETSC_COMM_WORLD, &size); // get mesh filename ierr = PetscOptionsGetString(NULL, NULL, "-mesh", filename, PETSC_MAX_PATH_LEN, &status); if(status) // gmsh file provided by user { char file[PETSC_MAX_PATH_LEN]; ierr = PetscStrcpy(file, filename); CHKERRQ(ierr); ierr = PetscSNPrintf(filename, sizeof filename,"./%s", file); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Reading gmsh %s ... ", file); CHKERRQ(ierr); ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, filename, interpolate, &dm); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Done\n"); CHKERRQ(ierr); } // distribute mesh over processes; ierr = DMSetBasicAdjacency(dm, useCone, useClosure); CHKERRQ(ierr); ierr = DMPlexDistribute(dm, overlap, NULL, &dmDist); CHKERRQ(ierr); if(dmDist) { ierr = DMDestroy(&dm); CHKERRQ(ierr); dm = dmDist; } // print mesh information ierr = PetscPrintf(PETSC_COMM_WORLD, "overlap: %d, " "distributed among %d processors\n", overlap, size); CHKERRQ(ierr); // construct ghost cells PetscInt nGhost; // number of ghost cells DM dmG; // DM with ghost cells ierr = DMPlexConstructGhostCells(dm, NULL, &nGhost, &dmG); CHKERRQ(ierr); if(dmG) { ierr = DMDestroy(&dm); CHKERRQ(ierr); dm = dmG; } ierr = DMSetUp(dm); CHKERRQ(ierr); ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD); // testing cell-cell adjacency ierr = DMGetBasicAdjacency(dm, &useCone, &useClosure); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "useCone = %d, useClosure = %d\n", useCone, useClosure); CHKERRQ(ierr); // using DMPlexCreateNeighborCSR PetscInt numVertices, *offsets, *adjacency; ierr = DMPlexCreateNeighborCSR(dm, 0, &numVertices, &offsets, &adjacency); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "using \"DMPlexCreateNeighborCSR\"\n"); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "numVertices = %d\n", numVertices); CHKERRQ(ierr); PetscInt i, j, nCells = 32; for(i=0; i -------------- next part -------------- A non-text attachment was scrubbed... Name: test.geo Type: application/octet-stream Size: 374 bytes Desc: not available URL: From knepley at gmail.com Mon May 18 06:13:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 May 2020 07:13:57 -0400 Subject: [petsc-users] Cell-Cell adjacency in DMPlex In-Reply-To: References: Message-ID: On Mon, May 18, 2020 at 2:49 AM Shashwat Tiwari wrote: > Hi, > I am writing a second order finite volume scheme on 2D unstructured grids > using DMPlex. For this, I need a vertex neighbor stencil for the cells > (i.e. stencil of the cells which share atleast one vertex with the given > cell). I am trying to achieve this using two of the Petsc functions, > "DMPlexCreateNeighborCSR" and "DMPlexGetAdjacency" and using > "DMSetBasicAdjacency" function to set whether to use closure or not in > order to get face neighbors and vertex neighbors. 
Here is a simple code I > have written to test these functions out: > > int main(int argc, char **argv) > { > PetscErrorCode ierr; > DM dm, dmDist = NULL; > PetscBool interpolate = PETSC_TRUE, status; > PetscBool useCone = PETSC_TRUE, useClosure = PETSC_TRUE; > PetscInt overlap = 1; > PetscMPIInt rank, size; > char filename[PETSC_MAX_PATH_LEN]; > > ierr = PetscInitialize(&argc, &argv, (char*)0, help); CHKERRQ(ierr); > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > MPI_Comm_size(PETSC_COMM_WORLD, &size); > > // get mesh filename > ierr = PetscOptionsGetString(NULL, NULL, "-mesh", filename, > PETSC_MAX_PATH_LEN, &status); > if(status) // gmsh file provided by user > { > char file[PETSC_MAX_PATH_LEN]; > ierr = PetscStrcpy(file, filename); CHKERRQ(ierr); > ierr = PetscSNPrintf(filename, sizeof filename,"./%s", file); > CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "Reading gmsh %s ... ", file); > CHKERRQ(ierr); > ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, filename, > interpolate, &dm); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "Done\n"); CHKERRQ(ierr); > } > > // distribute mesh over processes; > ierr = DMSetBasicAdjacency(dm, useCone, useClosure); CHKERRQ(ierr); > ierr = DMPlexDistribute(dm, overlap, NULL, &dmDist); CHKERRQ(ierr); > if(dmDist) > { > ierr = DMDestroy(&dm); CHKERRQ(ierr); > dm = dmDist; > } > // print mesh information > ierr = PetscPrintf(PETSC_COMM_WORLD, "overlap: %d, " > "distributed among %d > processors\n", > overlap, size); CHKERRQ(ierr); > > // construct ghost cells > PetscInt nGhost; // number of ghost cells > DM dmG; // DM with ghost cells > ierr = DMPlexConstructGhostCells(dm, NULL, &nGhost, &dmG); > CHKERRQ(ierr); > if(dmG) > { > ierr = DMDestroy(&dm); CHKERRQ(ierr); > dm = dmG; > } > > ierr = DMSetUp(dm); CHKERRQ(ierr); > ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD); > > // testing cell-cell adjacency > ierr = DMGetBasicAdjacency(dm, &useCone, &useClosure); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "useCone = %d, useClosure = %d\n", > useCone, useClosure); CHKERRQ(ierr); > > // using DMPlexCreateNeighborCSR > PetscInt numVertices, *offsets, *adjacency; > ierr = DMPlexCreateNeighborCSR(dm, 0, &numVertices, &offsets, > &adjacency); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "using > \"DMPlexCreateNeighborCSR\"\n"); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "numVertices = %d\n", > numVertices); CHKERRQ(ierr); > PetscInt i, j, nCells = 32; > for(i=0; i { > PetscPrintf(PETSC_COMM_WORLD, "cell number = %d\n", i); > for(j=offsets[i]; j { > PetscPrintf(PETSC_COMM_WORLD, "nbr cell = %d\n", adjacency[j]); > } > } > > // using DMPlexGetAdjacency > PetscInt adjSize = 10, *adj=NULL; > ierr = PetscPrintf(PETSC_COMM_WORLD, "using \"DMPlexGetAdjacency\"\n"); > CHKERRQ(ierr); > for(i=0; i { > ierr = DMPlexGetAdjacency(dm, i, &adjSize, &adj); CHKERRQ(ierr); > ierr = PetscIntView(adjSize, adj, PETSC_VIEWER_STDOUT_WORLD); > CHKERRQ(ierr); > ierr = PetscFree(adj); CHKERRQ(ierr); > } > > // cleanup > ierr = DMDestroy(&dm); CHKERRQ(ierr); > ierr = PetscFinalize(); CHKERRQ(ierr); > return ierr; > } > > I am reading the mesh from a gmsh file which I have attached. When I set > the "useClosure" to PETSC_FALSE, both the functions give the expected > result, i.e. the list of all the neighbors sharing a face with the given > cell. But, when I set it to PETSC_TRUE, the "DMPlexCreateNeighborCSR" still > gives the face neighbors > Yes, by definition this is using the dual mesh. 
We could make another function that used the adjacency information, but here I was following the input prescription for the mesh partitioners. > while "DMPlexGetAdjacency" gives the following error: > This does look like a bug in the preallocation for this case. I will check it out. THanks, Matt > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Invalid mesh exceeded adjacency allocation (10) > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.13.0, unknown > [0]PETSC ERROR: ./esue_test on a arch-linux2-c-debug named shashwat by > shashwat Mon May 18 12:04:27 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --with-debugging=1 --download-triangle > [0]PETSC ERROR: #1 DMPlexGetAdjacency_Transitive_Internal() line 173 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #2 DMPlexGetAdjacency_Internal() line 223 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #3 DMPlexGetAdjacency() line 300 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #4 main() line 583 in esue_test.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -mesh test.msh > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 583077) - process 0 > > Kindly look into it and let me know what mistake I might be making, and > also, if there is some other way in Petsc to get the Vertex Neighbors for a > given cell. > > Regards, > Shashwat > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon May 18 15:28:14 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 18 May 2020 16:28:14 -0400 Subject: [petsc-users] configure erro Message-ID: I just rebased with master and now get this hdf5 error. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 626412 bytes Desc: not available URL: From knepley at gmail.com Mon May 18 15:31:19 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 May 2020 16:31:19 -0400 Subject: [petsc-users] configure erro In-Reply-To: References: Message-ID: It almost looks like your HDF5 tarball is incomplete. Matt On Mon, May 18, 2020 at 4:29 PM Mark Adams wrote: > I just rebased with master and now get this hdf5 error. > Thanks, > Mark > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Mon May 18 16:38:04 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 18 May 2020 17:38:04 -0400 Subject: [petsc-users] configure erro In-Reply-To: References: Message-ID: Thanks, the file system on SUMMIT (home directories) has problems. I moved to the scratch (working) directories and it seems fine. On Mon, May 18, 2020 at 4:31 PM Matthew Knepley wrote: > It almost looks like your HDF5 tarball is incomplete. > > Matt > > On Mon, May 18, 2020 at 4:29 PM Mark Adams wrote: > >> I just rebased with master and now get this hdf5 error. >> Thanks, >> Mark >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Tue May 19 00:56:05 2020 From: zakaryah at gmail.com (zakaryah .) Date: Tue, 19 May 2020 01:56:05 -0400 Subject: [petsc-users] Jacobian test by finite difference Message-ID: Hi all, I'm debugging some convergence issues and I came across something strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 dimensional, with 3 dof, and a box stencil of width 1. There are some small errors in the hand-coded Jacobian, which I am trying to sort out, but at least the fill pattern of the matrix is correct. However, when I run with -snes_test_jacobian -snes_test_jacobian_display -snes_compare_explicit, I see something very strange. The finite difference Jacobian has large terms outside the stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) have large values, while columns 9 through 20 are calculated but equal to zero. The "correct" values, i.e., in the stencil, are calculated as well, and nearly agree with my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for any number of processors greater than 1. I have checked the indexing by hand, carefully, and the memory access with valgrind, and the results were clean. Does anyone have an idea why the finite difference calculation of the Jacobian would produce large values outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for your help. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 19 04:55:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 19 May 2020 05:55:51 -0400 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: On Tue, May 19, 2020 at 1:57 AM zakaryah . wrote: > Hi all, > > I'm debugging some convergence issues and I came across something strange. > I am using a nonlinear solver on a structured grid. The DMDA is 3 > dimensional, with 3 dof, and a box stencil of width 1. There are some small > errors in the hand-coded Jacobian, which I am trying to sort out, but at > least the fill pattern of the matrix is correct. > > However, when I run with -snes_test_jacobian -snes_test_jacobian_display > -snes_compare_explicit, I see something very strange. The finite difference > Jacobian has large terms outside the stencil. For example, for x,y,z,c = > 0,0,0,0 (row 0), the columns 6, 7, 8, 12, 13, and 14 (column 6 => > x=2,y=0,z=0,c=0, etc.) have large values, while columns 9 through 20 are > calculated but equal to zero. 
The "correct" values, i.e., in the stencil, > are calculated as well, and nearly agree with my hand-coded Jacobian. This > issue does NOT occur in serial, but occurs for any number of processors > greater than 1. > > I have checked the indexing by hand, carefully, and the memory access with > valgrind, and the results were clean. Does anyone have an idea why the > finite difference calculation of the Jacobian would produce large values > outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for > your help. > Since it only appears in parallel, I am guessing that your calculation of global ordering does not take into account that we locally reorder, rather than using lexicographic ordering, and you might have periodic boundary conditions. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 19 08:12:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 19 May 2020 09:12:45 -0400 Subject: [petsc-users] Cell-Cell adjacency in DMPlex In-Reply-To: References: Message-ID: On Mon, May 18, 2020 at 2:49 AM Shashwat Tiwari wrote: > Hi, > I am writing a second order finite volume scheme on 2D unstructured grids > using DMPlex. For this, I need a vertex neighbor stencil for the cells > (i.e. stencil of the cells which share atleast one vertex with the given > cell). I am trying to achieve this using two of the Petsc functions, > "DMPlexCreateNeighborCSR" and "DMPlexGetAdjacency" and using > "DMSetBasicAdjacency" function to set whether to use closure or not in > order to get face neighbors and vertex neighbors. Here is a simple code I > have written to test these functions out: > > int main(int argc, char **argv) > { > PetscErrorCode ierr; > DM dm, dmDist = NULL; > PetscBool interpolate = PETSC_TRUE, status; > PetscBool useCone = PETSC_TRUE, useClosure = PETSC_TRUE; > PetscInt overlap = 1; > PetscMPIInt rank, size; > char filename[PETSC_MAX_PATH_LEN]; > > ierr = PetscInitialize(&argc, &argv, (char*)0, help); CHKERRQ(ierr); > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > MPI_Comm_size(PETSC_COMM_WORLD, &size); > > // get mesh filename > ierr = PetscOptionsGetString(NULL, NULL, "-mesh", filename, > PETSC_MAX_PATH_LEN, &status); > if(status) // gmsh file provided by user > { > char file[PETSC_MAX_PATH_LEN]; > ierr = PetscStrcpy(file, filename); CHKERRQ(ierr); > ierr = PetscSNPrintf(filename, sizeof filename,"./%s", file); > CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "Reading gmsh %s ... 
", file); > CHKERRQ(ierr); > ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, filename, > interpolate, &dm); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "Done\n"); CHKERRQ(ierr); > } > > // distribute mesh over processes; > ierr = DMSetBasicAdjacency(dm, useCone, useClosure); CHKERRQ(ierr); > ierr = DMPlexDistribute(dm, overlap, NULL, &dmDist); CHKERRQ(ierr); > if(dmDist) > { > ierr = DMDestroy(&dm); CHKERRQ(ierr); > dm = dmDist; > } > // print mesh information > ierr = PetscPrintf(PETSC_COMM_WORLD, "overlap: %d, " > "distributed among %d > processors\n", > overlap, size); CHKERRQ(ierr); > > // construct ghost cells > PetscInt nGhost; // number of ghost cells > DM dmG; // DM with ghost cells > ierr = DMPlexConstructGhostCells(dm, NULL, &nGhost, &dmG); > CHKERRQ(ierr); > if(dmG) > { > ierr = DMDestroy(&dm); CHKERRQ(ierr); > dm = dmG; > } > > ierr = DMSetUp(dm); CHKERRQ(ierr); > ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD); > > // testing cell-cell adjacency > ierr = DMGetBasicAdjacency(dm, &useCone, &useClosure); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "useCone = %d, useClosure = %d\n", > useCone, useClosure); CHKERRQ(ierr); > > // using DMPlexCreateNeighborCSR > PetscInt numVertices, *offsets, *adjacency; > ierr = DMPlexCreateNeighborCSR(dm, 0, &numVertices, &offsets, > &adjacency); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "using > \"DMPlexCreateNeighborCSR\"\n"); CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "numVertices = %d\n", > numVertices); CHKERRQ(ierr); > PetscInt i, j, nCells = 32; > for(i=0; i { > PetscPrintf(PETSC_COMM_WORLD, "cell number = %d\n", i); > for(j=offsets[i]; j { > PetscPrintf(PETSC_COMM_WORLD, "nbr cell = %d\n", adjacency[j]); > } > } > > // using DMPlexGetAdjacency > PetscInt adjSize = 10, *adj=NULL; > ierr = PetscPrintf(PETSC_COMM_WORLD, "using \"DMPlexGetAdjacency\"\n"); > CHKERRQ(ierr); > for(i=0; i { > ierr = DMPlexGetAdjacency(dm, i, &adjSize, &adj); CHKERRQ(ierr); > ierr = PetscIntView(adjSize, adj, PETSC_VIEWER_STDOUT_WORLD); > CHKERRQ(ierr); > ierr = PetscFree(adj); CHKERRQ(ierr); > } > > // cleanup > ierr = DMDestroy(&dm); CHKERRQ(ierr); > ierr = PetscFinalize(); CHKERRQ(ierr); > return ierr; > } > > I am reading the mesh from a gmsh file which I have attached. When I set > the "useClosure" to PETSC_FALSE, both the functions give the expected > result, i.e. the list of all the neighbors sharing a face with the given > cell. But, when I set it to PETSC_TRUE, the "DMPlexCreateNeighborCSR" still > gives the face neighbors while "DMPlexGetAdjacency" gives the following > error: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Invalid mesh exceeded adjacency allocation (10) > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.13.0, unknown > [0]PETSC ERROR: ./esue_test on a arch-linux2-c-debug named shashwat by > shashwat Mon May 18 12:04:27 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --with-debugging=1 --download-triangle > [0]PETSC ERROR: #1 DMPlexGetAdjacency_Transitive_Internal() line 173 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #2 DMPlexGetAdjacency_Internal() line 223 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #3 DMPlexGetAdjacency() line 300 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexdistribute.c > [0]PETSC ERROR: #4 main() line 583 in esue_test.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -mesh test.msh > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 583077) - process 0 > > Kindly look into it and let me know what mistake I might be making, and > also, if there is some other way in Petsc to get the Vertex Neighbors for a > given cell. > You are specifying the size of the array, and its too small. I am attaching a working version. Thanks, Matt > Regards, > Shashwat > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tester.c Type: application/octet-stream Size: 3470 bytes Desc: not available URL: From zakaryah at gmail.com Tue May 19 11:08:07 2020 From: zakaryah at gmail.com (zakaryah .) Date: Tue, 19 May 2020 12:08:07 -0400 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: > > Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. > I am not sure what you mean about the ordering. I understand that the > actual indexing, internal to PETSc, does not match the natural ordering, as > explained in the manual. But, doesn't MatView always use the natural > ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, > and the corresponding output from snes_test_jacobian_display exactly > matches what I expect to see - correct layout of nonzero terms according to > natural ordering. It is only the finite difference Jacobian that contains > the unexpected, off-stencil terms. > This is weird. I run the Jacobian test manually, by adding a small perturbation to the configuration vector at each index and calculating the function for the perturbed configuration, then the result looks like it should. This makes me think there is not a problem with my routine for calculating the function, but I can't explain why the Jacobian test by finite difference that PETSc calculates would be different, and have non-zero values outside the stencil. On Tue, May 19, 2020 at 10:13 AM zakaryah . wrote: > Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. > I am not sure what you mean about the ordering. I understand that the > actual indexing, internal to PETSc, does not match the natural ordering, as > explained in the manual. But, doesn't MatView always use the natural > ordering? 
Also, my hand-coded Jacobian routine uses MatSetValuesStencil, > and the corresponding output from snes_test_jacobian_display exactly > matches what I expect to see - correct layout of nonzero terms according to > natural ordering. It is only the finite difference Jacobian that contains > the unexpected, off-stencil terms. > > On Tue, May 19, 2020, 5:56 AM Matthew Knepley wrote: > >> On Tue, May 19, 2020 at 1:57 AM zakaryah . wrote: >> >>> Hi all, >>> >>> I'm debugging some convergence issues and I came across something >>> strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 >>> dimensional, with 3 dof, and a box stencil of width 1. There are some small >>> errors in the hand-coded Jacobian, which I am trying to sort out, but at >>> least the fill pattern of the matrix is correct. >>> >>> However, when I run with -snes_test_jacobian -snes_test_jacobian_display >>> -snes_compare_explicit, I see something very strange. The finite difference >>> Jacobian has large terms outside the stencil. For example, for x,y,z,c = >>> 0,0,0,0 (row 0), the columns 6, 7, 8, 12, 13, and 14 (column 6 => >>> x=2,y=0,z=0,c=0, etc.) have large values, while columns 9 through 20 are >>> calculated but equal to zero. The "correct" values, i.e., in the stencil, >>> are calculated as well, and nearly agree with my hand-coded Jacobian. This >>> issue does NOT occur in serial, but occurs for any number of processors >>> greater than 1. >>> >>> I have checked the indexing by hand, carefully, and the memory access >>> with valgrind, and the results were clean. Does anyone have an idea why the >>> finite difference calculation of the Jacobian would produce large values >>> outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for >>> your help. >>> >> >> Since it only appears in parallel, I am guessing that your calculation of >> global ordering does not take into account that we locally >> reorder, rather than using lexicographic ordering, and you might have >> periodic boundary conditions. >> >> Thanks, >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 19 11:27:21 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 19 May 2020 12:27:21 -0400 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: On Tue, May 19, 2020 at 12:09 PM zakaryah . wrote: > Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. >> I am not sure what you mean about the ordering. I understand that the >> actual indexing, internal to PETSc, does not match the natural ordering, as >> explained in the manual. But, doesn't MatView always use the natural >> ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, >> and the corresponding output from snes_test_jacobian_display exactly >> matches what I expect to see - correct layout of nonzero terms according to >> natural ordering. It is only the finite difference Jacobian that contains >> the unexpected, off-stencil terms. >> > > This is weird. I run the Jacobian test manually, by adding a small > perturbation to the configuration vector at each index and calculating the > function for the perturbed configuration, then the result looks like it > should. 
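A minimal C sketch of the kind of manual check described above (not code from this thread): perturb one global entry of the state, re-evaluate the residual, and difference to recover a single column of the finite-difference Jacobian. It assumes a SNES whose residual routine is already set; the helper name FDJacobianColumn is purely illustrative, col is a vector the caller duplicated from x, and i is a global index in PETSc ordering, not natural ordering.

  #include <petscsnes.h>

  PetscErrorCode FDJacobianColumn(SNES snes, Vec x, PetscInt i, PetscReal h, Vec col)
  {
    Vec            xp, f0;
    PetscInt       lo, hi;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecDuplicate(x, &xp);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &f0);CHKERRQ(ierr);
    ierr = SNESComputeFunction(snes, x, f0);CHKERRQ(ierr);      /* F(x) */
    ierr = VecCopy(x, xp);CHKERRQ(ierr);
    ierr = VecGetOwnershipRange(xp, &lo, &hi);CHKERRQ(ierr);
    if (i >= lo && i < hi) {                                    /* add the perturbation exactly once */
      ierr = VecSetValue(xp, i, h, ADD_VALUES);CHKERRQ(ierr);   /* x + h*e_i */
    }
    ierr = VecAssemblyBegin(xp);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(xp);CHKERRQ(ierr);
    ierr = SNESComputeFunction(snes, xp, col);CHKERRQ(ierr);    /* F(x + h*e_i) */
    ierr = VecAXPY(col, -1.0, f0);CHKERRQ(ierr);
    ierr = VecScale(col, 1.0/h);CHKERRQ(ierr);                  /* column i of the FD Jacobian */
    ierr = VecDestroy(&xp);CHKERRQ(ierr);
    ierr = VecDestroy(&f0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Comparing such a column against the same column of the hand-coded Jacobian, while keeping track of which ordering each one is printed in, helps separate a residual bug from an ordering mix-up.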
This makes me think there is not a problem with my routine for > calculating the function, but I can't explain why the Jacobian test by > finite difference that PETSc calculates would be different, and have > non-zero values outside the stencil. > So it sounds like your residual function is the one generating the out of stencil changes. The FD just calls your residual with different inputs. Thanks, Matt > On Tue, May 19, 2020 at 10:13 AM zakaryah . wrote: > >> Thanks Matt. I should have said that the boundary is set to >> BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand >> that the actual indexing, internal to PETSc, does not match the natural >> ordering, as explained in the manual. But, doesn't MatView always use the >> natural ordering? Also, my hand-coded Jacobian routine uses >> MatSetValuesStencil, and the corresponding output from >> snes_test_jacobian_display exactly matches what I expect to see - correct >> layout of nonzero terms according to natural ordering. It is only the >> finite difference Jacobian that contains the unexpected, off-stencil terms. >> >> On Tue, May 19, 2020, 5:56 AM Matthew Knepley wrote: >> >>> On Tue, May 19, 2020 at 1:57 AM zakaryah . wrote: >>> >>>> Hi all, >>>> >>>> I'm debugging some convergence issues and I came across something >>>> strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 >>>> dimensional, with 3 dof, and a box stencil of width 1. There are some small >>>> errors in the hand-coded Jacobian, which I am trying to sort out, but at >>>> least the fill pattern of the matrix is correct. >>>> >>>> However, when I run with -snes_test_jacobian >>>> -snes_test_jacobian_display -snes_compare_explicit, I see something very >>>> strange. The finite difference Jacobian has large terms outside the >>>> stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, >>>> 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) have large values, while >>>> columns 9 through 20 are calculated but equal to zero. The "correct" >>>> values, i.e., in the stencil, are calculated as well, and nearly agree with >>>> my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for >>>> any number of processors greater than 1. >>>> >>>> I have checked the indexing by hand, carefully, and the memory access >>>> with valgrind, and the results were clean. Does anyone have an idea why the >>>> finite difference calculation of the Jacobian would produce large values >>>> outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for >>>> your help. >>>> >>> >>> Since it only appears in parallel, I am guessing that your calculation >>> of global ordering does not take into account that we locally >>> reorder, rather than using lexicographic ordering, and you might have >>> periodic boundary conditions. >>> >>> Thanks, >>> >>> Matt >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hongzhang at anl.gov Tue May 19 12:17:15 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 19 May 2020 17:17:15 +0000 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: Can you post your code (at least some essential snippets) for the residual evaluation and the Jacobian evaluation? Hong (Mr.) On May 19, 2020, at 11:08 AM, zakaryah . > wrote: Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand that the actual indexing, internal to PETSc, does not match the natural ordering, as explained in the manual. But, doesn't MatView always use the natural ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, and the corresponding output from snes_test_jacobian_display exactly matches what I expect to see - correct layout of nonzero terms according to natural ordering. It is only the finite difference Jacobian that contains the unexpected, off-stencil terms. This is weird. I run the Jacobian test manually, by adding a small perturbation to the configuration vector at each index and calculating the function for the perturbed configuration, then the result looks like it should. This makes me think there is not a problem with my routine for calculating the function, but I can't explain why the Jacobian test by finite difference that PETSc calculates would be different, and have non-zero values outside the stencil. On Tue, May 19, 2020 at 10:13 AM zakaryah . > wrote: Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand that the actual indexing, internal to PETSc, does not match the natural ordering, as explained in the manual. But, doesn't MatView always use the natural ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, and the corresponding output from snes_test_jacobian_display exactly matches what I expect to see - correct layout of nonzero terms according to natural ordering. It is only the finite difference Jacobian that contains the unexpected, off-stencil terms. On Tue, May 19, 2020, 5:56 AM Matthew Knepley > wrote: On Tue, May 19, 2020 at 1:57 AM zakaryah . > wrote: Hi all, I'm debugging some convergence issues and I came across something strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 dimensional, with 3 dof, and a box stencil of width 1. There are some small errors in the hand-coded Jacobian, which I am trying to sort out, but at least the fill pattern of the matrix is correct. However, when I run with -snes_test_jacobian -snes_test_jacobian_display -snes_compare_explicit, I see something very strange. The finite difference Jacobian has large terms outside the stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) have large values, while columns 9 through 20 are calculated but equal to zero. The "correct" values, i.e., in the stencil, are calculated as well, and nearly agree with my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for any number of processors greater than 1. I have checked the indexing by hand, carefully, and the memory access with valgrind, and the results were clean. Does anyone have an idea why the finite difference calculation of the Jacobian would produce large values outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for your help. 
Since it only appears in parallel, I am guessing that your calculation of global ordering does not take into account that we locally reorder, rather than using lexicographic ordering, and you might have periodic boundary conditions. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Tue May 19 14:51:53 2020 From: zakaryah at gmail.com (zakaryah .) Date: Tue, 19 May 2020 15:51:53 -0400 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: I'd like to have one more pass at debugging it myself first. I think Matt is probably right about the ordering. I need to make sure I understand the ordering used in VecView. If I use DMCreateGlobalVector to create the vector for the residual, calculate the residual, then call VecView on the residual vector, how do I ensure that the output is in natural ordering? On Tue, May 19, 2020 at 1:17 PM Zhang, Hong wrote: > Can you post your code (at least some essential snippets) for the residual > evaluation and the Jacobian evaluation? > > Hong (Mr.) > > On May 19, 2020, at 11:08 AM, zakaryah . wrote: > > Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. >> I am not sure what you mean about the ordering. I understand that the >> actual indexing, internal to PETSc, does not match the natural ordering, as >> explained in the manual. But, doesn't MatView always use the natural >> ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, >> and the corresponding output from snes_test_jacobian_display exactly >> matches what I expect to see - correct layout of nonzero terms according to >> natural ordering. It is only the finite difference Jacobian that contains >> the unexpected, off-stencil terms. >> > > This is weird. I run the Jacobian test manually, by adding a small > perturbation to the configuration vector at each index and calculating the > function for the perturbed configuration, then the result looks like it > should. This makes me think there is not a problem with my routine for > calculating the function, but I can't explain why the Jacobian test by > finite difference that PETSc calculates would be different, and have > non-zero values outside the stencil. > > On Tue, May 19, 2020 at 10:13 AM zakaryah . wrote: > >> Thanks Matt. I should have said that the boundary is set to >> BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand >> that the actual indexing, internal to PETSc, does not match the natural >> ordering, as explained in the manual. But, doesn't MatView always use the >> natural ordering? Also, my hand-coded Jacobian routine uses >> MatSetValuesStencil, and the corresponding output from >> snes_test_jacobian_display exactly matches what I expect to see - correct >> layout of nonzero terms according to natural ordering. It is only the >> finite difference Jacobian that contains the unexpected, off-stencil terms. >> >> On Tue, May 19, 2020, 5:56 AM Matthew Knepley wrote: >> >>> On Tue, May 19, 2020 at 1:57 AM zakaryah . wrote: >>> >>>> Hi all, >>>> >>>> I'm debugging some convergence issues and I came across something >>>> strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 >>>> dimensional, with 3 dof, and a box stencil of width 1. 
There are some small >>>> errors in the hand-coded Jacobian, which I am trying to sort out, but at >>>> least the fill pattern of the matrix is correct. >>>> >>>> However, when I run with -snes_test_jacobian >>>> -snes_test_jacobian_display -snes_compare_explicit, I see something very >>>> strange. The finite difference Jacobian has large terms outside the >>>> stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, >>>> 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) have large values, while >>>> columns 9 through 20 are calculated but equal to zero. The "correct" >>>> values, i.e., in the stencil, are calculated as well, and nearly agree with >>>> my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for >>>> any number of processors greater than 1. >>>> >>>> I have checked the indexing by hand, carefully, and the memory access >>>> with valgrind, and the results were clean. Does anyone have an idea why the >>>> finite difference calculation of the Jacobian would produce large values >>>> outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for >>>> your help. >>>> >>> >>> Since it only appears in parallel, I am guessing that your calculation >>> of global ordering does not take into account that we locally >>> reorder, rather than using lexicographic ordering, and you might have >>> periodic boundary conditions. >>> >>> Thanks, >>> >>> Matt >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue May 19 16:06:24 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 19 May 2020 21:06:24 +0000 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: References: Message-ID: <03F64DAB-43E4-411E-9973-81A993E53EE3@anl.gov> On May 19, 2020, at 2:51 PM, zakaryah . > wrote: I'd like to have one more pass at debugging it myself first. I think Matt is probably right about the ordering. I need to make sure I understand the ordering used in VecView. If I use DMCreateGlobalVector to create the vector for the residual, calculate the residual, then call VecView on the residual vector, how do I ensure that the output is in natural ordering? The output is in natural ordering by default. You can use PETSC_VIEWER_NATIVE to make output printed in PETSc ordering and have a comparison. PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_NATIVE); VecView(your_global_vector,PETSC_VIEWER_STDOUT_WORLD); PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD); Hong (Mr.) On Tue, May 19, 2020 at 1:17 PM Zhang, Hong > wrote: Can you post your code (at least some essential snippets) for the residual evaluation and the Jacobian evaluation? Hong (Mr.) On May 19, 2020, at 11:08 AM, zakaryah . > wrote: Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand that the actual indexing, internal to PETSc, does not match the natural ordering, as explained in the manual. But, doesn't MatView always use the natural ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, and the corresponding output from snes_test_jacobian_display exactly matches what I expect to see - correct layout of nonzero terms according to natural ordering. 
It is only the finite difference Jacobian that contains the unexpected, off-stencil terms. This is weird. I run the Jacobian test manually, by adding a small perturbation to the configuration vector at each index and calculating the function for the perturbed configuration, then the result looks like it should. This makes me think there is not a problem with my routine for calculating the function, but I can't explain why the Jacobian test by finite difference that PETSc calculates would be different, and have non-zero values outside the stencil. On Tue, May 19, 2020 at 10:13 AM zakaryah . > wrote: Thanks Matt. I should have said that the boundary is set to BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand that the actual indexing, internal to PETSc, does not match the natural ordering, as explained in the manual. But, doesn't MatView always use the natural ordering? Also, my hand-coded Jacobian routine uses MatSetValuesStencil, and the corresponding output from snes_test_jacobian_display exactly matches what I expect to see - correct layout of nonzero terms according to natural ordering. It is only the finite difference Jacobian that contains the unexpected, off-stencil terms. On Tue, May 19, 2020, 5:56 AM Matthew Knepley > wrote: On Tue, May 19, 2020 at 1:57 AM zakaryah . > wrote: Hi all, I'm debugging some convergence issues and I came across something strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 dimensional, with 3 dof, and a box stencil of width 1. There are some small errors in the hand-coded Jacobian, which I am trying to sort out, but at least the fill pattern of the matrix is correct. However, when I run with -snes_test_jacobian -snes_test_jacobian_display -snes_compare_explicit, I see something very strange. The finite difference Jacobian has large terms outside the stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) have large values, while columns 9 through 20 are calculated but equal to zero. The "correct" values, i.e., in the stencil, are calculated as well, and nearly agree with my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for any number of processors greater than 1. I have checked the indexing by hand, carefully, and the memory access with valgrind, and the results were clean. Does anyone have an idea why the finite difference calculation of the Jacobian would produce large values outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for your help. Since it only appears in parallel, I am guessing that your calculation of global ordering does not take into account that we locally reorder, rather than using lexicographic ordering, and you might have periodic boundary conditions. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
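Another way to take ordering out of the picture is to scatter the global vector into an explicitly natural-ordered vector before viewing it. A short sketch, assuming da is the DMDA and r is a global vector obtained from DMCreateGlobalVector (the variable names are illustrative only):

  Vec rnat;
  ierr = DMDACreateNaturalVector(da, &rnat);CHKERRQ(ierr);
  ierr = DMDAGlobalToNaturalBegin(da, r, INSERT_VALUES, rnat);CHKERRQ(ierr);
  ierr = DMDAGlobalToNaturalEnd(da, r, INSERT_VALUES, rnat);CHKERRQ(ierr);
  ierr = VecView(rnat, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* printed in natural (lexicographic) ordering */
  ierr = VecDestroy(&rnat);CHKERRQ(ierr);

Since rnat is explicitly in natural ordering, its printed layout no longer depends on any viewer format defaults.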
URL: From jacob.fai at gmail.com Tue May 19 16:53:11 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 19 May 2020 16:53:11 -0500 Subject: [petsc-users] Sanitize Options Passed "Through" Petsc Message-ID: <12AFB1C4-D29A-47FC-B103-FD4D7A9844EB@gmail.com> Hello all, I use petsc/slepc as the driver for running another library code and would like to pass command line options through but this library takes the approach of erroring out when it encounters an unknown option (i.e. some option that petsc would use but it would not). My current way of dealing with the problem is to run my script with ?-help? option to save the output in a separate ascii file, then next time the program is run remove any options found in the file from the options passed through. Clearly this isn?t exactly a bulletproof scheme, so I?m wondering if there is a smarter way to do it. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue May 19 16:57:19 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 19 May 2020 15:57:19 -0600 Subject: [petsc-users] Sanitize Options Passed "Through" Petsc In-Reply-To: <12AFB1C4-D29A-47FC-B103-FD4D7A9844EB@gmail.com> References: <12AFB1C4-D29A-47FC-B103-FD4D7A9844EB@gmail.com> Message-ID: <87h7wbmvuo.fsf@jedbrown.org> Namespace the PETSc options or use something other than the command line (e.g., PETSC_OPTIONS env variable or a file). There is no way to know in advance all the options that could be requested through the PETSc options system. If the other app has a well-defined list of allowed options, you could whitelist those at program startup and leave everything else for PETSc. Jacob Faibussowitsch writes: > Hello all, > > I use petsc/slepc as the driver for running another library code and would like to pass command line options through but this library takes the approach of erroring out when it encounters an unknown option (i.e. some option that petsc would use but it would not). My current way of dealing with the problem is to run my script with ?-help? option to save the output in a separate ascii file, then next time the program is run remove any options found in the file from the options passed through. > > Clearly this isn?t exactly a bulletproof scheme, so I?m wondering if there is a smarter way to do it. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 From zakaryah at gmail.com Wed May 20 13:48:42 2020 From: zakaryah at gmail.com (zakaryah .) Date: Wed, 20 May 2020 14:48:42 -0400 Subject: [petsc-users] Jacobian test by finite difference In-Reply-To: <03F64DAB-43E4-411E-9973-81A993E53EE3@anl.gov> References: <03F64DAB-43E4-411E-9973-81A993E53EE3@anl.gov> Message-ID: I think I figured it out. It seems that only the hand-coded Jacobian is displayed in natural ordering, while the finite difference (and hand-coded minus finite difference) are displayed in global ordering. I guess the finite difference Jacobian doesn't inherit the natural ordering from the DM. Thanks for your help! On Tue, May 19, 2020 at 5:06 PM Zhang, Hong wrote: > > > On May 19, 2020, at 2:51 PM, zakaryah . wrote: > > I'd like to have one more pass at debugging it myself first. I think Matt > is probably right about the ordering. I need to make sure I understand the > ordering used in VecView. 
If I use DMCreateGlobalVector to create the > vector for the residual, calculate the residual, then call VecView on the > residual vector, how do I ensure that the output is in natural ordering? > > > The output is in natural ordering by default. You can use > PETSC_VIEWER_NATIVE to make output printed in PETSc ordering and have a > comparison. > > PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_NATIVE); > VecView(your_global_vector,PETSC_VIEWER_STDOUT_WORLD); > PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD); > > Hong (Mr.) > > > On Tue, May 19, 2020 at 1:17 PM Zhang, Hong wrote: > >> Can you post your code (at least some essential snippets) for the >> residual evaluation and the Jacobian evaluation? >> >> Hong (Mr.) >> >> On May 19, 2020, at 11:08 AM, zakaryah . wrote: >> >> Thanks Matt. I should have said that the boundary is set to >>> BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand >>> that the actual indexing, internal to PETSc, does not match the natural >>> ordering, as explained in the manual. But, doesn't MatView always use the >>> natural ordering? Also, my hand-coded Jacobian routine uses >>> MatSetValuesStencil, and the corresponding output from >>> snes_test_jacobian_display exactly matches what I expect to see - correct >>> layout of nonzero terms according to natural ordering. It is only the >>> finite difference Jacobian that contains the unexpected, off-stencil terms. >>> >> >> This is weird. I run the Jacobian test manually, by adding a small >> perturbation to the configuration vector at each index and calculating the >> function for the perturbed configuration, then the result looks like it >> should. This makes me think there is not a problem with my routine for >> calculating the function, but I can't explain why the Jacobian test by >> finite difference that PETSc calculates would be different, and have >> non-zero values outside the stencil. >> >> On Tue, May 19, 2020 at 10:13 AM zakaryah . wrote: >> >>> Thanks Matt. I should have said that the boundary is set to >>> BOUNDARY_NONE. I am not sure what you mean about the ordering. I understand >>> that the actual indexing, internal to PETSc, does not match the natural >>> ordering, as explained in the manual. But, doesn't MatView always use the >>> natural ordering? Also, my hand-coded Jacobian routine uses >>> MatSetValuesStencil, and the corresponding output from >>> snes_test_jacobian_display exactly matches what I expect to see - correct >>> layout of nonzero terms according to natural ordering. It is only the >>> finite difference Jacobian that contains the unexpected, off-stencil terms. >>> >>> On Tue, May 19, 2020, 5:56 AM Matthew Knepley wrote: >>> >>>> On Tue, May 19, 2020 at 1:57 AM zakaryah . wrote: >>>> >>>>> Hi all, >>>>> >>>>> I'm debugging some convergence issues and I came across something >>>>> strange. I am using a nonlinear solver on a structured grid. The DMDA is 3 >>>>> dimensional, with 3 dof, and a box stencil of width 1. There are some small >>>>> errors in the hand-coded Jacobian, which I am trying to sort out, but at >>>>> least the fill pattern of the matrix is correct. >>>>> >>>>> However, when I run with -snes_test_jacobian >>>>> -snes_test_jacobian_display -snes_compare_explicit, I see something very >>>>> strange. The finite difference Jacobian has large terms outside the >>>>> stencil. For example, for x,y,z,c = 0,0,0,0 (row 0), the columns 6, 7, 8, >>>>> 12, 13, and 14 (column 6 => x=2,y=0,z=0,c=0, etc.) 
have large values, while >>>>> columns 9 through 20 are calculated but equal to zero. The "correct" >>>>> values, i.e., in the stencil, are calculated as well, and nearly agree with >>>>> my hand-coded Jacobian. This issue does NOT occur in serial, but occurs for >>>>> any number of processors greater than 1. >>>>> >>>>> I have checked the indexing by hand, carefully, and the memory access >>>>> with valgrind, and the results were clean. Does anyone have an idea why the >>>>> finite difference calculation of the Jacobian would produce large values >>>>> outside the stencil? I am using PETSc 3.12.2 and openMPI 3.1.0. Thanks for >>>>> your help. >>>>> >>>> >>>> Since it only appears in parallel, I am guessing that your calculation >>>> of global ordering does not take into account that we locally >>>> reorder, rather than using lexicographic ordering, and you might have >>>> periodic boundary conditions. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Wed May 20 22:59:49 2020 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 21 May 2020 15:59:49 +1200 Subject: [petsc-users] make check vs make test Message-ID: hi In the latest PETSc version (3.13), 'make test' seems to be doing a much more comprehensive set of tests than it used to. That is ok except that I have a CI pipeline which builds PETSc and my code, and then runs my tests. It also does a 'make test' after building PETSc and that is now making the CI time out. At the end of the PETSc build process it says I should run 'make check' to "check if the libraries are working". I can't seem to find any documentation on what 'make check' actually does, but it looks like it runs a much reduced set of tests. Currently I am running both 'make check' and 'make test'. For this use case should it be sufficient to run 'make check' and omit 'make test'? - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From balay at mcs.anl.gov Wed May 20 23:05:36 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 20 May 2020 23:05:36 -0500 (CDT) Subject: [petsc-users] make check vs make test In-Reply-To: References: Message-ID: In prior releases - 'make test' was same as 'make check' With petsc-3.13 - 'make test' is now the full test suite. If you were using 'make test' previously - the appropriate thing now is 'make check' Satish On Thu, 21 May 2020, Adrian Croucher wrote: > hi > > In the latest PETSc version (3.13), 'make test' seems to be doing a much more > comprehensive set of tests than it used to. > > That is ok except that I have a CI pipeline which builds PETSc and my code, > and then runs my tests. It also does a 'make test' after building PETSc and > that is now making the CI time out. > > At the end of the PETSc build process it says I should run 'make check' to > "check if the libraries are working". I can't seem to find any documentation > on what 'make check' actually does, but it looks like it runs a much reduced > set of tests. Currently I am running both 'make check' and 'make test'. 
> > For this use case should it be sufficient to run 'make check' and omit 'make > test'? > > - Adrian > > From jed at jedbrown.org Wed May 20 23:08:59 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 20 May 2020 22:08:59 -0600 Subject: [petsc-users] make check vs make test In-Reply-To: References: Message-ID: <87a722kjz8.fsf@jedbrown.org> `make check` has been a synonym for `make test` for several releases, and we used `make -f gmakefile test` for the comprehensive suite. We recently made it so that if you have a sufficiently recent GNU Make, you can `make test` without the `-f gmakefile` verbosity. If you want the simple test, either continue using `make check` or select a small number of tests using `make test search=snes_tutorials-ex5_1` or whatever. Adrian Croucher writes: > hi > > In the latest PETSc version (3.13), 'make test' seems to be doing a much > more comprehensive set of tests than it used to. > > That is ok except that I have a CI pipeline which builds PETSc and my > code, and then runs my tests. It also does a 'make test' after building > PETSc and that is now making the CI time out. > > At the end of the PETSc build process it says I should run 'make check' > to "check if the libraries are working". I can't seem to find any > documentation on what 'make check' actually does, but it looks like it > runs a much reduced set of tests. Currently I am running both 'make > check' and 'make test'. > > For this use case should it be sufficient to run 'make check' and omit > 'make test'? > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 From yang.bo at ntu.edu.sg Thu May 21 02:54:51 2020 From: yang.bo at ntu.edu.sg (Yang Bo (Asst Prof)) Date: Thu, 21 May 2020 07:54:51 +0000 Subject: [petsc-users] a question about MatSetValue Message-ID: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> Hi Everyone, I have a question about adding values to the matrix. The code I have is for (int i=0;i References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> Message-ID: On Thu, 21 May 2020 at 08:55, Yang Bo (Asst Prof) wrote: > Hi Everyone, > > I have a question about adding values to the matrix. The code I have is > > > for (int i=0;i MatSetValue(A,row[i],column[i],h[i],INSERT_VALUES); > } > > where row.size() is a large number. It seems the running time of this > procedure does not scale linearly with row.size(). As row.size() gets > bigger, the time it takes increases exponentially. It sounds like your matrix is not properly preallocated. Could this be the case? You can confirm / deny this by running with the command line options (shown in bold) ./your-exec * -info | grep malloc* If all is good you will see something like this [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 *total number of mallocs used during MatSetValues calls=0* If the reported number of mallocs in your code is not 0, please read these pages: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSeqAIJSetPreallocation.html https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html You may like to use the generic preallocator (depending on the type of you Mat). https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html Thanks Dave > Am I doing something wrong and can I do better than that? > > Thanks and stay healthy! 
> > Cheers, > > Yang Bo > ________________________________ > > CONFIDENTIALITY: This email is intended solely for the person(s) named and > may be confidential and/or privileged. If you are not the intended > recipient, please delete it, notify us and do not copy, use, or disclose > its contents. > Towards a sustainable earth: Print only when necessary. Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yang.bo at ntu.edu.sg Thu May 21 03:49:38 2020 From: yang.bo at ntu.edu.sg (Yang Bo (Asst Prof)) Date: Thu, 21 May 2020 08:49:38 +0000 Subject: [petsc-users] a question about MatSetValue In-Reply-To: References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> Message-ID: <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> Hi Dave, Thank you very much for your reply. That is indeed the problem. I have been working with matrices in Slepc but I don?t really understand it. I tried to preallocate but it still does not work. If you look at my code below: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,h_dim,h_dim); // h_dim is the dimension of the square matrix A ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatSetUp(A);CHKERRQ(ierr); ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr); MatSeqAIJSetPreallocation(A,0,nnz); // I try to preallocate here, where nnz is the array containing the number of non-zero entries each row for (int i=0;i> wrote: -info | grep malloc ________________________________ CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu May 21 04:42:29 2020 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 21 May 2020 11:42:29 +0200 Subject: [petsc-users] a question about MatSetValue In-Reply-To: <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> Message-ID: On Thu 21. May 2020 at 10:49, Yang Bo (Asst Prof) wrote: > Hi Dave, > > Thank you very much for your reply. That is indeed the problem. I have > been working with matrices in Slepc but I don?t really understand it. I > tried to preallocate but it still does not work. > Meaning the number of reported mallocs is still non-zero? Is the number reported with you preallocation calls lower than what you originally saw? If you look at my code below: > > ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); > ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,h_dim,h_dim); > // h_dim is the dimension of the square matrix A > ierr = MatSetFromOptions(A);CHKERRQ(ierr); > ierr = MatSetUp(A);CHKERRQ(ierr); > ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr); > > MatSeqAIJSetPreallocation(A,0,nnz); // > I try to preallocate here, where nnz is the array containing the number of > non-zero entries each row > > for (int i=0;i MatSetValue(A,row[i],column[i],h[i],INSERT_VALUES); > } > > I am not sure what other information I need to give for the pre-allocation? > This looks fine. However MatSeqAIJSetPreallocation() has no effect if the Mat type is not SEQAIJ. Are you running in parallel? 
If yes then the Mat type will be MATMPIAIJ and you either have to call the MPI specific preallocator or use the generic one I pointed you too. Thanks Dave > Cheers, > > Yang Bo > > > > On 21 May 2020, at 4:08 PM, Dave May wrote: > > *-info | grep malloc* > > > ------------------------------ > > CONFIDENTIALITY: This email is intended solely for the person(s) named and > may be confidential and/or privileged. If you are not the intended > recipient, please delete it, notify us and do not copy, use, or disclose > its contents. > Towards a sustainable earth: Print only when necessary. Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yang.bo at ntu.edu.sg Thu May 21 05:17:44 2020 From: yang.bo at ntu.edu.sg (Yang Bo (Asst Prof)) Date: Thu, 21 May 2020 10:17:44 +0000 Subject: [petsc-users] a question about MatSetValue In-Reply-To: References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> Message-ID: Hi Dave, Yes it is parallel so the preallocation calls are not lowered by the allocation. I am trying to use MatXAIJSetPreallocation, but not sure how, since the following link does not give an example: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html If I have the following matrix: 0 1 2 0 1 0 0 0 2 0 1 3 0 0 3 2 How should I put in the parameters of MatXAIJSetPreallocation? Thanks! Cheers, Yang Bo On 21 May 2020, at 5:42 PM, Dave May > wrote: On Thu 21. May 2020 at 10:49, Yang Bo (Asst Prof) > wrote: Hi Dave, Thank you very much for your reply. That is indeed the problem. I have been working with matrices in Slepc but I don?t really understand it. I tried to preallocate but it still does not work. Meaning the number of reported mallocs is still non-zero? Is the number reported with you preallocation calls lower than what you originally saw? If you look at my code below: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,h_dim,h_dim); // h_dim is the dimension of the square matrix A ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatSetUp(A);CHKERRQ(ierr); ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr); MatSeqAIJSetPreallocation(A,0,nnz); // I try to preallocate here, where nnz is the array containing the number of non-zero entries each row for (int i=0;i> wrote: -info | grep malloc ________________________________ CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu May 21 05:23:07 2020 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 21 May 2020 12:23:07 +0200 Subject: [petsc-users] a question about MatSetValue In-Reply-To: References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> Message-ID: On Thu 21. May 2020 at 12:17, Yang Bo (Asst Prof) wrote: > Hi Dave, > > Yes it is parallel so the preallocation calls are not lowered by the > allocation. 
> > I am trying to use MatXAIJSetPreallocation, but not sure how, since the > following link does not give an example: > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html > > If I have the following matrix: > > 0 1 2 0 > 1 0 0 0 > 2 0 1 3 > 0 0 3 2 > > How should I put in the parameters of MatXAIJSetPreallocation? > Please read this page to understand the info required https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html Compute everything as described above and give the results to MatXAIJSetPreallocation(). MatXAIJSetPreallocation() is just a helper function to hide all the implementation specific setters. Thanks Dave > Thanks! > > Cheers, > > Yang Bo > > > On 21 May 2020, at 5:42 PM, Dave May wrote: > > > > On Thu 21. May 2020 at 10:49, Yang Bo (Asst Prof) > wrote: > >> Hi Dave, >> >> Thank you very much for your reply. That is indeed the problem. I have >> been working with matrices in Slepc but I don?t really understand it. I >> tried to preallocate but it still does not work. >> > > Meaning the number of reported mallocs is still non-zero? > Is the number reported with you preallocation calls lower than what you > originally saw? > > If you look at my code below: >> >> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >> ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,h_dim,h_dim); >> // h_dim is the dimension of the square matrix A >> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >> ierr = MatSetUp(A);CHKERRQ(ierr); >> ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr); >> >> MatSeqAIJSetPreallocation(A,0,nnz); >> // I try to preallocate here, where nnz is the array containing the >> number of non-zero entries each row >> >> for (int i=0;i> MatSetValue(A,row[i],column[i],h[i],INSERT_VALUES); >> } >> >> I am not sure what other information I need to give for the >> pre-allocation? >> > > This looks fine. However MatSeqAIJSetPreallocation() has no effect if the > Mat type is not SEQAIJ. > > Are you running in parallel? If yes then the Mat type will be MATMPIAIJ > and you either have to call the MPI specific preallocator or use the > generic one I pointed you too. > > Thanks > Dave > > > >> Cheers, >> >> Yang Bo >> >> >> >> On 21 May 2020, at 4:08 PM, Dave May wrote: >> >> *-info | grep malloc* >> >> >> ------------------------------ >> >> CONFIDENTIALITY: This email is intended solely for the person(s) named >> and may be confidential and/or privileged. If you are not the intended >> recipient, please delete it, notify us and do not copy, use, or disclose >> its contents. >> Towards a sustainable earth: Print only when necessary. Thank you. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yang.bo at ntu.edu.sg Thu May 21 07:37:01 2020 From: yang.bo at ntu.edu.sg (Yang Bo (Asst Prof)) Date: Thu, 21 May 2020 12:37:01 +0000 Subject: [petsc-users] a question about MatSetValue In-Reply-To: References: <9C987DDD-DE23-476B-9224-49069EC666C7@ntu.edu.sg> <8223114B-55C2-440A-89D7-D4483D3F95A0@ntu.edu.sg> Message-ID: <81BFD52E-2CDA-4EDF-8703-8EF3E606019E@ntu.edu.sg> I see, it is working now, thanks! On 21 May 2020, at 6:23 PM, Dave May > wrote: On Thu 21. May 2020 at 12:17, Yang Bo (Asst Prof) > wrote: Hi Dave, Yes it is parallel so the preallocation calls are not lowered by the allocation. 
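Concretely, for the 4x4 pattern quoted above on two ranks with two rows each, the counts described in that recipe work out as in the following sketch. This is an illustration, not code from the thread: it assumes the sizes were set with MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 4, 4) so each rank also owns two columns, and it reuses ierr from the earlier snippets. d_nnz[i] counts the nonzeros of locally owned row i whose columns this rank owns, o_nnz[i] the rest; the call goes after MatSetSizes()/MatSetFromOptions() and before the MatSetValue() loop.

  PetscMPIInt rank;
  PetscInt    d_nnz[2], o_nnz[2];
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  if (rank == 0) {                /* rows 0-1, owns columns 0-1 */
    d_nnz[0] = 1; o_nnz[0] = 1;   /* row 0: col 1 local, col 2 remote */
    d_nnz[1] = 1; o_nnz[1] = 0;   /* row 1: col 0 local */
  } else {                        /* rows 2-3, owns columns 2-3 */
    d_nnz[0] = 2; o_nnz[0] = 1;   /* row 2: cols 2,3 local, col 0 remote */
    d_nnz[1] = 2; o_nnz[1] = 0;   /* row 3: cols 2,3 local */
  }
  ierr = MatXAIJSetPreallocation(A, 1, d_nnz, o_nnz, NULL, NULL);CHKERRQ(ierr);

The same counts could be handed to MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz) directly; MatXAIJSetPreallocation() simply forwards them to whichever type-specific setter matches the matrix.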
I am trying to use MatXAIJSetPreallocation, but not sure how, since the following link does not give an example: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html If I have the following matrix: 0 1 2 0 1 0 0 0 2 0 1 3 0 0 3 2 How should I put in the parameters of MatXAIJSetPreallocation? Please read this page to understand the info required https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html Compute everything as described above and give the results to MatXAIJSetPreallocation(). MatXAIJSetPreallocation() is just a helper function to hide all the implementation specific setters. Thanks Dave Thanks! Cheers, Yang Bo On 21 May 2020, at 5:42 PM, Dave May > wrote: On Thu 21. May 2020 at 10:49, Yang Bo (Asst Prof) > wrote: Hi Dave, Thank you very much for your reply. That is indeed the problem. I have been working with matrices in Slepc but I don?t really understand it. I tried to preallocate but it still does not work. Meaning the number of reported mallocs is still non-zero? Is the number reported with you preallocation calls lower than what you originally saw? If you look at my code below: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,h_dim,h_dim); // h_dim is the dimension of the square matrix A ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatSetUp(A);CHKERRQ(ierr); ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr); MatSeqAIJSetPreallocation(A,0,nnz); // I try to preallocate here, where nnz is the array containing the number of non-zero entries each row for (int i=0;i> wrote: -info | grep malloc ________________________________ CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rui.silva at uam.es Thu May 21 09:53:17 2020 From: rui.silva at uam.es (Rui Silva) Date: Thu, 21 May 2020 16:53:17 +0200 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA Message-ID: Hello everyone, I am trying to run PETSc (with complex numbers in the GPU). When I call the VecWAXPY routine using the complex version of PETSc and mpicuda vectors, the program fails with a segmentation fault. This problem does not appear, if I run the complex version with mpi vectors or with the real version using mpicuda vectors. Is there any problem using CUDA+complex PETSc? Furthermore, I use the -log_view option to run the complex+gpu code, otherwise the program fails at the beggining. Best regards, Rui Silva -- Dr. Rui Emanuel Ferreira da Silva Departamento de F?sica Te?rica de la Materia Condensada Universidad Aut?noma de Madrid, Spain https://ruiefdasilva.wixsite.com/ruiefdasilva https://mmuscles.eu/ -------------- next part -------------- all : odesolve.exe clean :: rm -f *.o odesolve.exe include ${SLEPC_DIR}/lib/slepc/conf/slepc_common # FC_FLAGS := $(filter-out -Wall,${FC_FLAGS}) -ffree-line-length-none %.exe: %.o -${FLINKER} -o $@ $< ${SLEPC_LIB} -------------- next part -------------- A non-text attachment was scrubbed... 
Name: odesolve.F90 Type: text/x-fortran Size: 2086 bytes Desc: not available URL: From knepley at gmail.com Thu May 21 10:21:55 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 21 May 2020 11:21:55 -0400 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: On Thu, May 21, 2020 at 10:53 AM Rui Silva wrote: > Hello everyone, > > I am trying to run PETSc (with complex numbers in the GPU). When I call > the VecWAXPY routine using the complex version of PETSc and mpicuda > vectors, the program fails with a segmentation fault. This problem does > not appear, if I run the complex version with mpi vectors or with the > real version using mpicuda vectors. Is there any problem using > CUDA+complex PETSc? > > Furthermore, I use the -log_view option to run the complex+gpu code, > otherwise the program fails at the beggining. > What version of CUDA do you have? There are bugs in the versions before 10.2. Thanks, Matt > Best regards, > > Rui Silva > > -- > Dr. Rui Emanuel Ferreira da Silva > Departamento de F?sica Te?rica de la Materia Condensada > Universidad Aut?noma de Madrid, Spain > https://ruiefdasilva.wixsite.com/ruiefdasilva > https://mmuscles.eu/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu May 21 10:31:35 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 21 May 2020 18:31:35 +0300 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> Oh, there is also an issue I have recently noticed and did not have yet the time to fix it With complex numbers, we use the definitions for complexes from thrust and this does not seem to be always compatible to whatever the C compiler uses Matt, take a look at petscsytypes.h and you will see the issue https://gitlab.com/petsc/petsc/-/blob/master/include/petscsystypes.h#L208 For sure, you need to configure petsc with --with-clanguage=cxx, but even that does not to seem make it work on a CUDA box I have recently tried out (CUDA 10.1) I believe the issue arise even if you call VecSet(v,0) on a VECCUDA > On May 21, 2020, at 6:21 PM, Matthew Knepley wrote: > > On Thu, May 21, 2020 at 10:53 AM Rui Silva > wrote: > Hello everyone, > > I am trying to run PETSc (with complex numbers in the GPU). When I call > the VecWAXPY routine using the complex version of PETSc and mpicuda > vectors, the program fails with a segmentation fault. This problem does > not appear, if I run the complex version with mpi vectors or with the > real version using mpicuda vectors. Is there any problem using > CUDA+complex PETSc? > > Furthermore, I use the -log_view option to run the complex+gpu code, > otherwise the program fails at the beggining. > > What version of CUDA do you have? There are bugs in the versions before 10.2. > > Thanks, > > Matt > > Best regards, > > Rui Silva > > -- > Dr. 
Rui Emanuel Ferreira da Silva > Departamento de F?sica Te?rica de la Materia Condensada > Universidad Aut?noma de Madrid, Spain > https://ruiefdasilva.wixsite.com/ruiefdasilva > https://mmuscles.eu/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rui.silva at uam.es Thu May 21 10:37:42 2020 From: rui.silva at uam.es (Rui Silva) Date: Thu, 21 May 2020 17:37:42 +0200 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: <28e9011c-578e-4d74-602f-fdb9a19ddc11@uam.es> Hi Matt, I am using CUDA 9.2. I will try to compile with CUDA 10.2. Best regards, Rui ?s 17:21 de 21/05/2020, Matthew Knepley escreveu: > On Thu, May 21, 2020 at 10:53 AM Rui Silva > wrote: > > Hello everyone, > > I am trying to run PETSc (with complex numbers in the GPU). When I > call > the VecWAXPY routine using the complex version of PETSc and mpicuda > vectors, the program fails with a segmentation fault. This problem > does > not appear, if I run the complex version with mpi vectors or with the > real version using mpicuda vectors. Is there any problem using > CUDA+complex PETSc? > > Furthermore, I use the -log_view option to run the complex+gpu code, > otherwise the program fails at the beggining. > > > What version of CUDA do you have? There are bugs in the versions > before 10.2. > > ? Thanks, > > ? ? Matt > > Best regards, > > Rui Silva > > -- > ? Dr. Rui Emanuel Ferreira da Silva > ? Departamento de F?sica Te?rica de la Materia Condensada > ? Universidad Aut?noma de Madrid, Spain > https://ruiefdasilva.wixsite.com/ruiefdasilva > https://mmuscles.eu/ > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Dr. Rui Emanuel Ferreira da Silva Departamento de F?sica Te?rica de la Materia Condensada Universidad Aut?noma de Madrid, Spain https://ruiefdasilva.wixsite.com/ruiefdasilva https://mmuscles.eu/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 21 11:02:56 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 21 May 2020 12:02:56 -0400 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> References: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> Message-ID: On Thu, May 21, 2020 at 11:31 AM Stefano Zampini wrote: > Oh, there is also an issue I have recently noticed and did not have yet > the time to fix it > > With complex numbers, we use the definitions for complexes from thrust and > this does not seem to be always compatible to whatever the C compiler uses > Matt, take a look at petscsytypes.h and you will see the issue > https://gitlab.com/petsc/petsc/-/blob/master/include/petscsystypes.h#L208 > > For sure, you need to configure petsc with --with-clanguage=cxx, but even > that does not to seem make it work on a CUDA box I have recently tried out > (CUDA 10.1) > I believe the issue arise even if you call VecSet(v,0) on a VECCUDA > So Karl and Junchao say that with 10.2 it is working. Do you have access to 10.2? 
Thanks, Matt > On May 21, 2020, at 6:21 PM, Matthew Knepley wrote: > > On Thu, May 21, 2020 at 10:53 AM Rui Silva wrote: > >> Hello everyone, >> >> I am trying to run PETSc (with complex numbers in the GPU). When I call >> the VecWAXPY routine using the complex version of PETSc and mpicuda >> vectors, the program fails with a segmentation fault. This problem does >> not appear, if I run the complex version with mpi vectors or with the >> real version using mpicuda vectors. Is there any problem using >> CUDA+complex PETSc? >> >> Furthermore, I use the -log_view option to run the complex+gpu code, >> otherwise the program fails at the beggining. >> > > What version of CUDA do you have? There are bugs in the versions before > 10.2. > > Thanks, > > Matt > > >> Best regards, >> >> Rui Silva >> >> -- >> Dr. Rui Emanuel Ferreira da Silva >> Departamento de F?sica Te?rica de la Materia Condensada >> Universidad Aut?noma de Madrid, Spain >> https://ruiefdasilva.wixsite.com/ruiefdasilva >> https://mmuscles.eu/ >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu May 21 11:15:20 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 21 May 2020 11:15:20 -0500 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> Message-ID: I tested this example with cuda 10.2, it did segfault. I'm looking into it. --Junchao Zhang On Thu, May 21, 2020 at 11:04 AM Matthew Knepley wrote: > On Thu, May 21, 2020 at 11:31 AM Stefano Zampini < > stefano.zampini at gmail.com> wrote: > >> Oh, there is also an issue I have recently noticed and did not have yet >> the time to fix it >> >> With complex numbers, we use the definitions for complexes from thrust >> and this does not seem to be always compatible to whatever the C compiler >> uses >> Matt, take a look at petscsytypes.h and you will see the issue >> https://gitlab.com/petsc/petsc/-/blob/master/include/petscsystypes.h#L208 >> >> For sure, you need to configure petsc with --with-clanguage=cxx, but even >> that does not to seem make it work on a CUDA box I have recently tried out >> (CUDA 10.1) >> I believe the issue arise even if you call VecSet(v,0) on a VECCUDA >> > > So Karl and Junchao say that with 10.2 it is working. Do you have access > to 10.2? > > Thanks, > > Matt > > >> On May 21, 2020, at 6:21 PM, Matthew Knepley wrote: >> >> On Thu, May 21, 2020 at 10:53 AM Rui Silva wrote: >> >>> Hello everyone, >>> >>> I am trying to run PETSc (with complex numbers in the GPU). When I call >>> the VecWAXPY routine using the complex version of PETSc and mpicuda >>> vectors, the program fails with a segmentation fault. This problem does >>> not appear, if I run the complex version with mpi vectors or with the >>> real version using mpicuda vectors. Is there any problem using >>> CUDA+complex PETSc? >>> >>> Furthermore, I use the -log_view option to run the complex+gpu code, >>> otherwise the program fails at the beggining. >>> >> >> What version of CUDA do you have? 
There are bugs in the versions before >> 10.2. >> >> Thanks, >> >> Matt >> >> >>> Best regards, >>> >>> Rui Silva >>> >>> -- >>> Dr. Rui Emanuel Ferreira da Silva >>> Departamento de F?sica Te?rica de la Materia Condensada >>> Universidad Aut?noma de Madrid, Spain >>> https://ruiefdasilva.wixsite.com/ruiefdasilva >>> https://mmuscles.eu/ >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Thu May 21 14:24:33 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 21 May 2020 12:24:33 -0700 Subject: [petsc-users] error of compiling complex Message-ID: Dear PETSc dev team, I got following error of compiling complex: ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function ?PCTFS_rvec_max?: ../../../petsc/include/petscmath.h:532:30: error: invalid operands to binary < (have ?PetscScalar? {aka ?_Complex double?} and ?PetscScalar? {aka ?_Complex double?}) 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) Any idea what I did wrong? I am using version 3.11.3. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 14:30:06 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 14:30:06 -0500 (CDT) Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: Something wrong with your build. This source-file should not get compiled for complex. > ../../../petsc/src/ksp/pc/impls/tfs/ivec.c And this relative path looks wierd. How are you building PETSc? Might start with a fresh petsc install. [and look at updating to latest release - petsc-3.13] And send complete build logs - if you still have issues. Satish On Thu, 21 May 2020, Sam Guo wrote: > Dear PETSc dev team, > I got following error of compiling complex: > ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function ?PCTFS_rvec_max?: > ../../../petsc/include/petscmath.h:532:30: error: invalid operands to > binary < (have ?PetscScalar? {aka ?_Complex double?} and ?PetscScalar? {aka > ?_Complex double?}) > 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > Any idea what I did wrong? I am using version 3.11.3. > > Thanks, > Sam > From knepley at gmail.com Thu May 21 14:30:54 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 21 May 2020 15:30:54 -0400 Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: On Thu, May 21, 2020 at 3:25 PM Sam Guo wrote: > Dear PETSc dev team, > I got following error of compiling complex: > ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function ?PCTFS_rvec_max?: > ../../../petsc/include/petscmath.h:532:30: error: invalid operands to > binary < (have ?PetscScalar? {aka ?_Complex double?} and ?PetscScalar? {aka > ?_Complex double?}) > 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > Any idea what I did wrong? I am using version 3.11.3. > Can you upgrade to 3.13? 
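For anyone who lands on the same message: the error comes from the macro itself. PetscMax() expands to a "<" comparison, and C defines no ordering for _Complex double, so any source that instantiates PetscMax() with PetscScalar arguments can only compile when PetscScalar is real, which is consistent with Satish's point that this source file should not be compiled for complex at all. A small illustration (not PETSc source) of the patterns that do compile under both scalar types:

  #include <petscsys.h>

  /* Ordering complex scalars directly does not compile; compare a
     real-valued quantity such as the magnitude instead: */
  PetscReal max_magnitude(PetscScalar a,PetscScalar b)
  {
    return PetscMax(PetscAbsScalar(a),PetscAbsScalar(b));
  }

  #if !defined(PETSC_USE_COMPLEX)
  /* Comparing the scalars themselves only makes sense (and only
     compiles) when PetscScalar is real, so guard such code: */
  PetscScalar max_value(PetscScalar a,PetscScalar b)
  {
    return PetscMax(a,b);
  }
  #endif
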
Thanks, Matt > Thanks, > Sam > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Thu May 21 15:32:49 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 21 May 2020 13:32:49 -0700 Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: Thanks for the quick response. I have my own makefile based on PETSc makefile. I need to update my makefile. It seems PETSc generates the makefile on fly. I struggle to figure out which files I should skip for complex. Could you tell me what files I should skip? Thanks, Sam On Thu, May 21, 2020 at 12:31 PM Matthew Knepley wrote: > On Thu, May 21, 2020 at 3:25 PM Sam Guo wrote: > >> Dear PETSc dev team, >> I got following error of compiling complex: >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function ?PCTFS_rvec_max?: >> ../../../petsc/include/petscmath.h:532:30: error: invalid operands to >> binary < (have ?PetscScalar? {aka ?_Complex double?} and ?PetscScalar? {aka >> ?_Complex double?}) >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) >> >> Any idea what I did wrong? I am using version 3.11.3. >> > > Can you upgrade to 3.13? > > Thanks, > > Matt > > >> Thanks, >> Sam >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 15:42:28 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 15:42:28 -0500 (CDT) Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: Is there a reason you can't use petsc build tools? config/gmakegen.py processes the mkaefiles in source dirs - and determines the list of files to build [for a given configure setup] src/ksp/pc/impls/tfs/makefile has: #requiresscalar real i.e the sources in this dir are compiled only when --with-scalar-type=real and not when --with-scalar-type=complex etc.. Satish On Thu, 21 May 2020, Sam Guo wrote: > Thanks for the quick response. I have my own makefile based on PETSc > makefile. I need to update my makefile. It seems PETSc generates the > makefile on fly. I struggle to figure out which files I should skip for > complex. Could you tell me what files I should skip? > > Thanks, > Sam > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley wrote: > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo wrote: > > > >> Dear PETSc dev team, > >> I got following error of compiling complex: > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function ?PCTFS_rvec_max?: > >> ../../../petsc/include/petscmath.h:532:30: error: invalid operands to > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and ?PetscScalar? {aka > >> ?_Complex double?}) > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > >> > >> Any idea what I did wrong? I am using version 3.11.3. > >> > > > > Can you upgrade to 3.13? > > > > Thanks, > > > > Matt > > > > > >> Thanks, > >> Sam > >> > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. 
> > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > From sam.guo at cd-adapco.com Thu May 21 15:55:20 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 21 May 2020 13:55:20 -0700 Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: The reason we don?t use petsc build tool is to make sure we use same compiler and flags to build our code and petsc. On Thursday, May 21, 2020, Satish Balay wrote: > Is there a reason you can't use petsc build tools? > > config/gmakegen.py processes the mkaefiles in source dirs - and determines > the list of files to build [for a given configure setup] > > src/ksp/pc/impls/tfs/makefile has: > > #requiresscalar real > > i.e the sources in this dir are compiled only when --with-scalar-type=real > and not when --with-scalar-type=complex etc.. > > Satish > > On Thu, 21 May 2020, Sam Guo wrote: > > > Thanks for the quick response. I have my own makefile based on PETSc > > makefile. I need to update my makefile. It seems PETSc generates the > > makefile on fly. I struggle to figure out which files I should skip for > > complex. Could you tell me what files I should skip? > > > > Thanks, > > Sam > > > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley > wrote: > > > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo wrote: > > > > > >> Dear PETSc dev team, > > >> I got following error of compiling complex: > > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function > ?PCTFS_rvec_max?: > > >> ../../../petsc/include/petscmath.h:532:30: error: invalid operands to > > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and > ?PetscScalar? {aka > > >> ?_Complex double?}) > > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > >> > > >> Any idea what I did wrong? I am using version 3.11.3. > > >> > > > > > > Can you upgrade to 3.13? > > > > > > Thanks, > > > > > > Matt > > > > > > > > >> Thanks, > > >> Sam > > >> > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 16:00:25 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 16:00:25 -0500 (CDT) Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: We usually do the other way around [build petsc with the correct compilers and option] and make apps use the same flags [via petsc formatted makefiles]. However you should be able to construct petsc configure command and force petsc to build exactly as you need. Satish On Thu, 21 May 2020, Sam Guo wrote: > The reason we don?t use petsc build tool is to make sure we use same > compiler and flags to build our code and petsc. > > On Thursday, May 21, 2020, Satish Balay wrote: > > > Is there a reason you can't use petsc build tools? > > > > config/gmakegen.py processes the mkaefiles in source dirs - and determines > > the list of files to build [for a given configure setup] > > > > src/ksp/pc/impls/tfs/makefile has: > > > > #requiresscalar real > > > > i.e the sources in this dir are compiled only when --with-scalar-type=real > > and not when --with-scalar-type=complex etc.. > > > > Satish > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > Thanks for the quick response. 
I have my own makefile based on PETSc > > > makefile. I need to update my makefile. It seems PETSc generates the > > > makefile on fly. I struggle to figure out which files I should skip for > > > complex. Could you tell me what files I should skip? > > > > > > Thanks, > > > Sam > > > > > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley > > wrote: > > > > > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo wrote: > > > > > > > >> Dear PETSc dev team, > > > >> I got following error of compiling complex: > > > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function > > ?PCTFS_rvec_max?: > > > >> ../../../petsc/include/petscmath.h:532:30: error: invalid operands to > > > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and > > ?PetscScalar? {aka > > > >> ?_Complex double?}) > > > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > > >> > > > >> Any idea what I did wrong? I am using version 3.11.3. > > > >> > > > > > > > > Can you upgrade to 3.13? > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > > > >> Thanks, > > > >> Sam > > > >> > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which > > their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > From ajaramillopalma at gmail.com Thu May 21 16:29:13 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 21 May 2020 18:29:13 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers Message-ID: dear PETSc team, I have compiled PETSc with a 2016 version of the intel compilers. The installation went well, but when I tried to compile my code the following error appears in the final step of compilation (linking with ld) ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': main.c:(.text+0x0): multiple definition of `main' /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): first defined here /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): In function `main': for_main.c:(.text+0x3e): undefined reference to `MAIN__' I searched for this and I found that the option "-nofor_main" should be added when compiling with ifort, but our code is written only in C an C++. The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont know if this would work for this case. The configure.log file and the log of the compilation giving the error are attached to this message. These logs were obtained in a cluster, I'm getting the same error on my personal computer with a 2020 version of the Intel Parallel Studio. thank you for any help on this Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_make Type: application/octet-stream Size: 53528 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
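Some context on what that linker message means, since the combination is confusing at first sight: Intel's Fortran runtime (libifcore) contains an object file, for_main.o, whose only job is to supply a C-level main() that calls the Fortran main program, which ifort emits under the name MAIN__. If that archive member gets pulled into the link of a C/C++ application that already defines main(), the linker sees two definitions of main; and since a pure C/C++ code has no Fortran PROGRAM, MAIN__ is also unresolved. Schematically (a sketch of the idea, not the actual runtime source):

  /* What libifcore's for_main.o provides, roughly: */
  extern void MAIN__(void);      /* the Fortran main program, as named by ifort */

  int main(int argc,char **argv) /* collides with the application's own main() */
  {
    (void)argc; (void)argv;
    MAIN__();                    /* "undefined reference to `MAIN__'" if no Fortran main is linked */
    return 0;
  }

So the underlying question is why -lifcore_pic ends up on the link line of a C/C++ application in the first place, which is what the rest of the thread digs into.
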
Name: configure.log.7z Type: application/x-7z-compressed Size: 149825 bytes Desc: not available URL: From balay at mcs.anl.gov Thu May 21 16:37:55 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 16:37:55 -0500 (CDT) Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: Do you get this error when building PETSc examples [C and/or fortran] - when you build them with the corresponding petsc makefile? Can you send the log of the example compiles? Satish --- [the attachment got deleted - don't know by who..] DENIAL OF SERVICE ALERT A denial of service protection limit was exceeded. The file has been removed. Context: 'configure.log.7z' Reason: The data size limit was exceeded Limit: 10 MB Ticket Number : 0c9c-5ec6-f30f-0001 For further information, contact your system administrator. Copyright 1999-2014 McAfee, Inc. All Rights Reserved. http://www.mcafee.com On Thu, 21 May 2020, Alfredo Jaramillo wrote: > dear PETSc team, > > I have compiled PETSc with a 2016 version of the intel compilers. The > installation went well, but when I tried to compile my code the following > error appears in the final step of compilation (linking with ld) > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > main.c:(.text+0x0): multiple definition of `main' > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > first defined here > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > In function `main': > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > I searched for this and I found that the option "-nofor_main" should be > added when compiling with ifort, but our code is written only in C an C++. > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont know if > this would work for this case. > > The configure.log file and the log of the compilation giving the error are > attached to this message. These logs were obtained in a cluster, I'm > getting the same error on my personal computer with a 2020 version of the > Intel Parallel Studio. > > thank you for any help on this > Alfredo > From sam.guo at cd-adapco.com Thu May 21 16:43:43 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 21 May 2020 14:43:43 -0700 Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: What preprocessor directive should I use for real: PETSC_USE_SCALAR_REAL or PETSC_USE_REAL_DOUBLE? On Thu, May 21, 2020 at 2:00 PM Satish Balay wrote: > We usually do the other way around [build petsc with the correct > compilers and option] and make apps use the same flags [via petsc > formatted makefiles]. > > However you should be able to construct petsc configure command and > force petsc to build exactly as you need. > > Satish > > On Thu, 21 May 2020, Sam Guo wrote: > > > The reason we don?t use petsc build tool is to make sure we use same > > compiler and flags to build our code and petsc. > > > > On Thursday, May 21, 2020, Satish Balay wrote: > > > > > Is there a reason you can't use petsc build tools? 
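On the preprocessor question asked above: as Satish answers further down the thread, the macro to test is PETSC_USE_COMPLEX, which is defined for --with-scalar-type=complex builds and left undefined for real builds; PETSC_USE_REAL_DOUBLE refers to the precision of PetscReal, which is a separate question. A short sketch of the guard in use, for orientation only (not code from the thread):

  #include <petscsys.h>

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;
    PetscScalar    z;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  #if defined(PETSC_USE_COMPLEX)
    z = 1.0 + 2.0*PETSC_i;   /* complex build: PetscScalar carries real and imaginary parts */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"complex scalars: %g + %g i\n",
                       (double)PetscRealPart(z),(double)PetscImaginaryPart(z));CHKERRQ(ierr);
  #else
    z = 3.0;                 /* real build: PETSC_USE_COMPLEX is not defined */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"real scalars: %g\n",(double)z);CHKERRQ(ierr);
  #endif
    ierr = PetscFinalize();
    return ierr;
  }
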
> > > > > > config/gmakegen.py processes the mkaefiles in source dirs - and > determines > > > the list of files to build [for a given configure setup] > > > > > > src/ksp/pc/impls/tfs/makefile has: > > > > > > #requiresscalar real > > > > > > i.e the sources in this dir are compiled only when > --with-scalar-type=real > > > and not when --with-scalar-type=complex etc.. > > > > > > Satish > > > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > > > Thanks for the quick response. I have my own makefile based on PETSc > > > > makefile. I need to update my makefile. It seems PETSc generates the > > > > makefile on fly. I struggle to figure out which files I should skip > for > > > > complex. Could you tell me what files I should skip? > > > > > > > > Thanks, > > > > Sam > > > > > > > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley > > > wrote: > > > > > > > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo > wrote: > > > > > > > > > >> Dear PETSc dev team, > > > > >> I got following error of compiling complex: > > > > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function > > > ?PCTFS_rvec_max?: > > > > >> ../../../petsc/include/petscmath.h:532:30: error: invalid > operands to > > > > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and > > > ?PetscScalar? {aka > > > > >> ?_Complex double?}) > > > > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > > > >> > > > > >> Any idea what I did wrong? I am using version 3.11.3. > > > > >> > > > > > > > > > > Can you upgrade to 3.13? > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > > > > > >> Thanks, > > > > >> Sam > > > > >> > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before they begin their > > > > > experiments is infinitely more interesting than any results to > which > > > their > > > > > experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Thu May 21 16:46:54 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 21 May 2020 18:46:54 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: hello Satish, no the tests seem to be ok altough some error related to mpd. ==============THE TESTS=================== Running check examples to verify correct installation Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and PETSC_ARCH=x64-O3-3.13-intel2016-64 Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_sdumont11: cannot connect to local mpd (/tmp/mpd2.console_alfredo.jaramillo); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_sdumont11: cannot connect to local mpd (/tmp/mpd2.console_alfredo.jaramillo); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) 1,5c1,3 < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
< 0 SNES Function norm 0.0406612 < 1 SNES Function norm 4.12227e-06 < 2 SNES Function norm 6.098e-11 < Number of SNES iterations = 2 --- > mpiexec_sdumont11: cannot connect to local mpd (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials Possible problem with ex19 running with hypre, diffs above ========================================= 1,9c1,3 < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. < 0 SNES Function norm 0.239155 < 0 KSP Residual norm 0.239155 < 1 KSP Residual norm < 1.e-11 < 1 SNES Function norm 6.81968e-05 < 0 KSP Residual norm 6.81968e-05 < 1 KSP Residual norm < 1.e-11 < 2 SNES Function norm < 1.e-11 < Number of SNES iterations = 2 --- > mpiexec_sdumont11: cannot connect to local mpd (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials Possible problem with ex19 running with mumps, diffs above ========================================= Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_sdumont11: cannot connect to local mpd (/tmp/mpd2.console_alfredo.jaramillo); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) Completed test examples =============================== I entered in src/snes/tutorials/ and executed "make ex5f". The binary exf5 was created On Thu, May 21, 2020 at 6:37 PM Satish Balay wrote: > Do you get this error when building PETSc examples [C and/or fortran] - > when you build them with the corresponding petsc makefile? > > Can you send the log of the example compiles? > > Satish > > --- > > [the attachment got deleted - don't know by who..] > > DENIAL OF SERVICE ALERT > > A denial of service protection limit was exceeded. The file has been > removed. > Context: 'configure.log.7z' > Reason: The data size limit was exceeded > Limit: 10 MB > Ticket Number : 0c9c-5ec6-f30f-0001 > > > For further information, contact your system administrator. > Copyright 1999-2014 McAfee, Inc. > All Rights Reserved. > http://www.mcafee.com > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > dear PETSc team, > > > > I have compiled PETSc with a 2016 version of the intel compilers. The > > installation went well, but when I tried to compile my code the following > > error appears in the final step of compilation (linking with ld) > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > main.c:(.text+0x0): multiple definition of `main' > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > first defined here > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > In function `main': > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > I searched for this and I found that the option "-nofor_main" should be > > added when compiling with ifort, but our code is written only in C an > C++. > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont know if > > this would work for this case. 
> > > > The configure.log file and the log of the compilation giving the error > are > > attached to this message. These logs were obtained in a cluster, I'm > > getting the same error on my personal computer with a 2020 version of the > > Intel Parallel Studio. > > > > thank you for any help on this > > Alfredo > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 16:51:04 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 16:51:04 -0500 (CDT) Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: Its PETSC_USE_COMPLEX [undefined for --with-scalar-type=real] Satish On Thu, 21 May 2020, Sam Guo wrote: > What preprocessor directive should I use for real: PETSC_USE_SCALAR_REAL > or PETSC_USE_REAL_DOUBLE? > > On Thu, May 21, 2020 at 2:00 PM Satish Balay wrote: > > > We usually do the other way around [build petsc with the correct > > compilers and option] and make apps use the same flags [via petsc > > formatted makefiles]. > > > > However you should be able to construct petsc configure command and > > force petsc to build exactly as you need. > > > > Satish > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > The reason we don?t use petsc build tool is to make sure we use same > > > compiler and flags to build our code and petsc. > > > > > > On Thursday, May 21, 2020, Satish Balay wrote: > > > > > > > Is there a reason you can't use petsc build tools? > > > > > > > > config/gmakegen.py processes the mkaefiles in source dirs - and > > determines > > > > the list of files to build [for a given configure setup] > > > > > > > > src/ksp/pc/impls/tfs/makefile has: > > > > > > > > #requiresscalar real > > > > > > > > i.e the sources in this dir are compiled only when > > --with-scalar-type=real > > > > and not when --with-scalar-type=complex etc.. > > > > > > > > Satish > > > > > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > > > > > Thanks for the quick response. I have my own makefile based on PETSc > > > > > makefile. I need to update my makefile. It seems PETSc generates the > > > > > makefile on fly. I struggle to figure out which files I should skip > > for > > > > > complex. Could you tell me what files I should skip? > > > > > > > > > > Thanks, > > > > > Sam > > > > > > > > > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley > > > > wrote: > > > > > > > > > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo > > wrote: > > > > > > > > > > > >> Dear PETSc dev team, > > > > > >> I got following error of compiling complex: > > > > > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function > > > > ?PCTFS_rvec_max?: > > > > > >> ../../../petsc/include/petscmath.h:532:30: error: invalid > > operands to > > > > > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and > > > > ?PetscScalar? {aka > > > > > >> ?_Complex double?}) > > > > > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > > > > >> > > > > > >> Any idea what I did wrong? I am using version 3.11.3. > > > > > >> > > > > > > > > > > > > Can you upgrade to 3.13? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > >> Thanks, > > > > > >> Sam > > > > > >> > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments is infinitely more interesting than any results to > > which > > > > their > > > > > > experiments lead. 
> > > > > > -- Norbert Wiener > > > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > > > > > > > > From balay at mcs.anl.gov Thu May 21 16:53:14 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 16:53:14 -0500 (CDT) Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: Please copy/paste complete [compile] commands from: src/snes/tutorials/ make clean make ex19 make ex5f Likely the link command used in your code is different than what is used here - triggering errors. Satish On Thu, 21 May 2020, Alfredo Jaramillo wrote: > hello Satish, no the tests seem to be ok altough some error related to mpd. > > ==============THE TESTS=================== > > Running check examples to verify correct installation > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > PETSC_ARCH=x64-O3-3.13-intel2016-64 > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_sdumont11: cannot connect to local mpd > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_sdumont11: cannot connect to local mpd > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > 1,5c1,3 > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > < 0 SNES Function norm 0.0406612 > < 1 SNES Function norm 4.12227e-06 > < 2 SNES Function norm 6.098e-11 > < Number of SNES iterations = 2 > --- > > mpiexec_sdumont11: cannot connect to local mpd > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > Possible problem with ex19 running with hypre, diffs above > ========================================= > 1,9c1,3 > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > < 0 SNES Function norm 0.239155 > < 0 KSP Residual norm 0.239155 > < 1 KSP Residual norm < 1.e-11 > < 1 SNES Function norm 6.81968e-05 > < 0 KSP Residual norm 6.81968e-05 > < 1 KSP Residual norm < 1.e-11 > < 2 SNES Function norm < 1.e-11 > < Number of SNES iterations = 2 > --- > > mpiexec_sdumont11: cannot connect to local mpd > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > Possible problem with ex19 running with mumps, diffs above > ========================================= > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI > process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_sdumont11: cannot connect to local mpd > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > Completed test examples > > =============================== > > I entered in src/snes/tutorials/ and executed "make ex5f". 
The binary exf5 > was created > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay wrote: > > > Do you get this error when building PETSc examples [C and/or fortran] - > > when you build them with the corresponding petsc makefile? > > > > Can you send the log of the example compiles? > > > > Satish > > > > --- > > > > [the attachment got deleted - don't know by who..] > > > > DENIAL OF SERVICE ALERT > > > > A denial of service protection limit was exceeded. The file has been > > removed. > > Context: 'configure.log.7z' > > Reason: The data size limit was exceeded > > Limit: 10 MB > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > For further information, contact your system administrator. > > Copyright 1999-2014 McAfee, Inc. > > All Rights Reserved. > > http://www.mcafee.com > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > dear PETSc team, > > > > > > I have compiled PETSc with a 2016 version of the intel compilers. The > > > installation went well, but when I tried to compile my code the following > > > error appears in the final step of compilation (linking with ld) > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > > main.c:(.text+0x0): multiple definition of `main' > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > first defined here > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > In function `main': > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > I searched for this and I found that the option "-nofor_main" should be > > > added when compiling with ifort, but our code is written only in C an > > C++. > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont know if > > > this would work for this case. > > > > > > The configure.log file and the log of the compilation giving the error > > are > > > attached to this message. These logs were obtained in a cluster, I'm > > > getting the same error on my personal computer with a 2020 version of the > > > Intel Parallel Studio. 
> > > > > > thank you for any help on this > > > Alfredo > > > > > > > > From ajaramillopalma at gmail.com Thu May 21 16:57:02 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 21 May 2020 18:57:02 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: here is the output: alfredo.jaramillo at sdumont11 tutorials]$ make ex19 mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 -march=native -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include -I/scratch/simulreserv/softwares/valgrind-3.15.0/include -I/scratch/app/zlib/1.2.11/include ex19.c -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ 3.1.2.006/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread 
-lifport -lifcore_pic -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o ex19 [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f mpiifort -fPIC -O3 -march=native -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include -I/scratch/simulreserv/softwares/valgrind-3.15.0/include ex5f.F90 -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ 3.1.2.006/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport -lifcore_pic -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o ex5f [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f Number of SNES iterations = 4 [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 lid velocity = 0.0625, prandtl # = 1., grashof # = 1. 
Number of SNES iterations = 2 On Thu, May 21, 2020 at 6:53 PM Satish Balay wrote: > Please copy/paste complete [compile] commands from: > > src/snes/tutorials/ > make clean > make ex19 > make ex5f > > Likely the link command used in your code is different than what is used > here - triggering errors. > > Satish > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > hello Satish, no the tests seem to be ok altough some error related to > mpd. > > > > ==============THE TESTS=================== > > > > Running check examples to verify correct installation > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > > PETSC_ARCH=x64-O3-3.13-intel2016-64 > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_sdumont11: cannot connect to local mpd > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_sdumont11: cannot connect to local mpd > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > 1,5c1,3 > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > < 0 SNES Function norm 0.0406612 > > < 1 SNES Function norm 4.12227e-06 > > < 2 SNES Function norm 6.098e-11 > > < Number of SNES iterations = 2 > > --- > > > mpiexec_sdumont11: cannot connect to local mpd > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > Possible problem with ex19 running with hypre, diffs above > > ========================================= > > 1,9c1,3 > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > < 0 SNES Function norm 0.239155 > > < 0 KSP Residual norm 0.239155 > > < 1 KSP Residual norm < 1.e-11 > > < 1 SNES Function norm 6.81968e-05 > > < 0 KSP Residual norm 6.81968e-05 > > < 1 KSP Residual norm < 1.e-11 > > < 2 SNES Function norm < 1.e-11 > > < Number of SNES iterations = 2 > > --- > > > mpiexec_sdumont11: cannot connect to local mpd > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > Possible problem with ex19 running with mumps, diffs above > > ========================================= > > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI > > process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_sdumont11: cannot connect to local mpd > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Completed test examples > > > > =============================== > > > > I entered in src/snes/tutorials/ and executed "make ex5f". The binary > exf5 > > was created > > > > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay wrote: > > > > > Do you get this error when building PETSc examples [C and/or fortran] - > > > when you build them with the corresponding petsc makefile? 
> > > > > > Can you send the log of the example compiles? > > > > > > Satish > > > > > > --- > > > > > > [the attachment got deleted - don't know by who..] > > > > > > DENIAL OF SERVICE ALERT > > > > > > A denial of service protection limit was exceeded. The file has been > > > removed. > > > Context: 'configure.log.7z' > > > Reason: The data size limit was exceeded > > > Limit: 10 MB > > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > > > > For further information, contact your system administrator. > > > Copyright 1999-2014 McAfee, Inc. > > > All Rights Reserved. > > > http://www.mcafee.com > > > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > dear PETSc team, > > > > > > > > I have compiled PETSc with a 2016 version of the intel compilers. The > > > > installation went well, but when I tried to compile my code the > following > > > > error appears in the final step of compilation (linking with ld) > > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > > > main.c:(.text+0x0): multiple definition of `main' > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > > first defined here > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > > In function `main': > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > > > I searched for this and I found that the option "-nofor_main" should > be > > > > added when compiling with ifort, but our code is written only in C an > > > C++. > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont > know if > > > > this would work for this case. > > > > > > > > The configure.log file and the log of the compilation giving the > error > > > are > > > > attached to this message. These logs were obtained in a cluster, I'm > > > > getting the same error on my personal computer with a 2020 version > of the > > > > Intel Parallel Studio. > > > > > > > > thank you for any help on this > > > > Alfredo > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 17:21:49 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 17:21:49 -0500 (CDT) Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: For one - PETSc is built without -qopenmp flag - but your makefile is using it. Intel compiler can link in with different [incompatible] compiler libraries when some flags change this way [causing conflict]. However - the issue could be: your makefile is listing PETSC_LIB [or however you are accessing petsc info from petsc makefiles] redundantly. i.e - its listing -lpetsc etc when compiling .c to .o files [this should not happen] - its listing -lpetsc etc twice [-lifcore_pic is listed once though] - don't know if this is the reason for the problem. 
You can try [manually] removing -lifcore_pic from PETSC_DIR/PETSC_ARCH/lib/petscvariables - and see if this problem goes away Satish On Thu, 21 May 2020, Alfredo Jaramillo wrote: > here is the output: > > alfredo.jaramillo at sdumont11 tutorials]$ make ex19 > mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 -march=native > -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include > -I/scratch/app/zlib/1.2.11/include ex19.c > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > 3.1.2.006/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport -lifcore_pic > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath 
-lstdc++ -ldl -o > ex19 > [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f > mpiifort -fPIC -O3 -march=native -mtune=native > -I/scratch/simulreserv/softwares/petsc-3.13.0/include > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include ex5f.F90 > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > 3.1.2.006/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport -lifcore_pic > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o > ex5f > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f > Number of SNES iterations = 4 > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. 
> Number of SNES iterations = 2 > > On Thu, May 21, 2020 at 6:53 PM Satish Balay wrote: > > > Please copy/paste complete [compile] commands from: > > > > src/snes/tutorials/ > > make clean > > make ex19 > > make ex5f > > > > Likely the link command used in your code is different than what is used > > here - triggering errors. > > > > Satish > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > hello Satish, no the tests seem to be ok altough some error related to > > mpd. > > > > > > ==============THE TESTS=================== > > > > > > Running check examples to verify correct installation > > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > > > PETSC_ARCH=x64-O3-3.13-intel2016-64 > > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_sdumont11: cannot connect to local mpd > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_sdumont11: cannot connect to local mpd > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > 1,5c1,3 > > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > > < 0 SNES Function norm 0.0406612 > > > < 1 SNES Function norm 4.12227e-06 > > > < 2 SNES Function norm 6.098e-11 > > > < Number of SNES iterations = 2 > > > --- > > > > mpiexec_sdumont11: cannot connect to local mpd > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > 1. no mpd is running on this host > > > > 2. an mpd is running but was started without a "console" (-n option) > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > Possible problem with ex19 running with hypre, diffs above > > > ========================================= > > > 1,9c1,3 > > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > < 0 SNES Function norm 0.239155 > > > < 0 KSP Residual norm 0.239155 > > > < 1 KSP Residual norm < 1.e-11 > > > < 1 SNES Function norm 6.81968e-05 > > > < 0 KSP Residual norm 6.81968e-05 > > > < 1 KSP Residual norm < 1.e-11 > > > < 2 SNES Function norm < 1.e-11 > > > < Number of SNES iterations = 2 > > > --- > > > > mpiexec_sdumont11: cannot connect to local mpd > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > 1. no mpd is running on this host > > > > 2. an mpd is running but was started without a "console" (-n option) > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > Possible problem with ex19 running with mumps, diffs above > > > ========================================= > > > Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI > > > process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_sdumont11: cannot connect to local mpd > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Completed test examples > > > > > > =============================== > > > > > > I entered in src/snes/tutorials/ and executed "make ex5f". 
The binary > > exf5 > > > was created > > > > > > > > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay wrote: > > > > > > > Do you get this error when building PETSc examples [C and/or fortran] - > > > > when you build them with the corresponding petsc makefile? > > > > > > > > Can you send the log of the example compiles? > > > > > > > > Satish > > > > > > > > --- > > > > > > > > [the attachment got deleted - don't know by who..] > > > > > > > > DENIAL OF SERVICE ALERT > > > > > > > > A denial of service protection limit was exceeded. The file has been > > > > removed. > > > > Context: 'configure.log.7z' > > > > Reason: The data size limit was exceeded > > > > Limit: 10 MB > > > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > > > > > > > For further information, contact your system administrator. > > > > Copyright 1999-2014 McAfee, Inc. > > > > All Rights Reserved. > > > > http://www.mcafee.com > > > > > > > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > dear PETSc team, > > > > > > > > > > I have compiled PETSc with a 2016 version of the intel compilers. The > > > > > installation went well, but when I tried to compile my code the > > following > > > > > error appears in the final step of compilation (linking with ld) > > > > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > > > > main.c:(.text+0x0): multiple definition of `main' > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > > > first defined here > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > > > In function `main': > > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > > > > > I searched for this and I found that the option "-nofor_main" should > > be > > > > > added when compiling with ifort, but our code is written only in C an > > > > C++. > > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont > > know if > > > > > this would work for this case. > > > > > > > > > > The configure.log file and the log of the compilation giving the > > error > > > > are > > > > > attached to this message. These logs were obtained in a cluster, I'm > > > > > getting the same error on my personal computer with a 2020 version > > of the > > > > > Intel Parallel Studio. > > > > > > > > > > thank you for any help on this > > > > > Alfredo > > > > > > > > > > > > > > > > > > > > > From sam.guo at cd-adapco.com Thu May 21 18:09:24 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Thu, 21 May 2020 16:09:24 -0700 Subject: [petsc-users] error of compiling complex In-Reply-To: References: Message-ID: Hi Satish, Works. Thanks a lot for your help. BR, Sam On Thu, May 21, 2020 at 2:51 PM Satish Balay wrote: > Its PETSC_USE_COMPLEX [undefined for --with-scalar-type=real] > > Satish > > On Thu, 21 May 2020, Sam Guo wrote: > > > What preprocessor directive should I use for real: PETSC_USE_SCALAR_REAL > > or PETSC_USE_REAL_DOUBLE? > > > > On Thu, May 21, 2020 at 2:00 PM Satish Balay wrote: > > > > > We usually do the other way around [build petsc with the correct > > > compilers and option] and make apps use the same flags [via petsc > > > formatted makefiles]. > > > > > > However you should be able to construct petsc configure command and > > > force petsc to build exactly as you need. 
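Regarding the PETSC_USE_COMPLEX macro mentioned above: a quick way to see which scalar type a given PETSC_ARCH was configured for is to grep the generated configuration header. This is only a sketch; it assumes PETSC_DIR and PETSC_ARCH are set and that the header sits at the usual include/petscconf.h location, which may differ between PETSc versions:

    # prints "#define PETSC_USE_COMPLEX 1" for a --with-scalar-type=complex build;
    # no match indicates a real build, so real-only C code can be guarded with
    # "#if !defined(PETSC_USE_COMPLEX)"
    grep PETSC_USE_COMPLEX $PETSC_DIR/$PETSC_ARCH/include/petscconf.h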
> > > > > > Satish > > > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > > > The reason we don?t use petsc build tool is to make sure we use same > > > > compiler and flags to build our code and petsc. > > > > > > > > On Thursday, May 21, 2020, Satish Balay wrote: > > > > > > > > > Is there a reason you can't use petsc build tools? > > > > > > > > > > config/gmakegen.py processes the mkaefiles in source dirs - and > > > determines > > > > > the list of files to build [for a given configure setup] > > > > > > > > > > src/ksp/pc/impls/tfs/makefile has: > > > > > > > > > > #requiresscalar real > > > > > > > > > > i.e the sources in this dir are compiled only when > > > --with-scalar-type=real > > > > > and not when --with-scalar-type=complex etc.. > > > > > > > > > > Satish > > > > > > > > > > On Thu, 21 May 2020, Sam Guo wrote: > > > > > > > > > > > Thanks for the quick response. I have my own makefile based on > PETSc > > > > > > makefile. I need to update my makefile. It seems PETSc generates > the > > > > > > makefile on fly. I struggle to figure out which files I should > skip > > > for > > > > > > complex. Could you tell me what files I should skip? > > > > > > > > > > > > Thanks, > > > > > > Sam > > > > > > > > > > > > On Thu, May 21, 2020 at 12:31 PM Matthew Knepley < > knepley at gmail.com> > > > > > wrote: > > > > > > > > > > > > > On Thu, May 21, 2020 at 3:25 PM Sam Guo > > > > wrote: > > > > > > > > > > > > > >> Dear PETSc dev team, > > > > > > >> I got following error of compiling complex: > > > > > > >> ../../../petsc/src/ksp/pc/impls/tfs/ivec.c: In function > > > > > ?PCTFS_rvec_max?: > > > > > > >> ../../../petsc/include/petscmath.h:532:30: error: invalid > > > operands to > > > > > > >> binary < (have ?PetscScalar? {aka ?_Complex double?} and > > > > > ?PetscScalar? {aka > > > > > > >> ?_Complex double?}) > > > > > > >> 532 | #define PetscMax(a,b) (((a)<(b)) ? (b) : (a)) > > > > > > >> > > > > > > >> Any idea what I did wrong? I am using version 3.11.3. > > > > > > >> > > > > > > > > > > > > > > Can you upgrade to 3.13? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > >> Thanks, > > > > > > >> Sam > > > > > > >> > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > What most experimenters take for granted before they begin > their > > > > > > > experiments is infinitely more interesting than any results to > > > which > > > > > their > > > > > > > experiments lead. > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Thu May 21 19:41:01 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 21 May 2020 21:41:01 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: In fact, removing -lifcore_pic from PETSC_DIR/PETSC_ARCH/lib/petscvariables solved the problem (this was the first I traied and it worked), it compiles and the program runs fine. Also, -lpetsc was being listed some times when compiling .c to .o, I fixed that in the scripts. Now, I'm a bit lost about when fcoremt_pic is being linked or if it is necessary at all? thank you very much! alfredo On Thu, May 21, 2020 at 7:21 PM Satish Balay wrote: > For one - PETSc is built without -qopenmp flag - but your makefile is > using it. 
Intel compiler can link in with > different [incompatible] compiler libraries when some flags change this > way [causing conflict]. > > However - the issue could be: > > your makefile is listing PETSC_LIB [or however you are accessing petsc > info from petsc makefiles] redundantly. > > i.e > > - its listing -lpetsc etc when compiling .c to .o files [this should not > happen] > - its listing -lpetsc etc twice [-lifcore_pic is listed once though] - > don't know if this is the reason for the problem. > > You can try [manually] removing -lifcore_pic from > PETSC_DIR/PETSC_ARCH/lib/petscvariables - and see if this problem goes away > > Satish > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > here is the output: > > > > alfredo.jaramillo at sdumont11 tutorials]$ make ex19 > > mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 -march=native > > -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include > > -I/scratch/app/zlib/1.2.11/include ex19.c > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > 3.1.2.006/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > 
-Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > -lifcore_pic > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o > > ex19 > > [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f > > mpiifort -fPIC -O3 -march=native -mtune=native > > -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include ex5f.F90 > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > 3.1.2.006/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > 
-L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > -lifcore_pic > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o > > ex5f > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f > > Number of SNES iterations = 4 > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > Number of SNES iterations = 2 > > > > On Thu, May 21, 2020 at 6:53 PM Satish Balay wrote: > > > > > Please copy/paste complete [compile] commands from: > > > > > > src/snes/tutorials/ > > > make clean > > > make ex19 > > > make ex5f > > > > > > Likely the link command used in your code is different than what is > used > > > here - triggering errors. > > > > > > Satish > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > hello Satish, no the tests seem to be ok altough some error related > to > > > mpd. > > > > > > > > ==============THE TESTS=================== > > > > > > > > Running check examples to verify correct installation > > > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > > > > PETSC_ARCH=x64-O3-3.13-intel2016-64 > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI > process > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > 1. no mpd is running on this host > > > > 2. an mpd is running but was started without a "console" (-n > option) > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI > processes > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > 1. no mpd is running on this host > > > > 2. an mpd is running but was started without a "console" (-n > option) > > > > 1,5c1,3 > > > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > > > < 0 SNES Function norm 0.0406612 > > > > < 1 SNES Function norm 4.12227e-06 > > > > < 2 SNES Function norm 6.098e-11 > > > > < Number of SNES iterations = 2 > > > > --- > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > 1. no mpd is running on this host > > > > > 2. an mpd is running but was started without a "console" (-n > option) > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > Possible problem with ex19 running with hypre, diffs above > > > > ========================================= > > > > 1,9c1,3 > > > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. 
> > > > < 0 SNES Function norm 0.239155 > > > > < 0 KSP Residual norm 0.239155 > > > > < 1 KSP Residual norm < 1.e-11 > > > > < 1 SNES Function norm 6.81968e-05 > > > > < 0 KSP Residual norm 6.81968e-05 > > > > < 1 KSP Residual norm < 1.e-11 > > > > < 2 SNES Function norm < 1.e-11 > > > > < Number of SNES iterations = 2 > > > > --- > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > 1. no mpd is running on this host > > > > > 2. an mpd is running but was started without a "console" (-n > option) > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > Possible problem with ex19 running with mumps, diffs above > > > > ========================================= > > > > Possible error running Fortran example src/snes/tutorials/ex5f with > 1 MPI > > > > process > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > 1. no mpd is running on this host > > > > 2. an mpd is running but was started without a "console" (-n > option) > > > > Completed test examples > > > > > > > > =============================== > > > > > > > > I entered in src/snes/tutorials/ and executed "make ex5f". The binary > > > exf5 > > > > was created > > > > > > > > > > > > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay > wrote: > > > > > > > > > Do you get this error when building PETSc examples [C and/or > fortran] - > > > > > when you build them with the corresponding petsc makefile? > > > > > > > > > > Can you send the log of the example compiles? > > > > > > > > > > Satish > > > > > > > > > > --- > > > > > > > > > > [the attachment got deleted - don't know by who..] > > > > > > > > > > DENIAL OF SERVICE ALERT > > > > > > > > > > A denial of service protection limit was exceeded. The file has > been > > > > > removed. > > > > > Context: 'configure.log.7z' > > > > > Reason: The data size limit was exceeded > > > > > Limit: 10 MB > > > > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > > > > > > > > > > For further information, contact your system administrator. > > > > > Copyright 1999-2014 McAfee, Inc. > > > > > All Rights Reserved. > > > > > http://www.mcafee.com > > > > > > > > > > > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > > > dear PETSc team, > > > > > > > > > > > > I have compiled PETSc with a 2016 version of the intel > compilers. The > > > > > > installation went well, but when I tried to compile my code the > > > following > > > > > > error appears in the final step of compilation (linking with ld) > > > > > > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > > > > > main.c:(.text+0x0): multiple definition of `main' > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > > > > first defined here > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > > > > In function `main': > > > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > > > > > > > I searched for this and I found that the option "-nofor_main" > should > > > be > > > > > > added when compiling with ifort, but our code is written only in > C an > > > > > C++. 
> > > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont > > > know if > > > > > > this would work for this case. > > > > > > > > > > > > The configure.log file and the log of the compilation giving the > > > error > > > > > are > > > > > > attached to this message. These logs were obtained in a cluster, > I'm > > > > > > getting the same error on my personal computer with a 2020 > version > > > of the > > > > > > Intel Parallel Studio. > > > > > > > > > > > > thank you for any help on this > > > > > > Alfredo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu May 21 20:08:36 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 21 May 2020 20:08:36 -0500 (CDT) Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: Configure attempts to automatically determine the language compatible libraries by running 'icc -v' and 'ifort -v' - and uses what it says. [this list can be different based on compiler options used] And PETSc examples work fine - so its not clear why fcoremt_pic is causing issues with your app build. There are some differences in your build vs examples - so that could be one reason. So its best to build PETSc and your app with the exact same compilers and options and see if that works. Alternative is to specify to PETSc configure the compatibility libraries so that it doesn't try to figure this out. [and add extra un-needed libraries] With Intel compilers - its likely the additional configure option is: LIBS="-Bstatic -lifcore -Bdynamic" If this works - configure won't run 'ifort -v' and grab fcoremt_pic etc.. Satish On Thu, 21 May 2020, Alfredo Jaramillo wrote: > In fact, removing -lifcore_pic from PETSC_DIR/PETSC_ARCH/lib/petscvariables > solved the problem (this was the first I traied and it worked), it compiles > and the program runs fine. > > Also, -lpetsc was being listed some times when compiling .c to .o, I fixed > that in the scripts. > > Now, I'm a bit lost about when fcoremt_pic is being linked or if it is > necessary at all? > > thank you very much! > alfredo > > On Thu, May 21, 2020 at 7:21 PM Satish Balay wrote: > > > For one - PETSc is built without -qopenmp flag - but your makefile is > > using it. Intel compiler can link in with > > different [incompatible] compiler libraries when some flags change this > > way [causing conflict]. > > > > However - the issue could be: > > > > your makefile is listing PETSC_LIB [or however you are accessing petsc > > info from petsc makefiles] redundantly. > > > > i.e > > > > - its listing -lpetsc etc when compiling .c to .o files [this should not > > happen] > > - its listing -lpetsc etc twice [-lifcore_pic is listed once though] - > > don't know if this is the reason for the problem. 
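Spelled out, the reconfigure route suggested above might look roughly like the following. This is a sketch only: the compiler wrappers (mpiicc, mpiicpc, mpiifort) and the PETSC_ARCH name are guesses taken from the build shown earlier in the thread, and every other option from the original configure line would still have to be repeated:

    # re-run configure so the Fortran runtime library is stated explicitly
    # instead of being scraped from 'ifort -v'
    cd $PETSC_DIR
    ./configure PETSC_ARCH=x64-O3-3.13-intel2016-64 \
        --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
        LIBS="-Bstatic -lifcore -Bdynamic"
    # afterwards -lifcore_pic should no longer show up here:
    grep ifcore $PETSC_DIR/$PETSC_ARCH/lib/petscvariables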
> > > > You can try [manually] removing -lifcore_pic from > > PETSC_DIR/PETSC_ARCH/lib/petscvariables - and see if this problem goes away > > > > Satish > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > here is the output: > > > > > > alfredo.jaramillo at sdumont11 tutorials]$ make ex19 > > > mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 -march=native > > > -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include > > > -I/scratch/app/zlib/1.2.11/include ex19.c > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > 3.1.2.006/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > > 
-Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > > -lifcore_pic > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o > > > ex19 > > > [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f > > > mpiifort -fPIC -O3 -march=native -mtune=native > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include ex5f.F90 > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > 3.1.2.006/lib/intel64 -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > 3.1.2.006/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > > 
-Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lopenblas > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > > -lifcore_pic > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ -ldl -o > > > ex5f > > > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f > > > Number of SNES iterations = 4 > > > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 > > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > Number of SNES iterations = 2 > > > > > > On Thu, May 21, 2020 at 6:53 PM Satish Balay wrote: > > > > > > > Please copy/paste complete [compile] commands from: > > > > > > > > src/snes/tutorials/ > > > > make clean > > > > make ex19 > > > > make ex5f > > > > > > > > Likely the link command used in your code is different than what is > > used > > > > here - triggering errors. > > > > > > > > Satish > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > hello Satish, no the tests seem to be ok altough some error related > > to > > > > mpd. > > > > > > > > > > ==============THE TESTS=================== > > > > > > > > > > Running check examples to verify correct installation > > > > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > > > > > PETSC_ARCH=x64-O3-3.13-intel2016-64 > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI > > process > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > 1. no mpd is running on this host > > > > > 2. an mpd is running but was started without a "console" (-n > > option) > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI > > processes > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > 1. no mpd is running on this host > > > > > 2. an mpd is running but was started without a "console" (-n > > option) > > > > > 1,5c1,3 > > > > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > > > > < 0 SNES Function norm 0.0406612 > > > > > < 1 SNES Function norm 4.12227e-06 > > > > > < 2 SNES Function norm 6.098e-11 > > > > > < Number of SNES iterations = 2 > > > > > --- > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > 1. no mpd is running on this host > > > > > > 2. an mpd is running but was started without a "console" (-n > > option) > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > > Possible problem with ex19 running with hypre, diffs above > > > > > ========================================= > > > > > 1,9c1,3 > > > > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > > > < 0 SNES Function norm 0.239155 > > > > > < 0 KSP Residual norm 0.239155 > > > > > < 1 KSP Residual norm < 1.e-11 > > > > > < 1 SNES Function norm 6.81968e-05 > > > > > < 0 KSP Residual norm 6.81968e-05 > > > > > < 1 KSP Residual norm < 1.e-11 > > > > > < 2 SNES Function norm < 1.e-11 > > > > > < Number of SNES iterations = 2 > > > > > --- > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > 1. no mpd is running on this host > > > > > > 2. 
an mpd is running but was started without a "console" (-n > > option) > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > > Possible problem with ex19 running with mumps, diffs above > > > > > ========================================= > > > > > Possible error running Fortran example src/snes/tutorials/ex5f with > > 1 MPI > > > > > process > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > 1. no mpd is running on this host > > > > > 2. an mpd is running but was started without a "console" (-n > > option) > > > > > Completed test examples > > > > > > > > > > =============================== > > > > > > > > > > I entered in src/snes/tutorials/ and executed "make ex5f". The binary > > > > exf5 > > > > > was created > > > > > > > > > > > > > > > > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay > > wrote: > > > > > > > > > > > Do you get this error when building PETSc examples [C and/or > > fortran] - > > > > > > when you build them with the corresponding petsc makefile? > > > > > > > > > > > > Can you send the log of the example compiles? > > > > > > > > > > > > Satish > > > > > > > > > > > > --- > > > > > > > > > > > > [the attachment got deleted - don't know by who..] > > > > > > > > > > > > DENIAL OF SERVICE ALERT > > > > > > > > > > > > A denial of service protection limit was exceeded. The file has > > been > > > > > > removed. > > > > > > Context: 'configure.log.7z' > > > > > > Reason: The data size limit was exceeded > > > > > > Limit: 10 MB > > > > > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > > > > > > > > > > > > > For further information, contact your system administrator. > > > > > > Copyright 1999-2014 McAfee, Inc. > > > > > > All Rights Reserved. > > > > > > http://www.mcafee.com > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > > > > > dear PETSc team, > > > > > > > > > > > > > > I have compiled PETSc with a 2016 version of the intel > > compilers. The > > > > > > > installation went well, but when I tried to compile my code the > > > > following > > > > > > > error appears in the final step of compilation (linking with ld) > > > > > > > > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function `main': > > > > > > > main.c:(.text+0x0): multiple definition of `main' > > > > > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > > > > > first defined here > > > > > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > > > > > In function `main': > > > > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > > > > > > > > > I searched for this and I found that the option "-nofor_main" > > should > > > > be > > > > > > > added when compiling with ifort, but our code is written only in > > C an > > > > > > C++. > > > > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I dont > > > > know if > > > > > > > this would work for this case. > > > > > > > > > > > > > > The configure.log file and the log of the compilation giving the > > > > error > > > > > > are > > > > > > > attached to this message. 
These logs were obtained in a cluster, > > I'm > > > > > > > getting the same error on my personal computer with a 2020 > > version > > > > of the > > > > > > > Intel Parallel Studio. > > > > > > > > > > > > > > thank you for any help on this > > > > > > > Alfredo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From ajaramillopalma at gmail.com Thu May 21 20:25:01 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Thu, 21 May 2020 22:25:01 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: I just read at the end of the section "compilers" here https://www.mcs.anl.gov/petsc/documentation/installation.html that one can indicate these libraries by doing ./configure --LIBS='-Bstatic -lifcore -Bdynamic' thank you very much for your help, Satish alfredo On Thu, May 21, 2020 at 10:08 PM Satish Balay wrote: > Configure attempts to automatically determine the language compatible > libraries by running 'icc -v' and 'ifort -v' - and uses what it says. [this > list can be different based on compiler options used] > > And PETSc examples work fine - so its not clear why fcoremt_pic is causing > issues with your app build. > > There are some differences in your build vs examples - so that could be > one reason. So its best to build PETSc and your app with the exact same > compilers and options and see if that works. > > Alternative is to specify to PETSc configure the compatibility libraries > so that it doesn't try to figure this out. [and add extra un-needed > libraries] > > With Intel compilers - its likely the additional configure option is: > > LIBS="-Bstatic -lifcore -Bdynamic" > > If this works - configure won't run 'ifort -v' and grab fcoremt_pic etc.. > > Satish > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > In fact, removing -lifcore_pic from > PETSC_DIR/PETSC_ARCH/lib/petscvariables > > solved the problem (this was the first I traied and it worked), it > compiles > > and the program runs fine. > > > > Also, -lpetsc was being listed some times when compiling .c to .o, I > fixed > > that in the scripts. > > > > Now, I'm a bit lost about when fcoremt_pic is being linked or if it is > > necessary at all? > > > > thank you very much! > > alfredo > > > > On Thu, May 21, 2020 at 7:21 PM Satish Balay wrote: > > > > > For one - PETSc is built without -qopenmp flag - but your makefile is > > > using it. Intel compiler can link in with > > > different [incompatible] compiler libraries when some flags change this > > > way [causing conflict]. > > > > > > However - the issue could be: > > > > > > your makefile is listing PETSC_LIB [or however you are accessing petsc > > > info from petsc makefiles] redundantly. > > > > > > i.e > > > > > > - its listing -lpetsc etc when compiling .c to .o files [this should > not > > > happen] > > > - its listing -lpetsc etc twice [-lifcore_pic is listed once though] > - > > > don't know if this is the reason for the problem. 
> > > > > > You can try [manually] removing -lifcore_pic from > > > PETSC_DIR/PETSC_ARCH/lib/petscvariables - and see if this problem goes > away > > > > > > Satish > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > here is the output: > > > > > > > > alfredo.jaramillo at sdumont11 tutorials]$ make ex19 > > > > mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 -march=native > > > > -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > > > > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include > > > > -I/scratch/app/zlib/1.2.11/include ex19.c > > > > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > > 3.1.2.006/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > > 3.1.2.006/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > 
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack > -lopenblas > > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > > > -lifcore_pic > > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ > -ldl -o > > > > ex19 > > > > [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f > > > > mpiifort -fPIC -O3 -march=native -mtune=native > > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/include > > > > > > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include > > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include > ex5f.F90 > > > > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > > 3.1.2.006/lib/intel64 > -L/opt/intel/parallel_studio_xe_2016_update2/clck/ > > > > 3.1.2.006/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin > > > > > > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 > > > > > > > > 
-Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > > > > > > -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin > > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt > > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE -lcmumps > > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack > -lopenblas > > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport > > > -lifcore_pic > > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ > -ldl -o > > > > ex5f > > > > > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f > > > > Number of SNES iterations = 4 > > > > > > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 > > > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > > Number of SNES iterations = 2 > > > > > > > > On Thu, May 21, 2020 at 6:53 PM Satish Balay > wrote: > > > > > > > > > Please copy/paste complete [compile] commands from: > > > > > > > > > > src/snes/tutorials/ > > > > > make clean > > > > > make ex19 > > > > > make ex5f > > > > > > > > > > Likely the link command used in your code is different than what is > > > used > > > > > here - triggering errors. > > > > > > > > > > Satish > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > > > hello Satish, no the tests seem to be ok altough some error > related > > > to > > > > > mpd. > > > > > > > > > > > > ==============THE TESTS=================== > > > > > > > > > > > > Running check examples to verify correct installation > > > > > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and > > > > > > PETSC_ARCH=x64-O3-3.13-intel2016-64 > > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI > > > process > > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > 1. no mpd is running on this host > > > > > > 2. an mpd is running but was started without a "console" (-n > > > option) > > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI > > > processes > > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > 1. no mpd is running on this host > > > > > > 2. an mpd is running but was started without a "console" (-n > > > option) > > > > > > 1,5c1,3 > > > > > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > > > > > < 0 SNES Function norm 0.0406612 > > > > > > < 1 SNES Function norm 4.12227e-06 > > > > > > < 2 SNES Function norm 6.098e-11 > > > > > > < Number of SNES iterations = 2 > > > > > > --- > > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > > 1. no mpd is running on this host > > > > > > > 2. 
an mpd is running but was started without a "console" (-n > > > option) > > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > > > Possible problem with ex19 running with hypre, diffs above > > > > > > ========================================= > > > > > > 1,9c1,3 > > > > > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > > > > > > < 0 SNES Function norm 0.239155 > > > > > > < 0 KSP Residual norm 0.239155 > > > > > > < 1 KSP Residual norm < 1.e-11 > > > > > > < 1 SNES Function norm 6.81968e-05 > > > > > > < 0 KSP Residual norm 6.81968e-05 > > > > > > < 1 KSP Residual norm < 1.e-11 > > > > > > < 2 SNES Function norm < 1.e-11 > > > > > > < Number of SNES iterations = 2 > > > > > > --- > > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > > 1. no mpd is running on this host > > > > > > > 2. an mpd is running but was started without a "console" (-n > > > option) > > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials > > > > > > Possible problem with ex19 running with mumps, diffs above > > > > > > ========================================= > > > > > > Possible error running Fortran example src/snes/tutorials/ex5f > with > > > 1 MPI > > > > > > process > > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > > > > mpiexec_sdumont11: cannot connect to local mpd > > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: > > > > > > 1. no mpd is running on this host > > > > > > 2. an mpd is running but was started without a "console" (-n > > > option) > > > > > > Completed test examples > > > > > > > > > > > > =============================== > > > > > > > > > > > > I entered in src/snes/tutorials/ and executed "make ex5f". The > binary > > > > > exf5 > > > > > > was created > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay > > > wrote: > > > > > > > > > > > > > Do you get this error when building PETSc examples [C and/or > > > fortran] - > > > > > > > when you build them with the corresponding petsc makefile? > > > > > > > > > > > > > > Can you send the log of the example compiles? > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > [the attachment got deleted - don't know by who..] > > > > > > > > > > > > > > DENIAL OF SERVICE ALERT > > > > > > > > > > > > > > A denial of service protection limit was exceeded. The file has > > > been > > > > > > > removed. > > > > > > > Context: 'configure.log.7z' > > > > > > > Reason: The data size limit was exceeded > > > > > > > Limit: 10 MB > > > > > > > Ticket Number : 0c9c-5ec6-f30f-0001 > > > > > > > > > > > > > > > > > > > > > For further information, contact your system administrator. > > > > > > > Copyright 1999-2014 McAfee, Inc. > > > > > > > All Rights Reserved. > > > > > > > http://www.mcafee.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: > > > > > > > > > > > > > > > dear PETSc team, > > > > > > > > > > > > > > > > I have compiled PETSc with a 2016 version of the intel > > > compilers. 
The > > > > > > > > installation went well, but when I tried to compile my code > the > > > > > following > > > > > > > > error appears in the final step of compilation (linking with > ld) > > > > > > > > > > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function > `main': > > > > > > > > main.c:(.text+0x0): multiple definition of `main' > > > > > > > > > > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): > > > > > > > > first defined here > > > > > > > > > > > > > > > > > > > > > > > > /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): > > > > > > > > In function `main': > > > > > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' > > > > > > > > > > > > > > > > I searched for this and I found that the option "-nofor_main" > > > should > > > > > be > > > > > > > > added when compiling with ifort, but our code is written > only in > > > C an > > > > > > > C++. > > > > > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So I > dont > > > > > know if > > > > > > > > this would work for this case. > > > > > > > > > > > > > > > > The configure.log file and the log of the compilation giving > the > > > > > error > > > > > > > are > > > > > > > > attached to this message. These logs were obtained in a > cluster, > > > I'm > > > > > > > > getting the same error on my personal computer with a 2020 > > > version > > > > > of the > > > > > > > > Intel Parallel Studio. > > > > > > > > > > > > > > > > thank you for any help on this > > > > > > > > Alfredo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Fri May 22 10:57:05 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Fri, 22 May 2020 12:57:05 -0300 Subject: [petsc-users] multiple definition of `main' with intel compilers In-Reply-To: References: Message-ID: Hello Satish and PETSc developers, I'm sending this email in case somebody else is having the same problem with the intel compilers. In my personal computer (with Fedora 32, gcc 10). When I tried to execute the resulting binary there appears the error message: build/linux_icc/partrans_linux_icc_opt: /opt/intel/clck/2019.8/lib/intel64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by build/linux_icc/partrans_linux_icc_opt) a workaround that worked for me was to delete the link /opt/intel/clck/2019.8/lib/intel64/libstdc++.so.6 that was doing: libstdc++.so.6 -> libstdc++.so.6.0.20 (in the folder /opt/intel/clck/2019.8/lib/intel64/) and replace it by a link to the default c++ library of my system: sudo ln -s /usr/lib64/libstdc++.so.6.0.20 libstdc++.so.6 the code is running so such replacement seems to work fine. 
best wishes Alfredo On Thu, May 21, 2020 at 10:25 PM Alfredo Jaramillo < ajaramillopalma at gmail.com> wrote: > I just read at the end of the section "compilers" here > https://www.mcs.anl.gov/petsc/documentation/installation.html that one > can indicate these libraries by doing > ./configure --LIBS='-Bstatic -lifcore -Bdynamic' > > thank you very much for your help, Satish > > alfredo > > On Thu, May 21, 2020 at 10:08 PM Satish Balay wrote: > >> Configure attempts to automatically determine the language compatible >> libraries by running 'icc -v' and 'ifort -v' - and uses what it says. [this >> list can be different based on compiler options used] >> >> And PETSc examples work fine - so its not clear why fcoremt_pic is >> causing issues with your app build. >> >> There are some differences in your build vs examples - so that could be >> one reason. So its best to build PETSc and your app with the exact same >> compilers and options and see if that works. >> >> Alternative is to specify to PETSc configure the compatibility libraries >> so that it doesn't try to figure this out. [and add extra un-needed >> libraries] >> >> With Intel compilers - its likely the additional configure option is: >> >> LIBS="-Bstatic -lifcore -Bdynamic" >> >> If this works - configure won't run 'ifort -v' and grab fcoremt_pic etc.. >> >> Satish >> >> On Thu, 21 May 2020, Alfredo Jaramillo wrote: >> >> > In fact, removing -lifcore_pic from >> PETSC_DIR/PETSC_ARCH/lib/petscvariables >> > solved the problem (this was the first I traied and it worked), it >> compiles >> > and the program runs fine. >> > >> > Also, -lpetsc was being listed some times when compiling .c to .o, I >> fixed >> > that in the scripts. >> > >> > Now, I'm a bit lost about when fcoremt_pic is being linked or if it is >> > necessary at all? >> > >> > thank you very much! >> > alfredo >> > >> > On Thu, May 21, 2020 at 7:21 PM Satish Balay wrote: >> > >> > > For one - PETSc is built without -qopenmp flag - but your makefile is >> > > using it. Intel compiler can link in with >> > > different [incompatible] compiler libraries when some flags change >> this >> > > way [causing conflict]. >> > > >> > > However - the issue could be: >> > > >> > > your makefile is listing PETSC_LIB [or however you are accessing petsc >> > > info from petsc makefiles] redundantly. >> > > >> > > i.e >> > > >> > > - its listing -lpetsc etc when compiling .c to .o files [this >> should not >> > > happen] >> > > - its listing -lpetsc etc twice [-lifcore_pic is listed once >> though] - >> > > don't know if this is the reason for the problem. 
>> > > >> > > You can try [manually] removing -lifcore_pic from >> > > PETSC_DIR/PETSC_ARCH/lib/petscvariables - and see if this problem >> goes away >> > > >> > > Satish >> > > >> > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: >> > > >> > > > here is the output: >> > > > >> > > > alfredo.jaramillo at sdumont11 tutorials]$ make ex19 >> > > > mpiicc -fPIC -O3 -march=native -mtune=native -fPIC -O3 >> -march=native >> > > > -mtune=native -I/scratch/simulreserv/softwares/petsc-3.13.0/include >> > > > >> > > >> -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include >> > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include >> > > > -I/scratch/app/zlib/1.2.11/include ex19.c >> > > > >> > > >> -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib >> > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ >> > > > 3.1.2.006/lib/intel64 >> -L/opt/intel/parallel_studio_xe_2016_update2/clck/ >> > > > 3.1.2.006/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin >> > > > >> > > >> 
-L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin >> > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 >> > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 >> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt >> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE >> -lcmumps >> > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack >> -lopenblas >> > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport >> > > -lifcore_pic >> > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ >> -ldl -o >> > > > ex19 >> > > > [alfredo.jaramillo at sdumont11 tutorials]$ make ex5f >> > > > mpiifort -fPIC -O3 -march=native -mtune=native >> > > > -I/scratch/simulreserv/softwares/petsc-3.13.0/include >> > > > >> > > >> -I/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/include >> > > > -I/scratch/simulreserv/softwares/valgrind-3.15.0/include >> ex5f.F90 >> > > > >> > > >> -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -Wl,-rpath,/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -L/scratch/simulreserv/softwares/petsc-3.13.0/x64-O3-3.13-intel2016-64/lib >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib/release_mt >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/lib >> > > > -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/clck/ >> > > > 3.1.2.006/lib/intel64 >> -L/opt/intel/parallel_studio_xe_2016_update2/clck/ >> > > > 3.1.2.006/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/ipp/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/mkl/lib/intel64 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64/gcc4.4 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/daal/lib/intel64_lin >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 >> > > > >> > > >> 
-L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/tbb/lib/intel64_lin/gcc4.4 >> > > > >> > > >> -Wl,-rpath,/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin >> > > > >> > > >> -L/opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64_lin >> > > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 >> > > > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 >> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/release_mt >> > > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib -lpetsc -lHYPRE >> -lcmumps >> > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack >> -lopenblas >> > > > -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport >> > > -lifcore_pic >> > > > -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lquadmath -lstdc++ >> -ldl -o >> > > > ex5f >> > > > >> > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex5f >> > > > Number of SNES iterations = 4 >> > > > >> > > > [alfredo.jaramillo at sdumont11 tutorials]$ ./ex19 >> > > > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. >> > > > Number of SNES iterations = 2 >> > > > >> > > > On Thu, May 21, 2020 at 6:53 PM Satish Balay >> wrote: >> > > > >> > > > > Please copy/paste complete [compile] commands from: >> > > > > >> > > > > src/snes/tutorials/ >> > > > > make clean >> > > > > make ex19 >> > > > > make ex5f >> > > > > >> > > > > Likely the link command used in your code is different than what >> is >> > > used >> > > > > here - triggering errors. >> > > > > >> > > > > Satish >> > > > > >> > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: >> > > > > >> > > > > > hello Satish, no the tests seem to be ok altough some error >> related >> > > to >> > > > > mpd. >> > > > > > >> > > > > > ==============THE TESTS=================== >> > > > > > >> > > > > > Running check examples to verify correct installation >> > > > > > Using PETSC_DIR=/scratch/simulreserv/softwares/petsc-3.13.0 and >> > > > > > PETSC_ARCH=x64-O3-3.13-intel2016-64 >> > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI >> > > process >> > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > > > > mpiexec_sdumont11: cannot connect to local mpd >> > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: >> > > > > > 1. no mpd is running on this host >> > > > > > 2. an mpd is running but was started without a "console" (-n >> > > option) >> > > > > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI >> > > processes >> > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > > > > mpiexec_sdumont11: cannot connect to local mpd >> > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: >> > > > > > 1. no mpd is running on this host >> > > > > > 2. an mpd is running but was started without a "console" (-n >> > > option) >> > > > > > 1,5c1,3 >> > > > > > < lid velocity = 0.0016, prandtl # = 1., grashof # = 1. >> > > > > > < 0 SNES Function norm 0.0406612 >> > > > > > < 1 SNES Function norm 4.12227e-06 >> > > > > > < 2 SNES Function norm 6.098e-11 >> > > > > > < Number of SNES iterations = 2 >> > > > > > --- >> > > > > > > mpiexec_sdumont11: cannot connect to local mpd >> > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: >> > > > > > > 1. no mpd is running on this host >> > > > > > > 2. 
an mpd is running but was started without a "console" (-n >> > > option) >> > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials >> > > > > > Possible problem with ex19 running with hypre, diffs above >> > > > > > ========================================= >> > > > > > 1,9c1,3 >> > > > > > < lid velocity = 0.0625, prandtl # = 1., grashof # = 1. >> > > > > > < 0 SNES Function norm 0.239155 >> > > > > > < 0 KSP Residual norm 0.239155 >> > > > > > < 1 KSP Residual norm < 1.e-11 >> > > > > > < 1 SNES Function norm 6.81968e-05 >> > > > > > < 0 KSP Residual norm 6.81968e-05 >> > > > > > < 1 KSP Residual norm < 1.e-11 >> > > > > > < 2 SNES Function norm < 1.e-11 >> > > > > > < Number of SNES iterations = 2 >> > > > > > --- >> > > > > > > mpiexec_sdumont11: cannot connect to local mpd >> > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: >> > > > > > > 1. no mpd is running on this host >> > > > > > > 2. an mpd is running but was started without a "console" (-n >> > > option) >> > > > > > /scratch/simulreserv/softwares/petsc-3.13.0/src/snes/tutorials >> > > > > > Possible problem with ex19 running with mumps, diffs above >> > > > > > ========================================= >> > > > > > Possible error running Fortran example src/snes/tutorials/ex5f >> with >> > > 1 MPI >> > > > > > process >> > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > > > > mpiexec_sdumont11: cannot connect to local mpd >> > > > > > (/tmp/mpd2.console_alfredo.jaramillo); possible causes: >> > > > > > 1. no mpd is running on this host >> > > > > > 2. an mpd is running but was started without a "console" (-n >> > > option) >> > > > > > Completed test examples >> > > > > > >> > > > > > =============================== >> > > > > > >> > > > > > I entered in src/snes/tutorials/ and executed "make ex5f". The >> binary >> > > > > exf5 >> > > > > > was created >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Thu, May 21, 2020 at 6:37 PM Satish Balay > > >> > > wrote: >> > > > > > >> > > > > > > Do you get this error when building PETSc examples [C and/or >> > > fortran] - >> > > > > > > when you build them with the corresponding petsc makefile? >> > > > > > > >> > > > > > > Can you send the log of the example compiles? >> > > > > > > >> > > > > > > Satish >> > > > > > > >> > > > > > > --- >> > > > > > > >> > > > > > > [the attachment got deleted - don't know by who..] >> > > > > > > >> > > > > > > DENIAL OF SERVICE ALERT >> > > > > > > >> > > > > > > A denial of service protection limit was exceeded. The file >> has >> > > been >> > > > > > > removed. >> > > > > > > Context: 'configure.log.7z' >> > > > > > > Reason: The data size limit was exceeded >> > > > > > > Limit: 10 MB >> > > > > > > Ticket Number : 0c9c-5ec6-f30f-0001 >> > > > > > > >> > > > > > > >> > > > > > > For further information, contact your system administrator. >> > > > > > > Copyright 1999-2014 McAfee, Inc. >> > > > > > > All Rights Reserved. >> > > > > > > http://www.mcafee.com >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > On Thu, 21 May 2020, Alfredo Jaramillo wrote: >> > > > > > > >> > > > > > > > dear PETSc team, >> > > > > > > > >> > > > > > > > I have compiled PETSc with a 2016 version of the intel >> > > compilers. 
The >> > > > > > > > installation went well, but when I tried to compile my code >> the >> > > > > following >> > > > > > > > error appears in the final step of compilation (linking >> with ld) >> > > > > > > > >> > > > > > > > ../build/linux_icc/obj_linux_icc_opt/main.o: In function >> `main': >> > > > > > > > main.c:(.text+0x0): multiple definition of `main' >> > > > > > > > >> > > > > > > >> > > > > >> > > >> /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o):for_main.c:(.text+0x0): >> > > > > > > > first defined here >> > > > > > > > >> > > > > > > >> > > > > >> > > >> /opt/intel/parallel_studio_xe_2016_update2/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64/libifcore_pic.a(for_main.o): >> > > > > > > > In function `main': >> > > > > > > > for_main.c:(.text+0x3e): undefined reference to `MAIN__' >> > > > > > > > >> > > > > > > > I searched for this and I found that the option >> "-nofor_main" >> > > should >> > > > > be >> > > > > > > > added when compiling with ifort, but our code is written >> only in >> > > C an >> > > > > > > C++. >> > > > > > > > The FORTRAN compiler is used when PETSc compiles MUMPS. So >> I dont >> > > > > know if >> > > > > > > > this would work for this case. >> > > > > > > > >> > > > > > > > The configure.log file and the log of the compilation >> giving the >> > > > > error >> > > > > > > are >> > > > > > > > attached to this message. These logs were obtained in a >> cluster, >> > > I'm >> > > > > > > > getting the same error on my personal computer with a 2020 >> > > version >> > > > > of the >> > > > > > > > Intel Parallel Studio. >> > > > > > > > >> > > > > > > > thank you for any help on this >> > > > > > > > Alfredo >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From shaswat121994 at gmail.com Fri May 22 11:56:11 2020 From: shaswat121994 at gmail.com (Shashwat Tiwari) Date: Fri, 22 May 2020 22:26:11 +0530 Subject: [petsc-users] Creating multi-field section using DMPlex Message-ID: Hi, I am working on a Finite Volume scheme for a hyperbolic system of equations equations with 6 variables on unstructured grids. I am trying to create a section with 6 fields for this purpose, and have written a small test code for creating the section by editing the example given at src/dm/impls/plex/tutorials/ex1.c as follows: int main(int argc, char **argv) { DM dm, dmDist = NULL; Vec U; PetscSection section; PetscViewer viewer; PetscInt dim = 2, numFields, numBC, i; PetscInt numComp[6]; PetscInt numDof[18]; PetscBool interpolate = PETSC_TRUE, useCone = PETSC_TRUE, useClosure = PETSC_TRUE; PetscReal lower[3], upper[3]; // lower left and upper right coordinates of domain PetscInt cells[2]; PetscErrorCode ierr; // define domain properties cells[0] = 4; cells[1] = 4; lower[0] = -1; lower[1] = -1; lower[2] = 0; upper[0] = 1; upper[1] = 1; upper[2] = 0; ierr = PetscInitialize(&argc, &argv, NULL, help); CHKERRQ(ierr); // create the mesh ierr = PetscPrintf(PETSC_COMM_WORLD, "Generating mesh ... 
"); CHKERRQ(ierr); ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, dim, PETSC_TRUE, cells, lower, upper, NULL, interpolate, &dm); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Done\n"); CHKERRQ(ierr); // set cell adjacency ierr = DMSetBasicAdjacency(dm, useCone, useClosure); CHKERRQ(ierr); // Distribute mesh over processes ierr = DMPlexDistribute(dm, 1, NULL, &dmDist);CHKERRQ(ierr); if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;} // Create scalar fields numFields = 6; numComp[0] = 1; numComp[1] = 1; numComp[2] = 1; numComp[3] = 1; numComp[4] = 1; numComp[5] = 1; for (i = 0; i < numFields*(dim+1); ++i) numDof[i] = 0; numDof[0*(dim+1)+dim] = 1; numDof[1*(dim+1)+dim] = 1; numDof[2*(dim+1)+dim] = 1; numDof[3*(dim+1)+dim] = 1; numDof[4*(dim+1)+dim] = 1; numDof[5*(dim+1)+dim] = 1; numBC = 0; // Create a PetscSection with this data layout ierr = DMSetNumFields(dm, numFields);CHKERRQ(ierr); ierr = DMPlexCreateSection(dm, NULL, numComp, numDof, numBC, NULL, NULL, NULL, NULL, §ion);CHKERRQ(ierr); // Name the Field variables ierr = PetscSectionSetFieldName(section, 0, "h");CHKERRQ(ierr); ierr = PetscSectionSetFieldName(section, 1, "m1");CHKERRQ(ierr); ierr = PetscSectionSetFieldName(section, 2, "m2");CHKERRQ(ierr); ierr = PetscSectionSetFieldName(section, 3, "E11");CHKERRQ(ierr); ierr = PetscSectionSetFieldName(section, 4, "E12");CHKERRQ(ierr); ierr = PetscSectionSetFieldName(section, 5, "E22");CHKERRQ(ierr); // set adjacency for each field ierr = DMSetAdjacency(dm, 0, useCone, useClosure); CHKERRQ(ierr); ierr = DMSetAdjacency(dm, 1, useCone, useClosure); CHKERRQ(ierr); ierr = DMSetAdjacency(dm, 2, useCone, useClosure); CHKERRQ(ierr); ierr = DMSetAdjacency(dm, 3, useCone, useClosure); CHKERRQ(ierr); ierr = DMSetAdjacency(dm, 4, useCone, useClosure); CHKERRQ(ierr); ierr = DMSetAdjacency(dm, 5, useCone, useClosure); CHKERRQ(ierr); ierr = PetscSectionView(section, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); // Tell the DM to use this data layout ierr = DMSetLocalSection(dm, section);CHKERRQ(ierr); ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); // Create a Vec with this layout and view it ierr = DMGetGlobalVector(dm, &U);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) U, "U"); CHKERRQ(ierr); ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer);CHKERRQ(ierr); ierr = PetscViewerSetType(viewer, PETSCVIEWERVTK);CHKERRQ(ierr); ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_VTK);CHKERRQ(ierr); ierr = PetscViewerFileSetName(viewer, "sol.vtk");CHKERRQ(ierr); //ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_VTK_VTU);CHKERRQ(ierr); //ierr = PetscViewerFileSetName(viewer, "sol.vtu");CHKERRQ(ierr); ierr = VecView(U, viewer);CHKERRQ(ierr); ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); ierr = DMRestoreGlobalVector(dm, &U);CHKERRQ(ierr); // Cleanup ierr = PetscSectionDestroy(§ion);CHKERRQ(ierr); ierr = DMDestroy(&dm);CHKERRQ(ierr); ierr = PetscFinalize(); return ierr; } When exporting the vector data to "vtk" format, the code works fine and I am able to see the 6 field names in the visualizer. But when I change it to "vtu", I get the following error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Total number of field components 1 != block size 6 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.13.0, unknown [0]PETSC ERROR: ./ex1 on a arch-linux2-c-debug named shashwat by shashwat Fri May 22 22:11:13 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-debugging=1 --download-triangle --download-metis --download-parmetis --download-cmake [0]PETSC ERROR: #1 DMPlexVTKWriteAll_VTU() line 334 in /home/shashwat/local/petsc/src/dm/impls/plex/plexvtu.c [0]PETSC ERROR: #2 DMPlexVTKWriteAll() line 688 in /home/shashwat/local/petsc/src/dm/impls/plex/plexvtk.c [0]PETSC ERROR: #3 PetscViewerFlush_VTK() line 100 in /home/shashwat/local/petsc/src/sys/classes/viewer/impls/vtk/vtkv.c [0]PETSC ERROR: #4 PetscViewerFlush() line 26 in /home/shashwat/local/petsc/src/sys/classes/viewer/interface/flush.c [0]PETSC ERROR: #5 PetscViewerDestroy() line 113 in /home/shashwat/local/petsc/src/sys/classes/viewer/interface/view.c [0]PETSC ERROR: #6 main() line 485 in ex1.c [0]PETSC ERROR: No PETSc Option Table entries [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_SELF, 485077) - process 0 I would like to use the "vtu" format for visualiztion. Kindly have a look and let me know I might be doing wrong here, and how can I make it work. Regards, Shashwat -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Fri May 22 12:31:36 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Fri, 22 May 2020 20:31:36 +0300 Subject: [petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm In-Reply-To: <738e615c-de8e-88cd-81cd-4d6f35b454d2@anl.gov> References: <0c10fc0a-3d86-e91a-f349-fd7c087ba8ed@anl.gov> <738e615c-de8e-88cd-81cd-4d6f35b454d2@anl.gov> Message-ID: Dear Richard, Thank you for your email. From MATLAB's kmeans() function I believe I got the final clustering index set, not centroids. What I am trying to do is to cluster vectors created by MatDuplicateVecs() according to the index set (whose type is not IS since I took it from MATLAB) that I obtained from MATLAB. I am trying to cluster these vectors however since they are parallel, I couldn't understand how to cluster them. Normally, I have to be independent from MATLAB so I will try your suggestion, grateful for that. However, because of my limited knowledge about PETSc and parallel computing, I am not able to figure out how to cluster parallel vectors according to an index set. Thanks, Eda Mills, Richard Tran , 30 Nis 2020 Per, 02:07 tarihinde ?unu yazd?: > Hi Eda, > > Thanks for your reply. I'm still trying to understand why you say you need > to duplicate the row vectors across all processes. When I have implemented > parallel k-means, I don't duplicate the row vectors. (This would be very > unscalable and largely defeat the point of doing this with MPI parallelism > in the first place.) > > Earlier in this email thread, you said that you have used Matlab to get > cluster IDs for each row vector. Are you trying to then use this > information to calculate the cluster centroids from inside your PETSc > program? 
If so, you can do this by having each MPI rank do the following: > For cluster i in 0 to (k-1), calculate the element-wise sum of all of the > local rows that belong to cluster i, then use MPI_Allreduce() to calculate > the global elementwise sum of all the local sums (this array will be > replicated across all MPI ranks), and finally divide by the number of > members of that cluster to get the centroid. Note that MPI_Allreduce() > doesn't work on PETSc objects, but simple arrays, so you'll want to use > something like MatGetValues() or MatGetRow() to access the elements of your > row vectors. > > Let me know if I am misunderstanding what you are aiming to do, or if I am > misunderstanding something. > > It sounds like you would benefit from having some routines in PETSc to do > k-means (or other) clustering, by the way? > > Best regards, > Richard > > On 4/29/20 3:47 AM, Eda Oktay wrote: > > Dear Richard, > > I am trying to use spectral clustering algorithm by using k-means > clustering algorithm at some point. I am doing this by producing a matrix > consisting of eigenvectors (of the adjacency matrix of the graph that I > want to partition), then forming row vectors of this matrix. This is the > part that I am using parallel vector. By using the output from k-means, I > am trying to cluster these row vectors. To cluster these vectors, I think I > need all row vectors in all processes. I wanted to use sequential vectors, > however, I couldn't find a different way that I form row vectors of a > matrix. > > I am trying to use VecScatterCreateToAll, however, since my vector is > parallel crated by VecDuplicateVecs, my input is not in correct type, so I > get error. I still can't get how can I use this function in parallel vector > created by VecDuplicateVecs. > > Thank you all for your help. > > Eda > > Mills, Richard Tran , 7 Nis 2020 Sal, 01:51 tarihinde > ?unu yazd?: > >> Hi Eda, >> >> I think that you probably want to use VecScatter routines, as Junchao >> has suggested, instead of the lower level star forest for this. I >> believe that VecScatterCreateToZero() is what you want for the broadcast >> problem you describe, in the second part of your question. I'm not sure >> what you are trying to do in the first part. Taking a parallel vector >> and then copying its entire contents to a sequential vector residing on >> each process is not scalable, and a lot of the design that has gone into >> PETSc is to prevent the user from ever needing to do things like that. >> Can you please tell us what you intend to do with these sequential >> vectors? >> >> I'm also wondering why, later in your message, you say that you get >> cluster assignments from Matlab, and then "to cluster row vectors >> according to this information, all processors need to have all of the >> row vectors". Do you mean you want to get all of the row vectors copied >> onto all of the processors so that you can compute the cluster >> centroids? If so, computing the cluster centroids can be done without >> copying the row vectors onto all processors if you use a communication >> operation like MPI_Allreduce(). >> >> Lastly, let me add that I've done a fair amount of work implementing >> clustering algorithms on distributed memory parallel machines, but >> outside of PETSc. I was thinking that I should implement some of these >> routines using PETSc. 
I can't get to this immediately, but I'm wondering >> if you might care to tell me a bit more about the clustering problems >> you need to solve and how having some support for this in PETSc might >> (or might not) help. >> >> Best regards, >> Richard >> >> On 4/4/20 1:39 AM, Eda Oktay wrote: >> > Hi all, >> > >> > I created a parallel vector UV, by using VecDuplicateVecs since I need >> > row vectors of a matrix. However, I need the whole vector be in all >> > processors, which means I need to gather all and broadcast them to all >> > processors. To gather, I tried to use VecStrideGatherAll: >> > >> > Vec UVG; >> > VecStrideGatherAll(UV,UVG,INSERT_VALUES); >> > VecView(UVG,PETSC_VIEWER_STDOUT_WORLD); >> > >> > however when I try to view the vector, I get the following error. >> > >> > [3]PETSC ERROR: Invalid argument >> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >> > [3]PETSC ERROR: See >> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. >> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >> > 11:22:54 2020 >> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >> > [0]PETSC ERROR: See >> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. >> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >> > 11:22:54 2020 >> > [0]PETSC ERROR: Configure options --download-mpich --download-openblas >> > --download-slepc --download-metis --download-parmetis --download-chaco >> > --with-X=1 >> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >> > ./clustering_son_final_edgecut_without_parmetis on a >> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >> > 11:22:54 2020 >> > [1]PETSC ERROR: Configure options --download-mpich --download-openblas >> > --download-slepc --download-metis --download-parmetis --download-chaco >> > --with-X=1 >> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >> > Configure options --download-mpich --download-openblas >> > --download-slepc --download-metis --download-parmetis --download-chaco >> > --with-X=1 >> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >> > >> > I couldn't understand why I am getting this error. Is this because of >> > UV being created by VecDuplicateVecs? How can I solve this problem? >> > >> > The other question is broadcasting. After gathering all elements of >> > the vector UV, I need to broadcast them to all processors. I found >> > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept >> > properly. I couldn't adjust my question to the star forest concept. >> > >> > My problem is: If I have 4 processors, I create a matrix whose columns >> > are 4 smallest eigenvectors, say of size 72. Then by defining each row >> > of this matrix as a vector, I cluster them by using k-means >> > clustering algorithm. For now, I cluster them by using MATLAB and I >> > obtain a vector showing which row vector is in which cluster. 
After >> > getting this vector, to cluster row vectors according to this >> > information, all processors need to have all of the row vectors. >> > >> > According to this problem, how can I use the star forest concept? >> > >> > I will be glad if you can help me about this problem since I don't >> > have enough knowledge about graph theory. An if you have any idea >> > about how can I use k-means algorithm in a more practical way, please >> > let me know. >> > >> > Thanks! >> > >> > Eda >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Fri May 22 12:38:40 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Fri, 22 May 2020 20:38:40 +0300 Subject: [petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm In-Reply-To: References: <0c10fc0a-3d86-e91a-f349-fd7c087ba8ed@anl.gov> <738e615c-de8e-88cd-81cd-4d6f35b454d2@anl.gov> Message-ID: I am sorry, I used VecDuplictaeVecs not MatDuplicateVecs Eda Oktay , 22 May 2020 Cum, 20:31 tarihinde ?unu yazd?: > Dear Richard, > > Thank you for your email. From MATLAB's kmeans() function I believe I got > the final clustering index set, not centroids. What I am trying to do is to > cluster vectors created by MatDuplicateVecs() according to the index set > (whose type is not IS since I took it from MATLAB) that I obtained from > MATLAB. I am trying to cluster these vectors however since they are > parallel, I couldn't understand how to cluster them. > > Normally, I have to be independent from MATLAB so I will try your > suggestion, grateful for that. However, because of my limited > knowledge about PETSc and parallel computing, I am not able to figure out > how to cluster parallel vectors according to an index set. > > Thanks, > > Eda > > Mills, Richard Tran , 30 Nis 2020 Per, 02:07 tarihinde > ?unu yazd?: > >> Hi Eda, >> >> Thanks for your reply. I'm still trying to understand why you say you >> need to duplicate the row vectors across all processes. When I have >> implemented parallel k-means, I don't duplicate the row vectors. (This >> would be very unscalable and largely defeat the point of doing this with >> MPI parallelism in the first place.) >> >> Earlier in this email thread, you said that you have used Matlab to get >> cluster IDs for each row vector. Are you trying to then use this >> information to calculate the cluster centroids from inside your PETSc >> program? If so, you can do this by having each MPI rank do the following: >> For cluster i in 0 to (k-1), calculate the element-wise sum of all of the >> local rows that belong to cluster i, then use MPI_Allreduce() to calculate >> the global elementwise sum of all the local sums (this array will be >> replicated across all MPI ranks), and finally divide by the number of >> members of that cluster to get the centroid. Note that MPI_Allreduce() >> doesn't work on PETSc objects, but simple arrays, so you'll want to use >> something like MatGetValues() or MatGetRow() to access the elements of your >> row vectors. >> >> Let me know if I am misunderstanding what you are aiming to do, or if I >> am misunderstanding something. >> >> It sounds like you would benefit from having some routines in PETSc to do >> k-means (or other) clustering, by the way? >> >> Best regards, >> Richard >> >> On 4/29/20 3:47 AM, Eda Oktay wrote: >> >> Dear Richard, >> >> I am trying to use spectral clustering algorithm by using k-means >> clustering algorithm at some point. 
I am doing this by producing a matrix >> consisting of eigenvectors (of the adjacency matrix of the graph that I >> want to partition), then forming row vectors of this matrix. This is the >> part that I am using parallel vector. By using the output from k-means, I >> am trying to cluster these row vectors. To cluster these vectors, I think I >> need all row vectors in all processes. I wanted to use sequential vectors, >> however, I couldn't find a different way that I form row vectors of a >> matrix. >> >> I am trying to use VecScatterCreateToAll, however, since my vector is >> parallel crated by VecDuplicateVecs, my input is not in correct type, so I >> get error. I still can't get how can I use this function in parallel vector >> created by VecDuplicateVecs. >> >> Thank you all for your help. >> >> Eda >> >> Mills, Richard Tran , 7 Nis 2020 Sal, 01:51 tarihinde >> ?unu yazd?: >> >>> Hi Eda, >>> >>> I think that you probably want to use VecScatter routines, as Junchao >>> has suggested, instead of the lower level star forest for this. I >>> believe that VecScatterCreateToZero() is what you want for the broadcast >>> problem you describe, in the second part of your question. I'm not sure >>> what you are trying to do in the first part. Taking a parallel vector >>> and then copying its entire contents to a sequential vector residing on >>> each process is not scalable, and a lot of the design that has gone into >>> PETSc is to prevent the user from ever needing to do things like that. >>> Can you please tell us what you intend to do with these sequential >>> vectors? >>> >>> I'm also wondering why, later in your message, you say that you get >>> cluster assignments from Matlab, and then "to cluster row vectors >>> according to this information, all processors need to have all of the >>> row vectors". Do you mean you want to get all of the row vectors copied >>> onto all of the processors so that you can compute the cluster >>> centroids? If so, computing the cluster centroids can be done without >>> copying the row vectors onto all processors if you use a communication >>> operation like MPI_Allreduce(). >>> >>> Lastly, let me add that I've done a fair amount of work implementing >>> clustering algorithms on distributed memory parallel machines, but >>> outside of PETSc. I was thinking that I should implement some of these >>> routines using PETSc. I can't get to this immediately, but I'm wondering >>> if you might care to tell me a bit more about the clustering problems >>> you need to solve and how having some support for this in PETSc might >>> (or might not) help. >>> >>> Best regards, >>> Richard >>> >>> On 4/4/20 1:39 AM, Eda Oktay wrote: >>> > Hi all, >>> > >>> > I created a parallel vector UV, by using VecDuplicateVecs since I need >>> > row vectors of a matrix. However, I need the whole vector be in all >>> > processors, which means I need to gather all and broadcast them to all >>> > processors. To gather, I tried to use VecStrideGatherAll: >>> > >>> > Vec UVG; >>> > VecStrideGatherAll(UV,UVG,INSERT_VALUES); >>> > VecView(UVG,PETSC_VIEWER_STDOUT_WORLD); >>> > >>> > however when I try to view the vector, I get the following error. >>> > >>> > [3]PETSC ERROR: Invalid argument >>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>> > [3]PETSC ERROR: See >>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. 
>>> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>> > [0]PETSC ERROR: See >>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. >>> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [0]PETSC ERROR: Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [1]PETSC ERROR: Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > >>> > I couldn't understand why I am getting this error. Is this because of >>> > UV being created by VecDuplicateVecs? How can I solve this problem? >>> > >>> > The other question is broadcasting. After gathering all elements of >>> > the vector UV, I need to broadcast them to all processors. I found >>> > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept >>> > properly. I couldn't adjust my question to the star forest concept. >>> > >>> > My problem is: If I have 4 processors, I create a matrix whose columns >>> > are 4 smallest eigenvectors, say of size 72. Then by defining each row >>> > of this matrix as a vector, I cluster them by using k-means >>> > clustering algorithm. For now, I cluster them by using MATLAB and I >>> > obtain a vector showing which row vector is in which cluster. After >>> > getting this vector, to cluster row vectors according to this >>> > information, all processors need to have all of the row vectors. >>> > >>> > According to this problem, how can I use the star forest concept? >>> > >>> > I will be glad if you can help me about this problem since I don't >>> > have enough knowledge about graph theory. An if you have any idea >>> > about how can I use k-means algorithm in a more practical way, please >>> > let me know. >>> > >>> > Thanks! >>> > >>> > Eda >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rtmills at anl.gov Fri May 22 12:57:37 2020 From: rtmills at anl.gov (Mills, Richard Tran) Date: Fri, 22 May 2020 17:57:37 +0000 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> Message-ID: <363e7979-6a7b-b8f0-76e1-f3d6479cabd1@anl.gov> Yes, Junchao said he gets the segfault, but it works for Karl. 
Sounds like this may be a case of one compiler liking the definitions for complex that Thrust uses, and some not, as Stefano says. Karl and Junchao, can you please share the version of the compilers (and maybe associated settings) that you are using? --Richard On 5/21/20 9:15 AM, Junchao Zhang wrote: I tested this example with cuda 10.2, it did segfault. I'm looking into it. --Junchao Zhang On Thu, May 21, 2020 at 11:04 AM Matthew Knepley > wrote: On Thu, May 21, 2020 at 11:31 AM Stefano Zampini > wrote: Oh, there is also an issue I have recently noticed and did not have yet the time to fix it With complex numbers, we use the definitions for complexes from thrust and this does not seem to be always compatible to whatever the C compiler uses Matt, take a look at petscsytypes.h and you will see the issue https://gitlab.com/petsc/petsc/-/blob/master/include/petscsystypes.h#L208 For sure, you need to configure petsc with --with-clanguage=cxx, but even that does not to seem make it work on a CUDA box I have recently tried out (CUDA 10.1) I believe the issue arise even if you call VecSet(v,0) on a VECCUDA So Karl and Junchao say that with 10.2 it is working. Do you have access to 10.2? Thanks, Matt On May 21, 2020, at 6:21 PM, Matthew Knepley > wrote: On Thu, May 21, 2020 at 10:53 AM Rui Silva > wrote: Hello everyone, I am trying to run PETSc (with complex numbers in the GPU). When I call the VecWAXPY routine using the complex version of PETSc and mpicuda vectors, the program fails with a segmentation fault. This problem does not appear, if I run the complex version with mpi vectors or with the real version using mpicuda vectors. Is there any problem using CUDA+complex PETSc? Furthermore, I use the -log_view option to run the complex+gpu code, otherwise the program fails at the beggining. What version of CUDA do you have? There are bugs in the versions before 10.2. Thanks, Matt Best regards, Rui Silva -- Dr. Rui Emanuel Ferreira da Silva Departamento de F?sica Te?rica de la Materia Condensada Universidad Aut?noma de Madrid, Spain https://ruiefdasilva.wixsite.com/ruiefdasilva https://mmuscles.eu/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri May 22 13:55:02 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 22 May 2020 13:55:02 -0500 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: <363e7979-6a7b-b8f0-76e1-f3d6479cabd1@anl.gov> References: <351ECF24-E265-4487-AE44-C4D8F2EFD045@gmail.com> <363e7979-6a7b-b8f0-76e1-f3d6479cabd1@anl.gov> Message-ID: $ module list Currently Loaded Modules: 1) cuda/10.2 2) gcc/8.3.0-fjpc5ys 3) cmake/3.17.0-n3kslpc 4) openmpi-4.0.2-gcc-8.3.0-e2zcbqz $nvcc -V Cuda compilation tools, release 10.2, V10.2.89 --Junchao Zhang On Fri, May 22, 2020 at 12:57 PM Mills, Richard Tran via petsc-users < petsc-users at mcs.anl.gov> wrote: > Yes, Junchao said he gets the segfault, but it works for Karl. Sounds like > this may be a case of one compiler liking the definitions for complex that > Thrust uses, and some not, as Stefano says. 
Karl and Junchao, can you > please share the version of the compilers (and maybe associated settings) > that you are using? > > --Richard > > On 5/21/20 9:15 AM, Junchao Zhang wrote: > > I tested this example with cuda 10.2, it did segfault. I'm looking into it. > --Junchao Zhang > > > On Thu, May 21, 2020 at 11:04 AM Matthew Knepley > wrote: > >> On Thu, May 21, 2020 at 11:31 AM Stefano Zampini < >> stefano.zampini at gmail.com> wrote: >> >>> Oh, there is also an issue I have recently noticed and did not have yet >>> the time to fix it >>> >>> With complex numbers, we use the definitions for complexes from thrust >>> and this does not seem to be always compatible to whatever the C compiler >>> uses >>> Matt, take a look at petscsytypes.h and you will see the issue >>> https://gitlab.com/petsc/petsc/-/blob/master/include/petscsystypes.h#L208 >>> >>> For sure, you need to configure petsc with --with-clanguage=cxx, but >>> even that does not to seem make it work on a CUDA box I have recently tried >>> out (CUDA 10.1) >>> I believe the issue arise even if you call VecSet(v,0) on a VECCUDA >>> >> >> So Karl and Junchao say that with 10.2 it is working. Do you have access >> to 10.2? >> >> Thanks, >> >> Matt >> >> >>> On May 21, 2020, at 6:21 PM, Matthew Knepley wrote: >>> >>> On Thu, May 21, 2020 at 10:53 AM Rui Silva wrote: >>> >>>> Hello everyone, >>>> >>>> I am trying to run PETSc (with complex numbers in the GPU). When I call >>>> the VecWAXPY routine using the complex version of PETSc and mpicuda >>>> vectors, the program fails with a segmentation fault. This problem does >>>> not appear, if I run the complex version with mpi vectors or with the >>>> real version using mpicuda vectors. Is there any problem using >>>> CUDA+complex PETSc? >>>> >>>> Furthermore, I use the -log_view option to run the complex+gpu code, >>>> otherwise the program fails at the beggining. >>>> >>> >>> What version of CUDA do you have? There are bugs in the versions before >>> 10.2. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Best regards, >>>> >>>> Rui Silva >>>> >>>> -- >>>> Dr. Rui Emanuel Ferreira da Silva >>>> Departamento de F?sica Te?rica de la Materia Condensada >>>> Universidad Aut?noma de Madrid, Spain >>>> https://ruiefdasilva.wixsite.com/ruiefdasilva >>>> https://mmuscles.eu/ >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Fri May 22 15:10:08 2020 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Fri, 22 May 2020 20:10:08 +0000 Subject: [petsc-users] Question about DMLocalToLocal for DM_BOUNDARY_GHOSTED conditions Message-ID: Hello, I am converting a serial code to parallel in fortran with petsc. I am using the DMDA to manage communication of the information that used to be in old two-dimensional fortran arrays. I noticed when using DMLocalToLocalBegin/End, not all the ghost values in the array at the DM_BOUNDARY_GHOSTED area is updated. Is this expected behaviour? 
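For reference, a stripped-down sketch of the call pattern I am using is below. My actual code is Fortran; this is only an equivalent C outline with made-up grid sizes and dof, so it shows which routines I call rather than copying my code, and I apply the local-to-local scatter in place on the same local vector.

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  Vec            loc;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* 2-D DMDA: DM_BOUNDARY_GHOSTED adds a ghost layer outside the physical
     boundary; stencil width 1 gives the usual ghost points between ranks */
  ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                      DMDA_STENCIL_BOX, 64, 64, PETSC_DECIDE, PETSC_DECIDE,
                      1, 1, NULL, NULL, &da); CHKERRQ(ierr);
  ierr = DMSetFromOptions(da); CHKERRQ(ierr);
  ierr = DMSetUp(da); CHKERRQ(ierr);

  ierr = DMCreateLocalVector(da, &loc); CHKERRQ(ierr);
  /* ... fill the owned entries of loc, e.g. through DMDAVecGetArray() ... */

  /* refresh the ghost entries of loc from the neighbouring subdomains */
  ierr = DMLocalToLocalBegin(da, loc, INSERT_VALUES, loc); CHKERRQ(ierr);
  ierr = DMLocalToLocalEnd(da, loc, INSERT_VALUES, loc); CHKERRQ(ierr);

  /* the entries I see left untouched are the ones in the DM_BOUNDARY_GHOSTED
     layer outside the physical domain */

  ierr = VecDestroy(&loc); CHKERRQ(ierr);
  ierr = DMDestroy(&da); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
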
I read through this thread: https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2016-May/029252.html and saw someone had a similar question, but the answer was not clear to me. If this is expected behaviour, how should I instead update these values in my arrays? I was using DM_BOUNDARY_GHOSTED as I needed the extra ghost cells for some subroutines, but I do not need them in my matrix from DMCreateMatrix. I am using Petsc 3.12.4 and open MPI 3.1.4. Thanks, Lucas Banting -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Fri May 22 16:11:00 2020 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Fri, 22 May 2020 21:11:00 +0000 Subject: [petsc-users] Question about DMLocalToLocal when using DM_BOUNDARY_GHOSTED Message-ID: Hello, I am converting a serial code to parallel in fortran with petsc. I am using the DMDA to manage communication of the information that used to be in old two-dimensional fortran arrays. I noticed when using DMLocalToLocalBegin/End, not all the ghost values in the array at the DM_BOUNDARY_GHOSTED area is updated. Is this expected behaviour? I read through this thread: https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2016-May/029252.html and saw someone had a similar question, but the answer was not clear to me. If this is expected behaviour, how should I instead update these values in my arrays? I was using DM_BOUNDARY_GHOSTED as I needed the extra ghost cells for some subroutines, but I do not need them in my matrix from DMCreateMatrix. I am using Petsc 3.12.4 and open MPI 3.1.4. Thanks, Lucas Banting -------------- next part -------------- An HTML attachment was scrubbed... URL: From rtmills at anl.gov Fri May 22 18:39:40 2020 From: rtmills at anl.gov (Mills, Richard Tran) Date: Fri, 22 May 2020 23:39:40 +0000 Subject: [petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm In-Reply-To: References: <0c10fc0a-3d86-e91a-f349-fd7c087ba8ed@anl.gov> <738e615c-de8e-88cd-81cd-4d6f35b454d2@anl.gov> Message-ID: Hi Eda, If you are using the MATLAB k-means function, calling it like idx = kmeans(X,k) will give you the index set, but if you do [idx,C] = kmeans(X,k) then you will also get a matrix C which contains the cluster centroids. Is this not what you need? --Richard On 5/22/20 10:38 AM, Eda Oktay wrote: I am sorry, I used VecDuplictaeVecs not MatDuplicateVecs Eda Oktay >, 22 May 2020 Cum, 20:31 tarihinde ?unu yazd?: Dear Richard, Thank you for your email. From MATLAB's kmeans() function I believe I got the final clustering index set, not centroids. What I am trying to do is to cluster vectors created by MatDuplicateVecs() according to the index set (whose type is not IS since I took it from MATLAB) that I obtained from MATLAB. I am trying to cluster these vectors however since they are parallel, I couldn't understand how to cluster them. Normally, I have to be independent from MATLAB so I will try your suggestion, grateful for that. However, because of my limited knowledge about PETSc and parallel computing, I am not able to figure out how to cluster parallel vectors according to an index set. Thanks, Eda Mills, Richard Tran >, 30 Nis 2020 Per, 02:07 tarihinde ?unu yazd?: Hi Eda, Thanks for your reply. I'm still trying to understand why you say you need to duplicate the row vectors across all processes. When I have implemented parallel k-means, I don't duplicate the row vectors. (This would be very unscalable and largely defeat the point of doing this with MPI parallelism in the first place.) 
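Concretely, the sort of thing I do instead is sketched below. This is a rough, untested sketch rather than anything in PETSc: it assumes the row vectors are stored as the rows of a parallel Mat X with d columns, and that each rank already knows the cluster id of each of its locally owned rows in an array idx[] (indexed from the start of its ownership range).

#include <petscmat.h>

/* Rough sketch: centroids of the rows of a parallel matrix X (d columns),
   for k clusters, given the cluster id of every locally owned row. */
PetscErrorCode ComputeCentroids(Mat X, PetscInt k, PetscInt d,
                                const PetscInt idx[], PetscReal centroids[])
{
  PetscErrorCode     ierr;
  MPI_Comm           comm;
  PetscInt           rstart, rend, i, j, ncols;
  const PetscInt    *cols;
  const PetscScalar *vals;
  PetscReal         *sums, *counts;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)X, &comm); CHKERRQ(ierr);
  ierr = PetscCalloc2(k*d, &sums, k, &counts); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(X, &rstart, &rend); CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {                 /* locally owned rows only */
    PetscInt c = idx[i - rstart];                   /* cluster of this row */
    ierr = MatGetRow(X, i, &ncols, &cols, &vals); CHKERRQ(ierr);
    for (j = 0; j < ncols; j++) sums[c*d + cols[j]] += PetscRealPart(vals[j]);
    counts[c] += 1.0;
    ierr = MatRestoreRow(X, i, &ncols, &cols, &vals); CHKERRQ(ierr);
  }
  /* element-wise global sums, replicated on every rank */
  ierr = MPI_Allreduce(MPI_IN_PLACE, sums, (PetscMPIInt)(k*d), MPIU_REAL, MPI_SUM, comm); CHKERRQ(ierr);
  ierr = MPI_Allreduce(MPI_IN_PLACE, counts, (PetscMPIInt)k, MPIU_REAL, MPI_SUM, comm); CHKERRQ(ierr);
  for (i = 0; i < k; i++)
    for (j = 0; j < d; j++)
      centroids[i*d + j] = counts[i] > 0 ? sums[i*d + j]/counts[i] : 0.0;
  ierr = PetscFree2(sums, counts); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Only the k*d sums and the k counts ever travel over the network; the row vectors themselves stay where they are.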
Earlier in this email thread, you said that you have used Matlab to get cluster IDs for each row vector. Are you trying to then use this information to calculate the cluster centroids from inside your PETSc program? If so, you can do this by having each MPI rank do the following: For cluster i in 0 to (k-1), calculate the element-wise sum of all of the local rows that belong to cluster i, then use MPI_Allreduce() to calculate the global elementwise sum of all the local sums (this array will be replicated across all MPI ranks), and finally divide by the number of members of that cluster to get the centroid. Note that MPI_Allreduce() doesn't work on PETSc objects, but simple arrays, so you'll want to use something like MatGetValues() or MatGetRow() to access the elements of your row vectors. Let me know if I am misunderstanding what you are aiming to do, or if I am misunderstanding something. It sounds like you would benefit from having some routines in PETSc to do k-means (or other) clustering, by the way? Best regards, Richard On 4/29/20 3:47 AM, Eda Oktay wrote: Dear Richard, I am trying to use spectral clustering algorithm by using k-means clustering algorithm at some point. I am doing this by producing a matrix consisting of eigenvectors (of the adjacency matrix of the graph that I want to partition), then forming row vectors of this matrix. This is the part that I am using parallel vector. By using the output from k-means, I am trying to cluster these row vectors. To cluster these vectors, I think I need all row vectors in all processes. I wanted to use sequential vectors, however, I couldn't find a different way that I form row vectors of a matrix. I am trying to use VecScatterCreateToAll, however, since my vector is parallel crated by VecDuplicateVecs, my input is not in correct type, so I get error. I still can't get how can I use this function in parallel vector created by VecDuplicateVecs. Thank you all for your help. Eda Mills, Richard Tran >, 7 Nis 2020 Sal, 01:51 tarihinde ?unu yazd?: Hi Eda, I think that you probably want to use VecScatter routines, as Junchao has suggested, instead of the lower level star forest for this. I believe that VecScatterCreateToZero() is what you want for the broadcast problem you describe, in the second part of your question. I'm not sure what you are trying to do in the first part. Taking a parallel vector and then copying its entire contents to a sequential vector residing on each process is not scalable, and a lot of the design that has gone into PETSc is to prevent the user from ever needing to do things like that. Can you please tell us what you intend to do with these sequential vectors? I'm also wondering why, later in your message, you say that you get cluster assignments from Matlab, and then "to cluster row vectors according to this information, all processors need to have all of the row vectors". Do you mean you want to get all of the row vectors copied onto all of the processors so that you can compute the cluster centroids? If so, computing the cluster centroids can be done without copying the row vectors onto all processors if you use a communication operation like MPI_Allreduce(). Lastly, let me add that I've done a fair amount of work implementing clustering algorithms on distributed memory parallel machines, but outside of PETSc. I was thinking that I should implement some of these routines using PETSc. 
I can't get to this immediately, but I'm wondering if you might care to tell me a bit more about the clustering problems you need to solve and how having some support for this in PETSc might (or might not) help. Best regards, Richard On 4/4/20 1:39 AM, Eda Oktay wrote: > Hi all, > > I created a parallel vector UV, by using VecDuplicateVecs since I need > row vectors of a matrix. However, I need the whole vector be in all > processors, which means I need to gather all and broadcast them to all > processors. To gather, I tried to use VecStrideGatherAll: > > Vec UVG; > VecStrideGatherAll(UV,UVG,INSERT_VALUES); > VecView(UVG,PETSC_VIEWER_STDOUT_WORLD); > > however when I try to view the vector, I get the following error. > > [3]PETSC ERROR: Invalid argument > [3]PETSC ERROR: Wrong type of object: Parameter # 1 > [3]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 > 11:22:54 2020 > [3]PETSC ERROR: Wrong type of object: Parameter # 1 > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 > 11:22:54 2020 > [0]PETSC ERROR: Configure options --download-mpich --download-openblas > --download-slepc --download-metis --download-parmetis --download-chaco > --with-X=1 > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c > ./clustering_son_final_edgecut_without_parmetis on a > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 > 11:22:54 2020 > [1]PETSC ERROR: Configure options --download-mpich --download-openblas > --download-slepc --download-metis --download-parmetis --download-chaco > --with-X=1 > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c > Configure options --download-mpich --download-openblas > --download-slepc --download-metis --download-parmetis --download-chaco > --with-X=1 > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c > > I couldn't understand why I am getting this error. Is this because of > UV being created by VecDuplicateVecs? How can I solve this problem? > > The other question is broadcasting. After gathering all elements of > the vector UV, I need to broadcast them to all processors. I found > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept > properly. I couldn't adjust my question to the star forest concept. > > My problem is: If I have 4 processors, I create a matrix whose columns > are 4 smallest eigenvectors, say of size 72. Then by defining each row > of this matrix as a vector, I cluster them by using k-means > clustering algorithm. For now, I cluster them by using MATLAB and I > obtain a vector showing which row vector is in which cluster. After > getting this vector, to cluster row vectors according to this > information, all processors need to have all of the row vectors. > > According to this problem, how can I use the star forest concept? > > I will be glad if you can help me about this problem since I don't > have enough knowledge about graph theory. 
An if you have any idea > about how can I use k-means algorithm in a more practical way, please > let me know. > > Thanks! > > Eda -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 22 20:03:49 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 May 2020 21:03:49 -0400 Subject: [petsc-users] Question about DMLocalToLocal for DM_BOUNDARY_GHOSTED conditions In-Reply-To: References: Message-ID: On Fri, May 22, 2020 at 4:34 PM Lucas Banting wrote: > Hello, > > I am converting a serial code to parallel in fortran with petsc. I am > using the DMDA to manage communication of the information that used to be > in old two-dimensional fortran arrays. > > I noticed when using DMLocalToLocalBegin/End, not all the ghost values in > the array at the DM_BOUNDARY_GHOSTED area is updated. Is this expected > behaviour? > I believe so. GHOSTED is user managed space. We do not touch it. > I read through this thread: > https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2016-May/029252.html > and saw someone had a similar question, but the answer was not clear to me. > > If this is expected behaviour, how should I instead update these values in > my arrays? I was using DM_BOUNDARY_GHOSTED as I needed the extra ghost > cells for some subroutines, but I do not need them in my matrix from > DMCreateMatrix. > You fill them in the local vector. Thanks, Matt > I am using Petsc 3.12.4 and open MPI 3.1.4. > > Thanks, > > Lucas Banting > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 22 20:07:27 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 May 2020 21:07:27 -0400 Subject: [petsc-users] Creating multi-field section using DMPlex In-Reply-To: References: Message-ID: On Fri, May 22, 2020 at 12:57 PM Shashwat Tiwari wrote: > Hi, > I am working on a Finite Volume scheme for a hyperbolic system of > equations equations with 6 variables on unstructured grids. I am trying to > create a section with 6 fields for this purpose, and have written a small > test code for creating the section by editing the example given at > src/dm/impls/plex/tutorials/ex1.c as follows: > If you use the PetscFV stuff, it assumes everything is in a single field. This is because FV methods often want Riemann solvers with all the fields, but Plex splits the callbacks by field in order to get sparsity in the finite element matrix. In order to make things more understandable, we added https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PetscSection/PetscSectionSetComponentName.html#PetscSectionSetComponentName so you can name the components just like you would name fields. I think this may be what you want. Let me know if this does not help. 
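For illustration, an untested sketch of that approach for your six variables, using one field with six named components (the field name "U" is just a placeholder, and as in your code quoted below all dofs live on cells with no boundary conditions):

  PetscSection section;
  PetscInt     numComp[1]  = {6};
  PetscInt     numDof[3]   = {0, 0, 6};   /* dim = 2, one field: all 6 dofs on cells */
  const char  *compNames[] = {"h", "m1", "m2", "E11", "E12", "E22"};
  PetscInt     c;

  ierr = DMSetNumFields(dm, 1);CHKERRQ(ierr);
  ierr = DMPlexCreateSection(dm, NULL, numComp, numDof, 0, NULL, NULL, NULL, NULL, &section);CHKERRQ(ierr);
  ierr = PetscSectionSetFieldName(section, 0, "U");CHKERRQ(ierr);
  for (c = 0; c < 6; ++c) {
    ierr = PetscSectionSetComponentName(section, 0, c, compNames[c]);CHKERRQ(ierr);
  }
  ierr = DMSetLocalSection(dm, section);CHKERRQ(ierr);

With a single 6-component field, the block size that the VTU writer checks should then match the field's component count.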
Thanks, Matt > int main(int argc, char **argv) > { > DM dm, dmDist = NULL; > Vec U; > PetscSection section; > PetscViewer viewer; > PetscInt dim = 2, numFields, numBC, i; > PetscInt numComp[6]; > PetscInt numDof[18]; > PetscBool interpolate = PETSC_TRUE, useCone = PETSC_TRUE, > useClosure = PETSC_TRUE; > PetscReal lower[3], upper[3]; // lower left and upper right > coordinates of domain > PetscInt cells[2]; > PetscErrorCode ierr; > > // define domain properties > cells[0] = 4; cells[1] = 4; > lower[0] = -1; lower[1] = -1; lower[2] = 0; > upper[0] = 1; upper[1] = 1; upper[2] = 0; > > ierr = PetscInitialize(&argc, &argv, NULL, help); CHKERRQ(ierr); > // create the mesh > ierr = PetscPrintf(PETSC_COMM_WORLD, "Generating mesh ... "); > CHKERRQ(ierr); > ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, dim, PETSC_TRUE, > cells, lower, upper, NULL, interpolate, &dm); > CHKERRQ(ierr); > ierr = PetscPrintf(PETSC_COMM_WORLD, "Done\n"); CHKERRQ(ierr); > > // set cell adjacency > ierr = DMSetBasicAdjacency(dm, useCone, useClosure); CHKERRQ(ierr); > // Distribute mesh over processes > ierr = DMPlexDistribute(dm, 1, NULL, &dmDist);CHKERRQ(ierr); > if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;} > > // Create scalar fields > numFields = 6; > numComp[0] = 1; > numComp[1] = 1; > numComp[2] = 1; > numComp[3] = 1; > numComp[4] = 1; > numComp[5] = 1; > > for (i = 0; i < numFields*(dim+1); ++i) numDof[i] = 0; > numDof[0*(dim+1)+dim] = 1; > numDof[1*(dim+1)+dim] = 1; > numDof[2*(dim+1)+dim] = 1; > numDof[3*(dim+1)+dim] = 1; > numDof[4*(dim+1)+dim] = 1; > numDof[5*(dim+1)+dim] = 1; > > numBC = 0; > > // Create a PetscSection with this data layout > ierr = DMSetNumFields(dm, numFields);CHKERRQ(ierr); > ierr = DMPlexCreateSection(dm, NULL, numComp, numDof, numBC, NULL, NULL, > NULL, NULL, §ion);CHKERRQ(ierr); > > // Name the Field variables > ierr = PetscSectionSetFieldName(section, 0, "h");CHKERRQ(ierr); > ierr = PetscSectionSetFieldName(section, 1, "m1");CHKERRQ(ierr); > ierr = PetscSectionSetFieldName(section, 2, "m2");CHKERRQ(ierr); > ierr = PetscSectionSetFieldName(section, 3, "E11");CHKERRQ(ierr); > ierr = PetscSectionSetFieldName(section, 4, "E12");CHKERRQ(ierr); > ierr = PetscSectionSetFieldName(section, 5, "E22");CHKERRQ(ierr); > > // set adjacency for each field > ierr = DMSetAdjacency(dm, 0, useCone, useClosure); CHKERRQ(ierr); > ierr = DMSetAdjacency(dm, 1, useCone, useClosure); CHKERRQ(ierr); > ierr = DMSetAdjacency(dm, 2, useCone, useClosure); CHKERRQ(ierr); > ierr = DMSetAdjacency(dm, 3, useCone, useClosure); CHKERRQ(ierr); > ierr = DMSetAdjacency(dm, 4, useCone, useClosure); CHKERRQ(ierr); > ierr = DMSetAdjacency(dm, 5, useCone, useClosure); CHKERRQ(ierr); > > ierr = PetscSectionView(section, > PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > // Tell the DM to use this data layout > ierr = DMSetLocalSection(dm, section);CHKERRQ(ierr); > ierr = DMView(dm, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr); > > // Create a Vec with this layout and view it > ierr = DMGetGlobalVector(dm, &U);CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject) U, "U"); CHKERRQ(ierr); > ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer);CHKERRQ(ierr); > ierr = PetscViewerSetType(viewer, PETSCVIEWERVTK);CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer, > PETSC_VIEWER_ASCII_VTK);CHKERRQ(ierr); > ierr = PetscViewerFileSetName(viewer, "sol.vtk");CHKERRQ(ierr); > //ierr = PetscViewerPushFormat(viewer, > PETSC_VIEWER_VTK_VTU);CHKERRQ(ierr); > //ierr = PetscViewerFileSetName(viewer, "sol.vtu");CHKERRQ(ierr); > ierr = 
VecView(U, viewer);CHKERRQ(ierr); > ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr); > ierr = DMRestoreGlobalVector(dm, &U);CHKERRQ(ierr); > > // Cleanup > ierr = PetscSectionDestroy(§ion);CHKERRQ(ierr); > ierr = DMDestroy(&dm);CHKERRQ(ierr); > ierr = PetscFinalize(); > return ierr; > } > > When exporting the vector data to "vtk" format, the code works fine and I > am able to see the 6 field names in the visualizer. But when I change it to > "vtu", I get the following error: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Total number of field components 1 != block size 6 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.13.0, unknown > [0]PETSC ERROR: ./ex1 on a arch-linux2-c-debug named shashwat by shashwat > Fri May 22 22:11:13 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --with-debugging=1 --download-triangle --download-metis --download-parmetis > --download-cmake > [0]PETSC ERROR: #1 DMPlexVTKWriteAll_VTU() line 334 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexvtu.c > [0]PETSC ERROR: #2 DMPlexVTKWriteAll() line 688 in > /home/shashwat/local/petsc/src/dm/impls/plex/plexvtk.c > [0]PETSC ERROR: #3 PetscViewerFlush_VTK() line 100 in > /home/shashwat/local/petsc/src/sys/classes/viewer/impls/vtk/vtkv.c > [0]PETSC ERROR: #4 PetscViewerFlush() line 26 in > /home/shashwat/local/petsc/src/sys/classes/viewer/interface/flush.c > [0]PETSC ERROR: #5 PetscViewerDestroy() line 113 in > /home/shashwat/local/petsc/src/sys/classes/viewer/interface/view.c > [0]PETSC ERROR: #6 main() line 485 in ex1.c > [0]PETSC ERROR: No PETSc Option Table entries > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 485077) - process 0 > > I would like to use the "vtu" format for visualiztion. Kindly have a look > and let me know I might be doing wrong here, and how can I make it work. > > Regards, > Shashwat > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Mon May 25 10:19:18 2020 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Mon, 25 May 2020 15:19:18 +0000 Subject: [petsc-users] Question about DMLocalToLocal for DM_BOUNDARY_GHOSTED conditions In-Reply-To: References: , Message-ID: So DM_BOUNDARY_GHOSTED values are not updated then in the DMDALocalToLocal routines. I should instead use DM_BOUNDARY_NONE and make the domain larger by 2 elements if I need those values to be shared between processes then? Is that the best approach? Thanks, Lucas ________________________________ From: Matthew Knepley Sent: Friday, May 22, 2020 8:03 PM To: Lucas Banting Cc: PETSc Subject: Re: [petsc-users] Question about DMLocalToLocal for DM_BOUNDARY_GHOSTED conditions On Fri, May 22, 2020 at 4:34 PM Lucas Banting > wrote: Hello, I am converting a serial code to parallel in fortran with petsc. 
I am using the DMDA to manage communication of the information that used to be in old two-dimensional fortran arrays. I noticed when using DMLocalToLocalBegin/End, not all the ghost values in the array at the DM_BOUNDARY_GHOSTED area is updated. Is this expected behaviour? I believe so. GHOSTED is user managed space. We do not touch it. I read through this thread: https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2016-May/029252.html and saw someone had a similar question, but the answer was not clear to me. If this is expected behaviour, how should I instead update these values in my arrays? I was using DM_BOUNDARY_GHOSTED as I needed the extra ghost cells for some subroutines, but I do not need them in my matrix from DMCreateMatrix. You fill them in the local vector. Thanks, Matt I am using Petsc 3.12.4 and open MPI 3.1.4. Thanks, Lucas Banting -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Mon May 25 11:17:36 2020 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Mon, 25 May 2020 18:17:36 +0200 Subject: [petsc-users] Correct Usage of MatDiagonalSet Message-ID: Dear Guys, I have quick question for you. Can anyone tell me if MatDiagonalSet needs also MatAssemblyBegin/MatAssemblyEnd to be called afterwards or not? Basically, I am trying to compute: sum (a_nb*\phi_nb), where anb are the off diagonal coeffs of the matrix and phi_nb the corresponding field values. I was thinking to set the matrix diagonal to zero using MatDiagonalSet and then simply use MatMult(A, x) to accomplish my task. Do we have a better way in PETSc? Any suggestion in welcome :) Thank you very much -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon May 25 11:52:34 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 25 May 2020 10:52:34 -0600 Subject: [petsc-users] Correct Usage of MatDiagonalSet In-Reply-To: References: Message-ID: <87wo50yn1p.fsf@jedbrown.org> Edoardo alinovi writes: > Dear Guys, > > I have quick question for you. Can anyone tell me if MatDiagonalSet needs > also MatAssemblyBegin/MatAssemblyEnd to be called afterwards or not? It's called internally so you don't have to call again. > Basically, I am trying to compute: sum (a_nb*\phi_nb), where anb are the > off diagonal coeffs of the matrix and phi_nb the corresponding field > values. > > I was thinking to set the matrix diagonal to zero using MatDiagonalSet and > then simply use MatMult(A, x) to accomplish my task. Do we have a better > way in PETSc? Any suggestion in welcome :) An alternative would be to extract the diagonal and subtract off the result of VecPointwiseMult(w, diag, phi). From edoardo.alinovi at gmail.com Mon May 25 11:54:37 2020 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Mon, 25 May 2020 18:54:37 +0200 Subject: [petsc-users] Correct Usage of MatDiagonalSet In-Reply-To: <87wo50yn1p.fsf@jedbrown.org> References: <87wo50yn1p.fsf@jedbrown.org> Message-ID: Cool, thanks a lot! On Mon, 25 May 2020, 18:52 Jed Brown, wrote: > Edoardo alinovi writes: > > > Dear Guys, > > > > I have quick question for you. Can anyone tell me if MatDiagonalSet needs > > also MatAssemblyBegin/MatAssemblyEnd to be called afterwards or not? > > It's called internally so you don't have to call again. 
> > > Basically, I am trying to compute: sum (a_nb*\phi_nb), where anb are the > > off diagonal coeffs of the matrix and phi_nb the corresponding field > > values. > > > > I was thinking to set the matrix diagonal to zero using MatDiagonalSet > and > > then simply use MatMult(A, x) to accomplish my task. Do we have a better > > way in PETSc? Any suggestion in welcome :) > > An alternative would be to extract the diagonal and subtract off the > result of VecPointwiseMult(w, diag, phi). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 25 19:22:22 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 May 2020 20:22:22 -0400 Subject: [petsc-users] Question about DMLocalToLocal for DM_BOUNDARY_GHOSTED conditions In-Reply-To: References: Message-ID: On Mon, May 25, 2020 at 11:19 AM Lucas Banting wrote: > So DM_BOUNDARY_GHOSTED values are not updated then in the DMDALocalToLocal > routines. I should instead use DM_BOUNDARY_NONE and make the domain larger > by 2 elements if I need those values to be shared between processes then? > Is that the best approach? > I think we need to clarify what PETSc means by these things. We divide the original grid into non-overlapping pieces. These are the vertices that are ordered for the global numbering, and those that we own are called the "local" vertices. In order to calculate the value at a local vertex, we need information from neighboring vertices. Sometimes these are outside the portion we own. We call these "ghost" vertices. Thus Global Numbering --> local vertices Local Numbering --> local+ghost vertices At a boundary, we have to decide how our stencil behaves. If there is just nothing there, we would have type NONE. Some stencil indices for a boundary vertex are invalid. If instead we decide that the stencil should wrap around the domain, indexing into vertices owned by another process, that is type PERIODIC. If the stencil wraps back onto this process, it is type MIRROR. If the stencil indexes into some storage space in which I can put boundary values, it is type GHOSTED. Now if we do LocalToGlobal. We are copying unknowns from Global storage to Local storage. This is just 1-to-1 if we have NONE, since no extra unknowns were added. For PERIODIC or MIRROR, the extra local unknowns are mapped to other global unknowns, and thus we have a copy, possibly with communication. For GHOSTED, the extra local unknowns are just storage and do not correspond to other global unknowns. Thus there is no copy. We can think of LocalToLocal as just LocalToGlobal followed by GlobalToLocal but done in one step. Since GHOSTED is not affected by LocalToGlobal, nothing happens. I am not sure what you are trying to achieve with the boundary condition. Thanks, Matt > Thanks, > > Lucas > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, May 22, 2020 8:03 PM > *To:* Lucas Banting > *Cc:* PETSc > *Subject:* Re: [petsc-users] Question about DMLocalToLocal for > DM_BOUNDARY_GHOSTED conditions > > On Fri, May 22, 2020 at 4:34 PM Lucas Banting > wrote: > > Hello, > > I am converting a serial code to parallel in fortran with petsc. I am > using the DMDA to manage communication of the information that used to be > in old two-dimensional fortran arrays. > > I noticed when using DMLocalToLocalBegin/End, not all the ghost values in > the array at the DM_BOUNDARY_GHOSTED area is updated. Is this expected > behaviour? > > > I believe so. GHOSTED is user managed space. We do not touch it. 
> > > I read through this thread: > https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2016-May/029252.html > and saw someone had a similar question, but the answer was not clear to me. > > If this is expected behaviour, how should I instead update these values in > my arrays? I was using DM_BOUNDARY_GHOSTED as I needed the extra ghost > cells for some subroutines, but I do not need them in my matrix from > DMCreateMatrix. > > > You fill them in the local vector. > > Thanks, > > Matt > > > I am using Petsc 3.12.4 and open MPI 3.1.4. > > Thanks, > > Lucas Banting > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Tue May 26 07:39:44 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Tue, 26 May 2020 15:39:44 +0300 Subject: [petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm In-Reply-To: References: <0c10fc0a-3d86-e91a-f349-fd7c087ba8ed@anl.gov> <738e615c-de8e-88cd-81cd-4d6f35b454d2@anl.gov> Message-ID: Dear Richard, I believe I don't need centroids. I just need cluster indices which corresponds to idx. What I am trying to do is this: Step 6: Cluster the points (y_i) i=1,...,n in R^k with the k-means algorithm into clusters C_1,...,C_k. Output: Clusters A_1,....,A_k with A_i = {j | y_j in C_i} where y_i is the row vector of a matrix whose columns are eigenvectors In order to cluster y_i, I think I just need idx from MATLAB since it shows clustering indices. Thanks, Eda Mills, Richard Tran , 23 May 2020 Cmt, 02:39 tarihinde ?unu yazd?: > Hi Eda, > > If you are using the MATLAB k-means function, calling it like > > idx = kmeans(X,k) > > will give you the index set, but if you do > > [idx,C] = kmeans(X,k) > > then you will also get a matrix C which contains the cluster centroids. Is > this not what you need? > > --Richard > > On 5/22/20 10:38 AM, Eda Oktay wrote: > > I am sorry, I used VecDuplictaeVecs not MatDuplicateVecs > > Eda Oktay , 22 May 2020 Cum, 20:31 tarihinde ?unu > yazd?: > >> Dear Richard, >> >> Thank you for your email. From MATLAB's kmeans() function I believe I got >> the final clustering index set, not centroids. What I am trying to do is to >> cluster vectors created by MatDuplicateVecs() according to the index set >> (whose type is not IS since I took it from MATLAB) that I obtained from >> MATLAB. I am trying to cluster these vectors however since they are >> parallel, I couldn't understand how to cluster them. >> >> Normally, I have to be independent from MATLAB so I will try your >> suggestion, grateful for that. However, because of my limited >> knowledge about PETSc and parallel computing, I am not able to figure out >> how to cluster parallel vectors according to an index set. >> >> Thanks, >> >> Eda >> >> Mills, Richard Tran , 30 Nis 2020 Per, 02:07 tarihinde >> ?unu yazd?: >> >>> Hi Eda, >>> >>> Thanks for your reply. I'm still trying to understand why you say you >>> need to duplicate the row vectors across all processes. When I have >>> implemented parallel k-means, I don't duplicate the row vectors. 
(This >>> would be very unscalable and largely defeat the point of doing this with >>> MPI parallelism in the first place.) >>> >>> Earlier in this email thread, you said that you have used Matlab to get >>> cluster IDs for each row vector. Are you trying to then use this >>> information to calculate the cluster centroids from inside your PETSc >>> program? If so, you can do this by having each MPI rank do the following: >>> For cluster i in 0 to (k-1), calculate the element-wise sum of all of the >>> local rows that belong to cluster i, then use MPI_Allreduce() to calculate >>> the global elementwise sum of all the local sums (this array will be >>> replicated across all MPI ranks), and finally divide by the number of >>> members of that cluster to get the centroid. Note that MPI_Allreduce() >>> doesn't work on PETSc objects, but simple arrays, so you'll want to use >>> something like MatGetValues() or MatGetRow() to access the elements of your >>> row vectors. >>> >>> Let me know if I am misunderstanding what you are aiming to do, or if I >>> am misunderstanding something. >>> >>> It sounds like you would benefit from having some routines in PETSc to >>> do k-means (or other) clustering, by the way? >>> >>> Best regards, >>> Richard >>> >>> On 4/29/20 3:47 AM, Eda Oktay wrote: >>> >>> Dear Richard, >>> >>> I am trying to use spectral clustering algorithm by using k-means >>> clustering algorithm at some point. I am doing this by producing a matrix >>> consisting of eigenvectors (of the adjacency matrix of the graph that I >>> want to partition), then forming row vectors of this matrix. This is the >>> part that I am using parallel vector. By using the output from k-means, I >>> am trying to cluster these row vectors. To cluster these vectors, I think I >>> need all row vectors in all processes. I wanted to use sequential vectors, >>> however, I couldn't find a different way that I form row vectors of a >>> matrix. >>> >>> I am trying to use VecScatterCreateToAll, however, since my vector is >>> parallel crated by VecDuplicateVecs, my input is not in correct type, so I >>> get error. I still can't get how can I use this function in parallel vector >>> created by VecDuplicateVecs. >>> >>> Thank you all for your help. >>> >>> Eda >>> >>> Mills, Richard Tran , 7 Nis 2020 Sal, 01:51 tarihinde >>> ?unu yazd?: >>> >>>> Hi Eda, >>>> >>>> I think that you probably want to use VecScatter routines, as Junchao >>>> has suggested, instead of the lower level star forest for this. I >>>> believe that VecScatterCreateToZero() is what you want for the >>>> broadcast >>>> problem you describe, in the second part of your question. I'm not sure >>>> what you are trying to do in the first part. Taking a parallel vector >>>> and then copying its entire contents to a sequential vector residing on >>>> each process is not scalable, and a lot of the design that has gone >>>> into >>>> PETSc is to prevent the user from ever needing to do things like that. >>>> Can you please tell us what you intend to do with these sequential >>>> vectors? >>>> >>>> I'm also wondering why, later in your message, you say that you get >>>> cluster assignments from Matlab, and then "to cluster row vectors >>>> according to this information, all processors need to have all of the >>>> row vectors". Do you mean you want to get all of the row vectors copied >>>> onto all of the processors so that you can compute the cluster >>>> centroids? 
If so, computing the cluster centroids can be done without >>>> copying the row vectors onto all processors if you use a communication >>>> operation like MPI_Allreduce(). >>>> >>>> Lastly, let me add that I've done a fair amount of work implementing >>>> clustering algorithms on distributed memory parallel machines, but >>>> outside of PETSc. I was thinking that I should implement some of these >>>> routines using PETSc. I can't get to this immediately, but I'm >>>> wondering >>>> if you might care to tell me a bit more about the clustering problems >>>> you need to solve and how having some support for this in PETSc might >>>> (or might not) help. >>>> >>>> Best regards, >>>> Richard >>>> >>>> On 4/4/20 1:39 AM, Eda Oktay wrote: >>>> > Hi all, >>>> > >>>> > I created a parallel vector UV, by using VecDuplicateVecs since I >>>> need >>>> > row vectors of a matrix. However, I need the whole vector be in all >>>> > processors, which means I need to gather all and broadcast them to >>>> all >>>> > processors. To gather, I tried to use VecStrideGatherAll: >>>> > >>>> > Vec UVG; >>>> > VecStrideGatherAll(UV,UVG,INSERT_VALUES); >>>> > VecView(UVG,PETSC_VIEWER_STDOUT_WORLD); >>>> > >>>> > however when I try to view the vector, I get the following error. >>>> > >>>> > [3]PETSC ERROR: Invalid argument >>>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>>> > [3]PETSC ERROR: See >>>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. >>>> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>>> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr >>>> 4 >>>> > 11:22:54 2020 >>>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>>> > [0]PETSC ERROR: See >>>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. >>>> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>>> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr >>>> 4 >>>> > 11:22:54 2020 >>>> > [0]PETSC ERROR: Configure options --download-mpich >>>> --download-openblas >>>> > --download-slepc --download-metis --download-parmetis >>>> --download-chaco >>>> > --with-X=1 >>>> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>>> > ./clustering_son_final_edgecut_without_parmetis on a >>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr >>>> 4 >>>> > 11:22:54 2020 >>>> > [1]PETSC ERROR: Configure options --download-mpich >>>> --download-openblas >>>> > --download-slepc --download-metis --download-parmetis >>>> --download-chaco >>>> > --with-X=1 >>>> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>>> > Configure options --download-mpich --download-openblas >>>> > --download-slepc --download-metis --download-parmetis >>>> --download-chaco >>>> > --with-X=1 >>>> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>>> > >>>> > I couldn't understand why I am getting this error. Is this because of >>>> > UV being created by VecDuplicateVecs? How can I solve this problem? >>>> > >>>> > The other question is broadcasting. After gathering all elements of >>>> > the vector UV, I need to broadcast them to all processors. I found >>>> > PetscSFBcastBegin. 
However, I couldn't understand the PetscSF concept >>>> > properly. I couldn't adjust my question to the star forest concept. >>>> > >>>> > My problem is: If I have 4 processors, I create a matrix whose >>>> columns >>>> > are 4 smallest eigenvectors, say of size 72. Then by defining each >>>> row >>>> > of this matrix as a vector, I cluster them by using k-means >>>> > clustering algorithm. For now, I cluster them by using MATLAB and I >>>> > obtain a vector showing which row vector is in which cluster. After >>>> > getting this vector, to cluster row vectors according to this >>>> > information, all processors need to have all of the row vectors. >>>> > >>>> > According to this problem, how can I use the star forest concept? >>>> > >>>> > I will be glad if you can help me about this problem since I don't >>>> > have enough knowledge about graph theory. An if you have any idea >>>> > about how can I use k-means algorithm in a more practical way, please >>>> > let me know. >>>> > >>>> > Thanks! >>>> > >>>> > Eda >>>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Tue May 26 10:22:21 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 26 May 2020 12:22:21 -0300 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 Message-ID: hello dear PETSc team, I'm trying to install PETSc with the 2019 update 3 Intel Parallel Studio. When starting the configuration process there appears the next message: *TESTING: checkFortran90Array from config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):-------------------------------------------------------------------------------Could not check Fortran pointer arguments******************************************************************************** the configuration log is attached to this message I would be very thankful of any kind help on this matter Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 307596 bytes Desc: not available URL: From knepley at gmail.com Tue May 26 11:02:30 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 26 May 2020 12:02:30 -0400 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 In-Reply-To: References: Message-ID: On Tue, May 26, 2020 at 11:23 AM Alfredo Jaramillo < ajaramillopalma at gmail.com> wrote: > hello dear PETSc team, > > I'm trying to install PETSc with the 2019 update 3 Intel Parallel Studio. 
> When starting the configuration process there appears the next message: > > > > > > > > *TESTING: checkFortran90Array from > config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details):-------------------------------------------------------------------------------Could > not check Fortran pointer > arguments******************************************************************************** > > the configuration log is attached to this message > > I would be very thankful of any kind help on this matter > It seems like headers in /usr/include are conflicting with your new compiler. Executing: mpiicc -c -o /tmp/petsc-n__j58ih/config.compilersFortran/conftest.o -I/tmp/petsc-n__j58ih/config.libraries -I/tmp/petsc-n__j58ih/config.setCompilers -I/tmp/petsc-n__j58ih/config.compilers -I/tmp/petsc-n__j58ih/config.utilities.closure -I/tmp/petsc-n__j58ih/config.compilersFortran -fPIC -O3 -march=native -mtune=native /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c Possible ERROR while running compiler: exit code 2 stderr: In file included from /usr/include/bits/floatn.h(119), from /usr/include/stdlib.h(55), from /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): /usr/include/bits/floatn-common.h(214): error: invalid combination of type specifiers typedef float _Float32; ^ In file included from /usr/include/bits/floatn.h(119), from /usr/include/stdlib.h(55), from /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): /usr/include/bits/floatn-common.h(251): error: invalid combination of type specifiers typedef double _Float64; ^ In file included from /usr/include/bits/floatn.h(119), from /usr/include/stdlib.h(55), from /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): /usr/include/bits/floatn-common.h(268): error: invalid combination of type specifiers typedef double _Float32x; ^ In file included from /usr/include/bits/floatn.h(119), from /usr/include/stdlib.h(55), from /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): /usr/include/bits/floatn-common.h(285): error: invalid combination of type specifiers typedef long double _Float64x; ^ compilation aborted for /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c (code 2) Source: #include "confdefs.h" #include "conffix.h" #include #include void f90arraytest_(void* a1, void* a2,void* a3, void* i) { printf("arrays [%p %p %p]\n",a1,a2,a3); fflush(stdout); return; } void f90ptrtest_(void* a1, void* a2,void* a3, void* i, void* p1 ,void* p2, void* p3) { printf("arrays [%p %p %p]\n",a1,a2,a3); if ((p1 == p3) && (p1 != p2)) { printf("pointers match! [%p %p] [%p]\n",p1,p3,p2); fflush(stdout); } else { printf("pointers do not match! [%p %p] [%p]\n",p1,p3,p2); fflush(stdout); exit(111); } return; } Thanks, Matt Alfredo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue May 26 13:00:55 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 26 May 2020 11:00:55 -0700 Subject: [petsc-users] using real and complex together Message-ID: Dear PETSc dev team, Can I use both real and complex versions together? 
Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Tue May 26 14:15:24 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 26 May 2020 16:15:24 -0300 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 In-Reply-To: References: Message-ID: Thank you Matthew! I need this version to work on my computer in order to look for a bug that appears in a cluster. I'm not sure how to make it work, I will try with an older Linux distribution. regards Alfredo On Tue, May 26, 2020 at 1:02 PM Matthew Knepley wrote: > On Tue, May 26, 2020 at 11:23 AM Alfredo Jaramillo < > ajaramillopalma at gmail.com> wrote: > >> hello dear PETSc team, >> >> I'm trying to install PETSc with the 2019 update 3 Intel Parallel Studio. >> When starting the configuration process there appears the next message: >> >> >> >> >> >> >> >> *TESTING: checkFortran90Array from >> config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >> details):-------------------------------------------------------------------------------Could >> not check Fortran pointer >> arguments******************************************************************************** >> >> the configuration log is attached to this message >> >> I would be very thankful of any kind help on this matter >> > > It seems like headers in /usr/include are conflicting with your new > compiler. > > Executing: mpiicc -c -o > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.o > -I/tmp/petsc-n__j58ih/config.libraries > -I/tmp/petsc-n__j58ih/config.setCompilers > -I/tmp/petsc-n__j58ih/config.compilers > -I/tmp/petsc-n__j58ih/config.utilities.closure > -I/tmp/petsc-n__j58ih/config.compilersFortran -fPIC -O3 -march=native > -mtune=native /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c > Possible ERROR while running compiler: exit code 2 > stderr: > In file included from /usr/include/bits/floatn.h(119), > from /usr/include/stdlib.h(55), > from > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): > /usr/include/bits/floatn-common.h(214): error: invalid combination of type > specifiers > typedef float _Float32; > ^ > > In file included from /usr/include/bits/floatn.h(119), > from /usr/include/stdlib.h(55), > from > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): > /usr/include/bits/floatn-common.h(251): error: invalid combination of type > specifiers > typedef double _Float64; > ^ > > In file included from /usr/include/bits/floatn.h(119), > from /usr/include/stdlib.h(55), > from > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): > /usr/include/bits/floatn-common.h(268): error: invalid combination of type > specifiers > typedef double _Float32x; > ^ > > In file included from /usr/include/bits/floatn.h(119), > from /usr/include/stdlib.h(55), > from > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): > /usr/include/bits/floatn-common.h(285): error: invalid combination of type > specifiers > typedef long double _Float64x; > ^ > > compilation aborted for > /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c (code 2) > Source: > #include "confdefs.h" > #include "conffix.h" > #include > #include > void f90arraytest_(void* a1, void* a2,void* a3, void* i) > { > printf("arrays [%p %p %p]\n",a1,a2,a3); > fflush(stdout); > return; > } > void 
f90ptrtest_(void* a1, void* a2,void* a3, void* i, void* p1 ,void* p2, > void* p3) > { > printf("arrays [%p %p %p]\n",a1,a2,a3); > if ((p1 == p3) && (p1 != p2)) { > printf("pointers match! [%p %p] [%p]\n",p1,p3,p2); > fflush(stdout); > } else { > printf("pointers do not match! [%p %p] [%p]\n",p1,p3,p2); > fflush(stdout); > exit(111); > } > return; > } > > Thanks, > > Matt > > Alfredo >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 26 14:38:54 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 26 May 2020 15:38:54 -0400 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 In-Reply-To: References: Message-ID: On Tue, May 26, 2020 at 3:15 PM Alfredo Jaramillo wrote: > Thank you Matthew! > > I need this version to work on my computer in order to look for a bug that > appears in a cluster. > I'm not sure how to make it work, I will try with an older Linux > distribution. > If you are just looking for a bug, maybe a Docker container would allow easier experimentation? Thanks, Matt > regards > Alfredo > > On Tue, May 26, 2020 at 1:02 PM Matthew Knepley wrote: > >> On Tue, May 26, 2020 at 11:23 AM Alfredo Jaramillo < >> ajaramillopalma at gmail.com> wrote: >> >>> hello dear PETSc team, >>> >>> I'm trying to install PETSc with the 2019 update 3 Intel Parallel >>> Studio. When starting the configuration process there appears the next >>> message: >>> >>> >>> >>> >>> >>> >>> >>> *TESTING: checkFortran90Array from >>> config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* >>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >>> details):-------------------------------------------------------------------------------Could >>> not check Fortran pointer >>> arguments******************************************************************************** >>> >>> the configuration log is attached to this message >>> >>> I would be very thankful of any kind help on this matter >>> >> >> It seems like headers in /usr/include are conflicting with your new >> compiler. 
>> >> Executing: mpiicc -c -o >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.o >> -I/tmp/petsc-n__j58ih/config.libraries >> -I/tmp/petsc-n__j58ih/config.setCompilers >> -I/tmp/petsc-n__j58ih/config.compilers >> -I/tmp/petsc-n__j58ih/config.utilities.closure >> -I/tmp/petsc-n__j58ih/config.compilersFortran -fPIC -O3 -march=native >> -mtune=native /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c >> Possible ERROR while running compiler: exit code 2 >> stderr: >> In file included from /usr/include/bits/floatn.h(119), >> from /usr/include/stdlib.h(55), >> from >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >> /usr/include/bits/floatn-common.h(214): error: invalid combination of >> type specifiers >> typedef float _Float32; >> ^ >> >> In file included from /usr/include/bits/floatn.h(119), >> from /usr/include/stdlib.h(55), >> from >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >> /usr/include/bits/floatn-common.h(251): error: invalid combination of >> type specifiers >> typedef double _Float64; >> ^ >> >> In file included from /usr/include/bits/floatn.h(119), >> from /usr/include/stdlib.h(55), >> from >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >> /usr/include/bits/floatn-common.h(268): error: invalid combination of >> type specifiers >> typedef double _Float32x; >> ^ >> >> In file included from /usr/include/bits/floatn.h(119), >> from /usr/include/stdlib.h(55), >> from >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >> /usr/include/bits/floatn-common.h(285): error: invalid combination of >> type specifiers >> typedef long double _Float64x; >> ^ >> >> compilation aborted for >> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c (code 2) >> Source: >> #include "confdefs.h" >> #include "conffix.h" >> #include >> #include >> void f90arraytest_(void* a1, void* a2,void* a3, void* i) >> { >> printf("arrays [%p %p %p]\n",a1,a2,a3); >> fflush(stdout); >> return; >> } >> void f90ptrtest_(void* a1, void* a2,void* a3, void* i, void* p1 ,void* >> p2, void* p3) >> { >> printf("arrays [%p %p %p]\n",a1,a2,a3); >> if ((p1 == p3) && (p1 != p2)) { >> printf("pointers match! [%p %p] [%p]\n",p1,p3,p2); >> fflush(stdout); >> } else { >> printf("pointers do not match! [%p %p] [%p]\n",p1,p3,p2); >> fflush(stdout); >> exit(111); >> } >> return; >> } >> >> Thanks, >> >> Matt >> >> Alfredo >>> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Tue May 26 14:40:26 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 26 May 2020 16:40:26 -0300 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 In-Reply-To: References: Message-ID: Thanks for the tip! I will look into it. regards Alfredo On Tue, May 26, 2020 at 4:39 PM Matthew Knepley wrote: > On Tue, May 26, 2020 at 3:15 PM Alfredo Jaramillo < > ajaramillopalma at gmail.com> wrote: > >> Thank you Matthew! >> >> I need this version to work on my computer in order to look for a bug >> that appears in a cluster. 
>> I'm not sure how to make it work, I will try with an older Linux >> distribution. >> > > If you are just looking for a bug, maybe a Docker container would allow > easier experimentation? > > Thanks, > > Matt > > >> regards >> Alfredo >> >> On Tue, May 26, 2020 at 1:02 PM Matthew Knepley >> wrote: >> >>> On Tue, May 26, 2020 at 11:23 AM Alfredo Jaramillo < >>> ajaramillopalma at gmail.com> wrote: >>> >>>> hello dear PETSc team, >>>> >>>> I'm trying to install PETSc with the 2019 update 3 Intel Parallel >>>> Studio. When starting the configuration process there appears the next >>>> message: >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *TESTING: checkFortran90Array from >>>> config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* >>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >>>> details):-------------------------------------------------------------------------------Could >>>> not check Fortran pointer >>>> arguments******************************************************************************** >>>> >>>> the configuration log is attached to this message >>>> >>>> I would be very thankful of any kind help on this matter >>>> >>> >>> It seems like headers in /usr/include are conflicting with your new >>> compiler. >>> >>> Executing: mpiicc -c -o >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.o >>> -I/tmp/petsc-n__j58ih/config.libraries >>> -I/tmp/petsc-n__j58ih/config.setCompilers >>> -I/tmp/petsc-n__j58ih/config.compilers >>> -I/tmp/petsc-n__j58ih/config.utilities.closure >>> -I/tmp/petsc-n__j58ih/config.compilersFortran -fPIC -O3 -march=native >>> -mtune=native /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c >>> Possible ERROR while running compiler: exit code 2 >>> stderr: >>> In file included from /usr/include/bits/floatn.h(119), >>> from /usr/include/stdlib.h(55), >>> from >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>> /usr/include/bits/floatn-common.h(214): error: invalid combination of >>> type specifiers >>> typedef float _Float32; >>> ^ >>> >>> In file included from /usr/include/bits/floatn.h(119), >>> from /usr/include/stdlib.h(55), >>> from >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>> /usr/include/bits/floatn-common.h(251): error: invalid combination of >>> type specifiers >>> typedef double _Float64; >>> ^ >>> >>> In file included from /usr/include/bits/floatn.h(119), >>> from /usr/include/stdlib.h(55), >>> from >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>> /usr/include/bits/floatn-common.h(268): error: invalid combination of >>> type specifiers >>> typedef double _Float32x; >>> ^ >>> >>> In file included from /usr/include/bits/floatn.h(119), >>> from /usr/include/stdlib.h(55), >>> from >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>> /usr/include/bits/floatn-common.h(285): error: invalid combination of >>> type specifiers >>> typedef long double _Float64x; >>> ^ >>> >>> compilation aborted for >>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c (code 2) >>> Source: >>> #include "confdefs.h" >>> #include "conffix.h" >>> #include >>> #include >>> void f90arraytest_(void* a1, void* a2,void* a3, void* i) >>> { >>> printf("arrays [%p %p %p]\n",a1,a2,a3); >>> fflush(stdout); >>> return; >>> } >>> void f90ptrtest_(void* a1, void* a2,void* a3, void* i, void* p1 ,void* >>> p2, void* p3) >>> { >>> printf("arrays [%p %p %p]\n",a1,a2,a3); >>> if ((p1 == 
p3) && (p1 != p2)) { >>> printf("pointers match! [%p %p] [%p]\n",p1,p3,p2); >>> fflush(stdout); >>> } else { >>> printf("pointers do not match! [%p %p] [%p]\n",p1,p3,p2); >>> fflush(stdout); >>> exit(111); >>> } >>> return; >>> } >>> >>> Thanks, >>> >>> Matt >>> >>> Alfredo >>>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue May 26 15:00:07 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 26 May 2020 20:00:07 +0000 Subject: [petsc-users] using real and complex together In-Reply-To: References: Message-ID: You can build PETSc with complex version, and declare some variables as 'PETSC_REAL'. Hong ________________________________ From: petsc-users on behalf of Sam Guo Sent: Tuesday, May 26, 2020 1:00 PM To: PETSc Subject: [petsc-users] using real and complex together Dear PETSc dev team, Can I use both real and complex versions together? Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue May 26 15:16:42 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 26 May 2020 15:16:42 -0500 Subject: [petsc-users] error when configuring petsc with Intel Compilers 2019 update 3 In-Reply-To: References: Message-ID: Please be aware that Intel MPI 2019 u3 has a lot of bugs when you look into a bug. You better upgrade Intel Parallel Studio to the latest version. --Junchao Zhang On Tue, May 26, 2020 at 2:41 PM Alfredo Jaramillo wrote: > Thanks for the tip! I will look into it. > > regards > Alfredo > > > On Tue, May 26, 2020 at 4:39 PM Matthew Knepley wrote: > >> On Tue, May 26, 2020 at 3:15 PM Alfredo Jaramillo < >> ajaramillopalma at gmail.com> wrote: >> >>> Thank you Matthew! >>> >>> I need this version to work on my computer in order to look for a bug >>> that appears in a cluster. >>> I'm not sure how to make it work, I will try with an older Linux >>> distribution. >>> >> >> If you are just looking for a bug, maybe a Docker container would allow >> easier experimentation? >> >> Thanks, >> >> Matt >> >> >>> regards >>> Alfredo >>> >>> On Tue, May 26, 2020 at 1:02 PM Matthew Knepley >>> wrote: >>> >>>> On Tue, May 26, 2020 at 11:23 AM Alfredo Jaramillo < >>>> ajaramillopalma at gmail.com> wrote: >>>> >>>>> hello dear PETSc team, >>>>> >>>>> I'm trying to install PETSc with the 2019 update 3 Intel Parallel >>>>> Studio. 
When starting the configuration process there appears the next >>>>> message: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *TESTING: checkFortran90Array from >>>>> config.compilersFortran(/opt/petsc-3.13.0/config/BuildSystem/config/compilersFortran.py:211)******************************************************************************* >>>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >>>>> details):-------------------------------------------------------------------------------Could >>>>> not check Fortran pointer >>>>> arguments******************************************************************************** >>>>> >>>>> the configuration log is attached to this message >>>>> >>>>> I would be very thankful of any kind help on this matter >>>>> >>>> >>>> It seems like headers in /usr/include are conflicting with your new >>>> compiler. >>>> >>>> Executing: mpiicc -c -o >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.o >>>> -I/tmp/petsc-n__j58ih/config.libraries >>>> -I/tmp/petsc-n__j58ih/config.setCompilers >>>> -I/tmp/petsc-n__j58ih/config.compilers >>>> -I/tmp/petsc-n__j58ih/config.utilities.closure >>>> -I/tmp/petsc-n__j58ih/config.compilersFortran -fPIC -O3 -march=native >>>> -mtune=native /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c >>>> Possible ERROR while running compiler: exit code 2 >>>> stderr: >>>> In file included from /usr/include/bits/floatn.h(119), >>>> from /usr/include/stdlib.h(55), >>>> from >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>>> /usr/include/bits/floatn-common.h(214): error: invalid combination of >>>> type specifiers >>>> typedef float _Float32; >>>> ^ >>>> >>>> In file included from /usr/include/bits/floatn.h(119), >>>> from /usr/include/stdlib.h(55), >>>> from >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>>> /usr/include/bits/floatn-common.h(251): error: invalid combination of >>>> type specifiers >>>> typedef double _Float64; >>>> ^ >>>> >>>> In file included from /usr/include/bits/floatn.h(119), >>>> from /usr/include/stdlib.h(55), >>>> from >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>>> /usr/include/bits/floatn-common.h(268): error: invalid combination of >>>> type specifiers >>>> typedef double _Float32x; >>>> ^ >>>> >>>> In file included from /usr/include/bits/floatn.h(119), >>>> from /usr/include/stdlib.h(55), >>>> from >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c(4): >>>> /usr/include/bits/floatn-common.h(285): error: invalid combination of >>>> type specifiers >>>> typedef long double _Float64x; >>>> ^ >>>> >>>> compilation aborted for >>>> /tmp/petsc-n__j58ih/config.compilersFortran/conftest.c (code 2) >>>> Source: >>>> #include "confdefs.h" >>>> #include "conffix.h" >>>> #include >>>> #include >>>> void f90arraytest_(void* a1, void* a2,void* a3, void* i) >>>> { >>>> printf("arrays [%p %p %p]\n",a1,a2,a3); >>>> fflush(stdout); >>>> return; >>>> } >>>> void f90ptrtest_(void* a1, void* a2,void* a3, void* i, void* p1 ,void* >>>> p2, void* p3) >>>> { >>>> printf("arrays [%p %p %p]\n",a1,a2,a3); >>>> if ((p1 == p3) && (p1 != p2)) { >>>> printf("pointers match! [%p %p] [%p]\n",p1,p3,p2); >>>> fflush(stdout); >>>> } else { >>>> printf("pointers do not match! 
[%p %p] [%p]\n",p1,p3,p2); >>>> fflush(stdout); >>>> exit(111); >>>> } >>>> return; >>>> } >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Alfredo >>>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue May 26 15:28:18 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 26 May 2020 13:28:18 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: References: Message-ID: complex version is needed since matrix sometimes is real and sometimes is complex. I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). On Tuesday, May 26, 2020, Zhang, Hong wrote: > You can build PETSc with complex version, and declare some variables as > 'PETSC_REAL'. > Hong > > ------------------------------ > *From:* petsc-users on behalf of Sam > Guo > *Sent:* Tuesday, May 26, 2020 1:00 PM > *To:* PETSc > *Subject:* [petsc-users] using real and complex together > > Dear PETSc dev team, > Can I use both real and complex versions together? > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Tue May 26 15:32:56 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 26 May 2020 15:32:56 -0500 Subject: [petsc-users] using real and complex together In-Reply-To: References: Message-ID: > complex version is needed since matrix sometimes is real and sometimes is complex. PetscScalar is a flexible datatype, it will be real if PETSc is configured without complex support and include complex if PETSc is configured with complex. > I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). If you only want to use the real component of a PetscScalar you can use PetscRealPart() https://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Sys/PetscRealPart.html to extract it. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On May 26, 2020, at 3:28 PM, Sam Guo wrote: > > complex version is needed since matrix sometimes is real and sometimes is complex. I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). > > On Tuesday, May 26, 2020, Zhang, Hong > wrote: > You can build PETSc with complex version, and declare some variables as 'PETSC_REAL'. > Hong > > From: petsc-users > on behalf of Sam Guo > > Sent: Tuesday, May 26, 2020 1:00 PM > To: PETSc > > Subject: [petsc-users] using real and complex together > > Dear PETSc dev team, > Can I use both real and complex versions together? > > Thanks, > Sam -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefano.zampini at gmail.com Tue May 26 15:34:11 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 26 May 2020 23:34:11 +0300 Subject: [petsc-users] using real and complex together In-Reply-To: References: Message-ID: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> All the solvers/matrices/vectors works for PetscScalar types (i.e. in your case complex) If you need to solve for the real part only, you can duplicate the matrix and call MatRealPart to zero out the imaginary part. But the solve will always run in the complex space You should not be worried about doubling the memory for a matrix (i.e. real and imaginary part) > On May 26, 2020, at 11:28 PM, Sam Guo wrote: > > complex version is needed since matrix sometimes is real and sometimes is complex. I want to solve real matrix without allocating memory for imaginary part((except eigen pairs). > > On Tuesday, May 26, 2020, Zhang, Hong > wrote: > You can build PETSc with complex version, and declare some variables as 'PETSC_REAL'. > Hong > > From: petsc-users > on behalf of Sam Guo > > Sent: Tuesday, May 26, 2020 1:00 PM > To: PETSc > > Subject: [petsc-users] using real and complex together > > Dear PETSc dev team, > Can I use both real and complex versions together? > > Thanks, > Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue May 26 15:49:18 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 26 May 2020 13:49:18 -0700 Subject: [petsc-users] using real and complex together In-Reply-To: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> References: <32DEC92E-93A5-4EE8-BF92-2468C1FE31B9@gmail.com> Message-ID: Thanks On Tuesday, May 26, 2020, Stefano Zampini wrote: > All the solvers/matrices/vectors works for PetscScalar types (i.e. in your > case complex) > If you need to solve for the real part only, you can duplicate the matrix > and call MatRealPart to zero out the imaginary part. But the solve will > always run in the complex space > You should not be worried about doubling the memory for a matrix (i.e. > real and imaginary part) > > > On May 26, 2020, at 11:28 PM, Sam Guo wrote: > > complex version is needed since matrix sometimes is real and sometimes is > complex. I want to solve real matrix without allocating memory for > imaginary part((except eigen pairs). > > On Tuesday, May 26, 2020, Zhang, Hong wrote: > >> You can build PETSc with complex version, and declare some variables as >> 'PETSC_REAL'. >> Hong >> >> ------------------------------ >> *From:* petsc-users on behalf of Sam >> Guo >> *Sent:* Tuesday, May 26, 2020 1:00 PM >> *To:* PETSc >> *Subject:* [petsc-users] using real and complex together >> >> Dear PETSc dev team, >> Can I use both real and complex versions together? >> >> Thanks, >> Sam >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prateekgupta1709 at gmail.com Wed May 27 03:13:34 2020 From: prateekgupta1709 at gmail.com (Prateek Gupta) Date: Wed, 27 May 2020 10:13:34 +0200 Subject: [petsc-users] Repeated global indices in local maps Message-ID: Hi, I am new to using petsc and need its nonlinear solvers for my code. I am currently using parmetis (outside petsc) to partition an unstructured mesh element-wise, but working with data on the vertices of the mesh. Consequently, I have repeated vertices in different MPI-processes/ranks. At the solver stage, I need to solve for the data on vertices (solution vector is defined on the vertices). 
So, I need to create a distributed vector over vertices of the mesh, but the distribution in MPI-ranks is not contiguous since partitioning is (has to be) done element wise. I am trying to figure out, 1. if I need only Local to Global IS or do I need to combine them with AO? 2. Even at the VecCreateMPI stage, is it possible to inform petsc that, although, say, rank_i has n_i components of the vector, but those components are not arranged contiguously? For instance, Global vertices vector v : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] v_rank_1 : [2, 3, 4, 8, 9, 7] ; v_rank_2 : [0, 1, 2, 3, 6, 10, 11, 8, 9, 5] Any help is greatly appreciated. Thank you. Prateek Gupta -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierpaolo.minelli at cnr.it Wed May 27 03:50:39 2020 From: pierpaolo.minelli at cnr.it (Pierpaolo Minelli) Date: Wed, 27 May 2020 10:50:39 +0200 Subject: [petsc-users] Error on INTEGER SIZE using DMDACreate3d Message-ID: <88A21A58-CDCD-42C0-9A20-1A6C5CBCDF8B@cnr.it> Hi, I am trying to solve a Poisson equation on this grid: Nx = 2501 Ny = 3401 Nz = 1601 I received this error: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Overflow in integer operation: http://www.mcs.anl.gov/petsc/documentation/faq.html#64-bit-indices [0]PETSC ERROR: Mesh of 2501 by 3401 by 1 (dof) is too large for 32 bit indices [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 [0]PETSC ERROR: /marconi_scratch/userexternal/pminelli/PIC3D/2500_3400_1600/./PIC_3D on a arch-linux2-c-opt named r129c09s02 by pminelli Tu e May 26 20:16:34 2020 [0]PETSC ERROR: Configure options --prefix=/cineca/prod/opt/libraries/petsc/3.8.3/intelmpi--2018--binary CC=mpiicc FC=mpiifort CXX=mpiicpc F77=mpiifort F90=mpiifort --with-debugging=0 --with-blaslapack-dir=/cineca/prod/opt/compilers/intel/pe-xe-2018/binary/mkl --with-fortran=1 --with-fortran-interfaces=1 --with-cmake-dir=/cineca/prod/opt/tools/cmake/3.5.2/none --with-mpi-dir=/cineca/prod/opt/compilers/intel/pe-xe- 2018/binary/impi/2018.4.274 --download-scalapack --download-mumps=yes --download-hypre --download-superlu_dist --download-parmetis --downlo ad-metis [0]PETSC ERROR: #1 DMSetUp_DA_3D() line 218 in /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/ impls/da/da3.c [0]PETSC ERROR: #2 DMSetUp_DA() line 25 in /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/impl s/da/dareg.c [0]PETSC ERROR: #3 DMSetUp() line 720 in /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/interf ace/dm.c forrtl: error (76): Abort trap signal I am on an HPC facility and after I loaded PETSC module, I have seen that it is configured with INTEGER size = 32 I solve my problem with these options and it works perfectly with smaller grids: -dm_mat_type hypre -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_relax_type_all SOR/Jacobi -pc_hypre_boomeramg_coarsen_type PMIS -pc_hypre_boomeramg_interp_type FF1 -ksp_type richardson Is it possible to overcome this if I ask them to install a version with INTEGER SIZE = 64? Alternatively, is it possible to overcome this using intel compiler options? 
Thanks in advance Pierpaolo Minelli From stefano.zampini at gmail.com Wed May 27 04:26:51 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 27 May 2020 12:26:51 +0300 Subject: [petsc-users] Error on INTEGER SIZE using DMDACreate3d In-Reply-To: <88A21A58-CDCD-42C0-9A20-1A6C5CBCDF8B@cnr.it> References: <88A21A58-CDCD-42C0-9A20-1A6C5CBCDF8B@cnr.it> Message-ID: You need a version of PETSc compiled with 64bit indices, since the message indicates the number of dofs in this case is larger the INT_MAX 2501?3401?1601 = 13617947501 I also suggest you upgrade to a newer version, 3.8.3 is quite old as the error message reports Il giorno mer 27 mag 2020 alle ore 11:50 Pierpaolo Minelli < pierpaolo.minelli at cnr.it> ha scritto: > Hi, > > I am trying to solve a Poisson equation on this grid: > > Nx = 2501 > Ny = 3401 > Nz = 1601 > > I received this error: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Overflow in integer operation: > http://www.mcs.anl.gov/petsc/documentation/faq.html#64-bit-indices > [0]PETSC ERROR: Mesh of 2501 by 3401 by 1 (dof) is too large for 32 bit > indices > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > [0]PETSC ERROR: > /marconi_scratch/userexternal/pminelli/PIC3D/2500_3400_1600/./PIC_3D on a > arch-linux2-c-opt named r129c09s02 by pminelli Tu > e May 26 20:16:34 2020 > [0]PETSC ERROR: Configure options > --prefix=/cineca/prod/opt/libraries/petsc/3.8.3/intelmpi--2018--binary > CC=mpiicc FC=mpiifort CXX=mpiicpc > F77=mpiifort F90=mpiifort --with-debugging=0 > --with-blaslapack-dir=/cineca/prod/opt/compilers/intel/pe-xe-2018/binary/mkl > --with-fortran=1 > --with-fortran-interfaces=1 > --with-cmake-dir=/cineca/prod/opt/tools/cmake/3.5.2/none > --with-mpi-dir=/cineca/prod/opt/compilers/intel/pe-xe- > 2018/binary/impi/2018.4.274 --download-scalapack --download-mumps=yes > --download-hypre --download-superlu_dist --download-parmetis --downlo > ad-metis > [0]PETSC ERROR: #1 DMSetUp_DA_3D() line 218 in > /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/ > impls/da/da3.c > [0]PETSC ERROR: #2 DMSetUp_DA() line 25 in > /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/impl > s/da/dareg.c > [0]PETSC ERROR: #3 DMSetUp() line 720 in > /marconi/prod/build/libraries/petsc/3.8.3/intelmpi--2018--binary/BA_WORK/petsc-3.8.3/src/dm/interf > ace/dm.c > forrtl: error (76): Abort trap signal > > > I am on an HPC facility and after I loaded PETSC module, I have seen that > it is configured with INTEGER size = 32 > > I solve my problem with these options and it works perfectly with smaller > grids: > > -dm_mat_type hypre -pc_type hypre -pc_hypre_type boomeramg > -pc_hypre_boomeramg_relax_type_all SOR/Jacobi > -pc_hypre_boomeramg_coarsen_type PMIS -pc_hypre_boomeramg_interp_type FF1 > -ksp_type richardson > > Is it possible to overcome this if I ask them to install a version with > INTEGER SIZE = 64? > Alternatively, is it possible to overcome this using intel compiler > options? > > Thanks in advance > > Pierpaolo Minelli -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... 
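For reference, the failing size can be checked with plain 64-bit arithmetic; this only illustrates why the 32-bit build overflows, the actual fix being a PETSc build configured with --with-64-bit-indices:

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void)
{
  /* global DMDA sizes from the report, dof = 1 */
  int64_t Nx = 2501, Ny = 3401, Nz = 1601;
  int64_t n  = Nx*Ny*Nz;                       /* 13617947501 */

  printf("global unknowns = %lld, INT_MAX = %d\n", (long long)n, INT_MAX);
  if (n > INT_MAX) printf("too large for 32-bit PetscInt; use --with-64-bit-indices\n");
  return 0;
}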
URL: From knepley at gmail.com Wed May 27 06:12:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 27 May 2020 07:12:44 -0400 Subject: [petsc-users] Repeated global indices in local maps In-Reply-To: References: Message-ID: On Wed, May 27, 2020 at 4:14 AM Prateek Gupta wrote: > Hi, > I am new to using petsc and need its nonlinear solvers for my code. I am > currently using parmetis (outside petsc) to partition an unstructured mesh > element-wise, but working with data on the vertices of the mesh. > Consequently, I have repeated vertices in different MPI-processes/ranks. > At the solver stage, I need to solve for the data on vertices (solution > vector is defined on the vertices). So, I need to create a distributed > vector over vertices of the mesh, but the distribution in MPI-ranks is not > contiguous since partitioning is (has to be) done element wise. I am trying > to figure out, > 1. if I need only Local to Global IS or do I need to combine them with AO? > 2. Even at the VecCreateMPI stage, is it possible to inform petsc that, > although, say, rank_i has n_i components of the vector, but those > components are not arranged contiguously? > > For instance, > > Global vertices vector v : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] > v_rank_1 : [2, 3, 4, 8, 9, 7] ; v_rank_2 : [0, 1, 2, 3, 6, 10, 11, 8, 9, 5] > PETSc is going to number the unknowns contiguously by process. If you want to also have another numbering, as above, you must handle it somehow. You could use an AO. However, I believe it is easier to just renumber your mesh after partitioning. This is what PETSc does in its unstructured mesh code. Thanks, Matt > Any help is greatly appreciated. > > Thank you. > Prateek Gupta > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 27 16:57:40 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 17:57:40 -0400 Subject: [petsc-users] matsetvaluesblocked4_ Message-ID: Is there a Mat AIJSeq method For MatSetValuesBlocked, like matsetvaluesblocked4_ that is not hardwired for bs=4? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 27 17:14:16 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 16:14:16 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: Message-ID: <87pnapgh53.fsf@jedbrown.org> What did you profile to determine that expanding indices is significant? matsetvaluesblocked4_ was made specially for PETSc-FUN3D with BAIJ matrices. I take it you can't use BAIJ because you use GAMG? Mark Adams writes: > Is there a Mat AIJSeq method For MatSetValuesBlocked, > like matsetvaluesblocked4_ that is not hardwired for bs=4? From mfadams at lbl.gov Wed May 27 17:43:12 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 18:43:12 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: <87pnapgh53.fsf@jedbrown.org> References: <87pnapgh53.fsf@jedbrown.org> Message-ID: Nvidias's NSight with 2D Q3 and bs=10. (attached). I am using LU in serial. I copied MatSetValues_SeqAIJ into a .cu file and made some minor adjustments. I am getting the indices from Matt's DMPlexMatGetClosureIndices and the matrix is from DMPlex. 
I also use GPU direct solvers sometimes and will want that in the future. All and all I figure AIJ is a safer bet, but maybe BAIJ is an option. Matt: Could I use BAIJ with Plex? Thanks, Mark On Wed, May 27, 2020 at 6:14 PM Jed Brown wrote: > What did you profile to determine that expanding indices is significant? > > matsetvaluesblocked4_ was made specially for PETSc-FUN3D with BAIJ > matrices. > > I take it you can't use BAIJ because you use GAMG? > > Mark Adams writes: > > > Is there a Mat AIJSeq method For MatSetValuesBlocked, > > like matsetvaluesblocked4_ that is not hardwired for bs=4? > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-05-27 at 6.34.07 PM.png Type: image/png Size: 987293 bytes Desc: not available URL: From knepley at gmail.com Wed May 27 17:47:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 27 May 2020 18:47:16 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: On Wed, May 27, 2020 at 6:43 PM Mark Adams wrote: > Nvidias's NSight with 2D Q3 and bs=10. (attached). > > I am using LU in serial. > > I copied MatSetValues_SeqAIJ into a .cu file and made some minor > adjustments. > > I am getting the indices from Matt's DMPlexMatGetClosureIndices and the > matrix is from DMPlex. I also use GPU direct solvers sometimes and will > want that in the future. All and all I figure AIJ is a safer bet, but maybe > BAIJ is an option. > > Matt: Could I use BAIJ with Plex? > Plex does this automatically if you have blocks. Matt > Thanks, > Mark > > > On Wed, May 27, 2020 at 6:14 PM Jed Brown wrote: > >> What did you profile to determine that expanding indices is significant? >> >> matsetvaluesblocked4_ was made specially for PETSc-FUN3D with BAIJ >> matrices. >> >> I take it you can't use BAIJ because you use GAMG? >> >> Mark Adams writes: >> >> > Is there a Mat AIJSeq method For MatSetValuesBlocked, >> > like matsetvaluesblocked4_ that is not hardwired for bs=4? >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 27 18:08:34 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 19:08:34 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: > >> >> Matt: Could I use BAIJ with Plex? >> > > Plex does this automatically if you have blocks. > I use DMCreateMatrix with forest or plex and I seem to get AIJ matrices. Where does Plex get the block size? I have not yet verified that bs is set in this matrix. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 27 18:14:04 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 27 May 2020 19:14:04 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: On Wed, May 27, 2020 at 7:09 PM Mark Adams wrote: > > >>> >>> Matt: Could I use BAIJ with Plex? >>> >> >> Plex does this automatically if you have blocks. >> > > I use DMCreateMatrix with forest or plex and I seem to get AIJ matrices. > Where does Plex get the block size? 
> > I have not yet verified that bs is set in this matrix. > I think I may know what your problem is. Plex evaluates the blocksize by looking for an equal number of dofs on each point. This is sufficient, but not necessary. If you are using higher order methods, there is block structure there that I will not see. Jed, is there an obvious way to see that structure that I am missing? Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 27 18:22:09 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 17:22:09 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: <87mu5tgdzy.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, May 27, 2020 at 7:09 PM Mark Adams wrote: > >> >> >>>> >>>> Matt: Could I use BAIJ with Plex? >>>> >>> >>> Plex does this automatically if you have blocks. >>> >> >> I use DMCreateMatrix with forest or plex and I seem to get AIJ matrices. >> Where does Plex get the block size? >> >> I have not yet verified that bs is set in this matrix. >> > > I think I may know what your problem is. Plex evaluates the blocksize by > looking for an equal number of dofs > on each point. This is sufficient, but not necessary. If you are using > higher order methods, there is block structure > there that I will not see. > > Jed, is there an obvious way to see that structure that I am missing? gcd of all the point sizes. Note that you don't have constant block sizes if you eliminate some fields/components in boundary conditions. From mfadams at lbl.gov Wed May 27 18:31:07 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 19:31:07 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: > > >> I think I may know what your problem is. Plex evaluates the blocksize by > looking for an equal number of dofs > on each point. This is sufficient, but not necessary. If you are using > higher order methods, there is block structure > there that I will not see. > I don't understand what the order has to do with it. I use code like this to setup the dofs: for (ii=0;iinum_species;ii++) { ierr = PetscFECreateDefault(PetscObjectComm((PetscObject) dm), dim, 1, ctx->simplex, NULL, PETSC_DECIDE, &ctx->fe[ii]);CHKERRQ(ierr); ierr = DMSetField(dm, ii, NULL, (PetscObject) ctx->fe[ii]);CHKERRQ(ierr); } Everything is constant, elements (eg, Q3) and dofs/vertex. > > Jed, is there an obvious way to see that structure that I am missing? > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 27 18:32:07 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 19:32:07 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: <87mu5tgdzy.fsf@jedbrown.org> References: <87pnapgh53.fsf@jedbrown.org> <87mu5tgdzy.fsf@jedbrown.org> Message-ID: I have DM_BOUNDARY_NONE. 
On Wed, May 27, 2020 at 7:22 PM Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, May 27, 2020 at 7:09 PM Mark Adams wrote: > > > >> > >> > >>>> > >>>> Matt: Could I use BAIJ with Plex? > >>>> > >>> > >>> Plex does this automatically if you have blocks. > >>> > >> > >> I use DMCreateMatrix with forest or plex and I seem to get AIJ matrices. > >> Where does Plex get the block size? > >> > >> I have not yet verified that bs is set in this matrix. > >> > > > > I think I may know what your problem is. Plex evaluates the blocksize by > > looking for an equal number of dofs > > on each point. This is sufficient, but not necessary. If you are using > > higher order methods, there is block structure > > there that I will not see. > > > > Jed, is there an obvious way to see that structure that I am missing? > > gcd of all the point sizes. > > Note that you don't have constant block sizes if you eliminate some > fields/components in boundary conditions. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 27 18:34:43 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 17:34:43 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: <87imghgdf0.fsf@jedbrown.org> Mark Adams writes: > Nvidias's NSight with 2D Q3 and bs=10. (attached). Thanks; this is basically the same as a CPU -- the cost is searching the sorted rows for the next entry. I've long thought we should optimize the implementations to fast-path when the next column index in the sparse matrix equals the next index in the provided block. It'd just take a good CPU test to demonstrate that payoff. From knepley at gmail.com Wed May 27 18:36:50 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 27 May 2020 19:36:50 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: <87imghgdf0.fsf@jedbrown.org> References: <87pnapgh53.fsf@jedbrown.org> <87imghgdf0.fsf@jedbrown.org> Message-ID: On Wed, May 27, 2020 at 7:34 PM Jed Brown wrote: > Mark Adams writes: > > > Nvidias's NSight with 2D Q3 and bs=10. (attached). > > Thanks; this is basically the same as a CPU -- the cost is searching the > sorted rows for the next entry. I've long thought we should optimize > the implementations to fast-path when the next column index in the > sparse matrix equals the next index in the provided block. It'd just > take a good CPU test to demonstrate that payoff. > So you first check whether the next index is the one in the set passed in, and otherwise fall back on the search? Good idea. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 27 18:38:48 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 17:38:48 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> Message-ID: <87ftblgd87.fsf@jedbrown.org> Mark Adams writes: >> >> >>> I think I may know what your problem is. Plex evaluates the blocksize by >> looking for an equal number of dofs >> on each point. This is sufficient, but not necessary. If you are using >> higher order methods, there is block structure >> there that I will not see. >> > > I don't understand what the order has to do with it. 
> > I use code like this to setup the dofs: > > for (ii=0;iinum_species;ii++) { > ierr = PetscFECreateDefault(PetscObjectComm((PetscObject) dm), dim, 1, > ctx->simplex, NULL, PETSC_DECIDE, &ctx->fe[ii]);CHKERRQ(ierr); > ierr = DMSetField(dm, ii, NULL, (PetscObject) > ctx->fe[ii]);CHKERRQ(ierr); > } > > Everything is constant, elements (eg, Q3) and dofs/vertex. Everything in DMPlex is by *point*. A vertex has one block (num_species above), but a Q2 edge has two, a Q2 face has 4, and a Q2 cell has 8. Note that some DMPlex stuff might run faster if you just make one field with num_species components instead of num_species fields with one component each. It'll also make the block structure more exploitable. From mfadams at lbl.gov Wed May 27 21:29:14 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 22:29:14 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> <87imghgdf0.fsf@jedbrown.org> Message-ID: On Wed, May 27, 2020 at 7:37 PM Matthew Knepley wrote: > On Wed, May 27, 2020 at 7:34 PM Jed Brown wrote: > >> Mark Adams writes: >> >> > Nvidias's NSight with 2D Q3 and bs=10. (attached). >> >> Thanks; this is basically the same as a CPU -- the cost is searching the >> sorted rows for the next entry. I've long thought we should optimize >> the implementations to fast-path when the next column index in the >> sparse matrix equals the next index in the provided block. It'd just >> take a good CPU test to demonstrate that payoff. >> > > So you first check whether the next index is the one in the set passed in, > and otherwise > fall back on the search? Good idea. > The existing code seems to have *a line (low=i+1) *that seems to be trying to exploit consecutive indices but it is not quite right, I don't think, This is the existing code fragment (this code has been cloned many times and there are several instances of this kernel). I've *added code* that I think might make this do the right thing. if (col <= lastcol) low = 0; else high = nrow; lastcol = col; while (high-low > 5) { t = (low+high)/2; *if (rp[low] == col) high = low+1;else * if (rp[t] > col) high = t; else low = t; } for (i=low; i col) break; *// delete this check if you don't add new columns* if (rp[i] == col) { ap[i] += value; * low = i + 1;* goto noinsert; } } I'll experiment with this. Thanks, Mark > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 27 21:43:50 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 27 May 2020 22:43:50 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: <87ftblgd87.fsf@jedbrown.org> References: <87pnapgh53.fsf@jedbrown.org> <87ftblgd87.fsf@jedbrown.org> Message-ID: > > > Note that some DMPlex stuff might run faster if you just make one field > with num_species components instead of num_species fields with one > component each. It'll also make the block structure more exploitable. > Humm, I assumed fields should be vectors (perhaps 0D for a scalar), but maybe Plex does not care. It would break my existing point functions and viz... -------------- next part -------------- An HTML attachment was scrubbed... 
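A sketch of the one-field-with-many-components setup suggested above, mirroring the quoted loop; the function name is illustrative, and DMCreateDS() is assumed to be the appropriate follow-up call once the field is set:

#include <petsc.h>

static PetscErrorCode SetupSpeciesField(DM dm, PetscInt dim, PetscInt num_species, PetscBool simplex)
{
  PetscFE        fe;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* one field with num_species components, instead of num_species scalar fields */
  ierr = PetscFECreateDefault(PetscObjectComm((PetscObject)dm), dim, num_species,
                              simplex, NULL, PETSC_DECIDE, &fe);CHKERRQ(ierr);
  ierr = DMSetField(dm, 0, NULL, (PetscObject)fe);CHKERRQ(ierr);
  ierr = DMCreateDS(dm);CHKERRQ(ierr);
  ierr = PetscFEDestroy(&fe);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Point functions and viewers then see a single vector-valued field, which is the compatibility cost mentioned above.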
URL: From jed at jedbrown.org Wed May 27 22:27:58 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 21:27:58 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> <87ftblgd87.fsf@jedbrown.org> Message-ID: <878shchh6p.fsf@jedbrown.org> Mark Adams writes: >> >> >> Note that some DMPlex stuff might run faster if you just make one field >> with num_species components instead of num_species fields with one >> component each. It'll also make the block structure more exploitable. >> > > Humm, I assumed fields should be vectors (perhaps 0D for a scalar), but > maybe Plex does not care. It would break my existing point functions and > viz... IIRC, the memory ordering for a point with several fields (of the same size and number of components) is point_mem[field][node][component] So if you want a block to have collocated components, it's better to create one field with several components. I'm pretty sure the implementation will be faster, though it might still not be your hot spot. From jed at jedbrown.org Wed May 27 23:06:50 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 May 2020 22:06:50 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> <87imghgdf0.fsf@jedbrown.org> Message-ID: <875zcghfdx.fsf@jedbrown.org> Mark Adams writes: > The existing code seems to have *a line (low=i+1) *that seems to be > trying to exploit consecutive indices but it is not quite right, I don't > think, > > This is the existing code fragment (this code has been cloned many times > and there are several instances of this kernel). > > I've *added code* that I think might make this do the right thing. > > if (col <= lastcol) low = 0; > else high = nrow; > lastcol = col; > while (high-low > 5) { > t = (low+high)/2; > > *if (rp[low] == col) high = low+1;else * if (rp[t] > col) high = t; > else low = t; Replacing a single comparison per bsearch iteration with two doesn't seem like a good choice to me. > } > for (i=low; i if (rp[i] > col) break; *// delete this check if you don't add new > columns* > if (rp[i] == col) { > ap[i] += value; > > * low = i + 1;* goto noinsert; I was thinking of a fast-path like while (rp[i] == in[l]) ap[i++] = in[l++]; A bit more logic is needed to avoid running off the end of either array. From mfadams at lbl.gov Thu May 28 05:43:29 2020 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 28 May 2020 06:43:29 -0400 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: <875zcghfdx.fsf@jedbrown.org> References: <87pnapgh53.fsf@jedbrown.org> <87imghgdf0.fsf@jedbrown.org> <875zcghfdx.fsf@jedbrown.org> Message-ID: > > > > > > *if (rp[low] == col) high = low+1;else * if (rp[t] > col) high = t; > > else low = t; > > Replacing a single comparison per bsearch iteration with two doesn't > seem like a good choice to me. > > It forces the bisection search and the linear search to terminate in one iteration. -------------- next part -------------- An HTML attachment was scrubbed... URL: From prateekgupta1709 at gmail.com Thu May 28 07:29:57 2020 From: prateekgupta1709 at gmail.com (Prateek Gupta) Date: Thu, 28 May 2020 14:29:57 +0200 Subject: [petsc-users] Repeated global indices in local maps In-Reply-To: References: Message-ID: Thanks Matt. I ended up renumbering the global mesh nodes, tagging the ghost nodes on each processor, and using VecCreateGhost() and VecGhostGetLocalForm() in the Formfunction at solver stage. 
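A minimal sketch of the VecCreateGhost()/VecGhostGetLocalForm() pattern described above; nlocal, nghost and ghosts[] (global indices of the ghost vertices after renumbering) are assumed inputs, and the function names are illustrative:

#include <petsc.h>

static PetscErrorCode MakeGhostedVec(MPI_Comm comm, PetscInt nlocal, PetscInt nghost,
                                     const PetscInt ghosts[], Vec *gvec)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecCreateGhost(comm, nlocal, PETSC_DECIDE, nghost, ghosts, gvec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

static PetscErrorCode UseGhosted(Vec gvec)
{
  Vec            lvec;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* refresh ghost values from the owning ranks, then work on the local form */
  ierr = VecGhostUpdateBegin(gvec, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecGhostUpdateEnd(gvec, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecGhostGetLocalForm(gvec, &lvec);CHKERRQ(ierr);
  /* lvec holds the owned entries first, followed by the ghost entries */
  ierr = VecGhostRestoreLocalForm(gvec, &lvec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}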
On Wed, May 27, 2020 at 1:12 PM Matthew Knepley wrote: > On Wed, May 27, 2020 at 4:14 AM Prateek Gupta > wrote: > >> Hi, >> I am new to using petsc and need its nonlinear solvers for my code. I am >> currently using parmetis (outside petsc) to partition an unstructured mesh >> element-wise, but working with data on the vertices of the mesh. >> Consequently, I have repeated vertices in different MPI-processes/ranks. >> At the solver stage, I need to solve for the data on vertices (solution >> vector is defined on the vertices). So, I need to create a distributed >> vector over vertices of the mesh, but the distribution in MPI-ranks is not >> contiguous since partitioning is (has to be) done element wise. I am trying >> to figure out, >> 1. if I need only Local to Global IS or do I need to combine them with >> AO? >> 2. Even at the VecCreateMPI stage, is it possible to inform petsc that, >> although, say, rank_i has n_i components of the vector, but those >> components are not arranged contiguously? >> >> For instance, >> >> Global vertices vector v : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] >> v_rank_1 : [2, 3, 4, 8, 9, 7] ; v_rank_2 : [0, 1, 2, 3, 6, 10, 11, 8, 9, >> 5] >> > > PETSc is going to number the unknowns contiguously by process. If you want > to also have another numbering, as above, > you must handle it somehow. You could use an AO. However, I believe it is > easier to just renumber your mesh after > partitioning. This is what PETSc does in its unstructured mesh code. > > Thanks, > > Matt > > >> Any help is greatly appreciated. >> >> Thank you. >> Prateek Gupta >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu May 28 09:21:13 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 28 May 2020 08:21:13 -0600 Subject: [petsc-users] matsetvaluesblocked4_ In-Reply-To: References: <87pnapgh53.fsf@jedbrown.org> <87imghgdf0.fsf@jedbrown.org> <875zcghfdx.fsf@jedbrown.org> Message-ID: <87a71si1ie.fsf@jedbrown.org> Mark Adams writes: >> >> >> > >> > *if (rp[low] == col) high = low+1;else * if (rp[t] > col) high = t; >> > else low = t; >> >> Replacing a single comparison per bsearch iteration with two doesn't >> seem like a good choice to me. >> >> > It forces the bisection search and the linear search to terminate in one > iteration. I care about the impact when inputs aren't sorted. If the fast-path doesn't take effect, you've doubled the branching and (although it would need testing) I'm concerned about the perf impact. That is, I'm willing to pay one extra branch per inserted entry, but not an extra branch per iteration of the search. From jorge.chiva.segura at gmail.com Fri May 29 07:51:14 2020 From: jorge.chiva.segura at gmail.com (Jorge Chiva Segura) Date: Fri, 29 May 2020 14:51:14 +0200 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA Message-ID: Dear all, I just wanted to add what we found in case that it can help to solve this problem: * Examples ex2.c, ex11.c and ex32.c under src/ksp/ksp/tutorials all of them seg-fault with "-mat_type mpiaijcusparse" or "-vec_type mpicuda" with CUDA 9.2 and 10.2. Examples ex2.c and ex32.c work fine if PETSc scalar type is set to real instead of complex. 
PETSc has been compiled with gcc-6.4.0 * Here: https://dynamite.readthedocs.io/en/latest/tips.html#gpu-support it is mentioned that there is some problem for CUDA versions 8 and above. * It seems that the same problem was mentioned in this thread: https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2019-February/037748.html Please, let me know if you need any extra information. Thank you very much for your help. Best regards, Jorge -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Fri May 29 08:13:42 2020 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Fri, 29 May 2020 13:13:42 +0000 Subject: [petsc-users] PDF manual, section 4.4.9 Message-ID: <11A1BAE7-F9B4-43D7-8BEB-80E74F3BDCA6@tacc.utexas.edu> to be encapuslated -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri May 29 09:02:58 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 29 May 2020 09:02:58 -0500 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: Hi, Jorge, I knew this problem and a fix is coming. Thanks. --Junchao Zhang On Fri, May 29, 2020 at 7:52 AM Jorge Chiva Segura < jorge.chiva.segura at gmail.com> wrote: > Dear all, > > I just wanted to add what we found in case that it can help to solve this > problem: > > * Examples ex2.c, ex11.c and ex32.c under src/ksp/ksp/tutorials all of > them seg-fault with "-mat_type mpiaijcusparse" or "-vec_type mpicuda" with > CUDA 9.2 and 10.2. > Examples ex2.c and ex32.c work fine if PETSc scalar type is set to real > instead of complex. PETSc has been compiled with gcc-6.4.0 > > * Here: > https://dynamite.readthedocs.io/en/latest/tips.html#gpu-support > it is mentioned that there is some problem for CUDA versions 8 and > above. > > * It seems that the same problem was mentioned in this thread: > > https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2019-February/037748.html > > Please, let me know if you need any extra information. > > Thank you very much for your help. > > Best regards, > > Jorge > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorge.chiva.segura at gmail.com Fri May 29 10:21:59 2020 From: jorge.chiva.segura at gmail.com (Jorge Chiva Segura) Date: Fri, 29 May 2020 17:21:59 +0200 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: Hi Junchao, I am sorry, I tried to reply to the thread created by Rui Silva regarding this problem but it seems that I did something wrong and it looks like I created a new thread (I am not used to mailing lists :S). Just wanted to share that information in case that it could be useful to identify the problem and test the solution. Thank you very much for your help. Best regards, Jorge On Fri, May 29, 2020 at 4:03 PM Junchao Zhang wrote: > Hi, Jorge, > I knew this problem and a fix is coming. Thanks. > --Junchao Zhang > > > On Fri, May 29, 2020 at 7:52 AM Jorge Chiva Segura < > jorge.chiva.segura at gmail.com> wrote: > >> Dear all, >> >> I just wanted to add what we found in case that it can help to solve this >> problem: >> >> * Examples ex2.c, ex11.c and ex32.c under src/ksp/ksp/tutorials all of >> them seg-fault with "-mat_type mpiaijcusparse" or "-vec_type mpicuda" with >> CUDA 9.2 and 10.2. >> Examples ex2.c and ex32.c work fine if PETSc scalar type is set to >> real instead of complex. 
PETSc has been compiled with gcc-6.4.0 >> >> * Here: >> https://dynamite.readthedocs.io/en/latest/tips.html#gpu-support >> it is mentioned that there is some problem for CUDA versions 8 and >> above. >> >> * It seems that the same problem was mentioned in this thread: >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2019-February/037748.html >> >> Please, let me know if you need any extra information. >> >> Thank you very much for your help. >> >> Best regards, >> >> Jorge >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri May 29 10:26:39 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 29 May 2020 10:26:39 -0500 Subject: [petsc-users] Possible bug PETSc+Complex+CUDA In-Reply-To: References: Message-ID: Jorge, No problem at all with the new thread. I ran the tests you mentioned and did found new problems. Thanks. --Junchao Zhang On Fri, May 29, 2020 at 10:22 AM Jorge Chiva Segura < jorge.chiva.segura at gmail.com> wrote: > Hi Junchao, > > I am sorry, I tried to reply to the thread created by Rui Silva regarding > this problem but it seems that I did something wrong and it looks like I > created a new thread (I am not used to mailing lists :S). > Just wanted to share that information in case that it could be useful to > identify the problem and test the solution. > > Thank you very much for your help. > Best regards, > Jorge > > On Fri, May 29, 2020 at 4:03 PM Junchao Zhang > wrote: > >> Hi, Jorge, >> I knew this problem and a fix is coming. Thanks. >> --Junchao Zhang >> >> >> On Fri, May 29, 2020 at 7:52 AM Jorge Chiva Segura < >> jorge.chiva.segura at gmail.com> wrote: >> >>> Dear all, >>> >>> I just wanted to add what we found in case that it can help to solve >>> this problem: >>> >>> * Examples ex2.c, ex11.c and ex32.c under src/ksp/ksp/tutorials all of >>> them seg-fault with "-mat_type mpiaijcusparse" or "-vec_type mpicuda" with >>> CUDA 9.2 and 10.2. >>> Examples ex2.c and ex32.c work fine if PETSc scalar type is set to >>> real instead of complex. PETSc has been compiled with gcc-6.4.0 >>> >>> * Here: >>> https://dynamite.readthedocs.io/en/latest/tips.html#gpu-support >>> it is mentioned that there is some problem for CUDA versions 8 and >>> above. >>> >>> * It seems that the same problem was mentioned in this thread: >>> >>> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2019-February/037748.html >>> >>> Please, let me know if you need any extra information. >>> >>> Thank you very much for your help. >>> >>> Best regards, >>> >>> Jorge >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Fri May 29 11:15:32 2020 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Fri, 29 May 2020 16:15:32 +0000 Subject: [petsc-users] How to use MatSetValuesStencil() with DOF>1 Message-ID: Hello, I have a structured grid problem with 5 unknowns per cell: U,V,P,T,F, this is a CFD code but I don't think these details are necessary to answer my question. I was wondering how exactly I should use MatSetValuesStencil() to fill my matrix. I have 5 degrees of freedom, so I need to set up to 25 coefficients per grid cell. I don't understand how i,j, and c map to all coefficients. 
For a 1 degree of freedom system to fill my 9 point stencil I did: values(1)= ASW(II,JJ) values(2)= AS (II,JJ) values(3)= ASE(II,JJ) values(4)= AW (II,JJ) values(5)= AP (II,JJ) values(6)= AE (II,JJ) values(7)= ANW(II,JJ) values(8)= AN (II,JJ) values(9)= ANE(II,JJ) idxm(MatStencil_i,1) = II;idxm(MatStencil_j,1) = JJ idxn(MatStencil_i,1) = II-1;idxn(MatStencil_j,1) = JJ-1 idxn(MatStencil_i,2) = II ;idxn(MatStencil_j,2) = JJ-1 idxn(MatStencil_i,3) = II+1;idxn(MatStencil_j,3) = JJ-1 idxn(MatStencil_i,4) = II-1 ;idxn(MatStencil_j,4) = JJ idxn(MatStencil_i,5) = II ;idxn(MatStencil_j,5) = JJ idxn(MatStencil_i,6) = II+1;idxn(MatStencil_j,6) = JJ idxn(MatStencil_i,7) = II-1 ;idxn(MatStencil_j,7) = JJ+1 idxn(MatStencil_i,8) = II ;idxn(MatStencil_j,8) = JJ+1 idxn(MatStencil_i,9) = II+1;idxn(MatStencil_j,9) = JJ+1 call MatSetValuesStencil(A,1,idxm,9,idxn,values,INSERT_VALUES,ierr) Which seemed to work just fine. For 5 degrees of freedom, instead of just having a single AP coefficient for example, I have an AP matrix: ap_uu ap_uv ap_up ap_ut ap_uf ap_vu ap_vv ap_vp ap_vt ap_vf ap_pu ap_pv ap_pp ap_pt ap_pf ap_tu ap_tv ap_tp ap_tt ap_tf ap_fu ap_fv ap_fp ap_ft ap_ff In 1 degree of freedom, i and j corresponded to the solution variable in the grid. For multi degree of freedom, I don't understand how values of i, j, and c could distinguish from an ap_uv and ap_vv coefficent for example, wouldn't they both be at i, j ,c=2? Is there anyway I can use MatSetValuesStencil() to fill in my 9 point stencil with my 5x5 matrix coefficients? To clarify I have nine 5x5 matrices for each cell which correspond to the 5 unknowns per cell. Thanks, Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 29 12:47:34 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 29 May 2020 13:47:34 -0400 Subject: [petsc-users] How to use MatSetValuesStencil() with DOF>1 In-Reply-To: References: Message-ID: On Fri, May 29, 2020 at 12:16 PM Lucas Banting wrote: > Hello, > > I have a structured grid problem with 5 unknowns per cell: U,V,P,T,F, this > is a CFD code but I don't think these details are necessary to answer my > question. > I was wondering how exactly I should use MatSetValuesStencil() to fill my > matrix. > I have 5 degrees of freedom, so I need to set up to 25 coefficients per > grid cell. I don't understand how i,j, and c map to all coefficients. > > For a 1 degree of freedom system to fill my 9 point stencil I did: > > values(1)= ASW(II,JJ) > values(2)= AS (II,JJ) > values(3)= ASE(II,JJ) > values(4)= AW (II,JJ) > values(5)= AP (II,JJ) > values(6)= AE (II,JJ) > values(7)= ANW(II,JJ) > values(8)= AN (II,JJ) > values(9)= ANE(II,JJ) > > idxm(MatStencil_i,1) = II;idxm(MatStencil_j,1) = JJ > > idxn(MatStencil_i,1) = II-1;idxn(MatStencil_j,1) = JJ-1 > idxn(MatStencil_i,2) = II ;idxn(MatStencil_j,2) = JJ-1 > idxn(MatStencil_i,3) = II+1;idxn(MatStencil_j,3) = JJ-1 > idxn(MatStencil_i,4) = II-1 ;idxn(MatStencil_j,4) = JJ > idxn(MatStencil_i,5) = II ;idxn(MatStencil_j,5) = JJ > idxn(MatStencil_i,6) = II+1;idxn(MatStencil_j,6) = JJ > idxn(MatStencil_i,7) = II-1 ;idxn(MatStencil_j,7) = JJ+1 > idxn(MatStencil_i,8) = II ;idxn(MatStencil_j,8) = JJ+1 > idxn(MatStencil_i,9) = II+1;idxn(MatStencil_j,9) = JJ+1 > call > MatSetValuesStencil(A,1,idxm,9,idxn,values,INSERT_VALUES,ierr) > > Which seemed to work just fine. 
For 5 degrees of freedom, instead of just > having a single AP coefficient for example, I have an AP matrix: > > ap_uu > ap_uv > ap_up > ap_ut > ap_uf > ap_vu > ap_vv > ap_vp > ap_vt > ap_vf > ap_pu > ap_pv > ap_pp > ap_pt > ap_pf > ap_tu > ap_tv > ap_tp > ap_tt > ap_tf > ap_fu > ap_fv > ap_fp > ap_ft > ap_ff > > In 1 degree of freedom, i and j corresponded to the solution variable in > the grid. For multi degree of freedom, I don't understand how values of i, > j, and c could distinguish from an ap_uv and ap_vv coefficent for example, > wouldn't they both be at i, j ,c=2? > Is there anyway I can use MatSetValuesStencil() to fill in my 9 point > stencil with my 5x5 matrix coefficients? To clarify I have nine 5x5 > matrices for each cell which correspond to the 5 unknowns per cell. > In MatSetValuesStencil(), I think it is best to think of the Stencils as another way of providing row/col numbers. So each MatStencil corresponds to some row number. So your ap_uv is some entry in the input matrix, and thus corresponds to some row number (a MatStencil) and some column number (another MatStencil). Since ap is one point, it appears that ap_uv --> row = i,j,0 col = i,j,1 ap_vv --> row = i,j,1 col = i,j,1 So you line up your MatStencil arguments to match the order of your input. It sounds like you want the field index to be the fastest in your input. Does that make sense? Thanks, Matt > Thanks, > > Lucas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Fri May 29 12:51:46 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 29 May 2020 17:51:46 +0000 Subject: [petsc-users] How to use MatSetValuesStencil() with DOF>1 In-Reply-To: References: Message-ID: <8198B607-5A01-4283-ADEE-985720E698F6@anl.gov> To see how this can be done with DM field for a PDE with DOF=2, you can refer to RHSJacobian() in src/ts/tutorials/advection-diffusion-reaction/ex5.c Hong (Mr.) On May 29, 2020, at 11:15 AM, Lucas Banting > wrote: Hello, I have a structured grid problem with 5 unknowns per cell: U,V,P,T,F, this is a CFD code but I don't think these details are necessary to answer my question. I was wondering how exactly I should use MatSetValuesStencil() to fill my matrix. I have 5 degrees of freedom, so I need to set up to 25 coefficients per grid cell. I don't understand how i,j, and c map to all coefficients. For a 1 degree of freedom system to fill my 9 point stencil I did: values(1)= ASW(II,JJ) values(2)= AS (II,JJ) values(3)= ASE(II,JJ) values(4)= AW (II,JJ) values(5)= AP (II,JJ) values(6)= AE (II,JJ) values(7)= ANW(II,JJ) values(8)= AN (II,JJ) values(9)= ANE(II,JJ) idxm(MatStencil_i,1) = II;idxm(MatStencil_j,1) = JJ idxn(MatStencil_i,1) = II-1;idxn(MatStencil_j,1) = JJ-1 idxn(MatStencil_i,2) = II ;idxn(MatStencil_j,2) = JJ-1 idxn(MatStencil_i,3) = II+1;idxn(MatStencil_j,3) = JJ-1 idxn(MatStencil_i,4) = II-1 ;idxn(MatStencil_j,4) = JJ idxn(MatStencil_i,5) = II ;idxn(MatStencil_j,5) = JJ idxn(MatStencil_i,6) = II+1;idxn(MatStencil_j,6) = JJ idxn(MatStencil_i,7) = II-1 ;idxn(MatStencil_j,7) = JJ+1 idxn(MatStencil_i,8) = II ;idxn(MatStencil_j,8) = JJ+1 idxn(MatStencil_i,9) = II+1;idxn(MatStencil_j,9) = JJ+1 call MatSetValuesStencil(A,1,idxm,9,idxn,values,INSERT_VALUES,ierr) Which seemed to work just fine. 
For 5 degrees of freedom, instead of just having a single AP coefficient for example, I have an AP matrix: ap_uu ap_uv ap_up ap_ut ap_uf ap_vu ap_vv ap_vp ap_vt ap_vf ap_pu ap_pv ap_pp ap_pt ap_pf ap_tu ap_tv ap_tp ap_tt ap_tf ap_fu ap_fv ap_fp ap_ft ap_ff In 1 degree of freedom, i and j corresponded to the solution variable in the grid. For multi degree of freedom, I don't understand how values of i, j, and c could distinguish from an ap_uv and ap_vv coefficent for example, wouldn't they both be at i, j ,c=2? Is there anyway I can use MatSetValuesStencil() to fill in my 9 point stencil with my 5x5 matrix coefficients? To clarify I have nine 5x5 matrices for each cell which correspond to the 5 unknowns per cell. Thanks, Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From bantingl at myumanitoba.ca Fri May 29 12:55:59 2020 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Fri, 29 May 2020 17:55:59 +0000 Subject: [petsc-users] How to use MatSetValuesStencil() with DOF>1 In-Reply-To: References: , Message-ID: Yes, I think that makes sense to me. Each cell is at i,j but then you can set the 25 coefficients per node using MatStencil_c from 0 to 4 for both idxm and idxn. Thanks for your help. Lucas ________________________________ From: Matthew Knepley Sent: Friday, May 29, 2020 12:47 PM To: Lucas Banting Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] How to use MatSetValuesStencil() with DOF>1 On Fri, May 29, 2020 at 12:16 PM Lucas Banting > wrote: Hello, I have a structured grid problem with 5 unknowns per cell: U,V,P,T,F, this is a CFD code but I don't think these details are necessary to answer my question. I was wondering how exactly I should use MatSetValuesStencil() to fill my matrix. I have 5 degrees of freedom, so I need to set up to 25 coefficients per grid cell. I don't understand how i,j, and c map to all coefficients. For a 1 degree of freedom system to fill my 9 point stencil I did: values(1)= ASW(II,JJ) values(2)= AS (II,JJ) values(3)= ASE(II,JJ) values(4)= AW (II,JJ) values(5)= AP (II,JJ) values(6)= AE (II,JJ) values(7)= ANW(II,JJ) values(8)= AN (II,JJ) values(9)= ANE(II,JJ) idxm(MatStencil_i,1) = II;idxm(MatStencil_j,1) = JJ idxn(MatStencil_i,1) = II-1;idxn(MatStencil_j,1) = JJ-1 idxn(MatStencil_i,2) = II ;idxn(MatStencil_j,2) = JJ-1 idxn(MatStencil_i,3) = II+1;idxn(MatStencil_j,3) = JJ-1 idxn(MatStencil_i,4) = II-1 ;idxn(MatStencil_j,4) = JJ idxn(MatStencil_i,5) = II ;idxn(MatStencil_j,5) = JJ idxn(MatStencil_i,6) = II+1;idxn(MatStencil_j,6) = JJ idxn(MatStencil_i,7) = II-1 ;idxn(MatStencil_j,7) = JJ+1 idxn(MatStencil_i,8) = II ;idxn(MatStencil_j,8) = JJ+1 idxn(MatStencil_i,9) = II+1;idxn(MatStencil_j,9) = JJ+1 call MatSetValuesStencil(A,1,idxm,9,idxn,values,INSERT_VALUES,ierr) Which seemed to work just fine. For 5 degrees of freedom, instead of just having a single AP coefficient for example, I have an AP matrix: ap_uu ap_uv ap_up ap_ut ap_uf ap_vu ap_vv ap_vp ap_vt ap_vf ap_pu ap_pv ap_pp ap_pt ap_pf ap_tu ap_tv ap_tp ap_tt ap_tf ap_fu ap_fv ap_fp ap_ft ap_ff In 1 degree of freedom, i and j corresponded to the solution variable in the grid. For multi degree of freedom, I don't understand how values of i, j, and c could distinguish from an ap_uv and ap_vv coefficent for example, wouldn't they both be at i, j ,c=2? Is there anyway I can use MatSetValuesStencil() to fill in my 9 point stencil with my 5x5 matrix coefficients? To clarify I have nine 5x5 matrices for each cell which correspond to the 5 unknowns per cell. 
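A C sketch of the blocked variant of this (the thread's own code is Fortran), assuming the matrix comes from DMCreateMatrix() on a 2D DMDA with dof = 5 and a box stencil so that its block size is 5; the helper name and the block[9][5][5] input (the nine 5x5 cell matrices, ordered SW, S, SE, W, P, E, NW, N, NE) are illustrative:

#include <petsc.h>

static PetscErrorCode InsertCellBlocks(Mat A, PetscInt II, PetscInt JJ,
                                       PetscScalar block[9][5][5])
{
  MatStencil     row, cols[9];
  PetscScalar    v[5*9*5];                       /* one 5 x 45 row-major array */
  const PetscInt di[9] = {-1,0,1,-1,0,1,-1,0,1};
  const PetscInt dj[9] = {-1,-1,-1,0,0,0,1,1,1};
  PetscInt       k, r, c;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscMemzero(&row, sizeof(row));CHKERRQ(ierr);
  ierr = PetscMemzero(cols, sizeof(cols));CHKERRQ(ierr);
  row.i = II; row.j = JJ;                        /* no MatStencil_c needed for blocked insertion */
  for (k = 0; k < 9; k++) { cols[k].i = II + di[k]; cols[k].j = JJ + dj[k]; }
  for (k = 0; k < 9; k++)                        /* block k occupies columns k*5 .. k*5+4 */
    for (r = 0; r < 5; r++)
      for (c = 0; c < 5; c++)
        v[r*45 + k*5 + c] = block[k][r][c];
  ierr = MatSetValuesBlockedStencil(A, 1, &row, 9, cols, v, INSERT_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With MatSetValuesStencil() instead, the same entries would be addressed one scalar at a time, with MatStencil_c running from 0 to 4 on both the row and the column stencils, as in the i,j,c mapping described earlier in the thread.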
In MatSetValuesStencil(), I think it is best to think of the Stencils as
another way of providing row/col numbers. So each MatStencil corresponds to
some row number. So your ap_uv is some entry in the input matrix, and thus
corresponds to some row number (a MatStencil) and some column number
(another MatStencil). Since ap is one point, it appears that

ap_uv --> row = i,j,0  col = i,j,1
ap_vv --> row = i,j,1  col = i,j,1

So you line up your MatStencil arguments to match the order of your input.
It sounds like you want the field index to be the fastest in your input.

Does that make sense?

  Thanks,

     Matt

Thanks,

Lucas

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefano.zampini at gmail.com  Fri May 29 13:05:41 2020
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 29 May 2020 21:05:41 +0300
Subject: [petsc-users] How to use MatSetValuesStencil() with DOF>1
In-Reply-To:
References:
Message-ID: <3AAC25D4-F742-45EF-9660-A9210EBD3365@gmail.com>

Here you have an example of a 2-dof system using MatSetValuesBlockedStencil:
https://github.com/stefanozampini/petscopt/blob/master/src/ts/examples/tests/ex3.c#L186

> On May 29, 2020, at 8:55 PM, Lucas Banting wrote:
>
> Yes, I think that makes sense to me. Each cell is at i,j, but then you can
> set the 25 coefficients per node using MatStencil_c from 0 to 4 for both
> idxm and idxn.
>
> Thanks for your help.
>
> Lucas
> From: Matthew Knepley
> Sent: Friday, May 29, 2020 12:47 PM
> To: Lucas Banting
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] How to use MatSetValuesStencil() with DOF>1
>
> On Fri, May 29, 2020 at 12:16 PM Lucas Banting wrote:
> Hello,
>
> I have a structured grid problem with 5 unknowns per cell: U,V,P,T,F. This
> is a CFD code, but I don't think these details are necessary to answer my
> question.
> I was wondering how exactly I should use MatSetValuesStencil() to fill my
> matrix.
> I have 5 degrees of freedom, so I need to set up to 25 coefficients per
> grid cell. I don't understand how i, j, and c map to all coefficients.
>
> For a 1 degree of freedom system, to fill my 9-point stencil I did:
>
> values(1)= ASW(II,JJ)
> values(2)= AS (II,JJ)
> values(3)= ASE(II,JJ)
> values(4)= AW (II,JJ)
> values(5)= AP (II,JJ)
> values(6)= AE (II,JJ)
> values(7)= ANW(II,JJ)
> values(8)= AN (II,JJ)
> values(9)= ANE(II,JJ)
>
> idxm(MatStencil_i,1) = II  ;idxm(MatStencil_j,1) = JJ
>
> idxn(MatStencil_i,1) = II-1;idxn(MatStencil_j,1) = JJ-1
> idxn(MatStencil_i,2) = II  ;idxn(MatStencil_j,2) = JJ-1
> idxn(MatStencil_i,3) = II+1;idxn(MatStencil_j,3) = JJ-1
> idxn(MatStencil_i,4) = II-1;idxn(MatStencil_j,4) = JJ
> idxn(MatStencil_i,5) = II  ;idxn(MatStencil_j,5) = JJ
> idxn(MatStencil_i,6) = II+1;idxn(MatStencil_j,6) = JJ
> idxn(MatStencil_i,7) = II-1;idxn(MatStencil_j,7) = JJ+1
> idxn(MatStencil_i,8) = II  ;idxn(MatStencil_j,8) = JJ+1
> idxn(MatStencil_i,9) = II+1;idxn(MatStencil_j,9) = JJ+1
>
> call MatSetValuesStencil(A,1,idxm,9,idxn,values,INSERT_VALUES,ierr)
>
> Which seemed to work just fine.
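A hedged sketch of the blocked route Stefano points to above (this is not his
ex3.c code): MatSetValuesBlockedStencil() takes one MatStencil per grid point,
addresses whole 5x5 blocks so the c field is not needed, and expects a full
block for every (row point, column point) pair. The sketch again assumes a
DMDA with dof = 5 and a matrix from DMCreateMatrix(); the blk array, the
function name, and the row-oriented ordering noted in the comment are
assumptions to check against the MatSetValuesBlocked() man page.

#include <petscdmda.h>

/* Sketch only: one blocked row (all 5 components at cell (i,j)) against its
   9 stencil neighbours. blk[p][ru][cu] is the 5x5 block coupling component
   ru at (i,j) with component cu at neighbour p. */
static PetscErrorCode InsertRowBlocks(Mat A, PetscInt i, PetscInt j,
                                      PetscScalar blk[9][5][5])
{
  MatStencil     row, col[9];
  PetscScalar    v[5*9*5];              /* logically 5 x 45, row-oriented */
  const PetscInt di[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
  const PetscInt dj[9] = {-1,-1,-1,  0, 0, 0,  1, 1, 1};
  PetscInt       p, ru, cu;
  PetscErrorCode ierr;

  row.k = 0; row.j = j; row.i = i; row.c = 0;         /* c unused here */
  for (p = 0; p < 9; p++) {
    col[p].k = 0; col[p].j = j + dj[p]; col[p].i = i + di[p]; col[p].c = 0;
  }
  /* Assumption: v is treated as a (1*5) x (9*5) row-oriented array, so the
     entry for component ru of the row and component cu of neighbour p sits
     at v[ru*45 + p*5 + cu]. */
  for (ru = 0; ru < 5; ru++)
    for (p = 0; p < 9; p++)
      for (cu = 0; cu < 5; cu++) v[ru*45 + p*5 + cu] = blk[p][ru][cu];
  ierr = MatSetValuesBlockedStencil(A, 1, &row, 9, col, v, INSERT_VALUES);CHKERRQ(ierr);
  return 0;
}

The appeal of the blocked call is that the 45 column stencils with explicit c
indices never have to be built by hand; PETSc expands each MatStencil into a
block of 5 rows or columns itself.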
> For 5 degrees of freedom, instead of just having a single AP coefficient
> for example, I have an AP matrix:
>
> ap_uu ap_uv ap_up ap_ut ap_uf
> ap_vu ap_vv ap_vp ap_vt ap_vf
> ap_pu ap_pv ap_pp ap_pt ap_pf
> ap_tu ap_tv ap_tp ap_tt ap_tf
> ap_fu ap_fv ap_fp ap_ft ap_ff
>
> With 1 degree of freedom, i and j corresponded to the solution variable in
> the grid. For multiple degrees of freedom, I don't understand how values of
> i, j, and c could distinguish between an ap_uv and an ap_vv coefficient,
> for example; wouldn't they both be at i, j, c=2?
> Is there any way I can use MatSetValuesStencil() to fill in my 9-point
> stencil with my 5x5 matrix coefficients? To clarify, I have nine 5x5
> matrices for each cell, which correspond to the 5 unknowns per cell.
>
> In MatSetValuesStencil(), I think it is best to think of the Stencils as
> another way of providing row/col numbers. So each MatStencil corresponds
> to some row number. So your ap_uv is some entry in the input matrix, and
> thus corresponds to some row number (a MatStencil) and some column number
> (another MatStencil). Since ap is one point, it appears that
>
> ap_uv --> row = i,j,0  col = i,j,1
> ap_vv --> row = i,j,1  col = i,j,1
>
> So you line up your MatStencil arguments to match the order of your input.
> It sounds like you want the field index to be the fastest in your input.
>
> Does that make sense?
>
>   Thanks,
>
>      Matt
>
> Thanks,
>
> Lucas
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mbuerkle at web.de  Sun May 31 21:53:17 2020
From: mbuerkle at web.de (Marius Buerkle)
Date: Mon, 1 Jun 2020 04:53:17 +0200
Subject: [petsc-users] mkl cpardiso iparm 31
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From junchao.zhang at gmail.com  Sun May 31 23:34:33 2020
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Sun, 31 May 2020 23:34:33 -0500
Subject: [petsc-users] mkl cpardiso iparm 31
In-Reply-To:
References:
Message-ID:

Thanks for the update. Let's hope Intel will add it.

--Junchao Zhang

On Sun, May 31, 2020 at 9:53 PM Marius Buerkle wrote:

> Sorry for the late reply, but it took some time to figure things out with
> Intel support. It turned out that the feature is at present only available
> in the non-MPI MKL PARDISO but not in the MKL cluster solver, so the MKL
> manual is actually wrong here. I filed a feature request with Intel
> premium support, but I have no clue how long this will take (if ever) to
> be added. In case it becomes available in MKL at some point, I will come
> back to you.
>
> Best and thanks,
> Marius
>
>
> *Subject:* Re: Re: Re: [petsc-users] mkl cpardiso iparm 31
> Marius,
> Thanks for the update. Once you get feedback from Intel, please let us
> know. If Intel supports it, we can add it. I am new to pardiso, but I
> think it is doable.
> --Junchao Zhang
>
> On Thu, May 7, 2020 at 11:56 PM Marius Buerkle wrote:
>
>> Hi Junchao,
>>
>> I contacted Intel support regarding this; they told me that this is a
>> typo in the manual and that iparm[30] is indeed used. However, while it
>> works for the non-MPI (MKL_PARDISO) version, it does not work, or I could
>> not get it working, for the cluster sparse solver (MKL_CPARDISO). I
>> reported this also to Intel support but have had no reply yet.
>> I was also wondering about how perm(N) is distributed, and I don't know
>> at the moment.
>>
>> Best,
>> Marius
>>
>>
>> Marius,
>> You are right. perm is not referenced. I searched and found this:
>> https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/sparse-solver-routines/parallel-direct-sparse-solver-for-clusters-interface/cluster-sparse-solver.html
>> It says "perm Ignored". But from other parts of the document, it seems
>> perm is used. I'm puzzled whether Intel MKL pardiso supports this feature
>> or not.
>>
>> I am thinking about adding MatMkl_CPardisoSetPerm(Mat A, IS perm) or
>> MatMkl_CPardisoSetPerm(Mat A, const PetscInt *perm). But I don't know
>> whether perm(N) is distributed or every MPI rank has the same perm(N).
>> Do you know of good Intel MKL pardiso documentation or examples for me to
>> reference?
>>
>> Thank you.
>> --Junchao Zhang
>>
>> On Thu, May 7, 2020 at 3:14 AM Marius Buerkle wrote:
>>
>>> Hi,
>>>
>>> Thanks for the info. But how do I set the values to be calculated?
>>> According to the Intel parallel sparse cluster solver manual, the
>>> entries have to be defined in the permutation vector (before each call).
>>> However, if I understand what is happening in mkl_cpardiso.c correctly,
>>> perm is set to 0 during the initialization phase and then not referenced
>>> anymore. Is this correct? How can I specify the necessary entries in
>>> perm?
>>>
>>> Best,
>>> Marius
>>>
>>> On Fri, May 1, 2020 at 3:33 AM Marius Buerkle wrote:
>>>
>>>> Hi,
>>>>
>>>> Is the option "-mat_mkl_cpardiso_31", to calculate a partial solve and
>>>> compute selected components of the solution vectors, actually supported
>>>> by PETSc?
>>>>
>>> From the code, it seems so.
>>>
>>>> Marius
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
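To tie the thread together, a hedged sketch of how the flag discussed above
is selected from PETSc (this is not code from the thread): the
-mat_mkl_cpardiso_<n> options set the corresponding 1-based iparm entries, so
-mat_mkl_cpardiso_31 sets iparm(31) (iparm[30] in C), as confirmed earlier.
Note that, per Marius's follow-up, MKL_CPARDISO currently ignores the perm
array, and MatMkl_CPardisoSetPerm() is only a proposed interface, so setting
the flag alone does not yet give a usable partial solve. The function name
below and the value 1 for the flag are illustrative assumptions.

#include <petscksp.h>

/* Sketch (assumes PETSc built with --with-mkl_cpardiso): select the MKL
   cluster solver for an LU factorization and request iparm(31) through the
   options database. The equivalent command line would be
   -pc_type lu -pc_factor_mat_solver_type mkl_cpardiso -mat_mkl_cpardiso_31 1 */
static PetscErrorCode UseCPardisoPartialSolveFlag(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = PetscOptionsSetValue(NULL, "-mat_mkl_cpardiso_31", "1");CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMKL_CPARDISO);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  return 0;
}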