************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./OPFLOW on a opt-mode-newell-pdipm-cxx named newell01.pnl.gov with 1 processor, by abhy245 Tue Mar 10 12:04:15 2020
Using 128 OpenMP threads
Using Petsc Development GIT revision: v3.11.3-1447-g46c30d0  GIT Date: 2020-03-05 13:48:32 -0600

                         Max       Max/Min     Avg       Total
Time (sec):           1.874e+02     1.000   1.874e+02
Objects:              1.370e+02     1.000   1.370e+02
Flop:                 2.233e+10     1.000   2.233e+10  2.233e+10
Flop/sec:             1.192e+08     1.000   1.192e+08  1.192e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:            ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                                 Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:           Main Stage: 1.4577e-02   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
 1:         Reading Data: 2.3948e-01   0.1%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
 2: Allocation & Setting: 1.8712e+02  99.9%  2.2327e+10 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
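As a quick reading aid (not part of the log itself): the Flop/sec line above is just the total flop count divided by the max wall time, so the reported 1.192e+08 can be reproduced from the other two summary lines. A minimal check, with the values transcribed from this log:

```python
# Sanity-check the reported "Flop/sec" against "Flop" and "Time (sec)"
# from the summary above; values are transcribed from this log.
total_flop = 2.233e10   # "Flop:" total
max_time   = 1.874e2    # "Time (sec):" max

flop_rate = total_flop / max_time

# The log reports 1.192e+08; allow slack for the 4-significant-digit rounding.
assert abs(flop_rate - 1.192e8) / 1.192e8 < 1e-2
```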
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                               --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct   %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage


--- Event Stage 1: Reading Data


--- Event Stage 2: Allocation & Setting

BuildTwoSided          1 1.0 2.0260e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 2.2490e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify         1 1.0 7.3588e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize       1 1.0 9.3640e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 2.1868e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             1 1.0 1.2180e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp                1 1.0 5.4870e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin        3421 1.0 9.6037e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastEnd          3421 1.0 9.3357e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin          5 1.0 1.3241e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceEnd            5 1.0 4.1562e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult              859 1.0 2.0374e+00 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  8  0  0  0   1  8  0  0  0   843       0      0 0.00e+00  860 1.35e+03  0
MatMultTrAdd        1608 1.0 2.5075e-01 1.0 2.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   930       0      0 0.00e+00    0 0.00e+00  0
MatSolve            1059 1.0 3.9690e+00 1.0 3.52e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2 16  0  0  0   2 16  0  0  0   887       0      0 0.00e+00    0 0.00e+00  0
MatLUFactorSym         1 1.0 4.3658e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatLUFactorNum       200 1.0 3.0984e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 0.0e+00 17 51  0  0  0  17 51  0  0  0   369       0      0 0.00e+00    0 0.00e+00  0
MatScale             402 1.0 1.2604e-02 1.0 4.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3252       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin    1207 1.0 2.4332e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd      1207 1.0 1.5498e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatGetRowIJ            1 1.0 7.7044e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatGetOrdering         1 1.0 1.3421e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries      1004 1.0 4.8473e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecDot               200 1.0 8.6307e-02 1.0 3.03e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   351     482    400 2.42e+02    0 0.00e+00 100
VecMDot              859 1.0 2.2225e-01 1.0 1.10e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  4928   29111    859 1.35e+03    0 0.00e+00 100
VecNorm             2668 1.0 6.6124e-01 1.0 5.74e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   869    1027   1407 8.71e+02    0 0.00e+00  97
VecScale            1461 1.0 2.1179e-01 1.0 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1028   28136      0 0.00e+00    0 0.00e+00  96
VecCopy              600 1.0 5.6811e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    2 4.04e-01  0
VecSet              2485 1.0 4.5777e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              602 1.0 6.8484e-02 1.0 9.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1428   70454      0 0.00e+00    0 0.00e+00  80
VecMAXPY            1059 1.0 4.4488e-01 1.0 2.87e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 13  0  0  0   0 13  0  0  0  6441  127696      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      602 1.0 2.9505e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00  403 6.45e+01  0
VecReduceArith       600 1.0 2.5009e-01 1.0 2.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   943    1188    400 6.29e+02    0 0.00e+00 100
VecReduceComm        200 1.0 1.7001e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecNormalize        1059 1.0 4.7541e-01 1.0 6.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  1313    2743    200 3.14e+02    0 0.00e+00 100
VecCUDACopyTo       3066 1.0 2.6383e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0   3066 3.09e+03    0 0.00e+00  0
VecCUDACopyFrom     2687 1.0 3.8111e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00 2687 2.53e+03  0
TaoSolve               1 1.0 1.8705e+02 1.0 2.23e+10 1.0 0.0e+00 0.0e+00 0.0e+00 100 100 0 0  0 100 100 0 0  0   119    5623   3066 3.09e+03 2685 2.53e+03 23
TaoObjGradEval       402 1.0 6.2670e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
TaoHessianEval       200 1.0 4.6434e+00 1.0 1.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0    28       0      0 0.00e+00    1 1.60e-01  0
TaoConstrEval        804 1.0 1.2309e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00 1206 2.27e+02  0
TaoJacobianEval      804 1.0 4.9159e+00 1.0 1.81e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  1  0  0  0   3  1  0  0  0    37       0      0 0.00e+00    0 0.00e+00  0
SNESSolve              1 1.0 1.8573e+02 1.0 2.23e+10 1.0 0.0e+00 0.0e+00 0.0e+00 99 100  0  0  0  99 100  0  0  0   120    5625   3063 3.09e+03 2667 2.52e+03 23
SNESSetUp              1 1.0 3.6838e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval     201 1.0 3.5386e+00 1.0 3.01e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0    85     400    603 2.77e+02  804 4.29e+02 23
SNESJacobianEval     200 1.0 1.2696e+02 1.0 1.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 68  1  0  0  0  68  1  0  0  0     1       0      0 0.00e+00    3 5.64e-01  0
SNESLineSearch       200 1.0 6.8961e+00 1.0 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  4  0  0  0   4  4  0  0  0   125     676   2000 1.42e+03 1800 1.17e+03 47
KSPSetUp             200 1.0 4.1967e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve             200 1.0 5.1848e+01 1.0 2.13e+10 1.0 0.0e+00 0.0e+00 0.0e+00 28 96  0  0  0  28 96  0  0  0   411   15375   1059 1.66e+03  860 1.35e+03 22
KSPGMRESOrthog       859 1.0 5.1966e-01 1.0 3.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 15  0  0  0   0 15  0  0  0  6323   59430    859 1.35e+03    0 0.00e+00 100
PCSetUp              200 1.0 4.4452e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 0.0e+00 24 51  0  0  0  24 51  0  0  0   257       0      0 0.00e+00    0 0.00e+00  0
PCApply             1059 1.0 3.9703e+00 1.0 3.52e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2 16  0  0  0   2 16  0  0  0   886       0      0 0.00e+00    0 0.00e+00  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     0              1          584     0.
    Distributed Mesh     0              5        25640     0.
            DM Label     0              1          640     0.
    GraphPartitioner     0              1          672     0.
           Index Set     0             10      2620728     0.
   IS L to G Mapping     0              1        96176     0.
             Section     0              8         5824     0.
   Star Forest Graph     0              9         7624     0.
     Discrete System     0              5         4760     0.
              Matrix     0              8     45196956     0.
              Vector     0             62    273222032     0.
         Vec Scatter     0              1          776     0.
                 Tao     0              1         2296     0.
                SNES     0              1         1412     0.
              DMSNES     0              1          680     0.
      SNESLineSearch     0              1         1024     0.
       Krylov Solver     0              1        18648     0.
     DMKSP interface     0              1          664     0.
      Preconditioner     0              1         1008     0.
              Viewer     1              1          848     0.

--- Event Stage 1: Reading Data


--- Event Stage 2: Allocation & Setting

           Container     1              0            0     0.
    Distributed Mesh     5              0            0     0.
            DM Label     1              0            0     0.
    GraphPartitioner     1              0            0     0.
           Index Set    14              4         3200     0.
   IS L to G Mapping     1              0            0     0.
             Section    16              8         5824     0.
   Star Forest Graph    12              3         2520     0.
     Discrete System     6              1          952     0.
              Matrix     8              0            0     0.
              Vector    62              0            0     0.
         Vec Scatter     1              0            0     0.
                 Tao     1              0            0     0.
                SNES     1              0            0     0.
              DMSNES     1              0            0     0.
      SNESLineSearch     1              0            0     0.
       Krylov Solver     1              0            0     0.
     DMKSP interface     1              0            0     0.
      Preconditioner     1              0            0     0.
              Viewer     1              0            0     0.
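As a reading aid (not part of the log): the "Total Mflop/s" column follows the formula given in the phase summary legend, 10e-6 * (sum of flop) / (max time), i.e. a factor of 1e-6. A quick check against two rows of the event table, with values transcribed from this log (agreement is only approximate because the printed flop and time values are rounded):

```python
# Reproduce the "Total Mflop/s" column from the Flop and Time columns,
# per the formula in the phase summary legend. Row values transcribed
# from the event table above.
def mflops(flop, time_sec):
    return 1e-6 * flop / time_sec

# MatMult: 1.72e+09 flop in 2.0374e+00 s, reported as 843 Mflop/s.
assert abs(mflops(1.72e9, 2.0374) - 843) < 5

# MatLUFactorNum: 1.14e+10 flop in 3.0984e+01 s, reported as 369 Mflop/s.
assert abs(mflops(1.14e10, 3.0984e1) - 369) < 5
```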
========================================================================================================================
Average time to get PetscTime(): 3.59956e-08
#PETSc Option Table entries:
-log_view
-netfile datafiles/case_ACTIVSg10k.m
-opflow_ksp_error_if_not_converged
-opflow_pc_factor_mat_ordering_type qmd
-opflow_pc_factor_mat_solver_type petsc
-opflow_pc_factor_shift_type NONZERO
-opflow_pc_type lu
-opflow_snes_max_funcs 100000
-opflow_snes_max_it 200
-opflow_solver TAO
-opflow_tao_catol 1e-3
-opflow_tao_converged_reason
-opflow_tao_crtol 1e-6
-opflow_tao_gatol 1e-3
-opflow_tao_grtol 1e-6
-opflow_tao_max_funcs 100000
-opflow_tao_max_it 200
-opflow_tao_monitor
-opflow_tao_type pdipm
-options_left no
-petscpartitioner_type parmetis
-vec_type seqcuda
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-fblaslapack --download-make --download-metis --download-mumps --download-parmetis --download-scalapack --download-superlu_dist --download-superlu_dist-gpu=1 --with-clanguage=cxx --with-cc=mpicc --with-cuda-dir=/share/apps/cuda/10.2 --with-cuda=1 --with-cxx=mpicxx --with-fc=mpif77 --with-openmp=1 --with-debugging=0 COPTFLAGS="-fPIC -O3" CXXOPTFLAGS="-fPIC -O3" FOPTFLAGS="-fPIC -O3" CUDAOPTFLAGS=-O3 PETSC_ARCH=opt-mode-newell-pdipm-cxx
-----------------------------------------
Libraries compiled on 2020-03-10 17:48:31 on newell01.pnl.gov
Machine characteristics: Linux-4.14.0-115.7.1.el7a.ppc64le-ppc64le-with-redhat-7.6-Maipo
Using PETSc directory: /people/abhy245/software/petsc
Using PETSc arch: opt-mode-newell-pdipm-cxx
-----------------------------------------
Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fPIC -O3 -fopenmp
Using Fortran compiler: mpif77 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fPIC -O3 -fopenmp
-----------------------------------------
Using include paths: -I/people/abhy245/software/petsc/include -I/people/abhy245/software/petsc/opt-mode-newell-pdipm-cxx/include -I/share/apps/cuda/10.2/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif77
Using libraries: -Wl,-rpath,/people/abhy245/software/petsc/opt-mode-newell-pdipm-cxx/lib -L/people/abhy245/software/petsc/opt-mode-newell-pdipm-cxx/lib -lpetsc -Wl,-rpath,/people/abhy245/software/petsc/opt-mode-newell-pdipm-cxx/lib -L/people/abhy245/software/petsc/opt-mode-newell-pdipm-cxx/lib -Wl,-rpath,/share/apps/cuda/10.2/lib64 -L/share/apps/cuda/10.2/lib64 -Wl,-rpath,/share/apps/openmpi/3.1.3/gcc/7.4.0/lib -L/share/apps/openmpi/3.1.3/gcc/7.4.0/lib -Wl,-rpath,/qfs/projects/ops/rh7p9/gcc/7.4.0/lib/gcc/powerpc64le-unknown-linux-gnu/7.4.0 -L/qfs/projects/ops/rh7p9/gcc/7.4.0/lib/gcc/powerpc64le-unknown-linux-gnu/7.4.0 -Wl,-rpath,/qfs/projects/ops/rh7p9/gcc/7.4.0/lib/gcc -L/qfs/projects/ops/rh7p9/gcc/7.4.0/lib/gcc -Wl,-rpath,/qfs/projects/ops/rh7p9/gcc/7.4.0/lib64 -L/qfs/projects/ops/rh7p9/gcc/7.4.0/lib64 -Wl,-rpath,/qfs/projects/ops/rh7p9/gcc/7.4.0/lib -L/qfs/projects/ops/rh7p9/gcc/7.4.0/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lcufft -lcublas -lcudart -lcusparse -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lpthread -lstdc++ -ldl
-----------------------------------------
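As a closing reading aid (not part of the log): the event table shows that this run is dominated by Jacobian evaluation, not by the linear solves. A small sketch putting the largest reported event times (transcribed from the table) in proportion to the 1.874e+02 s total:

```python
# Fraction of total wall time spent in the dominant events, using the
# times reported in the event table above (seconds).
total_time = 1.874e2

event_times = {
    "SNESJacobianEval": 1.2696e2,  # reported as 68% of total time
    "KSPSolve":         5.1848e1,  # reported as 28% of total time
    "MatGetOrdering":   1.3421e1,  # qmd ordering, reported as 7%
}

fractions = {name: t / total_time for name, t in event_times.items()}

# Jacobian evaluation alone accounts for roughly two-thirds of the run.
assert fractions["SNESJacobianEval"] > 0.6
assert abs(fractions["SNESJacobianEval"] - 0.68) < 0.01
```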