+--------------------------------------------------------------------+
+  PaStiX : Parallel Sparse matriX package                           +
+--------------------------------------------------------------------+
  Matrix size                                   222264 x 222264
  Number of nonzeros in A                       1524222
+--------------------------------------------------------------------+
+  Options                                                           +
+--------------------------------------------------------------------+
        Version             :  exported
        SMP_SOPALIN         :  Defined
        VERSION MPI         :  Defined
        PASTIX_DYNSCHED     :  Not defined
        STATS_SOPALIN       :  Not defined
        NAPA_SOPALIN        :  Defined
        TEST_IRECV          :  Not defined
        TEST_ISEND          :  Defined
        THREAD_COMM         :  Not defined
        THREAD_FUNNELED     :  Not defined
        TAG                 :  Exact Thread
        FORCE_CONSO         :  Not defined
        RECV_FANIN_OR_BLOCK :  Not defined
        OUT_OF_CORE         :  Not defined
        DISTRIBUTED         :  Defined
        METIS               :  Not defined
        WITH_SCOTCH         :  Defined
        INTEGER TYPE        :  int32_t
        FLOAT TYPE          :  double complex
+--------------------------------------------------------------------+
  Check : Numbering                             OK
  Check : Sort CSC                              OK
  Check : Duplicates                            OK
  Ordering :
        Time to compute ordering                8.39 s
  Symbolic Factorization :
  Analyse :
        Number of cluster                       4
        Number of processor per cluster         1
        Number of thread number per MPI process 1
   Building elimination graph
   Building cost matrix
        Total cost of the elimination tree      157.299
        Total cost of the elimination tree      157.299
   Building elimination tree
        Total cost of the elimination tree      157.299
   Spliting initial partition
   Using proportionnal mapping
        Total cost of the elimination tree      157.299
        Total cost of the elimination tree      126.83
        Total cost of the elimination tree      126.83
        Total cost of the elimination tree      126.83
   ** New Partition: cblknbr= 6990 bloknbr= 195747 ratio=28.003862 **
   Factorization of the new symbol matrix by Crout blok algo takes : 3.46581e+11
   Re-Building elimination graph
        Total cost of the elimination tree      126.83
   Building task graph
        Number of tasks                         6990
   Distributing partition
  2 : Genering final SolverMatrix
        NUMBER of THREAD                        1
        NUMBER of BUBBLE                        1
  3 : Genering final SolverMatrix
        NUMBER of THREAD                        1
        NUMBER of BUBBLE                        1
  0 : Genering final SolverMatrix
        NUMBER of THREAD                        1
        NUMBER of BUBBLE                        1
  1 : Genering final SolverMatrix
        NUMBER of THREAD                        1
        NUMBER of BUBBLE                        1
        COEFMAX 884852 CPFTMAX 24649 BPFTMAX 0 NBFTMAX 98 ARFTMAX 865488
  2 : SolverMatrix size (without coefficients)  3 Mo
  2 : Number of nonzeros (local block structure) 25950751
        COEFMAX 847540 CPFTMAX 24649 BPFTMAX 0 NBFTMAX 98 ARFTMAX 865488
        COEFMAX 907774 CPFTMAX 24649 BPFTMAX 0 NBFTMAX 98 ARFTMAX 865488
  3 : SolverMatrix size (without coefficients)  5.84 Mo
  3 : Number of nonzeros (local block structure) 40669292
   ** End of Partition & Distribution phase **
        COEFMAX 930696 CPFTMAX 24649 BPFTMAX 0 NBFTMAX 98 ARFTMAX 865488
        Time to analyze                         0.255 s
        Number of nonzeros in factorized matrice 137325351
        Fill-in                                 90.0954
        Number of operations (LLt)              3.46581e+11
        Prediction Time to factorize (IBM PWR5 ESSL) 33.2 s
  0 : SolverMatrix size (without coefficients)  5.44 Mo
  0 : Number of nonzeros (local block structure) 39001859
  Numerical Factorization :
  1 : SolverMatrix size (without coefficients)  5.91 Mo
  1 : Number of nonzeros (local block structure) 38581540
        Time to fill internal csc               0.145 s
 --- Sopalin : Allocation de la structure globale ---
 --- Fin Sopalin Init ---
 --- Initialisation des tableaux globaux ---
 --- Sopalin : Local structure allocation ---
 --- Sopalin : Threads are binded ---
 --- Sopalin Begin ---
 --- Sopalin End ---
        Static pivoting                         0
        Time to factorize                       105 s
  Solve :
 --- Sopalin : Allocation de la structure globale ---
 --- Fin Sopalin Init ---
 --- Sopalin : Local structure allocation ---
 --- Sopalin : Threads are binded ---
 --- Down Step ---
 --- Diag Step ---
 --- Up Step ---
        Time to solve                           0.561 s
  Reffinement :
 --- Sopalin : Allocation de la structure globale ---
 --- Fin Sopalin Init ---
        Refinement                              0 iterations, norm=1
        Time for refinement                     0.036 s
Linear solve converged due to CONVERGED_ITS iterations 1
KSP Object: 4 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using NONE norm type for convergence test
PC Object: 4 MPI processes
  type: lu
    LU: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: natural
    factor fill ratio given 0, needed 0
      Factored matrix follows:
        Matrix Object:   4 MPI processes
          type: mpiaij
          rows=222264, cols=222264
          package used to perform factorization: pastix
          Error : 1
          Error : 1
          Error : 1
          total: nonzeros=0, allocated nonzeros=0
          total number of mallocs used during MatSetValues calls =0
            PaStiX run parameters:
              Matrix type :                      Symmetric
              Level of printing (0,1,2):         2
              Number of refinements iterations : 0
              Error :                            1
  linear system matrix = precond matrix:
  Matrix Object:   4 MPI processes
    type: mpiaij
    rows=222264, cols=222264
    total: nonzeros=2826180, allocated nonzeros=2826180
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
Total wall clock time needed 116.346 seconds
Relative residual norm -nan. Iterations 1
Norm of error 1.000000e+00
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

home/test on a openmpi-i named node212 with 4 processors, by agrayver Wed Dec 28 14:49:00 2011
Using Petsc Development HG revision: 199bab0ea052fc92ce8e4abb56afc442629a19c8  HG Date: Tue Dec 13 22:22:13 2011 -0800

                         Max       Max/Min        Avg      Total
Time (sec):           1.170e+02      1.00022   1.170e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                1.111e+06      1.00000   1.111e+06  4.445e+06
Flops/sec:            9.501e+03      1.00022   9.500e+03  3.800e+04
MPI Messages:         2.400e+01      1.33333   1.950e+01  7.800e+01
MPI Message Lengths:  2.927e+07      2.39404   8.478e+05  6.613e+07
MPI Reductions:       5.700e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.1698e+02 100.0%  4.4453e+06 100.0%  7.800e+01 100.0%  8.478e+05      100.0%  5.600e+01  98.2%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase
      %f - percent flops in this phase
      %M - percent messages in this phase
      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatSolve               1 1.0 6.5932e-01 1.0 0.00e+00 0.0 1.2e+01 8.9e+05 1.0e+00  1  0 15 16  2   1  0 15 16  2     0
MatLUFactorSym         1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.1568e+02 1.0 0.00e+00 0.0 2.4e+01 1.1e+05 1.7e+01 99  0 31  4 30  99  0 31  4 30     0
MatAssemblyBegin       2 1.0 9.8679e-02822.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  4   0  0  0  0  4     0
MatAssemblyEnd         2 1.0 8.6417e-02 1.0 0.00e+00 0.0 2.4e+01 7.6e+04 9.0e+00  0  0 31  3 16   0  0 31  3 16     0
MatGetSubMatrice       1 1.0 1.2759e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0 14   0  0  0  0 14     0
MatLoad                1 1.0 4.4161e-01 1.0 0.00e+00 0.0 3.3e+01 1.4e+06 1.8e+01  0  0 42 68 32   0  0 42 68 32     0
MatView                2 1.0 2.1272e-0312.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  4   0  0  0  0  4     0
VecView                1 1.0 1.8570e-02 2.1 0.00e+00 0.0 3.0e+00 8.9e+05 0.0e+00  0  0  4  4  0   0  0  4  4  0     0
VecNorm                2 1.0 2.2527e-0213.0 4.45e+05 1.0 0.0e+00 0.0e+00 2.0e+00  0 40  0  0  4   0 40  0  0  4    79
VecSet                 1 1.0 4.4298e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 9.4295e-04 3.1 4.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 40  0  0  0   0 40  0  0  0  1886
VecAssemblyBegin       2 1.0 7.7121e-0363.8 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0 11   0  0  0  0 11     0
VecAssemblyEnd         2 1.0 5.0068e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecLoad                2 1.0 5.5238e-02 1.0 0.00e+00 0.0 6.0e+00 8.9e+05 8.0e+00  0  0  8  8 14   0  0  8  8 14     0
VecScatterBegin        2 1.0 1.1051e-02 1.3 0.00e+00 0.0 1.2e+01 8.9e+05 1.0e+00  0  0 15 16  2   0  0 15 16  2     0
VecScatterEnd          1 1.0 7.9820e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               1 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.1635e+02 1.0 0.00e+00 0.0 3.6e+01 3.7e+05 2.6e+01 99  0 46 20 46  99  0 46 20 46     0
PCSetUp                1 1.0 1.1569e+02 1.0 0.00e+00 0.0 2.4e+01 1.1e+05 2.5e+01 99  0 31  4 44  99  0 31  4 45     0
PCApply                1 1.0 6.5932e-01 1.0 0.00e+00 0.0 1.2e+01 8.9e+05 1.0e+00  1  0 15 16  2   1  0 15 16  2     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Viewer     5              4         2912     0
              Matrix     7              7     77903688     0
              Vector     7              7      8958080     0
      Vector Scatter     3              3         2748     0
           Index Set     6              6         4472     0
       Krylov Solver     1              1         1080     0
      Preconditioner     1              1          944     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.00407e-06
Average time for zero size MPI_Send(): 3.99351e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-ksp_type preonly
-ksp_view
-log_summary
-mat_pastix_verbose 2
-mat_type mpiaij
-pc_factor_mat_solver_package pastix
-pc_type lu
-rhs RHS_1_1.dat
-sm A1.dat
-sol Solution_1_1.dat
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure run at: Wed Dec 14 10:54:32 2011
Configure options: --with-petsc-arch=openmpi-intel-complex-release-f-ds --with-fortran-interfaces=1 --download-superlu --download-superlu_dist --download-mumps --download-pastix --download-parmetis --download-metis --download-ptscotch --with-scalapack-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include --with-mpi-dir=/opt/mpi/intel/openmpi-1.4.2 --with-scalar-type=complex --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t --with-precision=double --with-debugging=0 --with-fortran-kernels=1 --with-x=0
-----------------------------------------
Libraries compiled on Wed Dec 14 10:54:32 2011 on glic1
Machine characteristics: Linux-2.6.32.12-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: home/lib/petsc-dev
Using PETSc arch: openmpi-intel-complex-release-f-ds
-----------------------------------------
Using C compiler: /opt/mpi/intel/openmpi-1.4.2/bin/mpicc -wd1572 -Qoption,cpp,--extended_float_type -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/mpi/intel/openmpi-1.4.2/bin/mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/lib/petsc-dev/openmpi-intel-complex-release-f-ds/include -I/home/lib/petsc-dev/include -Ihome/lib/petsc-dev/include -I/home/lib/petsc-dev/openmpi-intel-complex-release-f-ds/include -I/opt/mpi/intel/openmpi-1.4.2/include
-----------------------------------------
Using C linker: /opt/mpi/intel/openmpi-1.4.2/bin/mpicc
Using Fortran linker: /opt/mpi/intel/openmpi-1.4.2/bin/mpif90
Using libraries: -Wl,-rpath,home/lib/petsc-dev/openmpi-intel-complex-release-f-ds/lib -Lhome/lib/petsc-dev/openmpi-intel-complex-release-f-ds/lib -lpetsc -Wl,-rpath,home/lib/petsc-dev/openmpi-intel-complex-release-f-ds/lib -Lhome/lib/petsc-dev/openmpi-intel-complex-release-f-ds/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lsuperlu_dist_3.0 -lparmetis -lmetis -lpastix -lptesmumps -lptscotch -lptscotcherr -lsuperlu_4.3 -lpthread -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl -L/opt/mpi/intel/openmpi-1.4.2/lib -lmpi -lopen-rte -lopen-pal -lnsl -lutil -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib -L/opt/intel/Compiler/11.1/072/mkl/interfaces/fftw3xf -L//opt/intel/Compiler/11.1/072/lib/intel64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/x86_64-suse-linux/lib -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lifport -lifcore -lm -lm -lrt -lm -lrt -lm -lz -lz -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
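
For reference, below is a minimal, hypothetical C sketch of the solver configuration that the option table and the -ksp_view output above describe: a KSPPREONLY solve with an LU preconditioner whose factorization is delegated to PaStiX (-ksp_type preonly, -pc_type lu, -pc_factor_mat_solver_package pastix, -mat_type mpiaij). It is not the code used in this run, and it is written against the 2011-era petsc-dev API that produced this log; in later PETSc releases KSPSetOperators() takes three arguments and PCFactorSetMatSolverPackage() became PCFactorSetMatSolverType(). The tiny diagonal matrix is only a stand-in for the 222264 x 222264 system loaded with -sm/-rhs above, and error checking (CHKERRQ) is omitted for brevity.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      b, x;
  KSP      ksp;
  PC       pc;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, (char *)0, (char *)0);

  /* Stand-in system: a distributed diagonal MPIAIJ matrix (works in a complex build). */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetType(A, MATMPIAIJ);                                /* -mat_type mpiaij */
  MatMPIAIJSetPreallocation(A, 1, PETSC_NULL, 0, PETSC_NULL);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, (PetscScalar)(2.0 + i), INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  VecCreate(PETSC_COMM_WORLD, &b);
  VecSetSizes(b, PETSC_DECIDE, n);
  VecSetFromOptions(b);
  VecDuplicate(b, &x);
  VecSet(b, 1.0);

  /* Direct solve: no Krylov iterations, LU factorization delegated to PaStiX. */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);
  KSPSetType(ksp, KSPPREONLY);                             /* -ksp_type preonly */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);                                     /* -pc_type lu */
  PCFactorSetMatSolverPackage(pc, MATSOLVERPASTIX);        /* -pc_factor_mat_solver_package pastix */
  KSPSetFromOptions(ksp);   /* picks up -ksp_monitor, -ksp_view, -mat_pastix_verbose 2, ... */
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&b);
  VecDestroy(&x);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

Launched on 4 processes with the same runtime options as in the option table above (executable name here is made up), for example
  mpiexec -n 4 ./pastix_sketch -mat_pastix_verbose 2 -ksp_converged_reason -ksp_view -log_summary
it should produce the same kind of KSP/PC view and PaStiX verbose output shown in this log.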