  0 SNES Function norm 1.027913970935e+00
    0 KSP Residual norm 5.361584434805e+02
    1 KSP Residual norm 1.344248511840e-01
    2 KSP Residual norm 1.353058567917e-03
    3 KSP Residual norm 1.349082481980e-05
    4 KSP Residual norm 2.133023320121e-07
  1 SNES Function norm 1.617487035933e-05
    0 KSP Residual norm 4.177693110178e+01
    1 KSP Residual norm 1.141800000529e-05
    2 KSP Residual norm 2.890349576551e-09
  2 SNES Function norm 1.241941226490e-07
    0 KSP Residual norm 3.422770299684e-01
    1 KSP Residual norm 6.496253397408e-08
    2 KSP Residual norm 2.461559357679e-11
  3 SNES Function norm 1.078577744090e-11
SNES Object: 16 MPI processes
  type: newtonls
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=8
  total number of function evaluations=4
  norm schedule ALWAYS
  SNESLineSearch Object: 16 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 16 MPI processes
    type: gmres
      GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-09, absolute=1e-50, divergence=10000
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: 16 MPI processes
    type: mg
      MG: type is MULTIPLICATIVE, levels=3 cycles=v
        Cycles per PCApply=1
        Not using Galerkin computed coarse grid matrices
    Coarse grid solver -- level -------------------------------
      KSP Object: (mg_coarse_) 16 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_) 16 MPI processes
        type: redundant
          Redundant preconditioner: First (color=0) of 16 PCs follows
        KSP Object: (mg_coarse_redundant_) 1 MPI processes
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_redundant_) 1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5, needed 13.7197
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=1640961, cols=1640961
                  package used to perform factorization: petsc
                  total: nonzeros=1.12497e+08, allocated nonzeros=1.12497e+08
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=1640961, cols=1640961
            total: nonzeros=8.19968e+06, allocated nonzeros=8.19968e+06
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 16 MPI processes
          type: mpiaij
          rows=1640961, cols=1640961
          total: nonzeros=8.19968e+06, allocated nonzeros=8.19968e+06
          total number of mallocs used during MatSetValues calls =0
    Down solver (pre-smoother) on level 1 -------------------------------
      KSP Object: (mg_levels_1_) 16 MPI processes
        type: richardson
          Richardson: damping factor=1
        maximum iterations=2
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using nonzero initial guess
        using NONE norm type for convergence test
      PC Object: (mg_levels_1_) 16 MPI processes
        type: sor
          SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
        linear system matrix = precond matrix:
        Mat Object: 16 MPI processes
          type: mpiaij
          rows=6558721, cols=6558721
          total: nonzeros=3.27834e+07, allocated nonzeros=3.27834e+07
          total number of mallocs used during MatSetValues calls =0
    Up solver (post-smoother) same as down solver (pre-smoother)
    Down solver (pre-smoother) on level 2 -------------------------------
      KSP Object: (mg_levels_2_) 16 MPI processes
        type: richardson
          Richardson: damping factor=1
        maximum iterations=2
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using nonzero initial guess
        using NONE norm type for convergence test
      PC Object: (mg_levels_2_) 16 MPI processes
        type: sor
          SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
        linear system matrix = precond matrix:
        Mat Object: 16 MPI processes
          type: mpiaij
          rows=26224641, cols=26224641
          total: nonzeros=1.31103e+08, allocated nonzeros=1.31103e+08
          total number of mallocs used during MatSetValues calls =0
    Up solver (post-smoother) same as down solver (pre-smoother)
    linear system matrix = precond matrix:
    Mat Object: 16 MPI processes
      type: mpiaij
      rows=26224641, cols=26224641
      total: nonzeros=1.31103e+08, allocated nonzeros=1.31103e+08
      total number of mallocs used during MatSetValues calls =0
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex5 on a arch-linux2-c-opt named helios91 with 16 processors, by tnicolas Thu Oct 15 14:07:36 2015
Using Petsc Release Version 3.6.0, Jun, 09, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           1.822e+02      1.00022   1.822e+02
Objects:              2.490e+02      1.00000   2.490e+02
Flops:                1.447e+11      1.00003   1.447e+11  2.316e+12
Flops/sec:            7.947e+08      1.00022   7.946e+08  1.271e+10
MPI Messages:         8.910e+02      1.71676   7.085e+02  1.134e+04
MPI Message Lengths:  1.676e+08      1.01222   2.353e+05  2.667e+09
MPI Reductions:       4.050e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.8216e+02 100.0%  2.3159e+12 100.0%  1.134e+04 100.0%  2.353e+05      100.0%  4.040e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event            Count      Time (sec)       Flops                                 ----- Global ------  ------ Stage ------  Total
                 Max Ratio  Max       Ratio  Max    Ratio  Mess    Avg len Reduct   %T  %F  %M  %L  %R   %T  %F  %M  %L  %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

SNESSolve           1 1.0 1.8043e+02  1.0 1.45e+11 1.0 1.1e+04 2.4e+05 3.6e+02   99 100  97 100  88   99 100  97 100  89  12835
SNESFunctionEval    4 1.0 1.9804e-01  1.1 7.22e+07 1.0 1.9e+02 1.0e+04 0.0e+00    0   0   2   0   0    0   0   2   0   0   5826
SNESJacobianEval    9 1.0 1.7695e+00  1.0 0.00e+00 0.0 4.3e+02 6.0e+03 1.8e+01    1   0   4   0   4    1   0   4   0   4      0
SNESLineSearch      3 1.0 4.2320e-01  1.2 1.53e+08 1.0 2.9e+02 1.0e+04 1.2e+01    0   0   3   0   3    0   0   3   0   3   5763
VecDot              3 1.0 1.7738e-02  1.3 9.85e+06 1.0 0.0e+00 0.0e+00 3.0e+00    0   0   0   0   1    0   0   0   0   1   8871
VecMDot             8 1.0 1.6123e-01  2.4 5.25e+07 1.0 0.0e+00 0.0e+00 8.0e+00    0   0   0   0   2    0   0   0   0   2   5205
VecNorm            18 1.0 9.3466e-02  1.6 5.91e+07 1.0 0.0e+00 0.0e+00 1.8e+01    0   0   0   0   4    0   0   0   0   4  10101
VecScale           77 1.0 6.7001e-02  1.3 1.83e+07 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   4351
VecCopy             9 1.0 5.7824e-02  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
VecSet            133 1.0 1.8206e-01  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
VecAXPY             3 1.0 2.4214e-02  1.0 9.85e+06 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   6498
VecAYPX            22 1.0 1.0268e-01  1.6 2.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   3512
VecWAXPY            3 1.0 3.2856e-02  1.0 4.92e+06 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   2395
VecMAXPY           11 1.0 1.2761e-01  1.0 7.88e+07 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   9864
VecPointwiseMult    6 1.0 5.6350e-03  1.6 1.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0   4365
VecScatterBegin   188 1.0 1.9245e-01  1.1 0.00e+00 0.0 9.7e+03 2.3e+05 0.0e+00    0   0  86  83   0    0   0  86  83   0      0
VecScatterEnd     188 1.0 7.3301e-01  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
VecReduceArith      6 1.0 1.5279e-02  1.3 1.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0  20597
VecReduceComm       3 1.0 6.8052e-02 89.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00    0   0   0   0   1    0   0   0   0   1      0
VecNormalize       11 1.0 1.1098e-01  1.3 5.42e+07 1.0 0.0e+00 0.0e+00 1.1e+01    0   0   0   0   3    0   0   0   0   3   7798
MatMult            33 1.0 7.8697e-01  1.1 3.65e+08 1.0 1.6e+03 8.5e+03 0.0e+00    0   0  14   1   0    0   0  14   1   0   7422
MatMultAdd         22 1.0 3.4402e-01  1.5 1.02e+08 1.0 7.3e+02 2.8e+03 0.0e+00    0   0   6   0   0    0   0   6   0   0   4716
MatMultTranspose   30 1.0 3.5397e-01  1.2 1.38e+08 1.0 9.9e+02 2.8e+03 0.0e+00    0   0   9   0   0    0   0   9   0   0   6251
MatSolve           11 1.0 3.3544e+00  1.0 2.46e+09 1.0 0.0e+00 0.0e+00 0.0e+00    2   2   0   0   0    2   2   0   0   0  11719
MatSOR             44 1.0 8.8550e+00  1.6 1.25e+09 1.0 3.2e+03 7.7e+03 8.8e+01    4   1  28   1  22    4   1  28   1  22   2250
MatLUFactorSym      1 1.0 2.9942e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    2   0   0   0   0    2   0   0   0   0      0
MatLUFactorNum      3 1.0 1.6042e+02  1.0 1.40e+11 1.0 0.0e+00 0.0e+00 0.0e+00   87  97   0   0   0   87  97   0   0   0  13972
MatCopy             2 1.0 6.3430e-02  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
MatConvert          1 1.0 9.7278e-02  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
MatResidual        22 1.0 5.4385e-01  1.1 2.26e+08 1.0 1.1e+03 7.7e+03 0.0e+00    0   0   9   0   0    0   0   9   0   0   6630
MatAssemblyBegin   15 1.0 4.2017e-01  2.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.8e+01    0   0   0   0   7    0   0   0   0   7      0
MatAssemblyEnd     15 1.0 4.5533e-01  1.0 0.00e+00 0.0 4.2e+02 1.2e+03 4.0e+01    0   0   4   0  10    0   0   4   0  10      0
MatGetRowIJ         1 1.0 6.2245e-02  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0      0
MatGetSubMatrice    3 1.0 3.1651e-01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00    0   0   0   0   2    0   0   0   0   2      0
MatGetOrdering      1 1.0 1.5877e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    1   0   0   0   0    1   0   0   0   0      0
MatView             6 1.5 7.5293e-04  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00    0   0   0   0   1    0   0   0   0   1      0
MatRedundantMat     3 1.0 4.7641e-01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00    0   0   0   0   2    0   0   0   0   2      0
KSPGMRESOrthog      8 1.0 2.4541e-01  1.6 1.05e+08 1.0 0.0e+00 0.0e+00 8.0e+00    0   0   0   0   2    0   0   0   0   2   6839
KSPSetUp           15 1.0 8.0436e-02  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01    0   0   0   0   4    0   0   0   0   4      0
KSPSolve            3 1.0 1.7866e+02  1.0 1.45e+11 1.0 1.1e+04 2.5e+05 3.4e+02   98 100  93 100  83   98 100  93 100  83  12947
PCSetUp             3 1.0 1.6719e+02  1.0 1.40e+11 1.0 1.9e+03 2.4e+05 2.0e+02   91  97  16  17  49   91  97  16  17  49  13410
PCApply            11 1.0 1.3820e+01  1.3 4.13e+09 1.0 8.3e+03 2.6e+05 8.8e+01    7   3  73  83  22    7   3  73  83  22   4782
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type                  Creations   Destructions       Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

SNES                                 1              1         1332                  0
SNESLineSearch                       1              1          864                  0
DMSNES                               4              4         2816                  0
Vector                             124            124    871042800                  0
Vector Scatter                      15             15     44365856                  0
Matrix                              20             20   1967799792                  0
Distributed Mesh                     7              7        34336                  0
Star Forest Bipartite Graph         14             14        11872                  0
Discrete System                      7              7         5936                  0
Index Set                           35             35     45692680                  0
IS L to G Mapping                    6              6     17278720                  0
Krylov Solver                        5              5        23280                  0
DMKSP interface                      3              3         1944                  0
Preconditioner                       5              5         4912                  0
Viewer                               2              1          760                  0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.81198e-06
Average time for zero size MPI_Send(): 5.87106e-06
#PETSc Option Table entries:
-da_grid_x 21
-da_grid_y 21
-da_refine 8
-ksp_monitor
-ksp_rtol 1e-9
-log_summary
-mg_levels_ksp_type richardson
-pc_mg_levels 3
-pc_type mg
-snes_monitor
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl"
-----------------------------------------
Libraries compiled on Mon Sep 28 20:22:47 2015 on helios85
Machine characteristics: Linux-2.6.32-573.1.1.el6.Bull.80.x86_64-x86_64-with-redhat-6.4-Santiago
Using PETSc directory: /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc -g -O3 -mavx -mkl -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -g -O3 -mavx -mkl -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-c-opt/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-c-opt/include -I/opt/mpi/bullxmpi/1.2.8.2/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-c-opt/lib -L/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-c-opt/lib -lpetsc -lhwloc -lxml2 -lssl -lcrypto -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib
-Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lmpi_f90 -lmpi_f77 -lm -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -lmpi -lnuma -lrt -lnsl -lutil -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -limf -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 
-Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -ldl -----------------------------------------
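
Two derived figures in the log above can be cross-checked by hand: the aggregate flop rate and the LU fill ratio of the redundant coarse solve. The sketch below is only a sanity check, not part of PETSc. The helper names `mflops` and `fill_ratio` are made up for illustration, the inputs are copied from the log, and the scale factor is assumed to be 1e-6 (flops to Mflop) rather than the "10e-6" printed in the legend, since the reported total of 1.271e+10 flops/sec is consistent with 1e-6.

```python
# Cross-check two derived numbers reported in the PETSc log.
# All inputs are copied from the log; the function names are hypothetical
# and the 1e-6 scale factor is an assumption (the legend prints "10e-6").


def mflops(total_flops: float, max_time_sec: float) -> float:
    """Aggregate rate in Mflop/s: 1e-6 * (sum of flops) / (max time)."""
    return 1e-6 * total_flops / max_time_sec


def fill_ratio(factored_nnz: float, original_nnz: float) -> float:
    """Nonzeros in the LU factors divided by nonzeros in the original matrix."""
    return factored_nnz / original_nnz


# "Flops: ... Total 2.316e+12" and "Time (sec): Max 1.822e+02"
overall = mflops(2.316e12, 1.822e2)

# Factored matrix nonzeros=1.12497e+08, coarse matrix nonzeros=8.19968e+06
lu_fill = fill_ratio(1.12497e8, 8.19968e6)

print(f"overall rate: {overall:.0f} Mflop/s")
print(f"LU fill ratio: {lu_fill:.4f}")
```

The fill ratio computed this way agrees with the "needed 13.7197" line, well above the "given 5" estimate, and the coarse-level factorization (MatLUFactorNum) accounts for 87% of the run time in the event table.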