Case $
Mon Apr 30 05:49:29 +03 2018
Compiler gcc + mkl
Loaded modules: gcc/7.1.0:mpi/openmpi-3.0.0-gcc-7.1.0:intel/mkl_2017.0.098
Working directory: /X/pdslin_2.0.0/examples
Command: mpirun -np 16 ./dtest_pdslin matrices/symInput /X/dataDir/ijv/audikw_1.ijv 16
reading from the input file (matrices/symInput).
input->nproc_schur = 16.
input->nproc_dcomp = 16.
input->input_type = INPUT_IJV (0).
input->mat_type = SYMMETRIC.
input->mat_pattern = SYMMETRIC.
input->remove_zero = NO (0).
input->dcomp_type = SCOTCH (1).
input->num_doms = 8.
input->tau_sub = 0.000000e+00.
input->dom_solver = SLU_DIST (0).
input->dom_size = 1000.
input->blk_size = 1.
input->pperm_schur = NO (0).
input->psymb_schur = YES (1).
input->equil_schur = 5 (both rowperm and equil).
switching to use MC77.
input->relax_factor = 2.000000e+01.
input->equil_dom = 5 (both equil and row perm).
input->perm_dom = MMD_AT_PLUS_A (2).
input->inner_outer = YES (1).
** keep the local matrix
input->outer_tol = 1.000000e-12.
input->outer_max = 50.
input->diag_tau = 0.000000e+00.
input->drop_tau0 = 1.000000e-06.
input->drop_tau2 = 0.000000e+00.
input->drop_tau1 = 1.000000e-05.
input->drop_tau3 = 0.000000e+00.
input->ilu_lvl = -1.
input->asm_ovlp = 1.
input->asm_nsub = 1.
input->inner_solver = BICGSTAB (2).
input->inner_max = 1000.
input->inner_restart= 500.
input->inner_tol = 1.000000e-12.
input->ortho_scheme = CLASSICAL.
input->precond_type = PR_SLU_ILU (0).
inpuf->patoh_lbound = 0.000000e+00.
inpuf->patoh_ubound = 4.000000e-01.
inpuf->patoh_sparsity = 1.000000e-01.
## proc 0: running on 16 processors.
reading in matrix from /X/dataDir/ijv/audikw_1.ijv
zero elements are not checked.
dread_ijv: n=943695 m=943695 nnz=39297771 (77651847). m_flag 1, c_flag 0
-- scattering the matrix into 16 processors --
took 2.833750e+01 seconds to read matrix.
-- local nnz (3573432--10580373: 2.960844e+00) BeforeFacto(1) /proc/self/status: peakRSS= 185.613 peakVM= 606.426 ( 606.426) VmSize= 592.980, ( 592.980)MB TotalVirt=8443.270 AfterRedist(1) /proc/self/status: peakRSS= 185.613 peakVM= 606.426 ( 606.426) VmSize= 592.984, ( 592.984)MB TotalVirt=8443.887 factorizing the matrix. gathering matrix to processors that compute matrix partition took 1.480803e-06 seconds. AfterPerm(1) /proc/self/status: peakRSS= 185.613 peakVM= 606.426 ( 606.426) VmSize= 593.660, ( 593.660)MB TotalVirt=8454.672 -- calling ParMetis (0) -- -- ParMetis done (0) -- ParMETIS took 1.001070e+01 seconds. # 0: computation of matrix partitioning took 1.011731e+01 seconds AfterDD(0) /proc/self/status: peakRSS=1850.074 peakVM=2270.977 (2270.977) VmSize= 673.090, ( 673.090)MB TotalVirt=9495.348 * distribute interfaces to load-balance matvec * rows of the Schur complement is distributed equaly among the processors.. == point-to-point buffer == == communication 16 cores at a time == $$ 0: current nnz: 4699202 (original A). matrix redistribution took 4.075976e-01 seconds # factorization of subdomains by SuperLU_DIST ## 8: factor dom 4 (132933x132933, nnz=10648883 (4802784--5846099: 1.217231e+00)) using 1x2 processor (nproc=2,p1=8,p2=10,id=8).. ## 12: factor dom 6 (102969x102969, nnz=8166717 (2768517--5398200: 1.949853e+00)) using 1x2 processor (nproc=2,p1=12,p2=14,id=12).. ## 4: factor dom 2 (118824x118824, nnz=9447264 (3673818--5773446: 1.571511e+00)) using 1x2 processor (nproc=2,p1=4,p2=6,id=4).. 
All2All(6) /proc/self/status: peakRSS= 320.938 peakVM= 742.188 ( 742.188) VmSize= 687.234, ( 687.234)MB TotalVirt=10462.477 FreePerm(6) /proc/self/status: peakRSS= 320.938 peakVM= 742.188 ( 742.188) VmSize= 687.238, ( 687.238)MB TotalVirt=10462.480 CallLUD(6) /proc/self/status: peakRSS= 320.938 peakVM= 742.188 ( 742.188) VmSize= 687.238, ( 687.238)MB TotalVirt=10462.480 ## 6: factor dom 3 (127254x127254, nnz=10164222 (3421026--6743196: 1.971103e+00)) using 1x2 processor (nproc=2,p1=6,p2=8,id=6).. ## 2: factor dom 1 (132260x132260, nnz=10608150 (4764799--5843351: 1.226358e+00)) using 1x2 processor (nproc=2,p1=2,p2=4,id=2).. ## 10: factor dom 5 (108795x108795, nnz=8697627 (4062027--4635600: 1.141204e+00)) using 1x2 processor (nproc=2,p1=10,p2=12,id=10).. ## 14: factor dom 7 (91107x91107, nnz=7172181 (2371473--4800708: 2.024357e+00)) using 1x2 processor (nproc=2,p1=14,p2=16,id=14).. ## 0: factor dom 0 (98631x98631, nnz=7846047 (3469119--4376928: 1.261683e+00)) using 1x2 processor (nproc=2,p1=0,p2=2,id=0).. ***** serial symbolic is used ***** using symmetric mode (METIS and no Equil) Nonzeros in L 70504725 Nonzeros in U 70504725 nonzeros in L+U 140910819 nonzeros in LSUB 12573718 SYMBfact time: 0.77 Nonzeros in L 59124414 Nonzeros in U 59124414 nonzeros in L+U 118157721 nonzeros in LSUB 10605676 SYMBfact time: 0.73 Nonzeros in L 71087718 Nonzeros in U 71087718 nonzeros in L+U 142072467 nonzeros in LSUB 13299511 SYMBfact time: 0.88 Nonzeros in L 84520620 Nonzeros in U 84520620 nonzeros in L+U 168922416 nonzeros in LSUB 14938123 SYMBfact time: 0.91 Nonzeros in L 88227309 Nonzeros in U 88227309 nonzeros in L+U 176345823 nonzeros in LSUB 14618158 SYMBfact time: 1.03 Nonzeros in L 90676875 Nonzeros in U 90676875 nonzeros in L+U 181226496 nonzeros in LSUB 15892949 SYMBfact time: 0.99 Nonzeros in L 105430837 Nonzeros in U 105430837 nonzeros in L+U 210729414 nonzeros in LSUB 17596632 SYMBfact time: 1.10 DISTRIBUTE time 1.11 MPI tag upper bound = 2147483647 .. 
Starting with 1 OpenMP threads === using DAG === DISTRIBUTE time 1.10 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 5.834870e+09 = 3.477854e+02 .. Buffer size: Lsub 4450 Lval 559104 Usub 3090 Uval 275584 LDA 4368 .. thresh = s_eps 5.960464e-08 * anorm 1.605646e+09 = 9.570399e+01 .. Buffer size: Lsub 3167 Lval 397440 Usub 2556 Uval 204160 LDA 3105 max_ncols 2087, max_ldu 128, ldt 128, bigu_size=267136 max_ncols 2153, max_ldu 128, ldt 128, bigu_size=275584 [0] .. BIG U bigu_size 275584 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 4368 Threads per process 1 max_ncols 1530, max_ldu 128, ldt 128, bigu_size=195840 [0] .. BIG U bigu_size 195840 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 3105 Threads per process 1 max_ncols 1595, max_ldu 128, ldt 128, bigu_size=204160 Nonzeros in L 120275679 Nonzeros in U 120275679 nonzeros in L+U 240418425 nonzeros in LSUB 18882570 SYMBfact time: 1.38 DISTRIBUTE time 1.27 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 3.817093e+09 = 2.275165e+02 .. Buffer size: Lsub 4085 Lval 512640 Usub 2903 Uval 256384 LDA 4005 max_ncols 1874, max_ldu 128, ldt 128, bigu_size=239872 max_ncols 2003, max_ldu 128, ldt 128, bigu_size=256384 [0] .. BIG U bigu_size 256384 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 4005 Threads per process 1 DISTRIBUTE time 1.39 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 8.064600e+09 = 4.806876e+02 .. Buffer size: Lsub 4301 Lval 541056 Usub 2905 Uval 263808 LDA 4227 max_ncols 2038, max_ldu 128, ldt 128, bigu_size=260864 max_ncols 2061, max_ldu 128, ldt 128, bigu_size=263808 [0] .. BIG U bigu_size 263808 (same either on CPU or GPU) [0] .. 
BIG V size (on CPU) 131072 Max row size is 4227 Threads per process 1 DISTRIBUTE time 1.44 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 3.481744e+09 = 2.075281e+02 .. Buffer size: Lsub 4327 Lval 544896 Usub 3600 Uval 272128 LDA 4257 max_ncols 2081, max_ldu 128, ldt 128, bigu_size=266368 max_ncols 2126, max_ldu 128, ldt 128, bigu_size=272128 [0] .. BIG U bigu_size 272128 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 4257 Threads per process 1 DISTRIBUTE time 1.52 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 4.590669e+09 = 2.736252e+02 .. Buffer size: Lsub 4208 Lval 529920 Usub 2948 Uval 260224 LDA 4140 max_ncols 2033, max_ldu 128, ldt 128, bigu_size=260224 max_ncols 1979, max_ldu 128, ldt 128, bigu_size=253312 [0] .. BIG U bigu_size 253312 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 4140 Threads per process 1 DISTRIBUTE time 1.61 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 2.167332e+10 = 1.291830e+03 .. Buffer size: Lsub 4628 Lval 582400 Usub 3521 Uval 295808 LDA 4550 max_ncols 2111, max_ldu 128, ldt 128, bigu_size=270208 [0] .. BIG U bigu_size 270208 (same either on CPU or GPU) [0] .. BIG V size (on CPU) 131072 Max row size is 4550 Threads per process 1 max_ncols 2311, max_ldu 128, ldt 128, bigu_size=295808 DISTRIBUTE time 1.93 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 4.635928e+10 = 2.763229e+03 .. Buffer size: Lsub 5215 Lval 656768 Usub 4007 Uval 327680 LDA 5131 max_ncols 2560, max_ldu 128, ldt 128, bigu_size=327680 max_ncols 2494, max_ldu 128, ldt 128, bigu_size=319232 [0] .. BIG U bigu_size 319232 (same either on CPU or GPU) [0] .. 
BIG V size (on CPU) 131072 Max row size is 5131 Threads per process 1 Initialization time 0.03 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.58 seconds Time in Schur update 6.95 seconds .. Time to Gather L buffer 0.10 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.12 .. Time in GEMM 2.99 * Look-ahead 0.21 * Remain 2.78 .. Time to Scatter 3.67 * Look-ahead 0.15 * Remain 3.52 Total Time in Factorization : 8.31 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(14) return from dpdslin_slu_fact Initialization time 0.03 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.93 seconds Time in Schur update 9.01 seconds .. Time to Gather L buffer 0.12 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.12 .. Time in GEMM 4.06 * Look-ahead 0.36 * Remain 3.70 .. Time to Scatter 4.62 * Look-ahead 0.23 * Remain 4.39 Total Time in Factorization : 10.74 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(0) return from dpdslin_slu_fact Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.74 seconds Time in Schur update 9.60 seconds .. Time to Gather L buffer 0.13 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.15 .. Time in GEMM 4.32 * Look-ahead 0.29 * Remain 4.03 .. 
Time to Scatter 4.92 * Look-ahead 0.20 * Remain 4.72 Total Time in Factorization : 11.24 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(12) return from dpdslin_slu_fact Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.74 seconds Time in Schur update 9.67 seconds .. Time to Gather L buffer 0.14 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.15 .. Time in GEMM 4.27 * Look-ahead 0.29 * Remain 3.98 .. Time to Scatter 5.02 * Look-ahead 0.19 * Remain 4.83 Total Time in Factorization : 11.65 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(4) return from dpdslin_slu_fact Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.75 seconds Time in Schur update 10.58 seconds .. Time to Gather L buffer 0.15 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.15 .. Time in GEMM 4.74 * Look-ahead 0.30 * Remain 4.44 .. Time to Scatter 5.43 * Look-ahead 0.20 * Remain 5.23 Total Time in Factorization : 12.53 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(6) return from dpdslin_slu_fact Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 1.17 seconds Time in Schur update 13.92 seconds .. Time to Gather L buffer 0.16 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.18 .. Time in GEMM 6.36 * Look-ahead 0.47 * Remain 5.90 .. 
Time to Scatter 7.09 * Look-ahead 0.34 * Remain 6.75 Total Time in Factorization : 16.21 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(10) return from dpdslin_slu_fact Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.95 seconds Time in Schur update 13.88 seconds .. Time to Gather L buffer 0.18 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.18 .. Time in GEMM 6.34 * Look-ahead 0.38 * Remain 5.96 .. Time to Scatter 7.07 * Look-ahead 0.27 * Remain 6.80 Total Time in Factorization : 16.79 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ..(2) return from dpdslin_slu_fact Initialization time 0.05 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 1.60 seconds Time in Schur update 21.50 seconds .. Time to Gather L buffer 0.22 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.24 .. Time in GEMM 10.54 * Look-ahead 0.69 * Remain 9.85 .. 
Time to Scatter 10.34 * Look-ahead 0.45 * Remain 9.89 Total Time in Factorization : 24.58 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ************************************************** **** Time (seconds) **** COLPERM time 2.17 SYMBFACT time 1.10 DISTRIBUTE time 1.61 FACTOR time 16.84 Factor flops 4.081367e+11 Mflops 24242.88 SOLVE time 0.00 ************************************************** # 2: fill-ratio: (105430837+105430837)/5843351=3.608574e+01, time: 2.223084e+01 (2.223084e+01,2.223091e+01) seconds ************************************************** **** Time (seconds) **** COLPERM time 1.99 SYMBFACT time 1.03 DISTRIBUTE time 1.44 FACTOR time 16.25 Factor flops 3.415694e+11 Mflops 21014.70 SOLVE time 0.00 ************************************************** # 10: fill-ratio: (88227309+88227309)/4062027=4.344004e+01, time: 2.120044e+01 (2.120044e+01,2.120063e+01) seconds ************************************************** **** Time (seconds) **** COLPERM time 1.85 SYMBFACT time 0.88 DISTRIBUTE time 1.27 FACTOR time 11.28 Factor flops 2.160424e+11 Mflops 19154.26 SOLVE time 0.00 ************************************************** # 12: fill-ratio: (71087718+71087718)/2768517=5.135437e+01, time: 1.574500e+01 (1.574500e+01,1.574523e+01) seconds .. Total nnz(LU): 1379696354, fill-ratio: 18.96 ************************************************** .. 
options: ** Fact : 0 ** Equil : 0 ** ParSymbFact : 0 ** ColPerm : 4 ** RowPerm : 0 ** ReplaceTinyPivot : 0 ** IterRefine : 0 ** Trans : 0 ** num_lookaheads : 10 ** SymPattern : 0 ** lookahead_etree : 0 ************************************************** ************************************************** **** Time (seconds) **** COLPERM time 1.55 SYMBFACT time 0.77 DISTRIBUTE time 1.11 FACTOR time 10.77 Factor flops 2.298040e+11 Mflops 21331.53 SOLVE time 0.00 ************************************************** # 0: fill-ratio: (70504725+70504725)/4376928=3.221653e+01, time: 1.459867e+01 (1.459867e+01,1.459872e+01) seconds ************************************************** **** Time (seconds) **** COLPERM time 1.90 SYMBFACT time 0.91 DISTRIBUTE time 1.39 FACTOR time 11.68 Factor flops 2.642398e+11 Mflops 22618.50 SOLVE time 0.00 ************************************************** # 4: fill-ratio: (84520620+84520620)/5773446=2.927909e+01, time: 1.636408e+01 (1.636408e+01,1.636416e+01) seconds ************************************************** **** Time (seconds) **** COLPERM time 2.09 SYMBFACT time 0.99 DISTRIBUTE time 1.52 FACTOR time 12.57 Factor flops 2.939753e+11 Mflops 23380.06 SOLVE time 0.00 ************************************************** # 6: fill-ratio: (90676875+90676875)/6743196=2.689433e+01, time: 1.772623e+01 (1.772623e+01,1.772634e+01) seconds ************************************************** **** Time (seconds) **** COLPERM time 1.58 SYMBFACT time 0.73 DISTRIBUTE time 1.10 FACTOR time 8.34 Factor flops 1.614347e+11 Mflops 19349.10 SOLVE time 0.00 ************************************************** # 14: fill-ratio: (59124414+59124414)/2371473=4.986303e+01, time: 1.217488e+01 (1.217488e+01,1.217490e+01) seconds ..(8) return from dpdslin_slu_fact ************************************************** **** Time (seconds) **** COLPERM time 2.53 SYMBFACT time 1.38 DISTRIBUTE time 1.93 FACTOR time 24.63 Factor flops 5.609668e+11 Mflops 22776.53 SOLVE time 0.00 
************************************************** # 8: fill-ratio: (120275679+120275679)/5846099=4.114733e+01, time: 3.103986e+01 (3.103985e+01,3.103986e+01) seconds BeforeCompS(8) /proc/self/status: peakRSS=1671.812 peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469 InCompS(8) /proc/self/status: peakRSS=1671.812 peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469 BeforeCompG(8) /proc/self/status: peakRSS=1671.812 peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469 interface[4]: 132933x30922 nnz=257344 (128665--128679: 1.000109e+00) nnzcol=8493 interface[6]: 102969x30922 nnz=136116 (68058--68058: 1.000000e+00) nnzcol=4320 interface[5]: 108795x30922 nnz=253530 (126741--126789: 1.000379e+00) nnzcol=7701 interface[3]: 127254x30922 nnz=227313 (113649--113664: 1.000132e+00) nnzcol=7182 interface[7]: 91107x30922 nnz=154647 (77319--77328: 1.000116e+00) nnzcol=4932 interface[1]: 132260x30922 nnz=317142 (158560--158582: 1.000139e+00) nnzcol=11176 # 0: subdomain factorization took 3.104433e+01 seconds. $$ 0: current nnz: 4699202 (water mark: 4699202). 
BeforeCompS(0): Bal=1.456674e+00 (1.936426e+03, 1.329348e+03, 2.604847e+04) ** computing an approximate schur complement ** schurs complement (30922x30922) is computed by 1 columns with tau0=1.000000e-06 on W and tau1=1.000000e-05 on T ** Post-ordering permutations ** BeforeCompG(0): Bal=1.456674e+00 (1.936426e+03, 1.329348e+03, 2.604847e+04) interface[0]: 98631x30922 nnz=238104 (119052--119052: 1.000000e+00) nnzcol=7578 interface[2]: 118824x30922 nnz=246816 (123399--123417: 1.000146e+00) nnzcol=7719 AfterCompG(0): Bal=1.625115e+00 (2.491898e+03, 1.533367e+03, 3.232141e+04) AfterCompG(2) /proc/self/status: peakRSS=2011.727 peakVM=2491.898 (2491.898) VmSize=2491.898, (2491.898)MB TotalVirt=32321.406 AfterCompW(2) /proc/self/status: peakRSS=2011.793 peakVM=2491.898 (2491.898) VmSize=2491.898, (2491.898)MB TotalVirt=32321.406 ** skiping the computation of F for the symmetric mode ** AfterCompW(0): Bal=1.625115e+00 (2.491898e+03, 1.533367e+03, 3.232141e+04) $$ 0: current nnz 39388620, water mark 39388620 (local F and E: 17297633+17297633) AfterCompT(3) /proc/self/status: peakRSS=3384.340 peakVM=3853.930 (3853.930) VmSize=3853.930, (3853.930)MB TotalVirt=41318.141 AfterSum: Max=4.030035e+03 Min=1.935477e+03 AfterSum(3) /proc/self/status: peakRSS=3459.312 peakVM=4030.035 (4030.035) VmSize=4030.035, (4030.035)MB TotalVirt=43805.422 $$ 0: current nnz 40668514, water mark 96769245 (schur: 23952885) Total number of nonzeros in E (Lsolve) and F (Usolve): 297801975 297801975 TIME: Solve : 1.58e+02, 1.58e+02 (2.72e+01+1.58e+02, 1.04e-06+1.31e-05) SymbT : 9.37e-04, 8.47e-03 MatVec: 1.03e+02, 7.83e+02 CommI : 8.37e-06, 3.16e-03 + 4.28e-08, 4.65e-03 InitS : 2.91e-03, 4.78e-02 NumS : 1.18e+00, 1.39e+00 (1.37e+00) approximate schur computation took 9.431587e+02 = 9.431587e+02 - 0.000000e+00 seconds (0.000000e+00 on patoh) Schur complement : nnz = 23952885 / 378231152 AfterCompS(2) /proc/self/status: peakRSS=2879.348 peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB 
TotalVirt=33699.898 AfterCompS(0): Bal=1.854420e+00 (2.719289e+03, 1.466383e+03, 3.369990e+04) calling dsparsify_schur() ... ** applying equilbration on schur complement ** $$ 0: current nnz 44401126, watermark 96769245 (dblock 3732612). # 0: diag block = 1.636064e-01 sec calling dpdslin_ldperm(5) to get the max-weight permutation. ** mc77 is not linked, will use mc64 (compile with SYM_FACT) ** ** calling mc64 ** diagS(2) /proc/self/status: peakRSS=2879.348 peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB TotalVirt=33728.672 # 0: allgatherv = 5.239535e-02 sec MC64 count 0/30922.. # 0: mc64/77 (2.797262e-01 sec) # 0: sparsify (4.977919e-07 sec) with tau2=0.000000e+00. $$ 0: current nnz 64621399, watermark 96769245 (sparsify 23952885) nnz=23952885/378231152 (1.000000e+00 nonzeros are kept). preprocessing and sparsifying approximate schur took 6.171009e-01 seconds. AfterPreProc(0): Bal=1.784342e+00 (2.719289e+03, 1.523973e+03, 3.381303e+04) ** factorizing the approximate schur complement ** SuperLU preconditioner. ## 0: using 4x4 processor.. ** Metis and serial symbolic ** AfterPreProc(2) /proc/self/status: peakRSS=2879.348 peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB TotalVirt=33813.027 Nonzeros in L 219546898 Nonzeros in U 219546898 nonzeros in L+U 439062874 nonzeros in LSUB 4385685 SYMBfact time: 8.32 DISTRIBUTE time 7.08 MPI tag upper bound = 2147483647 .. Starting with 1 OpenMP threads === using DAG === .. thresh = s_eps 5.960464e-08 * anorm 1.516638e+01 = 9.039866e-07 .. Buffer size: Lsub 3413 Lval 429696 Usub 3513 Uval 429696 LDA 3357 max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520 max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128 [0] .. BIG U bigu_size 400128 (same either on CPU or GPU) [0] .. 
BIG V size (on CPU) 131072 Max row size is 3146 Threads per process 1 max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192 max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520 max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696 max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128 max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520 max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520 max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128 max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128 max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192 max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696 max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192 max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696 max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192 max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696 Initialization time 0.04 seconds Serial: compute static schedule, allocate storage ---- Time breakdown in factorization ---- Time in Look-ahead update 0.83 seconds Time in Schur update 9.59 seconds .. Time to Gather L buffer 0.05 (Separate L panel by Lookahead/Remain) .. Time to Gather U buffer 0.13 .. Time in GEMM 5.72 * Look-ahead 0.33 * Remain 5.40 .. Time to Scatter 3.67 * Look-ahead 0.21 * Remain 3.45 Total Time in Factorization : 13.80 seconds, Total time in Schur update with offload 0.00 seconds, -------- GEMM maximum block: 128-128-128 ************************************************** **** Time (seconds) **** COLPERM time 16.70 SYMBFACT time 8.32 DISTRIBUTE time 7.08 FACTOR time 13.84 Factor flops 3.656084e+12 Mflops 264132.72 SOLVE time 0.00 ************************************************** # 0: took 6.577226e+01 seconds for subdomain factorization. 
AfterProc(2) /proc/self/status: peakRSS=14855.555 peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30743.984
$$ 0: current nnz 479762310, watermark 479762310 (slu: 219546898+219546898)
$$ 0: current nnz 455809425, watermark 479762310 (free S: 23952885)
factorization of approximate schur took 6.587741e+01 (1.040747e+03 Precond.) seconds
AfterProc(0): Bal=1.681742e+00 (2.533020e+03, 1.506188e+03, 3.074398e+04)
Total preconditioner computation time: 1.051274e+03 seconds.
solving the system.
** computing solution (943695x943695) **
** calling fgmres as outer-loop **
** # 1: inner-loop(1.000000e-12): 3 iterations
AfterSol(2) /proc/self/status: peakRSS=14855.555 peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30810.465
Convergence in 1 iterations with rnorm=6.061044e-05.
KSP Object: 16 MPI processes
  type: fgmres
    restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: shell
    no name
  linear system matrix = precond matrix:
  Mat Object: 16 MPI processes
    type: shell
    rows=943695, cols=943695
MatVec = 2.896956e+00, 2.898476e+00
 + interior solve: 1.360492e+00, 2.879299e+00
 + interface matv: 2.437697e-03, 5.159489e-03
                 : 1.960481e-03, 4.119361e-03
 + interior subtr: 2.251966e-03, 3.252203e-03
 + distribute : 4.006629e-04, 1.523160e+00
mat Schur : 2.290151e+00, 2.897881e+00
TriSol = 1.116102e+00, 1.118972e+00
GMatV = 3.904761e-02 3.906426e-02
GPrec = 5.204885e+00 5.242523e+00
GSubs = 5.213628e-01 1.079413e+00
AfterSol(2) /proc/self/status: peakRSS=14855.555 peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30828.473
** # 2: inner-loop(1.000000e-12): 3 iterations
Convergence in 1 iterations with rnorm=3.483073e-05.
KSP Object: 16 MPI processes
  type: fgmres
    restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: shell
    no name
  linear system matrix = precond matrix:
  Mat Object: 16 MPI processes
    type: shell
    rows=943695, cols=943695
MatVec = 5.783775e+00, 5.785303e+00
 + interior solve: 2.722698e+00, 5.748069e+00
 + interface matv: 4.868983e-03, 1.027341e-02
                 : 3.776802e-03, 7.983692e-03
 + interior subtr: 4.371923e-03, 6.350646e-03
 + distribute : 8.291849e-04, 3.034848e+00
mat Schur : 4.562700e+00, 5.784018e+00
TriSol = 2.249454e+00, 2.252564e+00
GMatV = 7.510517e-02 7.512476e-02
GPrec = 1.018488e+01 1.022252e+01
GSubs = 9.773760e-01 2.014463e+00
solution time: 1.080445e+01 sec.
Total solution time: 1.080447e+01 seconds.
AfterSol(2) /proc/self/status: peakRSS=14855.555 peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30828.828
----------- Summary ------------
----------- Summary ------------
Time:
 Partition : 1.011731e+01 (part.)+ 4.075976e-01 (redist.) sec
 LU of interior domains: 3.104433e+01 3.104593e+01 sec
 Comp of Schur compl : 9.431584e+02 9.431602e+02 sec
 : LU(subdom.) + comp. Schur => 9.742028e+02 9.742046e+02 sec
 Preprocessing Schur : 5.500424e-01 6.642934e-01 sec
 LU of Schur compl. : 6.587770e+01 sec
 Precond. construction : 1.040747e+03 1.040749e+03 sec
 Srpreolution time: : 1.080445e+01 1.080447e+01 sec
Number of iterations : outer: 2, subs: 0--0, schur: 0--0, total inner: 6
 : # of subdom inverse = 0, # of precond applic. = 6
/////////////////////////////////////////////////////////////////////////////////////////////////////
Time:
(1) Partition:(1a)time for matrix partitioning : 1.011731e+01
(1) Partition:(1b)time for matrix redistribution 4.075976e-01 sec
(2) LU time for factorizing interior subdomains: min : 3.104433e+01 ,max: 3.104593e+01 , avg: 3.104513e+01 sec
(3) time for computing approximate schur : min: 9.431584e+02 max: 9.431602e+02 avg: 9.431593e+02 sec
(4) LU(subdom.) + comp. Schur (1)+(2): min: 9.742028e+02 , max: 9.742046e+02 , avg: 9.742037e+02 sec
(5) time for preprocessing approximate schur : min: 5.500424e-01 , max: 6.642934e-01 , avg: 6.071679e-01 sec
(6) time for factorizing approximate schur. : min: 6.587740e+01 , max : 6.587770e+01 , avg : 6.587755e+01 sec
(7) total time for preconditioner computation : min: 1.040747e+03 , max : 1.040749e+03 , avg : 1.040748e+03 sec
(8) total time for solution computation: : min: 1.080445e+01 , max : 1.080447e+01 , avg : 1.080446e+01 sec
(9) Number of iterations (9a) : outer-itrations: 2 ,
(9) Number of iterations (9b) : subdomain-iteration for substitution: min: 0 , max : 0 , avg : 1024
(9) Number of iterations (9c) : schur ;subdomain-iteration for mat-vec min: 0 , max : 0 , avg : 1024
(9) Number of iterations (9d) : total number of innter-iterations: 6
(10) # of subdom inverse(max(9b) + max(9c)) = 0
Balance: (max/min, max/avg) max / min, avg
 domain dim : (1.46, 1.17 ) 132933 / 91107, 114096
 domain nnz : (1.48, 1.17 ) 10648883 / 7172181, 9093886
 LU nnz : (2.03, 1.39 ) 240551358.00 / 118248828.00, 172462044.25
 interf nnzcol : (2.59, 1.51 ) 11176 / 4320, 7387
 interf nnzrow : (3.31, 1.63 ) 11175 / 3375, 6841
 interf nnz : (2.33, 1.39 ) 158582 / 68058, 114438
Number of nonzeros in A : (dim: 943695 x 943695), local/total
 Number of nonzeros interior subdomain : 4376928 / 72751091 (dim: 912773 x 912773)
 Number of nonzeros interfaces (both) : 238104 / 3662024 (dim: 912773 x 30922)
 Number of nonzeros in separator : 84170 / 1238732 (dim: 30922 x 30922)
 total Number of nonzeros (before dist) : 10311894 / 77651847
Number of nonzeros in Preconditioner:
 apx. Schur : 47549961165168
 apx. Schur (total) : 378231152
 W : 297801975, (10502949--28541049)
 G : 297801975, (10502949--28541049)
 LU interior : 1379696354
 Schur complement : 439093796, (L: 219546898, U: 219546898)
 total precond. : 47551936465472
 total nnz : 1818790150
 fill-ratio over A : 23.42
A1 is now freed : 72751091
reading input file again to compute residual norm.
zero elements are not checked.
dread_ijv: n=943695 m=943695 nnz=39297771 (77651847). m_flag 1, c_flag 0
== rhs #1 ==
 Err. nrm. = 8.733697e-09 / 5.606929e+02 = 1.557662e-11
 Rel. res. = 6.248650e-05 / 1.428504e+10 = 4.374262e-15
End(2) /proc/self/status: peakRSS=14855.555 peakVM=15335.969 (15335.969) VmSize=2481.246, (2481.246)MB TotalVirt=30089.848
== rhs #2 ==
 Err. nrm. = 1.508970e-09 / 5.607138e+02 = 2.691160e-12
 Rel. res. = 2.930752e-05 / 1.311318e+10 = 2.234967e-15