Case $
Mon Apr 30 05:49:29 +03 2018

Compiler gcc  + mkl

Loaded modules: gcc/7.1.0:mpi/openmpi-3.0.0-gcc-7.1.0:intel/mkl_2017.0.098

Working directory: /X/pdslin_2.0.0/examples

Command: mpirun -np 16 ./dtest_pdslin matrices/symInput /X/dataDir/ijv/audikw_1.ijv 


16
reading from the input file (matrices/symInput).
 input->nproc_schur  = 16.
 input->nproc_dcomp  = 16.
 input->input_type   = INPUT_IJV (0).
 input->mat_type     = SYMMETRIC.
 input->mat_pattern  = SYMMETRIC.
 input->remove_zero  = NO (0).
 input->dcomp_type   = SCOTCH (1).
 input->num_doms     = 8.
 input->tau_sub      = 0.000000e+00.
 input->dom_solver   = SLU_DIST (0).
 input->dom_size     = 1000.
 input->blk_size     = 1.
 input->pperm_schur  = NO (0).
 input->psymb_schur  = YES (1).
 input->equil_schur  = 5 (both rowperm and equil).
 switching to use MC77.
 input->relax_factor = 2.000000e+01.
 input->equil_dom    = 5 (both equil and row perm).
 input->perm_dom     = MMD_AT_PLUS_A (2).
 input->inner_outer  = YES (1).
 ** keep the local matrix
 input->outer_tol    = 1.000000e-12.
 input->outer_max    = 50.
 input->diag_tau     = 0.000000e+00.
 input->drop_tau0    = 1.000000e-06.
 input->drop_tau2    = 0.000000e+00.
 input->drop_tau1    = 1.000000e-05.
 input->drop_tau3    = 0.000000e+00.
 input->ilu_lvl      = -1.
 input->asm_ovlp     = 1.
 input->asm_nsub     = 1.
 input->inner_solver = BICGSTAB (2).
 input->inner_max    = 1000.
 input->inner_restart= 500.
 input->inner_tol    = 1.000000e-12.
 input->ortho_scheme = CLASSICAL.
 input->precond_type = PR_SLU_ILU (0).
 input->patoh_lbound = 0.000000e+00.
 input->patoh_ubound = 4.000000e-01.
 input->patoh_sparsity = 1.000000e-01.
 ## proc 0: running on 16 processors.
 reading in matrix from /X/dataDir/ijv/audikw_1.ijv
 zero elements are not checked.

dread_ijv: n=943695 m=943695 nnz=39297771 (77651847).
m_flag 1, c_flag 0
 -- scattering the matrix into 16 processors --
 took 2.833750e+01 seconds to read matrix.

 -- local nnz (3573432--10580373: 2.960844e+00)
 BeforeFacto(1) /proc/self/status: peakRSS= 185.613  peakVM= 606.426 ( 606.426) VmSize= 592.980, ( 592.980)MB TotalVirt=8443.270
 AfterRedist(1) /proc/self/status: peakRSS= 185.613  peakVM= 606.426 ( 606.426) VmSize= 592.984, ( 592.984)MB TotalVirt=8443.887
 factorizing the matrix.
 gathering matrix to processors that compute matrix partition took 1.480803e-06 seconds.
 AfterPerm(1) /proc/self/status: peakRSS= 185.613  peakVM= 606.426 ( 606.426) VmSize= 593.660, ( 593.660)MB TotalVirt=8454.672
 -- calling ParMetis (0) --
 -- ParMetis done (0) --
 ParMETIS took 1.001070e+01 seconds.
 # 0: computation of matrix partitioning took 1.011731e+01 seconds
 AfterDD(0) /proc/self/status: peakRSS=1850.074  peakVM=2270.977 (2270.977) VmSize= 673.090, ( 673.090)MB TotalVirt=9495.348
 * distribute interfaces to load-balance matvec * 
 rows of the Schur complement are distributed equally among the processors..
 == point-to-point buffer ==
 == communication 16 cores at a time ==
 $$ 0: current nnz: 4699202 (original A).
 matrix redistribution took 4.075976e-01 seconds
 # factorization of subdomains by SuperLU_DIST
 ## 8: factor dom 4 (132933x132933, nnz=10648883 (4802784--5846099: 1.217231e+00)) using 1x2 processor (nproc=2,p1=8,p2=10,id=8)..
 ## 12: factor dom 6 (102969x102969, nnz=8166717 (2768517--5398200: 1.949853e+00)) using 1x2 processor (nproc=2,p1=12,p2=14,id=12)..
 ## 4: factor dom 2 (118824x118824, nnz=9447264 (3673818--5773446: 1.571511e+00)) using 1x2 processor (nproc=2,p1=4,p2=6,id=4)..
 All2All(6) /proc/self/status: peakRSS= 320.938  peakVM= 742.188 ( 742.188) VmSize= 687.234, ( 687.234)MB TotalVirt=10462.477
 FreePerm(6) /proc/self/status: peakRSS= 320.938  peakVM= 742.188 ( 742.188) VmSize= 687.238, ( 687.238)MB TotalVirt=10462.480
 CallLUD(6) /proc/self/status: peakRSS= 320.938  peakVM= 742.188 ( 742.188) VmSize= 687.238, ( 687.238)MB TotalVirt=10462.480
 ## 6: factor dom 3 (127254x127254, nnz=10164222 (3421026--6743196: 1.971103e+00)) using 1x2 processor (nproc=2,p1=6,p2=8,id=6)..
 ## 2: factor dom 1 (132260x132260, nnz=10608150 (4764799--5843351: 1.226358e+00)) using 1x2 processor (nproc=2,p1=2,p2=4,id=2)..
 ## 10: factor dom 5 (108795x108795, nnz=8697627 (4062027--4635600: 1.141204e+00)) using 1x2 processor (nproc=2,p1=10,p2=12,id=10)..
 ## 14: factor dom 7 (91107x91107, nnz=7172181 (2371473--4800708: 2.024357e+00)) using 1x2 processor (nproc=2,p1=14,p2=16,id=14)..
 ## 0: factor dom 0 (98631x98631, nnz=7846047 (3469119--4376928: 1.261683e+00)) using 1x2 processor (nproc=2,p1=0,p2=2,id=0)..
 ***** serial symbolic is used *****
 using symmetric mode (METIS and no Equil)
	Nonzeros in L       70504725
	Nonzeros in U       70504725
	nonzeros in L+U     140910819
	nonzeros in LSUB    12573718
	SYMBfact time: 0.77
	Nonzeros in L       59124414
	Nonzeros in U       59124414
	nonzeros in L+U     118157721
	nonzeros in LSUB    10605676
	SYMBfact time: 0.73
	Nonzeros in L       71087718
	Nonzeros in U       71087718
	nonzeros in L+U     142072467
	nonzeros in LSUB    13299511
	SYMBfact time: 0.88
	Nonzeros in L       84520620
	Nonzeros in U       84520620
	nonzeros in L+U     168922416
	nonzeros in LSUB    14938123
	SYMBfact time: 0.91
	Nonzeros in L       88227309
	Nonzeros in U       88227309
	nonzeros in L+U     176345823
	nonzeros in LSUB    14618158
	SYMBfact time: 1.03
	Nonzeros in L       90676875
	Nonzeros in U       90676875
	nonzeros in L+U     181226496
	nonzeros in LSUB    15892949
	SYMBfact time: 0.99
	Nonzeros in L       105430837
	Nonzeros in U       105430837
	nonzeros in L+U     210729414
	nonzeros in LSUB    17596632
	SYMBfact time: 1.10
	DISTRIBUTE time        1.11
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
	DISTRIBUTE time        1.10
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 5.834870e+09 = 3.477854e+02
.. Buffer size: Lsub 4450	Lval 559104	Usub 3090	Uval 275584	LDA 4368
.. thresh = s_eps 5.960464e-08 * anorm 1.605646e+09 = 9.570399e+01
.. Buffer size: Lsub 3167	Lval 397440	Usub 2556	Uval 204160	LDA 3105
max_ncols 2087, max_ldu 128, ldt 128, bigu_size=267136
max_ncols 2153, max_ldu 128, ldt 128, bigu_size=275584
[0] .. BIG U bigu_size   275584 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4368 
  Threads per process 1 
max_ncols 1530, max_ldu 128, ldt 128, bigu_size=195840
[0] .. BIG U bigu_size   195840 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 3105 
  Threads per process 1 
max_ncols 1595, max_ldu 128, ldt 128, bigu_size=204160
	Nonzeros in L       120275679
	Nonzeros in U       120275679
	nonzeros in L+U     240418425
	nonzeros in LSUB    18882570
	SYMBfact time: 1.38
	DISTRIBUTE time        1.27
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 3.817093e+09 = 2.275165e+02
.. Buffer size: Lsub 4085	Lval 512640	Usub 2903	Uval 256384	LDA 4005
max_ncols 1874, max_ldu 128, ldt 128, bigu_size=239872
max_ncols 2003, max_ldu 128, ldt 128, bigu_size=256384
[0] .. BIG U bigu_size   256384 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4005 
  Threads per process 1 
	DISTRIBUTE time        1.39
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 8.064600e+09 = 4.806876e+02
.. Buffer size: Lsub 4301	Lval 541056	Usub 2905	Uval 263808	LDA 4227
max_ncols 2038, max_ldu 128, ldt 128, bigu_size=260864
max_ncols 2061, max_ldu 128, ldt 128, bigu_size=263808
[0] .. BIG U bigu_size   263808 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4227 
  Threads per process 1 
	DISTRIBUTE time        1.44
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 3.481744e+09 = 2.075281e+02
.. Buffer size: Lsub 4327	Lval 544896	Usub 3600	Uval 272128	LDA 4257
max_ncols 2081, max_ldu 128, ldt 128, bigu_size=266368
max_ncols 2126, max_ldu 128, ldt 128, bigu_size=272128
[0] .. BIG U bigu_size   272128 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4257 
  Threads per process 1 
	DISTRIBUTE time        1.52
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 4.590669e+09 = 2.736252e+02
.. Buffer size: Lsub 4208	Lval 529920	Usub 2948	Uval 260224	LDA 4140
max_ncols 2033, max_ldu 128, ldt 128, bigu_size=260224
max_ncols 1979, max_ldu 128, ldt 128, bigu_size=253312
[0] .. BIG U bigu_size   253312 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4140 
  Threads per process 1 
	DISTRIBUTE time        1.61
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 2.167332e+10 = 1.291830e+03
.. Buffer size: Lsub 4628	Lval 582400	Usub 3521	Uval 295808	LDA 4550
max_ncols 2111, max_ldu 128, ldt 128, bigu_size=270208
[0] .. BIG U bigu_size   270208 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 4550 
  Threads per process 1 
max_ncols 2311, max_ldu 128, ldt 128, bigu_size=295808
	DISTRIBUTE time        1.93
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 4.635928e+10 = 2.763229e+03
.. Buffer size: Lsub 5215	Lval 656768	Usub 4007	Uval 327680	LDA 5131
max_ncols 2560, max_ldu 128, ldt 128, bigu_size=327680
max_ncols 2494, max_ldu 128, ldt 128, bigu_size=319232
[0] .. BIG U bigu_size   319232 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 5131 
  Threads per process 1 

Initialization time	    0.03 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.58 seconds
Time in Schur update 		     6.95 seconds
.. Time to Gather L buffer	     0.10  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.12 
.. Time in GEMM     2.99 
	* Look-ahead	     0.21 
	* Remain	     2.78 
.. Time to Scatter     3.67 
	* Look-ahead	     0.15 
	* Remain	     3.52 
Total Time in Factorization            	:     8.31 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(14) return from dpdslin_slu_fact

Initialization time	    0.03 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.93 seconds
Time in Schur update 		     9.01 seconds
.. Time to Gather L buffer	     0.12  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.12 
.. Time in GEMM     4.06 
	* Look-ahead	     0.36 
	* Remain	     3.70 
.. Time to Scatter     4.62 
	* Look-ahead	     0.23 
	* Remain	     4.39 
Total Time in Factorization            	:    10.74 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(0) return from dpdslin_slu_fact

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.74 seconds
Time in Schur update 		     9.60 seconds
.. Time to Gather L buffer	     0.13  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.15 
.. Time in GEMM     4.32 
	* Look-ahead	     0.29 
	* Remain	     4.03 
.. Time to Scatter     4.92 
	* Look-ahead	     0.20 
	* Remain	     4.72 
Total Time in Factorization            	:    11.24 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(12) return from dpdslin_slu_fact

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.74 seconds
Time in Schur update 		     9.67 seconds
.. Time to Gather L buffer	     0.14  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.15 
.. Time in GEMM     4.27 
	* Look-ahead	     0.29 
	* Remain	     3.98 
.. Time to Scatter     5.02 
	* Look-ahead	     0.19 
	* Remain	     4.83 
Total Time in Factorization            	:    11.65 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(4) return from dpdslin_slu_fact

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.75 seconds
Time in Schur update 		    10.58 seconds
.. Time to Gather L buffer	     0.15  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.15 
.. Time in GEMM     4.74 
	* Look-ahead	     0.30 
	* Remain	     4.44 
.. Time to Scatter     5.43 
	* Look-ahead	     0.20 
	* Remain	     5.23 
Total Time in Factorization            	:    12.53 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(6) return from dpdslin_slu_fact

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     1.17 seconds
Time in Schur update 		    13.92 seconds
.. Time to Gather L buffer	     0.16  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.18 
.. Time in GEMM     6.36 
	* Look-ahead	     0.47 
	* Remain	     5.90 
.. Time to Scatter     7.09 
	* Look-ahead	     0.34 
	* Remain	     6.75 
Total Time in Factorization            	:    16.21 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(10) return from dpdslin_slu_fact

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.95 seconds
Time in Schur update 		    13.88 seconds
.. Time to Gather L buffer	     0.18  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.18 
.. Time in GEMM     6.34 
	* Look-ahead	     0.38 
	* Remain	     5.96 
.. Time to Scatter     7.07 
	* Look-ahead	     0.27 
	* Remain	     6.80 
Total Time in Factorization            	:    16.79 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
..(2) return from dpdslin_slu_fact

Initialization time	    0.05 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     1.60 seconds
Time in Schur update 		    21.50 seconds
.. Time to Gather L buffer	     0.22  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.24 
.. Time in GEMM    10.54 
	* Look-ahead	     0.69 
	* Remain	     9.85 
.. Time to Scatter    10.34 
	* Look-ahead	     0.45 
	* Remain	     9.89 
Total Time in Factorization            	:    24.58 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
**************************************************
**** Time (seconds) ****
	COLPERM time           2.17
	SYMBFACT time          1.10
	DISTRIBUTE time        1.61
	FACTOR time           16.84
	Factor flops	4.081367e+11	Mflops 	24242.88
	SOLVE time             0.00
**************************************************
 # 2: fill-ratio: (105430837+105430837)/5843351=3.608574e+01, time: 2.223084e+01 (2.223084e+01,2.223091e+01) seconds
**************************************************
**** Time (seconds) ****
	COLPERM time           1.99
	SYMBFACT time          1.03
	DISTRIBUTE time        1.44
	FACTOR time           16.25
	Factor flops	3.415694e+11	Mflops 	21014.70
	SOLVE time             0.00
**************************************************
 # 10: fill-ratio: (88227309+88227309)/4062027=4.344004e+01, time: 2.120044e+01 (2.120044e+01,2.120063e+01) seconds
**************************************************
**** Time (seconds) ****
	COLPERM time           1.85
	SYMBFACT time          0.88
	DISTRIBUTE time        1.27
	FACTOR time           11.28
	Factor flops	2.160424e+11	Mflops 	19154.26
	SOLVE time             0.00
**************************************************
 # 12: fill-ratio: (71087718+71087718)/2768517=5.135437e+01, time: 1.574500e+01 (1.574500e+01,1.574523e+01) seconds
.. Total nnz(LU): 1379696354, fill-ratio: 18.96
**************************************************
.. options:
**    Fact             :    0
**    Equil            :    0
**    ParSymbFact      :    0
**    ColPerm          :    4
**    RowPerm          :    0
**    ReplaceTinyPivot :    0
**    IterRefine       :    0
**    Trans            :    0
**    num_lookaheads   :   10
**    SymPattern       :    0
**    lookahead_etree  :    0
**************************************************
**************************************************
**** Time (seconds) ****
	COLPERM time           1.55
	SYMBFACT time          0.77
	DISTRIBUTE time        1.11
	FACTOR time           10.77
	Factor flops	2.298040e+11	Mflops 	21331.53
	SOLVE time             0.00
**************************************************
 # 0: fill-ratio: (70504725+70504725)/4376928=3.221653e+01, time: 1.459867e+01 (1.459867e+01,1.459872e+01) seconds
**************************************************
**** Time (seconds) ****
	COLPERM time           1.90
	SYMBFACT time          0.91
	DISTRIBUTE time        1.39
	FACTOR time           11.68
	Factor flops	2.642398e+11	Mflops 	22618.50
	SOLVE time             0.00
**************************************************
 # 4: fill-ratio: (84520620+84520620)/5773446=2.927909e+01, time: 1.636408e+01 (1.636408e+01,1.636416e+01) seconds
**************************************************
**** Time (seconds) ****
	COLPERM time           2.09
	SYMBFACT time          0.99
	DISTRIBUTE time        1.52
	FACTOR time           12.57
	Factor flops	2.939753e+11	Mflops 	23380.06
	SOLVE time             0.00
**************************************************
 # 6: fill-ratio: (90676875+90676875)/6743196=2.689433e+01, time: 1.772623e+01 (1.772623e+01,1.772634e+01) seconds
**************************************************
**** Time (seconds) ****
	COLPERM time           1.58
	SYMBFACT time          0.73
	DISTRIBUTE time        1.10
	FACTOR time            8.34
	Factor flops	1.614347e+11	Mflops 	19349.10
	SOLVE time             0.00
**************************************************
 # 14: fill-ratio: (59124414+59124414)/2371473=4.986303e+01, time: 1.217488e+01 (1.217488e+01,1.217490e+01) seconds
..(8) return from dpdslin_slu_fact
**************************************************
**** Time (seconds) ****
	COLPERM time           2.53
	SYMBFACT time          1.38
	DISTRIBUTE time        1.93
	FACTOR time           24.63
	Factor flops	5.609668e+11	Mflops 	22776.53
	SOLVE time             0.00
**************************************************
 # 8: fill-ratio: (120275679+120275679)/5846099=4.114733e+01, time: 3.103986e+01 (3.103985e+01,3.103986e+01) seconds
 BeforeCompS(8) /proc/self/status: peakRSS=1671.812  peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469
 InCompS(8) /proc/self/status: peakRSS=1671.812  peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469
 BeforeCompG(8) /proc/self/status: peakRSS=1671.812  peakVM=2092.867 (2092.867) VmSize=1936.426, (1936.426)MB TotalVirt=26048.469
 interface[4]: 132933x30922 nnz=257344 (128665--128679: 1.000109e+00) nnzcol=8493
 interface[6]: 102969x30922 nnz=136116 (68058--68058: 1.000000e+00) nnzcol=4320
 interface[5]: 108795x30922 nnz=253530 (126741--126789: 1.000379e+00) nnzcol=7701
 interface[3]: 127254x30922 nnz=227313 (113649--113664: 1.000132e+00) nnzcol=7182
 interface[7]: 91107x30922 nnz=154647 (77319--77328: 1.000116e+00) nnzcol=4932
 interface[1]: 132260x30922 nnz=317142 (158560--158582: 1.000139e+00) nnzcol=11176
 # 0: subdomain factorization took 3.104433e+01 seconds.
 $$ 0: current nnz: 4699202 (water mark: 4699202).
 BeforeCompS(0): Bal=1.456674e+00 (1.936426e+03, 1.329348e+03, 2.604847e+04)
 ** computing an approximate schur complement **
 schur complement (30922x30922) is computed by 1 column with tau0=1.000000e-06 on W and tau1=1.000000e-05 on T
 ** Post-ordering permutations **
 BeforeCompG(0): Bal=1.456674e+00 (1.936426e+03, 1.329348e+03, 2.604847e+04)
 interface[0]: 98631x30922 nnz=238104 (119052--119052: 1.000000e+00) nnzcol=7578
 interface[2]: 118824x30922 nnz=246816 (123399--123417: 1.000146e+00) nnzcol=7719
 AfterCompG(0): Bal=1.625115e+00 (2.491898e+03, 1.533367e+03, 3.232141e+04)
 AfterCompG(2) /proc/self/status: peakRSS=2011.727  peakVM=2491.898 (2491.898) VmSize=2491.898, (2491.898)MB TotalVirt=32321.406
 AfterCompW(2) /proc/self/status: peakRSS=2011.793  peakVM=2491.898 (2491.898) VmSize=2491.898, (2491.898)MB TotalVirt=32321.406
 ** skipping the computation of F for the symmetric mode **
 AfterCompW(0): Bal=1.625115e+00 (2.491898e+03, 1.533367e+03, 3.232141e+04)
 $$ 0: current nnz 39388620, water mark 39388620 (local F and E: 17297633+17297633)
 AfterCompT(3) /proc/self/status: peakRSS=3384.340  peakVM=3853.930 (3853.930) VmSize=3853.930, (3853.930)MB TotalVirt=41318.141
 AfterSum: Max=4.030035e+03 Min=1.935477e+03
 AfterSum(3) /proc/self/status: peakRSS=3459.312  peakVM=4030.035 (4030.035) VmSize=4030.035, (4030.035)MB TotalVirt=43805.422
 $$ 0: current nnz 40668514, water mark 96769245 (schur: 23952885)
 Total number of nonzeros in E (Lsolve) and F (Usolve): 297801975 297801975
TIME: 
 Solve : 1.58e+02, 1.58e+02 (2.72e+01+1.58e+02, 1.04e-06+1.31e-05)
 SymbT : 9.37e-04, 8.47e-03
 MatVec: 1.03e+02, 7.83e+02
 CommI : 8.37e-06, 3.16e-03 + 4.28e-08, 4.65e-03
 InitS : 2.91e-03, 4.78e-02
 NumS  : 1.18e+00, 1.39e+00 (1.37e+00)
 approximate schur computation took 9.431587e+02 = 9.431587e+02 - 0.000000e+00 seconds (0.000000e+00 on patoh)
  Schur complement  : nnz = 23952885 / 378231152
 AfterCompS(2) /proc/self/status: peakRSS=2879.348  peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB TotalVirt=33699.898
 AfterCompS(0): Bal=1.854420e+00 (2.719289e+03, 1.466383e+03, 3.369990e+04)
calling dsparsify_schur() ... 
 ** applying equilibration on schur complement **
 $$ 0: current nnz 44401126, watermark 96769245 (dblock 3732612).
 # 0: diag block = 1.636064e-01 sec
 calling dpdslin_ldperm(5) to get the max-weight permutation.
 ** mc77 is not linked, will use mc64 (compile with SYM_FACT) **
 ** calling mc64 **
 diagS(2) /proc/self/status: peakRSS=2879.348  peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB TotalVirt=33728.672
 # 0: allgatherv = 5.239535e-02 sec
MC64 count 0/30922..
 # 0: mc64/77 (2.797262e-01 sec)
 # 0: sparsify (4.977919e-07 sec) with tau2=0.000000e+00.
 $$ 0: current nnz 64621399, watermark 96769245 (sparsify 23952885)
 nnz=23952885/378231152 (1.000000e+00 nonzeros are kept).
preprocessing and sparsifying approximate schur took 6.171009e-01 seconds.
 AfterPreProc(0): Bal=1.784342e+00 (2.719289e+03, 1.523973e+03, 3.381303e+04)
 ** factorizing the approximate schur complement **
 SuperLU preconditioner.
 ## 0: using 4x4 processor..
 ** Metis and serial symbolic **
 AfterPreProc(2) /proc/self/status: peakRSS=2879.348  peakVM=3606.250 (3606.250) VmSize=2719.289, (2719.289)MB TotalVirt=33813.027
	Nonzeros in L       219546898
	Nonzeros in U       219546898
	nonzeros in L+U     439062874
	nonzeros in LSUB    4385685
	SYMBfact time: 8.32
	DISTRIBUTE time        7.08
MPI tag upper bound = 2147483647
.. Starting with 1 OpenMP threads 
 === using DAG ===
.. thresh = s_eps 5.960464e-08 * anorm 1.516638e+01 = 9.039866e-07
.. Buffer size: Lsub 3413	Lval 429696	Usub 3513	Uval 429696	LDA 3357
max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520
max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128
[0] .. BIG U bigu_size   400128 (same either on CPU or GPU)
[0] .. BIG V size (on CPU) 131072
  Max row size is 3146 
  Threads per process 1 
max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192
max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520
max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696
max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128
max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520
max_ncols 3340, max_ldu 128, ldt 128, bigu_size=427520
max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128
max_ncols 3126, max_ldu 128, ldt 128, bigu_size=400128
max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192
max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696
max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192
max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696
max_ncols 3189, max_ldu 128, ldt 128, bigu_size=408192
max_ncols 3357, max_ldu 128, ldt 128, bigu_size=429696

Initialization time	    0.04 seconds
	 Serial: compute static schedule, allocate storage

---- Time breakdown in factorization ----
Time in Look-ahead update 	     0.83 seconds
Time in Schur update 		     9.59 seconds
.. Time to Gather L buffer	     0.05  (Separate L panel by Lookahead/Remain)
.. Time to Gather U buffer	     0.13 
.. Time in GEMM     5.72 
	* Look-ahead	     0.33 
	* Remain	     5.40 
.. Time to Scatter     3.67 
	* Look-ahead	     0.21 
	* Remain	     3.45 
Total Time in Factorization            	:    13.80 seconds, 
Total time in Schur update with offload	      0.00 seconds,
--------
GEMM maximum block: 128-128-128
**************************************************
**** Time (seconds) ****
	COLPERM time          16.70
	SYMBFACT time          8.32
	DISTRIBUTE time        7.08
	FACTOR time           13.84
	Factor flops	3.656084e+12	Mflops 	264132.72
	SOLVE time             0.00
**************************************************
 # 0: took 6.577226e+01 seconds for subdomain factorization.
 AfterProc(2) /proc/self/status: peakRSS=14855.555  peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30743.984
 $$ 0: current nnz 479762310, watermark 479762310 (slu: 219546898+219546898)
 $$ 0: current nnz 455809425, watermark 479762310 (free S: 23952885)
 factorization of approximate schur took 6.587741e+01 (1.040747e+03 Precond.) seconds

 AfterProc(0): Bal=1.681742e+00 (2.533020e+03, 1.506188e+03, 3.074398e+04)
 Total preconditioner computation time: 1.051274e+03 seconds.

 solving the system.
 ** computing solution (943695x943695) **
 ** calling fgmres as outer-loop **

 ** # 1: inner-loop(1.000000e-12): 3 iterations
 AfterSol(2) /proc/self/status: peakRSS=14855.555  peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30810.465

Convergence in 1 iterations with rnorm=6.061044e-05.
KSP Object: 16 MPI processes
  type: fgmres
    restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: shell
    no name
  linear system matrix = precond matrix:
  Mat Object: 16 MPI processes
    type: shell
    rows=943695, cols=943695
  MatVec = 2.896956e+00, 2.898476e+00
   + interior solve: 1.360492e+00, 2.879299e+00
   + interface matv: 2.437697e-03, 5.159489e-03
                   : 1.960481e-03, 4.119361e-03
   + interior subtr: 2.251966e-03, 3.252203e-03
   + distribute    : 4.006629e-04, 1.523160e+00
   mat Schur     : 2.290151e+00, 2.897881e+00
  TriSol = 1.116102e+00, 1.118972e+00

  GMatV = 3.904761e-02 3.906426e-02
  GPrec = 5.204885e+00 5.242523e+00
  GSubs = 5.213628e-01 1.079413e+00
 AfterSol(2) /proc/self/status: peakRSS=14855.555  peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30828.473

 ** # 2: inner-loop(1.000000e-12): 3 iterations

Convergence in 1 iterations with rnorm=3.483073e-05.
KSP Object: 16 MPI processes
  type: fgmres
    restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: shell
    no name
  linear system matrix = precond matrix:
  Mat Object: 16 MPI processes
    type: shell
    rows=943695, cols=943695
  MatVec = 5.783775e+00, 5.785303e+00
   + interior solve: 2.722698e+00, 5.748069e+00
   + interface matv: 4.868983e-03, 1.027341e-02
                   : 3.776802e-03, 7.983692e-03
   + interior subtr: 4.371923e-03, 6.350646e-03
   + distribute    : 8.291849e-04, 3.034848e+00
   mat Schur     : 4.562700e+00, 5.784018e+00
  TriSol = 2.249454e+00, 2.252564e+00

  GMatV = 7.510517e-02 7.512476e-02
  GPrec = 1.018488e+01 1.022252e+01
  GSubs = 9.773760e-01 2.014463e+00
 solution time: 1.080445e+01 sec.

 Total solution time: 1.080447e+01 seconds.

 AfterSol(2) /proc/self/status: peakRSS=14855.555  peakVM=15335.969 (15335.969) VmSize=2533.020, (2533.020)MB TotalVirt=30828.828


 ----------- Summary ------------

  Time:
   Partition             : 1.011731e+01 (part.)+ 4.075976e-01 (redist.) sec 
   LU of interior domains: 3.104433e+01 3.104593e+01 sec
   Comp of Schur  compl  : 9.431584e+02 9.431602e+02 sec
                         : LU(subdom.) + comp. Schur => 9.742028e+02 9.742046e+02 sec
   Preprocessing Schur   : 5.500424e-01 6.642934e-01 sec
   LU of Schur compl.    : 6.587770e+01 sec
   Precond. construction : 1.040747e+03 1.040749e+03 sec
   Solution time         : 1.080445e+01 1.080447e+01 sec
   Number of iterations  : outer: 2, subs: 0--0, schur: 0--0, total inner: 6
                         : # of subdom inverse = 0,  # of precond applic. = 6



/////////////////////////////////////////////////////////////////////////////////////////////////////

 Time:
 (1)  Partition: (1a) time for matrix partitioning   : 1.011731e+01 sec
 (1)  Partition: (1b) time for matrix redistribution : 4.075976e-01 sec
 (2) LU time for factorizing interior subdomains: min : 3.104433e+01 ,max: 3.104593e+01 , avg: 3.104513e+01 sec
 (3) time for computing approximate schur  : min: 9.431584e+02 max: 9.431602e+02 avg: 9.431593e+02 sec
 (4) LU(subdom.) + comp. Schur (2)+(3): min: 9.742028e+02 , max: 9.742046e+02 , avg: 9.742037e+02 sec
 (5) time for preprocessing approximate schur   : min: 5.500424e-01 , max: 6.642934e-01 , avg: 6.071679e-01 sec
 (6) time for factorizing approximate schur.    : min: 6.587740e+01  , max : 6.587770e+01 , avg : 6.587755e+01 sec
 (7) total time for preconditioner computation : min: 1.040747e+03 , max : 1.040749e+03 , avg : 1.040748e+03 sec
 (8) total time for solution computation:        : min: 1.080445e+01 , max : 1.080447e+01 , avg : 1.080446e+01 sec
 (9) Number of iterations (9a)  : outer-iterations: 2 , 
 (9) Number of iterations (9b)  : subdomain-iteration for substitution: min: 0 , max : 0 , avg : 1024 
 (9) Number of iterations (9c)  : schur ;subdomain-iteration for mat-vec min: 0 , max : 0 , avg : 1024 
 (9) Number of iterations (9d)  : total number of inner-iterations: 6 
 (10) # of subdom inverse(max(9b) + max(9c)) = 0 

 Balance: (max/min, max/avg) max / min, avg
   domain dim             : (1.46, 1.17 ) 132933 / 91107, 114096
   domain nnz             : (1.48, 1.17 ) 10648883 / 7172181, 9093886
   LU     nnz             : (2.03, 1.39 ) 240551358.00 / 118248828.00, 172462044.25
   interf nnzcol          : (2.59, 1.51 ) 11176 / 4320, 7387
   interf nnzrow          : (3.31, 1.63 ) 11175 / 3375, 6841
   interf nnz             : (2.33, 1.39 ) 158582 / 68058, 114438

 Number of nonzeros in A                   : (dim: 943695 x 943695), local/total
 Number of nonzeros  interior subdomain    : 4376928 / 72751091 (dim: 912773 x 912773)
 Number of nonzeros  interfaces (both)     : 238104 / 3662024 (dim: 912773 x 30922)
 Number of nonzeros in  separator          : 84170 / 1238732 (dim: 30922 x 30922)
  total Number of nonzeros (before dist)    : 10311894 / 77651847

 Number of nonzeros in Preconditioner:
  apx. Schur         : 47549961165168
  apx. Schur (total) : 378231152
  W                  : 297801975, (10502949--28541049)
  G                  : 297801975, (10502949--28541049)
  LU 
   interior          : 1379696354
   Schur complement  : 439093796, (L: 219546898, U: 219546898)
  total precond.     : 47551936465472
  total nnz          : 1818790150
  fill-ratio over A  : 23.42
  A1 is now freed    : 72751091
 reading input file again to compute residual norm.
 zero elements are not checked.

dread_ijv: n=943695 m=943695 nnz=39297771 (77651847).
m_flag 1, c_flag 0

  == rhs #1 ==
  Err. nrm. = 8.733697e-09 / 5.606929e+02 = 1.557662e-11
  Rel. res. = 6.248650e-05 / 1.428504e+10 = 4.374262e-15

 End(2) /proc/self/status: peakRSS=14855.555  peakVM=15335.969 (15335.969) VmSize=2481.246, (2481.246)MB TotalVirt=30089.848

  == rhs #2 ==
  Err. nrm. = 1.508970e-09 / 5.607138e+02 = 2.691160e-12
  Rel. res. = 2.930752e-05 / 1.311318e+10 = 2.234967e-15