superlu_dist options

Barry Smith bsmith at mcs.anl.gov
Fri May 8 17:28:16 CDT 2009


   I don't think we have any parallel sorts in PETSc.

    Barry
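
   PETSc itself does not provide a parallel sort. One workaround, if the
values fit in the memory of a single process, is to gather the Vec onto
rank 0 and sort it there while recording the permutation. A rough sketch,
untested, assuming a real-valued PETSc 3.0 build; the function name is
only illustrative and error checking is omitted:

     #include "petscvec.h"

     /* Gather a distributed Vec onto rank 0 and compute the permutation
        that sorts its entries (real-valued builds only). */
     PetscErrorCode GatherAndSort(Vec x)
     {
       VecScatter  scatter;
       Vec         xseq;
       PetscScalar *a;
       PetscInt    *perm,i,n;

       /* gather the whole vector onto rank 0 (xseq has length 0 elsewhere) */
       VecScatterCreateToZero(x,&scatter,&xseq);
       VecScatterBegin(scatter,x,xseq,INSERT_VALUES,SCATTER_FORWARD);
       VecScatterEnd(scatter,x,xseq,INSERT_VALUES,SCATTER_FORWARD);

       VecGetLocalSize(xseq,&n);          /* n is 0 on all ranks except 0 */
       VecGetArray(xseq,&a);
       PetscMalloc((n+1)*sizeof(PetscInt),&perm);
       for (i=0; i<n; i++) perm[i] = i;
       /* computes perm so that a[perm[i]] is nondecreasing; a is not moved */
       PetscSortRealWithPermutation(n,(PetscReal*)a,perm);
       /* ... use a[] and perm[] on rank 0 ... */
       VecRestoreArray(xseq,&a);
       PetscFree(perm);
       VecScatterDestroy(scatter);
       VecDestroy(xseq);
       return 0;
     }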


On May 8, 2009, at 5:26 PM, Fredrik Bengzon wrote:

> Hi again,
> I resorted to using MUMPS, which seems to scale very well, in SLEPc.
> However, I have another question: how do you sort an MPI vector in
> PETSc, and can you also get the permutation?
> /Fredrik
>
>
> Barry Smith wrote:
>>
>> On May 8, 2009, at 11:03 AM, Matthew Knepley wrote:
>>
>>> Look at the timing. The symbolic factorization takes 1e-4 seconds  
>>> and the numeric takes
>>> only 10s, out of 542s. MatSolve is taking 517s. If you have a  
>>> problem, it is likely there.
>>> However, the MatSolve looks balanced.
>>
>>   Something is funky here. The 28 solves should not take so much
>> longer than the numeric factorization.
>> Perhaps it is worth saving the matrix and reporting this as a
>> performance bug to Sherrie.
>>
>>   Barry
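>>
>>   (A minimal sketch of saving the matrix in PETSc binary format so it
>> can accompany a bug report; "A" and the file name are placeholders,
>> PETSc 3.0 calling sequence, error checking omitted:)
>>
>>     PetscViewer viewer;
>>     PetscViewerBinaryOpen(PETSC_COMM_WORLD,"A.bin",FILE_MODE_WRITE,&viewer);
>>     MatView(A,viewer);
>>     PetscViewerDestroy(viewer);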
>>
>>>
>>>
>>>  Matt
>>>
>>> On Fri, May 8, 2009 at 10:59 AM, Fredrik Bengzon <fredrik.bengzon at math.umu.se 
>>> > wrote:
>>> Hi,
>>> Here is the output from the KSP and EPS objects, and the log  
>>> summary.
>>> / Fredrik
>>>
>>>
>>> Reading Triangle/Tetgen mesh
>>> #nodes=19345
>>> #elements=81895
>>> #nodes per element=4
>>> Partitioning mesh with METIS 4.0
>>> Element distribution (rank | #elements)
>>> 0 | 19771
>>> 1 | 20954
>>> 2 | 20611
>>> 3 | 20559
>>> rank 1 has 257 ghost nodes
>>> rank 0 has 127 ghost nodes
>>> rank 2 has 143 ghost nodes
>>> rank 3 has 270 ghost nodes
>>> Calling 3D Navier-Lame Eigenvalue Solver
>>> Assembling stiffness and mass matrix
>>> Solving eigensystem with SLEPc
>>> KSP Object:(st_)
>>> type: preonly
>>> maximum iterations=100000, initial guess is zero
>>> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>> left preconditioning
>>> PC Object:(st_)
>>> type: lu
>>>  LU: out-of-place factorization
>>>    matrix ordering: natural
>>>  LU: tolerance for zero pivot 1e-12
>>> EPS Object:
>>> problem type: generalized symmetric eigenvalue problem
>>> method: krylovschur
>>> extraction type: Rayleigh-Ritz
>>> selected portion of the spectrum: largest eigenvalues in magnitude
>>> number of eigenvalues (nev): 4
>>> number of column vectors (ncv): 19
>>> maximum dimension of projected problem (mpd): 19
>>> maximum number of iterations: 6108
>>> tolerance: 1e-05
>>> dimension of user-provided deflation space: 0
>>> IP Object:
>>>  orthogonalization method:   classical Gram-Schmidt
>>>  orthogonalization refinement:   if needed (eta: 0.707100)
>>> ST Object:
>>>  type: sinvert
>>>  shift: 0
>>> Matrices A and B have same nonzero pattern
>>>    Associated KSP object
>>>    ------------------------------
>>>    KSP Object:(st_)
>>>      type: preonly
>>>      maximum iterations=100000, initial guess is zero
>>>      tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>      left preconditioning
>>>    PC Object:(st_)
>>>      type: lu
>>>        LU: out-of-place factorization
>>>          matrix ordering: natural
>>>        LU: tolerance for zero pivot 1e-12
>>>        LU: factor fill ratio needed 0
>>>             Factored matrix follows
>>>            Matrix Object:
>>>              type=mpiaij, rows=58035, cols=58035
>>>              package used to perform factorization: superlu_dist
>>>              total: nonzeros=0, allocated nonzeros=116070
>>>                SuperLU_DIST run parameters:
>>>                  Process grid nprow 2 x npcol 2
>>>                  Equilibrate matrix TRUE
>>>                  Matrix input mode 1
>>>                  Replace tiny pivots TRUE
>>>                  Use iterative refinement FALSE
>>>                  Processors in row 2 col partition 2
>>>                  Row permutation LargeDiag
>>>                  Column permutation PARMETIS
>>>                  Parallel symbolic factorization TRUE
>>>                  Repeated factorization SamePattern
>>>      linear system matrix = precond matrix:
>>>      Matrix Object:
>>>        type=mpiaij, rows=58035, cols=58035
>>>        total: nonzeros=2223621, allocated nonzeros=2233584
>>>          using I-node (on process 0) routines: found 4695 nodes,  
>>> limit used is 5
>>>    ------------------------------
>>> Number of iterations in the eigensolver: 1
>>> Number of requested eigenvalues: 4
>>> Stopping condition: tol=1e-05, maxit=6108
>>> Number of converged eigenpairs: 8
>>>
>>> Writing binary .vtu file /scratch/fredrik/output/mode-0.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-1.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-2.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-3.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-4.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-5.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-6.vtu
>>> Writing binary .vtu file /scratch/fredrik/output/mode-7.vtu
>>> ************************************************************************************************************************
>>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use  
>>> 'enscript -r -fCourier9' to print this document            ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance  
>>> Summary: ----------------------------------------------
>>>
>>> /home/fredrik/Hakan/cmlfet/a.out on a linux-gnu named medusa1 with  
>>> 4 processors, by fredrik Fri May  8 17:57:28 2009
>>> Using Petsc Release Version 3.0.0, Patch 5, Mon Apr 13 09:15:37  
>>> CDT 2009
>>>
>>>                       Max       Max/Min        Avg      Total
>>> Time (sec):           5.429e+02      1.00001   5.429e+02
>>> Objects:              1.380e+02      1.00000   1.380e+02
>>> Flops:                1.053e+08      1.05695   1.028e+08  4.114e+08
>>> Flops/sec:            1.939e+05      1.05696   1.894e+05  7.577e+05
>>> Memory:               5.927e+07      1.03224              2.339e+08
>>> MPI Messages:         2.880e+02      1.51579   2.535e+02  1.014e+03
>>> MPI Message Lengths:  4.868e+07      1.08170   1.827e+05  1.853e+08
>>> MPI Reductions:       1.122e+02      1.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type  
>>> (multiply/divide/add/subtract)
>>>                          e.g., VecAXPY() for real vectors of  
>>> length N --> 2N flops
>>>                          and VecAXPY() for complex vectors of  
>>> length N --> 8N flops
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flops -----  ---  
>>> Messages ---  -- Message Lengths --  -- Reductions --
>>>                      Avg     %Total     Avg     %Total   counts    
>>> %Total     Avg         %Total   counts   %Total
>>> 0:      Main Stage: 5.4292e+02 100.0%  4.1136e+08 100.0%  1.014e 
>>> +03 100.0%  1.827e+05      100.0%  3.600e+02  80.2%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on  
>>> interpreting output.
>>> Phase summary info:
>>> Count: number of times phase was executed
>>> Time and Flops: Max - maximum over all processors
>>>                 Ratio - ratio of maximum to minimum over all  
>>> processors
>>> Mess: number of messages sent
>>> Avg. len: average message length
>>> Reduct: number of global reductions
>>> Global: entire computation
>>> Stage: stages of a computation. Set stages with  
>>> PetscLogStagePush() and PetscLogStagePop().
>>>    %T - percent time in this phase         %F - percent flops in  
>>> this phase
>>>    %M - percent messages in this phase     %L - percent message  
>>> lengths in this phase
>>>    %R - percent reductions in this phase
>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max  
>>> time over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>    ##########################################################
>>>    #                                                        #
>>>    #                          WARNING!!!                    #
>>>    #                                                        #
>>>    #   This code was compiled with a debugging option,      #
>>>    #   To get timing results run config/configure.py        #
>>>    #   using --with-debugging=no, the performance will      #
>>>    #   be generally two or three times faster.              #
>>>    #                                                        #
>>>    ##########################################################
>>>
>>>
>>> Event                Count      Time (sec)      
>>> Flops                             --- Global ---  --- Stage ---    
>>> Total
>>>                 Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg  
>>> len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> STSetUp                1 1.0 1.0467e+01 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 8.0e+00  2  0  0  0  2   2  0  0  0  2     0
>>> STApply               28 1.0 5.1775e+02 1.0 3.15e+07 1.1 1.7e+02  
>>> 4.2e+03 2.8e+01 95 30 17  0  6  95 30 17  0  8     0
>>> EPSSetUp               1 1.0 1.0482e+01 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 4.6e+01  2  0  0  0 10   2  0  0  0 13     0
>>> EPSSolve               1 1.0 3.7193e+02 1.0 9.59e+07 1.1 3.5e+02  
>>> 4.2e+03 9.7e+01 69 91 35  1 22  69 91 35  1 27     1
>>> IPOrthogonalize       19 1.0 3.4406e-01 1.1 6.75e+07 1.1 2.3e+02  
>>> 4.2e+03 7.6e+01  0 64 22  1 17   0 64 22  1 21   767
>>> IPInnerProduct       153 1.0 3.1410e-01 1.0 5.63e+07 1.1 2.3e+02  
>>> 4.2e+03 3.9e+01  0 53 23  1  9   0 53 23  1 11   700
>>> IPApplyMatrix         39 1.0 2.4903e-01 1.1 4.38e+07 1.1 2.3e+02  
>>> 4.2e+03 0.0e+00  0 42 23  1  0   0 42 23  1  0   687
>>> UpdateVectors          1 1.0 4.2958e-03 1.2 4.51e+06 1.1 0.0e+00  
>>> 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  4107
>>> VecDot                 1 1.0 5.6815e-04 4.7 2.97e+04 1.1 0.0e+00  
>>> 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0   204
>>> VecNorm                8 1.0 2.5260e-03 3.2 2.38e+05 1.1 0.0e+00  
>>> 0.0e+00 8.0e+00  0  0  0  0  2   0  0  0  0  2   368
>>> VecScale              27 1.0 5.9605e-04 1.1 4.01e+05 1.1 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2629
>>> VecCopy               53 1.0 4.0610e-03 1.4 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet                77 1.0 6.2165e-03 1.1 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAXPY               38 1.0 2.7709e-03 1.7 1.13e+06 1.1 0.0e+00  
>>> 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1592
>>> VecMAXPY              38 1.0 2.5925e-02 1.1 1.13e+07 1.1 0.0e+00  
>>> 0.0e+00 0.0e+00  0 11  0  0  0   0 11  0  0  0  1701
>>> VecAssemblyBegin       5 1.0 9.0070e-03 2.3 0.00e+00 0.0 3.6e+01  
>>> 2.1e+04 1.5e+01  0  0  4  0  3   0  0  4  0  4     0
>>> VecAssemblyEnd         5 1.0 3.4809e-04 1.1 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin       73 1.0 8.5931e-03 1.5 0.00e+00 0.0 4.6e+02  
>>> 8.9e+03 0.0e+00  0  0 45  2  0   0  0 45  2  0     0
>>> VecScatterEnd         73 1.0 2.2542e-02 2.2 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecReduceArith        76 1.0 3.0838e-02 1.1 1.24e+07 1.1 0.0e+00  
>>> 0.0e+00 0.0e+00  0 12  0  0  0   0 12  0  0  0  1573
>>> VecReduceComm         38 1.0 4.8040e-02 2.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 3.8e+01  0  0  0  0  8   0  0  0  0 11     0
>>> VecNormalize           8 1.0 2.7280e-03 2.8 3.56e+05 1.1 0.0e+00  
>>> 0.0e+00 8.0e+00  0  0  0  0  2   0  0  0  0  2   511
>>> MatMult               67 1.0 4.1397e-01 1.1 7.53e+07 1.1 4.0e+02  
>>> 4.2e+03 0.0e+00  0 71 40  1  0   0 71 40  1  0   710
>>> MatSolve              28 1.0 5.1757e+02 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00 95  0  0  0  0  95  0  0  0  0     0
>>> MatLUFactorSym         1 1.0 3.6097e-04 1.1 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatLUFactorNum         1 1.0 1.0464e+01 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatAssemblyBegin       9 1.0 3.3842e-0146.7 0.00e+00 0.0 5.4e+01  
>>> 6.0e+04 8.0e+00  0  0  5  2  2   0  0  5  2  2     0
>>> MatAssemblyEnd         9 1.0 2.3042e-01 1.0 0.00e+00 0.0 3.6e+01  
>>> 9.4e+02 3.1e+01  0  0  4  0  7   0  0  4  0  9     0
>>> MatGetRow           5206 1.1 3.1164e-03 1.1 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetSubMatrice       5 1.0 8.7580e-01 1.2 0.00e+00 0.0 1.5e+02  
>>> 1.1e+06 2.5e+01  0  0 15 88  6   0  0 15 88  7     0
>>> MatZeroEntries         2 1.0 1.0233e-02 1.1 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatView                2 1.0 1.0149e-03 2.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  1     0
>>> KSPSetup               1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve              28 1.0 5.1758e+02 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 2.8e+01 95  0  0  0  6  95  0  0  0  8     0
>>> PCSetUp                1 1.0 1.0467e+01 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 8.0e+00  2  0  0  0  2   2  0  0  0  2     0
>>> PCApply               28 1.0 5.1757e+02 1.0 0.00e+00 0.0 0.0e+00  
>>> 0.0e+00 0.0e+00 95  0  0  0  0  95  0  0  0  0     0
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions   Memory   
>>> Descendants' Mem.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> Spectral Transform     1              1        536     0
>>> Eigenproblem Solver     1              1        824     0
>>>     Inner product     1              1        428     0
>>>         Index Set    38             38    1796776     0
>>> IS L to G Mapping     1              1      58700     0
>>>               Vec    65             65    5458584     0
>>>       Vec Scatter     9              9       7092     0
>>> Application Order     1              1     155232     0
>>>            Matrix    17             16   17715680     0
>>>     Krylov Solver     1              1        832     0
>>>    Preconditioner     1              1        744     0
>>>            Viewer     2              2       1088     0
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 1.90735e-07
>>> Average time for MPI_Barrier(): 5.9557e-05
>>> Average time for zero size MPI_Send(): 2.97427e-05
>>> #PETSc Option Table entries:
>>> -log_summary
>>> -mat_superlu_dist_parsymbfact
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8  
>>> sizeof(PetscScalar) 8
>>> Configure run at: Wed May  6 15:14:39 2009
>>> Configure options: --download-superlu_dist=1 --download-parmetis=1  
>>> --with-mpi-dir=/usr/lib/mpich --with-shared=0
>>> -----------------------------------------
>>> Libraries compiled on Wed May  6 15:14:49 CEST 2009 on medusa1
>>> Machine characteristics: Linux medusa1 2.6.18-6-amd64 #1 SMP Fri  
>>> Dec 12 05:49:32 UTC 2008 x86_64 GNU/Linux
>>> Using PETSc directory: /home/fredrik/Hakan/cmlfet/external/ 
>>> petsc-3.0.0-p5
>>> Using PETSc arch: linux-gnu-c-debug
>>> -----------------------------------------
>>> Using C compiler: /usr/lib/mpich/bin/mpicc -Wall -Wwrite-strings - 
>>> Wno-strict-aliasing -g3  Using Fortran compiler: /usr/lib/mpich/ 
>>> bin/mpif77 -Wall -Wno-unused-variable -g    
>>> -----------------------------------------
>>> Using include paths: -I/home/fredrik/Hakan/cmlfet/external/ 
>>> petsc-3.0.0-p5/linux-gnu-c-debug/include -I/home/fredrik/Hakan/ 
>>> cmlfet/external/petsc-3.0.0-p5/include -I/home/fredrik/Hakan/ 
>>> cmlfet/external/petsc-3.0.0-p5/linux-gnu-c-debug/include -I/usr/ 
>>> lib/mpich/include  ------------------------------------------
>>> Using C linker: /usr/lib/mpich/bin/mpicc -Wall -Wwrite-strings - 
>>> Wno-strict-aliasing -g3
>>> Using Fortran linker: /usr/lib/mpich/bin/mpif77 -Wall -Wno-unused- 
>>> variable -g Using libraries: -Wl,-rpath,/home/fredrik/Hakan/cmlfet/ 
>>> external/petsc-3.0.0-p5/linux-gnu-c-debug/lib -L/home/fredrik/ 
>>> Hakan/cmlfet/external/petsc-3.0.0-p5/linux-gnu-c-debug/lib - 
>>> lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec - 
>>> lpetsc        -lX11 -Wl,-rpath,/home/fredrik/Hakan/cmlfet/external/ 
>>> petsc-3.0.0-p5/linux-gnu-c-debug/lib -L/home/fredrik/Hakan/cmlfet/ 
>>> external/petsc-3.0.0-p5/linux-gnu-c-debug/lib -lsuperlu_dist_2.3 - 
>>> llapack -lblas -lparmetis -lmetis -lm -L/usr/lib/mpich/lib -L/usr/ 
>>> lib/gcc/x86_64-linux-gnu/4.1.2 -L/usr/lib64 -L/lib64 -ldl -lmpich - 
>>> lpthread -lrt -lgcc_s -lg2c -lm -L/usr/lib/gcc/x86_64-linux-gnu/ 
>>> 3.4.6 -L/lib -lm -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
>>> ------------------------------------------
>>>
>>> real    9m10.616s
>>> user    0m23.921s
>>> sys    0m6.944s
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Satish Balay wrote:
>>> Just a note about scalability: it's a function of the hardware as
>>> well. For proper scalability studies you'll need a true distributed
>>> system with a fast network [not SMP nodes].
>>>
>>> Satish
>>>
>>> On Fri, 8 May 2009, Fredrik Bengzon wrote:
>>>
>>>
>>> Hong,
>>> Thank you for the suggestions, but I have looked at the EPS and
>>> KSP objects and I cannot find anything wrong. The problem is that it
>>> takes longer to solve with 4 CPUs than with 2, so the scalability
>>> seems to be absent when using superlu_dist. I have stored my mass
>>> and stiffness matrices in the mpiaij format and just passed them on
>>> to SLEPc. When using the PETSc iterative Krylov solvers I see 100%
>>> workload on all processors, but when I switch to superlu_dist only
>>> two CPUs seem to do the whole work of the LU factorization. I don't
>>> want to use a Krylov solver, though, since it might cause SLEPc not
>>> to converge.
>>> Regards,
>>> Fredrik
>>>
>>> Hong Zhang wrote:
>>>
>>> Run your code with '-eps_view -ksp_view' for checking
>>> which methods are used
>>> and '-log_summary' to see which operations dominate
>>> the computation.
>>>
>>> You can turn on parallel symbolic factorization
>>> with '-mat_superlu_dist_parsymbfact'.
>>>
>>> Unless you use a large number of processors, the symbolic
>>> factorization takes negligible execution time. The numeric
>>> factorization usually dominates.
>>>
>>> Hong
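>>>
>>> (For example, matching the 4-process run shown in the log above; the
>>> launcher name depends on the MPI installation:)
>>>
>>>   mpirun -np 4 ./a.out -eps_view -ksp_view -log_summary \
>>>          -mat_superlu_dist_parsymbfact
>>>
>>> Note that SuperLU_DIST settings like this are typically passed through
>>> the runtime options database (or PetscOptionsSetValue() in code)
>>> rather than through MatSetOption().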
>>>
>>> On Fri, 8 May 2009, Fredrik Bengzon wrote:
>>>
>>>
>>> Hi PETSc team,
>>> Sorry for posting questions not really concerning the PETSc core,
>>> but when I run superlu_dist from within SLEPc I notice that the load
>>> balance is poor. It is just fine during assembly (I use METIS to
>>> partition my finite element mesh) but it changes dramatically when
>>> the SLEPc solver is called. I use superlu_dist as the solver for the
>>> eigenvalue iteration. My question is: can this have something to do
>>> with the fact that the option 'Parallel symbolic factorization' is
>>> set to false? If so, can I change the superlu_dist options using
>>> MatSetOption, for instance? Also, does this mean that superlu_dist
>>> is not using ParMETIS to reorder the matrix?
>>> Best Regards,
>>> Fredrik Bengzon
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener
>>
>>
>


