[petsc-users] Fwd: superlu_dist with same_nonzero_pattern

Fri Jun 24 11:11:05 CDT 2011

  Use the KSP routines, there is no reason to use the low-level routines, they will not be any less efficient.

   Barry

On Jun 24, 2011, at 10:20 AM, Xiangdong Liang wrote:

> Thanks for helping me check the code.
> 
> I agree that moving the  KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN)
> outside the iteration loop lifted the crash of superlu_dist. However,
> I have the following concerns:
> 
> 1)  If we only call KSPSetOperators once outside the loop, the ksp
> solver would be the same for all the iterations, although matrix M is
> changing during each iteration.  I added the following code to
> manually check the error of the solution,
> 
>  ierr = MatMult(M,x, xdiff);CHKERRQ(ierr);
>  ierr = VecAXPY(xdiff,-1.0,J);CHKERRQ(ierr);
>  ierr = VecNorm(xdiff,NORM_2,&norm);CHKERRQ(ierr);
>  ierr = PetscPrintf(PETSC_COMM_WORLD,"---Norm of error %A----\n
> ",norm);CHKERRQ(ierr);
> 
> As far as I get, the error are huge (1e+6) except the first iteration.
> It seems that it keeps solving Mx=J by using the non-updated M.
> 
> 2) Actually, I really want to switch between SAME_NONZERO_PATTERN and
> SAME_PRECONDITIONER. In my problem, matrix M changes only a little bit
> during each step. I am planing to only use the sparse direct solver
> every 10 steps, because the LU decomposition is very expensive. The
> other steps I simply use the iterative methods (GMRES) with the
> preconditioner obtained in the previous LU decomposition.
> 
> As we tested, Pastix performs well for this switch. It would be nice
> if superlu_dist also works for this switch. The other sparse direct
> solvers are just too slow for my 3D problem.
> 
> 3) I am also think to use the low level functions like:
> 
> Outside the loop:
> MatGetOrdering(Mat matrix,MatOrderingType type,IS* rowperm,IS* colperm);
> MatGetFactor(Mat matrix,const MatSolverPackage package,MatFactorType
> ftype,Mat *factor);
> MatLUFactorSymbolic(Mat factor,Mat matrix,IS rowperm,IS colperm,const
> MatFactorInfo *info);
> 
> Inside the loop:
> MatLUFactorNumeric(Mat factor,Mat matrix,const MatFactorInfo *info);
> MatSolve(Mat A,Vec x, Vec y);
> 
> Are they equivalent to SAME_NONZERO_PATTERN? Do you have an idea
> whether it will perform well or worse than the ksp version?
> 
> Thank you very much.
> 
> Xiangdong
> 
> 
> On Thu, Jun 23, 2011 at 11:27 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>> Xiangdong :
>> I tested your code. Here is what I get:
>> 
>> 1. your code does not build. I made following changes:
>> < PetscErrorCode MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int
>> Nz, Vec muinvpml,Mat *Aout)
>> ---
>>> Mat MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int Nz, Vec muinvpml)
>> 
>> and get it build on an iMac and a linux machine
>> 
>> 2. code hangs at
>> if (rank == 0) fgetc(stdin);
>> I comment it out.
>> 
>> 3. then I can reproduce your error with superlu_dist, np=2, and
>> SAME_NONZERO_PATTERN.
>> KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN)
>> should be called once, not inside your iteration loop. Move it out of
>> the loop fix the problem.
>> Why other packages work? I do not know :-(
>> 
>> 4. the modified code works well on the linux machine with np=1, 2, 4 etc.
>> However, on iMac, even with np=1, at the end of 1st iteration, I get
>> 
>>  [0]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [0]PETSC ERROR: Floating point exception!
>> [0]PETSC ERROR: Infinite or not-a-number generated in norm!
>> ...
>> [0]PETSC ERROR: VecNorm() line 167 in src/vec/vec/interface/rvector.c
>> [0]PETSC ERROR: main() line 93 in test.c
>> 
>> Checking solution x, I see
>> '-inf' in several components.
>> Using mumps ('-pc_factor_mat_solver_package mumps') with np=2, I get same error.
>> However, sequential superlu, mumps and petsc all run well.
>> It is likely your model is numerically sensitive, not problem with
>> these packages.
>> 
>> The modified code is attached. Good luck!
>> 
>> Hong
>> 
>> 
>> 
>>> Thanks, Hong. I tried the runtime option -mat_superlu_dist_equil NO.
>>> However, the problem is still there.
>>> 
>>> I've upload my short codes here:
>>> http://math.mit.edu/~xdliang/superlu_dist_test.zip
>>> 
>>> In this code, I first generate my sparse matrix M with dimension
>>> N-by-N, then solve Mx=J for a few times. Each time, I only modify the
>>> entries of two diagonals: (i,i) and  ( i, (i+N/2)%N). Althougth I
>>> modify these entries, I thought the nonzero pattern should not change.
>>> Based on my tests, I found that:
>>> 
>>> 1 superlu_dist works fine with DIFFERENT_NONZERO_PATTERN. However, if
>>> I switch to SAME_NONZERO_PATTERN, the program either crashes (with np
>>>> 1 ) or does not converge ( with single processor).
>>> 
>>> 2 pastix works fine with SAME_NONZERO_PATTERN, but it halts there if I
>>> switch to DIFFERENT_NONZERO_PATTERN ( with np>1).
>>> 
>>> 3 spooles works fine with both SAME and DIFFERENT nonzero pattern.
>>> 
>>> Can you give me some hints why superlu_dist fails on my matrix with
>>> SAME_NONZERO_PATTERN? I tried both superlu_dist v2.4 and v2.5 ( latest
>>> one). Thanks.
>>> 
>>> Best,
>>> Xiangdong
>>> 
>>> 
>>> 
>>> On Wed, Jun 22, 2011 at 2:06 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>>>> Xiangdong :
>>>> 
>>>> Does it hangs inside superlu_dist?
>>>> Try runtime option '-mat_superlu_dist_equil NO'
>>>> 
>>>> If you can send us a stand-alone short code or your matrix in petsc
>>>> binary format that repeats this behavior, then we can investigate it.
>>>> Likely memory problem.
>>>> 
>>>> Hong
>>>> 
>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> I had some problem in using superlu_dist with same_nonzero_pattern.
>>>>> Sometimes superlu_dist solver halts there and never finishes. However,
>>>>> when I switched to different_nonzero_pattern or other solvers, the
>>>>> program works nicely.  Ding had reported this problem a few months
>>>>> ago:
>>>>>  http://lists.mcs.anl.gov/pipermail/petsc-users/2011-January/007623.html
>>>>> 
>>>>> Following Barry's suggestion, I checked that the nonzero pattern of my
>>>>> matrix is still the same via
>>>>>  ierr=MatSetOption(M, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE); CHKERRQ(ierr);
>>>>> 
>>>>> I tried two other options:
>>>>> 
>>>>> 1 First install the latest superlu_dist (v2.5), then compile petsc
>>>>> with this new superlu_dist package. However, the problem is still
>>>>> there, and the performance is even worse ( comparing to the option
>>>>> --download-superlu_dist=yes (v2.4)).
>>>>> 
>>>>> 2 I compiled petsc-dev with --download-superlu_dist. However, I got
>>>>> some errors when compiling my program with petsc-dev. It said that
>>>>> 
>>>>> ResonatorOpt.c: In function ‘main’:
>>>>> ResonatorOpt.c:154: error: ‘MAT_SOLVER_SUPERLU_DIST’ undeclared (first
>>>>> use in this function)
>>>>> ResonatorOpt.c:154: error: (Each undeclared identifier is reported only once
>>>>> ResonatorOpt.c:154: error: for each function it appears in.)
>>>>> 
>>>>> Besides superlu_dist, MAT_SOLVER_PASTIX and MAT_SOLVER_PASTIX are
>>>>> undeclared either. Am I missing some header files? I used the same
>>>>> procedure as I complied regular petsc, which had no such problems.  (I
>>>>> tried to attach the configure log in a previous mail, but it is
>>>>> bounced back.)
>>>>> 
>>>>> Can you give me some suggestions on using superlu_dist with
>>>>> same_nonzero_pattern? Thank you.
>>>>> 
>>>>> Best,
>>>>> Xiangdong
>>>>> 
>>>> 
>>> 
>>