[petsc-users] Fwd: superlu_dist with same_nonzero_pattern

Xiangdong Liang xdliang at gmail.com
Fri Jun 24 10:20:46 CDT 2011


Thanks for helping me check the code.

I agree that moving the KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN)
call outside the iteration loop eliminated the superlu_dist crash.
However, I have the following concerns:

1) If we only call KSPSetOperators once outside the loop, the KSP
solver stays the same for all iterations, even though the matrix M
changes at each iteration. I added the following code to check the
error of the solution manually:

  ierr = MatMult(M,x,xdiff);CHKERRQ(ierr);
  ierr = VecAXPY(xdiff,-1.0,J);CHKERRQ(ierr);
  ierr = VecNorm(xdiff,NORM_2,&norm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"---Norm of error %A----\n",norm);CHKERRQ(ierr);

As far as I can tell, the error is huge (1e+6) at every iteration
except the first. It seems the solver keeps solving Mx=J with the
non-updated M.

2) Actually, I really want to switch between SAME_NONZERO_PATTERN and
SAME_PRECONDITIONER. In my problem, the matrix M changes only a little
at each step. I am planning to use the sparse direct solver only every
10 steps, because the LU decomposition is very expensive. For the
other steps I simply use an iterative method (GMRES) with the
preconditioner obtained from the previous LU decomposition.
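Concretely, the switching I have in mind would look roughly like this
(a sketch only; `nsteps` and the matrix-update step are placeholders
for my actual code):

  /* Refactor with the direct solver every 10 steps; otherwise reuse
     the previous LU factorization as the preconditioner for GMRES. */
  for (step = 0; step < nsteps; step++) {
    /* ... update the entries of M (same nonzero pattern) ... */
    if (step % 10 == 0) {
      ierr = KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN);CHKERRQ(ierr); /* fresh LU */
    } else {
      ierr = KSPSetOperators(ksp,M,M,SAME_PRECONDITIONER);CHKERRQ(ierr);  /* keep old LU */
    }
    ierr = KSPSolve(ksp,J,x);CHKERRQ(ierr);
  }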

In our tests, Pastix performs well with this switch. It would be nice
if superlu_dist also worked with it; the other sparse direct solvers
are just too slow for my 3D problem.

3) I am also thinking of using the low-level functions:

Outside the loop:
MatGetOrdering(Mat matrix,MatOrderingType type,IS *rowperm,IS *colperm);
MatGetFactor(Mat matrix,const MatSolverPackage package,MatFactorType ftype,Mat *factor);
MatLUFactorSymbolic(Mat factor,Mat matrix,IS rowperm,IS colperm,const MatFactorInfo *info);

Inside the loop:
MatLUFactorNumeric(Mat factor,Mat matrix,const MatFactorInfo *info);
MatSolve(Mat A,Vec x,Vec y);
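Put together, I imagine it would look roughly like this (a sketch; the
ordering type, `nsteps`, and the update step are placeholders for my
actual code):

  Mat           F;
  IS            rowperm, colperm;
  MatFactorInfo info;

  /* outside the loop: the symbolic factorization is done once */
  ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
  ierr = MatGetOrdering(M,MATORDERING_NATURAL,&rowperm,&colperm);CHKERRQ(ierr);
  ierr = MatGetFactor(M,MAT_SOLVER_SUPERLU_DIST,MAT_FACTOR_LU,&F);CHKERRQ(ierr);
  ierr = MatLUFactorSymbolic(F,M,rowperm,colperm,&info);CHKERRQ(ierr);

  /* inside the loop: only the numeric factorization is redone */
  for (step = 0; step < nsteps; step++) {
    /* ... update the entries of M (same nonzero pattern) ... */
    ierr = MatLUFactorNumeric(F,M,&info);CHKERRQ(ierr);
    ierr = MatSolve(F,J,x);CHKERRQ(ierr);  /* solves M x = J */
  }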

Are these calls equivalent to SAME_NONZERO_PATTERN? Do you have an
idea whether this will perform better or worse than the KSP version?

Thank you very much.

Xiangdong


On Thu, Jun 23, 2011 at 11:27 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
> Xiangdong :
> I tested your code. Here is what I get:
>
> 1. your code does not build. I made the following change (diff output):
> < PetscErrorCode MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int Nz, Vec muinvpml, Mat *Aout)
> ---
> > Mat MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int Nz, Vec muinvpml)
>
> and got it to build on an iMac and a linux machine.
>
> 2. the code hangs at
> if (rank == 0) fgetc(stdin);
> so I commented it out.
>
> 3. then I can reproduce your error with superlu_dist, np=2, and
> SAME_NONZERO_PATTERN.
> KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN)
> should be called once, not inside your iteration loop. Moving it out
> of the loop fixes the problem (see the sketch below).
> Why the other packages work, I do not know :-(
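>
> A minimal sketch of the placement (the loop bound `niter` is illustrative):
>
>   ierr = KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN);CHKERRQ(ierr); /* once */
>   for (i = 0; i < niter; i++) {
>     /* ... KSPSolve(ksp,J,x) and the rest of the loop body ... */
>   }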
>
> 4. the modified code works well on the linux machine with np=1, 2, 4, etc.
> However, on the iMac, even with np=1, at the end of the 1st iteration I get
>
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Floating point exception!
> [0]PETSC ERROR: Infinite or not-a-number generated in norm!
> ...
> [0]PETSC ERROR: VecNorm() line 167 in src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: main() line 93 in test.c
>
> Checking the solution x, I see '-inf' in several components.
> Using mumps ('-pc_factor_mat_solver_package mumps') with np=2, I get
> the same error. However, sequential superlu, mumps and petsc all run
> well. It is likely that your model is numerically sensitive, not a
> problem with these packages.
>
> The modified code is attached. Good luck!
>
> Hong
>
>
>
>> Thanks, Hong. I tried the runtime option -mat_superlu_dist_equil NO.
>> However, the problem is still there.
>>
>> I've uploaded my short code here:
>> http://math.mit.edu/~xdliang/superlu_dist_test.zip
>>
>> In this code, I first generate my sparse matrix M with dimension
>> N-by-N, then solve Mx=J a few times. Each time, I only modify the
>> entries of two diagonals: (i,i) and (i,(i+N/2)%N). Although I
>> modify these entries, I thought the nonzero pattern should not change.
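>>
>> A sketch of the per-step update (the new values are illustrative; the
>> real code computes them; Istart/Iend come from MatGetOwnershipRange):
>>
>>   PetscInt    cols[2];
>>   PetscScalar vals[2];
>>   for (i = Istart; i < Iend; i++) {
>>     cols[0] = i; cols[1] = (i + N/2) % N;
>>     vals[0] = 2.0; vals[1] = -1.0;  /* illustrative; same positions every step */
>>     ierr = MatSetValues(M,1,&i,2,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
>>   }
>>   ierr = MatAssemblyBegin(M,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(M,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);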
>> Based on my tests, I found that:
>>
>> 1 superlu_dist works fine with DIFFERENT_NONZERO_PATTERN. However, if
>> I switch to SAME_NONZERO_PATTERN, the program either crashes (with
>> np > 1) or does not converge (with a single processor).
>>
>> 2 pastix works fine with SAME_NONZERO_PATTERN, but it hangs if I
>> switch to DIFFERENT_NONZERO_PATTERN (with np > 1).
>>
>> 3 spooles works fine with both SAME and DIFFERENT nonzero patterns.
>>
>> Can you give me some hints on why superlu_dist fails on my matrix
>> with SAME_NONZERO_PATTERN? I tried both superlu_dist v2.4 and v2.5
>> (the latest). Thanks.
>>
>> Best,
>> Xiangdong
>>
>>
>>
>> On Wed, Jun 22, 2011 at 2:06 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>>> Xiangdong :
>>>
>>> Does it hang inside superlu_dist?
>>> Try the runtime option '-mat_superlu_dist_equil NO'.
>>>
>>> If you can send us a stand-alone short code or your matrix in petsc
>>> binary format that reproduces this behavior, we can investigate it.
>>> It is likely a memory problem.
>>>
>>> Hong
>>>
>>>
>>>> Hello everyone,
>>>>
>>>> I had a problem using superlu_dist with SAME_NONZERO_PATTERN.
>>>> Sometimes the superlu_dist solver halts and never finishes. However,
>>>> when I switched to DIFFERENT_NONZERO_PATTERN or other solvers, the
>>>> program works nicely. Ding reported this problem a few months ago:
>>>>  http://lists.mcs.anl.gov/pipermail/petsc-users/2011-January/007623.html
>>>>
>>>> Following Barry's suggestion, I checked that the nonzero pattern of my
>>>> matrix is still the same via
>>>>  ierr=MatSetOption(M, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE); CHKERRQ(ierr);
>>>>
>>>> I tried two other options:
>>>>
>>>> 1 First, I installed the latest superlu_dist (v2.5), then compiled
>>>> petsc with this new superlu_dist package. However, the problem is
>>>> still there, and the performance is even worse (compared to the
>>>> option --download-superlu_dist=yes (v2.4)).
>>>>
>>>> 2 I compiled petsc-dev with --download-superlu_dist. However, I got
>>>> some errors when compiling my program with petsc-dev:
>>>>
>>>> ResonatorOpt.c: In function ‘main’:
>>>> ResonatorOpt.c:154: error: ‘MAT_SOLVER_SUPERLU_DIST’ undeclared (first
>>>> use in this function)
>>>> ResonatorOpt.c:154: error: (Each undeclared identifier is reported only once
>>>> ResonatorOpt.c:154: error: for each function it appears in.)
>>>>
>>>> Besides MAT_SOLVER_SUPERLU_DIST, MAT_SOLVER_PASTIX is also
>>>> undeclared. Am I missing some header files? I used the same
>>>> procedure as when I compiled regular petsc, which had no such
>>>> problems. (I tried to attach the configure log in a previous mail,
>>>> but it bounced back.)
>>>>
>>>> Can you give me some suggestions on using superlu_dist with
>>>> SAME_NONZERO_PATTERN? Thank you.
>>>>
>>>> Best,
>>>> Xiangdong
>>>>
>>>
>>
>

