[petsc-users] Fwd: superlu_dist with same_nonzero_pattern
Hong Zhang
hzhang at mcs.anl.gov
Thu Jun 23 22:27:03 CDT 2011
Xiangdong :
I tested your code. Here is what I get:
1. Your code does not build. I made the following change:
< PetscErrorCode MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int
Nz, Vec muinvpml,Mat *Aout)
---
> Mat MoperatorGeneral(MPI_Comm comm, int Nx, int Ny, int Nz, Vec muinvpml)
and got it to build on an iMac and a Linux machine.
2. The code hangs at
if (rank == 0) fgetc(stdin);
I commented it out.
3. Then I can reproduce your error with superlu_dist, np=2, and
SAME_NONZERO_PATTERN.
KSPSetOperators(ksp,M,M,SAME_NONZERO_PATTERN)
should be called once, not inside your iteration loop. Moving it out of
the loop fixes the problem.
Why do the other packages work? I do not know :-(
4. The modified code works well on the Linux machine with np=1, 2, 4, etc.
However, on the iMac, even with np=1, at the end of the 1st iteration, I get
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Floating point exception!
[0]PETSC ERROR: Infinite or not-a-number generated in norm!
...
[0]PETSC ERROR: VecNorm() line 167 in src/vec/vec/interface/rvector.c
[0]PETSC ERROR: main() line 93 in test.c
Checking the solution x, I see
'-inf' in several components.
Using mumps ('-pc_factor_mat_solver_package mumps') with np=2, I get the same error.
However, sequential superlu, mumps and petsc all run well.
It is likely that your model is numerically sensitive, rather than a problem
with these packages.
The modified code is attached. Good luck!
Hong
> Thanks, Hong. I tried the runtime option -mat_superlu_dist_equil NO.
> However, the problem is still there.
>
> I've uploaded my short code here:
> http://math.mit.edu/~xdliang/superlu_dist_test.zip
>
> In this code, I first generate my sparse matrix M with dimension
> N-by-N, then solve Mx=J for a few times. Each time, I only modify the
> entries of two diagonals: (i,i) and (i, (i+N/2)%N). Although I
> modify these entries, I thought the nonzero pattern should not change.
> Based on my tests, I found that:
>
> 1 superlu_dist works fine with DIFFERENT_NONZERO_PATTERN. However, if
> I switch to SAME_NONZERO_PATTERN, the program either crashes (with
> np > 1) or does not converge (with a single processor).
>
> 2 pastix works fine with SAME_NONZERO_PATTERN, but it halts there if I
> switch to DIFFERENT_NONZERO_PATTERN ( with np>1).
>
> 3 spooles works fine with both SAME and DIFFERENT nonzero pattern.
>
> Can you give me some hints why superlu_dist fails on my matrix with
> SAME_NONZERO_PATTERN? I tried both superlu_dist v2.4 and v2.5 ( latest
> one). Thanks.
>
> Best,
> Xiangdong
>
>
>
On Wed, Jun 22, 2011 at 2:06 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>> Xiangdong :
>>
>> Does it hang inside superlu_dist?
>> Try runtime option '-mat_superlu_dist_equil NO'
>>
>> If you can send us a stand-alone short code or your matrix in petsc
>> binary format that repeats this behavior, then we can investigate it.
>> It is likely a memory problem.
>>
>> Hong
>>
>>
>>> Hello everyone,
>>>
>>> I had some problems using superlu_dist with same_nonzero_pattern.
>>> Sometimes the superlu_dist solver halts and never finishes. However,
>>> when I switched to different_nonzero_pattern or other solvers, the
>>> program works nicely. Ding had reported this problem a few months
>>> ago:
>>> http://lists.mcs.anl.gov/pipermail/petsc-users/2011-January/007623.html
>>>
>>> Following Barry's suggestion, I checked that the nonzero pattern of my
>>> matrix is still the same via
>>> ierr=MatSetOption(M, MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE); CHKERRQ(ierr);
>>>
>>> I tried two other options:
>>>
>>> 1 First install the latest superlu_dist (v2.5), then compile petsc
>>> with this new superlu_dist package. However, the problem is still
>>> there, and the performance is even worse (compared to the option
>>> --download-superlu_dist=yes (v2.4)).
>>>
>>> 2 I compiled petsc-dev with --download-superlu_dist. However, I got
>>> some errors when compiling my program with petsc-dev. It said that
>>>
>>> ResonatorOpt.c: In function ‘main’:
>>> ResonatorOpt.c:154: error: ‘MAT_SOLVER_SUPERLU_DIST’ undeclared (first
>>> use in this function)
>>> ResonatorOpt.c:154: error: (Each undeclared identifier is reported only once
>>> ResonatorOpt.c:154: error: for each function it appears in.)
>>>
>>> Besides superlu_dist, MAT_SOLVER_PASTIX and MAT_SOLVER_PASTIX are
>>> also undeclared. Am I missing some header files? I used the same
>>> procedure as when I compiled regular petsc, which had no such problems.
>>> (I tried to attach the configure log in a previous mail, but it
>>> bounced back.)
>>>
>>> Can you give me some suggestions on using superlu_dist with
>>> same_nonzero_pattern? Thank you.
>>>
>>> Best,
>>> Xiangdong
>>>
>>
>
