[petsc-users] MPICH error in KSPSolve
Mark F. Adams
mark.adams at columbia.edu
Tue Jul 10 08:31:05 CDT 2012
On Jul 9, 2012, at 3:41 PM, John Mousel wrote:
> Can you clarify what you mean by null-space cleaning? I just run SOR on the coarse grid.
>
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetNullSpace.html#KSPSetNullSpace
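For example, if the kernel is just the constant vector (as for a pure Neumann Poisson problem), attaching it via the KSPSetNullSpace API linked above looks roughly like this — a sketch only, with the usual error checking but untested here; adapt the MatNullSpace arguments if your kernel is different:

```c
/* Sketch: attach a constant null space to a KSP so the iterative
 * solve stays out of the kernel of a singular operator.
 * Assumes the kernel is the constant vector. */
#include <petscksp.h>

PetscErrorCode AttachConstantNullSpace(KSP ksp)
{
  MatNullSpace   nsp;
  PetscErrorCode ierr;

  /* PETSC_TRUE: the null space contains the constant vector;
     no additional basis vectors are supplied. */
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, PETSC_NULL, &nsp);CHKERRQ(ierr);
  ierr = KSPSetNullSpace(ksp, nsp);CHKERRQ(ierr); /* KSP projects the kernel out during the solve */
  ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
  return 0;
}
```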
>
>
> On Mon, Jul 9, 2012 at 11:52 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>
> On Jul 9, 2012, at 12:39 PM, John Mousel wrote:
>
>> Mark,
>>
>> The problem is indeed non-symmetric. We went back and forth in March about this problem. I think we ended up concluding that the coarse size couldn't get too small or the null-space presented problems.
>
> Oh, it's singular. I forget what the issues were, but an iterative coarse grid solver should be fine for singular problems, perhaps with null-space cleaning if the kernel is sneaking in. Actually, there is an SVD coarse grid solver:
>
> -mg_coarse_pc_type svd
>
> That is the most robust.
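Putting that together with GAMG's smoothed-aggregation settings, a sketch of a full option set might look like the following (the `-mg_coarse_` prefix addresses the coarse-level solve; treat this as a starting point, not a tuned configuration):

```
-pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1
-mg_coarse_ksp_type preonly -mg_coarse_pc_type svd
```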
>
>> When I did get it to work, I tried to scale it up, and on my local university cluster, it seemed to just hang when the core counts got above something like 16 cores. I don't really trust that machine though.
>
> That's the machine. GAMG does have some issues but I've not seen it hang.
>
>> It's new and has been plagued by hardware incompatibility issues since day 1. I could re-examine this on Kraken. Also, which option are you talking about with ML? I thought I had tried all the -pc_ml_CoarsenScheme options, but I could be wrong.
>
> This sounds like the right one. I try to make my solvers invariant to subdomain shapes and sizes, and I think Ray Tuminaro (an ML developer) at least has options that are careful about this as well. But I don't know much about what they are deploying these days.
>
> Mark
>
>>
>> John
>>
>>
>>
>> On Mon, Jul 9, 2012 at 11:30 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>> What problems are you having again with GAMG? Are your problems unsymmetric?
>>
>> ML has several coarsening strategies available, and I think the default does aggregation locally and does not aggregate across processor subdomains. If you have poorly shaped domains, then you want to use a global coarsening method (these are not expensive).
>>
>> Mark
>>
>> On Jul 9, 2012, at 12:17 PM, John Mousel wrote:
>>
>>> Mark,
>>>
>>> I still haven't had much luck getting GAMG to work consistently for my Poisson problem. ML seems to work nicely at low core counts, but at high core counts I can get long, thin portions of grid on some processors instead of nice block-like chunks, which leads to a pretty tough time for ML.
>>>
>>> John
>>>
>>> On Mon, Jul 9, 2012 at 10:58 AM, John Mousel <john.mousel at gmail.com> wrote:
>>> Getting rid of the Hypre option seemed to be the trick.
>>>
>>> On Mon, Jul 9, 2012 at 10:40 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>>> Google PTL_NO_SPACE and you will find some NERSC presentations on how to go about fixing this. (I ran into these problems years ago but forget the details.)
>>>
>>> Also, I would try running with a Jacobi solver to see if that fixes the problem. If so then you might try
>>>
>>> -pc_type gamg
>>> -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_type agg
>>>
>>> This is a built-in AMG solver, so perhaps it plays nicer with resources ...
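A minimal way to try this is to swap the Hypre preconditioner for GAMG while keeping the same Krylov method — a sketch assembled from the options already quoted in this thread:

```
-ksp_type bcgsl -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -ksp_monitor
```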
>>>
>>> Mark
>>>
>>> On Jul 9, 2012, at 10:57 AM, John Mousel wrote:
>>>
>>> > I'm running on Kraken and am currently working with 4320 cores. I get the following error in KSPSolve.
>>> >
>>> > [2711]: (/ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046) PtlMEInsert failed with error : PTL_NO_SPACE
>>> > MHV_exe: /ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046: MPIDI_CRAY_ptldev_desc_pkt: Assertion `0' failed.
>>> > forrtl: error (76): Abort trap signal
>>> > Image PC Routine Line Source
>>> > MHV_exe 00000000014758CB Unknown Unknown Unknown
>>> > MHV_exe 000000000182ED43 Unknown Unknown Unknown
>>> > MHV_exe 0000000001829460 Unknown Unknown Unknown
>>> > MHV_exe 00000000017EDE3E Unknown Unknown Unknown
>>> > MHV_exe 00000000017B3FE6 Unknown Unknown Unknown
>>> > MHV_exe 00000000017B3738 Unknown Unknown Unknown
>>> > MHV_exe 00000000017B2B12 Unknown Unknown Unknown
>>> > MHV_exe 00000000017B428F Unknown Unknown Unknown
>>> > MHV_exe 000000000177FCE1 Unknown Unknown Unknown
>>> > MHV_exe 0000000001590A43 Unknown Unknown Unknown
>>> > MHV_exe 00000000014F909B Unknown Unknown Unknown
>>> > MHV_exe 00000000014FF53B Unknown Unknown Unknown
>>> > MHV_exe 00000000014A4E25 Unknown Unknown Unknown
>>> > MHV_exe 0000000001487D57 Unknown Unknown Unknown
>>> > MHV_exe 000000000147F726 Unknown Unknown Unknown
>>> > MHV_exe 000000000137A8D3 Unknown Unknown Unknown
>>> > MHV_exe 0000000000E97BF2 Unknown Unknown Unknown
>>> > MHV_exe 000000000098EAF1 Unknown Unknown Unknown
>>> > MHV_exe 0000000000989C20 Unknown Unknown Unknown
>>> > MHV_exe 000000000097A9C2 Unknown Unknown Unknown
>>> > MHV_exe 000000000082FF2D axbsolve_ 539 PetscObjectsOperations.F90
>>> >
>>> > This is somewhere in KSPSolve. Is there an MPICH environment variable that needs tweaking? I couldn't really find much on this particular error.
>>> > The solver is BiCGStab with Hypre as a preconditioner.
>>> >
>>> > -ksp_type bcgsl -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor
>>> >
>>> > Thanks,
>>> >
>>> > John
>>>
>>>
>>>
>>
>>
>
>