[petsc-users] MPICH error in KSPSolve

Mark F. Adams mark.adams at columbia.edu
Mon Jul 9 11:52:54 CDT 2012


On Jul 9, 2012, at 12:39 PM, John Mousel wrote:

> Mark,
> 
> The problem is indeed non-symmetric. We went back and forth in March about this problem. I think we ended up concluding that the coarse size couldn't get too small or the null-space presented problems.

Oh, it's singular.  I forget what the issues were, but an iterative coarse grid solver should be fine for singular problems, perhaps with null space cleaning if the kernel is sneaking in.  Actually, there is an SVD coarse grid solver:

-mg_coarse_pc_type svd

That is the most robust.
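
A minimal sketch of where that option sits, assuming the GAMG options suggested earlier in this thread and the bcgsl Krylov method from the original run (this combination is an illustration, not something tested on this problem):

-ksp_type bcgsl
-pc_type gamg
-pc_gamg_type agg
-pc_gamg_agg_nsmooths 1
-mg_coarse_pc_type svd
-ksp_converged_reason

The -mg_coarse_ prefix applies the option to the coarsest multigrid level only.  Adding -ksp_view prints the solver that was actually assembled, which is a quick way to confirm the coarse solver was picked up.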

> When I did get it to work, I tried to scale it up, and on my local university cluster, it seemed to just hang when the core counts got above something like 16 cores. I don't really trust that machine though.

That sounds like the machine.  GAMG does have some issues, but I have not seen it hang.

> It's new and has been plagued by hardware incompatibility issues since day 1. I could re-examine this on Kraken. Also, what option are you talking about with ML? I thought I had tried all the -pc_ml_CoarsenScheme options, but I could be wrong.

This sounds like the right one.  I try to be careful in my solvers to be invariant to subdomain shapes and sizes and I think Ray Tuminaro (ML developer) at least has options that should be careful about this also.  But I don't know much about what they are deploying these days.
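
For reference, a sketch of how the coarsening scheme is selected through PETSc's ML interface.  The value names are assumptions based on what PETSc's -help output lists for this option (Uncoupled, Coupled, MIS, METIS), so check -help on your own build.  Uncoupled is understood to be the processor-local scheme, while Coupled and MIS can aggregate across processor boundaries:

-pc_type ml
-pc_ml_CoarsenScheme MIS

Which of the global schemes copes best with long thin subdomains is not established in this thread, so treat MIS here as a starting point rather than a recommendation.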

Mark

> 
> John
> 
>  
> 
> On Mon, Jul 9, 2012 at 11:30 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
> What problems are you having again with GAMG?  Are your problems unsymmetric?
> 
> ML has several coarsening strategies available and I think the default does aggregation locally and does not aggregate across processor subdomains.  If you have poorly shaped domains then you want to use a global coarsening method (these are not expensive).
> 
> Mark
> 
> On Jul 9, 2012, at 12:17 PM, John Mousel wrote:
> 
>> Mark,
>> 
>> I still haven't had much luck getting GAMG to work consistently for my Poisson problem. ML seems to work nicely on low core counts, but at high core counts I can get long, thin portions of grid on some processors instead of nice block-like chunks, which leads to a pretty tough time for ML.
>> 
>> John
>> 
>> On Mon, Jul 9, 2012 at 10:58 AM, John Mousel <john.mousel at gmail.com> wrote:
>> Getting rid of the Hypre option seemed to be the trick. 
>> 
>> On Mon, Jul 9, 2012 at 10:40 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>> Google PTL_NO_SPACE and you will find some NERSC presentations on how to go about fixing this.  (I ran into these problems years ago but forget the details.)
>> 
>> Also, I would try running with a Jacobi solver to see if that fixes the problem.  If so then you might try
>> 
>> -pc_type gamg
>> -pc_gamg_agg_nsmooths 1
>> -pc_gamg_type agg
>> 
>> This is a built-in AMG solver so perhaps it plays nicer with resources ...
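>> 
>> A sketch of the Jacobi check, assuming the Krylov method and monitoring from the original run stay the same and only the preconditioner changes:
>> 
>> -ksp_type bcgsl -pc_type jacobi -ksp_monitor
>> 
>> If this runs past the point where the hypre run aborted, the same line with -pc_type gamg and the two -pc_gamg_ options above is the next thing to try.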
>> 
>> Mark
>> 
>> On Jul 9, 2012, at 10:57 AM, John Mousel wrote:
>> 
>> > I'm running on Kraken and am currently working with 4320 cores. I get the following error in KSPSolve.
>> >
>> > [2711]: (/ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046) PtlMEInsert failed with error : PTL_NO_SPACE
>> > MHV_exe: /ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046: MPIDI_CRAY_ptldev_desc_pkt: Assertion `0' failed.
>> > forrtl: error (76): Abort trap signal
>> > Image              PC                Routine            Line        Source
>> > MHV_exe            00000000014758CB  Unknown               Unknown  Unknown
>> > MHV_exe            000000000182ED43  Unknown               Unknown  Unknown
>> > MHV_exe            0000000001829460  Unknown               Unknown  Unknown
>> > MHV_exe            00000000017EDE3E  Unknown               Unknown  Unknown
>> > MHV_exe            00000000017B3FE6  Unknown               Unknown  Unknown
>> > MHV_exe            00000000017B3738  Unknown               Unknown  Unknown
>> > MHV_exe            00000000017B2B12  Unknown               Unknown  Unknown
>> > MHV_exe            00000000017B428F  Unknown               Unknown  Unknown
>> > MHV_exe            000000000177FCE1  Unknown               Unknown  Unknown
>> > MHV_exe            0000000001590A43  Unknown               Unknown  Unknown
>> > MHV_exe            00000000014F909B  Unknown               Unknown  Unknown
>> > MHV_exe            00000000014FF53B  Unknown               Unknown  Unknown
>> > MHV_exe            00000000014A4E25  Unknown               Unknown  Unknown
>> > MHV_exe            0000000001487D57  Unknown               Unknown  Unknown
>> > MHV_exe            000000000147F726  Unknown               Unknown  Unknown
>> > MHV_exe            000000000137A8D3  Unknown               Unknown  Unknown
>> > MHV_exe            0000000000E97BF2  Unknown               Unknown  Unknown
>> > MHV_exe            000000000098EAF1  Unknown               Unknown  Unknown
>> > MHV_exe            0000000000989C20  Unknown               Unknown  Unknown
>> > MHV_exe            000000000097A9C2  Unknown               Unknown  Unknown
>> > MHV_exe            000000000082FF2D  axbsolve_                 539  PetscObjectsOperations.F90
>> >
>> > This is somewhere in KSPSolve. Is there an MPICH environment variable that needs tweaking? I couldn't really find much on this particular error.
>> > The solver is BiCGStab(L) with Hypre BoomerAMG as a preconditioner.
>> >
>> > -ksp_type bcgsl -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor
>> >
>> > Thanks,
>> >
>> > John
>> 
>> 
>> 
> 
> 
