[petsc-users] da_overlap and zero pivot row

Barry Smith bsmith at mcs.anl.gov
Thu Jun 30 00:15:39 CDT 2016


  Xiangdong,

    I've attached at patch 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Xiangdong.patch
Type: application/octet-stream
Size: 3108 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160630/ba3c6cd6/attachment.obj>
-------------- next part --------------
also available in the branch barry/support-dmcreatedomaindecompositionscatters-da-nonperiodic-maint if you are using git.
It resolves the current issue by only creating scatters for physical ghost points. 

    BUT I do not know if this fix will get nasm to work properly for a real problem with DMDAs. There may be other problems that come up.

   Barry


> On Jun 29, 2016, at 10:49 AM, Xiangdong <epscodes at gmail.com> wrote:
> 
> To make things easier, I created a simple example here:
> 
> https://drive.google.com/file/d/0BxEfb1tasJxhOHJWbVJ0ZGl0LTA/view?usp=sharing
> 
> ./exe -snes_newtonls okay.
> ./exe -snes_nasm -da_grid_z 2 okay
> ./exe -snes_nasm failed. It got trapped in the while loop.
> 
> Do you have any suggestions to avoid this trap for nz=1?
> 
> Thanks.
> 
> Xiangdong
> 
> On Wed, Jun 29, 2016 at 11:23 AM, Xiangdong <epscodes at gmail.com> wrote:
> The code got trapped in a while loop when I switch to s=2.
> 
> http://www.mcs.anl.gov/petsc/petsc-current/src/dm/impls/da/dadd.c.html
> 
> In dadd.c line 85-96  
> /* gone out of processor range on y axis */
>  86:         while(jj > ne-1 || jj < ns) {
>  87:           if (nr == n-1) {
>  88:             ns = 0;
>  89:             ne = ly[0];
>  90:             nr = 0;
>  91:           } else {
>  92:             nr++;
>  93:             ns = ne;
>  94:             ne += ly[nr];
>  95:           }
>  96:         }
> 
> For my case nx=5,ny=5,nz=1,s=2, np=2, snes_type nasm, the code is trapped in the loop for jj=-1, ns=0, nr=0, n=1. This negative jj comes from the line 340-349 of same file.
> 
> /* global and subdomain ISes for the local indices of the subdomain */
> 341:     /* todo - make this not loop over at nonperiodic boundaries, which will be more involved */
> 342:     lower.i = subinfo.gxs;
> 343:     lower.j = subinfo.gys;
> 344:     lower.k = subinfo.gzs;
> 345:     upper.i = subinfo.gxs+subinfo.gxm;
> 346:     upper.j = subinfo.gys+subinfo.gym;
> 347:     upper.k = subinfo.gzs+subinfo.gzm;
> 
> 349:     DMDACreatePatchIS(dm,&lower,&upper,&gdis);
> 
> When trapped, subinfo.gxs=-2. Since there is a todo comment, I am not sure this is the reason for the while loop trap.
> 
> Thanks.
> 
> Xiangdong
> 
> 
> On Tue, Jun 28, 2016 at 5:16 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>   If you make the stencil width as wide at the overlap does the problem go away and you get expected convergence?
> 
> > On Jun 28, 2016, at 3:48 PM, Xiangdong <epscodes at gmail.com> wrote:
> >
> > Let me illustrate my point and question by an example. I have a 1d nonlinear problem with Nx=5 and stencil width 1. When I run it with np=2, processor 1 own cells 1,2,3 (+ ghost 4) and processor 2 own cells 4,5 (+ ghost 3). I also have functions FormFunctionLocal and FormJacobianLocal to compute the residual and Jacobian.
> >
> > i) snes_type newtonls. The Jacobian is in mpiaij format distributed on two processors. processor 1 computes the first three rows of Jac (size 3x5) and processor 1 computes the last two rows of Jac (size 2x5). The code works fine.
> >
> > ii) snes_type nasm and da_overlap 0. Each processor has its own Jac (seqaij). Processor 1 computes its 3x3 Jac and Processor 2 computes its 2x2 Jac. The code works fine.
> >
> > iii) snes_type nasm and da_overlap 1. Each processor needs to computes its own Jac. Processor 1 computes its 4x4 Jac and Processor 2 computes its 3x3 Jac. However, for processor 1 to compute the 4-th row of its Jac, it needs the properties information stored on cell 5, which is not available on processor 1. The zero pivot error is from this wrong Jac for the row corresponding to the overlap cell. The mat_fd_color does not work either for the same reason that information from cell 5 is not available on processor 1.
> >
> > Do you have any suggestion for the overlap case? Thank you.
> >
> > Best,
> > Xiangdong
> >
> > On Mon, Jun 27, 2016 at 4:21 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Jun 27, 2016, at 3:15 PM, Xiangdong <epscodes at gmail.com> wrote:
> > >
> > > Hello everyone,
> > >
> > > I am trying different number of da_overlap to see its effects on nasm and aspin preconditioner. The codes works fine with -da_overlap 0. However, when I change the option -da_overlap 1, it crashed with the error message like "zero pivot row 12544 value 0 tolerance 2.2e-14". The option -pc_factor_nonzeros_along_diagonal does not fix this zero pivot issue.
> > >
> > > Do you have any quick suggestions for me to try to fix this issue?
> >
> >    My guess is that it is bug in the da_overlap business or in the example; normally I would not expect this to happen. Is this a PETSc example I can run (tell me example and exact options) to reproduce the problem?
> >
> >    Thanks
> >
> >    Barry
> >
> > >
> > > Thank you.
> > >
> > > Best,
> > > Xiangdong
> >
> >
> 
> 
> 



More information about the petsc-users mailing list