[petsc-users] Mat/Vec with empty ranks

Barry Smith bsmith at mcs.anl.gov
Thu Oct 5 02:11:19 CDT 2017


  Florian,

    Thanks for reporting the problem. It is a serious bug in PETSc with dense matrices. Here is my proposed fix:

https://bitbucket.org/petsc/petsc/pull-requests/764/fix-bug-in-sequential-dense-multiply-and/diff

  Barry

> On Oct 5, 2017, at 6:39 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> 
> On 04.10.2017 at 18:08, Matthew Knepley wrote:
> 
>> I don't know if that is right. However, the sequential and parallel algorithms agree on both the initial residual (so
>> that parallel matrix and rhs appear correct) and the first iterate. Divergence of the second iterate could still be a
>> bug in our code, but it was harder for me to see how.
>> 
>> The real thing to do, which should not be that much work but I don't have time for now unfortunately, is to step through the
>> algorithm in serial and parallel and see what number changes. The algorithm only has 20 or so steps per iterate, so this
>> would probably take one day to do right.
> 
> OK, I tried to dig into PETSc a bit.
> 
> I worked on the cleaned-up code you gave me, ran it on 4 MPI ranks, and compared the output with and without -load.
> 
> Other options were:
> 
> -ksp_max_it 10 -ksp_view -ksp_monitor_true_residual -ksp_lsqr_monitor -ksp_view_pre -vecscatter_view
> 
> All on the maint branch.
> 
> Starting from lsqr.c, I found that the values start to differ after the call KSP_MatMultTranspose(ksp,Amat,U1,V1);
> 
> With -load (converging), V1 has the value:
> 
> Vec Object: 4 MPI processes
>  type: mpi
> Process [0]
> -0.544245
> Process [1]
> 1.11245
> Process [2]
> -1.25846
> Process [3]
> 
> Without -load:
> 
> Vec Object: 4 MPI processes
>  type: mpi
> Process [0]
> 0.316288
> Process [1]
> 2.85233
> Process [2]
> -0.776467
> Process [3]
> 
> All other input values are the same.
> 
> I tracked it further down to MatMultTranspose_MPIDense in mpidense.c, where the value of yy starts to differ after the
> VecScatterBegin/End. At this point a->lvec, the scatter source, also differs, whereas Mat A is identical (judging by
> the MatView output).
> 
> However, I have no idea where a->lvec (which is A->data->lvec) is filled.
> 
> I hope that helps a bit.
> 
> Best,
> Florian
> 
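The following is a minimal sketch, in plain Python rather than PETSc, of how a transpose multiply on a row-distributed
dense matrix can go wrong when one rank owns no rows. It assumes the common pattern in which each rank writes its local
product into a work buffer and the partial results are then summed, and it assumes a hypothetical failure mode in which
the rank with no rows returns early without zeroing that buffer. All names, the matrix, and the stale values are made up
for illustration; nothing here is taken from the PETSc source or from the proposed fix.

```python
def local_transpose_multiply(local_rows, local_x, work, zero_when_empty=True):
    """Write A_local^T * x_local into work (one entry per global column)."""
    if not local_rows:
        # A rank that owns no rows contributes nothing, so its buffer must be zero.
        if zero_when_empty:
            for j in range(len(work)):
                work[j] = 0.0
        # Hypothetical buggy variant: returning here without zeroing
        # leaves whatever was in the buffer before.
        return
    for j in range(len(work)):
        work[j] = sum(row[j] * xi for row, xi in zip(local_rows, local_x))


def transpose_multiply(blocks, xs, work_buffers, zero_when_empty=True):
    """Sum the per-rank partial products, like a scatter that adds values."""
    ncols = len(work_buffers[0])
    y = [0.0] * ncols
    for rows, x, work in zip(blocks, xs, work_buffers):
        local_transpose_multiply(rows, x, work, zero_when_empty)
        for j in range(ncols):
            y[j] += work[j]
    return y


def make_buffers():
    # The buffer of the last (empty) rank holds stale data from earlier work.
    return [[0.0] * 3, [0.0] * 3, [0.0] * 3, [9.0, 9.0, 9.0]]


# Four simulated ranks; the last one owns no rows and no entries of x.
blocks = [[[1.0, 2.0, 0.0]], [[0.0, 1.0, 1.0]], [[2.0, 0.0, 1.0]], []]
xs = [[1.0], [2.0], [3.0], []]

correct = transpose_multiply(blocks, xs, make_buffers(), zero_when_empty=True)
buggy = transpose_multiply(blocks, xs, make_buffers(), zero_when_empty=False)
print(correct)  # [7.0, 4.0, 5.0]
print(buggy)    # [16.0, 13.0, 14.0]
```

In this sketch the matrix and the input vector are identical in both runs, yet the result differs because of the
leftover contents of the empty rank's work buffer. That would be consistent with the symptom described above (same
Mat A, different scatter source), but whether it is the actual cause is not established in this thread.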