[petsc-users] Mat/Vec with empty ranks
Barry Smith
bsmith at mcs.anl.gov
Thu Oct 5 02:11:19 CDT 2017
Florian,
Thanks for reporting the problem. It is a serious bug in PETSc with dense matrices. Here is my proposed fix:
https://bitbucket.org/petsc/petsc/pull-requests/764/fix-bug-in-sequential-dense-multiply-and/diff
Barry
> On Oct 5, 2017, at 6:39 AM, Florian Lindner <mailinglists at xgm.de> wrote:
>
> On 04.10.2017 at 18:08, Matthew Knepley wrote:
>
>> I don't know if that is right. However, the sequential and parallel algorithms agree on both the initial residual (so
>> the parallel matrix and rhs appear correct) and the first iterate. Divergence of the second iterate could still be a
>> bug in our code, but it was harder for me to see how.
>>
>> The real thing to do, which should not be much work but which I unfortunately don't have time for right now, is to
>> step through the algorithm in serial and parallel and see which number changes. The algorithm only has 20 or so steps
>> per iterate, so doing this properly would probably take a day.
>
> OK, I tried to dig a bit into PETSc.
>
> I worked with the cleaned-up code you gave me, ran it on 4 MPI ranks, and compared the output with and without -load.
>
> Other options were:
>
> -ksp_max_it 10 -ksp_view -ksp_monitor_true_residual -ksp_lsqr_monitor -ksp_view_pre -vecscatter_view
>
> All on the maint branch.
>
> Starting from lsqr.c, I found that the values start to differ after KSP_MatMultTranspose(ksp,Amat,U1,V1);
>
> With -load (converging), V1 has the value:
>
> Vec Object: 4 MPI processes
> type: mpi
> Process [0]
> -0.544245
> Process [1]
> 1.11245
> Process [2]
> -1.25846
> Process [3]
>
> Without -load:
>
> Vec Object: 4 MPI processes
> type: mpi
> Process [0]
> 0.316288
> Process [1]
> 2.85233
> Process [2]
> -0.776467
> Process [3]
>
> All other input values are the same.
>
> I tracked it further down to MatMultTranspose_MPIDense in mpidense.c, where the value of yy starts to differ after the
> VecScatterBegin/End. At that point a->lvec, the scatter source, also differs, whereas Mat A is identical (judging by
> the MatView output).
>
> However, I have no idea where a->lvec (which is A->data->lvec) is filled.
>
> I hope that helps a bit.
>
> Best,
> Florian
>