[petsc-users] slepc NHEP error
Jose E. Roman
jroman at dsic.upv.es
Thu Jun 15 12:31:42 CDT 2017
> On 15 Jun 2017, at 19:13, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
>> On Jun 15, 2017, at 9:51 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>
>>
>>> On 15 Jun 2017, at 16:18, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
>>>
>>> I made the advised changes and rebuilt SLEPc. I ran it again and the error still occurs. Attached are the error file and the modified source file bvblas.c.
>>
>> This is really weird. It seems that at some point the number of columns of the BV object differs by one between MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results on each process.
>
>
> Jose,
>
> Do you have a local calculation that computes the number of columns separately on each process, with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example by taking the maximum (or min or average) of the values produced by all the processes.
>
> Barry
>
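Barry's suggestion can be made concrete with a minimal C sketch. In real MPI code the negotiation is a single collective, for example MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, comm). So that the sketch runs without an MPI installation, it instead simulates each rank's locally computed column count as one entry of an array and applies the same maximum (or minimum) reduction serially. The function names negotiate_ncols_max and negotiate_ncols_min are hypothetical and are not part of PETSc or SLEPc.

```c
#include <assert.h>

/* Hypothetical helper: serial stand-in for
 *   MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, comm);
 * local_ncols[r] plays the role of the column count computed locally
 * on rank r; the return value is what every rank would receive. */
int negotiate_ncols_max(const int *local_ncols, int nprocs)
{
  assert(nprocs > 0);
  int global = local_ncols[0];
  for (int r = 1; r < nprocs; r++) {
    if (local_ncols[r] > global) global = local_ncols[r];
  }
  return global;
}

/* Same idea with MPI_MIN: keep only the columns every rank agrees on. */
int negotiate_ncols_min(const int *local_ncols, int nprocs)
{
  assert(nprocs > 0);
  int global = local_ncols[0];
  for (int r = 1; r < nprocs; r++) {
    if (local_ncols[r] < global) global = local_ncols[r];
  }
  return global;
}
```

For example, with local counts {12, 12, 13, 12} on four simulated ranks, the max variant returns 13 and the min variant returns 12; in both cases every rank ends up with the same value. When the count means "columns that have converged", the minimum is arguably the more conservative choice.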
My comment is related to the convergence criterion, which is based on a call to LAPACK (it is buried in the DS object, so there is no single clear spot in the code). It has been done this way in SLEPc for 15+ years, and no one has complained, so maybe it is not what is causing this problem. The thing is that I do not have access to these big machines, with Cray libraries etc., so I cannot reproduce the problem and am just suggesting things blindly.
Jose
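Jose's hypothesis can be illustrated with a small C sketch of why a last-bit difference matters: any test of the form "residual < tol" is discontinuous, so a residual that lands one or two ulps on the other side of the tolerance changes the converged count by one. The functions count_converged and simulate_rank_mismatch, the tolerance 1e-8, and the residual values are all hypothetical; SLEPc's real criterion lives inside the DS object and is not reproduced here.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical convergence test, NOT SLEPc's actual DS/LAPACK code:
 * count the leading eigenpairs whose residual estimate is below tol,
 * stopping at the first one that fails. This count is what would
 * decide how many BV columns are kept. */
int count_converged(const double *res, size_t n, double tol)
{
  size_t k = 0;
  while (k < n && res[k] < tol) k++;
  return (int)k;
}

/* Simulate two ranks whose residual for the third eigenpair differs
 * only in the last bit(s), as could happen if a threaded BLAS/LAPACK
 * sums in a different order on each process. Residual values are made
 * up for illustration. Returns (count on rank 1) - (count on rank 0). */
int simulate_rank_mismatch(double tol)
{
  double rank0[3] = {1e-12, 3e-10, tol};                       /* not < tol */
  double rank1[3] = {1e-12, 3e-10, tol * (1.0 - DBL_EPSILON)}; /* one or two ulps below tol */
  return count_converged(rank1, 3, tol) - count_converged(rank0, 3, tol);
}
```

Calling simulate_rank_mismatch(1e-8) returns 1: the two simulated ranks would keep a different number of columns, which matches the symptom of the BV column count differing by one between MPI processes. A global reduction of the count, as Barry suggests, would remove the disagreement regardless of where it comes from.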
>
>
>
>> Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack).
>>
>> Jose
>>
>