[petsc-users] slepc NHEP error

Jose E. Roman jroman at dsic.upv.es
Mon Oct 23 11:03:19 CDT 2017


To close this old thread, I would like to mention that in SLEPc 3.8 we have added a command-line option that should fix the problem:

  -ds_parallel synchronized

This option forces a synchronization of the results of local computations in DS (which involve LAPACK calls), so that all MPI processes end up with exactly the same result. The lack of such synchronization was what caused the failure you reported.

If this option is not provided, the behaviour is the same as in SLEPc 3.7 and before, i.e., all processes do the computation redundantly (-ds_parallel redundant).
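
For reference, the option is passed on the command line like any other SLEPc/PETSc option, e.g.

  mpiexec -n 4 ./myprog -eps_nev 4 -ds_parallel synchronized

(the executable name and the other options above are just placeholders). Below is a minimal C sketch of selecting the same mode from code; it assumes the EPSGetDS()/DSSetParallel() interface of SLEPc 3.8 (DSSetParallel being the programmatic counterpart of -ds_parallel), and the diagonal test matrix is there only so the sketch is self-contained:

  #include <slepceps.h>

  int main(int argc,char **argv)
  {
    Mat            A;          /* test operator */
    EPS            eps;        /* eigensolver context */
    DS             ds;         /* dense solver object used internally by EPS */
    PetscInt       n=100,i,Istart,Iend;
    PetscErrorCode ierr;

    ierr = SlepcInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

    /* trivial diagonal matrix, only to make the sketch self-contained */
    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
    for (i=Istart;i<Iend;i++) { ierr = MatSetValue(A,i,i,(PetscScalar)(i+1),INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
    ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr);
    ierr = EPSSetProblemType(eps,EPS_NHEP);CHKERRQ(ierr);
    ierr = EPSGetDS(eps,&ds);CHKERRQ(ierr);
    ierr = DSSetParallel(ds,DS_PARALLEL_SYNCHRONIZED);CHKERRQ(ierr); /* same effect as -ds_parallel synchronized */
    ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);                     /* command-line options can still override */
    ierr = EPSSolve(eps);CHKERRQ(ierr);

    ierr = EPSDestroy(&eps);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = SlepcFinalize();
    return ierr;
  }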

Jose


> On 16 Jun 2017, at 17:36, Jose E. Roman <jroman at dsic.upv.es> wrote:
> 
> I still need to work on this, but in principle my previous comments are confirmed. In particular, in my tests the problem does not appear if PETSc has been configured with --download-fblaslapack.
> If you have a deadline, I would suggest going this way until I can find a more definitive solution.
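> 
> For reference, the reconfigure would look roughly like this (any other configure options are whatever you normally use; --with-debugging=0 is just an example):
> 
>   ./configure --download-fblaslapack --with-debugging=0
> 
> followed by the usual make of PETSc and a rebuild of SLEPc against the reconfigured PETSc.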
> 
> Jose
> 
> 
> 
>> On 16 Jun 2017, at 14:50, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
>> 
>> Jose/Barry,
>> 
>> Excellent. This is good news. I have a deadline on this code next Wednesday and hope this is not a big issue to address. Please keep me posted.
>> -- 
>> Regards,
>> Ramki
>> 
>> 
>> On 6/16/17, 8:44 AM, "Jose E. Roman" <jroman at dsic.upv.es> wrote:
>> 
>>   I was able to reproduce the problem. I will try to track it down.
>>   Jose
>> 
>>> On 16 Jun 2017, at 2:03, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> 
>>> 
>>> Ok, got it.
>>> 
>>>> On Jun 15, 2017, at 6:56 PM, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
>>>> 
>>>> You don't need to install it. Just download and extract the tar file; there will be a folder of include files. Point build.sh at that folder.
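>>>> 
>>>> Roughly (the version number and paths here are placeholders):
>>>> 
>>>>   tar xf armadillo-<version>.tar.xz
>>>>   # then add -I/path/to/armadillo-<version>/include to the compile line in build.sh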
>>>> 
>>>> Regards, Ramki
>>>> Android keyboard at work. Excuse typos and brevity
>>>> From: Barry Smith 
>>>> Sent: Thursday, June 15, 2017 7:54 PM
>>>> To: "Kannan, Ramakrishnan" 
>>>> CC: "Jose E. Roman" ,petsc-users at mcs.anl.gov
>>>> Subject: Re: [petsc-users] slepc NHEP error
>>>> 
>>>> 
>>>> 
>>>> brew install armadillo fails for me at the brew install hdf5 step. I have reported this to Homebrew and hopefully they'll have a fix within a couple of days, so that I can try to run the test case.
>>>> 
>>>> Barry
>>>> 
>>>>> On Jun 15, 2017, at 6:34 PM, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
>>>>> 
>>>>> Barry,
>>>>> 
>>>>> Attached is a quick test program I extracted from my existing code. It is not clean, but you can still follow it. I use SLEPc 3.7.3 and 32-bit real PETSc 3.7.4.
>>>>> 
>>>>> This requires Armadillo from http://arma.sourceforge.net/download.html. Just extract it and set the correct path to Armadillo in build.sh.
>>>>> 
>>>>> I compiled and ran the code. The error and the output file are also in the tar.gz file.
>>>>> 
>>>>> I appreciate your kind support and look forward to an early resolution.
>>>>> -- 
>>>>> Regards,
>>>>> Ramki
>>>>> 
>>>>> 
>>>>> On 6/15/17, 4:35 PM, "Barry Smith" <bsmith at mcs.anl.gov> wrote:
>>>>> 
>>>>> 
>>>>>> On Jun 15, 2017, at 1:45 PM, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
>>>>>> 
>>>>>> Attached is the latest error with 32-bit PETSc and the uniform random input matrix. Let me know if you need more information.
>>>>> 
>>>>>    Could you please send the full program that reads in the data files and runs SLEPc generating the problem? We don't have any way of using the data files you sent us.
>>>>> 
>>>>>    Barry
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Regards,
>>>>>> Ramki
>>>>>> 
>>>>>> 
>>>>>> On 6/15/17, 2:27 PM, "Jose E. Roman" <jroman at dsic.upv.es> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 15 Jun 2017, at 19:35, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>> 
>>>>>>> So where in the code is the decision on how many columns to use made? If we look at that, it might help us see why it could ever produce different results on different processes.
>>>>>> 
>>>>>> After seeing the call stack again, I think my previous comment is wrong. I really don't know what is happening. If the number of columns were different on different processes, it would have failed before reaching that line of code.
>>>>>> 
>>>>>> Ramki: could you send me the matrix somehow? I could try it on a machine here. Which options are you using for the solver?
>>>>>> 
>>>>>> Jose
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> <slepc.e614138><Arows.tar.gz>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> <testslepc.tar.gz>
>>> 
>> 
>> 
>> 
>> 
> 


