[petsc-dev] problem with hypre with '--with-openmp=1'

Mark Adams mfadams at lbl.gov
Tue Jun 26 09:48:18 CDT 2018


BTW, this is from that super slow with-openmp run on 8 procs. Barrier looks
sad.

========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 0.1064
Average time for zero size MPI_Send(): 0.00800014
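Those numbers are from the -log_view summary. A bare MPI loop like the sketch
below (not PETSc's benchmark code, just a minimal stand-in) should reproduce
roughly the same barrier average if the problem is the MPI/affinity setup
rather than PETSc itself:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, i, nreps = 100;
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);          /* synchronize everyone before timing */
  t0 = MPI_Wtime();
  for (i = 0; i < nreps; i++) MPI_Barrier(MPI_COMM_WORLD);
  t1 = MPI_Wtime();

  if (!rank) printf("Average time for MPI_Barrier(): %g\n", (t1 - t0) / nreps);

  MPI_Finalize();
  return 0;
}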


On Tue, Jun 26, 2018 at 10:36 AM Mark Adams <mfadams at lbl.gov> wrote:

> Interesting, I am seeing the same thing with ksp/ex56 (elasticity) with a
> 30^3 grid on each process. One process runs fine (1.5 sec), but 8
> processes, each with a 30^3 grid, took 156 sec.
>
> And PETSc's log_view is running extremely slowly. I have the total time
> (156 sec), but each event is taking a minute or more to come out.
>
> On Tue, Jun 26, 2018 at 10:13 AM Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>>
>> On Tue, Jun 26, 2018 at 8:26 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Tue, Jun 26, 2018 at 12:19 AM Junchao Zhang <jczhang at mcs.anl.gov>
>>> wrote:
>>>
>>>> Mark,
>>>>   Your email reminded me of my recent experiments. My PETSc was
>>>> configured with --with-openmp=1. With hypre, my job ran out of time. That
>>>> was on an Argonne Xeon cluster.
>>>>
>>>
>>> Interesting. I tested on Cori's Haswell nodes and it looked fine. I did
>>> not time it, but it seemed OK.
>>>
>>>
>>>>   I repeated the experiments on Cori's Haswell nodes. With
>>>> --with-openmp=1, I got "Linear solve converged due to CONVERGED_RTOL
>>>> iterations 5", but it took a very long time (10 mins). Without
>>>> --with-openmp=1, it took less than 1 second.
>>>>
>>>
>>> Hmm. It seemed to run OK for me on Cori's Haswell nodes. Were you running
>>> a significant-sized job? I was testing small serial runs.
>>>
>>
>>  I ran with 27 processes, each with 30^3 unknowns.
>>
>>>
>>>
>>>>
>>>> --Junchao Zhang
>>>>
>>>> On Fri, Jun 22, 2018 at 3:33 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>> We are using KNL (Cori), and hypre is not working when configured
>>>>> with '--with-openmp=1', even when not using threads (as far as I can
>>>>> tell, I never use threads).
>>>>>
>>>>> Hypre is not converging, for instance with an optimized build:
>>>>>
>>>>> srun -n 1 ./ex56 -pc_type hypre -ksp_monitor -ksp_converged_reason
>>>>> -ksp_type cg -pc_hypre_type boomeramg
>>>>> OMP: Warning #239: KMP_AFFINITY: granularity=fine will be used.
>>>>>   0 KSP Residual norm 7.366251922394e+22
>>>>>   1 KSP Residual norm 3.676434682799e+22
>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations
>>>>> 2
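>>>>>
>>>>> (For reference, those options correspond roughly to the following
>>>>> programmatic setup; this is a minimal sketch on a toy 1D Laplacian, not
>>>>> ex56 itself, and it assumes a hypre-enabled PETSc build.)
>>>>>
>>>>> #include <petscksp.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>   Mat            A;
>>>>>   Vec            x, b;
>>>>>   KSP            ksp;
>>>>>   PC             pc;
>>>>>   PetscInt       i, n = 100, Istart, Iend;
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>>>
>>>>>   /* assemble a toy 1D Laplacian so the sketch is self-contained */
>>>>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>>>>   ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
>>>>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>>>>   ierr = MatSetUp(A);CHKERRQ(ierr);
>>>>>   ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
>>>>>   for (i = Istart; i < Iend; i++) {
>>>>>     ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
>>>>>     if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>>>>>     if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>>>>>   }
>>>>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>   ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
>>>>>   ierr = VecSet(b, 1.0);CHKERRQ(ierr);
>>>>>
>>>>>   /* same as -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg */
>>>>>   ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>>>>>   ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
>>>>>   ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
>>>>>   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>>>>>   ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
>>>>>   ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);
>>>>>   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* -ksp_monitor etc. */
>>>>>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>>>>>
>>>>>   ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>>>>>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>>>   ierr = VecDestroy(&b);CHKERRQ(ierr);
>>>>>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>>>>>   ierr = PetscFinalize();
>>>>>   return ierr;
>>>>> }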
>>>>>
>>>>> Interestingly, in debug mode it almost looks good, but it still dies:
>>>>>
>>>>> 05:09 nid02516 maint *=
>>>>> ~/petsc_install/petsc/src/ksp/ksp/examples/tutorials$ make
>>>>> PETSC_DIR=/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp
>>>>> PETSC_ARCH="" run
>>>>> srun -n 1 ./ex56 -pc_type hypre -ksp_monitor -ksp_converged_reason
>>>>> -ksp_type cg -pc_hypre_type boomeramg
>>>>> OMP: Warning #239: KMP_AFFINITY: granularity=fine will be used.
>>>>>   0 KSP Residual norm 7.882081712007e+02
>>>>>   1 KSP Residual norm 2.500214073037e+02
>>>>>   2 KSP Residual norm 3.371746347713e+01
>>>>>   3 KSP Residual norm 2.918759396143e+00
>>>>>   4 KSP Residual norm 9.006505495017e-01
>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations
>>>>> 5
>>>>>
>>>>> This test runs fine on Xeon nodes. I assume that Hypre has been tested
>>>>> on KNL. GAMG runs fine, of course, and its initial residual is similar to
>>>>> the one in this debug run.
>>>>>
>>>>> Could PETSc be messing up the matrix conversion to hypre with
>>>>> '--with-openmp=1'?
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>
>>