[petsc-users] [petsc-maint] Speedup problem when using OpenMP?

Karl Rupp rupp at mcs.anl.gov
Tue Nov 5 03:22:01 CST 2013


Hi Danyang,

> This does not make any difference. I have scaled up the matrix, but the
> performance does not change. If I run with OpenMP, the iteration number
> is always the same no matter how many processors are used. This seems
> quite strange, as the iteration number usually increases as the number
> of processors increases when running with MPI. I think I should move to
> the Ubuntu system for further tests, to see if this is a Windows problem.

OpenMP and MPI are two different parallelization approaches:

- With MPI, we split the system matrix into strips, each of which is 
assigned to one MPI process. Among other things, this leads to 
block-Jacobi preconditioner techniques, where you usually see an 
increase in iteration counts. In the ex2 case, however, this even leads 
to a reduction of iteration counts.
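To see this effect directly, comparing two runs along the lines of

    mpiexec -n 1 Petsc-windows-ex2f.exe -m 100 -n 100 -ksp_view -ksp_converged_reason
    mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -ksp_view -ksp_converged_reason

should print the preconditioner that is actually used in each case and the 
resulting iteration counts. (-ksp_view and -ksp_converged_reason are standard 
PETSc options, not taken from your logs, so treat these lines as a sketch.)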

- With OpenMP, the system matrix is contiguous in memory, so one still 
computes preconditioners for the full matrix (as is, for example, the 
case with ILU). Thus, the use of OpenMP is transparent with respect to 
the algorithms employed, and you don't see any change in iteration 
counts. The typical vector operations like VecScale() should make use 
of OpenMP, but apparently this is not happening here. I'm double-checking 
on my machine (Linux Mint Maya, based on Ubuntu 12.04 LTS) and will let 
you know.
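In the meantime, you can check this on your side by comparing the rows for 
VecScale, VecAXPY, etc. in the -log_summary output for different thread 
counts, preferably at a larger problem size such as 1000x1000. For example 
(these are just your earlier command lines with the thread count and grid 
size changed; the log file names are only placeholders):

    Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 1 -m 1000 -n 1000 -log_summary log_1000x1000_openmp_p1.log
    Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_1000x1000_openmp_p4.log

If the time per call of these vector operations does not go down with more 
threads, then they are not running multi-threaded.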

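To put the memory bandwidth remark from my earlier mail (quoted below) into 
rough numbers (a back-of-the-envelope estimate only, assuming a sustained 
memory bandwidth on the order of 20 GB/s): VecScale() on a vector of n 
doubles reads and writes n doubles each, i.e. about 16*n bytes of memory 
traffic. For a 1000x1000 grid (n = 10^6) that is roughly 16 MB per call, or 
about 0.8 ms at 20 GB/s. Once a few threads saturate this bandwidth, adding 
more threads cannot reduce the time any further, which is why such vector 
operations usually stop scaling well before all cores are in use.
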
Best regards,
Karli



> On 04/11/2013 6:51 AM, Karl Rupp wrote:
>> Hi,
>>
>>> I have a question on the speedup of PETSc when using OpenMP. I can get
>>> good speedup when using MPI, but no speedup when using OpenMP.
>>> The example is ex2f with m=100 and n=100. The number of available
>>> processors is 16 (32 threads) and the OS is Windows Server 2012. The log
>>> files for 4 and 8 processors are attached.
>>>
>>> The commands I used to run with 4 processors are as follows:
>>> Run using MPI
>>> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary
>>> log_100x100_mpi_p4.log
>>>
>>> Run using OpenMP
>>> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m
>>> 100 -n 100 -log_summary log_100x100_openmp_p4.log
>>>
>>> The PETSc used for this test is PETSc for Windows
>>> (http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I guess this is
>>> not the problem, because the same problem exists when I use PETSc-dev in
>>> Cygwin. I don't know whether this problem also exists on Linux; would
>>> anybody help to test?
>>
>> For the 100x100 case considered, the execution times per call are
>> somewhere in the millisecond to sub-millisecond range (e.g. 1.3 ms for
>> 68 calls to VecScale with 4 processors). I'd say this is too small to
>> see any reasonable performance gain when running multiple threads;
>> consider problem sizes of about 1000x1000 instead.
>>
>> Moreover, keep in mind that typically you won't get a perfectly linear
>> scaling with the number of processor cores, because ultimately the
>> memory bandwidth is the limiting factor for standard vector operations.
>>
>> Best regards,
>> Karli
>>
>


