[petsc-dev] PETSc programming model for multi-core systems

Barry Smith bsmith at mcs.anl.gov
Thu Nov 11 19:18:26 CST 2010


On Nov 11, 2010, at 7:09 PM, Aron Ahmadia wrote:

> I would support either MPI+OpenMP or MPI+MPI.  I've seen reasonable
> performance achieved for things like SpMV on both, but OpenMP gives
> you a lot of flexibility for reduction operations.

   How do you get adaptive load balancing (across the cores inside a process) if you let the OpenMP compiler decide the partitioning/parallelism? This was Bill's point about why not to use OpenMP. For example, if you give each core the same amount of work up front, they will not all finish at the same time, so you have wasted cycles.

   barry
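
The load-balancing concern above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not PETSc or OpenMP code: the function names `static_makespan` and `dynamic_makespan` are hypothetical, and `costs` is a made-up list of per-task run times that are not known when the work is handed out. It compares an up-front equal split against cores that pull the next task whenever they become idle.

```python
import heapq


def static_makespan(costs, ncores):
    """Give each core an equal-count contiguous block of tasks up front.

    Returns the time at which the slowest core finishes.
    """
    n = len(costs)
    finish_times = []
    for core in range(ncores):
        lo = core * n // ncores
        hi = (core + 1) * n // ncores
        finish_times.append(sum(costs[lo:hi]))
    return max(finish_times)


def dynamic_makespan(costs, ncores, chunk=1):
    """Let whichever core is idle first take the next chunk of tasks.

    Returns the time at which the last core finishes.
    """
    # Min-heap of the times at which each core next becomes free.
    free_at = [0] * ncores
    heapq.heapify(free_at)
    for start in range(0, len(costs), chunk):
        t = heapq.heappop(free_at)  # earliest idle core
        heapq.heappush(free_at, t + sum(costs[start:start + chunk]))
    return max(free_at)


# Hypothetical workload: 12 cheap tasks followed by 4 expensive ones.
costs = [1] * 12 + [5] * 4
print(static_makespan(costs, 4))   # 20: one core gets all the expensive tasks
print(dynamic_makespan(costs, 4))  # 8: the work evens out across the cores
```

With this made-up workload the total work is 32 time units, so 8 is the best four cores can do. The static split leaves three cores idle for most of the run, which is the wasted-cycles effect described above.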

> 
> Matt, in pthreads, all you have to do is fork, synchronize your
> threads, then run whatever piece of code you want in parallel.  That's
> basically the model CUDA uses, how hard is that?
> 
> A
> 
> On Fri, Nov 12, 2010 at 12:58 AM, Matthew Knepley <knepley at gmail.com> wrote:
>> On Fri, Nov 12, 2010 at 11:54 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> 
>>> On Nov 11, 2010, at 6:15 PM, Matthew Knepley wrote:
>>> 
>>>> On Fri, Nov 12, 2010 at 9:52 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>> 
>>>>   What should we use for a programming model for PETSc on multi-core
>>>> systems? Currently for conventional multicore we only have one MPI
>>>> process per core and for GPU we have subclasses of Vec and Mat with custom
>>>> CUDA code.
>>>> 
>>>>   Should we introduce subclasses of Vec and Mat built on pthreads (this
>>>> is what Bill G recommends, and not to use OpenMP)?
>>>> 
>>>> pthreads are a nightmare and do not do vectorization right, which is
>>>> necessary here.
>>> 
>>>   What do you mean, do not vectorize?
>> 
>> I mean it's easy to tell a thread to do something, but I was not aware that
>> pthreads had nice support
>> for telling all threads to do something at the same time. On a multicore,
>> you want vector instructions,
>> and on a GPU you basically have no choice.
>> 
>>> 
>>>> I am for OpenCL/CUDA. Eventually
>>>> I think OpenCL will take its head out of its ass and be as nice as CUDA.
>>> 
>>>   That's fine for GPUs but what about 8 core or 12 core Intel processors?
>>> CUDA for that also?
>> 
>> OpenCL already runs on these, but the interface is too low-level. It should
>> look more like CUDA. In CUDA,
>> you have constructs for vectorization, but you can also branch as Jed
>> indicated. This is slow on a GPU,
>> but would be fine on a multicore.
>> Jed, do you see obvious shortcomings of CUDA for these multicore machines?
>>    Matt
>> 
>>> 
>>>   Barry
>>> 
>>>> 
>>>>   Matt
>>>> 
>>>>   Is there a way to have some kind of consistent model between
>>>> conventional multicore and GPU multi-core, if not the same code? What about
>>>> this MCUDA stuff?
>>>> 
>>>>   Barry
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>> 
>> 
>> 



