[petsc-dev] GPU preconditioners
Andrea Lani
andrea.lani at gmail.com
Sat Jan 18 03:53:02 CST 2014
Thanks a lot, Karli! I will update, run a few tests and let you know if my problem is fixed!
Best regards
Andrea
On Jan 18, 2014, at 10:26 AM, Karl Rupp <rupp at mcs.anl.gov> wrote:
> Hi Andrea,
>
> the fix is now merged to master:
> https://bitbucket.org/petsc/petsc/commits/087a195f1d07b315894e9d8ab1801a0ce993221c
>
> Best regards,
> Karli
>
>
>
> On 01/17/2014 10:13 PM, Andrea Lani wrote:
>> Well, I have 9 equations, so 9x9 I guess...
>>
>> I hope the one you are mentioning was a major bug, because what I get is
>> seriously wrong: on a single GPU (KSPGMRES+PCASM) I get a residual of
>> +0.72, while on 8 cores/GPUs I get -1.00 at the first time step, to give
>> one example. Can this be due to the bug you mention, or do you suspect
>> something more?
>>
>> What should I do then? Wait for the valgrind fix that is underway and
>> then update? Can you please notify me when this is fixed? I'm writing a
>> final report for a project and I would like to include this feature,
>> fully fixed, if possible.
>>
>> Another question: what exactly do you mean by "order the unknowns
>> properly" in this case?
>> Thanks a lot!
>>
>> Andrea
>>
>>
>> On Fri, Jan 17, 2014 at 10:02 PM, Karl Rupp <rupp at mcs.anl.gov> wrote:
>>
>> Hi Andrea,
>>
>>
>> In fact, I have another major problem: when running on multi-GPU with
>> PETSc my results are totally inconsistent compared to a single GPU.
>>
>>
>> This was a bug that was fixed a couple of days ago. The fix is in branch
>> 'next', but not yet merged to master because of another valgrind issue
>> I haven't nailed down yet.
>>
>>
>>
>> In my code, for now, I'm assuming a 1-1 correspondence between CPU and
>> GPU: I run on 8 cores and 8 GPUs (4 K10). How can I enforce this in the
>> PETSc solver? Is it automatically done or do I have to specify some
>> options?
>>
>>
>> One MPI rank maps to one logical GPU. In your case, please run with
>> 8 MPI ranks and distribute them equally over the nodes equipped with
>> the GPUs.
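To spell out the one-rank-per-logical-GPU mapping described above, here is
a minimal sketch assuming CUDA and plain MPI; PETSc's GPU backends normally
take care of device selection themselves, so this is only an illustration
of what the arrangement means, not something the thread's code requires.

    /* sketch: one MPI rank per logical GPU (illustrative only) */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int rank, ndev;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* map ranks round-robin onto the GPUs visible on this node */
      cudaGetDeviceCount(&ndev);
      if (ndev > 0) cudaSetDevice(rank % ndev);
      printf("rank %d -> device %d of %d\n", rank, ndev ? rank % ndev : -1, ndev);

      MPI_Finalize();
      return 0;
    }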
>>
>> As for the preconditioners: We haven't added any new preconditioners
>> recently. Preconditioning on GPUs is a very problem-specific thing
>> due to the burden of PCI-Express latency. Massively parallel
>> approaches such as Sparse Approximate Inverses perform well in terms
>> of theoretical FLOP counts, but are poor in terms of convergence and
>> pretty expensive in terms of memory when running many simultaneous
>> factorizations. ILU on the GPU can be fast if you order the unknowns
>> properly and have only a few nonzeros per row, but it is not great in
>> terms of convergence rate either. PCI-Express bandwidth and latency
>> are really a problem here...
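As a hedged sketch of what choosing ILU with a reordering can look like,
the fragment below sets an ILU preconditioner with a reverse Cuthill-McKee
ordering on an existing KSP. The function name and the assumption that a
KSP called ksp is already set up are illustrative; the same choice can be
made from the command line with -pc_type ilu -pc_factor_mat_ordering_type
rcm, and nothing here by itself guarantees that the factorization runs on
the GPU.

    /* sketch: ILU with an RCM ordering; error checking omitted for brevity */
    #include <petscksp.h>

    PetscErrorCode UseOrderedILU(KSP ksp)   /* ksp assumed already set up */
    {
      PC pc;

      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCILU);
      /* bandwidth-reducing ordering; helps the triangular solves */
      PCFactorSetMatOrderingType(pc, MATORDERINGRCM);
      return 0;
    }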
>>
>> How large are your blocks when using a block-Jacobi preconditioner
>> for your problem? On the order of 3x3, or (much) larger?
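Since the thread mentions 9 coupled equations and hence 9x9 blocks, here is
a hedged sketch of one way to expose that block structure to PETSc with
point-block Jacobi. The names A and ksp are assumed to exist, the block
size would normally be set before the matrix is assembled, and this is only
an illustration rather than the setup actually used by the poster.

    /* sketch: 9x9 point-block Jacobi; error checking omitted for brevity */
    #include <petscksp.h>

    PetscErrorCode UsePointBlockJacobi(Mat A, KSP ksp)
    {
      PC pc;

      MatSetBlockSize(A, 9);        /* 9 unknowns per grid point */
      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCPBJACOBI);    /* invert the 9x9 diagonal blocks */
      return 0;
    }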
>>
>> Best regards,
>> Karli
>>
>>
>>
>>
>> --
>> Dr. Andrea Lani
>> Senior Research Engineer, PhD
>> Aeronautics & Aerospace dept., CFD group
>> Von Karman Institute for Fluid Dynamics
>> Chausse de Waterloo 72,
>> B-1640, Rhode-Saint-Genese, Belgium
>> fax : +32-2-3599600
>> work : +32-2-3599769
>> lani at vki.ac.be
>