[petsc-dev] FW: MKL with PETSc

Barry Smith bsmith at mcs.anl.gov
Sat Dec 21 21:11:42 CST 2013


  Dinesh,

   It would be better if they just emailed us directly; if they are afraid of public review they can use petsc-maint at mcs.anl.gov, since no public records are kept of those.

    Barry

On Dec 21, 2013, at 9:02 PM, Dinesh K Kaushik <Dinesh.Kaushik at KAUST.EDU.SA> wrote:

> This group of Intel people has been working with me and David for the past several months.
> 
> Do you have any recommendation for the issue they are encountering? Can we use multithreaded MKL routines in place of MatILUFactor and MatSolve (I believe for some matrix formats, but at least for BAIJ and AIJ)?
> 
> Thanks,
> 
> Dinesh
> 
> 
> From: Mudigere, Dheevatsa <dheevatsa.mudigere at intel.com>
> To: Dinesh Kaushik <dinesh.kaushik at kaust.edu.sa>, Dinesh Kaushik <kaushik at mcs.anl.gov>
> Cc: David E Keyes <david.keyes at kaust.edu.sa>, "Deshpande, Anand M" <anand.m.deshpande at intel.com>
> Subject: MKL with PETSc
> 
> Hi Dinesh,
>  
> I had a question regarding interfacing MKL routines with PETSc.
> Now that we have a fairly well-optimized flux kernel on both the Xeon and the Xeon Phi, we are moving on to the other key kernels. Among them, the next major contributors to the execution time are the ILU decomposition (called within the preconditioner once every time step) and the direct solve with the preconditioner matrix (called in every inner GMRES iteration). From the initial performance profile (below) it can be seen that these two operations together account for 31% of the sequential single-node execution time on the Xeon, while on the Xeon Phi their contribution is ~50%.
>  
> As you already know, the following PETSc routines are used for these operations: MatILUFactor and MatSolve. These are higher-level interfaces; depending on the sparse matrix storage format (AIJ or BAIJ), the more specific lower-level MatILUFactorSymbolic, MatLUFactorNumeric and MatSolve routines are used. Unfortunately, these PETSc routines are not multi-threaded and cannot leverage the available fine-grained parallelism. As a first step in optimizing these operations, we want to replace these PETSc calls with multi-threaded MKL routines. This would give us a good idea of how well these operations scale with multiple threads on a single node (Xeon and Xeon Phi).
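> Concretely, the kind of replacement I have in mind looks roughly like the sketch below (untested; the helper names and the diagonal-handling thresholds are just illustrative). It assumes a real, double-precision matrix in the three-array CSR format with 1-based indices, which is what MKL's dcsrilu0 and mkl_dcsrtrsv expect (PETSc's AIJ indices are 0-based, so a shift is needed):
> 
>   #include <mkl_rci.h>     /* dcsrilu0 */
>   #include <mkl_spblas.h>  /* mkl_dcsrtrsv */
> 
>   /* Compute an ILU(0) factorization of an n x n CSR matrix (values a,
>      row pointers ia, column indices ja, all 1-based); the factors are
>      returned in bilu0 on the same sparsity pattern as a.              */
>   static int mkl_ilu0_factor(MKL_INT n, const double *a, const MKL_INT *ia,
>                              const MKL_INT *ja, double *bilu0)
>   {
>     /* MKL's own ILU0 example fills ipar/dpar via dfgmres_init first;
>        here they are zeroed apart from the diagonal-handling entries   */
>     MKL_INT ipar[128] = {0};
>     double  dpar[128] = {0.0};
>     MKL_INT ierr      = 0;
> 
>     ipar[30] = 1;        /* replace tiny diagonal entries instead of aborting */
>     dpar[30] = 1.0e-20;  /* threshold below which a diagonal counts as tiny   */
>     dpar[31] = 1.0e-16;  /* value used to replace such a diagonal             */
> 
>     dcsrilu0(&n, a, ia, ja, bilu0, ipar, dpar, &ierr);
>     return (int)ierr;    /* 0 on success */
>   }
> 
>   /* Apply the preconditioner, z = (LU)^{-1} r: forward solve with the
>      unit lower factor, then back solve with the upper factor.         */
>   static void mkl_ilu0_apply(MKL_INT n, const double *bilu0, const MKL_INT *ia,
>                              const MKL_INT *ja, const double *r, double *z,
>                              double *work)
>   {
>     char lo = 'L', up = 'U', notrans = 'N', unit = 'U', nonunit = 'N';
> 
>     mkl_dcsrtrsv(&lo, &notrans, &unit,    &n, bilu0, ia, ja, r,    work); /* L w = r */
>     mkl_dcsrtrsv(&up, &notrans, &nonunit, &n, bilu0, ia, ja, work, z);    /* U z = w */
>   }
> 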
> So, it is in this regard that I wanted your help: what is the best way to get PETSc to call out to MKL routines for these operations? For now, I have managed to do this by modifying the PETSc functions themselves (MatLUFactorNumeric_SeqAIJ_Inode) to use the MKL routines. This is somewhat of a “dirty hack”: I am shunting out the actual logic and calling the MKL functions instead, and I am not taking all the precautions needed to maintain compatibility with the other functions. I wanted to check with you whether there is a better, more systematic way to do this, without having to hack around inside the PETSc library routines.
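> For example, would something along the following lines be a reasonable direction: pulling the CSR arrays out of the assembled Mat from user code and handing them to MKL, rather than editing the library source? This is a rough, untested sketch; it assumes a real, double-precision PETSc build whose PetscInt matches MKL_INT, and it reuses the hypothetical mkl_ilu0_factor helper from the sketch above:
> 
>   #include <petscmat.h>
>   #include <mkl.h>
> 
>   /* Hypothetical helper: factor a MATSEQAIJ matrix with MKL ILU(0).
>      The factor values are returned in a freshly allocated array.      */
>   PetscErrorCode FactorSeqAIJWithMKL(Mat A, double **bilu0_out)
>   {
>     PetscInt        n;
>     const PetscInt *ia, *ja;
>     PetscScalar    *val;
>     double         *bilu0;
>     PetscBool       done;
>     PetscErrorCode  ierr;
> 
>     PetscFunctionBegin;
>     /* shift = 1 returns 1-based row pointers and column indices, the
>        indexing that dcsrilu0 and mkl_dcsrtrsv expect                  */
>     ierr = MatGetRowIJ(A, 1, PETSC_FALSE, PETSC_FALSE, &n, &ia, &ja, &done);CHKERRQ(ierr);
>     ierr = MatSeqAIJGetArray(A, &val);CHKERRQ(ierr);
> 
>     /* the ILU(0) factors live on the same sparsity pattern as A;
>        the casts assume PetscInt and MKL_INT are both 32-bit here      */
>     ierr = PetscMalloc((ia[n]-1)*sizeof(double), &bilu0);CHKERRQ(ierr);
>     if (mkl_ilu0_factor((MKL_INT)n, val, (const MKL_INT*)ia, (const MKL_INT*)ja, bilu0))
>       SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "MKL dcsrilu0 failed");
> 
>     ierr = MatSeqAIJRestoreArray(A, &val);CHKERRQ(ierr);
>     ierr = MatRestoreRowIJ(A, 1, PETSC_FALSE, PETSC_FALSE, &n, &ia, &ja, &done);CHKERRQ(ierr);
>     *bilu0_out = bilu0;
>     PetscFunctionReturn(0);
>   }
> 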
> PETSc already supports hypre, SuperLU and several other such performance libraries; is there also a way to support MKL?
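> For comparison, this is how we select the packages PETSc already wraps (SuperLU shown below); I assume an MKL-backed factorization would ideally be exposed through the same mechanism rather than through edits like mine:
> 
>   PC pc;
>   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>   ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
>   ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU);CHKERRQ(ierr);
>   /* or, equivalently, at run time:
>        -pc_type lu -pc_factor_mat_solver_package superlu */
> 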
>  
> Your help on this will be greatly appreciated.
>  
> Thanks,
> Dheevatsa
>  
>  
> [Performance profile charts attached as images: image002.png, image004.png]
> 
> 



