[petsc-dev] (no subject)

Jakub Kruzik jakub.kruzik at vsb.cz
Sat Sep 23 09:26:27 CDT 2017


Dear all,

I would just like to note that we are also developing an SVM
implementation. It is intended for large-scale datasets and makes use of
PETSc parallel linear algebra. Currently, it supports only linear
kernels, so the Hessian is, in fact, a MATNORMAL with an arbitrary
underlying data matrix; it is possible to use, e.g., MATDENSE or MATAIJ,
depending on the problem. To solve the resulting quadratic program (QP),
it uses solvers from our PermonQP package. Both PermonSVM and PermonQP
are libraries that depend on PETSc. They are written in the PETSc coding
style, much like SLEPc.
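
To make the MATNORMAL remark concrete, here is a minimal sketch in plain
Python (not PETSc code, and not taken from PermonSVM). It assumes the
usual linear-kernel dual, where the Hessian is diag(y) X X^T diag(y) for
a samples-by-features data matrix X and labels y in {+1, -1}. Writing
A = X^T diag(y), the Hessian is A^T A and can be applied with two
matrix-vector products without ever being formed. All function names
below are hypothetical.

```python
def matvec(A, v):
    """Return A @ v for a dense matrix A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matvec_transpose(A, v):
    """Return A^T @ v without building A^T explicitly."""
    out = [0.0] * len(A[0])
    for row, vi in zip(A, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out


def normal_apply(A, v):
    """Apply H = A^T A to v as two matvecs; H itself is never formed."""
    return matvec_transpose(A, matvec(A, v))


def svm_data_operator(X, y):
    """Build A = X^T diag(y), so that A^T A = diag(y) X X^T diag(y).

    Assumption: X has one sample per row and y holds labels +1.0 / -1.0.
    """
    n_samples, n_features = len(X), len(X[0])
    return [[X[i][j] * y[i] for i in range(n_samples)]
            for j in range(n_features)]
```

The cost of one Hessian application is then that of two passes over the
data matrix, whatever storage format the matrix uses.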

http://permon.it4i.cz/permonqp.htm
http://permon.it4i.cz/permonsvm.htm

https://github.com/it4innovations/permon
https://github.com/it4innovations/permonsvm

So far, PermonQP implements only an augmented Lagrangian type algorithm,
which can be combined with any solver for box-constrained QP. PermonQP
includes several concrete solvers as well as a TAO wrapper. However,
adding an interior-point implementation is of interest to us as well.
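
As a rough illustration of that combination (not PermonQP's actual
algorithm or API), the sketch below treats the SVM dual
min 0.5*x^T H x - sum(x) subject to y^T x = 0 and 0 <= x <= C. The
equality constraint goes into an augmented Lagrangian, and the remaining
box-constrained QP is handed to an inner solver. The inner solver here
is a plain projected gradient method, chosen only for brevity; the
function name, the dense H, and the default parameters are all
assumptions.

```python
def solve_svm_dual(H, y, C, rho=1.0, outer_iters=20, inner_iters=50):
    """Augmented Lagrangian sketch for
        min 0.5*x^T H x - sum(x)  s.t.  y^T x = 0,  0 <= x <= C.
    H is a dense symmetric matrix given as a list of rows.
    """
    n = len(y)
    x = [0.0] * n
    lam = 0.0  # multiplier estimate for the equality constraint y^T x = 0

    # Upper bound on the largest eigenvalue of H + rho*y*y^T: the max
    # absolute row sum bounds the spectral norm of a symmetric matrix.
    lipschitz = (max(sum(abs(h) for h in row) for row in H)
                 + rho * sum(yi * yi for yi in y))
    step = 1.0 / lipschitz

    for _ in range(outer_iters):
        # Inner loop: box-constrained QP in x with lam and rho held fixed.
        for _ in range(inner_iters):
            residual = sum(yi * xi for yi, xi in zip(y, x))
            Hx = [sum(h * xj for h, xj in zip(row, x)) for row in H]
            grad = [Hx[i] - 1.0 + (lam + rho * residual) * y[i]
                    for i in range(n)]
            # Gradient step followed by projection onto the box [0, C].
            x = [min(C, max(0.0, x[i] - step * grad[i])) for i in range(n)]
        # Outer update: first-order multiplier correction.
        lam += rho * sum(yi * xi for yi, xi in zip(y, x))
    return x, lam
```

Any other box-constrained QP solver could replace the inner loop without
touching the outer multiplier update.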

PermonSVM is so far only a proof of concept, but it already scales
pretty well (almost proportionally to the application of the data matrix
to a vector). See, e.g., our PASC poster:
https://www.researchgate.net/publication/318317204_PERMON_PASC17_Poster

We'll be grateful for any feedback on this.

Jakub


On 22.9.2017 06:06, Richard Tran Mills wrote:
> Thanks for sharing this, Barry. I haven't had time to read their 
> paper, but it looks worth a read.
>
> Hong, since many machine-learning or data-mining problems can be cast 
> as linear algebra problems (several examples involving eigenproblems 
> come to mind), I'm guessing that there must be several people using 
> PETSc (with SLEPc, likely) in this area, but I don't think I've 
> come across any published examples. What have others seen?
>
> Most of the machine learning and data-mining papers I read seem to employ 
> sequential algorithms or, at most, algorithms targeted at on-node 
> parallelism only. With available data sets getting as large and easily 
> available as they are, I'm surprised that there isn't more focus on 
> doing things with distributed parallelism. One of my cited papers is 
> on a distributed parallel k-means implementation I worked on some 
> years ago: we didn't do anything especially clever with it, but today 
> it is still one of the *only* parallel clustering publications I've seen.
>
> I'd love to 1) hear about other machine-learning or data-mining 
> applications using PETSc that others have come across and 2) hear 
> about applications in this area where people aren't using PETSc but it 
> looks like they should!
>
> Cheers,
> Richard
>
> On Thu, Sep 21, 2017 at 12:51 PM, Zhang, Hong <hongzhang at anl.gov> wrote:
>
>     Great news! According to their papers, MLSVM works only in serial.
>     I am not sure what is stopping them using PETSc in parallel.
>
>     Btw, are there any other cases that use PETSc for machine learning?
>
>     Hong (Mr.)
>
>     > On Sep 21, 2017, at 1:02 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>     >
>     >
>     > From: Ilya Safro isafro at g.clemson.edu
>     > Date: September 17, 2017
>     > Subject: MLSVM 1.0, Multilevel Support Vector Machines
>     >
>     > We are pleased to announce the release of MLSVM 1.0, a library of fast
>     > multilevel algorithms for training nonlinear support vector machine
>     > models on large-scale datasets. The library is developed as an
>     > extension of PETSc to support, among other applications, the analysis
>     > of datasets in scientific computing.
>     >
>     > Highlights:
>     > - The best quality/performance trade-off is achieved with algebraic
>     > multigrid coarsening
>     > - Tested on academic, industrial, and healthcare datasets
>     > - Generates multiple models for each training
>     > - Effective on imbalanced datasets
>     >
>     > Download MLSVM at https://github.com/esadr/mlsvm
>     >
>     > Corresponding paper: Sadrfaridpour, Razzaghi and Safro "Engineering
>     > multilevel support vector machines", 2017,
>     > https://arxiv.org/pdf/1707.07657.pdf
>     >
>
>
