[petsc-dev] (no subject)

Munson, Todd tmunson at mcs.anl.gov
Fri Sep 22 07:57:33 CDT 2017

> On Sep 22, 2017, at 7:35 AM, Matthew Knepley <knepley at gmail.com> wrote:
> On Fri, Sep 22, 2017 at 12:06 AM, Richard Tran Mills <rtmills at anl.gov> wrote:
> Thanks for sharing this, Barry. I haven't had time to read their paper, but it looks worth a read.
> Hong, since many machine-learning or data-mining problems can be cast as linear algebra problems (several examples involving eigenproblems come to mind), I'm guessing that there must be several people using PETSc (with SLEPc, likely) in this area, but I don't think I've come across any published examples. What have others seen?
> http://epubs.siam.org/doi/abs/10.1137/S1052623400374379

Alas, we did not use PETSc for it.  There is another semismooth version we wrote using
similar concepts that is better, but that one did not use PETSc either.  All could
be written using PETSc, though.  (We assume that the data is a dense,
tall, and skinny matrix.)

Mike Gertz wrote a similar version using OOQP, but I do not think it was published.

I recently ran across some parallel clustering methods based on DBSCAN and
KD-trees, but those do not seem suitable for PETSc.

There should be lots of data analysis computations that might be written using PETSc as well,
but I do not know of any group that writes them specifically using PETSc.  At some point,
we may want to think about how to hijack the data being output for some online analysis.
It's unclear whether we need PETSc support or whether it's up to the user.


> Most of the machine learning and data-mining papers I read seem to employ sequential algorithms or, at most, algorithms targeted at on-node parallelism only. With available data sets getting as large and easily available as they are, I'm surprised that there isn't more focus on doing things with distributed parallelism. One of my cited papers is on a distributed parallel k-means implementation I worked on some years ago: we didn't do anything especially clever with it, but today it is still one of the *only* parallel clustering publications I've seen.
> I'd love to 1) hear about what other machine-learning or data-mining applications using PETSc others have come across and 2) hear about applications in this area where people aren't using PETSc but it looks like they should!
> Cheers,
> Richard
> On Thu, Sep 21, 2017 at 12:51 PM, Zhang, Hong <hongzhang at anl.gov> wrote:
> Great news! According to their papers, MLSVM works only in serial. I am not sure what is stopping them using PETSc in parallel.
> Btw, are there any other cases that use PETSc for machine learning?
> Hong (Mr.)
> > On Sep 21, 2017, at 1:02 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >
> > From: Ilya Safro isafro at g.clemson.edu
> > Date: September 17, 2017
> > Subject: MLSVM 1.0, Multilevel Support Vector Machines
> >
> > We are pleased to announce the release of MLSVM 1.0, a library of fast
> > multilevel algorithms for training nonlinear support vector machine
> > models on large-scale datasets. The library is developed as an
> > extension of PETSc to support, among other applications, the analysis
> > of datasets in scientific computing.
> >
> > Highlights:
> > - The best quality/performance trade-off is achieved with algebraic
> > multigrid coarsening
> > - Tested on academic, industrial, and healthcare datasets
> > - Generates multiple models for each training
> > - Effective on imbalanced datasets
> >
> > Download MLSVM at https://github.com/esadr/mlsvm
> >
> > Corresponding paper: Sadrfaridpour, Razzaghi and Safro "Engineering
> > multilevel support vector machines", 2017,
> > https://arxiv.org/pdf/1707.07657.pdf
> >
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> http://www.caam.rice.edu/~mk51/
