[petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm

Eda Oktay eda.oktay at metu.edu.tr
Wed Apr 29 05:47:32 CDT 2020


Dear Richard,

I am trying to implement spectral clustering, which uses the k-means
clustering algorithm at one step. I build a matrix whose columns are
eigenvectors (of the adjacency matrix of the graph that I want to
partition), and then form the row vectors of this matrix. This is the
part where I am using parallel vectors. Using the output from k-means, I
am trying to cluster these row vectors. To cluster them, I think I need
all row vectors on all processes. I wanted to use sequential vectors,
but I couldn't find another way to form the row vectors of a matrix.
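
For reference, the centroid update that Richard describes in the quoted
message below can be arranged so that each process reads only its own
rows. The following is a minimal sketch in plain C, not PETSc code. The
function names (accumulate_local, reduce_sum, finish_centroids) and the
row-major array layout are assumptions made for illustration. reduce_sum
only stands in for the communication step so that the sketch compiles
without MPI; in an MPI program it would be replaced by a call such as
MPI_Allreduce(MPI_IN_PLACE, buf, len, MPI_DOUBLE, MPI_SUM, comm) on the
sums and counts buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add the rows owned by one process into per-cluster
   sums and counts. rows is nlocal x dim in row-major order, and label[i]
   in [0, k) is the cluster assigned to local row i. sums (k x dim) and
   counts (k) must be zeroed by the caller before the first call. */
void accumulate_local(const double *rows, const int *label,
                      size_t nlocal, size_t dim, size_t k,
                      double *sums, double *counts)
{
    for (size_t i = 0; i < nlocal; i++) {
        assert(label[i] >= 0 && (size_t)label[i] < k);
        size_t c = (size_t)label[i];
        counts[c] += 1.0;
        for (size_t j = 0; j < dim; j++)
            sums[c * dim + j] += rows[i * dim + j];
    }
}

/* Stand-in for the reduction (assumption: replaces MPI_Allreduce with
   MPI_SUM). Adds src element-wise into dst. */
void reduce_sum(double *dst, const double *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] += src[i];
}

/* Turn global sums into centroids in place. Clusters with no members
   are left unchanged (all zeros if the buffer started zeroed). */
void finish_centroids(double *sums, const double *counts,
                      size_t dim, size_t k)
{
    for (size_t c = 0; c < k; c++) {
        if (counts[c] > 0.0)
            for (size_t j = 0; j < dim; j++)
                sums[c * dim + j] /= counts[c];
    }
}
```

Each process would call accumulate_local on the rows it owns, the
reduction would combine the k*dim sums and k counts across processes,
and finish_centroids would then give every process the same centroids.
Only k*(dim+1) numbers are communicated, independent of the number of
rows.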

I am trying to use VecScatterCreateToAll. However, since my vectors are
parallel vectors created by VecDuplicateVecs, my input is not of the
correct type, so I get an error. I still don't understand how to use this
function on parallel vectors created by VecDuplicateVecs.

Thank you all for your help.

Eda

On Tue, 7 Apr 2020 at 01:51, Mills, Richard Tran <rtmills at anl.gov>
wrote:

> Hi Eda,
>
> I think that you probably want to use VecScatter routines, as Junchao
> has suggested, instead of the lower level star forest for this. I
> believe that VecScatterCreateToAll() is what you want for the broadcast
> problem you describe, in the second part of your question. I'm not sure
> what you are trying to do in the first part. Taking a parallel vector
> and then copying its entire contents to a sequential vector residing on
> each process is not scalable, and a lot of the design that has gone into
> PETSc is to prevent the user from ever needing to do things like that.
> Can you please tell us what you intend to do with these sequential vectors?
>
> I'm also wondering why, later in your message, you say that you get
> cluster assignments from Matlab, and then "to cluster row vectors
> according to this information, all processors need to have all of the
> row vectors". Do you mean you want to get all of the row vectors copied
> onto all of the processors so that you can compute the cluster
> centroids? If so, computing the cluster centroids can be done without
> copying the row vectors onto all processors if you use a communication
> operation like MPI_Allreduce().
>
> Lastly, let me add that I've done a fair amount of work implementing
> clustering algorithms on distributed memory parallel machines, but
> outside of PETSc. I was thinking that I should implement some of these
> routines using PETSc. I can't get to this immediately, but I'm wondering
> if you might care to tell me a bit more about the clustering problems
> you need to solve and how having some support for this in PETSc might
> (or might not) help.
>
> Best regards,
> Richard
>
> On 4/4/20 1:39 AM, Eda Oktay wrote:
> > Hi all,
> >
> > I created a parallel vector UV by using VecDuplicateVecs, since I need
> > row vectors of a matrix. However, I need the whole vector to be on all
> > processors, which means I need to gather all of it and broadcast it to
> > all processors. To gather, I tried to use VecStrideGatherAll:
> >
> >   Vec UVG;
> >   VecStrideGatherAll(UV,UVG,INSERT_VALUES);
> >   VecView(UVG,PETSC_VIEWER_STDOUT_WORLD);
> >
> > However, when I try to view the vector, I get the following error:
> >
> > [3]PETSC ERROR: Invalid argument
> > [3]PETSC ERROR: Wrong type of object: Parameter # 1
> > [3]PETSC ERROR: See
> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a
> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
> > 11:22:54 2020
> > [3]PETSC ERROR: Wrong type of object: Parameter # 1
> > [0]PETSC ERROR: See
> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a
> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
> > 11:22:54 2020
> > [0]PETSC ERROR: Configure options --download-mpich --download-openblas
> > --download-slepc --download-metis --download-parmetis --download-chaco
> > --with-X=1
> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
> > ./clustering_son_final_edgecut_without_parmetis on a
> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
> > 11:22:54 2020
> > [1]PETSC ERROR: Configure options --download-mpich --download-openblas
> > --download-slepc --download-metis --download-parmetis --download-chaco
> > --with-X=1
> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
> > Configure options --download-mpich --download-openblas
> > --download-slepc --download-metis --download-parmetis --download-chaco
> > --with-X=1
> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
> >
> > I couldn't understand why I am getting this error. Is this because of
> > UV being created by VecDuplicateVecs? How can I solve this problem?
> >
> > The other question is broadcasting. After gathering all elements of
> > the vector UV, I need to broadcast them to all processors. I found
> > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept
> > properly, and I couldn't map my problem onto the star forest concept.
> >
> > My problem is: If I have 4 processors, I create a matrix whose columns
> > are 4 smallest eigenvectors, say of size 72. Then by defining each row
> > of this matrix as a vector, I cluster them by using k-means
> > clustering algorithm. For now, I cluster them by using MATLAB and I
> > obtain a vector showing which row vector is in which cluster. After
> > getting this vector, to cluster row vectors according to this
> > information, all processors need to have all of the row vectors.
> >
> > According to this problem, how can I use the star forest concept?
> >
> > I would be glad if you could help me with this problem, since I don't
> > have enough knowledge of graph theory. And if you have any idea
> > about how I can use the k-means algorithm in a more practical way,
> > please let me know.
> >
> > Thanks!
> >
> > Eda
>