[petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm

Eda Oktay eda.oktay at metu.edu.tr
Fri May 22 12:38:40 CDT 2020


I am sorry, I used VecDuplicateVecs, not MatDuplicateVecs.

On Fri, 22 May 2020 at 20:31, Eda Oktay <eda.oktay at metu.edu.tr> wrote:

> Dear Richard,
>
> Thank you for your email. From MATLAB's kmeans() function I believe I got
> the final clustering index set, not the centroids. What I am trying to do is
> to cluster the vectors created by MatDuplicateVecs() according to the index
> set I obtained from MATLAB (its type is not IS, since it came from MATLAB).
> However, since these vectors are parallel, I couldn't understand how to
> cluster them.
>
> Ultimately, I have to be independent of MATLAB, so I will try your
> suggestion; I am grateful for that. However, because of my limited knowledge
> of PETSc and parallel computing, I am not able to figure out how to cluster
> parallel vectors according to an index set.
>
> Thanks,
>
> Eda
>
> On Thu, 30 Apr 2020 at 02:07, Mills, Richard Tran <rtmills at anl.gov> wrote:
>
>> Hi Eda,
>>
>> Thanks for your reply. I'm still trying to understand why you say you
>> need to duplicate the row vectors across all processes. When I have
>> implemented parallel k-means, I don't duplicate the row vectors. (This
>> would be very unscalable and largely defeat the point of doing this with
>> MPI parallelism in the first place.)
>>
>> Earlier in this email thread, you said that you have used Matlab to get
>> cluster IDs for each row vector. Are you trying to then use this
>> information to calculate the cluster centroids from inside your PETSc
>> program? If so, you can do this by having each MPI rank do the following:
>> For cluster i in 0 to (k-1), calculate the element-wise sum of all of the
>> local rows that belong to cluster i, then use MPI_Allreduce() to calculate
>> the global elementwise sum of all the local sums (this array will be
>> replicated across all MPI ranks), and finally divide by the number of
>> members of that cluster to get the centroid. Note that MPI_Allreduce()
>> works on plain arrays, not on PETSc objects, so you'll want to use
>> something like MatGetValues() or MatGetRow() to access the elements of your
>> row vectors.
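>>
>> In rough outline, a sketch of what I mean follows (just a sketch, not
>> tested; A stands for the parallel dense matrix holding your row vectors, n
>> for its number of columns, k for the number of clusters, and cluster_id[]
>> for the cluster assignment of each locally owned row -- all of these names
>> are placeholders, not anything from your code):
>>
>>   PetscInt          rstart, rend, ncols, i, j, c;
>>   const PetscScalar *row;
>>   PetscScalar       *localsum, *centroid;
>>   PetscInt          *localcount, *count;
>>
>>   PetscCalloc1(k*n, &localsum);  /* zeroed k-by-n array of local sums   */
>>   PetscCalloc1(k*n, &centroid);  /* global sums, then the centroids     */
>>   PetscCalloc1(k, &localcount);  /* local cluster membership counts     */
>>   PetscCalloc1(k, &count);       /* global cluster membership counts    */
>>
>>   MatGetOwnershipRange(A, &rstart, &rend);
>>   for (i = rstart; i < rend; i++) {
>>     c = cluster_id[i - rstart];            /* cluster of this local row */
>>     MatGetRow(A, i, &ncols, NULL, &row);
>>     for (j = 0; j < ncols; j++) localsum[c*n + j] += row[j];
>>     localcount[c]++;
>>     MatRestoreRow(A, i, &ncols, NULL, &row);
>>   }
>>
>>   /* replicate the per-cluster sums and member counts on every rank */
>>   MPI_Allreduce(localsum, centroid, k*n, MPIU_SCALAR, MPIU_SUM,
>>                 PETSC_COMM_WORLD);
>>   MPI_Allreduce(localcount, count, k, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD);
>>
>>   /* divide each cluster's sum by its size to get the centroid */
>>   for (i = 0; i < k; i++)
>>     for (j = 0; j < n; j++) centroid[i*n + j] /= count[i];
>>
>> (plus the usual error checking and PetscFree() calls at the end)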
>>
>> Let me know if I am misunderstanding what you are aiming to do, or if I
>> have gotten something else wrong.
>>
>> It sounds like you would benefit from having some routines in PETSc to do
>> k-means (or other) clustering, by the way?
>>
>> Best regards,
>> Richard
>>
>> On 4/29/20 3:47 AM, Eda Oktay wrote:
>>
>> Dear Richard,
>>
>> I am trying to use a spectral clustering algorithm, which uses k-means
>> clustering at some point. I do this by producing a matrix consisting of
>> eigenvectors (of the adjacency matrix of the graph that I want to
>> partition) and then forming the row vectors of this matrix. This is the
>> part where I use parallel vectors. Using the output from k-means, I am
>> trying to cluster these row vectors, and to do that I think I need all of
>> the row vectors in all processes. I wanted to use sequential vectors;
>> however, I couldn't find another way to form the row vectors of a matrix.
>>
>> I am trying to use VecScatterCreateToAll; however, since my vector is a
>> parallel vector created by VecDuplicateVecs, my input is not of the correct
>> type, so I get an error. I still can't see how to use this function on a
>> parallel vector created by VecDuplicateVecs.
>>
>> Thank you all for your help.
>>
>> Eda
>>
>> On Tue, 7 Apr 2020 at 01:51, Mills, Richard Tran <rtmills at anl.gov> wrote:
>>
>>> Hi Eda,
>>>
>>> I think that you probably want to use VecScatter routines, as Junchao
>>> has suggested, instead of the lower level star forest for this. I
>>> believe that VecScatterCreateToZero() is what you want for the broadcast
>>> problem you describe, in the second part of your question. I'm not sure
>>> what you are trying to do in the first part. Taking a parallel vector
>>> and then copying its entire contents to a sequential vector residing on
>>> each process is not scalable, and a lot of the design that has gone into
>>> PETSc is to prevent the user from ever needing to do things like that.
>>> Can you please tell us what you intend to do with these sequential
>>> vectors?
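>>>
>>> For reference, the basic pattern with VecScatterCreateToZero() looks
>>> roughly like the following sketch (vin stands in for your parallel vector;
>>> the other names are made up for illustration):
>>>
>>>   Vec        vseq;  /* full-length sequential copy on rank 0, empty elsewhere */
>>>   VecScatter ctx;
>>>
>>>   VecScatterCreateToZero(vin, &ctx, &vseq);
>>>   VecScatterBegin(ctx, vin, vseq, INSERT_VALUES, SCATTER_FORWARD);
>>>   VecScatterEnd(ctx, vin, vseq, INSERT_VALUES, SCATTER_FORWARD);
>>>   /* ... use vseq on rank 0 ... */
>>>   VecScatterDestroy(&ctx);
>>>   VecDestroy(&vseq);
>>>
>>> VecScatterCreateToAll() follows the same pattern if every rank really does
>>> need its own full copy, but, as I said above, that is something to avoid
>>> when you can.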
>>>
>>> I'm also wondering why, later in your message, you say that you get
>>> cluster assignments from Matlab, and then "to cluster row vectors
>>> according to this information, all processors need to have all of the
>>> row vectors". Do you mean you want to get all of the row vectors copied
>>> onto all of the processors so that you can compute the cluster
>>> centroids? If so, computing the cluster centroids can be done without
>>> copying the row vectors onto all processors if you use a communication
>>> operation like MPI_Allreduce().
>>>
>>> Lastly, let me add that I've done a fair amount of work implementing
>>> clustering algorithms on distributed memory parallel machines, but
>>> outside of PETSc. I was thinking that I should implement some of these
>>> routines using PETSc. I can't get to this immediately, but I'm wondering
>>> if you might care to tell me a bit more about the clustering problems
>>> you need to solve and how having some support for this in PETSc might
>>> (or might not) help.
>>>
>>> Best regards,
>>> Richard
>>>
>>> On 4/4/20 1:39 AM, Eda Oktay wrote:
>>> > Hi all,
>>> >
>>> > I created a parallel vector UV by using VecDuplicateVecs, since I need
>>> > the row vectors of a matrix. However, I need the whole vector to be on
>>> > all processors, which means I need to gather everything and broadcast it
>>> > to all processors. To gather, I tried to use VecStrideGatherAll:
>>> >
>>> >   Vec UVG;
>>> >   VecStrideGatherAll(UV,UVG,INSERT_VALUES);
>>> >   VecView(UVG,PETSC_VIEWER_STDOUT_WORLD);
>>> >
>>> > However, when I try to view the vector, I get the following error.
>>> >
>>> > [3]PETSC ERROR: Invalid argument
>>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1
>>> > [3]PETSC ERROR: See
>>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
>>> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a
>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
>>> > 11:22:54 2020
>>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1
>>> > [0]PETSC ERROR: See
>>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
>>> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a
>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
>>> > 11:22:54 2020
>>> > [0]PETSC ERROR: Configure options --download-mpich --download-openblas
>>> > --download-slepc --download-metis --download-parmetis --download-chaco
>>> > --with-X=1
>>> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
>>> > ./clustering_son_final_edgecut_without_parmetis on a
>>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4
>>> > 11:22:54 2020
>>> > [1]PETSC ERROR: Configure options --download-mpich --download-openblas
>>> > --download-slepc --download-metis --download-parmetis --download-chaco
>>> > --with-X=1
>>> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
>>> > Configure options --download-mpich --download-openblas
>>> > --download-slepc --download-metis --download-parmetis --download-chaco
>>> > --with-X=1
>>> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in
>>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
>>> >
>>> > I couldn't understand why I am getting this error. Is this because of
>>> > UV being created by VecDuplicateVecs? How can I solve this problem?
>>> >
>>> > The other question is broadcasting. After gathering all elements of
>>> > the vector UV, I need to broadcast them to all processors. I found
>>> > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept
>>> > properly, and I couldn't see how to map my problem onto the star forest
>>> > concept.
>>> >
>>> > My problem is this: if I have 4 processors, I create a matrix whose
>>> > columns are the 4 smallest eigenvectors, say of size 72. Then, treating
>>> > each row of this matrix as a vector, I cluster the rows by using the
>>> > k-means clustering algorithm. For now, I do the clustering in MATLAB and
>>> > obtain a vector showing which row vector is in which cluster. After
>>> > getting this vector, to cluster the row vectors according to this
>>> > information, all processors need to have all of the row vectors.
>>> >
>>> > Given this problem, how can I use the star forest concept?
>>> >
>>> > I will be glad if you can help me with this problem, since I don't
>>> > have enough knowledge about graph theory. And if you have any idea
>>> > about how I can use the k-means algorithm in a more practical way,
>>> > please let me know.
>>> >
>>> > Thanks!
>>> >
>>> > Eda
>>>
>>
>>