<div dir="ltr">Dear Richard,<div><br></div><div>I believe I don't need the centroids. I just need the cluster indices, which correspond to idx.</div><div><br></div><div>What I am trying to do is this:</div><div><br></div><div>Step 6: Cluster the points (y_i), i=1,...,n, in R^k with the k-means algorithm into clusters C_1,...,C_k.</div>Output: Clusters A_1,...,A_k with A_i = {j | y_j in C_i}<div><br></div><div>where y_i is the i-th row vector of a matrix whose columns are eigenvectors.</div><div><br></div><div>To cluster the y_i, I think I just need idx from MATLAB, since it gives the cluster index of each point.</div><div><br></div><div>Thanks,</div><div><br></div><div>Eda</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Mills, Richard Tran <<a href="mailto:rtmills@anl.gov">rtmills@anl.gov</a>> wrote on Sat, 23 May 2020, 02:39:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
Hi Eda,<br>
<br>
If you are using the MATLAB k-means function, calling it like<br>
<br>
idx = kmeans(X,k)<br>
<br>
will give you the index set, but if you do<br>
<br>
[idx,C] = kmeans(X,k)<br>
<br>
then you will also get a matrix C which contains the cluster centroids. Is this not what you need?<br>
<br>
--Richard<br>
<br>
<div>On 5/22/20 10:38 AM, Eda Oktay wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I am sorry, I used VecDuplicateVecs, not MatDuplicateVecs.</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Eda Oktay <<a href="mailto:eda.oktay@metu.edu.tr" target="_blank">eda.oktay@metu.edu.tr</a>> wrote on Fri, 22 May 2020, 20:31:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Dear Richard,
<div><br>
</div>
<div>Thank you for your email. From MATLAB's kmeans() function, I believe I got the final cluster index set, not the centroids. What I am trying to do is cluster the vectors created by MatDuplicateVecs() according to the index set that I obtained from MATLAB (its type is not IS,
since it came from MATLAB). However, since these vectors are parallel, I could not work out how to cluster them.</div>
<div><br>
</div>
<div>Ultimately I need to be independent of MATLAB, so I will try your suggestion; I am grateful for it. However, because of my limited knowledge of PETSc and parallel computing, I cannot figure out how to cluster parallel vectors according to an index
set.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Eda</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Mills, Richard Tran <<a href="mailto:rtmills@anl.gov" target="_blank">rtmills@anl.gov</a>> wrote on Thu, 30 Apr 2020, 02:07:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Hi Eda,<br>
<br>
Thanks for your reply. I'm still trying to understand why you say you need to duplicate the row vectors across all processes. When I have implemented parallel k-means, I don't duplicate the row vectors. (This would be very unscalable and largely defeat the
point of doing this with MPI parallelism in the first place.)<br>
<br>
Earlier in this email thread, you said that you have used MATLAB to get cluster IDs for each row vector. Are you trying to then use this information to calculate the cluster centroids from inside your PETSc program? If so, you can do this by having each MPI
rank do the following: For cluster i in 0 to (k-1), calculate the element-wise sum of all of the local rows that belong to cluster i, then use MPI_Allreduce() to calculate the global element-wise sum of all the local sums (this array will be replicated across
all MPI ranks), and finally divide by the global number of members of that cluster (which can be obtained with the same kind of reduction) to get the centroid. Note that MPI_Allreduce() works on simple arrays, not on PETSc objects, so you'll want to use something like MatGetValues() or MatGetRow() to access the elements
of your row vectors.<br>
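<br>
A minimal sketch of the per-rank part of this, in plain C without MPI or PETSc so that it stands alone. The function names accumulate_local and finalize_centroids are hypothetical (they are not PETSc or MPI routines), the rows are assumed to have already been copied into a row-major array of doubles, and the cluster IDs are assumed to be 0-based (MATLAB's idx is 1-based, so it would need shifting by one).<br>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accumulate per-cluster sums and member counts over the rows owned by
 * one process. rows is nlocal x dim in row-major order; idx[r] is the
 * 0-based cluster ID of local row r. sums (k x dim) and counts (k)
 * must be zero-initialized by the caller before the first call.
 * counts is kept as double so that both arrays can be reduced with the
 * same datatype.
 */
void accumulate_local(const double *rows, const int *idx,
                      size_t nlocal, size_t dim, size_t k,
                      double *sums, double *counts)
{
    for (size_t r = 0; r < nlocal; ++r) {
        assert(idx[r] >= 0 && (size_t)idx[r] < k);
        size_t c = (size_t)idx[r];
        for (size_t d = 0; d < dim; ++d)
            sums[c * dim + d] += rows[r * dim + d];
        counts[c] += 1.0;
    }
}

/*
 * Turn the global sums into centroids in place. A cluster with no
 * members keeps its all-zero sum instead of being divided by zero.
 */
void finalize_centroids(double *sums, const double *counts,
                        size_t dim, size_t k)
{
    for (size_t c = 0; c < k; ++c) {
        if (counts[c] == 0.0)
            continue;
        for (size_t d = 0; d < dim; ++d)
            sums[c * dim + d] /= counts[c];
    }
}
```

In a parallel run, each rank would call accumulate_local() on its own rows, then both arrays would be summed across ranks, for example with MPI_Allreduce(MPI_IN_PLACE, sums, k*dim, MPI_DOUBLE, MPI_SUM, comm) and the same call for counts, and finally every rank would call finalize_centroids(). Only k sums and k counts are communicated, never the row vectors themselves.<br>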
<br>
Let me know if I am misunderstanding what you are aiming to do.<br>
<br>
It sounds like you would benefit from having some routines in PETSc to do k-means (or other) clustering, by the way?<br>
<br>
Best regards,<br>
Richard<br>
<br>
<div>On 4/29/20 3:47 AM, Eda Oktay wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Dear Richard,
<div><br>
</div>
<div>I am trying to implement a spectral clustering algorithm, which uses k-means clustering at one step. I do this by forming a matrix whose columns are eigenvectors (of the adjacency matrix of the graph that I want to partition) and then forming the row vectors
of this matrix. This is the part where I use parallel vectors. Using the output of k-means, I am trying to cluster these row vectors. To do so, I think I need all row vectors on all processes. I wanted to use sequential vectors, but
I could not find another way to form the row vectors of a matrix.</div>
<div><br>
</div>
<div>I am trying to use VecScatterCreateToAll; however, since my vectors are parallel vectors created by VecDuplicateVecs, my input is not of the correct type, so I get an error. I still don't understand how to use this function on parallel vectors created by VecDuplicateVecs.</div>
<div><br>
</div>
<div>Thank you all for your help.</div>
<div><br>
</div>
<div>Eda</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Mills, Richard Tran <<a href="mailto:rtmills@anl.gov" target="_blank">rtmills@anl.gov</a>> wrote on Tue, 7 Apr 2020, 01:51:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi Eda,<br>
<br>
I think that you probably want to use VecScatter routines, as Junchao <br>
has suggested, instead of the lower level star forest for this. I <br>
believe that VecScatterCreateToZero() is what you want for the broadcast <br>
problem you describe, in the second part of your question. I'm not sure <br>
what you are trying to do in the first part. Taking a parallel vector <br>
and then copying its entire contents to a sequential vector residing on <br>
each process is not scalable, and a lot of the design that has gone into <br>
PETSc is to prevent the user from ever needing to do things like that. <br>
Can you please tell us what you intend to do with these sequential vectors?<br>
<br>
I'm also wondering why, later in your message, you say that you get <br>
cluster assignments from Matlab, and then "to cluster row vectors <br>
according to this information, all processors need to have all of the <br>
row vectors". Do you mean you want to get all of the row vectors copied <br>
onto all of the processors so that you can compute the cluster <br>
centroids? If so, computing the cluster centroids can be done without <br>
copying the row vectors onto all processors if you use a communication <br>
operation like MPI_Allreduce().<br>
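<br>
In symbols, a sketch of that computation (the notation is introduced here for illustration and is not from the thread): with P processes, let S_i^{(p)} be the element-wise sum of the rows on process p that are assigned to cluster i, and n_i^{(p)} the number of such rows. The centroid of cluster i is then<br>

```latex
\[
  c_i \;=\; \frac{\sum_{p=0}^{P-1} S_i^{(p)}}{\sum_{p=0}^{P-1} n_i^{(p)}},
  \qquad i = 0, \dots, k-1 .
\]
```

Both sums over p are exactly what a sum reduction such as MPI_Allreduce() with MPI_SUM delivers to every process, so each process only contributes one partial sum and one count per cluster.<br>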
<br>
Lastly, let me add that I've done a fair amount of work implementing <br>
clustering algorithms on distributed memory parallel machines, but <br>
outside of PETSc. I was thinking that I should implement some of these <br>
routines using PETSc. I can't get to this immediately, but I'm wondering <br>
if you might care to tell me a bit more about the clustering problems <br>
you need to solve and how having some support for this in PETSc might <br>
(or might not) help.<br>
<br>
Best regards,<br>
Richard<br>
<br>
On 4/4/20 1:39 AM, Eda Oktay wrote:<br>
> Hi all,<br>
><br>
> I created a parallel vector UV by using VecDuplicateVecs, since I need <br>
> the row vectors of a matrix. However, I need the whole vector to be on all <br>
> processors, which means I need to gather everything and broadcast it to all <br>
> processors. To gather, I tried to use VecStrideGatherAll:<br>
><br>
> Vec UVG;<br>
> VecStrideGatherAll(UV,UVG,INSERT_VALUES);<br>
> VecView(UVG,PETSC_VIEWER_STDOUT_WORLD);<br>
><br>
> however when I try to view the vector, I get the following error.<br>
><br>
> [3]PETSC ERROR: Invalid argument<br>
> [3]PETSC ERROR: Wrong type of object: Parameter # 1<br>
> [3]PETSC ERROR: See <br>
> <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">
http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019<br>
> [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a <br>
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 <br>
> 11:22:54 2020<br>
> [3]PETSC ERROR: Wrong type of object: Parameter # 1<br>
> [0]PETSC ERROR: See <br>
> <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">
http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
> [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019<br>
> [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a <br>
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 <br>
> 11:22:54 2020<br>
> [0]PETSC ERROR: Configure options --download-mpich --download-openblas <br>
> --download-slepc --download-metis --download-parmetis --download-chaco <br>
> --with-X=1<br>
> [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in <br>
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c<br>
> ./clustering_son_final_edgecut_without_parmetis on a <br>
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 <br>
> 11:22:54 2020<br>
> [1]PETSC ERROR: Configure options --download-mpich --download-openblas <br>
> --download-slepc --download-metis --download-parmetis --download-chaco <br>
> --with-X=1<br>
> [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in <br>
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c<br>
> Configure options --download-mpich --download-openblas <br>
> --download-slepc --download-metis --download-parmetis --download-chaco <br>
> --with-X=1<br>
> [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in <br>
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c<br>
><br>
> I couldn't understand why I am getting this error. Is this because of <br>
> UV being created by VecDuplicateVecs? How can I solve this problem?<br>
><br>
> The other question is broadcasting. After gathering all elements of <br>
> the vector UV, I need to broadcast them to all processors. I found <br>
> PetscSFBcastBegin. However, I couldn't understand the PetscSF concept <br>
> properly. I couldn't adjust my question to the star forest concept.<br>
><br>
> My problem is: If I have 4 processors, I create a matrix whose columns <br>
> are 4 smallest eigenvectors, say of size 72. Then by defining each row <br>
> of this matrix as a vector, I cluster them by using k-means <br>
> clustering algorithm. For now, I cluster them by using MATLAB and I <br>
> obtain a vector showing which row vector is in which cluster. After <br>
> getting this vector, to cluster row vectors according to this <br>
> information, all processors need to have all of the row vectors.<br>
><br>
> According to this problem, how can I use the star forest concept?<br>
><br>
> I would be glad if you could help me with this problem, since I don't <br>
> have enough knowledge of graph theory. And if you have any idea <br>
> about how I can use the k-means algorithm in a more practical way, please <br>
> let me know.<br>
><br>
> Thanks!<br>
><br>
> Eda<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div>