[petsc-users] Gather and Broadcast Parallel Vectors in k-means algorithm

Mills, Richard Tran rtmills at anl.gov
Mon Apr 6 17:51:48 CDT 2020


Hi Eda,

I think that you probably want to use VecScatter routines, as Junchao 
has suggested, instead of the lower level star forest for this. I 
believe that VecScatterCreateToZero() is what you want for the broadcast 
problem you describe, in the second part of your question. I'm not sure 
what you are trying to do in the first part. Taking a parallel vector 
and then copying its entire contents to a sequential vector residing on 
each process is not scalable, and a lot of the design that has gone into 
PETSc is to prevent the user from ever needing to do things like that. 
Can you please tell us what you intend to do with these sequential vectors?

I'm also wondering why, later in your message, you say that you get 
cluster assignments from Matlab, and then "to cluster row vectors 
according to this information, all processors need to have all of the 
row vectors". Do you mean you want to get all of the row vectors copied 
onto all of the processors so that you can compute the cluster 
centroids? If so, computing the cluster centroids can be done without 
copying the row vectors onto all processors if you use a communication 
operation like MPI_Allreduce().

Lastly, let me add that I've done a fair amount of work implementing 
clustering algorithms on distributed memory parallel machines, but 
outside of PETSc. I was thinking that I should implement some of these 
routines using PETSc. I can't get to this immediately, but I'm wondering 
if you might care to tell me a bit more about the clustering problems 
you need to solve and how having some support for this in PETSc might 
(or might not) help.

Best regards,
Richard

On 4/4/20 1:39 AM, Eda Oktay wrote:
> Hi all,
>
> I created a parallel vector UV, by using VecDuplicateVecs since I need 
> row vectors of a matrix. However, I need the whole vector be in all 
> processors, which means I need to gather all and broadcast them to all 
> processors. To gather, I tried to use VecStrideGatherAll:
>
>   Vec UVG;
>   VecStrideGatherAll(UV,UVG,INSERT_VALUES);
>   VecView(UVG,PETSC_VIEWER_STDOUT_WORLD);
>
>  however when I try to view the vector, I get the following error.
>
> [3]PETSC ERROR: Invalid argument
> [3]PETSC ERROR: Wrong type of object: Parameter # 1
> [3]PETSC ERROR: See 
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
> [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a 
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4 
> 11:22:54 2020
> [3]PETSC ERROR: Wrong type of object: Parameter # 1
> [0]PETSC ERROR: See 
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
> [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a 
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4 
> 11:22:54 2020
> [0]PETSC ERROR: Configure options --download-mpich --download-openblas 
> --download-slepc --download-metis --download-parmetis --download-chaco 
> --with-X=1
> [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in 
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
> ./clustering_son_final_edgecut_without_parmetis on a 
> arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr  4 
> 11:22:54 2020
> [1]PETSC ERROR: Configure options --download-mpich --download-openblas 
> --download-slepc --download-metis --download-parmetis --download-chaco 
> --with-X=1
> [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in 
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
> Configure options --download-mpich --download-openblas 
> --download-slepc --download-metis --download-parmetis --download-chaco 
> --with-X=1
> [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in 
> /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c
>
> I couldn't understand why I am getting this error. Is this because of 
> UV being created by VecDuplicateVecs? How can I solve this problem?
>
> The other question is broadcasting. After gathering all elements of 
> the vector UV, I need to broadcast them to all processors. I found 
> PetscSFBcastBegin. However, I couldn't understand the PetscSF concept 
> properly. I couldn't adjust my question to the star forest concept.
>
> My problem is: If I have 4 processors, I create a matrix whose columns 
> are 4 smallest eigenvectors, say of size 72. Then by defining each row 
> of this matrix as a vector, I cluster them by using k-means 
> clustering algorithm. For now, I cluster them by using MATLAB and I 
> obtain a vector showing which row vector is in which cluster. After 
> getting this vector, to cluster row vectors according to this 
> information, all processors need to have all of the row vectors.
>
> According to this problem, how can I use the star forest concept?
>
> I will be glad if you can help me about this problem since I don't 
> have enough knowledge about graph theory. An if you have any idea 
> about how can I use k-means algorithm in a more practical way, please 
> let me know.
>
> Thanks!
>
> Eda


More information about the petsc-users mailing list