[MPICH] Collective *v operation question

Rajeev Thakur thakur at mcs.anl.gov
Sat Nov 24 18:24:14 CST 2007


Yes, your use of Gatherv is OK.

Rajeev 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Martin 
> Schwinzerl
> Sent: Friday, November 23, 2007 11:35 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Collective *v operation question
> 
> Dear all!
> 
> I'm currently hunting down a bug in a rather large program that 
> works fine on our local cluster with MPICH2 but crashes in a 
> seemingly random fashion on another cluster where only an MPI 1.2 
> runtime environment (MPICH-1.2.7p1, IIRC) is available.
> 
> The crashes are, in my humble opinion, most likely caused by a 
> (known, but as of now unidentified) configuration problem in 
> the local scheduler / cluster management software (some jobs 
> are not created properly, etc.) and are therefore unrelated to 
> this list's subject. However, in order to rule out another 
> possible cause, I would be grateful if somebody here could 
> confirm that my understanding of the MPI standard with 
> respect to the *v variants of the collective operations (e.g. 
> Gatherv, Scatterv, ...) is correct:
> 
> SITUATION:
> 
> In my example, each process of the intracommunicator performs 
> a fixed number Np of operations and increments a counting 
> variable cnt for each successful operation (i.e. after the 
> calculation is finished, each process's cnt satisfies 0 <= cnt <= Np).
> 
> The root process then gathers the counter values from all 
> processes and increments the receive displacement entries for 
> a Gatherv operation accordingly.
> If the process with rank r has cnt == 0, its receive 
> displacement simply repeats the running offset, so its 
> zero-length block coincides with the start of the next rank's 
> block. (See the code sample below for clarification.)
> 
> QUESTION:
> 
> It has so far been my understanding that this is compatible 
> with the following restriction from the MPI 1.1 report / standard 
> (chapter 4.5):
> 
> > The specification of counts, types, and displacements 
> should not cause 
> > any location on the root to be written more than once. Such 
> a call is 
> > erroneous.
> 
> My reasoning: those processes with cnt == 0 send only 
> zero-length messages and hence cause no write operations on the 
> receive buffer.
> 
> ---> Is this assumption correct?
> 
> CODE SAMPLE FOR CLARIFICATION:
> 
> The following code sample (C++ bindings) sketches the 
> situation in question:
> 
> // ... rank, size from MPI_Init
> 
> int cnt = 0;
> 
> //Holds the results of the successful operation(s) :
> std::vector< double > data; 
> 
> while( .... )
> {
>     // processes 0, ... size - 1 perform  essentially a random number
>     // of operations. For each operation, cnt is incremented.
> }
> 
> std::vector< int > recvCount( size );
> std::vector< int > recvOffset( size );
> 
> MPI::COMM_WORLD.Gather( &cnt, 1, MPI::INT, &recvCount[ 0 ],
>                         1, MPI::INT, 0 );
> 
> int n = 0;
> 
> if( rank == 0 )
> {
>     int offset = 0;
>     for( int r = 0 ; r < size ; ++r )
>     {
>         n += recvCount[ r ];
>         recvOffset[ r ] = offset;
> 
>         if( recvCount[ r ] > 0 )
>         {
>             offset += recvCount[ r ];
>         }
>     }
> }
> 
> std::vector< double > recvData( n );
> MPI::COMM_WORLD.Gatherv( &data[ 0 ], cnt, MPI::DOUBLE, 
> &recvData[ 0 ], &recvCount[ 0 ], &recvOffset[ 0 ], MPI::DOUBLE, 0);
> 
> // .... -> end of example
> 
> Many thanks in advance!
> 
> Yours truly,
> Martin Schwinzerl
> 




More information about the mpich-discuss mailing list