[MPICH] Collective *v operation question
Rajeev Thakur
thakur at mcs.anl.gov
Sat Nov 24 18:24:14 CST 2007
Yes, your use of gatherv is ok.
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Martin
> Schwinzerl
> Sent: Friday, November 23, 2007 11:35 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Collective *v operation question
>
> Dear all!
>
> I'm currently hunting down a bug in a rather large program that
> works fine on our local cluster with MPICH2 but crashes in a
> seemingly random fashion on another cluster where only an MPI 1.2
> runtime environment (MPICH-1.2.7p1, IIRC) is available.
>
> The crashes are, in my humble opinion, most likely caused by a
> (known, but as yet unidentified) configuration problem in the
> local scheduler / cluster management software (some jobs are not
> created properly, etc.) and are therefore unrelated to this
> list's subject. However, in order to rule out another possible
> cause, I would be grateful if somebody here could confirm that my
> understanding of the MPI standard with respect to the *v variants
> of the collective operations (e.g. Gatherv, Scatterv, ...) is
> correct:
>
> SITUATION :
>
> In my example, each process of the intracommunicator performs a
> fixed number Np of operations and increments a counting variable
> cnt for each successful operation (i.e. after the calculation is
> finished, each process's cnt satisfies 0 <= cnt <= Np).
>
> The root process then gathers the counter values from all
> processes and computes the receive displacement entries for a
> Gatherv operation accordingly. If the process with rank r has
> cnt == 0, the running offset is simply not advanced, so that
> process's receive displacement coincides with the displacement of
> the next process that contributes data. (See the code sample
> below for clarification.)
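>
> For illustration (the numbers here are made up): with size == 4
> and gathered counts { 2, 0, 3, 1 }, this scheme yields the receive
> displacements { 0, 2, 2, 5 } and a total of n == 6 elements. Rank 1
> shares its displacement with rank 2, but since it contributes zero
> elements, no location in the receive buffer is written twice.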
>
> QUESTION :
>
> It has so far been my understanding that this is compatible with
> the following restriction from the MPI 1.1 report / standard
> (chapter 4.5):
>
> > The specification of counts, types, and displacements should not
> > cause any location on the root to be written more than once.
> > Such a call is erroneous.
>
> My reasoning is that those processes with cnt == 0 would send only
> zero-length messages and therefore cause no write operations on the
> receive buffer at all.
>
> ---> Is this assumption correct ?
>
> CODE SAMPLE FOR CLARIFICATION :
>
> The following code sample (C++ bindings) sketches the
> situation in question :
>
> // ... rank, size from MPI_Init
>
> int cnt = 0;
>
> //Holds the results of the successful operation(s) :
> std::vector< double > data;
>
> while( .... )
> {
> // processes 0, ... size - 1 perform essentially a random number
> // of operations. For each operation, cnt is incremented.
> }
>
> std::vector< int > recvCount( size );
> std::vector< int > recvOffset( size );
>
> MPI::COMM_WORLD.Gather( &cnt, 1, MPI::INT,
>                         &recvCount[ 0 ], 1, MPI::INT, 0 );
>
> int n = 0;
>
> if( rank == 0 )
> {
>     int offset = 0;
>     for( int r = 0 ; r < size ; ++r )
>     {
>         n += recvCount[ r ];
>         recvOffset[ r ] = offset;
>
>         if( recvCount[ r ] > 0 )
>         {
>             offset += recvCount[ r ];
>         }
>     }
> }
>
> std::vector< double > recvData( n );
>
> MPI::COMM_WORLD.Gatherv( &data[ 0 ], cnt, MPI::DOUBLE,
>                          &recvData[ 0 ], &recvCount[ 0 ],
>                          &recvOffset[ 0 ], MPI::DOUBLE, 0 );
>
> // .... -> end of example
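>
> For reference, a minimal, self-contained test program along these
> lines (just a sketch, not the actual application code) that could
> be used to exercise the zero-count case in isolation would look
> like this; the counts are made up so that every odd rank
> contributes zero elements:
>
> #include <mpi.h>
> #include <vector>
> #include <cstdio>
>
> int main( int argc, char** argv )
> {
>     MPI::Init( argc, argv );
>
>     const int rank = MPI::COMM_WORLD.Get_rank();
>     const int size = MPI::COMM_WORLD.Get_size();
>
>     // even ranks contribute two elements, odd ranks contribute none :
>     int cnt = ( rank % 2 == 0 ) ? 2 : 0;
>     std::vector< double > data( cnt, static_cast< double >( rank ) );
>
>     std::vector< int > recvCount( size );
>     std::vector< int > recvOffset( size );
>
>     MPI::COMM_WORLD.Gather( &cnt, 1, MPI::INT,
>                             &recvCount[ 0 ], 1, MPI::INT, 0 );
>
>     int n = 0;
>
>     if( rank == 0 )
>     {
>         int offset = 0;
>         for( int r = 0 ; r < size ; ++r )
>         {
>             n += recvCount[ r ];
>             recvOffset[ r ] = offset;
>
>             if( recvCount[ r ] > 0 )
>             {
>                 offset += recvCount[ r ];
>             }
>         }
>     }
>
>     // avoid taking the address of an empty vector on non-root ranks :
>     std::vector< double > recvData( n > 0 ? n : 1 );
>     double* sendPtr = data.empty() ? 0 : &data[ 0 ];
>
>     MPI::COMM_WORLD.Gatherv( sendPtr, cnt, MPI::DOUBLE,
>                              &recvData[ 0 ], &recvCount[ 0 ],
>                              &recvOffset[ 0 ], MPI::DOUBLE, 0 );
>
>     if( rank == 0 )
>     {
>         std::printf( "root gathered %d elements\n", n );
>     }
>
>     MPI::Finalize();
>     return 0;
> }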
>
> Many thanks in advance!
>
> Yours truly,
> Martin Schwinzerl
>
>
>