[MPICH] Collective *v operation question
Martin Schwinzerl
martin.schwinzerl at edu.uni-graz.at
Fri Nov 23 11:34:33 CST 2007
Dear all!
I'm currently hunting down a bug in a rather large program that works fine
on our local cluster with MPICH2 but crashes in a seemingly random
fashion on another cluster where only an MPI-1.2 runtime environment
( MPICH-1.2.7p1, IIRC ) is available.
The crashes are, in my humble opinion, most likely caused by a (known,
but as yet unidentified) configuration problem in the local scheduler /
cluster management software (some jobs are not created properly, etc.)
and are therefore unrelated to this list's subject. But in order to rule
out another possible cause, I would be grateful if somebody here could
confirm that my understanding of the MPI standard with respect to the *v
variants of the collective operations (e.g. Gatherv, Scatterv, ...) is
correct :
SITUATION :
In my example, each process of the intracommunicator performs a fixed
number Np of operations and increments a counting variable cnt for each
successful one (so after the calculation is finished, each process has
0 <= cnt <= Np).
The root process then gathers the counter values from all processes and
computes the receive displacement entries for a Gatherv operation
accordingly.
If the process with rank r has cnt == 0, the running offset is simply not
advanced, so the same displacement value appears again for the following
process. (See the code sample below for clarification.)
QUESTION :
It has so far been my understanding that this is compatible with the
following restriction from the MPI 1.1 report / standard (chapter 4.5) :
> The specification of counts, types, and displacements should not cause
> any location on
> the root to be written more than once. Such a call is erroneous.
since those processes with cnt == 0 send only zero-length messages and
therefore cause no write operations on the receive buffer.
---> Is this assumption correct ?
CODE SAMPLE FOR CLARIFICATION :
The following code sample (C++ bindings) sketches the situation
in question :
// ... rank, size from MPI initialization
int cnt = 0;

// Holds the results of the successful operation(s) :
std::vector< double > data;

while( .... )
{
    // Processes 0, ..., size - 1 perform essentially a random number
    // of operations. For each successful operation, cnt is incremented
    // and a result is appended to data.
}

std::vector< int > recvCount( size );
std::vector< int > recvOffset( size );

MPI::COMM_WORLD.Gather( &cnt, 1, MPI::INT,
                        &recvCount[ 0 ], 1, MPI::INT, 0 );

int n = 0;

if( rank == 0 )
{
    int offset = 0;

    for( int r = 0 ; r < size ; ++r )
    {
        n += recvCount[ r ];
        recvOffset[ r ] = offset;

        // The offset is only advanced past ranks that actually
        // contributed data :
        if( recvCount[ r ] > 0 )
        {
            offset += recvCount[ r ];
        }
    }
}

// n == 0 on non-root ranks. (Caveat: &data[ 0 ] and &recvData[ 0 ] are
// formally undefined for an empty vector, even with a count of 0.)
std::vector< double > recvData( n );

MPI::COMM_WORLD.Gatherv( &data[ 0 ], cnt, MPI::DOUBLE,
                         &recvData[ 0 ], &recvCount[ 0 ],
                         &recvOffset[ 0 ], MPI::DOUBLE, 0 );

// .... -> end of example
Many thanks in advance!
Yours truly,
Martin Schwinzerl