[MPICH] MPICH105 shm drops packages on SUN niagara
William Gropp
gropp at mcs.anl.gov
Wed May 16 13:24:46 CDT 2007
We're looking at it; I've added a variation of this to our regular
tests. No solution yet, however. My guess is that there is a
missing volatile or memory barrier somewhere; this should force us to
clean up the current code.
Bill
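Bill's guess is that a flag in the shared-memory channel is not ordered correctly with respect to the data it announces. The sketch below only illustrates that kind of ordering. It is not MPICH's shm code, and `mailbox_t`, `mailbox_send`, `mailbox_recv` and `mailbox_demo` are hypothetical names. It uses a one-slot C11 mailbox: the sender writes the payload and then publishes a flag with a release store, and the receiver reads the flag with an acquire load before touching the payload. With a plain `int` flag and no `volatile`, the compiler may keep the flag in a register and spin forever. Without a barrier, the compiler or a weakly ordered CPU may make the flag visible before the payload.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared by one sender and one receiver.
   Illustration only; this is not MPICH's shared-memory queue. */
typedef struct {
    int payload[4];     /* message body, written before the flag is set */
    atomic_bool full;   /* true while a message is waiting to be read */
} mailbox_t;

static void mailbox_send(mailbox_t *mb, const int msg[4])
{
    /* Wait for the receiver to drain the previous message. Each atomic
       load is actually performed, unlike a plain int the compiler may cache. */
    while (atomic_load_explicit(&mb->full, memory_order_acquire))
        sched_yield();
    for (int k = 0; k < 4; k++)
        mb->payload[k] = msg[k];
    /* Release store: the payload writes above are visible to any thread
       that observes full == true with an acquire load. */
    atomic_store_explicit(&mb->full, true, memory_order_release);
}

static void mailbox_recv(mailbox_t *mb, int msg[4])
{
    /* Acquire load pairs with the release store in mailbox_send. */
    while (!atomic_load_explicit(&mb->full, memory_order_acquire))
        sched_yield();
    for (int k = 0; k < 4; k++)
        msg[k] = mb->payload[k];
    /* Release: the payload reads finish before the sender may overwrite. */
    atomic_store_explicit(&mb->full, false, memory_order_release);
}

typedef struct {
    mailbox_t box;
    long n;          /* number of messages to exchange */
    long long sum;   /* receiver's running total of msg[0] */
} demo_t;

static void *receiver(void *arg)
{
    demo_t *d = arg;
    int msg[4];
    for (long i = 0; i < d->n; i++) {
        mailbox_recv(&d->box, msg);
        d->sum += msg[0];
    }
    return NULL;
}

/* Sends the values 1..n through the mailbox and returns the sum the
   receiver saw, which is n*(n+1)/2 when no message is lost; returns -1
   if the receiver thread cannot be started. */
long long mailbox_demo(long n)
{
    demo_t d = { .n = n, .sum = 0 };
    atomic_init(&d.box.full, false);
    pthread_t t;
    if (pthread_create(&t, NULL, receiver, &d) != 0)
        return -1;
    int msg[4] = {0};
    for (long i = 1; i <= n; i++) {
        msg[0] = (int)i;
        mailbox_send(&d.box, msg);
    }
    pthread_join(t, NULL);
    return d.sum;
}
```

Compile with a C11 compiler and `-pthread`. `mailbox_demo(n)` should return n(n+1)/2. A lost or stale message would show up as a different total, and a cached flag would show up as a hang.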
On May 16, 2007, at 12:18 PM, chong tan wrote:
> No takers on this? There is an identical problem on Linux; I'm just
> not sure whether this code can reproduce it there.
> tan
>
>
>
> ----- Original Message ----
> From: chong tan <chong_guan_tan at yahoo.com>
> To: mpich-discuss at mcs.anl.gov
> Sent: Friday, April 27, 2007 3:24:09 PM
> Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
>
> The following code reproduces the problem. I think you may be able
> to reproduce the error on Linux, but I am not sure.
>
>
> It is best to run
> mpiexec -n 8 a.out
> to reproduce the problem. You will need a machine with 8 CPUs or
> cores. Sometimes you will need to run the code several times to see
> the error.
>
> Files named fast_mpi_?.dmp will be created, where ? is the rank of
> the process that writes the file. When MPI gets stuck, look at the
> last line of fast_mpi_0.dmp. If it says:
>
> read from child 7
>
> then look at the last line of fast_mpi_7.dmp; it will say:
> read from master
>
> Hope this helps to debug the error.
>
> thanks
> tan
>
> ---------------------
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> #define LOOP_COUNT 1000000
> #define DATA_SIZE 4
> #define MP_TAG 999
>
> int main( int argc, char **argv )
> {
>   int nProc, rank ;
>   int i, j, status ;
>   char buf[ 128 ] ;
>   FILE *pf ;
>   MPI_Init( &argc, &argv ) ;
>   MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
>   MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
>   /* One log file per rank: fast_mpi_<rank>.dmp */
>   snprintf( buf, sizeof( buf ), "fast_mpi_%d.dmp", rank ) ;
>   pf = fopen( buf, "w" ) ;
>   if( !pf ) {
>     perror( buf ) ;
>     MPI_Abort( MPI_COMM_WORLD, 1 ) ;
>   }
>   if( !rank ) {
>     /* Master: each iteration receives one message from every child,
>        then sends one message back to every child. */
>     int **psend ;
>     int **precv ;
>     psend = (int**)calloc( nProc, sizeof( int *) ) ;
>     precv = (int**)calloc( nProc, sizeof( int *) ) ;
>     for( i = 0 ; i < nProc ; i++ ) {
>       psend[ i ] = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
>       precv[ i ] = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
>     }
>     for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>       fprintf( pf, "Master : loop %d\n", i ) ;
>       fflush( pf ) ;
>       for( j = 1 ; j < nProc ; j++ ) {
>         fprintf( pf, " read from child %d\n", j ) ;
>         fflush( pf ) ;
>         /* MPI_INT matches the int buffers; MPI_LONG would overrun
>            them wherever long is wider than int (e.g. 64-bit Linux). */
>         status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_INT, j,
>                            MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>         fprintf( pf, " read from child %d done, status = %d\n",
>                  j, status ) ;
>         fflush( pf ) ;
>       }
>       for( j = 1 ; j < nProc ; j++ ) {
>         fprintf( pf, " send to child %d\n", j ) ;
>         fflush( pf ) ;
>         /* Sends DATA_SIZE - 1 elements; the receive allows up to DATA_SIZE. */
>         status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_INT, j,
>                            MP_TAG, MPI_COMM_WORLD ) ;
>         fprintf( pf, " send to child %d done, status = %d\n",
>                  j, status ) ;
>         fflush( pf ) ;
>       }
>     }
>     for( i = 0 ; i < nProc ; i++ ) {
>       free( psend[ i ] ) ;
>       free( precv[ i ] ) ;
>     }
>     free( psend ) ;
>     free( precv ) ;
>   } else {
>     /* Child: each iteration sends one message to the master, then
>        waits for the master's reply. */
>     int *psend ;
>     int *precv ;
>     psend = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
>     precv = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
>     for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>       fprintf( pf, " send to master\n" ) ;
>       fflush( pf ) ;
>       status = MPI_Send( psend, DATA_SIZE - 1, MPI_INT, 0,
>                          MP_TAG, MPI_COMM_WORLD ) ;
>       fprintf( pf, " send to master done, status = %d\n", status ) ;
>       fflush( pf ) ;
>       fprintf( pf, " read from master\n" ) ;
>       fflush( pf ) ;
>       status = MPI_Recv( precv, DATA_SIZE, MPI_INT, 0,
>                          MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>       fprintf( pf, " read from master done, status = %d\n", status ) ;
>       fflush( pf ) ;
>     }
>     free( psend ) ;
>     free( precv ) ;
>   }
>   fclose( pf ) ;
>   MPI_Finalize() ;
>   return 0 ;
> }
>
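The log-reading procedure tan describes can be automated. The sketch below is hypothetical: `stuck_child` and `stuck_on_master` are not part of MPICH. It assumes the exact log lines written by the test program above, which are " read from child N" in fast_mpi_0.dmp and " read from master" in a worker's file. Both functions take the file contents as a string, so reading the .dmp file is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Returns the start of the last line of log, ignoring trailing newlines,
   and stores its length in *len. */
static const char *last_line(const char *log, size_t *len)
{
    size_t end = strlen(log);
    while (end > 0 && (log[end - 1] == '\n' || log[end - 1] == '\r'))
        end--;
    size_t start = end;
    while (start > 0 && log[start - 1] != '\n')
        start--;
    *len = end - start;
    return log + start;
}

/* Hypothetical helper: given the contents of fast_mpi_0.dmp, returns N if
   the last line reads "read from child N", otherwise -1. */
int stuck_child(const char *log)
{
    size_t len;
    const char *line = last_line(log, &len);
    char buf[128];
    int child, consumed = 0;

    if (len >= sizeof buf)
        return -1;
    memcpy(buf, line, len);
    buf[len] = '\0';
    /* %n records how many characters matched, so "... done, status = 0"
       lines are rejected rather than reported as a hang. */
    if (sscanf(buf, " read from child %d%n", &child, &consumed) == 1
        && buf[consumed] == '\0')
        return child;
    return -1;
}

/* Hypothetical helper: returns 1 if a worker's log (fast_mpi_N.dmp) ends
   with "read from master", i.e. the worker entered MPI_Recv and has not
   logged its completion. */
int stuck_on_master(const char *log)
{
    static const char want[] = "read from master";
    size_t len;
    const char *line = last_line(log, &len);

    while (len > 0 && *line == ' ') {   /* skip the leading indent */
        line++;
        len--;
    }
    return len == sizeof want - 1 && memcmp(line, want, len) == 0;
}
```

Suppose `stuck_child` returns 7 for fast_mpi_0.dmp and `stuck_on_master` returns 1 for fast_mpi_7.dmp. Then both sides are blocked in MPI_Recv. Child 7's MPI_Send has already returned, yet the master never matches that message. That is the "dropped package" symptom reported in this thread.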