[MPICH] MPICH105 shm drops packages on SUN niagara

William Gropp gropp at mcs.anl.gov
Wed May 16 13:24:46 CDT 2007


We're looking at it; I've added a variation of this to our regular  
tests.  No solution yet, however.  My guess is that there is a  
missing volatile or memory barrier somewhere; this should force us to  
clean up the current code.
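
To illustrate what I mean, here is a small sketch I made up (it is not
the actual ch3/shm code, and the names are invented): in a lock-free
shared-memory handoff the sender fills a payload and then sets a ready
flag, while the receiver polls the flag.  If the flag is not volatile
the compiler may cache it in a register, and without memory barriers
the stores and loads can be reordered so the receiver sees the flag
before the payload, which looks exactly like a dropped or corrupted
packet.  The barrier below is the GCC __sync_synchronize() builtin.

/* Hypothetical shared-memory cell; not MPICH code. */
typedef struct {
    long         payload[ 4 ] ;
    volatile int ready ;          /* volatile: don't cache the flag  */
} shm_cell_t ;

void shm_put( shm_cell_t *c, const long *data )
{
    int i ;
    for( i = 0 ; i < 4 ; i++ )
        c->payload[ i ] = data[ i ] ;
    __sync_synchronize() ;        /* payload must be visible first   */
    c->ready = 1 ;
}

int shm_get( shm_cell_t *c, long *out )
{
    int i ;
    if( !c->ready )
        return 0 ;                /* nothing to read yet             */
    __sync_synchronize() ;        /* don't hoist payload loads above */
    for( i = 0 ; i < 4 ; i++ )
        out[ i ] = c->payload[ i ] ;
    c->ready = 0 ;
    return 1 ;
}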

Bill

On May 16, 2007, at 12:18 PM, chong tan wrote:

> No takers on this?  There is an identical problem on Linux; I am just
> not sure whether this code can reproduce it there.
> tan
>
>
>
> ----- Original Message ----
> From: chong tan <chong_guan_tan at yahoo.com>
> To: mpich-discuss at mcs.anl.gov
> Sent: Friday, April 27, 2007 3:24:09 PM
> Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
>
> The following code reproduces the problem.  I think you may be able
> to reproduce the error on Linux, but I am not sure.
>
>
> It is best to run:
>   mpiexec -n 8 a.out
> to reproduce the problem.  You will need a machine with 8 CPUs/cores.
> Sometimes you will need to run the code multiple times to see the error.
>
> There will be files fast_mpi_?.dmp created, where ? is the rank of
> the corresponding process.  When MPI gets stuck,
> you should look at the last line of fast_mpi_0.dmp.  If it says:
>
>   read from child 7
>
> then you should look at the last line of fast_mpi_7.dmp; it will say:
>   read from master
>
> Hope this helps to debug the error.
>
> thanks
> tan
>
> ---------------------
> #include <stdlib.h>
> #include <stdio.h>
> #include "mpi.h"
>
> #define LOOP_COUNT  1000000
> #define DATA_SIZE   4
> #define MP_TAG      999
>
> int main( int argc, char **argv )
> {
>     int     nProc, rank ;
>     int     i, j, status ;
>     char    buf[ 128 ] ;
>     FILE    *pf ;
>
>     MPI_Init( &argc, &argv ) ;
>     MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
>
>     /* each rank logs its progress to its own dump file */
>     sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
>     pf = fopen( buf, "w" ) ;
>
>     if( !rank ) {
>        /* master: every iteration, receive from each child, then send
>           to each child.  Buffers are long to match MPI_LONG below. */
>        long     **psend ;
>        long     **precv ;
>        psend = (long**)calloc( nProc, sizeof( long * ) ) ;
>        precv = (long**)calloc( nProc, sizeof( long * ) ) ;
>        for( i = 0 ; i < nProc ; i++ ) {
>            psend[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>            precv[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>        }
>        for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>           fprintf( pf, "Master : loop %d\n", i ) ;
>           fflush( pf ) ;
>           for( j = 1 ; j < nProc ; j++ ) {
>              fprintf( pf, "  read from child %d\n", j ) ;
>              fflush( pf ) ;
>              status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG, j,
>                                 MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>              fprintf( pf, "  read from child %d done, status = %d\n", j, status ) ;
>              fflush( pf ) ;
>           }
>           for( j = 1 ; j < nProc ; j++ ) {
>              fprintf( pf, "  send to child %d\n", j ) ;
>              fflush( pf ) ;
>              status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG, j,
>                                 MP_TAG, MPI_COMM_WORLD ) ;
>              fprintf( pf, "  send to child %d done, status = %d\n", j, status ) ;
>              fflush( pf ) ;
>           }
>        }
>     } else {
>        /* child: every iteration, send to the master, then receive
>           the master's reply */
>        long  *psend ;
>        long  *precv ;
>        psend = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>        precv = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>        for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>              fprintf( pf, "  send to master\n" ) ;
>              fflush( pf ) ;
>              status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0,
>                                 MP_TAG, MPI_COMM_WORLD ) ;
>              fprintf( pf, "  send to master done, status = %d\n", status ) ;
>              fflush( pf ) ;
>              fprintf( pf, "  read from master\n" ) ;
>              fflush( pf ) ;
>              status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0,
>                                 MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>              fprintf( pf, "  read from master done, status = %d\n", status ) ;
>              fflush( pf ) ;
>        }
>     }
>     fclose( pf ) ;
>     MPI_Finalize() ;
>     return 0 ;
> }
>


