[MPICH] MPICH105 shm drops packages on SUN niagara

chong tan chong_guan_tan at yahoo.com
Mon Sep 17 19:34:04 CDT 2007


It seems to work on Niagara 1. At least it passed the set of crazy tests I have.
   
  thanks
   
  tan
  
Darius Buntinas <buntinas at mcs.anl.gov> wrote:
  
It seems to be working on Linux, but we don't have a Solaris box to try 
it on. Can you try it and let us know?

-d

On 09/17/2007 12:07 PM, chong tan wrote:
> In the change list of the new 106 release, I see this item:
> 
> # Bugfix for shm and ssm channels. Added missing read and write memory 
> barriers for x86, and missing volatile in packet structure
> 
> Does this mean the problem is fixed?
> 
> thanks
> 
> 
> */William Gropp /* wrote:
> 
> We're looking at it; I've added a variation of this to our regular
> tests. No solution yet, however. My guess is that there is a
> missing volatile or memory barrier somewhere; this should force us
> to clean up the current code.
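
The suspected failure mode (a packet in shared memory being read before its contents are visible to the reader) can be sketched as follows. This is a minimal illustration and not the MPICH code: `pkt_slot`, `slot_try_send` and `slot_try_recv` are hypothetical names, the one-packet mailbox layout is an assumption, and C11 atomics stand in for the read/write barriers and `volatile` mentioned in the change list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SLOT_PAYLOAD 4   /* hypothetical payload size, in ints */

/* Hypothetical one-packet mailbox placed in memory shared by two processes. */
typedef struct {
    atomic_int full;                 /* 0 = empty, 1 = packet ready */
    int payload[SLOT_PAYLOAD];
} pkt_slot;

/* Writer side: returns false if the previous packet has not been consumed. */
bool slot_try_send(pkt_slot *s, const int *data)
{
    assert(s != NULL && data != NULL);
    if (atomic_load_explicit(&s->full, memory_order_acquire))
        return false;
    memcpy(s->payload, data, sizeof s->payload);
    /* Release store: the payload writes above are visible before full == 1. */
    atomic_store_explicit(&s->full, 1, memory_order_release);
    return true;
}

/* Reader side: returns false if no packet is waiting. */
bool slot_try_recv(pkt_slot *s, int *out)
{
    assert(s != NULL && out != NULL);
    /* Acquire load: the payload reads below cannot move ahead of this check. */
    if (!atomic_load_explicit(&s->full, memory_order_acquire))
        return false;
    memcpy(out, s->payload, sizeof s->payload);
    /* Release store: the payload reads finish before the slot is reused. */
    atomic_store_explicit(&s->full, 0, memory_order_release);
    return true;
}
```

With a plain `int` flag and no barriers, the compiler or the CPU may reorder the payload copy past the flag update, so a reader could see `full == 1` with a stale payload, or a polling reader could keep using a cached flag value and never notice the packet. Either would look like a dropped packet from the MPI application's point of view.
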
> 
> Bill
> 
> On May 16, 2007, at 12:18 PM, chong tan wrote:
> 
>> No takers on this? There is an identical problem on Linux; I am just
>> not sure whether this code can reproduce it there.
>> tan
>>
>>
>> 
>> ----- Original Message ----
>> From: chong tan
>> To: mpich-discuss at mcs.anl.gov 
>> Sent: Friday, April 27, 2007 3:24:09 PM
>> Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
>>
>> The following code reproduces the problem. I think you may be able
>> to reproduce the error on Linux, but I am not sure.
>> 
>> It is best to run:
>> mpiexec -n 8 a.out
>> to reproduce the problem. You will need a machine with 8 CPUs/cores.
>> Sometimes you will need to run the code multiple times to see the error.
>> 
>> Each process creates a file fast_mpi_?.dmp, where ? is the rank of
>> that process. When MPI gets stuck, you should look at the last line
>> of fast_mpi_0.dmp. If it says:
>>
>> read from child 7
>> 
>> then you should look at the last line of fast_mpi_7.dmp, it will say:
>> read from master
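
Checking the last line of every dump file by hand gets tedious with 8 ranks. Below is a small helper for that step, sketched under the assumptions that the dump files sit in the current directory and that each logged line is shorter than 256 characters. `last_line` and `report_stuck_ranks` are hypothetical names and not part of the reproducer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the last non-empty line of `path` into `out`.
 * Returns 0 on success, -1 if the file is missing or has no such line. */
int last_line(const char *path, char *out, size_t outsz)
{
    char line[256];
    int found = -1;
    FILE *fp;

    assert(path != NULL && out != NULL && outsz > 0);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';      /* strip the newline */
        if (line[0] != '\0') {
            snprintf(out, outsz, "%s", line);    /* remember the latest line */
            found = 0;
        }
    }
    fclose(fp);
    return found;
}

/* Print the last logged step of fast_mpi_0.dmp .. fast_mpi_<nproc-1>.dmp. */
void report_stuck_ranks(int nproc)
{
    char path[64], line[256];
    int r;

    for (r = 0; r < nproc; r++) {
        snprintf(path, sizeof path, "fast_mpi_%d.dmp", r);
        if (last_line(path, line, sizeof line) == 0)
            printf("rank %d: %s\n", r, line);
        else
            printf("rank %d: (no dump file, or file is empty)\n", r);
    }
}
```

Calling `report_stuck_ranks( 8 )` after a hang prints one line per rank. A rank whose last step has no matching "done" line is still blocked inside that call.
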
>> 
>> Hope this helps to debug the error.
>> 
>> thanks
>> tan
>>
>> ---------------------
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>> 
>> #define LOOP_COUNT 1000000
>> #define DATA_SIZE 4
>> #define MP_TAG 999
>> 
>> int main( int argc, char **argv )
>> {
>>     int nProc, rank ;
>>     int i, j, status ;
>>     char buf[ 128 ] ;
>>     FILE *pf ;
>> 
>>     MPI_Init( &argc, &argv ) ;
>>     MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
>>     MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
>> 
>>     /* each rank logs every step to its own file and flushes at once,
>>        so the last line shows where the rank is blocked */
>>     sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
>>     pf = fopen( buf, "w" ) ;
>>     if( !pf )
>>         MPI_Abort( MPI_COMM_WORLD, 1 ) ;
>> 
>>     if( !rank ) {
>>         /* master: one send and one receive buffer per child;
>>            the buffers are long to match the MPI_LONG datatype */
>>         long **psend ;
>>         long **precv ;
>>         psend = (long**)calloc( nProc, sizeof( long * ) ) ;
>>         precv = (long**)calloc( nProc, sizeof( long * ) ) ;
>>         for( i = 0 ; i < nProc ; i++ ) {
>>             psend[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>             precv[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         }
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, "Master : loop %d\n", i ) ;
>>             fflush( pf ) ;
>>             /* collect one message from every child, in rank order */
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, " read from child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG,
>>                                    j, MP_TAG, MPI_COMM_WORLD,
>>                                    MPI_STATUS_IGNORE ) ;
>>                 fprintf( pf, " read from child %d done, status = %d\n",
>>                          j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>             /* then reply to every child */
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, " send to child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG,
>>                                    j, MP_TAG, MPI_COMM_WORLD ) ;
>>                 fprintf( pf, " send to child %d done, status = %d\n",
>>                          j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>         }
>>     } else {
>>         /* child: send to the master, then wait for its reply */
>>         long *psend ;
>>         long *precv ;
>>         psend = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         precv = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, " send to master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0,
>>                                MP_TAG, MPI_COMM_WORLD ) ;
>>             fprintf( pf, " send to master done, status = %d\n",
>>                      status ) ;
>>             fflush( pf ) ;
>>             fprintf( pf, " read from master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0,
>>                                MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>>             fprintf( pf, " read from master done, status = %d\n",
>>                      status ) ;
>>             fflush( pf ) ;
>>         }
>>     }
>>     fclose( pf ) ;
>>     MPI_Finalize() ;
>>     return 0 ;
>> }
>>


       

