[MPICH] MPICH105 shm drops packages on SUN niagara

Rajeev Thakur thakur at mcs.anl.gov
Wed Jul 18 17:42:56 CDT 2007


No, we didn't get a chance. Most of our current development is in Nemesis.
 
Rajeev


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of chong tan
Sent: Wednesday, July 18, 2007 2:18 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara


Any update on this problem?
thanks
tan


 
----- Original Message ----
From: William Gropp <gropp at mcs.anl.gov>
To: chong tan <chong_guan_tan at yahoo.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Wednesday, May 16, 2007 11:24:46 AM
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara

We're looking at it; I've added a variation of this to our regular tests.
No solution yet, however.  My guess is that there is a missing volatile or
memory barrier somewhere; this should force us to clean up the current code.
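
To illustrate the class of bug (a minimal standalone sketch, not the
actual MPICH shm code): a single-producer/single-consumer handoff through
shared memory can lose or corrupt a message if the ready flag is a plain
int with no barrier, because the compiler or CPU may reorder the payload
store past the flag store, or hoist the flag load out of the wait loop.
C11 release/acquire atomics (or a volatile plus explicit fences on older
compilers) make the same handoff safe:

/* Sketch only: illustrates the missing-volatile/memory-barrier failure
 * class, not MPICH's shm queue.  Build with: cc -std=c11 -pthread  */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int        payload ;        /* the "message" */
static atomic_int ready = 0 ;      /* handoff flag  */

static void *producer( void *arg )
{
    (void)arg ;
    payload = 42 ;                                   /* write data first  */
    atomic_store_explicit( &ready, 1,
                           memory_order_release ) ;  /* then publish flag */
    return NULL ;
}

static void *consumer( void *arg )
{
    (void)arg ;
    /* the acquire load pairs with the release store above; with a plain
     * non-atomic flag this loop may spin forever or read stale data    */
    while( !atomic_load_explicit( &ready, memory_order_acquire ) )
        ;
    printf( "got %d\n", payload ) ;                  /* always 42 */
    return NULL ;
}

int main( void )
{
    pthread_t p, c ;
    pthread_create( &c, NULL, consumer, NULL ) ;
    pthread_create( &p, NULL, producer, NULL ) ;
    pthread_join( p, NULL ) ;
    pthread_join( c, NULL ) ;
    return 0 ;
}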


Bill

On May 16, 2007, at 12:18 PM, chong tan wrote:



No takers on this?  There is an identical problem on Linux, but I am not
sure whether this code can reproduce it there.
tan


 
----- Original Message ----
From: chong tan <chong_guan_tan at yahoo.com>
To: mpich-discuss at mcs.anl.gov
Sent: Friday, April 27, 2007 3:24:09 PM
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara


The following code reproduces the problem.  I think you may be able to
reproduce the error on Linux, but I am not sure.
 
 
It is best to run:

mpiexec -n 8 a.out

to reproduce the problem.  You will need a machine with 8 CPUs/cores.
Sometimes you will need to run the code multiple times to see the error.
 
The run creates files fast_mpi_?.dmp, where ? is the rank of the
corresponding process.  When MPI gets stuck, look at the last line of
fast_mpi_0.dmp.  If it says:

  read from child 7

then look at the last line of fast_mpi_7.dmp; it will say:

  read from master

Hope this helps to debug the error.
 
thanks
tan

---------------------
#include "stdlib.h"
#include "stdio.h"
#include "mpi.h"
 
#define LOOP_COUNT  1000000
#define DATA_SIZE   4
#define MP_TAG      999
main()
{
    int     nProc, rank ;
    int     argc = 0 ;
    int     i, j, status ;
    char    buf[ 128 ] ;
    FILE    *pf ;
    MPI_Init( &argc, NULL ) ;
    MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
    sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
    pf = fopen( buf, "w" ) ;
    if( !rank ) {
       int      **psend ;
       int      **precv ;
       psend = (int**)calloc( nProc, sizeof( int *) ) ;
       precv = (int**)calloc( nProc, sizeof( int *) ) ;
       for( i = 0 ; i < nProc ; i++ ) {
           psend[ i ] = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
           precv[ i ] = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
       }
       for( i = 0 ; i < LOOP_COUNT ; i++ ) {
          fprintf( pf, "Master : loop %d\n", i ) ;
          fflush( pf ) ;
          for( j = 1 ; j < nProc ; j++ ) {
             fprintf( pf, "  read from child %d\n", j ) ;
             fflush( pf ) ;
             status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG, j, MP_TAG,
MPI_COMM_WORLD, MP
I_STATUS_IGNORE ) ;
             fprintf( pf, "  read from child %d done, status = %d\n", j,
status ) ;
             fflush( pf ) ;
          }
          for( j = 1 ; j < nProc ; j++ ) {
             fprintf( pf, "  send to child %d\n", j ) ;
             fflush( pf ) ;
             status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG, j,
MP_TAG, MPI_COMM_WORLD
 ) ;
             fprintf( pf, "  send to child %d done, status = %d\n", j,
status ) ;
             fflush( pf ) ;
          }
       }
    } else {
       int  *psend ;
       int  *precv ;
       psend = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
       precv = (int*)calloc( DATA_SIZE, sizeof( int ) ) ;
       for( i = 0 ; i < LOOP_COUNT ; i++ ) {
             fprintf( pf, "  send to master\n" ) ;
             fflush( pf ) ;
             status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0, MP_TAG,
MPI_COMM_WORLD ) ;
             fprintf( pf, "  send to master done, status = %d\n", status ) ;
             fflush( pf ) ;
             fprintf( pf, "  read from master\n" ) ;
             fflush( pf ) ;
             status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0, MP_TAG,
MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
             fprintf( pf, "  read from master done, status = %d\n", status )
;
             fflush( pf ) ;
       }
    }
    fclose( pf ) ;
    MPI_Finalize() ;
}
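
If the program is saved as, say, fast_mpi.c (the filename is just an
example), it can be built with the MPICH compiler wrapper, e.g.

mpicc fast_mpi.c -o a.out

and then run with the mpiexec command given above.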










