[MPICH] MPICH105 shm drops packages on SUN niagara
Darius Buntinas
buntinas at mcs.anl.gov
Mon Sep 17 13:06:06 CDT 2007
It seems to be working on linux, but we don't have a solaris box to try
it on. Can you try it and let us know?
-d
On 09/17/2007 12:07 PM, chong tan wrote:
> In the change list of the new 1.0.6 release, I see this item:
>
> # Bugfix for shm and ssm channels. Added missing read and write memory
> barriers for x86, and missing volatile in packet structure
>
> Does this mean the problem is fixed?
>
> thanks
>
>
> */William Gropp <gropp at mcs.anl.gov>/* wrote:
>
> We're looking at it; I've added a variation of this to our regular
> tests. No solution yet, however. My guess is that there is a
> missing volatile or memory barrier somewhere; this should force us
> to clean up the current code.
>
> Bill
>
> On May 16, 2007, at 12:18 PM, chong tan wrote:
>
>> No takers on this? There is an identical problem on Linux; I am
>> just not sure whether this code can reproduce it there.
>> tan
>>
>>
>>
>> ----- Original Message ----
>> From: chong tan <chong_guan_tan at yahoo.com
>> <mailto:chong_guan_tan at yahoo.com>>
>> To: mpich-discuss at mcs.anl.gov <mailto:mpich-discuss at mcs.anl.gov>
>> Sent: Friday, April 27, 2007 3:24:09 PM
>> Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
>>
>> The following code reproduces the problem. I think you may be able
>> to reproduce the error on Linux, but I am not sure.
>>
>>
>> It is best to run:
>> mpiexec -n 8 a.out
>> to reproduce the problem. You will need a machine with 8 CPUs/cores.
>> Sometimes you will need to run the code multiple times to see the error.
>>
>> Files named fast_mpi_?.dmp will be created, where ? is the rank of
>> the corresponding process. When MPI gets stuck,
>> look at the last line of fast_mpi_0.dmp. If it says:
>>
>> read from child 7
>>
>> then look at the last line of fast_mpi_7.dmp; it will say:
>> read from master
>>
>> Hope this helps to debug the error.
>>
>> thanks
>> tan
>>
>> ---------------------
>> #include "stdlib.h"
>> #include "stdio.h"
>> #include "mpi.h"
>>
>> #define LOOP_COUNT 1000000
>> #define DATA_SIZE  4
>> #define MP_TAG     999
>>
>> int main( int argc, char **argv )
>> {
>>     int  nProc, rank ;
>>     int  i, j, status ;
>>     char buf[ 128 ] ;
>>     FILE *pf ;
>>
>>     MPI_Init( &argc, &argv ) ;
>>     MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
>>     MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
>>     sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
>>     pf = fopen( buf, "w" ) ;
>>     if( !rank ) {
>>         /* master: each loop, receive from every child, then reply */
>>         long **psend ;
>>         long **precv ;
>>         psend = (long**)calloc( nProc, sizeof( long * ) ) ;
>>         precv = (long**)calloc( nProc, sizeof( long * ) ) ;
>>         for( i = 0 ; i < nProc ; i++ ) {
>>             psend[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>             precv[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         }
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, "Master : loop %d\n", i ) ;
>>             fflush( pf ) ;
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, "  read from child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG, j,
>>                                    MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>>                 fprintf( pf, "  read from child %d done, status = %d\n", j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, "  send to child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG, j,
>>                                    MP_TAG, MPI_COMM_WORLD ) ;
>>                 fprintf( pf, "  send to child %d done, status = %d\n", j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>         }
>>     } else {
>>         /* child: each loop, send to the master, then wait for the reply */
>>         long *psend ;
>>         long *precv ;
>>         psend = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         precv = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, "  send to master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0,
>>                                MP_TAG, MPI_COMM_WORLD ) ;
>>             fprintf( pf, "  send to master done, status = %d\n", status ) ;
>>             fflush( pf ) ;
>>             fprintf( pf, "  read from master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0,
>>                                MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>>             fprintf( pf, "  read from master done, status = %d\n", status ) ;
>>             fflush( pf ) ;
>>         }
>>     }
>>     fclose( pf ) ;
>>     MPI_Finalize() ;
>>     return 0 ;
>> }
>>
>
>