It seems to work for Niagara 1. At least it passed the set of crazy tests I have.

thanks

tan

Darius Buntinas <buntinas@mcs.anl.gov> wrote:

It seems to be working on Linux, but we don't have a Solaris box to try
it on. Can you try it and let us know?

-d

On 09/17/2007 12:07 PM, chong tan wrote:
> In the change list of the new 1.0.6 release, I see this item:
>
>     # Bugfix for shm and ssm channels. Added missing read and write
>     memory barriers for x86, and missing volatile in packet structure
>
> Does this mean the problem is fixed?
>
> thanks
>
> William Gropp <GROPP@MCS.ANL.GOV> wrote:
>
>     We're looking at it; I've added a variation of this to our regular
>     tests. No solution yet, however. My guess is that there is a
>     missing volatile or memory barrier somewhere; this should force us
>     to clean up the current code.
>
>     Bill
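The changelog item is easier to picture with a small example. The sketch
below is illustrative only, not MPICH's actual shm channel code: pkt_t,
WMB/RMB, pkt_send, and pkt_recv are invented names, and sfence/lfence via
GCC inline asm is just one way to spell x86 write/read barriers. The two
ingredients the changelog names are the volatile qualifier on the packet's
ready flag and the barriers around the handoff.

    /* Illustrative sketch only -- not MPICH's actual shm code. */
    #define WMB() __asm__ __volatile__ ( "sfence" ::: "memory" )
    #define RMB() __asm__ __volatile__ ( "lfence" ::: "memory" )

    #define PKT_WORDS 4

    typedef struct {
        long payload[ PKT_WORDS ] ;
        volatile int ready ;  /* without volatile, the compiler may cache
                                 this flag in a register in the spin loop */
    } pkt_t ;

    /* producer: the payload must become visible before the flag flips */
    void pkt_send( pkt_t *p, const long *data )
    {
        int i ;
        for( i = 0 ; i < PKT_WORDS ; i++ )
            p->payload[ i ] = data[ i ] ;
        WMB() ;          /* write barrier: payload stores before flag store */
        p->ready = 1 ;
    }

    /* consumer: observe the flag, then (and only then) read the payload */
    void pkt_recv( pkt_t *p, long *out )
    {
        int i ;
        while( !p->ready )
            ;            /* spin; volatile forces a fresh load each pass */
        RMB() ;          /* read barrier: flag load before payload loads */
        for( i = 0 ; i < PKT_WORDS ; i++ )
            out[ i ] = p->payload[ i ] ;
        p->ready = 0 ;   /* release the slot back to the producer */
    }

Without the volatile and the barriers, either the compiler or the CPU can
let the consumer observe ready == 1 before the payload stores land, which
matches the sort of intermittent failure reported in this thread.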
> On May 16, 2007, at 12:18 PM, chong tan wrote:
>
>> No taker on this? There is an identical problem on Linux; I am just
>> not sure whether this code can reproduce it there.
>>
>> tan
>>
>> ----- Original Message ----
>> From: chong tan <CHONG_GUAN_TAN@YAHOO.COM>
>> To: mpich-discuss@mcs.anl.gov
>> Sent: Friday, April 27, 2007 3:24:09 PM
>> Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
>>
>> The following code reproduces the problem. I think you may be able to
>> reproduce the error on Linux, but I am not sure.
>>
>> It is best to run
>>     mpiexec -n 8 a.out
>> to reproduce the problem; you will need a machine with 8 CPUs/cores.
>> Sometimes you will need to run the code multiple times to see the
>> error.
>>
>> The run creates files fast_mpi_?.dmp, where ? is the rank of the
>> corresponding process. When MPI gets stuck, look at the last line of
>> fast_mpi_0.dmp. If it says
>>
>>     read from child 7
>>
>> then look at the last line of fast_mpi_7.dmp; it will say
>>
>>     read from master
>>
>> Hope this helps to debug the error.
>>
>> thanks
>> tan
>>
>> ---------------------
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include "mpi.h"
>>
>> #define LOOP_COUNT 1000000
>> #define DATA_SIZE  4
>> #define MP_TAG     999
>>
>> int main( void )
>> {
>>     int  nProc, rank ;
>>     int  i, j, status ;
>>     char buf[ 128 ] ;
>>     FILE *pf ;
>>
>>     MPI_Init( NULL, NULL ) ;
>>     MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
>>     MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;
>>     sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
>>     pf = fopen( buf, "w" ) ;
>>     if( !rank ) {
>>         /* master: each loop, receive from every child, then send to
>>          * every child; buffers are long to match MPI_LONG */
>>         long **psend = (long**)calloc( nProc, sizeof( long* ) ) ;
>>         long **precv = (long**)calloc( nProc, sizeof( long* ) ) ;
>>         for( i = 0 ; i < nProc ; i++ ) {
>>             psend[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>             precv[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         }
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, "Master : loop %d\n", i ) ;
>>             fflush( pf ) ;
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, "    read from child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG, j,
>>                                    MP_TAG, MPI_COMM_WORLD,
>>                                    MPI_STATUS_IGNORE ) ;
>>                 fprintf( pf, "    read from child %d done, status = %d\n",
>>                          j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>             for( j = 1 ; j < nProc ; j++ ) {
>>                 fprintf( pf, "    send to child %d\n", j ) ;
>>                 fflush( pf ) ;
>>                 status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG,
>>                                    j, MP_TAG, MPI_COMM_WORLD ) ;
>>                 fprintf( pf, "    send to child %d done, status = %d\n",
>>                          j, status ) ;
>>                 fflush( pf ) ;
>>             }
>>         }
>>     } else {
>>         /* child: each loop, send to the master, then receive from it */
>>         long *psend = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         long *precv = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
>>         for( i = 0 ; i < LOOP_COUNT ; i++ ) {
>>             fprintf( pf, "    send to master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0,
>>                                MP_TAG, MPI_COMM_WORLD ) ;
>>             fprintf( pf, "    send to master done, status = %d\n",
>>                      status ) ;
>>             fflush( pf ) ;
>>             fprintf( pf, "    read from master\n" ) ;
>>             fflush( pf ) ;
>>             status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0, MP_TAG,
>>                                MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
>>             fprintf( pf, "    read from master done, status = %d\n",
>>                      status ) ;
>>             fflush( pf ) ;
>>         }
>>     }
>>     fclose( pf ) ;
>>     MPI_Finalize() ;
>>     return 0 ;
>> }
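Since the subject line says the shm channel drops packages, it may be
worth checking payload integrity as well as progress. A possible add-on
to the test above -- hypothetical, not part of the original program:
the sender would fill every word of psend with an agreed stamp before
MPI_Send, and the receiver would pass the received buffer through a
check like this after each MPI_Recv.

    #include <stdio.h>

    /* Hypothetical helper, not in the original test: verify that a
     * received buffer still carries the stamp the sender wrote into
     * it, so a dropped or corrupted packet is logged as a data error
     * instead of showing up only as a hang. */
    static void check_payload( FILE *pf, const long *buf, int n,
                               long stamp, int peer )
    {
        int k ;
        for( k = 0 ; k < n ; k++ ) {
            if( buf[ k ] != stamp ) {
                fprintf( pf, "corrupt word %d from rank %d: got %ld, "
                         "expected %ld\n", k, peer, buf[ k ], stamp ) ;
                fflush( pf ) ;
            }
        }
    }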