[MPICH] MPICH105 shm drops packages on SUN niagara
chong tan
chong_guan_tan at yahoo.com
Wed Jul 18 18:49:35 CDT 2007
Any suggestions for a solution on Solaris other than the socket channel? For SMP and Niagara boxes, sockets are a very bad solution.
tan
----- Original Message ----
From: Rajeev Thakur <thakur at mcs.anl.gov>
To: chong tan <chong_guan_tan at yahoo.com>
Cc: Darius Buntinas <buntinas at mcs.anl.gov>
Sent: Wednesday, July 18, 2007 4:39:24 PM
Subject: RE: [MPICH] MPICH105 shm drops packages on SUN niagara
Not in the next release (1.0.6), which will be out in a couple of weeks.
Rajeev
From: chong tan [mailto:chong_guan_tan at yahoo.com]
Sent: Wednesday, July 18, 2007 6:33 PM
To: Rajeev Thakur
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
Will Nemesis be supported on Sun Solaris in the next release?
tan
----- Original Message ----
From: Rajeev Thakur <thakur at mcs.anl.gov>
To: chong tan <chong_guan_tan at yahoo.com>; mpich-discuss at mcs.anl.gov
Sent: Wednesday, July 18, 2007 3:42:56 PM
Subject: RE: [MPICH] MPICH105 shm drops packages on SUN niagara
No, we didn't get a chance. Most of our current development is in Nemesis.
Rajeev
From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of chong tan
Sent: Wednesday, July 18, 2007 2:18 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
Any update on this problem?
thanks
tan
----- Original Message ----
From: William Gropp <gropp at mcs.anl.gov>
To: chong tan <chong_guan_tan at yahoo.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Wednesday, May 16, 2007 11:24:46 AM
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
We're looking at it; I've added a variation of this to our regular tests. No solution yet, however. My guess is that there is a missing volatile or memory barrier somewhere; this should force us to clean up the current code.
Bill
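To illustrate the kind of bug guessed at above (a missing volatile or memory barrier), here is a minimal sketch of a flag-based handoff through shared memory. It is not MPICH's shm channel code: the names mailbox_t, worker_arg_t, producer and run_handoff are hypothetical, and it assumes a C11 compiler with <stdatomic.h> and POSIX threads. The release store on the flag orders the payload write before it, and the acquire load on the reader side orders the payload read after it. Without that pairing, a reader on a weakly ordered machine could see the flag set and still read a stale payload.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox; not an MPICH data structure. */
typedef struct {
    long payload;       /* plain data written by the producer */
    atomic_int ready;   /* 0 = slot empty, 1 = payload is valid */
} mailbox_t;

typedef struct {
    mailbox_t *box;
    long count;
} worker_arg_t;

static void *producer(void *p)
{
    worker_arg_t *arg = p;
    long i;

    for (i = 1; i <= arg->count; i++) {
        /* Wait until the consumer has emptied the slot. */
        while (atomic_load_explicit(&arg->box->ready, memory_order_acquire) != 0)
            sched_yield();
        arg->box->payload = i;
        /* Release: the payload write above is visible before the flag is. */
        atomic_store_explicit(&arg->box->ready, 1, memory_order_release);
    }
    return NULL;
}

/* Sends `count` values through the mailbox from a second thread.
   Returns the number of payloads that did not match, or -1 on error. */
long run_handoff(long count)
{
    mailbox_t box;
    worker_arg_t arg;
    pthread_t tid;
    long i, mismatches = 0;

    box.payload = 0;
    atomic_init(&box.ready, 0);
    arg.box = &box;
    arg.count = count;

    if (pthread_create(&tid, NULL, producer, &arg) != 0)
        return -1;

    for (i = 1; i <= count; i++) {
        /* Acquire: the payload read below happens after the flag is seen. */
        while (atomic_load_explicit(&box.ready, memory_order_acquire) != 1)
            sched_yield();
        if (box.payload != i)
            mismatches++;
        atomic_store_explicit(&box.ready, 0, memory_order_release);
    }
    pthread_join(tid, NULL);
    return mismatches;
}
```

Calling run_handoff(1000) should return 0 with the ordering shown. Replacing the atomic flag with a plain int removes the ordering guarantee, which is the class of error described above.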
On May 16, 2007, at 12:18 PM, chong tan wrote:
No takers on this? There is an identical problem on Linux; I am just not sure whether this code can reproduce it there.
tan
----- Original Message ----
From: chong tan <chong_guan_tan at yahoo.com>
To: mpich-discuss at mcs.anl.gov
Sent: Friday, April 27, 2007 3:24:09 PM
Subject: Re: [MPICH] MPICH105 shm drops packages on SUN niagara
The following code reproduces the problem. I think you may be able to reproduce the error on
Linux, but I am not sure.
It is best to run:
mpiexec -n 8 a.out
to reproduce the problem. You will need a machine with 8 CPUs/cores. Sometimes you will need to
run the code multiple times to see the error.
The program creates files fast_mpi_?.dmp, where ? is the rank of the process that wrote the file. When MPI gets stuck,
you should look at the last line of fast_mpi_0.dmp. If it says:
read from child 7
then you should look at the last line of fast_mpi_7.dmp; it will say:
read from master
Hope this helps to debug the error.
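The check described above (compare the last line of each fast_mpi_?.dmp file) can be automated with a small helper. This is only a sketch: last_line is a hypothetical function, not part of the test program or of MPICH, and it assumes log lines shorter than 256 characters. If rank 0 stops at "read from child 7" while rank 7 stops at "read from master", each side is waiting for the other, which would mean that a message rank 7 reported as sent never reached rank 0.

```c
#include <stdio.h>
#include <string.h>

/* Copies the last non-empty line of `path` into `out`, without the newline.
   Returns 0 on success, -1 if the file cannot be opened or has no such line.
   Assumes each line is shorter than 256 characters. */
int last_line(const char *path, char *out, size_t out_size)
{
    char line[256];
    int found = 0;
    FILE *fp;

    if (out_size == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Strip the trailing newline (and carriage return, if any). */
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] != '\0') {
            strncpy(out, line, out_size - 1);
            out[out_size - 1] = '\0';
            found = 1;
        }
    }
    fclose(fp);
    return found ? 0 : -1;
}
```

To use it, build each file name with snprintf( path, sizeof path, "fast_mpi_%d.dmp", r ) for r from 0 to 7 and print what last_line returns for each rank.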
thanks
tan
---------------------
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

#define LOOP_COUNT 1000000
#define DATA_SIZE 4
#define MP_TAG 999

int main( int argc, char **argv )
{
    int nProc, rank ;
    int i, j, status ;
    char buf[ 128 ] ;
    FILE *pf ;

    MPI_Init( &argc, &argv ) ;
    MPI_Comm_size( MPI_COMM_WORLD, &nProc ) ;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank ) ;

    /* Each rank logs its progress to its own file, fast_mpi_<rank>.dmp. */
    sprintf( buf, "fast_mpi_%d.dmp", rank ) ;
    pf = fopen( buf, "w" ) ;
    if( !pf ) {
        MPI_Abort( MPI_COMM_WORLD, 1 ) ;
    }

    if( !rank ) {
        /* Master: one send buffer and one receive buffer per rank.
           The buffers hold long so the element type matches MPI_LONG. */
        long **psend ;
        long **precv ;

        psend = (long**)calloc( nProc, sizeof( long *) ) ;
        precv = (long**)calloc( nProc, sizeof( long *) ) ;
        for( i = 0 ; i < nProc ; i++ ) {
            psend[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
            precv[ i ] = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
        }

        for( i = 0 ; i < LOOP_COUNT ; i++ ) {
            fprintf( pf, "Master : loop %d\n", i ) ;
            fflush( pf ) ;

            /* Receive one message from every child, in rank order. */
            for( j = 1 ; j < nProc ; j++ ) {
                fprintf( pf, " read from child %d\n", j ) ;
                fflush( pf ) ;
                status = MPI_Recv( precv[ j ], DATA_SIZE, MPI_LONG, j, MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
                fprintf( pf, " read from child %d done, status = %d\n", j, status ) ;
                fflush( pf ) ;
            }

            /* Then send one message back to every child. */
            for( j = 1 ; j < nProc ; j++ ) {
                fprintf( pf, " send to child %d\n", j ) ;
                fflush( pf ) ;
                status = MPI_Send( psend[ j ], DATA_SIZE - 1, MPI_LONG, j, MP_TAG, MPI_COMM_WORLD ) ;
                fprintf( pf, " send to child %d done, status = %d\n", j, status ) ;
                fflush( pf ) ;
            }
        }
    } else {
        /* Child: send to the master, then wait for its reply. */
        long *psend ;
        long *precv ;

        psend = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;
        precv = (long*)calloc( DATA_SIZE, sizeof( long ) ) ;

        for( i = 0 ; i < LOOP_COUNT ; i++ ) {
            fprintf( pf, " send to master\n" ) ;
            fflush( pf ) ;
            status = MPI_Send( psend, DATA_SIZE - 1, MPI_LONG, 0, MP_TAG, MPI_COMM_WORLD ) ;
            fprintf( pf, " send to master done, status = %d\n", status ) ;
            fflush( pf ) ;

            fprintf( pf, " read from master\n" ) ;
            fflush( pf ) ;
            status = MPI_Recv( precv, DATA_SIZE, MPI_LONG, 0, MP_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE ) ;
            fprintf( pf, " read from master done, status = %d\n", status ) ;
            fflush( pf ) ;
        }
    }

    fclose( pf ) ;
    MPI_Finalize() ;
    return 0 ;
}