[mpich-discuss] I wonder if my mpdboot is the cause of problem... help me!

Rajeev Thakur thakur at mcs.anl.gov
Sat Jul 18 09:02:36 CDT 2009


What are the exact parameters you passed to configure when building
MPICH2? Are the two machines identical?
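(Editor's note: if the original configure line was not recorded, MPICH2 ships an mpich2version utility that reports the version and the configure options the installation was built with. A minimal sketch, assuming MPICH2's bin directory is on the PATH; run it on both machines and compare the output:)

```shell
# print version, device (e.g. ch3:nemesis), and configure options
# for the MPICH2 installation found first on the PATH
mpich2version
```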
 
Rajeev


  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gra zeus
Sent: Saturday, July 18, 2009 12:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] I wonder if my mpdboot is the cause of problem... help me!


hello,

Thanks for the answer yesterday. I tested my code on one machine (with
"mpiexec -n 2 ./myprog"), and everything works fine - my program can use
MPI_Send and MPI_Recv without any problems.

Today, I set up MPICH2 on two machines. Both machines can communicate
with each other, ssh is tested on both machines, mpd works, and
mpdringtest works.
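(Editor's note: a minimal sketch of the two-machine mpd ring setup being described, assuming a hosts file named mpd.hosts and a second machine called node2 - both names are placeholders, not taken from the original message:)

```shell
# mpd.hosts lists the other machine(s) in the ring, one hostname per line
echo "node2" > mpd.hosts

# start an mpd ring spanning 2 machines (run on the first machine)
mpdboot -n 2 -f mpd.hosts

# verify that both hosts have joined the ring
mpdtrace

# send a message around the ring 100 times to test it
mpdringtest 100

# launch the MPI program across the ring
mpiexec -n 2 ./myprog

# shut down all mpd daemons when done
mpdallexit
```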

However, when I run my program that uses MPI_Send and MPI_Recv, MPI_Recv
blocks forever. So I wrote a new, simple program to test MPI_Send and
MPI_Recv, like this:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myrank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        /* rank 0 sends one int to rank 1 with tag 0 */
        int senddata = 1;
        MPI_Send(&senddata, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        /* rank 1 blocks here waiting for the message from rank 0 */
        int recvdata = 0;
        MPI_Recv(&recvdata, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("received :%d:\n", recvdata);
    }
    MPI_Finalize();
    return 0;
}


I got this error:


Assertion failed in file ch3_progress.c at line 489: pkt->type >= 0 &&
pkt->type < MPIDI_NEM_PKT_END
internal ABORT - process 1
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(315)..................: MPI_Finalize failed
MPI_Finalize(207)..................: 
MPID_Finalize(92)..................: 
PMPI_Barrier(476)..................: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(82)...................: 
MPIC_Sendrecv(164).................: 
MPIC_Wait(405).....................: 
MPIDI_CH3I_Progress(150)...........: 
MPID_nem_mpich2_blocking_recv(1074): 
MPID_nem_tcp_connpoll(1667)........: 
state_commrdy_handler(1517)........: 
MPID_nem_tcp_recv_handler(1413)....: socket closed

////////////////////////////////////////////////////////////////

I also tried the example/cpi program that comes with the install
package -> the example program froze without any errors. (I assume it
stopped at MPI_Bcast().)

Can anyone help me with this?
This code and my program run smoothly when I use one machine (with
-n 2, -n 4, etc.), but whenever I start mpdboot with two machines, the
MPI processes can't communicate with each other via MPI_Send and
MPI_Recv.

thx,
gra





