[mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!

Gra zeus gra_zeus at yahoo.com
Sat Jul 18 15:08:23 CDT 2009


Hello Rajeev,

Ahh, sorry about the last email; the OS on my two machines is different.

The quad-core machine is 64-bit and its OS is "Linux myquadcore_machine 2.6.18-128.1.1.el5 #1 SMP Tue Feb 10 11:36:29 EST 2009 x86_64 x86_64 x86_64 GNU/Linux".

The dual-core machine is 32-bit and its OS is "Linux mydualcore_machine 2.6.18-128.1.6.el5PAE #1 SMP Wed Apr 1 07:24:39 EDT 2009 i686 i686 i386 GNU/Linux".

Could this be the cause of my problem? Do I need to run my MPI job only on machines of the same architecture, e.g., all 32-bit? Are there any configuration options I need to set to make them work together?

Thank you very much, and sorry again about the wrong OS info in my last email.

Regards,
Gra
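A minimal sketch of how to confirm this kind of mismatch from the shell; `uname -m` reports the machine architecture, and the expected values below are inferred from the kernel strings quoted above, not verified output:

```shell
# Run on each node (e.g. via ssh); compare the reported architectures.
# Per the kernel strings in this thread, the quad-core box should say
# "x86_64" and the dual-core box "i686".
arch=$(uname -m)
echo "this node reports: $arch"
```

If the two nodes report different values, the default MPICH2 builds on them will use different pointer and long sizes, which is exactly the 32-bit/64-bit question raised in this thread.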
--- On Sat, 7/18/09, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

From: Rajeev Thakur <thakur at mcs.anl.gov>
Subject: Re: [mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!
To: mpich-discuss at mcs.anl.gov
Date: Saturday, July 18, 2009, 8:42 AM

Are the CPUs identical on them? Is one 32-bit, the other 64-bit?

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gra zeus
Sent: Saturday, July 18, 2009 10:27 AM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!

One of them is quad-core and the other is dual-core; however, the OS, account, password, and install path are all the same. I used the same configuration, "./configure --prefix=/opt/localhomes/myname/mpich2-install", on both machines.
--- On Sat, 7/18/09, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

From: Rajeev Thakur <thakur at mcs.anl.gov>
Subject: Re: [mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!
To: mpich-discuss at mcs.anl.gov
Date: Saturday, July 18, 2009, 7:02 AM

What are the exact parameters you passed to configure when building MPICH2? Are the two machines identical?

Rajeev

From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Gra zeus
Sent: Saturday, July 18, 2009 12:06 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!

Hello,

Thanks for the answer yesterday.

I tested my code on one machine (with "mpiexec -n 2 ./myprog") and everything worked fine: my program can use MPI_Send and MPI_Recv without any problems.

Today, I set up mpich2 on two machines. Both machines can communicate with each other, ssh was tested on both machines, and both mpd and mpdringtest work.

However, when I run my program that uses MPI_Send and MPI_Recv, MPI_Recv blocks forever. So I wrote new simple code to test MPI_Send/MPI_Recv, like this:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myrank;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            int senddata = 1;
            MPI_Send(&senddata, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (myrank == 1) {
            int recvdata = 0;
            MPI_Recv(&recvdata, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("received :%d:\n", recvdata);
        }

        MPI_Finalize();
        return 0;
    }
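A sketch of how a test like the above is typically compiled and launched; the install prefix is the one from the configure line quoted earlier in this thread, and the source file name sendrecv.c is hypothetical:

```shell
# sendrecv.c is a hypothetical name for the test program above;
# the prefix matches the ./configure --prefix used in this thread.
PREFIX=/opt/localhomes/myname/mpich2-install
$PREFIX/bin/mpicc -o sendrecv sendrecv.c
$PREFIX/bin/mpiexec -n 2 ./sendrecv
```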
                  

                  

I got this error:

Assertion failed in file ch3_progress.c at line 489: pkt->type >= 0 && pkt->type < MPIDI_NEM_PKT_END
internal ABORT - process 1
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(315)..................: MPI_Finalize failed
MPI_Finalize(207)..................:
MPID_Finalize(92)..................:
PMPI_Barrier(476)..................: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(82)...................:
MPIC_Sendrecv(164).................:
MPIC_Wait(405).....................:
MPIDI_CH3I_Progress(150)...........:
MPID_nem_mpich2_blocking_recv(1074):
MPID_nem_tcp_connpoll(1667)........:
state_commrdy_handler(1517)........:
MPID_nem_tcp_recv_handler(1413)....: socket closed

////////////////////////////////////////////////////////////////

I also tried the examples/cpi program that comes with the install package; the result was that the example program froze, without any errors. (I assume it stopped at MPI_Bcast().)

Can anyone help me with this? This code and my program run smoothly when I use one machine (with -n 2, -n 4, etc.), but whenever I start mpdboot with two machines, the MPI processes can't communicate with each other via MPI_Send/MPI_Recv.

Thanks,
gra

