[mpich-discuss] I wonder if my mpdboot is the cause of problem...help me!

Pavan Balaji balaji at mcs.anl.gov
Sat Jul 18 15:17:18 CDT 2009


You have a few options:

1. Check whether the processor in your 32-bit machine supports a 64-bit 
operating system -- most modern processors do. If it does, just install 
a 64-bit OS on that machine. This is the most efficient option.

2. Pass "-m32" in CFLAGS to your MPICH2 configure on the 64-bit machine 
-- this will build MPICH2 in 32-bit mode even on the 64-bit platform. 
Applications built with mpicc and friends will then also be built as 
32-bit binaries. This will work, but you will not be using the 64-bit 
capabilities of that machine, so performance will not be optimal.

3. You could use MPICH-1 instead of MPICH2, though I wouldn't suggest 
doing that. In this case MPICH will internally do the data conversion 
for you, which will eat up some performance as well.

  -- Pavan

On 07/18/2009 03:08 PM, Gra zeus wrote:
> Hello Rajeev,
> 
> Sorry about my last email; the OSes on my two machines are different.
> 
> quadcore machine is 64bit and OS is "Linux myquadcore_machine 
> 2.6.18-128.1.1.el5 #1 SMP Tue Feb 10 11:36:29 EST 2009 x86_64 x86_64 
> x86_64 GNU/Linux"
> 
> 
> dual core is 32bit and OS is: "Linux mydualcore_machine 
> 2.6.18-128.1.6.el5PAE #1 SMP Wed Apr 1 07:24:39 EDT 2009 i686 i686 i386 
> GNU/Linux"
> 
> Is this the cause of my problem? Do I need to run my MPI job only on 
> machines of the same bitness? Is there any configuration I need to set 
> to make them work together?
> 
> Thank you very much, and sorry again about the wrong OS info in my last email.
> 
> regards,
> Gra 
> 
> --- On *Sat, 7/18/09, Rajeev Thakur /<thakur at mcs.anl.gov>/* wrote:
> 
> 
>     From: Rajeev Thakur <thakur at mcs.anl.gov>
>     Subject: Re: [mpich-discuss] I wonder if my mpdboot is the
>     cause of problem...help me!
>     To: mpich-discuss at mcs.anl.gov
>     Date: Saturday, July 18, 2009, 8:42 AM
> 
>     Are the CPUs identical on them? Is one 32-bit, the other 64-bit?
>      
> 
>         ------------------------------------------------------------------------
>         *From:* mpich-discuss-bounces at mcs.anl.gov
>         [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of *Gra zeus
>         *Sent:* Saturday, July 18, 2009 10:27 AM
>         *To:* mpich-discuss at mcs.anl.gov
>         *Subject:* Re: [mpich-discuss] I wonder if my mpdboot is the
>         cause of problem...help me!
> 
>         one of them is quad core and another one is dual core. however,
>         OS,account,my password,install path are all the same.
>         I use this  configuration "./configure
>         --prefix=/opt/localhomes/myname/mpich2-install" in both machines.
> 
>         --- On *Sat, 7/18/09, Rajeev Thakur /<thakur at mcs.anl.gov>/* wrote:
> 
> 
>             From: Rajeev Thakur <thakur at mcs.anl.gov>
>             Subject: Re: [mpich-discuss] I wonder if my mpdboot is the
>             cause of problem...help me!
>             To: mpich-discuss at mcs.anl.gov
>             Date: Saturday, July 18, 2009, 7:02 AM
> 
>             What are the exact parameters you passed to configure when
>             building MPICH2? Are the two machines identical?
>              
>             Rajeev
> 
>                 ------------------------------------------------------------------------
>                 *From:* mpich-discuss-bounces at mcs.anl.gov
>                 [mailto:mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of
>                 *Gra zeus
>                 *Sent:* Saturday, July 18, 2009 12:06 AM
>                 *To:* mpich-discuss at mcs.anl.gov
>                 *Subject:* [mpich-discuss] I wonder if my mpdboot is the
>                 cause of problem...help me!
> 
>                 hello,
> 
>                 Thanks for the answer yesterday.
>                 I tested my code on one machine (with "mpiexec -n 2
>                 ./myprog") and everything works fine: my program can use
>                 MPI_Send and MPI_Recv without any problems.
> 
>                 Today I set up MPICH2 on two machines. Both machines can
>                 communicate with each other, ssh is tested on both
>                 machines, mpd works, and mpdringtest works.
> 
>                 However, when I run my program that uses MPI_Send and
>                 MPI_Recv, MPI_Recv blocks forever.
>                 So I wrote a new, simple program to test MPI_Send and
>                 MPI_Recv, like this:
> 
>                 #include <stdio.h>
>                 #include <mpi.h>
> 
>                 int main(int argc, char *argv[])
>                 {
>                     int myrank;
>                     MPI_Status status;
> 
>                     MPI_Init(&argc, &argv);
>                     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>                     if (myrank == 0) {
>                         /* rank 0 sends one int to rank 1 */
>                         int senddata = 1;
>                         MPI_Send(&senddata, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>                     } else if (myrank == 1) {
>                         /* rank 1 blocks until the int from rank 0 arrives */
>                         int recvdata = 0;
>                         MPI_Recv(&recvdata, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>                                  &status);
>                         printf("received :%d:\n", recvdata);
>                     }
>                     MPI_Finalize();
>                     return 0;
>                 }
> 
> 
>                 I got this error:
> 
> 
>                 Assertion failed in file ch3_progress.c at line 489:
>                 pkt->type >= 0 && pkt->type < MPIDI_NEM_PKT_END
>                 internal ABORT - process 1
>                 Fatal error in MPI_Finalize: Other MPI error, error stack:
>                 MPI_Finalize(315)..................: MPI_Finalize failed
>                 MPI_Finalize(207)..................: 
>                 MPID_Finalize(92)..................: 
>                 PMPI_Barrier(476)..................:
>                 MPI_Barrier(comm=0x44000002) failed
>                 MPIR_Barrier(82)...................: 
>                 MPIC_Sendrecv(164).................: 
>                 MPIC_Wait(405).....................: 
>                 MPIDI_CH3I_Progress(150)...........: 
>                 MPID_nem_mpich2_blocking_recv(1074): 
>                 MPID_nem_tcp_connpoll(1667)........: 
>                 state_commrdy_handler(1517)........: 
>                 MPID_nem_tcp_recv_handler(1413)....: socket closed
> 
>                 ////////////////////////////////////////////////////////////////
> 
>                 I also tried example/cpi that comes with the install
>                 package; the result is that the example program froze
>                 without any errors. (I assume it stopped at MPI_Bcast().)
> 
>                 Can anyone help me with this?
>                 This code and my program run smoothly when I use 1
>                 machine (with options -n 2, -n 4, etc.), but
>                 whenever I start mpdboot with 2 machines, MPI processes
>                 can't communicate with other MPI processes via
>                 MPI_Send and MPI_Recv.
> 
>                 thx,
>                 gra
> 
> 
> 
> 
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

