[mpich-discuss] MPI_Recv crashes with mpd ring

Pavan Balaji balaji at mcs.anl.gov
Wed Feb 16 13:19:08 CST 2011


Rohit,

Try this:

% mpiexec.hydra -f hosts -n 1 ./a.exec arg1 : -n 1 ./a.exec arg2 : \
    -n 1 ./a.exec arg3 : -n 1 ./a.exec arg4
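
The "./" matters because execvp looks up a bare name like "a.exec" only
in the directories listed in PATH, and the current directory is usually
not one of them. A name that contains a slash is used as given. The
explicit "-n 1" on each colon-separated segment also gives every
executable its own process count. Here is a minimal sketch of the lookup
difference, assuming a POSIX shell; the "a.exec" it creates is a
hypothetical stand-in script in a scratch directory, not your MPI
program, and the PATH value is only an example:

```shell
#!/bin/sh
# Sketch: bare command names are resolved through PATH; names that
# contain a slash are not. "a.exec" is a hypothetical stand-in script.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '#!/bin/sh\necho "started with $1"\n' > a.exec
chmod +x a.exec

# Bare name, with a PATH that does not include the current directory:
# the lookup fails, much like the execvp error reported by the proxy.
bare=$( (PATH=/usr/bin:/bin; a.exec arg1) 2>/dev/null || echo "not found" )

# Explicit relative path: PATH is not consulted, so the file is found.
dotslash=$(./a.exec arg1)

echo "bare name: $bare"
echo "with ./:   $dotslash"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

On a typical setup the first lookup should fail and the second should
succeed. Giving the full path to the executable on the mpiexec.hydra
command line should work equally well, provided that path exists on
every host.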

  -- Pavan

On 02/16/2011 12:37 PM, Jain, Rohit wrote:
> Thanks, everyone, for the responses. I got around the ssh issue.
>
> But it seems some more setup is required to make hydra work:
>
> mpiexec.hydra -f hosts -n 4 a.exec arg1 : a.exec arg2 : a.exec arg3 :
> a.exec arg4
> [proxy@hansel] HYDU_create_process
> (/mpich/src/mpich2-1.2.1p1/src/pm/hydra/utils/launch/launch.c:72):
> execvp error on file a.exec (No such file or directory)
> [proxy@hansel] HYDU_create_process
> (/mpich/src/mpich2-1.2.1p1/src/pm/hydra/utils/launch/launch.c:72):
> execvp error on file a.exec (No such file or directory)
>
>
> Regards,
> Rohit
>
>
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
> Sent: Tuesday, February 15, 2011 7:21 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] MPI_Recv crashes with mpd ring
>
> On Feb 15, 2011, at 5:52 PM CST, Jain, Rohit wrote:
>
>> I had 1.2.1p1 built locally, so I tried that. It also gave me the same
>> fatal error. I will try a newer version, but I am less hopeful.
>
> There's a good chance that there is a bug in your code, since 1.0.6 was
> not fundamentally a broken version of MPICH2.  However, it is important
> for you to use a fairly recent version so that we can rule out the ~3-4
> years of bugs that have been fixed since 1.0.6 was released.  Also,
> error messages and debugging facilities are typically only improved in
> later versions of MPICH2, which could help you track down your problem.
> You should attempt to debug your program in all of the usual ways, such
> as by enabling core dumps or running valgrind on your program.
>
>> I am trying to use hydra (mpiexec.hydra) with 1.2.1p1, but I am getting
>> some startup errors:
>>
>> The authenticity of host 'XXX' can't be established.
>> RSA key fingerprint is ed:ce:ca:7b:08:b9:49:fd:f6:af:14.
>> Are you sure you want to continue connecting (yes/no)?
>> The authenticity of host 'XXX2' can't be established.
>> RSA key fingerprint is fb:1b:7b:0c:bb:b1:a6:b1:7d:dc:05.
>>
>> Any pointers on how to resolve them?
>
> See Pavan's mail for some tips here.
>
> -Dave
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

