[mpich-discuss] (mpiexec 392): no msg recvd from mpd when expecting ack of request

Mr. Teo En Ming (Zhang Enming) space.time.universe at gmail.com
Thu Oct 29 22:55:16 CDT 2009


Hi,

I am getting the same mpiexec 392 error message as Kenneth Yoshimoto from
the San Diego Supercomputer Center. His mpich-discuss mailing list topic URL
is http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005882.html

I have already performed the 2-node mpdcheck test described in Appendix A.1
of the MPICH2 installation guide, and in that 2-node scenario I could also
start the ring of mpds successfully with mpdboot.
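For reference, the 2-node test I ran follows Appendix A.1 roughly like this (the hostname and port shown are placeholders for whatever mpdcheck actually prints on your machines):

```shell
# On the first node, run mpdcheck as a server; it prints the port it listens on
mpdcheck -s

# On the second node, connect back as a client, using the host and port
# reported by the server side (placeholders here)
mpdcheck -c node1 <port>

# Repeat with the roles swapped, then bring up a 2-node mpd ring and verify it
mpdboot -n 2 -f mpd.hosts
mpdtrace -l
```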

薛正华 (Xue Zhenghua, ID: zhxue123) from China reported solving the mpiexec
392 error. According to him, the cause was the absence of a high-performance
network in his environment: he changed the default communication channel from
nemesis to ssm and also increased the value of MPIEXEC_RECV_TIMEOUT in the
mpiexec.py Python source code. The URL of his report is
http://blog.csdn.net/zhxue123/archive/2009/08/22/4473089.aspx

Could this be my problem also?

Thank you.

-- 
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical
Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe at gmail.com
MSN: teoenming at hotmail.com
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore

On Fri, Oct 30, 2009 at 11:09 AM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

>  You need to do the mpdcheck tests with every pair of compute nodes. Or to
> isolate the problem, try running on a smaller set of nodes first and
> increase it one at a time until it fails.
>
> Rajeev
>
>
>  ------------------------------
> *From:* mpich-discuss-bounces at mcs.anl.gov [mailto:
> mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of *Mr. Teo En Ming (Zhang
> Enming)
> *Sent:* Thursday, October 29, 2009 2:35 PM
> *To:* mpich-discuss at mcs.anl.gov
> *Subject:* [mpich-discuss] (mpiexec 392): no msg recvd from mpd when
> expecting ack of request
>
> Hi,
>
> I have just installed MPICH2 in my Xen-based virtual machines.
>
> My hardware configuration is as follows:
>
> Processor: Intel Pentium Dual Core E6300 @ 2.8 GHz
> Motherboard: Intel Desktop Board DQ45CB BIOS 0093
> Memory: 4X 2GB Kingston DDR2-800 CL5
>
> My software configuration is as follows:
>
> Xen Hypervisor / Virtual Machine Monitor Version: 3.5-unstable
> Jeremy Fitzhardinge's pv-ops dom0 kernel: 2.6.31.4
> Host Operating System: Fedora Linux 11 x86-64 (SELinux disabled)
> Guest Operating Systems: Fedora Linux 11 x86-64 paravirtualized (PV) domU
> guests (SELinux disabled)
>
> I have successfully configured, built and installed MPICH2 in a F11 PV
> guest OS master compute node 1 with NFS server (MPICH2 bin subdirectory
> exported). The other 5 compute nodes access the MPICH2 binaries by mounting
> the NFS share from node 1. Please see the attached c.txt, m.txt and mi.txt.
> With Xen virtualization, I have created 6 F11 Linux PV guests to simulate 6
> HPC compute nodes. The network adapter (NIC) in each guest OS is virtual,
> and the Xen networking type is bridged. Running "lspci -v" and lsusb in each
> guest OS does not show any devices.
>
> According to the Appendix A troubleshooting section of the MPICH2 install
> guide, I have verified that the 2-node test scenario with "mpdcheck -s" and
> "mpdcheck -c" works: the two nodes can communicate with each other without
> problems, with each node acting as server and as client in turn. I have
> also tested mpdboot in the 2-node scenario, and the ring of mpds works.
>
> After the troubleshooting process, I have successfully created a ring of
> mpd involving 6 compute nodes. "mpdtrace -l" successfully lists all the 6
> nodes. However, when I try to run a job with mpiexec, it gives me the
> following error:
>
> [enming at enming-f11-pv-hpc-node0001 ~]$ mpiexec -n 2 examples/cpi
> mpiexec_enming-f11-pv-hpc-node0001 (mpiexec 392): no msg recvd from mpd
> when expecting ack of request
>
> I have also tried starting the mpd ring as the root user, but I still
> encounter the same error.
>
> Thank you.
>
> PS. config.log is also attached.
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>

