[mpich-discuss] (mpiexec 392): no msg recvd from mpd when expecting ack of request

Rajeev Thakur thakur at mcs.anl.gov
Thu Oct 29 22:09:51 CDT 2009


You need to run the mpdcheck tests with every pair of compute nodes. Alternatively,
to isolate the problem, try running on a smaller set of nodes first and
increase the count one node at a time until it fails.
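The pairwise testing described above can be scripted. A minimal dry-run sketch, assuming hypothetical hostnames node0001 through node0006 (substitute your own), that only prints the mpdcheck command to run on each side of every pair:

```shell
# Dry-run sketch of the pairwise mpdcheck procedure (hostnames are
# hypothetical). For each ordered pair, "mpdcheck -s" runs on the server
# host and "mpdcheck -c <host> <port>" on the client host, using the
# port that the server side prints. Replace "echo" with ssh invocations
# on a real cluster.
nodes="node0001 node0002 node0003 node0004 node0005 node0006"
for server in $nodes; do
  for client in $nodes; do
    [ "$server" = "$client" ] && continue
    echo "on $server: mpdcheck -s   # prints a hostname and port"
    echo "on $client: mpdcheck -c $server <port>"
  done
done
```

With 6 nodes this enumerates 30 ordered pairs, which is exactly the coverage needed to catch a one-directional connectivity or name-resolution failure between any two hosts.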
 
Rajeev
 



From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mr. Teo En Ming
(Zhang Enming)
Sent: Thursday, October 29, 2009 2:35 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] (mpiexec 392): no msg recvd from mpd when expecting ack of request


Hi,

I have just installed MPICH2 in my Xen-based virtual machines.

My hardware configuration is as follows:

Processor: Intel Pentium Dual Core E6300 @ 2.8 GHz
Motherboard: Intel Desktop Board DQ45CB BIOS 0093
Memory: 4X 2GB Kingston DDR2-800 CL5

My software configuration is as follows:

Xen Hypervisor / Virtual Machine Monitor Version: 3.5-unstable
Jeremy Fitzhardinge's pv-ops dom0 kernel: 2.6.31.4
Host Operating System: Fedora Linux 11 x86-64 (SELinux disabled)
Guest Operating Systems: Fedora Linux 11 x86-64 paravirtualized (PV)
domU guests (SELinux disabled)

I have successfully configured, built and installed MPICH2 on an F11 PV
guest OS, master compute node 1, which runs an NFS server (the MPICH2
bin subdirectory is exported). The remaining 5 compute nodes access the
MPICH2 binaries by mounting the NFS share from node 1. Please see the
attached c.txt, m.txt and mi.txt. With Xen virtualization, I have
created 6 F11 Linux PV guests to simulate 6 HPC compute nodes. The
network adapter (NIC) in each guest OS is virtual, and the Xen
networking type is bridged. Running "lspci -v" and "lsusb" in each guest
OS shows no devices.

According to the Appendix A troubleshooting section of the MPICH2
installation guide, I have verified that the 2-node test scenario with
"mpdcheck -s" and "mpdcheck -c" works: each of the 2 nodes can
communicate with the other in both server and client roles without
problems. I have also tested mpdboot in the 2-node scenario, and the
ring of mpds works.
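For readers following along, the two-node procedure from Appendix A looks roughly like the transcript below (hostnames, port number, and the mpd.hosts filename are illustrative, not from the original message):

```shell
# On node0002 (server side):
#   $ mpdcheck -s
#   server listening at INADDR_ANY on: node0002 34567
#
# On node0001 (client side), using the host and port printed above:
#   $ mpdcheck -c node0002 34567
#
# A clean run ends with the server and client each reporting that they
# successfully received the other's message. Then boot a two-host ring
# (mpd.hosts lists the participating hostnames, one per line) and
# verify it:
#   $ mpdboot -n 2 -f mpd.hosts
#   $ mpdtrace -l
```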

After the troubleshooting process, I successfully created a ring of mpds
spanning all 6 compute nodes, and "mpdtrace -l" lists all 6 of them.
However, when I try to run a job with mpiexec, it gives me the
following error:

[enming at enming-f11-pv-hpc-node0001 ~]$ mpiexec -n 2 examples/cpi
mpiexec_enming-f11-pv-hpc-node0001 (mpiexec 392): no msg recvd from mpd
when expecting ack of request

I have also tried starting the mpd ring as the root user, but I still
encounter the same error.
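When a ring boots and mpdtrace looks healthy but mpiexec still times out like this, one way to localize the failing hop is to reboot the ring verbosely and exercise it directly with mpdringtest before launching a job. A hedged sketch (the mpd.hosts filename is an assumption):

```shell
# Tear down any stale ring, then reboot it with verbose output:
#   $ mpdallexit
#   $ mpdboot -n 6 -f mpd.hosts --verbose
#
# Pass a message around the ring the given number of times; this tests
# mpd-to-mpd communication without involving an application:
#   $ mpdringtest 100
#
# If mpdringtest hangs or errors, the fault lies between two specific
# mpds, which the pairwise mpdcheck procedure can then pin down.
```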

Thank you.

PS. config.log is also attached.

-- 
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical
Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe at gmail.com
MSN: teoenming at hotmail.com
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore


