[mpich-discuss] mpich2-1.2.1p1

Rajeev Thakur thakur at mcs.anl.gov
Sun Mar 28 20:02:25 CDT 2010


Can you configure MPICH2 with --with-device=ch3:sock, run make clean,
make, make install, and see if it makes any difference?
 
Rajeev
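The rebuild suggested above can be wrapped in a small POSIX sh function. This is only a sketch. The function name rebuild_mpich2_sock and its two arguments (source tree and install prefix) are hypothetical. --prefix is the standard autoconf option, and the sketch assumes it is set to the prefix of the original nemesis build so the new install replaces it.

```shell
# Rebuild MPICH2 with the ch3:sock channel instead of ch3:nemesis,
# following the configure / make clean / make / make install sequence.
# The body runs in a subshell, so the cd does not leak into the caller.
rebuild_mpich2_sock() (
    if [ "$#" -ne 2 ]; then
        echo "usage: rebuild_mpich2_sock SRCDIR PREFIX" >&2
        exit 2
    fi
    srcdir=$1   # top of the mpich2-1.2.1p1 source tree (hypothetical path)
    prefix=$2   # assumed: same prefix as the original nemesis build
    cd "$srcdir" || exit 1
    # Stop at the first step that fails; the function returns its status.
    ./configure --prefix="$prefix" --with-device=ch3:sock &&
        make clean &&
        make &&
        make install
)
```

A possible invocation is rebuild_mpich2_sock ~/mpich2-1.2.1p1 /usr/local, with both paths assumed. The new installation presumably has to be present on both hp20 and hp14, and programs such as cpi recompiled with the new mpicc. After that, restart the ring with mpdallexit and mpdboot -n 2 -f /etc/mpd.hosts, then rerun mpiexec -n 2 ./cpi.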


  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Costa, Michael
Sent: Friday, March 26, 2010 5:42 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] mpich2-1.2.1p1


I have been struggling with communication errors whenever I run
mpiexec. This installation is on a PA-RISC-based cluster running
mpich2-1.2.1p1, which I configured with --with-device=ch3:nemesis.
 
Currently only 2 nodes are in the ring, hp20 and hp14 for testing/setup
purposes.
 
The following steps may shed some light on the problem, which I'm sure
is something I omitted or failed to do during the initial
installation/configuration. Non-MPI programs appear to run OK, but MPI
programs such as cpi or hello fail.
 
 
hp20:~$ mpdallexit


mikec at hp20:~$ mpdboot -v -n 2 -f /etc/mpd.hosts
running mpdallexit on hp20
LAUNCHED mpd on hp20  via
RUNNING: mpd on hp20
LAUNCHED mpd on hp14  via  hp20
RUNNING: mpd on hp14

mikec at hp20:~$ mpdtrace
hp20
hp14

mikec at hp20:~$ mpdtrace -l
hp20_44192 (172.17.81.20)
hp14_51832 (172.17.81.14)

mikec at hp20:~$ mpdringtest 10
time for 10 loops = 0.0491678714752 seconds

mikec at hp20:~$ mpiexec -n 2 uname -a
Linux hp20 2.6.32-trunk-parisc #1 Mon Jan 11 03:07:31 UTC 2010 parisc GNU/Linux
Linux hp14 2.6.32-trunk-parisc #1 Mon Jan 11 03:07:31 UTC 2010 parisc GNU/Linux
 
mikec at hp20:~$ mpiexec -n 1 ./cpi
Process 0 of 1 is on hp20
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.003888


mikec at hp20:~$ mpiexec -n 2 ./cpi
Process 0 of 2 is on hp20
Process 1 of 2 is on hp14
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1302)..................: MPI_Bcast(buf=0xc016e33c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(1031)..................:
MPIR_Bcast_binomial(157)..........:
MPIC_Recv(83).....................:
MPIC_Wait(513)....................:
MPIDI_CH3I_Progress(150)..........:
MPID_nem_mpich2_blocking_recv(948):
MPID_nem_tcp_connpoll(1720).......:
state_listening_handler(1787).....: accept of socket fd failed - Resource temporarily unavailable
rank 1 in job 2  hp20_44192   caused collective abort of all ranks
  exit status of rank 1: return code 1
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1302)..................: MPI_Bcast(buf=0xc067f33c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(1031)..................:
MPIR_Bcast_binomial(187)..........:
MPIC_Send(41).....................:
MPIC_Wait(513)....................:
MPIDI_CH3I_Progress(150)..........:
MPID_nem_mpich2_blocking_recv(948):
MPID_nem_tcp_connpoll(1709).......: Communication error
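Both error stacks end in the nemesis TCP code (MPID_nem_tcp_connpoll) inside MPI_Bcast. This is consistent with the single-process run succeeding, since that run never needs a connection between hp20 and hp14. "Resource temporarily unavailable" is the glibc message for EAGAIN, reported here for accept() on the listening socket. In stacks like these, only the frames that carry text are informative. The following is a minimal POSIX sh/awk sketch that prints just those frames. The function name mpich_stack_messages is hypothetical, and the sketch assumes one frame per line, with any wrapped lines already rejoined.

```shell
# Print only the frames of an MPICH2 error stack that carry a message,
# as "function: message". Frames with nothing after the colon are skipped.
# Assumption: one frame per line, in the "name(line)....: text" format.
mpich_stack_messages() {
    awk '
        # A frame is an identifier, "(digits)", optional dots, then a colon.
        /^[A-Za-z_][A-Za-z0-9_]*\([0-9]+\)\.*:/ {
            frame = $0
            sub(/\(.*$/, "", frame)          # keep only the function name
            msg = $0
            sub(/^[^:]*:[ ]*/, "", msg)      # drop text up to the first colon
            if (msg != "") print frame ": " msg
        }
    '
}
```

Piping the first stack through mpich_stack_messages would leave only the PMPI_Bcast frame and the state_listening_handler frame. That stack goes through MPIC_Recv and is presumably rank 1, the receiver in the binomial broadcast. The other stack reduces to a generic "Communication error" from MPID_nem_tcp_connpoll.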
 
 
Any comments and/or suggestions are greatly appreciated.
 
 
Mike C.
 
 
 



Michael A. Costa
SET (RCC), CCAI-CCNA/CCNP (Cisco), MInfTech (Griffith)
Professor - Information Technology Division
Fanshawe College
G3001
1001 Fanshawe College Boulevard
P.O. Box 7005
London, ON N5Y 5R6
Tel: (519) 452-4291  Fax: (519) 452-1801
 




