Troubles with mpich installation

William Gropp gropp at mcs.anl.gov
Wed May 18 08:49:41 CDT 2005


At 06:04 AM 5/18/2005, paolo.zini at ipcf.cnr.it wrote:
>Hi all.
>
>I am having trouble with the MPICH installation on an Opteron cluster.

I suggest that you try MPICH2 (www.mcs.anl.gov/mpi/mpich2) instead of
MPICH1.  The ch_p4 device in MPICH1 uses eager sending for relatively
large messages, and this can sometimes cause problems when the messages
being sent eagerly are larger than the socket buffer.  MPICH2 handles
this case properly, and we're unlikely to change MPICH1 to fix this.

Bill




>The configuration is:
>
>HW
>
>11 x Tyan 2882 motherboards, each with 2 Opteron 2.4 GHz processors and 4 GB RAM,
>
>2 x 160 GB SATA disks in hardware RAID 1 (using the onboard controller)
>
>1 D-Link gigabit switch.
>
>SW
>
>SuSE 9.0
>
>Portland Group compilers (C and Fortran 77/90)
>
>MPICH 1.2.6, compiled with the Portland suite, both with and without the
>patches available on the MPICH home page.
>
>If I run the perftest programs, the buflimit test stops at 64K; mpptest runs
>correctly on short messages but hangs silently, without errors, on long
>messages.
>
>The application runs on two processors on a single node using the ch_shmem
>device, but if I try to run it on multiple nodes using the ch_p4 device, one
>of the processes dies silently after anywhere from a few minutes to several
>hours.
>
>Sometimes the OS itself hangs.
>
>Any suggestions?
>
>Paolo Zini
>IPCF institute of CNR
>Pisa
>Italy
>tel +39 050 3152964
>Paolo.Zini at ipcf.cnr.it

William Gropp
http://www.mcs.anl.gov/~gropp 




More information about the mpich-discuss mailing list