Troubles with mpich installation
William Gropp
gropp at mcs.anl.gov
Wed May 18 08:49:41 CDT 2005
At 06:04 AM 5/18/2005, paolo.zini at ipcf.cnr.it wrote:
>Hi all.
>
>
>
>I have troubles with the mpich installation on one opteron cluster.
I suggest that you try MPICH2 (www.mcs.anl.gov/mpi/mpich2) instead of
MPICH1. The ch_p4 device in MPICH1 sends messages eagerly up to relatively
large sizes, and this can sometimes run into problems when an eagerly
sent message is larger than the socket buffer. MPICH2 handles this case
properly, and we're unlikely to change MPICH1 to fix this.
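
The socket-buffer limit mentioned above can be illustrated with a small
plain-socket sketch. This is not MPICH code and does not reproduce ch_p4's
behavior; it only shows, assuming a local socket pair with a 64 KB send
buffer request (the helper name and sizes are made up for illustration),
that the kernel accepts only part of a message larger than the buffer when
the peer is not reading. The exact number of bytes accepted depends on the
operating system.

```python
import socket


def bytes_accepted_without_reader(message_size, sndbuf=64 * 1024):
    """Return how many bytes a non-blocking send can queue when the
    peer never reads. Hypothetical helper, for illustration only."""
    sender, receiver = socket.socketpair()
    try:
        # Request a small send buffer; the kernel may round or double it.
        sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        sender.setblocking(False)
        payload = memoryview(b"x" * message_size)
        sent = 0
        while sent < message_size:
            try:
                sent += sender.send(payload[sent:])
            except BlockingIOError:
                # Buffers are full; a blocking socket would stall here
                # until the receiver drained some data.
                break
        return sent
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    small = bytes_accepted_without_reader(1024)
    large = bytes_accepted_without_reader(16 * 1024 * 1024)
    print("1 KB message: queued", small, "bytes")
    print("16 MB message: queued", large, "bytes before the buffer filled")
```

If both sides of an exchange push a large message this way before either
one reads, neither send can complete, which is one way a run can hang
silently on long messages while short ones go through.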
Bill
>The configuration is:
>
>HW
>
>11 x Tyan 2882 motherboards, each with 2 Opteron 2.4 GHz processors, 4 GB RAM,
>
>and 2 x 160 GB SATA disks arranged in hardware RAID 1 (using the
>onboard controller)
>
>1 D-Link gigabit switch.
>
>SW
>
>Suse 9.0
>
>Portland compilers (cc and f77-f90)
>
>Mpich-1.2.6, compiled with Portland suite. Both with and without the patches
>available on the mpich home page.
>
>
>
>If I run the perftest programs, the buflimit test stops at 64K; mpptest
>runs correctly on short messages, but hangs silently, without errors, on long
>messages.
>
>
>
>The application program runs on two processors on a single node, using the
>ch_shmem device, but if I try to run it on a multiple-node configuration,
>using the ch_p4 device, one of the processes dies silently, after a time
>varying from a few minutes to several hours.
>
>Sometimes the OS itself hangs.
>
>
>
>Any suggestions?
>
>
>
>Paolo Zini
>
>
>Paolo Zini
>IPCF institute of CNR
>Pisa
>Italy
>tel +39 050 3152964
>Paolo.Zini at ipcf.cnr.it
William Gropp
http://www.mcs.anl.gov/~gropp