[mpich-discuss] mpich2-1.2.1, only starts 5 mpd's and cpi won't run, compiler flags issue?

David Mathog mathog at caltech.edu
Mon Feb 8 13:05:11 CST 2010


Built and installed mpich2-1.2.1 on our master node, which placed it in
/opt/mpich2_121, then copied that directory to all compute nodes.  For
one account I set up .mpd.conf with a secretword, put
/opt/mpich2_121/bin in that account's PATH via .bashrc, and ran:

mpdboot -f /usr/common/etc/machines.LINUX_INTEL_Safserver \
  -r rsh --ifhn=192.168.1.220 &

There are 21 machines listed in that file, but mpdboot only started mpd
on the first 5.  There were no warnings; it just did 5 and stopped, and
/var/log/messages on the missing nodes doesn't show any attempt to rsh
in.  The rump cluster works to the extent that

  mpiexec -n 5 mpdtrace -l
  mpiexec -n 5 /bin/uname -n

give the expected results.  However this doesn't work:

  mpiexec -n 5 /opt/mpich2_121/examples/cpi

and this is why, I think:

  rsh monkey01 '/opt/mpich2_121/examples/cpi'
  Illegal instruction
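
An "Illegal instruction" on one host but not another usually means the
binary contains an instruction the second CPU does not implement.  A
minimal sketch for narrowing that down is to compare the "flags" lines
of /proc/cpuinfo on the two machines.  The helper name missing_flags is
hypothetical, and the commented usage assumes rsh access to monkey01 as
in the command above.

```shell
# Print every CPU feature flag that appears in the first list but not
# in the second.  Both arguments are whitespace-separated flag lists,
# such as the text after "flags :" in /proc/cpuinfo.
missing_flags() {
    # Normalise the second list to " a b c " so whole words can be matched.
    have=" $(printf '%s ' $2)"
    for f in $1; do
        case "$have" in
            *" $f "*) ;;                 # flag is present on both machines
            *) printf '%s\n' "$f" ;;     # flag is missing from the second list
        esac
    done
}

# Usage sketch (not run here; assumes rsh to the node works):
#   master=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   node=$(rsh monkey01 "grep -m1 '^flags' /proc/cpuinfo" | cut -d: -f2)
#   missing_flags "$master" "$node"
```

Any flag printed (sse2, for example) names a feature that code built on
the master could use but that the compute node lacks.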

The main machine is a dual Opteron, and the slaves are all Athlon MPs,
so it is possible to generate code that would run on the former and not
the latter.  However, both run the same 32-bit Linux and have the same
versions of all packages installed.  Does MPICH2 by default somehow
pick compiler flags that produce code which might only run on the build
machine (none were specified in ./configure)?  For instance, to rebuild
cpi, make does only this:

 ../bin/mpicc   -o cpi cpi.o  -lm 

and what comes out runs on the Opteron but not the Athlon.  Also,
putting -x on the first line of the mpicc script shows that the full
compile command was:

gcc -o cpi cpi.o -lm -I/usr/common/src/mpich2-1.2.1/src/include
-L/usr/common/src/mpich2-1.2.1/lib
-L/usr/common/src/mpich2-1.2.1/src/openpa/src
-Wl,-rpath,/usr/common/src/mpich2-1.2.1/lib -lmpich -lopa -lpthread -lrt

which doesn't contain any architecture-specific switches itself, but
it does link in libmpich and libopa, so if those libraries were compiled
with such switches that would explain the problem.  I have the log files
for the build, but they just show "CC" and not the actual compiler
commands that were used.
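
Since the build logs hide the flags, one rough alternative is to look at
the compiled libraries directly.  The sketch below counts disassembly
lines containing a few instructions that were introduced with SSE2
(mfence and lfence among them), which is only a guess at what the
Athlon MP might be rejecting.  The helper name count_sse2, the chosen
mnemonic list, and the library path in the usage comment are all
assumptions; the filter expects the default AT&T-syntax output of
"objdump -d" on standard input.

```shell
# Count disassembly lines whose mnemonic is one of a small set of
# SSE2-era instructions.  Reads "objdump -d" output from stdin.
# The list is illustrative, not complete.
count_sse2() {
    # grep -c prints the count but exits 1 when it is zero; "|| true"
    # keeps the function's exit status at 0 in that case.
    grep -cE '[[:space:]](mfence|lfence|addsd|subsd|mulsd|divsd|sqrtsd|cvtsi2sd|cvttsd2si|movapd|movupd|movdqa|movdqu)[a-z]*([[:space:]]|$)' || true
}

# Usage sketch (paths are assumptions, not run here):
#   objdump -d /opt/mpich2_121/lib/libmpich.a | count_sse2
#   objdump -d /opt/mpich2_121/examples/cpi   | count_sse2
```

A nonzero count does not prove that one of those instructions is the one
being executed, but it would show that the library is not limited to
instructions the compute nodes support.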

Other than building this again on one of the client nodes (and so
obtaining compiler flags that are upward compatible), how would one
control this?

Thank you,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

