[MPICH] stdin buffer overflow problem?
Benjamin Svetitsky
bqs at julian.tau.ac.il
Sun Feb 24 09:44:12 CST 2008
Dear developers,
I've been running MPICH2 under RHEL on a cluster of Intel Core 2 Quad
processors. The installation went smoothly and it was up and running in
a few minutes. We've done a good deal of production in the past month,
and I want to thank the developers.
Now the problem. I run an MPI job via a command like
mpiexec -np 4 -host nodeA ../su3_hmc : -np 4 -host nodeB ../su3_hmc <
inputfile > outputfile &
If the inputfile is too large, this leads to unpredictable behavior
ending in a crash. Typical results are:
1. No processes are ever initiated: mpdlistjobs gives the expected
report of jobs running on the two hosts, but there are no actual
processes reported by ps; an output file is created but it is zero bytes
long. The only solution is mpdkilljob.
2. Processes are created on one host but not the other; likewise there
are no results.
3. All the processes are created, and they run for a few minutes and
then hang up.
4. One or more of the hosts crashes and reboots.
There is no problem if inputfile is short; I run into trouble if the
file is longer than a few KB, and certainly by 67 KB. I interpret this as a
buffer overflow -- what, exactly, does mpiexec do with its standard input?
This looks like a serious security problem. I am running mpd as root,
with MPD_USE_ROOT_MPD=1. So this, I think, is how a buffer overflow can
crash the entire node.
A successful workaround is to get the program to read from a file other
than stdin. I give the input file as an argument to the program, and
redirect stdin to /dev/null. So the following runs successfully:
mpiexec -np 4 -host nodeA ../su3_hmc inputfile : -np 4 -host nodeB
../su3_hmc inputfile < /dev/null > outputfile &
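A minimal sketch of the program-side change this workaround needs, assuming the program is written in C. The helper name open_input is hypothetical and is not taken from su3_hmc or MPICH; it simply opens the file named in the first argument and falls back to stdin when no argument (or "-") is given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose where the program reads its input from.
 * If a path is given as the first command-line argument, open that file;
 * with no argument, or with "-", fall back to standard input.
 * Returns NULL (after printing a message) if the file cannot be opened. */
FILE *open_input(int argc, char **argv)
{
    if (argc > 1 && strcmp(argv[1], "-") != 0) {
        FILE *fp = fopen(argv[1], "r");
        if (fp == NULL) {
            perror(argv[1]);
            return NULL;
        }
        return fp;
    }
    return stdin;
}
```

In main, one would call open_input(argc, argv) once and pass the returned stream to whatever routine parses the parameters, so that the same binary still accepts redirected stdin for short inputs.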
I recall seeing the same problem a few years ago with the old MPICH on
an SGI Origin system. So it's not a new bug.
Best regards,
B. Svetitsky
--
Prof. Benjamin Svetitsky Phone: +972-3-640 8870
School of Physics and Astronomy Fax: +972-3-640 7932
Tel Aviv University E-mail: bqs at julian.tau.ac.il
69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs