[MPICH] stdin buffer overflow problem?
Rajeev Thakur
thakur at mcs.anl.gov
Mon Feb 25 18:35:11 CST 2008
Yes, the MPD process manager doesn't handle large input files via stdin very
well. In such cases, you will need to read from a file as you have.
Rajeev
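
A minimal sketch of the read-from-a-file approach, written in Python for brevity; the same pattern applies to a C or Fortran MPI program. The helper name `read_input` is hypothetical and is not part of MPICH or of the su3_hmc program. It assumes the input file name is passed as the first command-line argument and that the file is readable on every node.

```python
import sys


def read_input(argv):
    """Return the program input as a string.

    Hypothetical helper: prefers the file named in argv[1], so the
    job does not depend on mpiexec forwarding stdin. Falls back to
    stdin only when no file name is given.
    """
    if len(argv) > 1:
        # Read directly from the file system, bypassing the
        # process manager's stdin forwarding entirely.
        with open(argv[1], "r") as handle:
            return handle.read()
    return sys.stdin.read()


# Typical use inside the program:
#     text = read_input(sys.argv)
```

With the input handled this way, stdin can be redirected from /dev/null on the mpiexec command line, as in the workaround quoted below.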
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of
> Benjamin Svetitsky
> Sent: Sunday, February 24, 2008 9:44 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] stdin buffer overflow problem?
>
> Dear developers,
>
> I've been running MPICH2 under RHEL on a cluster of Intel
> Core 2 Quad processors. The installation went smoothly and
> it was up and running in a few minutes. We've done a good
> deal of production in the past month, and I want to thank the
> developers.
>
> Now the problem. I run an MPI job via a command like
>
> mpiexec -np 4 -host nodeA ../su3_hmc : -np 4 -host nodeB ../su3_hmc < inputfile > outputfile &
>
> If the inputfile is too large, this leads to unpredictable
> behavior ending in a crash. Typical results are:
>
> 1. No processes are ever initiated: mpdlistjobs gives the
> expected report of jobs running on the two hosts, but there
> are no actual processes reported by ps; an output file is
> created but it is zero bytes long. The only solution is mpdkilljob.
>
> 2. Processes are created on one host but not the other;
> likewise there are no results.
>
> 3. All the processes are created, and they run for a few
> minutes and then hang.
>
> 4. One or more of the hosts crashes and reboots.
>
> There is no problem if inputfile is short; I run into trouble
> if the file is longer than a few K, certainly by 67K. I
> interpret this as a buffer overflow -- what, exactly, does
> mpiexec do with its standard input?
>
> This looks like a serious security problem. I am running mpd
> as root, with MPD_USE_ROOT_MPD=1. So this, I think, is how a
> buffer overflow can crash the entire node.
>
> A successful workaround is to get the program to read from a
> file other than stdin. I give the input file as an argument
> to the program, and redirect stdin to /dev/null. So the
> following runs successfully:
>
> mpiexec -np 4 -host nodeA ../su3_hmc inputfile : -np 4 -host nodeB ../su3_hmc inputfile < /dev/null > outputfile &
>
> I recall seeing the same problem a few years ago with the old
> MPICH on an SGI Origin system. So it's not a new bug.
>
> Best regards,
> B. Svetitsky
>
> --
> Prof. Benjamin Svetitsky          Phone:  +972-3-640 8870
> School of Physics and Astronomy   Fax:    +972-3-640 7932
> Tel Aviv University               E-mail: bqs at julian.tau.ac.il
> 69978 Tel Aviv, Israel            WWW:    http://julian.tau.ac.il/~bqs
>
>