[mpich-discuss] Rmpi technical problem/question with MPICH2

Cye Stoner stonerc at gmail.com
Sun Sep 27 02:06:30 CDT 2009


I'm sorry if this is the wrong channel of communication for these types of
problems. If that is the case, I would appreciate knowing where to go.

I am aware that Rmpi was mostly developed under LAM-MPI, but I am attempting
to deploy it under MPICH2.
MPICH2 has been set up using the "./configure --with-device=ch3:sock"
command in order to avoid a bug I was encountering with some of the nodes.
Everything else under MPICH2 now works, and I can compile and run the
examples without problem. MPICH2 is deployed across the cluster under the
/mirror/mpich2 directory. If it's relevant, the nodes also have the home
directory for the mpiu user mirrored. However, I am running into problems
with Rmpi.

To install Rmpi, I used my generic mpiu account, and executed the following
commands:
> install.packages("Rmpi", configure.args="--with-mpi=/mirror/mpich2")

This installation completes without error, and I am able to load the Rmpi
library with the "> library(Rmpi)" command from the R prompt.

This is where my problems occur, and where I could use your advice.

If I start the mpd daemon with 1 node using the following command:
$ mpdboot -n 1 -v
then I can successfully use the
> mpi.spawn.Rslaves()
command to start the Rslaves, with the following output:
    1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: hal
slave1 (rank 1, comm 1) of size 2 is running on: hal
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 2"
> mpmpi.close.Rslaves()
mpi.close.Rslaves()
[1] 1
> mpi.quit()
> Error: unexpected '>' in ">"
> mpi.quit()
mpi.quit()
mpi.quit()

There seems to be some error (possibly permissions?) and after getting back
to the $ prompt, I get a lot of errors in the following form:
mpiexec_hal (handle_stdin_input 1089): stdin problem; if pgm is run in
background, redirect from /dev/null
mpiexec_hal (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out <
/dev/null


After doing this, I can
However, if I start the mpd daemon with 2 (or more) nodes and then run the
following commands from the R prompt:
> library("Rmpi")
> mpi.spawn.Rslaves()
I immediately get the following error:

Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package =
"Rmpi"),  :
  Other MPI error, error stack:
MPI_Comm_spawn(144)...........:
MPI_Comm_spawn(cmd="/home/mpiu/R/i486-pc-linux-gnu-library/2.6/Rmpi/Rslaves.sh",
argv=0x8b8ce20, maxprocs=1, MPI_INFO_NULL, root=0, MPI_COMM_SELF,
intercomm=0x88cd0e0, errors=0x80ff870) failed
MPIDI_Comm_spawn_multiple(233): PMI_Spawn_multiple failed

For this particular error, the output of "mpdtrace -l" is:
hal_43272 (192.168.100.1)
n01_55355 (192.168.100.101)

Where hal is the name of the master node with mpd listening on port 43272,
and n01 is the slave node listening on port 55355.
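
For reference, the host, port, and address fields in that output can be
pulled apart with a short shell sketch. The function name parse_mpdtrace is
hypothetical, and the sample input is just the two lines shown above; on a
live ring one could presumably pipe "mpdtrace -l" into it instead.

```shell
# Split each "host_port (ip)" line of `mpdtrace -l` output into fields.
# parse_mpdtrace is a hypothetical helper name, not part of MPICH2.
# The greedy match splits at the last underscore before the port digits.
parse_mpdtrace() {
    sed -n 's/^\(.*\)_\([0-9][0-9]*\) (\(.*\))$/host=\1 port=\2 ip=\3/p'
}

# Sample input copied from the mpdtrace output quoted in this message.
printf '%s\n' 'hal_43272 (192.168.100.1)' 'n01_55355 (192.168.100.101)' |
    parse_mpdtrace
```

Lines that do not match the "host_port (ip)" shape are silently skipped, so
an empty result would suggest the ring is not up or the format differs.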

I have tried several different versions of Rmpi (0.5-7 and 0.5-8), but get
the same error regardless.

This error seems to originate in the mpi_comm_spawn(...) call in the
./src/Rmpi.c file of the Rmpi package.

I am completely baffled by this, and any help (or a good mailing list from
which to ask for help) would be very much appreciated.

Thank you for your time,
Cye Stoner


-- 
"If you already know what recursion is, just remember the answer. Otherwise,
find someone who is standing closer to
Douglas Hofstadter than you are; then ask him or her what recursion is." -
Andrew Plotkin
