[mpich-discuss] Rmpi technical problem/question with MPICH2
Cye Stoner
stonerc at gmail.com
Sun Sep 27 02:06:30 CDT 2009
I'm sorry if this is the wrong channel of communication for these types of
problems. If that is the case, I would appreciate knowing where to go.
I am aware that Rmpi was mostly developed under LAM-MPI, but I am attempting
to deploy it under MPICH2.
MPICH2 has been set up using the "./configure --with-device=ch3:sock"
command in order to avoid a bug I was encountering with some of the nodes.
Everything else under MPICH2 now works, and I can compile and run the
examples without problem. MPICH2 is deployed across the cluster under the
/mirror/mpich2 directory. If it's relevant, the home directory for the mpiu
user is also mirrored across the nodes.
I am running into problems with Rmpi.
To install Rmpi, I used my generic mpiu account, and executed the following
command:
> install.packages("Rmpi", configure.args="--with-mpi=/mirror/mpich2")
This installation completes without error, and I am able to load the Rmpi
library with the "> library(Rmpi)" command from the R prompt.
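(In case the exact build flags matter: I believe Rmpi's configure script
also accepts more explicit arguments. A sketch of the fuller invocation I
could try -- the include/lib paths here are my assumption, based on the
/mirror/mpich2 prefix above:)
> install.packages("Rmpi",
+     configure.args = c("--with-Rmpi-type=MPICH2",
+         "--with-Rmpi-include=/mirror/mpich2/include",
+         "--with-Rmpi-libpath=/mirror/mpich2/lib"))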
This is where my problems occur, and where I could use your advice.
If I start the mpd daemon with 1 node using the following command:
$ mpdboot -n 1 -v
then I can successfully use the
> mpi.spawn.Rslaves()
command to start the Rslaves, with the following output:
1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: hal
slave1 (rank 1, comm 1) of size 2 is running on: hal
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 2"
> mpi.close.Rslaves()
[1] 1
> mpi.quit()
There seems to be some error here (permissions, possibly?), and after
getting back to the $ prompt I get a lot of errors of the following form:
mpiexec_hal (handle_stdin_input 1089): stdin problem; if pgm is run in
background, redirect from /dev/null
mpiexec_hal (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out <
/dev/null
After doing this, I can still repeat the single-node run without further
trouble.
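(Following the hint in that message, here is a minimal batch-mode master
script I could run instead, assuming it is launched with stdin redirected
as the error text suggests, e.g. "mpiexec -n 1 R --no-save -f master.R <
/dev/null":)
# master.R - non-interactive master, so nothing ever reads from stdin
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 1)  # one slave, as in the session above
print(mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size())))
mpi.close.Rslaves()             # shut the slaves down cleanly
mpi.quit()                      # finalize MPI and exit R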
However, if I start the mpd daemon with 2 (or more) nodes and then run the
following commands from the R prompt:
> library("Rmpi")
> mpi.spawn.Rslaves()
I immediately get the following error:
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package =
"Rmpi"), :
Other MPI error, error stack:
MPI_Comm_spawn(144)...........:
MPI_Comm_spawn(cmd="/home/mpiu/R/i486-pc-linux-gnu-library/2.6/Rmpi/Rslaves.sh",
argv=0x8b8ce20, maxprocs=1, MPI_INFO_NULL, root=0, MPI_COMM_SELF,
intercomm=0x88cd0e0, errors=0x80ff870) failed
MPIDI_Comm_spawn_multiple(233): PMI_Spawn_multiple failed
For this particular error, the output of "mpdtrace -l" is:
hal_43272 (192.168.100.1)
n01_55355 (192.168.100.101)
Here, hal is the name of the master node, with mpd listening on port 43272,
and n01 is the slave node, listening on port 55355.
I have tried several different versions of Rmpi (0.5-7 and 0.5-8), but get
the same error regardless.
This error seems to be triggered inside the mpi_comm_spawn(...) call in the
./src/Rmpi.c file of the Rmpi package.
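(For what it's worth, a couple of quick checks I can try from the R prompt
on the two-node ring -- a sketch; both are standard Rmpi calls, and
nslaves = 1 just forces a single spawn rather than the default:)
> library(Rmpi)
> mpi.universe.size()             # how many slots the MPD ring reports
> mpi.spawn.Rslaves(nslaves = 1)  # try forcing a single slave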
I am completely baffled by this, and any help (or a good mailing list from
which to ask for help) would be very much appreciated.
Thank you for your time,
Cye Stoner
--
"If you already know what recursion is, just remember the answer.
Otherwise, find someone who is standing closer to Douglas Hofstadter than
you are; then ask him or her what recursion is." - Andrew Plotkin