[mpich-discuss] Rmpi technical problem/question with MPICH2

Rajeev Thakur thakur at mcs.anl.gov
Sun Sep 27 18:21:56 CDT 2009


You may want to test the MPICH2 installation independent of Rmpi first.
You can run make testing in the top level mpich2 directory, which will
run through the entire test suite (>500 tests). If that works, you may
need to contact the Rmpi developers about this specific problem.
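
As a rough sketch of that check, the shell helper below changes into the top-level source directory and runs the test target. The function name run_mpich2_tests and any path passed to it are hypothetical; only the "make testing" target comes from the message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): run the test suite from the
# top-level mpich2 source directory given as the first argument.
run_mpich2_tests() {
    src_dir="$1"
    if [ ! -d "$src_dir" ]; then
        echo "no such directory: $src_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$src_dir" && make testing )
}
```

It would be called with the location where the MPICH2 source was unpacked, which depends on the local setup.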
 
Rajeev
 


  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Cye Stoner
Sent: Sunday, September 27, 2009 2:07 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Rmpi technical problem/question with MPICH2


I'm sorry if this is the wrong channel of communication for these types
of problems. If that is the case, I would appreciate knowing where to
go.

I am aware that Rmpi was mostly developed under LAM-MPI, but I am
attempting to deploy it under MPICH2.
MPICH2 has been set up using the "./configure --with-device=ch3:sock"
command in order to avoid a bug I was encountering with some of the
nodes. Everything else under MPICH2 now works, and I can compile and run
the examples without problem. MPICH2 is deployed across the cluster
under the /mirror/mpich2 directory. If it's relevant, the nodes also have
the home directory for the mpiu user mirrored. However, I am running into
problems with Rmpi.

To install Rmpi, I used my generic mpiu account, and executed the
following commands:
> install.packages("Rmpi", configure.args="--with-mpi=/mirror/mpich2")

This installation completes without error, and I am able to load the
Rmpi library with the "> library(Rmpi)" command from the R prompt.

This is where my problems occur, and where I could use your advice.

If I start the mpd daemon with 1 node using the following command:
$ mpdboot -n 1 -v
then I can successfully use the
> mpi.spawn.Rslaves()
command to start the Rslaves, with the following output:
    1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: hal 
slave1 (rank 1, comm 1) of size 2 is running on: hal
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 2"
> mpmpi.close.Rslaves()
mpi.close.Rslaves()
[1] 1
> mpi.quit()
> Error: unexpected '>' in ">"
> mpi.quit()
mpi.quit() 
mpi.quit()

There seems to be some error (possibly permissions?) and after getting
back to the $ prompt, I get a lot of errors in the following form:
mpiexec_hal (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_hal (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null


However, if I start the mpd daemon with 2 (or more) nodes and then run
the following commands from the R prompt:
> library("Rmpi")
> mpi.spawn.Rslaves()
I immediately get the following error:

Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"),  : 
  Other MPI error, error stack:
MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="/home/mpiu/R/i486-pc-linux-gnu-library/2.6/Rmpi/Rslaves.sh", argv=0x8b8ce20, maxprocs=1, MPI_INFO_NULL, root=0, MPI_COMM_SELF, intercomm=0x88cd0e0, errors=0x80ff870) failed
MPIDI_Comm_spawn_multiple(233): PMI_Spawn_multiple failed

For this particular error, the output of "mpdtrace -l" is:
hal_43272 (192.168.100.1)
n01_55355 (192.168.100.101)

Where hal is the name of the master node with mpd listening on port
43272, and n01 is the slave node listening on port 55355.
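
To make that layout explicit, here is a small sketch that splits one "mpdtrace -l" line into host name, port, and IP address. It assumes every line follows the name_port (address) form shown above; the function name parse_mpdtrace_line is hypothetical and not part of MPICH2.

```shell
#!/bin/sh
# Hypothetical helper: split a line such as "hal_43272 (192.168.100.1)"
# into "host port address", using only POSIX parameter expansion.
parse_mpdtrace_line() {
    line="$1"
    name_port="${line%% *}"     # text before the first space, e.g. hal_43272
    host="${name_port%_*}"      # text before the last underscore, e.g. hal
    port="${name_port##*_}"     # text after the last underscore, e.g. 43272
    addr="${line#*\(}"          # drop everything up to the opening parenthesis
    addr="${addr%\)*}"          # drop the closing parenthesis
    printf '%s %s %s\n' "$host" "$port" "$addr"
}
```

Feeding it the two lines above one at a time should yield one "host port address" triple per node in the ring.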

I have tried several different versions of Rmpi (0.5-7 and 0.5-8), but
get the same error regardless.

This error seems to originate in the mpi_comm_spawn(...) call in the
./src/Rmpi.c file of the Rmpi package.

I am completely baffled by this, and any help (or a good mailing list
from which to ask for help) would be very much appreciated.

Thank you for your time,
Cye Stoner


-- 
"If you already know what recursion is, just remember the answer.
Otherwise, find someone who is standing closer to
Douglas Hofstadter than you are; then ask him or her what recursion is."
- Andrew Plotkin





