[mpich-discuss] ssh, mpiexec, and path to working dir
Thomas Ruedas
ruedas at dtm.ciw.edu
Fri Feb 5 22:40:51 CST 2010
Hi again,
after finally having recovered the use of the cluster after some
filesystem and hardware problems, I am trying to get my parallel jobs
going again. On the last occasion (immediately before those system
failures, and likely related to them) I could no longer use mpdboot,
but that seems to work now. The problem this time is that the nodes
do not seem to find the path to the working directory.
I have my executable and the files and directories the program should
use (and used to use correctly) in some sub-subdirectory of my HOME,
which I find to be mounted seemingly correctly on the head node and the
other nodes. The PATH variable is also apparently set correctly
everywhere. Nonetheless, when I go to that sub-subdirectory and start
the job:
mpiexec -machinefile machines -n 8 myprog < /dev/null >& scr.out &
It fails with this error:
problem with execution of myprog on compute-0-2.local: [Errno 2] No such file or directory
... (and so on, for all nodes)
indicating that it doesn't find the executable myprog. However, when I do
mpiexec -machinefile machines -n 8 ~/sub/subdir/myprog < /dev/null >& scr.out &
it starts, but then crashes because it can't find its input file in
~/sub/subdir/; the processes are apparently running in some other
working directory (presumably /, see below).
Another parallel test program that doesn't read input files can be
started correctly if I use the full path as in the previous example.
System commands work ok, e.g.
mpiexec -machinefile machines -n 8 hostname
gives correct results. Remarkably,
mpiexec -machinefile machines -n 8 ls
gives the contents of / rather than of the (NFS-mounted, or so it should
be) directory in which I invoke it on the head node.
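
A minimal sketch of how the working-directory question could be narrowed
down. It assumes that this mpiexec accepts the -wdir option that the MPI
standard suggests for mpiexec (worth checking against the local mpiexec
help output first). The helper name wdir_cmd is made up for illustration;
it only prints the command line, so nothing is launched until a printed
line is run by hand.

```shell
#!/bin/sh
# Hypothetical helper: print an mpiexec command line that pins the
# working directory of every process to the directory it is called from.
# Assumption: mpiexec supports -wdir; this is not confirmed for this cluster.
# Note: arguments containing spaces are not re-quoted in the printed line.
wdir_cmd() {
    # $1 = machinefile, $2 = number of processes, rest = program and arguments
    mf=$1
    np=$2
    shift 2
    printf 'mpiexec -machinefile %s -n %s -wdir %s %s\n' "$mf" "$np" "$PWD" "$*"
}

# First check which directory the processes really start in ...
wdir_cmd machines 8 pwd
# ... then try the program itself with a relative path.
wdir_cmd machines 8 ./myprog
```

If pwd, run this way, reports the launch directory on every node, the same
option could then be tried with myprog, so that both the executable and
its input file are looked up relative to ~/sub/subdir/.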
Something must be wrong with the way ssh works, but I don't know what.
Does anybody have an idea what the problem is and how I could try to fix it?
Thanks,
Thomas
--
-----------------------------------
Thomas Ruedas
Department of Terrestrial Magnetism
Carnegie Institution of Washington
http://www.dtm.ciw.edu/users/ruedas/