[mpich-discuss] MPICH2 1.1a2 - problems with more than 4 computers

Tina Tina gucigu at gmail.com
Tue Jan 13 12:00:48 CST 2009


Dear Community!

I am using the latest version of MPICH2 for Windows (the problem also occurs
on 1.0.8). I have 8 computers connected through a gigabit switch. I have
written a program that uses MPI for parallelization. When I run the program
on one or two computers, everything works OK (let's say most of the time).
When I run it on 4 computers, sometimes it works and sometimes it does not.
The error that I get is:
launch failed: CreateProcess(X:\mpi_program.exe) on 'computerX' failed,
error 3 - The system cannot find the path specified.

Most of the time I get this error for one computer in the machine list, but
it can also happen for 2 or more computers.
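
To see whether the failures cluster on particular machines, the mpiexec
output from many runs could be saved to a file and tallied per host. The
following is only a minimal sketch in Python: the pattern is modelled solely
on the error text quoted above, and the function name count_launch_failures
is made up for illustration, not part of MPICH2.

```python
import re
from collections import Counter

# Pattern modelled only on the error text quoted in this post; any other
# mpiexec message is ignored. \s+ tolerates a line wrap after "failed,".
LAUNCH_ERROR = re.compile(
    r"launch failed: CreateProcess\((?P<exe>[^)]*)\) on '(?P<host>[^']+)' failed,"
    r"\s+error (?P<code>\d+)"
)


def count_launch_failures(log_text):
    """Return a Counter mapping (host, error_code) to number of occurrences."""
    return Counter(
        (m.group("host"), int(m.group("code")))
        for m in LAUNCH_ERROR.finditer(log_text)
    )
```

If the counts are spread evenly over all hosts, the cause is more likely
timing or load on the share than a misconfigured machine.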

If I increase the number of computers above 4, I get this error almost every
time. With 6 or more it happens every time. It looks like the higher the
number, the worse it gets. I would really like to make this work. Has anybody
had a similar experience, and what was the solution?

It looks like the computer tries to start the program before the mapped
drive is operational. Is there any way to increase this delay? Or are there
any other settings that need to be set?
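
The kind of delay meant here can be sketched as a small polling helper. This
is a hypothetical workaround in Python, not an MPICH2 setting: it assumes a
wrapper script stored on each machine's local disk (not on X:) that waits for
the mapped path to appear before starting the real program. The name
wait_for_path and its parameters are made up for illustration.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5,
                  exists=os.path.exists, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until `path` exists or `timeout` seconds pass.

    Returns True if the path appeared in time, False otherwise.
    `exists`, `sleep` and `clock` are injectable so the helper can be
    tested without a real network drive.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A local wrapper could call wait_for_path(r"X:\mpi_program.exe") and start the
program only when it returns True. Whether the drive mapping is really the
cause is still an open question.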

There are some other errors that I occasionally get, but this is the most
important one (for now).

Systems:
Windows XP SP3 (on all computers)
Latest MPICH2 installed
Connection: gigabit NICs (local network) over a switch

Example of run command: "C:\Program Files\MPICH2\bin\mpiexec.exe" -map
X:\\computer1\MPI -wdir X:\ -n 4 -machinefile "C:\Program
Files\MPICH2\bin\machines.txt" -noprompt X:\mpi_program.exe

\\computer1\MPI is a shared folder on computer1 from which the command is
run

machines.txt consists of the following lines:
computer1 -ifhn 192.168.1.1
computer2 -ifhn 192.168.1.2
...
computer8 -ifhn 192.168.1.8

These are the NICs I would like MPI to use for communication. The order of
computers in machines.txt is irrelevant (the problem occurs with every
combination).
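
To rule out a typo in machines.txt, the entries can be checked with a short
script. The following is a minimal sketch in Python that handles only the
"host -ifhn address" form shown above; any other machine file syntax is
rejected. The function names parse_machines and duplicates are made up for
illustration.

```python
import ipaddress


def parse_machines(text):
    """Parse lines of the form 'host -ifhn address' into (host, address) pairs."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 3 or parts[1] != "-ifhn":
            raise ValueError(
                f"line {lineno}: expected 'host -ifhn address', got {line!r}"
            )
        # ip_address raises ValueError if the address is malformed.
        entries.append((parts[0], str(ipaddress.ip_address(parts[2]))))
    return entries


def duplicates(values):
    """Return the sorted list of values that occur more than once."""
    seen, dups = set(), set()
    for value in values:
        (dups if value in seen else seen).add(value)
    return sorted(dups)
```

Running duplicates over the host names and again over the addresses would
show whether two entries accidentally share a name or an IP.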

Regards