[MPICH] MPICH2 startup w/ PBS

Jeffrey B. Layton laytonjb at charter.net
Tue Apr 4 10:07:10 CDT 2006


No joy. It always screams about not having enough hosts:

totalnum=16  numhosts=8
there are not enough hosts on which to start all processes

I think this is because we have two processors per node (ppn=2),
so each host appears twice in PBS_NODEFILE. I've also tried
--totalnum=${NP} --ncpus=2, and that didn't work either (same
error message).
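Below is a minimal sketch of one way around the error. It assumes an MPD-based MPICH2 install and that EXE names the executable, as in the original mpiexec line. The idea is to give mpdboot the list of distinct hosts from $PBS_NODEFILE, because mpdboot -n counts mpd daemons (one per host), and to keep passing the full nodefile, with one line per CPU, to mpiexec. The helper name unique_hosts and the scratch file are illustrative and are not part of MPICH2 or PBS. The sketch also assumes that mpdboot counts the node running the job script (normally the first entry in $PBS_NODEFILE) as one of its -n hosts. Check that against your MPICH2 version.

```shell
#!/bin/sh
# Sketch of a PBS job-script fragment. Assumes the MPD-based MPICH2
# tools (mpdboot, mpiexec, mpdallexit) are on PATH on the compute nodes.
#
# With ppn=2, PBS writes each host into $PBS_NODEFILE once per CPU,
# so a 16-process job on 8 nodes has 16 lines but only 8 hosts.
# mpdboot -n counts mpd daemons (one per host), not MPI processes.

# unique_hosts NODEFILE OUTFILE  (hypothetical helper)
#   Writes each distinct hostname in NODEFILE to OUTFILE and
#   prints the number of distinct hosts.
unique_hosts() {
    sort -u "$1" > "$2"
    wc -l < "$2" | tr -d ' '
}

# Launch only when running under PBS, which sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ]; then
    MPD_HOSTS=$(mktemp)                         # scratch file for the distinct host list
    NHOSTS=$(unique_hosts "$PBS_NODEFILE" "$MPD_HOSTS")
    NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')   # one MPI process per CPU slot

    mpdboot -n "$NHOSTS" -f "$MPD_HOSTS"        # one mpd per distinct node
    mpiexec -machinefile "$PBS_NODEFILE" -n "$NP" ./"$EXE"
    mpdallexit                                  # shut down the mpd ring
    rm -f "$MPD_HOSTS"
fi
```

The host count comes from whatever nodes PBS actually allocated. As a result, there is no static mpd.hosts file to edit, and no hard-coded mpdboot count to change, when nodes are taken out of service.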

Thanks!

Jeff

>
> How about the following 3 lines in your script:
>
> mpdboot -n ${NP} -f ${PBS_NODEFILE}
> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
> mpdallexit
>
> Wei-keng
>
>
> On Tue, 4 Apr 2006, Jeffrey B. Layton wrote:
>
>> Good morning,
>>
>>  I hate to bother everyone early in the morning, but I'm
>> looking for some advice on MPICH2 startup. I've been starting
>> an mpd on each node in the cluster via,
>>
>> mpdboot -n 25 -f /home/jlayton/mpd.hosts
>>
>> where the file mpd.hosts contains a list of all possible hosts.
>> So I'm basically starting mpd on every node. Then I run the
>> code using mpiexec
>>
>> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
>>
>> and run mpdallexit after the code is finished to stop all of the
>> mpds. Notice that I'm using PBS for queuing/scheduling.
>>  This is something of a pain, because we lose nodes for
>> various projects or training so I'm constantly having to go into
>> the list of hosts and edit it. I also have to change the count on
>> the mpdboot command.
>>  Is there a better way to start up MPICH2 codes using PBS?
>>
>> Thanks!
>>
>> Jeff
>>
>