[mpich-discuss] Wrong distribution of parallel processes (mpd machinefile)

Mário Costa mario.silva.costa at gmail.com
Wed Aug 19 10:23:03 CDT 2009


Thanks a lot Scott,

It did the job, although this information seems redundant to me...

I'm using mpdboot to dynamically create a ring on a PBS system for each
new MPI job. This adds the complexity that I have to know how many
processors the node where I create the ring via mpdboot has, even though
that number is already in the machinefile...
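A minimal sketch of how a PBS job script could derive both values from one source, so the count is only "known" in one place. It assumes a Torque/PBS setup where `$PBS_NODEFILE` lists one hostname per allocated processor slot, and that the names in that file match the output of `hostname` on the node running mpdboot. The helper names `make_machinefile` and `local_ncpus` are hypothetical; only the mpdboot options already used in this thread (`--remcons`, `-n`, `-f`, `--ncpus`) are passed.

```shell
#!/bin/sh
# Hypothetical helper: turn a PBS node file (one hostname per processor slot)
# into mpd machinefile lines of the form "host:count", one per distinct host.
make_machinefile() {
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }'
}

# Hypothetical helper: print how many slots the given host has in the node file.
# Prints 0 if the host is absent (grep -c still prints the count in that case).
local_ncpus() {
    grep -c -x -F "$2" "$1"
}

# Only act inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    make_machinefile "$PBS_NODEFILE" > machinefile
    # Number of mpds to start = number of distinct hosts.
    nhosts=$(wc -l < machinefile | tr -d ' ')
    # Assumption: hostnames in PBS_NODEFILE match `hostname` on this node.
    ncpus=$(local_ncpus "$PBS_NODEFILE" "$(hostname)")
    mpdboot --remcons -n "$nhosts" -f machinefile --ncpus="$ncpus"
fi
```

If the node file uses fully qualified names while `hostname` returns a short name (or vice versa), `local_ncpus` will print 0, so the two should be normalized first.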

Regards,
Mário

On Wed, Aug 19, 2009 at 3:41 PM, Scott Atchley<atchley at myri.com> wrote:
> Mário,
>
> You need to pass this to mpdboot:
>
> --ncpus=4
>
> Even though you specify it in the machinefile, mpdboot needs it on the
> command line for the local node.
>
> Scott
>
> On Aug 19, 2009, at 10:34 AM, Mário Costa wrote:
>
>> Hello,
>>
>> I hope someone can help me with the following problem.
>>
>>
>> I'm using MPICH2 with the ch3:nemesis device and mpd.
>>
>> I use mpdboot to start the MPD ring for the MPI job with the
>> following command:
>>
>> mpdboot --remcons -n 2 -f machinefile
>>
>> The machinefile contains:
>>
>> node001:4
>> node002:4
>>
>> Then I start the MPI job with mpiexec:
>>
>> mpiexec -np 8 ./mpi_executable
>>
>> The problem is that node001 gets 3 MPI processes and node002 gets 5,
>> whereas they should be distributed 4 per node, as specified in the
>> machinefile.
>>
>> Does anyone have an idea what the problem might be?
>>
>> I'm using MPICH2 version 1.0.8. I had no such problems with version
>> 1.0.5, and I've also tested 1.1.1, which shows the same problem.
>>
>> Thanks in advance!
>>
>> Regards,
>> Mário
>>
>
>
