[mpich-discuss] problem with collective on sub-communicator
Miguel Oliveira
m.a.oliveira at coimbra.lip.pt
Thu Nov 10 22:24:30 CST 2011
Hi,
Here is why....
93:master_slave miguel$ mpicc -o master_slave master_slave.c
93:master_slave miguel$ mpiexec -n 10 ./master_slave
Average=5.478889e+00
93:master_slave miguel$ mpiexec -n 10 ./master_slave
Average=5.563333e+00
93:master_slave miguel$ mpiexec -n 10 ./master_slave
Average=5.384444e+00
93:master_slave miguel$ mpiexec -n 10 ./master_slave
Average=5.565556e+00
93:master_slave miguel$ mpiexec -n 10 ./master_slave
Oops...
This output was generated with the attached code, which differs from the one I sent previously only by the if statement that prints "Oops..." in the deadlock case.
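
In case the attachment gets scrubbed by the archive, here is a rough sketch of the kind of exchange I mean. It is not the attached master_slave.c: the tags, the number of draws per slave, and the random-number range are just placeholders, but it shows the shape of the exchange, i.e. slaves requesting numbers from the master over MPI_COMM_WORLD, a reduction of the partial sums over the slave sub-communicator, the first slave reporting the result, and the master printing "Oops..." if that report shows up while it still expects requests:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define REQ_TAG   1     /* slave -> master: "please send me a number"         */
#define NUM_TAG   2     /* master -> slave: one random number                 */
#define DONE_TAG  3     /* first slave -> master: reduced sum from all slaves */
#define DRAWS     100   /* draws per slave (placeholder)                      */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm slaves;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }   /* need at least one slave */

    /* Rank 0 is the master; all other ranks form the "slaves" sub-communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &slaves);

    if (rank == 0) {
        int pending = (size - 1) * DRAWS;   /* requests the master still expects */
        srand((unsigned) time(NULL));
        while (1) {
            MPI_Status st;
            double buf, x;
            MPI_Recv(&buf, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == DONE_TAG) {
                if (pending > 0)
                    printf("Oops...\n");    /* result arrived while requests were still expected */
                else
                    printf("Average=%e\n", buf / ((double)(size - 1) * DRAWS));
                break;
            }
            /* It was a request: answer with a random number in [1,10]. */
            x = 1.0 + rand() % 10;
            MPI_Send(&x, 1, MPI_DOUBLE, st.MPI_SOURCE, NUM_TAG, MPI_COMM_WORLD);
            pending--;
        }
    } else {
        int i, srank;
        double x, sum = 0.0, dummy = 0.0, reduced = 0.0;
        MPI_Comm_rank(slaves, &srank);
        for (i = 0; i < DRAWS; i++) {
            MPI_Send(&dummy, 1, MPI_DOUBLE, 0, REQ_TAG, MPI_COMM_WORLD);
            MPI_Recv(&x, 1, MPI_DOUBLE, 0, NUM_TAG, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += x;
        }
        /* Blocking reduction over the slave sub-communicator only. */
        MPI_Reduce(&sum, &reduced, 1, MPI_DOUBLE, MPI_SUM, 0, slaves);
        if (srank == 0)   /* the first slave reports the result to the master */
            MPI_Send(&reduced, 1, MPI_DOUBLE, 0, DONE_TAG, MPI_COMM_WORLD);
    }

    MPI_Comm_free(&slaves);
    MPI_Finalize();
    return 0;
}

Whether the "Oops..." branch can actually fire depends on the exact request/reply structure (in this simplified sketch every slave blocks on each reply), so take it only as an illustration of where the check sits in the attached program.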
Cheers,
MAO
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master_slave.c
Type: application/octet-stream
Size: 1807 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20111111/5ae50c8c/attachment-0001.obj>
-------------- next part --------------
On Nov 11, 2011, at 04:16, Pavan Balaji wrote:
>
> Even if I remove it, the program just prints a number. What makes you believe that the MPI_Send is overtaking the Reduce?
>
> -- Pavan
>
> On 11/10/2011 10:12 PM, Miguel Oliveira wrote:
>> Hi,
>>
>> Sorry, I sent the version with the only correction I found that makes it work. Remove the MPI_Barrier(world) in both the master and the slaves and you should be able to reproduce the problem.
>>
>> Cheers,
>>
>> MAO
>>
>> On Nov 11, 2011, at 03:31, Pavan Balaji wrote:
>>
>>>
>>> The test code you sent seems to work fine for me. I don't see any such problem.
>>>
>>> -- Pavan
>>>
>>> On 11/10/2011 03:14 PM, Miguel Oliveira wrote:
>>>> Hi all,
>>>>
>>>> I wrote a very simple master/slave code in MPI and I'm having problems with MPI_Reduce, or even MPI_Barrier, inside a subset of the world communicator.
>>>> These operations don't seem to wait for all the processes in the subgroup.
>>>>
>>>> The code is a straightforward master/slave case where the master generates random numbers when requested and then retrieves a reduction of the sum of these
>>>> numbers, done on the slaves.
>>>>
>>>> When run on more than three processes, it sometimes happens that the message sent after the reduction, from one of the slaves to inform the master of the final
>>>> result, gets to the master before some of the requests for random numbers... This ought to be impossible with a blocking reduction...
>>>>
>>>> Am I missing something?
>>>>
>>>> Code is attached.
>>>>
>>>> Help is appreciated.
>>>>
>>>> Cheers,
>>>>
>>>> MAO
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji