[mpich-discuss] MPI_Bcast error

Luiz Carlos Costa Junior lccostajr at gmail.com
Tue Feb 28 17:03:53 CST 2012


Hi all,

I've been running into a frequent problem with MPICH2.
During execution, the root process tries to broadcast a ~12 MB matrix,
but sometimes it never returns from the MPI_BCAST call, and the execution
hangs. The "sometimes" is the real problem: we have no clue about when
it is going to happen.

Has anyone already experienced a similar problem?

A few questions have come up about the behaviour of MPI_BCAST and its
implementation:
1) Is there any limit on the size of the buffer that can be sent?
2) If such a limit exists, is it related to the number of processes in
the communicator? In this case I am using 32 processes, but I have often
had success on bigger clusters (over 200 processes).
3) Is the content of the data being sent relevant? If some of the data is
uninitialized, would that be a concern? In other words, my understanding is
that the only things that matter are that the buffer size is correct in all
processes (for any combination of datatype and array size) and that enough
space is allocated to receive the data. Is that right?
4) What is the best way to send this data? Would splitting it into smaller
broadcasts be better or safer?
5) How should I classify a 12 MB message? Small? Big? I believe it should
be pretty small, because I also have other typical runs with messages over
100 MB that succeeded.
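
To make question 4 concrete, here is a minimal sketch of what I mean by
splitting the matrix into smaller broadcasts. It is only an illustration
under assumptions: it uses Python with mpi4py rather than our actual code,
the names plan_chunks and bcast_in_chunks and the 1 MiB default chunk size
are hypothetical, and I do not know whether this would avoid the hang.

```python
def plan_chunks(total_bytes, chunk_bytes):
    """Return (offset, length) pairs that cover total_bytes in order."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    return [(off, min(chunk_bytes, total_bytes - off))
            for off in range(0, total_bytes, chunk_bytes)]


def bcast_in_chunks(comm, buf, chunk_bytes=1 << 20, root=0):
    """Broadcast a writable, contiguous buffer as several smaller broadcasts.

    Every rank must call this with a buffer of the same size and the same
    chunk_bytes, so that all ranks issue the same sequence of collectives.
    """
    # Assumption: mpi4py is installed. It is imported lazily so that
    # plan_chunks can be used and checked without an MPI installation.
    from mpi4py import MPI

    view = memoryview(buf).cast("B")  # treat the buffer as raw bytes
    for off, length in plan_chunks(len(view), chunk_bytes):
        # One MPI_Bcast per chunk; the root sends, the other ranks receive
        # into the same byte range of their own buffer.
        comm.Bcast([view[off:off + length], MPI.BYTE], root=root)
```

With these defaults a 12 MiB buffer would go out as 12 broadcasts of 1 MiB
each. Each rank would allocate the full buffer first (for example a
bytearray of the agreed size) and then call
bcast_in_chunks(MPI.COMM_WORLD, buf).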

Regards,
Luiz

