[MPICH] Simple MPI program crashes randomly
Yusong Wang
ywang25 at aps.anl.gov
Tue May 29 20:22:48 CDT 2007
There is a small mistake in the simplified version of the program you
provided before:
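The array allocation should be
double* array = new double[height*width];
instead of
double* array = new double(height*width);
The second form allocates a single double initialized to the value
height*width, not an array of height*width elements, so any access beyond
array[0] writes past the end of the allocation. After fixing this, there
is no problem on my cluster.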
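A minimal C++ sketch of the difference between the two forms. The function names, and the std::vector alternative, are illustrative helpers and not taken from your program:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buggy form from the simplified program: allocates ONE double whose value
// is height*width. Only p[0] is valid; it must be released with plain delete.
double* allocateOneValue(int height, int width) {
    return new double(height * width);
}

// Corrected form: allocates height*width doubles (uninitialized).
// It must be released with delete[].
double* allocateArray(int height, int width) {
    return new double[static_cast<std::size_t>(height) * width];
}

// Hypothetical alternative: a std::vector owns the buffer, and grid.data()
// can be passed wherever a double* is expected (for example to MPI_Send).
std::vector<double> makeGrid(int height, int width) {
    return std::vector<double>(static_cast<std::size_t>(height) * width, 0.0);
}

// Row-major element access with a bounds check (active in debug builds).
double& cell(std::vector<double>& grid, int width, int row, int col) {
    std::size_t idx = static_cast<std::size_t>(row) * width + col;
    assert(idx < grid.size());
    return grid[idx];
}
```

With the std::vector version the memory is freed automatically, so the delete versus delete[] mismatch cannot happen.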
For the application program, I checked the detailed implementation of the
splitDomain function. There are some differences between this one and the
one you provided before. This one sets end[lastSlave] to 201, which is
larger than the width you assigned, while the simplified one sets
end[lastSlave] to 199. Since you printed out everything from this
function, I assume you are aware of this and allocate enough memory for
the data array (some applications do need extra points).
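To make the end[lastSlave] check concrete, here is a sketch. It assumes that end holds inclusive column indices and uses a width of 200 purely for illustration. splitColumns, requiredColumns and fitsInWidth are hypothetical helpers, not your splitDomain:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: splits columns 0..width-1 among nSlaves workers and stores
// INCLUSIVE bounds, so the last worker must end at width-1.
void splitColumns(int width, int nSlaves,
                  std::vector<int>& start, std::vector<int>& end) {
    assert(width > 0 && nSlaves > 0 && nSlaves <= width);
    start.assign(nSlaves, 0);
    end.assign(nSlaves, 0);
    int base = width / nSlaves;
    int extra = width % nSlaves;  // the first `extra` workers get one more column
    int col = 0;
    for (int s = 0; s < nSlaves; ++s) {
        int count = base + (s < extra ? 1 : 0);
        start[s] = col;
        end[s] = col + count - 1;  // inclusive upper bound
        col += count;
    }
}

// Columns a buffer must hold if indices up to end.back() + extraPoints are
// touched (extraPoints models the "extra points" some applications need).
std::size_t requiredColumns(const std::vector<int>& end, int extraPoints) {
    assert(!end.empty());
    return static_cast<std::size_t>(end.back()) + 1 + extraPoints;
}

// True if a buffer allocated with `allocatedColumns` columns is large enough.
bool fitsInWidth(const std::vector<int>& end, int allocatedColumns,
                 int extraPoints) {
    return requiredColumns(end, extraPoints) <=
           static_cast<std::size_t>(allocatedColumns);
}
```

Under these assumptions, end[lastSlave] == 199 fits a 200-column array, but 201 would need at least 202 columns to be allocated.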
I ran your application program on my cluster and didn't see the problem.
My suggestion is to recompile everything with the compiler wrapper from
MPICH2. You can run `which mpicxx` to check whether you are using the
correct one.
Let me know if this helps.
Yusong
On Mon, 2007-05-28 at 06:24 -0400, Christian Zemlin wrote:
> The attached program ran literally thousands of times on our old
> cluster, which has MPICH1 libraries, without any problems.
> It is simple in terms of MPI (just the minimum MPI setup + a few Send
> and Recv commands), although it is fairly long because it implements a
> complicated model of cardiac tissue.
>
> On our new cluster (with MPICH2) the program crashes with segmentation
> faults either on the first MPI_Send/Recv or at the end (MPI_Finalize).
> If you run it several times in a row, it seems to be random which of
> the two occurs. I have checked carefully that the space for the
> passed data has been allocated.
>
> I would greatly appreciate it if someone with more MPI experience could
> have a look at the source code and give me his/her opinion.
>
> Best,
>
> Christian