[mpich-discuss] SIGx13 Intermittent error

Marc levesqm at emt.inrs.ca
Thu Jun 11 12:26:03 CDT 2009


I had already run a simple parallel hello world program multiple times,
and the error never showed up. Same thing for cpi... Maybe it depends
on the message size.

I also did a bit of googling before asking on the mailing list, and
there's not a lot of information. However, I recently found this post
on the Rocks Cluster mailing list (Rocks is my cluster distro):
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-November/022378.html
For intermittent errors, the problem seems to be related to uncleared
shared memory. To me, that's the most plausible explanation.
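
For anyone who wants to check a node for leftover segments, here is a
minimal sketch. It assumes the uncleared memory is System V shared
memory as listed by `ipcs -m` on Linux, and that a segment with zero
attached processes is a leftover candidate. The function names are
hypothetical, and the column layout is an assumption about the usual
`ipcs` output, so please verify it on your own nodes first.

```python
import getpass
import subprocess


def orphaned_segments(ipcs_output, owner):
    """Return shmids owned by `owner` that have no attached processes.

    Assumes the usual Linux `ipcs -m` columns:
    key shmid owner perms bytes nattch [status]
    """
    shmids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key; headers and blank lines do not.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            shmids.append(int(shmid))
    return shmids


def print_cleanup_commands():
    """Print (not run) one ipcrm command per candidate segment."""
    result = subprocess.run(
        ["ipcs", "-m"], capture_output=True, text=True, check=True
    )
    for shmid in orphaned_segments(result.stdout, getpass.getuser()):
        # Printing only, so the list can be reviewed before removing anything.
        print(f"ipcrm -m {shmid}")
```

Calling print_cleanup_commands() on each compute node only prints the
commands. A segment with no attached process is not necessarily stale,
so check the list before running any of them.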

I will try MPICH2. Thanks for the suggestion.

Best regards,

Marc


