<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>It looks like one process may have died for some reason (such as
a seg fault), which caused another process to detect a broken connection and
abort the program.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Rajeev<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
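<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>To find where the failing rank died, the PC addresses in the traceback can be handed to addr2line (GNU binutils). The following is only a minimal sketch: the helper names frames and addr2line_command are hypothetical, the image name simulateur is taken from the quoted traceback, and addr2line can report routine names and line numbers only if the executable contains debug information (for example, assuming it is rebuilt with -g, and with the Intel compiler's -traceback option if the installed version supports it).</span></p>
<pre>
```python
import re

# One traceback entry: image name, 16-hex-digit PC, then the word "Unknown".
# \s+ also matches line breaks, so entries wrapped across lines still parse.
FRAME = re.compile(r"(\S+)\s+([0-9A-Fa-f]{16})\s+Unknown")


def frames(traceback_text):
    """Return (image, pc) pairs in the order they appear in the traceback."""
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(traceback_text)]


def addr2line_command(traceback_text, image):
    """Build an addr2line argument list for the frames belonging to one image."""
    pcs = [hex(int(pc, 16)) for img, pc in frames(traceback_text) if img == image]
    return ["addr2line", "-f", "-e", image] + pcs


# Three entries copied from the quoted traceback.
sample = """
simulateur 0000000000438804 Unknown Unknown Unknown
libc.so.6 00002AE4FD3C34CA Unknown Unknown Unknown
simulateur 00000000004118AA Unknown Unknown Unknown
"""

print(" ".join(addr2line_command(sample, "simulateur")))
# prints: addr2line -f -e simulateur 0x438804 0x4118aa
```
</pre>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Running the printed command in the directory that holds the executable should list one routine name and source location per address. Entries that belong to libc.so.6 are skipped because they lie outside the program's own code.</span></p>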
<div style='border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt'>
<div>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov] <b>On
Behalf Of </b>Yasmine Chebaro<br>
<b>Sent:</b> Monday, September 24, 2007 3:51 AM<br>
<b>To:</b> mpich-discuss@mcs.anl.gov<br>
<b>Subject:</b> [MPICH] Program running with mpich2 stops<o:p></o:p></span></p>
</div>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal style='margin-bottom:12.0pt'>Hi all,<br>
<br>
I am using mpich2 on 64-bit Linux x86_64 machines, running in parallel.<br>
The program running with mpich2 is a mix of Fortran 90 and 77, compiled with the
Intel Fortran Compiler.<br>
I am having problems when I run my program with mpich2: after running for a
couple of hours the program stops with the error message below (ignore the
first four lines; they are the "normal" output of my program):<br>
<br>
<br>
<i>118 31.0400010000000<br>
119 12.0100000000000<br>
120 16.0000000000000<br>
121 16.0000000000000<br>
place holderplace holder <br>
Image PC Routine Line Source<br>
simulateur 0000000000E3F3CE Unknown Unknown Unknown<br>
simulateur 0000000000E3E5CA Unknown Unknown Unknown<br>
simulateur 0000000000DF9F62 Unknown Unknown Unknown<br>
simulateur 0000000000DC99BA Unknown Unknown Unknown<br>
simulateur 0000000000DC8F29 Unknown Unknown Unknown<br>
simulateur 0000000000DDAB2B Unknown Unknown Unknown<br>
simulateur 0000000000438804 Unknown Unknown Unknown<br>
simulateur 0000000000457C86 Unknown Unknown Unknown<br>
simulateur 000000000045462D Unknown Unknown Unknown<br>
simulateur 0000000000411962 Unknown Unknown Unknown<br>
libc.so.6 00002AE4FD3C34CA Unknown Unknown Unknown<br>
simulateur 00000000004118AA Unknown Unknown Unknown<br>
rank 1 in job 1 fargeau_44741 caused collective abort of all
ranks</i> <i><br>
exit status of rank 1: return code 104<br>
<br>
</i>I really don't know what to do now. I thought it was due to
a stack overflow with the Intel compiler (which has happened to me before), but
even with the option that solves the stack overflow problem, it doesn't work.<br>
If you have any idea why this is happening, please let me know.<br>
Thanks in advance.<o:p></o:p></p>
</div>
</div>
</body>
</html>