<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
span.EmailStyle18
        {mso-style-type:personal-compose;
        font-family:"Arial","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Hello,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>We are having a problem using MPICH2 in order to execute MCNPX in an MPI environment, but only when we increase our number of processes beyond a certain point. We have about 10 workstations in our “mini-cloud”, each running MPICH2 and SMPD version 1.3.2p1, 32-bit.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>The command line is simple, something like the following:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>mpiexec -hosts 3 10.39.16.37 2 10.39.16.65 8 10.39.16.54 8 -env DATAPATH c:\mcnpx\data -dir c:\mcnpx\phill\test mcnpx i=test1.in<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>We see the normal output from MCNPX, as well as the report that it is initializing MPI processes. The problem we are having is that MPIEXEC throws the following error (A) when trying to initialize the MPI calculation:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>***<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>master starting 17 by 1 subtasks 08/19/11 17:02:54<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>master sending static commons...<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>master sending dynamic commons...<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>Fatal error in MPI_Send: Other MPI error, error stack:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPI_Send(173)....................................: MPI_Send(buf=23110000, count=<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>9486420, MPI_PACKED, dest=12, tag=4, MPI_COMM_WORLD) failed<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPIDI_CH3I_Progress(353).........................:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_mpich2_blocking_recv(905)...............:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_module_poll(37)..................:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_module_connpoll(2655)............:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_module_recv_success_handler(2322):<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_handle_pkt(587).........................:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPIDI_CH3_PktHandler_RndvClrToSend(253)..........: failure occurred while attemp<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>ting to send message data<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_iSendContig(409).................:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_iSendContig(408).................: Unable to write to a socket,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>An operation on a socket could not be performed because the system lacked suffi<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>cient buffer space or because a queue was full.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>(errno 10055)<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>***<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Then this error is repeated several times, increasing with the number of processes requested:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>***<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>Fatal error in MPI_Probe: Other MPI error, error stack:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPI_Probe(113).......................: MPI_Probe(src=MPI_ANY_SOURCE, tag=4, MPI_<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>COMM_WORLD, status=012242D0) failed<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPIDI_CH3I_Progress(353).............:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_mpich2_blocking_recv(905)...:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_module_poll(37)......:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>MPID_nem_newtcp_module_connpoll(2655):<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>gen_read_fail_handler(1145)..........: read from socket failed - The specified n<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>etwork name is no longer available.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New"'>***<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>A few notes on the situation and debugging we have attempted:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>1. We use the precompiled MPI version of MCNPX, and the error happens with both v2.6 and v2.7e<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>2. This DOES NOT happen when overloading the local cpu, i.e. sending 30 processes to a local dual-core cpu.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>3. This happens both when using the –hosts option as well as when using the wmpiconfig utility to specify hosts and using the –n option on the MPIEXEC command line.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>4. The number of hosts seems not to be the cause, i.e. sending 5 processes to 2 hosts, 2 processes to 5 hosts, or 1 process to 10 hosts all work fine.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>5. This seems to be MCNPX input-file dependent, even between input files which differ only by a few numbers in certain locations.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Could this be a communication or windows firewall issue? We are truly stumped and have had difficulty finding hints or answers in the forum.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Best regards and many thanks in advance,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'>patrick<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Patrick M. Hill, Ph.D.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Washington University in St. Louis<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Department of Radiation Oncology<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p></div><DIV> </DIV>The materials in this message are private and may contain Protected Healthcare Information. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.<br></body></html>