<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Arial, sans-serif" size="2">
<div>Hi,</div>
<div>Summary:</div>
<div>---------------</div>
<div>On Windows when I execute the following command (working on a fairly large dataset):</div>
<div> mpiexec -hosts 2 usctap3825 15 usctap3488 1 \\fs1\correlatempi.exe cfg.xml in.h5 out.h5 debug</div>
<div>I encounter an MPI gather error (read from socket failed, errno 10055); see the error stack at the end of this message. If I run on only one computer (with 16 cores):</div>
<div> mpiexec -hosts 1 usctap3825 15 \\fs1\correlatempi.exe cfg.xml in.h5 out.h5 debug</div>
<div>the program runs successfully.</div>
<div> </div>
<div>Additionally, both of the above commands run successfully on mpich2 v1.2.1, although I had to build against mpich2 1.2.1 and used different servers that are configured exactly like the original servers noted above (e.g., usctap3825: 16-core, 64GB memory, etc.).</div>
<div> </div>
<div>I noticed that a similar error was fixed in mpich2-1.2 (<a href="http://trac.mcs.anl.gov/projects/mpich2/ticket/895"><font color="#0000FF"><u>http://trac.mcs.anl.gov/projects/mpich2/ticket/895</u></font></a>). Could this have regressed? tia.</div>
<div> </div>
<div>System Configuration:</div>
<div>--------------------------------</div>
<div style="margin-top: 5pt; margin-bottom: 5pt; "><font face="Courier New, sans-serif">Server1 (usctap3825)<font face="Times New Roman, serif" size="3">
<br>
</font>-------<font face="Times New Roman, serif" size="3"> <br>
</font>a. Windows Server 2003, 64-bit, SP2<font face="Times New Roman, serif" size="3">
<br>
</font>b. 16 cores/processors<font face="Times New Roman, serif" size="3"> <br>
</font>c. 64GB memory<font face="Times New Roman, serif" size="3"> <br>
</font>d. Physical computer<font face="Times New Roman, serif" size="3"> </font></font></div>
<div style="margin-top: 5pt; margin-bottom: 5pt; "><font face="Courier New, sans-serif">Server2<font face="Times New Roman, serif" size="3"> </font>(usctap3488)<font face="Times New Roman, serif" size="3">
<br>
</font>-------<font face="Times New Roman, serif" size="3"> <br>
</font>a. Windows Server 2003, 64-bit, SP2<font face="Times New Roman, serif" size="3">
<br>
</font>b. 2 cores/processors<font face="Times New Roman, serif" size="3"> <br>
</font>c. 8GB memory<font face="Times New Roman, serif" size="3"> <br>
</font>d. Virtual Machine<font face="Times New Roman, serif" size="3"> </font></font></div>
<div><font face="Times New Roman, serif" size="3"> </font></div>
<div>cheers, roy</div>
<div> </div>
<div> </div>
<div>error stack:</div>
<div>----------------</div>
<div>Fatal error in PMPI_Gatherv: Other MPI error, error stack:</div>
<div>PMPI_Gatherv(398)................................: MPI_Gatherv failed(sbuf=000000003AA30040, scount=97787376, MPI_FLOAT, rbuf=0000000180040040, rcnts=000000000D6515E0, displs=000000000D651630, MPI_FLOAT, root=0, MPI_COMM_WORLD) failed</div>
<div>MPIR_Gatherv_impl(210)...........................:</div>
<div>MPIR_Gatherv(118)................................:</div>
<div>MPIC_Waitall_ft(852).............................:</div>
<div>MPIR_Waitall_impl(121)...........................:</div>
<div>MPIDI_CH3I_Progress(353).........................:</div>
<div>MPID_nem_mpich2_blocking_recv(905)...............:</div>
<div>MPID_nem_newtcp_module_poll(37)..................:</div>
<div>MPID_nem_newtcp_module_connpoll(2669)............:</div>
<div>MPID_nem_newtcp_module_recv_success_handler(2364):</div>
<div>MPID_nem_newtcp_module_post_readv_ex(330)........:</div>
<div>MPIU_SOCKW_Readv_ex(392).........................: read from socket failed, An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. (errno 10055)</div>
<div> </div>
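<div>For scale, here is a rough back-of-the-envelope calculation from the scount shown in the error stack. It is only a sketch: it assumes MPI_FLOAT is 4 bytes and that every rank sends the same count as the one reported, which MPI_Gatherv does not require.</div>
<pre>
```python
# Values copied from the PMPI_Gatherv line of the error stack.
scount = 97787376        # floats sent by the rank that reported the error
bytes_per_float = 4      # assumption: MPI_FLOAT is a 4-byte float
ranks = 16               # 15 processes on usctap3825 + 1 on usctap3488

# Size of one rank's send buffer.
send_bytes = scount * bytes_per_float

# Assumption: all ranks send the same scount (Gatherv lets them differ),
# so this is only an estimate of what the root has to receive.
total_bytes = send_bytes * ranks

print(f"per-rank send buffer: {send_bytes} bytes ({send_bytes / 2**20:.1f} MiB)")
print(f"estimated root receive buffer: {total_bytes / 2**30:.2f} GiB")
```
</pre>
<div>So each message in the gather is a few hundred MiB, which may be relevant to the buffer-space error above.</div>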
</font>
</body>
</html>