<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Can anyone help me understand the following crash in MPICH2 1.3.1, running on Ubuntu Linux and using the Nemesis channel? &nbsp;The application has a couple of threads and initializes MPI with MPI_THREAD_MULTIPLE. &nbsp;Most of the application just calls MPI_Isend/MPI_Irecv. &nbsp;There is a progress thread that essentially sits in an MPI_Waitsome loop. &nbsp;A third thread periodically wakes up the progress thread by means of a generalized request. &nbsp;</div><div><br></div><div>In case it matters, I built 64-bit using --disable-fc, --enable-g=dbg and --disable-fast.</div><div><br></div><div><font class="Apple-style-span" face="Consolas"><span class="Apple-style-span" style="font-size: medium;"><div><span style="font-family: Calibri; ">Here is the thread that suffers the SIGSEGV. &nbsp;As you can see, this</span></div><div><span style="font-family: Calibri; ">thread is in MPI_Waitsome waiting on an array of requests. 
&nbsp;The last</span></div><div><span style="font-family: Calibri; ">request in the array is a generalized request that the other thread is</span></div><div><span style="font-family: Calibri; ">interested in (see below).</span></div><div><span style="font-family: Calibri; "><br></span></div><div><span style="font-family: Calibri; ">(gdb) display/i $pc</span></div><div><span style="font-family: Calibri; ">1: x/i $pc</span></div><div><span style="font-family: Calibri; ">=&gt; 0xdc7ec1 &lt;poll_active_fboxes&#43;212&gt;:</span><span class="Apple-tab-span" style="white-space: pre; font-family: Calibri; ">        </span><span style="font-family: Calibri; ">mov &nbsp; &nbsp;0x10(%rax),%rax</span></div><div><span style="font-family: Calibri; ">(gdb) print $rax</span></div><div><span style="font-family: Calibri; ">$6 = 0</span></div><div><span style="font-family: Calibri; ">(gdb) bt</span></div><div><span style="font-family: Calibri; ">#0 &nbsp;0x0000000000dc7ec1 in poll_active_fboxes (cell=0x7f7f38e0d1e0)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at /home/dblair/insight/mpich2-1.3.1/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_fbox.h:51</span></div><div><span style="font-family: Calibri; ">#1 &nbsp;0x0000000000dc7f7e in MPID_nem_mpich2_test_recv (cell=0x7f7f38e0d1e0,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;in_fbox=0x7f7f38e0d210, in_blocking_progress=1)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at /home/dblair/insight/mpich2-1.3.1/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_inline.h:741</span></div><div><span style="font-family: Calibri; ">#2 &nbsp;0x0000000000dc75a5 in MPIDI_CH3I_Progress (progress_state=0x7f7f38e0d340,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;is_blocking=1) at ch3_progress.c:333</span></div><div><span style="font-family: Calibri; ">#3 &nbsp;0x0000000000dbcb74 in PMPI_Waitsome 
(incount=5,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;array_of_requests=0x2f87400, outcount=0x2bb4ca8,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;array_of_indices=0x2f87440, array_of_statuses=0x2f8b930) at waitsome.c:255</span></div><div><span style="font-family: Calibri; ">#4 &nbsp;0x0000000000d21a3d in RuntimeMessageDemuxOperator::onEvent (</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;this=0x2bb4c30, port=0x2c84cc0) at dataflow/MessagePassing.cc:173</span></div><div><span style="font-family: Calibri; ">#5 &nbsp;0x0000000000d275d9 in DataflowScheduler::runOperator (this=0x2c4a000,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;port=...) at dataflow/DataflowRuntime.cc:91</span></div><div><span style="font-family: Calibri; ">#6 &nbsp;0x0000000000d2799a in DataflowScheduler::run (this=0x2c4a000)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at dataflow/DataflowRuntime.cc:232</span></div><div><span style="font-family: Calibri; ">#7 &nbsp;0x0000000000c6f55a in RuntimeProcess::run (s=...)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at dataflow/RuntimeProcess.cc:424</span></div><div><span style="font-family: Calibri; ">#8 &nbsp;0x0000000000c84d9d in boost::_bi::list1&lt;boost::reference_wrapper&lt;DataflowScheduler&gt; &gt;::operator()&lt;void (*)(DataflowScheduler&amp;), boost::_bi::list0&gt; (</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;this=0x2bfbc78, f=@0x2bfbc70, a=...)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/bind/bind.hpp:253</span></div><div><span style="font-family: Calibri; ">#9 &nbsp;0x0000000000c84dda in boost::_bi::bind_t&lt;void, void (*)(DataflowScheduler&amp;), boost::_bi::list1&lt;boost::reference_wrapper&lt;DataflowScheduler&gt; &gt; &gt;::operator()</span></div><div><span 
style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;(this=0x2bfbc70) at boost_1_42_0/boost/bind/bind_template.hpp:20</span></div><div><span style="font-family: Calibri; ">#10 0x0000000000c84df8 in boost::detail::thread_data&lt;boost::_bi::bind_t&lt;void, void (*)(DataflowScheduler&amp;), boost::_bi::list1&lt;boost::reference_wrapper&lt;DataflowScheduler&gt; &gt; &gt; &gt;::run (this=0x2bfbb40)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/thread/detail/thread.hpp:56</span></div><div><span style="font-family: Calibri; ">#11 0x0000000000c6487f in thread_proxy (param=0x2bfbb40)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/libs/thread/src/pthread/thread.cpp:120</span></div><div><span style="font-family: Calibri; ">#12 0x00007f7f3aa199ca in start_thread () from /lib/libpthread.so.0</span></div><div><span style="font-family: Calibri; ">#13 0x00007f7f399a370d in clone () from /lib/libc.so.6</span></div><div><span style="font-family: Calibri; ">#14 0x0000000000000000 in ?? 
()</span></div><div><span style="font-family: Calibri; "><br></span></div><div><span style="font-family: Calibri; ">The second thread, whose backtrace follows, is calling MPI_Request_get_status to check whether</span></div><div><span style="font-family: Calibri; ">the generalized request has completed (if not, it will call MPI_Grequest_complete).</span></div><div><span style="font-family: Calibri; "><br></span></div><div><span style="font-family: Calibri; ">(gdb) bt</span></div><div><span style="font-family: Calibri; ">#0 &nbsp;poll_active_fboxes (cell=0x7f7f37e0b7f0)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at /home/dblair/insight/mpich2-1.3.1/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_fbox.h:43</span></div><div><span style="font-family: Calibri; ">#1 &nbsp;0x0000000000dc7f7e in MPID_nem_mpich2_test_recv (cell=0x7f7f37e0b7f0,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;in_fbox=0x7f7f37e0b820, in_blocking_progress=0)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at /home/dblair/insight/mpich2-1.3.1/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_inline.h:741</span></div><div><span style="font-family: Calibri; ">#2 &nbsp;0x0000000000dc75a5 in MPIDI_CH3I_Progress (progress_state=0x0,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;is_blocking=0) at ch3_progress.c:333</span></div><div><span style="font-family: Calibri; ">#3 &nbsp;0x0000000000dba6c2 in PMPI_Request_get_status (request=-1409286144,&nbsp;</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;flag=0x7f7f37e0b928, status=0x7f7f37e0b8f0) at request_get_status.c:110</span></div><div><span style="font-family: Calibri; ">#4 &nbsp;0x0000000000d2265a in RuntimeMessageDemuxOperator::wakeupTimer (</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;this=0x2bb4c30) at dataflow/MessagePassing.cc:88</span></div><div><span style="font-family: Calibri; ">#5 
&nbsp;0x0000000000d2327c in boost::_mfi::mf0&lt;void, RuntimeMessageDemuxOperator&gt;::operator() (this=0x2c2f810, p=0x2bb4c30)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/bind/mem_fn_template.hpp:49</span></div><div><span style="font-family: Calibri; ">#6 &nbsp;0x0000000000d232e9 in boost::_bi::list1&lt;boost::_bi::value&lt;RuntimeMessageDemuxOperator*&gt; &gt;::operator()&lt;boost::_mfi::mf0&lt;void, RuntimeMessageDemuxOperator&gt;, boost::_bi::list0&gt; (this=0x2c2f820, f=..., a=...)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/bind/bind.hpp:253</span></div><div><span style="font-family: Calibri; ">#7 &nbsp;0x0000000000d23326 in boost::_bi::bind_t&lt;void, boost::_mfi::mf0&lt;void, RuntimeMessageDemuxOperator&gt;, boost::_bi::list1&lt;boost::_bi::value&lt;RuntimeMessageDemuxOperator*&gt; &gt; &gt;::operator() (this=0x2c2f810)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/bind/bind_template.hpp:20</span></div><div><span style="font-family: Calibri; ">#8 &nbsp;0x0000000000d23344 in boost::detail::thread_data&lt;boost::_bi::bind_t&lt;void, boost::_mfi::mf0&lt;void, RuntimeMessageDemuxOperator&gt;, boost::_bi::list1&lt;boost::_bi::value&lt;RuntimeMessageDemuxOperator*&gt; &gt; &gt; &gt;::run (this=0x2c2f6e0)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/boost/thread/detail/thread.hpp:56</span></div><div><span style="font-family: Calibri; ">#9 &nbsp;0x0000000000c6487f in thread_proxy (param=0x2c2f6e0)</span></div><div><span style="font-family: Calibri; ">&nbsp;&nbsp; &nbsp;at boost_1_42_0/libs/thread/src/pthread/thread.cpp:120</span></div><div><span style="font-family: Calibri; ">#10 0x00007f7f3aa199ca in start_thread () from /lib/libpthread.so.0</span></div><div><span style="font-family: Calibri; ">#11 0x00007f7f399a370d in clone () from /lib/libc.so.6</span></div><div><span 
style="font-family: Calibri; ">#12 0x0000000000000000 in ?? ()</span></div><div><br></div></span></font></div></body></html>