<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:1286621494;
        mso-list-type:hybrid;
        mso-list-template-ids:-1615584106 269025281 269025283 269025285 269025281 269025283 269025285 269025281 269025283 269025285;}
@list l0:level1
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Symbol;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-CA" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hello All,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am seeing a troubling intermittent problem with the VecSetValues/VecAssemblyBegin functions after porting a robust application, which has worked reliably for a long time, to a cloud platform.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Symbol"><span style="mso-list:Ignore">·<span style="font:7.0pt 'Times New Roman'">
</span></span></span><![endif]>I have 30M doubles on rank 0.<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Symbol"><span style="mso-list:Ignore">·<span style="font:7.0pt 'Times New Roman'">
</span></span></span><![endif]>I intend to assign them non-sequentially among 32 processors, ranks 1-31.<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Symbol"><span style="mso-list:Ignore">·<span style="font:7.0pt 'Times New Roman'">
</span></span></span><![endif]>On rank 0 only, I use VecSetValues(x,...) to make the assignment. So far everything is fine.<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Symbol"><span style="mso-list:Ignore">·<span style="font:7.0pt 'Times New Roman'">
</span></span></span><![endif]>I then call VecAssemblyBegin, expecting it to distribute the values appropriately.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Sometimes this works, but about 50% of the time I see errors, immediately on calling VecAssemblyBegin, of the following form:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Fatal error in MPI_Allreduce: Other MPI error, error stack:<o:p></o:p></p>
<p class="MsoNormal"> MPI_Allreduce(919).........................: MPI_Allreduce(sbuf=0000000012DE29B0, rbuf=00000000069F6ED0, count=32, dtype=USER, op=0x98000000, comm=0x84000002) failed<o:p></o:p></p>
<p class="MsoNormal"> MPIR_Allreduce_impl(776)...................:<o:p></o:p></p>
<p class="MsoNormal"> MPIR_Allreduce_intra(220)..................:<o:p></o:p></p>
<p class="MsoNormal"> MPIR_Bcast_impl(1273)......................:<o:p></o:p></p>
<p class="MsoNormal"> MPIR_Bcast_intra(1107).....................:<o:p></o:p></p>
<p class="MsoNormal"> MPIR_Bcast_binomial(143)...................:<o:p></o:p></p>
<p class="MsoNormal"> MPIC_Recv(110).............................:<o:p></o:p></p>
<p class="MsoNormal"> MPIC_Wait(540).............................:<o:p></o:p></p>
<p class="MsoNormal"> MPIDI_CH3I_Progress(353)...................:<o:p></o:p></p>
<p class="MsoNormal"> MPID_nem_mpich2_blocking_recv(905).........:<o:p></o:p></p>
<p class="MsoNormal"> MPID_nem_newtcp_module_poll(37)............:<o:p></o:p></p>
<p class="MsoNormal"> MPID_nem_newtcp_module_connpoll(2655)......:<o:p></o:p></p>
<p class="MsoNormal"> recv_id_or_tmpvc_info_success_handler(1278): read from socket failed - No error<o:p></o:p></p>
<p class="MsoNormal"> --------------------- Error Message ------------------------------------<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Out of memory. This could be due to allocating<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: too large an object or bleeding by not properly<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: destroying unneeded objects.<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Memory allocated 0 Memory used by process 0<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Memory requested 18446744066053327000!<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: ------------------------------------------------------------------------<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: See docs/changes/index.html for recent updates.<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: See docs/faq.html for hints about trouble shooting.<o:p></o:p></p>
<p class="MsoNormal"> [23]PETSC ERROR: See docs/in ...<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
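<p class="MsoNormal">One observation on the output above: the "Memory requested" figure sits just below 2^64, so it may be a negative 64-bit size printed as an unsigned value rather than a genuine allocation request. Below is a minimal Python sketch of that arithmetic. It assumes two's-complement wrap-around, the variable names are mine, and the reading is an assumption rather than a diagnosis. The trailing zeros in the logged figure also suggest it was rounded when printed, so the result is approximate.<o:p></o:p></p>
<pre>
```python
# Value copied from the "Memory requested" line of the error output.
requested = 18446744066053327000

# Assumption: the size was a signed 64-bit integer that went negative and
# was then printed as unsigned (two's-complement wrap-around).
as_signed = requested - 2**64

# Size of the payload described in this message: 30M doubles, 8 bytes each.
payload_bytes = 30_000_000 * 8

print("requested, read as signed 64-bit:", as_signed)
print("payload size in bytes:", payload_bytes)
```
</pre>
<p class="MsoNormal">If that reading holds, the size handed to the allocator was roughly -7.66e9 bytes, far from the 240 MB payload. That might point to a corrupted count after the failed MPI_Allreduce rather than real memory exhaustion.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>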
<p class="MsoNormal">My questions are: (1) Has anybody seen this type of VecAssemblyBegin error? (2) Is it likely that splitting the VecSetValues call into smaller blocks would help? (3) Is it likely that moving to mpich2 1.4p1 would help? (4) Any other thoughts?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks in advance,<o:p></o:p></p>
<p class="MsoNormal">Rob <o:p></o:p></p>
</div>
</body>
</html>