<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><br>
> VecPointwiseMult 402 1.0 2.9605e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 22515 70608 0 0.00e+00 0 0.00e+00 100<br>
> VecScatterBegin 400 1.0 1.6791e-01 6.0 0.00e+00 0.0 3.7e+05 1.6e+04 0.0e+00 0 0 62 54 0 2 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0<br>
> VecScatterEnd 400 1.0 1.0057e+00 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br>
> PCApply 402 1.0 2.9638e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 22490 70608 0 0.00e+00 0 0.00e+00 100<br>
<br>
Most of the MatMult time is attributed to VecScatterEnd here. Can you share a run of the same total problem size on 8 ranks (one rank per GPU)? <br>
<br></blockquote><div><br></div><div>Attached. I ran out of memory with the same-size problem, so this is the 262K / GPU version.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
From the other log file (10x bigger problem)<br><br></blockquote><div><br></div><div>???? </div></div></div>