Date: Wed, 13 Nov 2013 09:40:27 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: knepley@gmail.com
To: pengxwang@hotmail.com
CC: petsc-users@mcs.anl.gov

On Wed, Nov 13, 2013 at 9:24 AM, Roc Wang <pengxwang@hotmail.com> wrote:
<blockquote class="ecxgmail_quote" style="border-left:1px #ccc solid;padding-left:1ex;">
<div><div dir="ltr">Hi, I tried to use -ksp_type bicg, but there was error. It was fine if I use gmres as solver. Doe it mean the matrix cannot be solved by BiCG? Thanks.<br></div></div></blockquote><div><br></div><div>BiCG can breakdown. You can try -ksp_type bcgstab<br><br>Thanks, I used -ksp_bcgsl. It worked well. <br><br>In addition, I also tried to employed the options -ksp_bcgsl_ell 1 and -ksp_bcgsl_cxpoly. But I the records of residual norm are same as without -ksp_bcgsl_cxpoly. Should there be difference between two options? OR, I didn't set the option correctly?<br><br>The options showed in the following webpages are different. <br><a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSL.html#KSPBCGSL" target="_blank">http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSL.html#KSPBCGSL</a> <b>-ksp_bcgsl_cxpol <br><a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSLSetPol.html#KSPBCGSLSetPol" target="_blank"><a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSLSetPol.html#KSPBCGSLSetPol" target="_blank"><a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSLSetPol.html#KSPBCGSLSetPol" target="_blank">http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSLSetPol.html#KSPBCGSLSetPol</a> </a></a></b><b>-ksp_bcgsl_cxpoly </b>- use enhanced polynomial .<br><br>Thanks. <a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPBCGSLSetPol.html#KSPBCGSLSetPol" target="_blank"></a><b><br><br></b><br></div>
   Matt

[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Floating point exception!
[0]PETSC ERROR: Infinite or not-a-number generated in norm!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./x.r on a arch-linu named node48.cocoa5 by pzw2 Wed Nov 13 10:09:22 2013
[0]PETSC ERROR: Libraries linked from /home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
[0]PETSC ERROR: Configure run at Tue Nov 12 09:52:45 2013
[0]PETSC ERROR: Configure options --download-f-blas-lapack --with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1 --download-hdf5=1 --download-superlu_dist --download-parmetis --download-metis --download-spai --with-debugging=no
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: VecNorm() line 169 in /home/pzw2/ZSoft/petsc-3.3-p6/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: KSPSolve_BiCG() line 107 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/impls/bicg/bicg.c
[0]PETSC ERROR: KSPSolve() line 446 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: LinearSolver() line 181 in "unknowndirectory/"src/solver.cpp
[23]PETSC ERROR: ------------------------------------------------------------------------
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[23]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[23]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[23]PETSC ERROR: to get more information on the crash.
[23]PETSC ERROR: --------------------- Error Message ------------------------------------
[23]PETSC ERROR: Signal received!

Date: Tue, 12 Nov 2013 15:34:16 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: knepley@gmail.com
To: pengxwang@hotmail.com
CC: jedbrown@mcs.anl.gov; petsc-users@mcs.anl.gov

On Tue, Nov 12, 2013 at 3:22 PM, Roc Wang <pengxwang@hotmail.com> wrote:
Date: Tue, 12 Nov 2013 14:59:30 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: knepley@gmail.com
To: pengxwang@hotmail.com
CC: jedbrown@mcs.anl.gov; petsc-users@mcs.anl.gov

On Tue, Nov 12, 2013 at 2:48 PM, Roc Wang <pengxwang@hotmail.com> wrote:

Date: Tue, 12 Nov 2013 14:22:35 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: knepley@gmail.com
To: pengxwang@hotmail.com
CC: jedbrown@mcs.anl.gov; petsc-users@mcs.anl.gov

On Tue, Nov 12, 2013 at 2:14 PM, Roc Wang <pengxwang@hotmail.com> wrote:
<div><div dir="ltr">Thanks Jed,<br><br>I have questions about load balance and PC type below.<br><br><div>> From: <a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a><br>> To: <a href="mailto:pengxwang@hotmail.com" target="_blank">pengxwang@hotmail.com</a>; <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a><br>
> Subject: Re: [petsc-users] approaches to reduce computing time<br>> Date: Sun, 10 Nov 2013 12:20:18 -0700<br>> <br>> Roc Wang <<a href="mailto:pengxwang@hotmail.com" target="_blank">pengxwang@hotmail.com</a>> writes:<br>
> > Hi all,
> >
> > I am trying to minimize the computing time to solve a large sparse
> > matrix. The matrix dimensions are m=321, n=321, and p=321. I am trying
> > to reduce the computing time in two directions: 1) finding a
> > preconditioner to reduce the number of iterations, which reduces the
> > time numerically, and 2) requesting more cores.
> >
> > ----For the first direction, I tried several option sets:
> > 1 default KSP and PC
> > 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi
> > 3 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10
> > 4 -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10
> > 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm (PCASM)
> >
> > The iterations and timing with 128 cores requested are:
> >
> > case#   iter   timing (s)
> > 1       1436     816
> > 2          3   12658
> > 3       1069     669.64
> > 4        872     768.12
> > 5        927     513.14
> >
> > It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment
> > can help to reduce the iterations but not the timing (comparing cases 3
> > and 4). Second, PCASM helps a lot. Although option set 2 is able to
> > reduce the iterations, the timing increases very much. Is it because
> > more operations are needed in the PC?
> >
> > My questions here are: 1. Which direction should I take to select
> > -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> > restart with a large augment better, or a larger restart with a smaller
> > augment?
>
> Look at the -log_summary. By increasing the restart, the work in
> KSPGMRESOrthog will increase linearly, but the number of iterations
> might decrease enough to compensate. There is no general rule here
> since it depends on the relative expense of operations for your problem
> on your machine.
>
> > ----For the second direction, I ran with -ksp_type lgmres
> > -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm on different
> > numbers of cores. I found the speedup ratio increases slowly when more
> > than 32 to 64 cores are requested. I searched the mailing list archives
> > and found that I am very likely running into the memory bandwidth
> > bottleneck: http://www.mail-archive.com/petsc-users@mcs.anl.gov/msg19152.html
> >
> > # of cores   iter   timing (s)
> >   1          923    19541.83
> >   4          929     5897.06
> >   8          932     4854.72
> >  16          924     1494.33
> >  32          924     1480.88
> >  64          928      686.89
> > 128          927      627.33
> > 256          926      552.93
>
> The bandwidth issue has more to do with using multiple cores within a
> node rather than between nodes. Likely the above is a load balancing
> problem or bad communication.

I use a DM to manage the distributed data. The DM was created by calling DMDACreate3d() and letting PETSc decide the local number of nodes in each direction. To my understanding, the load on each core is determined at this stage. Is the load balance handled when DMDACreate3d() is called with the PETSC_DECIDE option? Or how should I balance the load after the DM is created?
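For concreteness, the kind of creation call being described would look roughly like the following (a sketch, not the actual code in solver.cpp; the helper name, boundary type, star stencil, single DOF, and stencil width 1 are assumptions, and only the 321^3 global grid comes from the problem description):

  #include <petscdmda.h>

  /* Sketch of a DMDACreate3d call that lets PETSc choose the process grid. */
  PetscErrorCode CreateGrid(DM *da)
  {
    PetscErrorCode ierr;

    ierr = DMDACreate3d(PETSC_COMM_WORLD,
                        DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR,
                        321, 321, 321,                             /* global grid size */
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,  /* process grid m, n, p */
                        1, 1,                                      /* dof, stencil width */
                        PETSC_NULL, PETSC_NULL, PETSC_NULL,        /* per-direction node counts */
                        da);CHKERRQ(ierr);
    return 0;
  }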
We do not have a way to do fine-grained load balancing for the DMDA since it is intended for very simple topologies. You can see whether it is load imbalance from the division by running with a cube that is evenly divisible by a cube number of processes.

   Matt

So is there nothing I can do to balance the load if I use a DMDA? Would you please take a look at the attached log summary files and give me some suggestions on how to improve the speedup ratio? Thanks.
Please try what I suggested above. And it looks like there is a little load imbalance (see the two VecAXPY lines below).

Roc----So if the domain is a cube, then the number of processors had better be a cube number like 2^3 = 8, 3^3 = 27, 4^3 = 64, and so on, right?

I want you to try this to eliminate load imbalance as a reason for poor speedup. I don't think it is, but we will see.

I am also wondering whether the physical boundary type affects the load balance, since free nodes, Dirichlet nodes, and Neumann nodes have different numbers of neighbors.
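One quick way to check the decomposition directly might be to print each rank's local subdomain size (a sketch using the PETSc 3.3 calling sequences; the helper name is for illustration and the DMDA handle da is assumed to exist already):

  #include <petscdmda.h>

  /* Sketch: report each rank's local grid size so an imbalance in the
     DMDA decomposition is easy to spot. */
  PetscErrorCode ReportLocalSizes(DM da)
  {
    DMDALocalInfo  info;
    PetscMPIInt    rank;
    PetscErrorCode ierr;

    ierr = DMDAGetLocalInfo(da, &info);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] local grid %D x %D x %D\n",
                                   rank, info.xm, info.ym, info.zm);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
    return 0;
  }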
VecAXPY   234 1.0 1.0124e+00 3.4 1.26e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 15290
VecAXPY   234 1.0 4.2862e-01 3.6 6.37e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 36115

Although it is not limiting the speedup, the time imbalance is really strange. I am guessing other jobs are running on this machine.

Roc----The code was run on a cluster, and other jobs were probably running. Do you mean those jobs affect the load balance of my job or the speed of the cluster? I am just trying to improve the scalability of the code, but I really don't know why the speedup ratio decreases so quickly. Thanks.
Yes, other people running can definitely screw up speedup and cause imbalance. Usually timing runs are made with dedicated time.

Your VecAXPY and MatMult are speeding up just fine. It is the reductions that are killing your computation. You should switch to a more effective preconditioner so you can avoid all those dot products. Also, you might try something like BiCG, which has fewer dot products.

   Matt
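For example, algebraic multigrid (which Jed suggests below as a good place to start) can be selected at run time with -pc_type gamg, or with -pc_type hypre -pc_hypre_type boomeramg since hypre was included in the configure options shown above. A rough sketch of the equivalent API calls (the helper name is for illustration; ksp is assumed to be created and have its operators set elsewhere):

  #include <petscksp.h>

  /* Sketch: switch the preconditioner to algebraic multigrid from code. */
  PetscErrorCode UseAMG(KSP ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);   /* or PCHYPRE with -pc_hypre_type boomeramg */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* command-line options still override */
    return 0;
  }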
> >
> > My question here is: Is there any other PC that can help with both
> > reducing iterations and increasing scalability? Thanks.
>
> Always send -log_summary with questions like this, but algebraic
> multigrid is a good place to start.

Please take a look at the attached log files; they are for 128 cores and 256 cores, respectively. Based on the log files, what should be done to increase the scalability? Thanks.
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener