[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
Manuel Valera
mvalera at mail.sdsu.edu
Sun Jan 8 18:22:04 CST 2017
Ok, many thanks Barry,
For the cpu:sockets binding I get an ugly error:
[valera at ocean petsc]$ make streams NPMAX=4 MPI_BINDING="--binding cpu:sockets"
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory
PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-fvisibility=hidden -g -O -I/home/valera/petsc/include
-I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec
--binding cpu:sockets' using 'NPMAX=4'
[proxy:0:0 at ocean] handle_bitmap_binding
(tools/topo/hwloc/topo_hwloc.c:203): unrecognized binding string
"cpu:sockets"
[proxy:0:0 at ocean] HYDT_topo_hwloc_init (tools/topo/hwloc/topo_hwloc.c:415):
error binding with bind "cpu:sockets" and map "(null)"
[proxy:0:0 at ocean] HYDT_topo_init (tools/topo/topo.c:62): unable to
initialize hwloc
[proxy:0:0 at ocean] launch_procs (pm/pmiserv/pmip_cb.c:515): unable to
initialize process topology
[proxy:0:0 at ocean] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:892):
launch_procs returned error
[proxy:0:0 at ocean] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0 at ocean] main (pm/pmiserv/pmip.c:206): demux engine error waiting
for event
[mpiexec at ocean] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed)
failed
[mpiexec at ocean] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[mpiexec at ocean] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec at ocean] main (ui/mpich/mpiexec.c:344): process manager error
waiting for completion
[... the same proxy/mpiexec error output repeats three more times ...]
------------------------------------------------
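Maybe the hydra in this MPICH build wants a different binding syntax? As a guess (not verified here), newer MPICH mpiexec versions accept -bind-to instead, e.g.:

    make streams NPMAX=4 MPI_BINDING="-bind-to socket"

The output of mpiexec --help should show which binding flags this hydra actually supports.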
I'm sending the binary file for the other list in a separate mail next,
Regards,
On Sun, Jan 8, 2017 at 4:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Manuel,
>
> Ok, there are two (actually 3) distinct things you need to deal with to
> get any kind of performance out of this machine.
>
> 0) When running on the machine you cannot share it with other people's jobs
> or you will get timings all over the place, so run streams and benchmarks of
> your code when no one else has jobs running (the Unix top command helps).
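>
> For example, a quick check before timing anything (any form of top or uptime is fine):
>
>    top -b -n 1 | head -20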
>
> 1) mpiexec is making bad decisions about process binding (which MPI
> processes are bound/assigned to which CPU cores).
>
> From streams you have
>
> np speedup
> 1 1.0
> 2 1.95
> 3 0.57
> 4 0.6
> 5 2.79
> 6 2.8
> 7 2.74
> 8 2.67
> 9 2.55
> 10 2.68
> .....
>
> This is nuts. When going from 2 to 3 processes the performance goes WAY
> down. If the machine is empty and MPI did a good assignment of processes to
> cores, the speedup should not go down with more cores; it should just stagnate.
>
> So you need to find out how to do process binding with MPI; see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers and the
> links from there. You can run the streams test with binding by, for example,
> make streams NPMAX=4 MPI_BINDING="--binding cpu:sockets".
>
> Once you have a good binding for your MPI, make sure you always run
> mpiexec with that binding when running your code.
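>
> For example, something like the following (the exact binding flags depend on what your mpiexec accepts, so substitute whatever string worked best for streams):
>
>    mpiexec -n 4 <binding flags that worked for streams> ./ucmsMR -log_summary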
>
> 2) Both preconditioners you have tried for your problem are terrible. With
> block Jacobi it went from 156 linear iterations (for 5 linear solves) to
> 546 iterations. With AMG it went from 1463!! iterations to 1760. These are
> huge numbers of iterations for algebraic multigrid!
>
> For some reason AMG doesn't like your pressure matrix (even though AMG
> generally loves pressure matrices). What do you have for boundary
> conditions for your pressure?
>
> Please run with -ksp_view_mat binary -ksp_view_rhs binary and then send
> the resulting file binaryoutput to petsc-maint at mcs.anl.gov and we'll see
> if we can figure out why AMG doesn't like it.
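>
> For example (with no filename given, PETSc's binary viewer writes a file named binaryoutput in the working directory; add whatever other arguments your run normally takes):
>
>    mpiexec -n 1 ./ucmsMR -ksp_view_mat binary -ksp_view_rhs binary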
>
>
>
>
>
>
>
> > On Jan 8, 2017, at 4:41 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> >
> > Ok, I just did the streams and log_summary tests. I'm attaching the output
> for each run, with NPMAX=4 and NPMAX=32, plus -log_summary runs with
> -pc_type hypre and without it, on 1 and 2 cores, all of this with
> debugging turned off.
> >
> > The matrix is 200,000 x 200,000, on full curvilinear 3D meshes, for a
> non-hydrostatic pressure solver.
> >
> > Thanks a lot for your insight,
> >
> > Manuel
> >
> > On Sun, Jan 8, 2017 at 9:48 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > We need to see the -log_summary with hypre on 1 and 2 processes (with
> debugging turned off). We also need to see the output from
> >
> > make streams NPMAX=4
> >
> > run in the PETSc directory.
> >
> >
> >
> > > On Jan 7, 2017, at 7:38 PM, Manuel Valera <mvalera at mail.sdsu.edu>
> wrote:
> > >
> > > Ok great, I tried those command-line args and this is the result:
> > >
> > > when I use -pc_type gamg:
> > >
> > > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > > [1]PETSC ERROR: Petsc has generated inconsistent data
> > > [1]PETSC ERROR: Have un-symmetric graph (apparently). Use '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold -1.0' if the matrix is structurally symmetric.
> > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > > [1]PETSC ERROR: Petsc Release Version 3.7.4, unknown
> > > [1]PETSC ERROR: ./ucmsMR on a arch-linux2-c-debug named ocean by valera Sat Jan 7 17:35:05 2017
> > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos
> > > [1]PETSC ERROR: #1 smoothAggs() line 462 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > [1]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > [1]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > [1]PETSC ERROR: #4 PCSetUp() line 968 in /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > application called MPI_Abort(comm=0x84000002, 77) - process 1
> > >
> > >
> > > when I use -pc_type gamg and -pc_gamg_sym_graph true:
> > >
> > > ------------------------------------------------------------------------
> > > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
> > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > > [1]PETSC ERROR: ------------------------------------------------------------------------
> > > [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> > > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > > [1]PETSC ERROR:       INSTEAD the line number of the start of the function
> > > [1]PETSC ERROR:       is given.
> > > [1]PETSC ERROR: [1] LAPACKgesvd line 42 /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues_GMRES line 24 /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues line 51 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > [1]PETSC ERROR: [1] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > [1]PETSC ERROR: [1] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > [1]PETSC ERROR: [1] PCSetUp line 930 /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > [1]PETSC ERROR: [1] KSPSetUp line 305 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > [0] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > > [0]PETSC ERROR: [0] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > > [0]PETSC ERROR: [0] PCSetUp line 930 /usr/dataC/home/valera/petsc/src/ksp/pc/interface/precon.c
> > > [0]PETSC ERROR: [0] KSPSetUp line 305 /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > >
> > > when I use -pc_type hypre it actually shows something different in -ksp_view:
> > >
> > > KSP Object: 2 MPI processes
> > > type: gcr
> > > GCR: restart = 30
> > > GCR: restarts performed = 37
> > > maximum iterations=10000, initial guess is zero
> > > tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
> > > right preconditioning
> > > using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 2 MPI processes
> > > type: hypre
> > > HYPRE BoomerAMG preconditioning
> > > HYPRE BoomerAMG: Cycle type V
> > > HYPRE BoomerAMG: Maximum number of levels 25
> > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> > > HYPRE BoomerAMG: Threshold for strong coupling 0.25
> > > HYPRE BoomerAMG: Interpolation truncation factor 0.
> > > HYPRE BoomerAMG: Interpolation: max elements per row 0
> > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> > > HYPRE BoomerAMG: Maximum row sums 0.9
> > > HYPRE BoomerAMG: Sweeps down 1
> > > HYPRE BoomerAMG: Sweeps up 1
> > > HYPRE BoomerAMG: Sweeps on coarse 1
> > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> > > HYPRE BoomerAMG: Relax weight (all) 1.
> > > HYPRE BoomerAMG: Outer relax weight (all) 1.
> > > HYPRE BoomerAMG: Using CF-relaxation
> > > HYPRE BoomerAMG: Not using more complex smoothers.
> > > HYPRE BoomerAMG: Measure type local
> > > HYPRE BoomerAMG: Coarsen type Falgout
> > > HYPRE BoomerAMG: Interpolation type classical
> > > HYPRE BoomerAMG: Using nodal coarsening (with HYPRE_BOOMERAMGSetNodal() 1
> > > HYPRE BoomerAMG: HYPRE_BoomerAMGSetInterpVecVariant() 1
> > > linear system matrix = precond matrix:
> > > Mat Object: 2 MPI processes
> > > type: mpiaij
> > > rows=200000, cols=200000
> > > total: nonzeros=3373340, allocated nonzeros=3373340
> > > total number of mallocs used during MatSetValues calls =0
> > > not using I-node (on process 0) routines
> > >
> > >
> > > But the timing is still terrible.
> > >
> > >
> > >
> > >
> > > On Sat, Jan 7, 2017 at 5:28 PM, Jed Brown <jed at jedbrown.org> wrote:
> > > Manuel Valera <mvalera at mail.sdsu.edu> writes:
> > >
> > > > Awesome Matt and Jed,
> > > >
> > > > The GCR is used because the matrix is not invertible and because this
> was the algorithm that the previous library used,
> > > >
> > > > The preconditioner I'm aiming to use is multigrid; I thought I
> configured the hypre BoomerAMG solver for this, but I agree that it doesn't
> show in the log anywhere. How can I be sure it is being used? I sent the
> -ksp_view log earlier in this thread.
> > >
> > > Did you run with -pc_type hypre?
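> > >
> > > A quick way to confirm which preconditioner is actually active is to add -ksp_view to the run; with hypre the PC Object section should report "type: hypre" and "HYPRE BoomerAMG preconditioning". For example:
> > >
> > >   mpiexec -n 2 ./ucmsMR -pc_type hypre -ksp_view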
> > >
> > > > I had a problem with the matrix block sizes, so I couldn't make the
> PETSc native multigrid solver work,
> > >
> > > What block sizes? If the only variable is pressure, the block size
> > > would be 1 (default).
> > >
> > > > This is a non-hydrostatic pressure solver; it is an elliptic problem,
> so multigrid is a must,
> > >
> > > Yes, multigrid should work well.
> > >
> >
> >
> > <logsumm1hypre.txt><logsumm1jacobi.txt><logsumm2hypre.txt><logsumm2jacobi.txt><steams4.txt><steams32.txt>
>
>