[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

Sat Jan 7 18:36:55 CST 2017

On Sat, Jan 7, 2017 at 6:17 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:

> I was able to find the bug, it was the outer loop bound in the same
> fashion than before, my -log_view is this :
>

Good. We also need to see the log from 1 process.

I note that you are using GCR/ILU. This solver is different on 1 and 2
processes, so you must check that it is not doing more iterates.

   Matt

> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
>
> ./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera
> Sat Jan  7 16:11:51 2017
>
> Using Petsc Release Version 3.7.4, unknown
>
>
>                          Max       Max/Min        Avg      Total
>
> Time (sec):           2.074e+01      1.00000   2.074e+01
>
> Objects:              9.300e+01      1.00000   9.300e+01
>
> Flops:                8.662e+09      1.00000   8.662e+09  1.732e+10
>
> Flops/sec:            4.177e+08      1.00000   4.177e+08  8.354e+08
>
> Memory:               1.027e+08      1.03217              2.021e+08
>
> MPI Messages:         5.535e+02      1.00000   5.535e+02  1.107e+03
>
> MPI Message Lengths:  2.533e+07      1.00000   4.576e+04  5.066e+07
>
> MPI Reductions:       1.903e+04      1.00000
>
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>
>  0:      Main Stage: 2.0739e+01 100.0%  1.7325e+10 100.0%  1.107e+03
> 100.0%  4.576e+04      100.0%  1.903e+04 100.0%
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
>
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
>
> Phase summary info:
>
>    Count: number of times phase was executed
>
>    Time and Flops: Max - maximum over all processors
>
>                    Ratio - ratio of maximum to minimum over all processors
>
>    Mess: number of messages sent
>
>    Avg. len: average message length (bytes)
>
>    Reduct: number of global reductions
>
>    Global: entire computation
>
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>
>       %T - percent time in this phase         %F - percent flops in this
> phase
>
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>
>       %R - percent reductions in this phase
>
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------
> ------------------------------------------------------------
>
>
>
>       ##########################################################
>
>       #                                                        #
>
>       #                          WARNING!!!                    #
>
>       #                                                        #
>
>       #   This code was compiled with a debugging option,      #
>
>       #   To get timing results run ./configure                #
>
>       #   using --with-debugging=no, the performance will      #
>
>       #   be generally two or three times faster.              #
>
>       #                                                        #
>
>       ##########################################################
>
>
>
> Event                Count      Time (sec)     Flops
>       --- Global ---  --- Stage ---   Total
>
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------
> ------------------------------------------------------------
>
>
> --- Event Stage 0: Main Stage
>
>
> VecDotNorm2          545 1.0 4.4925e-01 1.6 2.18e+08 1.0 0.0e+00 0.0e+00
> 1.1e+03  2  3  0  0  6   2  3  0  0  6   971
>
> VecMDot              525 1.0 1.7089e+00 1.7 1.48e+09 1.0 0.0e+00 0.0e+00
> 1.0e+03  7 17  0  0  6   7 17  0  0  6  1735
>
> VecNorm              420 1.0 7.6857e-02 1.0 8.40e+07 1.0 0.0e+00 0.0e+00
> 8.4e+02  0  1  0  0  4   0  1  0  0  4  2186
>
> VecScale            1090 1.0 2.5113e-01 1.1 1.09e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  1  0  0  0   1  1  0  0  0   868
>
> VecSet               555 1.0 7.3570e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> VecAXPY             1090 1.0 2.7621e-01 1.1 2.18e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  3  0  0  0   1  3  0  0  0  1579
>
> VecAYPX                5 1.0 3.6647e-03 2.1 5.00e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   273
>
> VecMAXPY            1050 1.0 2.3646e+00 1.1 2.97e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 11 34  0  0  0  11 34  0  0  0  2508
>
> VecAssemblyBegin      12 1.7 2.4388e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.8e+01  0  0  0  0  0   0  0  0  0  0     0
>
> VecAssemblyEnd        12 1.7 1.0085e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> VecScatterBegin      560 1.0 2.3770e+0071.3 0.00e+00 0.0 1.1e+03 2.7e+04
> 1.0e+01  6  0 99 59  0   6  0 99 59  0     0
>
> VecScatterEnd        550 1.0 3.7769e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatMult              550 1.0 3.7412e+00 1.1 1.80e+09 1.0 1.1e+03 2.0e+04
> 0.0e+00 17 21 99 43  0  17 21 99 43  0   962
>
> MatSolve             545 1.0 3.6138e+00 1.1 1.77e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 17 20  0  0  0  17 20  0  0  0   980
>
> MatLUFactorNum         1 1.0 1.2530e-01 1.5 1.27e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0   203
>
> MatILUFactorSym        1 1.0 2.0162e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatConvert             1 1.0 3.3683e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatAssemblyBegin       1 1.0 9.5172e-02359.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatAssemblyEnd         1 1.0 2.6907e-02 1.0 0.00e+00 0.0 4.0e+00 5.0e+03
> 2.3e+01  0  0  0  0  0   0  0  0  0  0     0
>
> MatGetRowIJ            3 1.0 1.2398e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatGetOrdering         1 1.0 4.4249e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> MatLoad                1 1.0 3.8892e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06
> 3.8e+01  2  0  1 41  0   2  0  1 41  0     0
>
> KSPSetUp               2 1.0 2.2634e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 5.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> KSPSolve               5 1.0 1.2218e+01 1.0 8.66e+09 1.0 1.1e+03 2.0e+04
> 1.9e+04 59100 99 43 99  59100 99 43 99  1418
>
> PCSetUp                3 1.0 1.7993e+00 1.0 1.27e+07 1.0 0.0e+00 0.0e+00
> 1.0e+01  8  0  0  0  0   8  0  0  0  0    14
>
> PCSetUpOnBlocks        5 1.0 1.9013e-01 1.7 1.27e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0   134
>
> PCApply              546 1.0 3.8320e+00 1.1 1.77e+09 1.0 0.0e+00 0.0e+00
> 1.0e+00 18 20  0  0  0  18 20  0  0  0   925
>
> ------------------------------------------------------------
> ------------------------------------------------------------
>
>
> Memory usage is given in bytes:
>
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>
> Reports information only for process 0.
>
>
> --- Event Stage 0: Main Stage
>
>
>               Vector    72              6      5609648     0.
>
>       Vector Scatter     3              2         1312     0.
>
>               Matrix     4              0            0     0.
>
>               Viewer     2              0            0     0.
>
>            Index Set     7              4        13104     0.
>
>        Krylov Solver     2              0            0     0.
>
>       Preconditioner     3              1         1384     0.
>
> ============================================================
> ============================================================
>
> Average time to get PetscTime(): 7.15256e-08
>
> Average time for MPI_Barrier(): 1.82629e-05
>
> Average time for zero size MPI_Send(): 9.89437e-06
>
> #PETSc Option Table entries:
>
> -log_view
>
> -matload_block_size 1
>
> -pc_hypre_boomeramg_nodal_coarsen 1
>
> -pc_hypre_boomeramg_vec_interp_variant 1
>
> #End of PETSc Option Table entries
>
> Compiled without FORTRAN kernels
>
> Compiled with full precision matrices (default)
>
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>
> Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran
> --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf
> --download-hypre --download-metis --download-parmetis --download-trillinos
>
> -----------------------------------------
>
> Libraries compiled on Fri Dec  9 12:45:19 2016 on ocean
>
> Machine characteristics: Linux-3.10.0-327.13.1.el7.x86_
> 64-x86_64-with-centos-7.2.1511-Core
>
> Using PETSc directory: /home/valera/petsc
>
> Using PETSc arch: arch-linux2-c-debug
>
> -----------------------------------------
>
>
> Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> -fvisibility=hidden -g3  ${COPTFLAGS} ${CFLAGS}
>
> Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
>   -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g  ${FOPTFLAGS}
> ${FFLAGS}
>
> -----------------------------------------
>
>
> Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include
> -I/home/valera/petsc/include -I/home/valera/petsc/include
> -I/home/valera/petsc/arch-linux2-c-debug/include
>
> -----------------------------------------
>
>
> Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
>
> Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
>
> Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
> -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc
> -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
> -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack
> -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread
> -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx
> -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
> -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl
> -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
>
> -----------------------------------------
>
>
> WARNING! There are options you set that were not used!
>
> WARNING! could be spelling mistake, etc!
>
> Option left: name:-pc_hypre_boomeramg_nodal_coarsen value: 1
>
> Option left: name:-pc_hypre_boomeramg_vec_interp_variant value: 1
>
> [valera at ocean serGCCOM]$
>
>
> Any suggestions are very much appreciated,
>
> Thanks
>
>
>
> On Sat, Jan 7, 2017 at 3:39 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Sat, Jan 7, 2017 at 5:33 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>>
>>> Thanks Barry and Matt,
>>>
>>> I was able to detect a bug that i just solved, as suggested the loop
>>> parameters weren't updated as it should, now it does and the program still
>>> freezes but now in the beginning of the loop... ?
>>>
>>
>> You have called a collective function from only one process. Stepping
>> through on both processes in your run will find this easily.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Im attaching screen so you have an idea. Im thinking about it...
>>>
>>> Thanks
>>>
>>> On Sat, Jan 7, 2017 at 3:21 PM, Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Sat, Jan 7, 2017 at 4:59 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>> wrote:
>>>>
>>>>> I would have to think and code a MWE for this problem before sending
>>>>> it since the model is much bigger than the petsc solver. Attached here is a
>>>>> screenshot of the debugger as barry taught me, is that the stack trace you
>>>>> need ?
>>>>>
>>>>> the ucmsMain.f90:522 that shows is the call (from all processes) to
>>>>> the routine that updates the rhs vector (from root) and scatters it (from
>>>>> all processes).
>>>>>
>>>>
>>>> Yes, so one process is here and the other has moved on, so there is a
>>>> mismatch in calls.
>>>>
>>>> You could do what Barry suggests, but I think it would be better to
>>>> just step through your main routine once (its slow going), and
>>>> see where the divergence happens.
>>>>
>>>>    Matt
>>>>
>>>>
>>>>> This routine is itself inside a double loop that occurs in all
>>>>> processes but the only call from all processes to the solver is this one,
>>>>> the rest of the loop which involves correcting for velocities, pressure and
>>>>> temperature, all happens in root node.
>>>>>
>>>>> Sorry for the convoluted program design, this is the first beta
>>>>> version of the model working on parallel and was the best i could come
>>>>> with, i suppose it makes more sense in serial,
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sat, Jan 7, 2017 at 2:24 PM, Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Matthew,
>>>>>>>
>>>>>>> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knepley at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <
>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Devs, hope you are having a great weekend,
>>>>>>>>>
>>>>>>>>> I could finally parallelize my linear solver and implement it into
>>>>>>>>> the rest of the code in a way that only the linear system is solved in
>>>>>>>>> parallel, great news for my team, but there is a catch and is that i don't
>>>>>>>>> see any speedup in the linear system, i don't know if its the MPI in the
>>>>>>>>> cluster we are using, but im not sure on how to debug it,
>>>>>>>>>
>>>>>>>>
>>>>>>>> We need to see -log_view output for any performance question.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On the other hand and because of this issue i was trying to do
>>>>>>>>> -log_summary or -log_view and i noticed the program in this context hangs
>>>>>>>>> when is time of producing the log, if i debug this for 2 cores, process 0
>>>>>>>>> exits normally but process 1 hangs in the vectorscatterbegin() with
>>>>>>>>> scatter_reverse way back in the code,
>>>>>>>>>
>>>>>>>>
>>>>>>>> You are calling a collective routine from only 1 process.
>>>>>>>>
>>>>>>>>
>>>>>>>   Matt
>>>>>>>>
>>>>>>>
>>>>>>> I am pretty confident this is not the case,
>>>>>>>
>>>>>>
>>>>>> This is still the simplest explanation. Can you send the stack trace
>>>>>> for the 2 process run?
>>>>>>
>>>>>>
>>>>>>> the callings to vecscattercreatetozero and vecscatterbegin are made
>>>>>>> in all processes, the program goes thru all of the iterations on the linear
>>>>>>> solver, writes output correctly and even closes all the petsc objects
>>>>>>> without complaining, the freeze occurs at the very end when the log is to
>>>>>>> be produced.
>>>>>>>
>>>>>>
>>>>>> If you can send us a code to run, we can likely find the error.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Manuel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> and even after destroying all associated objects and calling
>>>>>>>>> petscfinalize(), so im really clueless on why is this, as it only happens
>>>>>>>>> for -log_* or -ksp_view options.
>>>>>>>>>
>>>>>>>>> my -ksp_view shows this:
>>>>>>>>>
>>>>>>>>>  KSP Object: 2 MPI processes
>>>>>>>>>
>>>>>>>>>   type: gcr
>>>>>>>>>
>>>>>>>>>     GCR: restart = 30
>>>>>>>>>
>>>>>>>>>     GCR: restarts performed = 20
>>>>>>>>>
>>>>>>>>>   maximum iterations=10000, initial guess is zero
>>>>>>>>>
>>>>>>>>>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>>>>>>>>>
>>>>>>>>>   right preconditioning
>>>>>>>>>
>>>>>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>>>>>>
>>>>>>>>> PC Object: 2 MPI processes
>>>>>>>>>
>>>>>>>>>   type: bjacobi
>>>>>>>>>
>>>>>>>>>     block Jacobi: number of blocks = 2
>>>>>>>>>
>>>>>>>>>     Local solve is same for all blocks, in the following KSP and
>>>>>>>>> PC objects:
>>>>>>>>>
>>>>>>>>>   KSP Object:  (sub_)   1 MPI processes
>>>>>>>>>
>>>>>>>>>     type: preonly
>>>>>>>>>
>>>>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>>>>
>>>>>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>>
>>>>>>>>>     left preconditioning
>>>>>>>>>
>>>>>>>>>     using NONE norm type for convergence test
>>>>>>>>>
>>>>>>>>>   PC Object:  (sub_)   1 MPI processes
>>>>>>>>>
>>>>>>>>>     type: ilu
>>>>>>>>>
>>>>>>>>>       ILU: out-of-place factorization
>>>>>>>>>
>>>>>>>>>       0 levels of fill
>>>>>>>>>
>>>>>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>>>>>
>>>>>>>>>       matrix ordering: natural
>>>>>>>>>
>>>>>>>>>       factor fill ratio given 1., needed 1.
>>>>>>>>>
>>>>>>>>>         Factored matrix follows:
>>>>>>>>>
>>>>>>>>>           Mat Object:           1 MPI processes
>>>>>>>>>
>>>>>>>>>             type: seqaij
>>>>>>>>>
>>>>>>>>>             rows=100000, cols=100000
>>>>>>>>>
>>>>>>>>>             package used to perform factorization: petsc
>>>>>>>>>
>>>>>>>>>             total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>>>>>
>>>>>>>>>             total number of mallocs used during MatSetValues calls
>>>>>>>>> =0
>>>>>>>>>
>>>>>>>>>               not using I-node routines
>>>>>>>>>
>>>>>>>>>     linear system matrix = precond matrix:
>>>>>>>>>
>>>>>>>>>     Mat Object:     1 MPI processes
>>>>>>>>>
>>>>>>>>>       type: seqaij
>>>>>>>>>
>>>>>>>>>       rows=100000, cols=100000
>>>>>>>>>
>>>>>>>>>       total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>>>>>
>>>>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>>>>
>>>>>>>>>         not using I-node routines
>>>>>>>>>
>>>>>>>>>   linear system matrix = precond matrix:
>>>>>>>>>
>>>>>>>>>   Mat Object:   2 MPI processes
>>>>>>>>>
>>>>>>>>>     type: mpiaij
>>>>>>>>>
>>>>>>>>>     rows=200000, cols=200000
>>>>>>>>>
>>>>>>>>>     total: nonzeros=3373340, allocated nonzeros=3373340
>>>>>>>>>
>>>>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>>>>
>>>>>>>>>       not using I-node (on process 0) routines
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And i configured my PC object as:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    call PCSetType(mg,PCHYPRE,ierr)
>>>>>>>>>
>>>>>>>>>    call PCHYPRESetType(mg,'boomeramg',ierr)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,
>>>>>>>>> 'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>>>>>>>>>
>>>>>>>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,
>>>>>>>>> 'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What are your thoughts ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Manuel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <
>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Awesome, that did it, thanks once again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Take the scatter out of the if () since everyone does it and
>>>>>>>>>>> get rid of the VecView().
>>>>>>>>>>>
>>>>>>>>>>>    Does this work? If not where is it hanging?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks Dave,
>>>>>>>>>>> >
>>>>>>>>>>> > I think is interesting it never gave an error on this, after
>>>>>>>>>>> adding the vecassembly calls it still shows the same behavior, without
>>>>>>>>>>> complaining, i did:
>>>>>>>>>>> >
>>>>>>>>>>> > if(rankl==0)then
>>>>>>>>>>> >
>>>>>>>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>>> >      call VecAssemblyBegin(bp0,ierr) ; call
>>>>>>>>>>> VecAssemblyEnd(bp0,ierr);
>>>>>>>>>>> >      CHKERRQ(ierr)
>>>>>>>>>>> >
>>>>>>>>>>> endif
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >      print*,"done! "
>>>>>>>>>>> >      CHKERRQ(ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >        CHKERRQ(ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks.
>>>>>>>>>>> >
>>>>>>>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <
>>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On 6 January 2017 at 20:24, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > Great help Barry, i totally had overlooked that option (it is
>>>>>>>>>>> explicit in the vecscatterbegin call help page but not in
>>>>>>>>>>> vecscattercreatetozero, as i read later)
>>>>>>>>>>> >
>>>>>>>>>>> > So i used that and it works partially, it scatters te values
>>>>>>>>>>> assigned in root but not the rest, if i call vecscatterbegin from outside
>>>>>>>>>>> root it hangs, the code currently look as this:
>>>>>>>>>>> >
>>>>>>>>>>> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>>>>>>>> >
>>>>>>>>>>> > if(rankl==0)then
>>>>>>>>>>> >
>>>>>>>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > You need to call
>>>>>>>>>>> >
>>>>>>>>>>> >   VecAssemblyBegin(bp0);
>>>>>>>>>>> >   VecAssemblyEnd(bp0);
>>>>>>>>>>> > after your last call to VecSetValues() before you can do any
>>>>>>>>>>> operations with bp0.
>>>>>>>>>>> >
>>>>>>>>>>> > With your current code, the call to VecView should produce an
>>>>>>>>>>> error if you used the error checking macro CHKERRQ(ierr) (as should
>>>>>>>>>>> VecScatter{Begin,End}
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> >   Dave
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >      print*,"done! "
>>>>>>>>>>> >      CHKERRQ(ierr)
>>>>>>>>>>> >
>>>>>>>>>>> > endif
>>>>>>>>>>> >
>>>>>>>>>>> >    ! call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >    !  call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>>>>>> >
>>>>>>>>>>> >   call exit()
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > And the output is: (with bp the right answer)
>>>>>>>>>>> >
>>>>>>>>>>> > Vec Object:bp: 2 MPI processes
>>>>>>>>>>> >   type: mpi
>>>>>>>>>>> > Process [0]
>>>>>>>>>>> > 1.
>>>>>>>>>>> > 2.
>>>>>>>>>>> > Process [1]
>>>>>>>>>>> > 4.
>>>>>>>>>>> > 3.
>>>>>>>>>>> > Vec Object:bp2: 2 MPI processes  (before scatter)
>>>>>>>>>>> >   type: mpi
>>>>>>>>>>> > Process [0]
>>>>>>>>>>> > 0.
>>>>>>>>>>> > 0.
>>>>>>>>>>> > Process [1]
>>>>>>>>>>> > 0.
>>>>>>>>>>> > 0.
>>>>>>>>>>> > Vec Object:bp0: 1 MPI processes
>>>>>>>>>>> >   type: seq
>>>>>>>>>>> > 1.
>>>>>>>>>>> > 2.
>>>>>>>>>>> > 4.
>>>>>>>>>>> > 3.
>>>>>>>>>>> >  done!
>>>>>>>>>>> > Vec Object:bp2: 2 MPI processes  (after scatter)
>>>>>>>>>>> >   type: mpi
>>>>>>>>>>> > Process [0]
>>>>>>>>>>> > 1.
>>>>>>>>>>> > 2.
>>>>>>>>>>> > Process [1]
>>>>>>>>>>> > 0.
>>>>>>>>>>> > 0.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks inmensely for your help,
>>>>>>>>>>> >
>>>>>>>>>>> > Manuel
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <
>>>>>>>>>>> bsmith at mcs.anl.gov> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > >
>>>>>>>>>>> > > Hello Devs is me again,
>>>>>>>>>>> > >
>>>>>>>>>>> > > I'm trying to distribute a vector to all called processes,
>>>>>>>>>>> the vector would be originally in root as a sequential vector and i would
>>>>>>>>>>> like to scatter it, what would the best call to do this ?
>>>>>>>>>>> > >
>>>>>>>>>>> > > I already know how to gather a distributed vector to root
>>>>>>>>>>> with VecScatterCreateToZero, this would be the inverse operation,
>>>>>>>>>>> >
>>>>>>>>>>> >    Use the same VecScatter object but with SCATTER_REVERSE,
>>>>>>>>>>> not you need to reverse the two vector arguments as well.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > > i'm currently trying with VecScatterCreate() and as of now
>>>>>>>>>>> im doing the following:
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > if(rank==0)then
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr);
>>>>>>>>>>> CHKERRQ(ierr) !if i use WORLD
>>>>>>>>>>> > >
>>>>>>>>>>> !freezes in SetSizes
>>>>>>>>>>> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr);
>>>>>>>>>>> CHKERRQ(ierr)
>>>>>>>>>>> > >      call VecSetType(bp0,VECSEQ,ierr)
>>>>>>>>>>> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecAssemblyBegin(bp0,ierr) ; call
>>>>>>>>>>> VecAssemblyEnd(bp0,ierr) !rhs
>>>>>>>>>>> > >
>>>>>>>>>>> > >      do i=0,nbdp-1,1
>>>>>>>>>>> > >         ind(i+1) = i
>>>>>>>>>>> > >      enddo
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call ISCreateGeneral(PETSC_COMM_SEL
>>>>>>>>>>> F,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr)
>>>>>>>>>>> !if i use SELF
>>>>>>>>>>> > >
>>>>>>>>>>>      !freezes here.
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecScatterCreate(bp0,locis,bp2
>>>>>>>>>>> ,PETSC_NULL_OBJECT,ctr,ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > > endif
>>>>>>>>>>> > >
>>>>>>>>>>> > > bp2 being the receptor MPI vector to scatter to
>>>>>>>>>>> > >
>>>>>>>>>>> > > But it freezes in VecScatterCreate when trying to use more
>>>>>>>>>>> than one processor, what would be a better approach ?
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks once again,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Manuel
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > > Thanks i had no idea how to debug and read those logs, that
>>>>>>>>>>> solved this issue at least (i was sending a message from root to everyone
>>>>>>>>>>> else, but trying to catch from everyone else including root)
>>>>>>>>>>> > >
>>>>>>>>>>> > > Until next time, many thanks,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Manuel
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <
>>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > > I did a PetscBarrier just before calling the vicariate
>>>>>>>>>>> routine and im pretty sure im calling it from every processor, code looks
>>>>>>>>>>> like this:
>>>>>>>>>>> > >
>>>>>>>>>>> > > From the gdb trace.
>>>>>>>>>>> > >
>>>>>>>>>>> > >   Proc 0: Is in some MPI routine you call yourself, line 113
>>>>>>>>>>> > >
>>>>>>>>>>> > >   Proc 1: Is in VecCreate(), line 130
>>>>>>>>>>> > >
>>>>>>>>>>> > > You need to fix your communication code.
>>>>>>>>>>> > >
>>>>>>>>>>> > >    Matt
>>>>>>>>>>> > >
>>>>>>>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > > print*,'entering POInit from',rank
>>>>>>>>>>> > > !call exit()
>>>>>>>>>>> > >
>>>>>>>>>>> > > call PetscObjsInit()
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > And output gives:
>>>>>>>>>>> > >
>>>>>>>>>>> > >  entering POInit from           0
>>>>>>>>>>> > >  entering POInit from           1
>>>>>>>>>>> > >  entering POInit from           2
>>>>>>>>>>> > >  entering POInit from           3
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > Still hangs in the same way,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Manuel
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > > Thanks for the answers !
>>>>>>>>>>> > >
>>>>>>>>>>> > > heres the screenshot of what i got from bt in gdb (great
>>>>>>>>>>> hint in how to debug in petsc, didn't know that)
>>>>>>>>>>> > >
>>>>>>>>>>> > > I don't really know what to look at here,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Manuel
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <
>>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>>>>>>>>>> function(s). These functions cannot be inside if statements like
>>>>>>>>>>> > > if (rank == 0){
>>>>>>>>>>> > >   VecCreateMPI(...)
>>>>>>>>>>> > > }
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>>>>>>>> > >
>>>>>>>>>>> > > I just tried that and it didn't make a difference, any other
>>>>>>>>>>> suggestions ?
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks,
>>>>>>>>>>> > > Manuel
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <
>>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>>> > > You need to swap the order of your function calls.
>>>>>>>>>>> > > Call VecSetSizes() before VecSetType()
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks,
>>>>>>>>>>> > >   Dave
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <
>>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>>> > > Hello all, happy new year,
>>>>>>>>>>> > >
>>>>>>>>>>> > > I'm working on parallelizing my code, it worked and provided
>>>>>>>>>>> some results when i just called more than one processor, but created
>>>>>>>>>>> artifacts because i didn't need one image of the whole program in each
>>>>>>>>>>> processor, conflicting with each other.
>>>>>>>>>>> > >
>>>>>>>>>>> > > Since the pressure solver is the main part i need in
>>>>>>>>>>> parallel im chosing mpi to run everything in root processor until its time
>>>>>>>>>>> to solve for pressure, at this point im trying to create a distributed
>>>>>>>>>>> vector using either
>>>>>>>>>>> > >
>>>>>>>>>>> > >      call VecCreateMPI(PETSC_COMM_WORLD,
>>>>>>>>>>> PETSC_DECIDE,nbdp,xp,ierr)
>>>>>>>>>>> > > or
>>>>>>>>>>> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>>>>>>>> > >      call VecSetType(xp,VECMPI,ierr)
>>>>>>>>>>> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr);
>>>>>>>>>>> CHKERRQ(ierr)
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > In both cases program hangs at this point, something it
>>>>>>>>>>> never happened on the naive way i described before. I've made sure the
>>>>>>>>>>> global size, nbdp, is the same in every processor. What can be wrong?
>>>>>>>>>>> > >
>>>>>>>>>>> > > Thanks for your kind help,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Manuel.
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > --
>>>>>>>>>>> > > What most experimenters take for granted before they begin
>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>> their experiments lead.
>>>>>>>>>>> > > -- Norbert Wiener
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170107/4c5ed160/attachment-0001.html>