[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
Manuel Valera
mvalera at mail.sdsu.edu
Sat Jan 7 18:17:45 CST 2017
I was able to find the bug: it was the outer loop bound, the same kind of
mistake as before. My -log_view output is below:
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera
Sat Jan 7 16:11:51 2017
Using Petsc Release Version 3.7.4, unknown
                        Max       Max/Min        Avg      Total
Time (sec):           2.074e+01      1.00000   2.074e+01
Objects:              9.300e+01      1.00000   9.300e+01
Flops:                8.662e+09      1.00000   8.662e+09  1.732e+10
Flops/sec:            4.177e+08      1.00000   4.177e+08  8.354e+08
Memory:               1.027e+08      1.03217              2.021e+08
MPI Messages:         5.535e+02      1.00000   5.535e+02  1.107e+03
MPI Message Lengths:  2.533e+07      1.00000   4.576e+04  5.066e+07
MPI Reductions:       1.903e+04      1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
--> 2N flops
and VecAXPY() for complex vectors of length N
--> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 2.0739e+01 100.0% 1.7325e+10 100.0% 1.107e+03
100.0% 4.576e+04 100.0% 1.903e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this
phase
%M - percent messages in this phase %L - percent message lengths
in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 545 1.0 4.4925e-01 1.6 2.18e+08 1.0 0.0e+00 0.0e+00
1.1e+03 2 3 0 0 6 2 3 0 0 6 971
VecMDot 525 1.0 1.7089e+00 1.7 1.48e+09 1.0 0.0e+00 0.0e+00
1.0e+03 7 17 0 0 6 7 17 0 0 6 1735
VecNorm 420 1.0 7.6857e-02 1.0 8.40e+07 1.0 0.0e+00 0.0e+00
8.4e+02 0 1 0 0 4 0 1 0 0 4 2186
VecScale 1090 1.0 2.5113e-01 1.1 1.09e+08 1.0 0.0e+00 0.0e+00
0.0e+00 1 1 0 0 0 1 1 0 0 0 868
VecSet 555 1.0 7.3570e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1090 1.0 2.7621e-01 1.1 2.18e+08 1.0 0.0e+00 0.0e+00
0.0e+00 1 3 0 0 0 1 3 0 0 0 1579
VecAYPX 5 1.0 3.6647e-03 2.1 5.00e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 273
VecMAXPY 1050 1.0 2.3646e+00 1.1 2.97e+09 1.0 0.0e+00 0.0e+00
0.0e+00 11 34 0 0 0 11 34 0 0 0 2508
VecAssemblyBegin 12 1.7 2.4388e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.8e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 12 1.7 1.0085e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 560 1.0 2.3770e+00 71.3 0.00e+00 0.0 1.1e+03 2.7e+04
1.0e+01 6 0 99 59 0 6 0 99 59 0 0
VecScatterEnd 550 1.0 3.7769e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 550 1.0 3.7412e+00 1.1 1.80e+09 1.0 1.1e+03 2.0e+04
0.0e+00 17 21 99 43 0 17 21 99 43 0 962
MatSolve 545 1.0 3.6138e+00 1.1 1.77e+09 1.0 0.0e+00 0.0e+00
0.0e+00 17 20 0 0 0 17 20 0 0 0 980
MatLUFactorNum 1 1.0 1.2530e-01 1.5 1.27e+07 1.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 203
MatILUFactorSym 1 1.0 2.0162e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 3.3683e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 9.5172e-02 359.3 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 2.6907e-02 1.0 0.00e+00 0.0 4.0e+00 5.0e+03
2.3e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 1.0 1.2398e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.4249e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 3.8892e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06
3.8e+01 2 0 1 41 0 2 0 1 41 0 0
KSPSetUp 2 1.0 2.2634e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 1.2218e+01 1.0 8.66e+09 1.0 1.1e+03 2.0e+04
1.9e+04 59100 99 43 99 59100 99 43 99 1418
PCSetUp 3 1.0 1.7993e+00 1.0 1.27e+07 1.0 0.0e+00 0.0e+00
1.0e+01 8 0 0 0 0 8 0 0 0 0 14
PCSetUpOnBlocks 5 1.0 1.9013e-01 1.7 1.27e+07 1.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 134
PCApply 546 1.0 3.8320e+00 1.1 1.77e+09 1.0 0.0e+00 0.0e+00
1.0e+00 18 20 0 0 0 18 20 0 0 0 925
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 72 6 5609648 0.
Vector Scatter 3 2 1312 0.
Matrix 4 0 0 0.
Viewer 2 0 0 0.
Index Set 7 4 13104 0.
Krylov Solver 2 0 0 0.
Preconditioner 3 1 1384 0.
========================================================================================================================
Average time to get PetscTime(): 7.15256e-08
Average time for MPI_Barrier(): 1.82629e-05
Average time for zero size MPI_Send(): 9.89437e-06
#PETSc Option Table entries:
-log_view
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran
--download-fblaslapack --download-mpich --download-hdf5 --download-netcdf
--download-hypre --download-metis --download-parmetis --download-trillinos
-----------------------------------------
Libraries compiled on Fri Dec 9 12:45:19 2016 on ocean
Machine characteristics:
Linux-3.10.0-327.13.1.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-fvisibility=hidden -g3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
-Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include
-I/home/valera/petsc/include -I/home/valera/petsc/include
-I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
-L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc
-Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
-L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack
-lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread
-lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx
-lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib
-L/home/valera/petsc/arch-linux2-c-debug/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl
-Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-pc_hypre_boomeramg_nodal_coarsen value: 1
Option left: name:-pc_hypre_boomeramg_vec_interp_variant value: 1
[valera at ocean serGCCOM]$
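One thing stands out in the output above: the -ksp_view earlier in this thread
shows a bjacobi/ilu preconditioner, and the two boomeramg options are reported
as unused, which suggests the hypre settings never reach the PC that the solver
actually uses. Below is a minimal, illustrative sketch of one way to wire this
up, assuming the solver object is called ksp; the names are placeholders, not
taken from the actual code:

    ! work on the PC attached to the solver, rather than a separate object
    call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr)
    call PCSetType(pc,PCHYPRE,ierr); CHKERRQ(ierr)
    call PCHYPRESetType(pc,'boomeramg',ierr); CHKERRQ(ierr)

    ! register the boomeramg options in the options database
    call PetscOptionsSetValue(PETSC_NULL_OBJECT, &
         '-pc_hypre_boomeramg_nodal_coarsen','1',ierr); CHKERRQ(ierr)
    call PetscOptionsSetValue(PETSC_NULL_OBJECT, &
         '-pc_hypre_boomeramg_vec_interp_variant','1',ierr); CHKERRQ(ierr)

    ! let the KSP and its PC read the database so the options are consumed
    call KSPSetFromOptions(ksp,ierr); CHKERRQ(ierr)
    call KSPSetUp(ksp,ierr); CHKERRQ(ierr)

If a subsequent -ksp_view reports a hypre PC and the "Option left" warnings
disappear, the options are being picked up.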
Any suggestions are very much appreciated,
Thanks
On Sat, Jan 7, 2017 at 3:39 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Sat, Jan 7, 2017 at 5:33 PM, Manuel Valera <mvalera at mail.sdsu.edu>
> wrote:
>
>> Thanks Barry and Matt,
>>
>> I was able to detect a bug that I just solved: as suggested, the loop
>> parameters weren't being updated as they should. Now they are, but the
>> program still freezes, only now at the beginning of the loop... ?
>>
>
> You have called a collective function from only one process. Stepping
> through on both processes in your run will find this easily.
>
> Thanks,
>
> Matt
>
>
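To make the failure mode concrete, here is a minimal, illustrative sketch of
the mismatch Matt describes, reusing the variable names that appear later in
this thread (bp0, bp2, ctr); it is not taken from the actual code. The scatter
is collective on its communicator, so the ranks must agree on calling it:

    ! mismatched: only rank 0 calls the collective scatter, so the two
    ! ranks eventually deadlock (possibly much later in the run)
    if (rank == 0) then
       call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
       call VecAssemblyBegin(bp0,ierr); call VecAssemblyEnd(bp0,ierr)
       call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
       call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
    endif

    ! consistent: only the insertion into the sequential vector is rank-0
    ! specific; every rank calls the collective scatter
    if (rank == 0) then
       call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
       call VecAssemblyBegin(bp0,ierr); call VecAssemblyEnd(bp0,ierr)
    endif
    call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
    call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)

Stepping both processes to the first point where their call sequences diverge,
as suggested above, is the quickest way to locate this in a larger code.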
>> I'm attaching a screenshot so you have an idea. I'm thinking about it...
>>
>> Thanks
>>
>> On Sat, Jan 7, 2017 at 3:21 PM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Sat, Jan 7, 2017 at 4:59 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>> wrote:
>>>
>>>> I would have to think about and code an MWE for this problem before sending
>>>> it, since the model is much bigger than the PETSc solver. Attached here is a
>>>> screenshot of the debugger, as Barry taught me; is that the stack trace you
>>>> need?
>>>>
>>>> The ucmsMain.f90:522 that shows up is the call (made from all processes) to
>>>> the routine that updates the RHS vector (on root) and scatters it (from all
>>>> processes).
>>>>
>>>
>>> Yes, so one process is here and the other has moved on, so there is a
>>> mismatch in calls.
>>>
>>> You could do what Barry suggests, but I think it would be better to just
>>> step through your main routine once (it's slow going), and
>>> see where the divergence happens.
>>>
>>> Matt
>>>
>>>
>>>> This routine is itself inside a double loop that runs on all
>>>> processes, but this is the only call to the solver made from all processes;
>>>> the rest of the loop, which involves correcting for velocities, pressure and
>>>> temperature, all happens on the root node.
>>>>
>>>> Sorry for the convoluted program design; this is the first beta version
>>>> of the model working in parallel and it was the best I could come up with. I
>>>> suppose it makes more sense in serial,
>>>>
>>>> Thanks
>>>>
>>>> On Sat, Jan 7, 2017 at 2:24 PM, Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>>
>>>>>> Thank you Matthew,
>>>>>>
>>>>>> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knepley at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <mvalera at mail.sdsu.edu
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Devs, hope you are having a great weekend,
>>>>>>>>
>>>>>>>> I could finally parallelize my linear solver and integrate it into
>>>>>>>> the rest of the code in a way that only the linear system is solved in
>>>>>>>> parallel, which is great news for my team. But there is a catch: I don't
>>>>>>>> see any speedup in the linear system. I don't know if it's the MPI on the
>>>>>>>> cluster we are using, but I'm not sure how to debug it,
>>>>>>>>
>>>>>>>
>>>>>>> We need to see -log_view output for any performance question.
>>>>>>>
>>>>>>>
>>>>>>>> On the other hand, and because of this issue, I was trying to use
>>>>>>>> -log_summary or -log_view, and I noticed that in this context the program
>>>>>>>> hangs when it is time to produce the log. If I debug this with 2 cores,
>>>>>>>> process 0 exits normally but process 1 hangs in VecScatterBegin() with
>>>>>>>> SCATTER_REVERSE way back in the code,
>>>>>>>>
>>>>>>>
>>>>>>> You are calling a collective routine from only 1 process.
>>>>>>>
>>>>>>>
>>>>>> Matt
>>>>>>>
>>>>>>
>>>>>> I am pretty confident this is not the case,
>>>>>>
>>>>>
>>>>> This is still the simplest explanation. Can you send the stack trace
>>>>> for the 2 process run?
>>>>>
>>>>>
>>>>>> the calls to VecScatterCreateToZero and VecScatterBegin are made
>>>>>> on all processes; the program goes through all of the iterations of the
>>>>>> linear solver, writes output correctly and even closes all the PETSc
>>>>>> objects without complaining. The freeze occurs at the very end, when the
>>>>>> log is to be produced.
>>>>>>
>>>>>
>>>>> If you can send us a code to run, we can likely find the error.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Manuel
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> and even after destroying all associated objects and calling
>>>>>>>> PetscFinalize(), so I'm really clueless as to why this happens, as it only
>>>>>>>> occurs with the -log_* or -ksp_view options.
>>>>>>>>
>>>>>>>> my -ksp_view shows this:
>>>>>>>>
>>>>>>>> KSP Object: 2 MPI processes
>>>>>>>>
>>>>>>>> type: gcr
>>>>>>>>
>>>>>>>> GCR: restart = 30
>>>>>>>>
>>>>>>>> GCR: restarts performed = 20
>>>>>>>>
>>>>>>>> maximum iterations=10000, initial guess is zero
>>>>>>>>
>>>>>>>> tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
>>>>>>>>
>>>>>>>> right preconditioning
>>>>>>>>
>>>>>>>> using UNPRECONDITIONED norm type for convergence test
>>>>>>>>
>>>>>>>> PC Object: 2 MPI processes
>>>>>>>>
>>>>>>>> type: bjacobi
>>>>>>>>
>>>>>>>> block Jacobi: number of blocks = 2
>>>>>>>>
>>>>>>>> Local solve is same for all blocks, in the following KSP and PC
>>>>>>>> objects:
>>>>>>>>
>>>>>>>> KSP Object: (sub_) 1 MPI processes
>>>>>>>>
>>>>>>>> type: preonly
>>>>>>>>
>>>>>>>> maximum iterations=10000, initial guess is zero
>>>>>>>>
>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>
>>>>>>>> left preconditioning
>>>>>>>>
>>>>>>>> using NONE norm type for convergence test
>>>>>>>>
>>>>>>>> PC Object: (sub_) 1 MPI processes
>>>>>>>>
>>>>>>>> type: ilu
>>>>>>>>
>>>>>>>> ILU: out-of-place factorization
>>>>>>>>
>>>>>>>> 0 levels of fill
>>>>>>>>
>>>>>>>> tolerance for zero pivot 2.22045e-14
>>>>>>>>
>>>>>>>> matrix ordering: natural
>>>>>>>>
>>>>>>>> factor fill ratio given 1., needed 1.
>>>>>>>>
>>>>>>>> Factored matrix follows:
>>>>>>>>
>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>
>>>>>>>> type: seqaij
>>>>>>>>
>>>>>>>> rows=100000, cols=100000
>>>>>>>>
>>>>>>>> package used to perform factorization: petsc
>>>>>>>>
>>>>>>>> total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>>>>
>>>>>>>> total number of mallocs used during MatSetValues calls
>>>>>>>> =0
>>>>>>>>
>>>>>>>> not using I-node routines
>>>>>>>>
>>>>>>>> linear system matrix = precond matrix:
>>>>>>>>
>>>>>>>> Mat Object: 1 MPI processes
>>>>>>>>
>>>>>>>> type: seqaij
>>>>>>>>
>>>>>>>> rows=100000, cols=100000
>>>>>>>>
>>>>>>>> total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>>>>
>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>
>>>>>>>> not using I-node routines
>>>>>>>>
>>>>>>>> linear system matrix = precond matrix:
>>>>>>>>
>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>
>>>>>>>> type: mpiaij
>>>>>>>>
>>>>>>>> rows=200000, cols=200000
>>>>>>>>
>>>>>>>> total: nonzeros=3373340, allocated nonzeros=3373340
>>>>>>>>
>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>
>>>>>>>> not using I-node (on process 0) routines
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And I configured my PC object as:
>>>>>>>>
>>>>>>>>
>>>>>>>> call PCSetType(mg,PCHYPRE,ierr)
>>>>>>>>
>>>>>>>> call PCHYPRESetType(mg,'boomeramg',ierr)
>>>>>>>>
>>>>>>>>
>>>>>>>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,
>>>>>>>> 'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>>>>>>>>
>>>>>>>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,
>>>>>>>> 'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What are your thoughts ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Manuel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <
>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>
>>>>>>>>> Awesome, that did it, thanks once again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Take the scatter out of the if () since everyone does it and
>>>>>>>>>> get rid of the VecView().
>>>>>>>>>>
>>>>>>>>>> Does this work? If not, where is it hanging?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Thanks Dave,
>>>>>>>>>> >
>>>>>>>>>> > I think it's interesting that it never gave an error on this. After
>>>>>>>>>> adding the VecAssembly calls it still shows the same behavior, without
>>>>>>>>>> complaining. I did:
>>>>>>>>>> >
>>>>>>>>>> > if(rankl==0)then
>>>>>>>>>> >
>>>>>>>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>> > call VecAssemblyBegin(bp0,ierr) ; call
>>>>>>>>>> VecAssemblyEnd(bp0,ierr);
>>>>>>>>>> > CHKERRQ(ierr)
>>>>>>>>>> >
>>>>>>>>>> endif
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> > print*,"done! "
>>>>>>>>>> > CHKERRQ(ierr)
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > CHKERRQ(ierr)
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Thanks.
>>>>>>>>>> >
>>>>>>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <
>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On 6 January 2017 at 20:24, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > Great help Barry, I had totally overlooked that option (it is
>>>>>>>>>> explicit in the VecScatterBegin help page but not in
>>>>>>>>>> VecScatterCreateToZero, as I read later).
>>>>>>>>>> >
>>>>>>>>>> > So I used that and it works partially: it scatters the values
>>>>>>>>>> assigned on root but not the rest, and if I call VecScatterBegin from
>>>>>>>>>> outside root it hangs. The code currently looks like this:
>>>>>>>>>> >
>>>>>>>>>> > call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>>>>>>> >
>>>>>>>>>> > call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>>>>>>> >
>>>>>>>>>> > if(rankl==0)then
>>>>>>>>>> >
>>>>>>>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>> >
>>>>>>>>>> > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > You need to call
>>>>>>>>>> >
>>>>>>>>>> > VecAssemblyBegin(bp0);
>>>>>>>>>> > VecAssemblyEnd(bp0);
>>>>>>>>>> > after your last call to VecSetValues() before you can do any
>>>>>>>>>> operations with bp0.
>>>>>>>>>> >
>>>>>>>>>> > With your current code, the call to VecView should produce an
>>>>>>>>>> error if you used the error checking macro CHKERRQ(ierr) (as should
>>>>>>>>>> VecScatter{Begin,End}
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Dave
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> > print*,"done! "
>>>>>>>>>> > CHKERRQ(ierr)
>>>>>>>>>> >
>>>>>>>>>> > endif
>>>>>>>>>> >
>>>>>>>>>> > ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> > ! call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>>>>> >
>>>>>>>>>> > call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>> >
>>>>>>>>>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>>>>> >
>>>>>>>>>> > call exit()
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > And the output is: (with bp the right answer)
>>>>>>>>>> >
>>>>>>>>>> > Vec Object:bp: 2 MPI processes
>>>>>>>>>> > type: mpi
>>>>>>>>>> > Process [0]
>>>>>>>>>> > 1.
>>>>>>>>>> > 2.
>>>>>>>>>> > Process [1]
>>>>>>>>>> > 4.
>>>>>>>>>> > 3.
>>>>>>>>>> > Vec Object:bp2: 2 MPI processes (before scatter)
>>>>>>>>>> > type: mpi
>>>>>>>>>> > Process [0]
>>>>>>>>>> > 0.
>>>>>>>>>> > 0.
>>>>>>>>>> > Process [1]
>>>>>>>>>> > 0.
>>>>>>>>>> > 0.
>>>>>>>>>> > Vec Object:bp0: 1 MPI processes
>>>>>>>>>> > type: seq
>>>>>>>>>> > 1.
>>>>>>>>>> > 2.
>>>>>>>>>> > 4.
>>>>>>>>>> > 3.
>>>>>>>>>> > done!
>>>>>>>>>> > Vec Object:bp2: 2 MPI processes (after scatter)
>>>>>>>>>> > type: mpi
>>>>>>>>>> > Process [0]
>>>>>>>>>> > 1.
>>>>>>>>>> > 2.
>>>>>>>>>> > Process [1]
>>>>>>>>>> > 0.
>>>>>>>>>> > 0.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Thanks inmensely for your help,
>>>>>>>>>> >
>>>>>>>>>> > Manuel
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > Hello Devs is me again,
>>>>>>>>>> > >
>>>>>>>>>> > > I'm trying to distribute a vector to all the processes;
>>>>>>>>>> the vector would originally live on root as a sequential vector and I would
>>>>>>>>>> like to scatter it. What would be the best call to do this?
>>>>>>>>>> > >
>>>>>>>>>> > > I already know how to gather a distributed vector to root
>>>>>>>>>> with VecScatterCreateToZero; this would be the inverse operation,
>>>>>>>>>> >
>>>>>>>>>> > Use the same VecScatter object but with SCATTER_REVERSE; note that
>>>>>>>>>> you need to reverse the two vector arguments as well.
>>>>>>>>>> >
>>>>>>>>>> >
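As a concrete sketch of what Barry describes, using the vector names from the
code below (bp2 distributed, bp0 the gathered sequential vector, ctr the
scatter): every rank makes all of these calls.

    ! created once: ctr gathers bp2 (MPI) into bp0 (sequential, full length on rank 0)
    call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)

    ! forward direction: gather the distributed values onto rank 0
    call VecScatterBegin(ctr,bp2,bp0,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctr,bp2,bp0,INSERT_VALUES,SCATTER_FORWARD,ierr)

    ! reverse direction: the same scatter object with SCATTER_REVERSE and the
    ! two vector arguments swapped, pushing values from rank 0 back into bp2
    call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
    call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)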
>>>>>>>>>> > > I'm currently trying with VecScatterCreate(), and as of now I'm
>>>>>>>>>> doing the following:
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > if(rank==0)then
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if i use WORLD, freezes in SetSizes
>>>>>>>>>> > > call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr);
>>>>>>>>>> CHKERRQ(ierr)
>>>>>>>>>> > > call VecSetType(bp0,VECSEQ,ierr)
>>>>>>>>>> > > call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>>>>> > >
>>>>>>>>>> > > !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>>>>> > >
>>>>>>>>>> > > call VecAssemblyBegin(bp0,ierr) ; call
>>>>>>>>>> VecAssemblyEnd(bp0,ierr) !rhs
>>>>>>>>>> > >
>>>>>>>>>> > > do i=0,nbdp-1,1
>>>>>>>>>> > > ind(i+1) = i
>>>>>>>>>> > > enddo
>>>>>>>>>> > >
>>>>>>>>>> > > call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>>>>>>> > >
>>>>>>>>>> > > !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) !if i use SELF, freezes here.
>>>>>>>>>> > >
>>>>>>>>>> > > call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
>>>>>>>>>> > >
>>>>>>>>>> > > endif
>>>>>>>>>> > >
>>>>>>>>>> > > bp2 being the receiving MPI vector to scatter into.
>>>>>>>>>> > >
>>>>>>>>>> > > But it freezes in VecScatterCreate when trying to use more
>>>>>>>>>> than one processor. What would be a better approach?
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks once again,
>>>>>>>>>> > >
>>>>>>>>>> > > Manuel
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > > Thanks, I had no idea how to debug and read those logs; that
>>>>>>>>>> solved this issue at least (I was sending a message from root to everyone
>>>>>>>>>> else, but trying to receive on everyone else, including root).
>>>>>>>>>> > >
>>>>>>>>>> > > Until next time, many thanks,
>>>>>>>>>> > >
>>>>>>>>>> > > Manuel
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <
>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > > I did a PetscBarrier just before calling the vector-creation
>>>>>>>>>> routine, and I'm pretty sure I'm calling it from every processor; the code
>>>>>>>>>> looks like this:
>>>>>>>>>> > >
>>>>>>>>>> > > From the gdb trace.
>>>>>>>>>> > >
>>>>>>>>>> > > Proc 0: Is in some MPI routine you call yourself, line 113
>>>>>>>>>> > >
>>>>>>>>>> > > Proc 1: Is in VecCreate(), line 130
>>>>>>>>>> > >
>>>>>>>>>> > > You need to fix your communication code.
>>>>>>>>>> > >
>>>>>>>>>> > > Matt
>>>>>>>>>> > >
>>>>>>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>>>>> > >
>>>>>>>>>> > > print*,'entering POInit from',rank
>>>>>>>>>> > > !call exit()
>>>>>>>>>> > >
>>>>>>>>>> > > call PetscObjsInit()
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > And output gives:
>>>>>>>>>> > >
>>>>>>>>>> > > entering POInit from 0
>>>>>>>>>> > > entering POInit from 1
>>>>>>>>>> > > entering POInit from 2
>>>>>>>>>> > > entering POInit from 3
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > Still hangs in the same way,
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > >
>>>>>>>>>> > > Manuel
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > > Thanks for the answers !
>>>>>>>>>> > >
>>>>>>>>>> > > Here's the screenshot of what I got from bt in gdb (great hint
>>>>>>>>>> on how to debug in PETSc, I didn't know that).
>>>>>>>>>> > >
>>>>>>>>>> > > I don't really know what to look at here,
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > >
>>>>>>>>>> > > Manuel
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <
>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>>>>>>>>> function(s)? These functions cannot be inside if statements like
>>>>>>>>>> > > if (rank == 0){
>>>>>>>>>> > > VecCreateMPI(...)
>>>>>>>>>> > > }
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>>>>>>> > >
>>>>>>>>>> > > I just tried that and it didn't make a difference. Any other
>>>>>>>>>> suggestions?
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > Manuel
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <
>>>>>>>>>> dave.mayhem23 at gmail.com> wrote:
>>>>>>>>>> > > You need to swap the order of your function calls.
>>>>>>>>>> > > Call VecSetSizes() before VecSetType()
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > Dave
>>>>>>>>>> > >
>>>>>>>>>> > >
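A minimal sketch of the ordering Dave means, for the distributed vector xp of
global size nbdp from the message below; these calls are collective on
PETSC_COMM_WORLD, so every rank must execute them:

    call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
    call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)   ! sizes first
    call VecSetType(xp,VECMPI,ierr); CHKERRQ(ierr)               ! then the type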
>>>>>>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <
>>>>>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>>>>>> > > Hello all, happy new year,
>>>>>>>>>> > >
>>>>>>>>>> > > I'm working on parallelizing my code. It worked and provided
>>>>>>>>>> some results when I just ran it on more than one processor, but it created
>>>>>>>>>> artifacts, because I didn't need one image of the whole program on each
>>>>>>>>>> processor, conflicting with each other.
>>>>>>>>>> > >
>>>>>>>>>> > > Since the pressure solver is the main part I need in parallel,
>>>>>>>>>> I'm choosing MPI to run everything on the root processor until it's time to
>>>>>>>>>> solve for pressure; at this point I'm trying to create a distributed vector
>>>>>>>>>> using either
>>>>>>>>>> > >
>>>>>>>>>> > > call VecCreateMPI(PETSC_COMM_WORLD,
>>>>>>>>>> PETSC_DECIDE,nbdp,xp,ierr)
>>>>>>>>>> > > or
>>>>>>>>>> > > call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>>>>>>> > > call VecSetType(xp,VECMPI,ierr)
>>>>>>>>>> > > call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr);
>>>>>>>>>> CHKERRQ(ierr)
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > In both cases the program hangs at this point, something that never
>>>>>>>>>> happened with the naive approach I described before. I've made sure the global
>>>>>>>>>> size, nbdp, is the same on every processor. What can be wrong?
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks for your kind help,
>>>>>>>>>> > >
>>>>>>>>>> > > Manuel.
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > --
>>>>>>>>>> > > What most experimenters take for granted before they begin
>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>> their experiments lead.
>>>>>>>>>> > > -- Norbert Wiener
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>