[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
Matthew Knepley
knepley at gmail.com
Sat Jan 7 16:24:47 CST 2017
On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> Thank you Matthew,
>
> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>>
>>> Hi Devs, hope you are having a great weekend,
>>>
>>> I could finally parallelize my linear solver and integrate it into the
>>> rest of the code so that only the linear system is solved in parallel,
>>> which is great news for my team. But there is a catch: I don't see any
>>> speedup in the linear solve. I don't know if it's the MPI setup on the
>>> cluster we are using, and I'm not sure how to debug it.
>>>
>>
>> We need to see -log_view output for any performance question.
>>
>>
>>> On the other hand, and because of this issue, I was trying -log_summary
>>> or -log_view, and I noticed that in this context the program hangs when
>>> it is time to produce the log. If I debug this with 2 cores, process 0
>>> exits normally but process 1 hangs in a VecScatterBegin() with
>>> SCATTER_REVERSE from much earlier in the code.
>>>
>>
>> You are calling a collective routine from only 1 process.
>>
>>
> Matt
>>
>
> I am pretty confident this is not the case,
>
This is still the simplest explanation. Can you send the stack trace for
the 2 process run?
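
One way to get it, assuming your executable is called ./yourprogram (only a
placeholder name here), is to run with the standard PETSc debugger option,
which opens a debugger attached to each rank:

    mpiexec -n 2 ./yourprogram -log_view -start_in_debugger

Alternatively, once the run hangs, attach gdb to each of the two processes by
PID and type "bt" at the prompt to get the backtrace for each rank.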
> the calls to VecScatterCreateToZero and VecScatterBegin are made in all
> processes, the program goes through all of the iterations of the linear
> solver, writes output correctly, and even destroys all the PETSc objects
> without complaining; the freeze occurs at the very end, when the log is to
> be produced.
>
If you can send us a code to run, we can likely find the error.
Thanks,
Matt
> Thanks,
>
> Manuel
>
>
>
>>
>>
>>> and it hangs even after destroying all associated objects and calling
>>> PetscFinalize(), so I'm really clueless about why this happens, as it only
>>> occurs with the -log_* or -ksp_view options.
>>>
>>> my -ksp_view shows this:
>>>
>>> KSP Object: 2 MPI processes
>>>
>>> type: gcr
>>>
>>> GCR: restart = 30
>>>
>>> GCR: restarts performed = 20
>>>
>>> maximum iterations=10000, initial guess is zero
>>>
>>> tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
>>>
>>> right preconditioning
>>>
>>> using UNPRECONDITIONED norm type for convergence test
>>>
>>> PC Object: 2 MPI processes
>>>
>>> type: bjacobi
>>>
>>> block Jacobi: number of blocks = 2
>>>
>>> Local solve is same for all blocks, in the following KSP and PC
>>> objects:
>>>
>>> KSP Object: (sub_) 1 MPI processes
>>>
>>> type: preonly
>>>
>>> maximum iterations=10000, initial guess is zero
>>>
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>
>>> left preconditioning
>>>
>>> using NONE norm type for convergence test
>>>
>>> PC Object: (sub_) 1 MPI processes
>>>
>>> type: ilu
>>>
>>> ILU: out-of-place factorization
>>>
>>> 0 levels of fill
>>>
>>> tolerance for zero pivot 2.22045e-14
>>>
>>> matrix ordering: natural
>>>
>>> factor fill ratio given 1., needed 1.
>>>
>>> Factored matrix follows:
>>>
>>> Mat Object: 1 MPI processes
>>>
>>> type: seqaij
>>>
>>> rows=100000, cols=100000
>>>
>>> package used to perform factorization: petsc
>>>
>>> total: nonzeros=1675180, allocated nonzeros=1675180
>>>
>>> total number of mallocs used during MatSetValues calls =0
>>>
>>> not using I-node routines
>>>
>>> linear system matrix = precond matrix:
>>>
>>> Mat Object: 1 MPI processes
>>>
>>> type: seqaij
>>>
>>> rows=100000, cols=100000
>>>
>>> total: nonzeros=1675180, allocated nonzeros=1675180
>>>
>>> total number of mallocs used during MatSetValues calls =0
>>>
>>> not using I-node routines
>>>
>>> linear system matrix = precond matrix:
>>>
>>> Mat Object: 2 MPI processes
>>>
>>> type: mpiaij
>>>
>>> rows=200000, cols=200000
>>>
>>> total: nonzeros=3373340, allocated nonzeros=3373340
>>>
>>> total number of mallocs used during MatSetValues calls =0
>>>
>>> not using I-node (on process 0) routines
>>>
>>>
>>>
>>> And I configured my PC object as:
>>>
>>>
>>> call PCSetType(mg,PCHYPRE,ierr)
>>>
>>> call PCHYPRESetType(mg,'boomeramg',ierr)
>>>
>>>
>>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>>>
>>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
>>>
>>>
>>>
>>> What are your thoughts ?
>>>
>>> Thanks,
>>>
>>> Manuel
>>>
>>>
>>>
>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>> wrote:
>>>
>>>> Awesome, that did it, thanks once again.
>>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>>
>>>>>    Take the scatter out of the if () since every process must call it,
>>>>> and get rid of the VecView().
>>>>>
>>>>> Does this work? If not where is it hanging?
>>>>>
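
As a sketch of that structure (untested, reusing the names bp0, bp2, ctr,
rankl, nbdp, ind and Rhs from the code quoted below):

    call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)

    if (rankl == 0) then
       ! only rank 0 fills the sequential vector
       call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr); CHKERRQ(ierr)
       call VecAssemblyBegin(bp0,ierr); call VecAssemblyEnd(bp0,ierr); CHKERRQ(ierr)
    endif

    ! the scatter is collective, so every rank calls it, outside the if
    call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr); CHKERRQ(ierr)
    call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr); CHKERRQ(ierr)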
>>>>>
>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>> >
>>>>> > Thanks Dave,
>>>>> >
>>>>> > I think it is interesting that it never gave an error on this. After
>>>>> > adding the VecAssembly calls it still shows the same behavior, without
>>>>> > complaining. I did:
>>>>> >
>>>>> > if(rankl==0)then
>>>>> >
>>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> > call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr); CHKERRQ(ierr)
>>>>> >
>>>>> > endif
>>>>> >
>>>>> >
>>>>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> > print*,"done! "
>>>>> > CHKERRQ(ierr)
>>>>> >
>>>>> >
>>>>> > CHKERRQ(ierr)
>>>>> >
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> > On 6 January 2017 at 20:24, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>> > Great help Barry, I had totally overlooked that option (it is
>>>>> > explicit in the VecScatterBegin help page but not in
>>>>> > VecScatterCreateToZero, as I read later).
>>>>> >
>>>>> > So I used that and it works partially: it scatters the values
>>>>> > assigned in root but not the rest, and if I call VecScatterBegin from
>>>>> > outside root it hangs. The code currently looks like this:
>>>>> >
>>>>> > call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>> >
>>>>> > call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>> >
>>>>> > if(rankl==0)then
>>>>> >
>>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> >
>>>>> > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> >
>>>>> >
>>>>> > You need to call
>>>>> >
>>>>> > VecAssemblyBegin(bp0);
>>>>> > VecAssemblyEnd(bp0);
>>>>> > after your last call to VecSetValues() before you can do any
>>>>> operations with bp0.
>>>>> >
>>>>> > With your current code, the call to VecView should produce an error
>>>>> > if you used the error checking macro CHKERRQ(ierr) (as should
>>>>> > VecScatter{Begin,End}).
>>>>> >
>>>>> > Thanks,
>>>>> > Dave
>>>>> >
>>>>> >
>>>>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> > print*,"done! "
>>>>> > CHKERRQ(ierr)
>>>>> >
>>>>> > endif
>>>>> >
>>>>> > ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> > ! call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >
>>>>> > call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> >
>>>>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>> >
>>>>> > call exit()
>>>>> >
>>>>> >
>>>>> >
>>>>> > And the output is: (with bp the right answer)
>>>>> >
>>>>> > Vec Object:bp: 2 MPI processes
>>>>> > type: mpi
>>>>> > Process [0]
>>>>> > 1.
>>>>> > 2.
>>>>> > Process [1]
>>>>> > 4.
>>>>> > 3.
>>>>> > Vec Object:bp2: 2 MPI processes (before scatter)
>>>>> > type: mpi
>>>>> > Process [0]
>>>>> > 0.
>>>>> > 0.
>>>>> > Process [1]
>>>>> > 0.
>>>>> > 0.
>>>>> > Vec Object:bp0: 1 MPI processes
>>>>> > type: seq
>>>>> > 1.
>>>>> > 2.
>>>>> > 4.
>>>>> > 3.
>>>>> > done!
>>>>> > Vec Object:bp2: 2 MPI processes (after scatter)
>>>>> > type: mpi
>>>>> > Process [0]
>>>>> > 1.
>>>>> > 2.
>>>>> > Process [1]
>>>>> > 0.
>>>>> > 0.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thanks immensely for your help,
>>>>> >
>>>>> > Manuel
>>>>> >
>>>>> >
>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>> wrote:
>>>>> >
>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>> > >
>>>>> > > Hello Devs, it's me again,
>>>>> > >
>>>>> > > I'm trying to distribute a vector to all processes; the
>>>>> > > vector is originally on root as a sequential vector and I would like
>>>>> > > to scatter it. What would be the best call to do this?
>>>>> > >
>>>>> > > I already know how to gather a distributed vector to root with
>>>>> > > VecScatterCreateToZero; this would be the inverse operation.
>>>>> >
>>>>> >    Use the same VecScatter object but with SCATTER_REVERSE; note you
>>>>> > need to reverse the two vector arguments as well.
>>>>> >
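
For reference, a small sketch of that pattern (vmpi and vseq are placeholder
names for the distributed and the rank-0 sequential vector; ctx is the scatter
context returned by VecScatterCreateToZero):

    ! gather: distributed -> all entries on rank 0
    call VecScatterBegin(ctx,vmpi,vseq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,vmpi,vseq,INSERT_VALUES,SCATTER_FORWARD,ierr)

    ! distribute: rank 0 -> all ranks; same ctx, SCATTER_REVERSE, vectors swapped
    call VecScatterBegin(ctx,vseq,vmpi,INSERT_VALUES,SCATTER_REVERSE,ierr)
    call VecScatterEnd(ctx,vseq,vmpi,INSERT_VALUES,SCATTER_REVERSE,ierr)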
>>>>> >
>>>>> > > I'm currently trying with VecScatterCreate(), and as of now I'm
>>>>> > > doing the following:
>>>>> > >
>>>>> > >
>>>>> > > if(rank==0)then
>>>>> > >
>>>>> > >
>>>>> > >         call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if i use WORLD, freezes in SetSizes
>>>>> > > call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>> > > call VecSetType(bp0,VECSEQ,ierr)
>>>>> > > call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> > >
>>>>> > > !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> > >
>>>>> > >         call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr) !rhs
>>>>> > >
>>>>> > > do i=0,nbdp-1,1
>>>>> > > ind(i+1) = i
>>>>> > > enddo
>>>>> > >
>>>>> > >        call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>> > >
>>>>> > >   !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) !if i use SELF, freezes here.
>>>>> > >
>>>>> > >   call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
>>>>> > >
>>>>> > > endif
>>>>> > >
>>>>> > > bp2 being the receiving MPI vector to scatter into
>>>>> > >
>>>>> > > But it freezes in VecScatterCreate when trying to use more than
>>>>> > > one processor. What would be a better approach?
>>>>> > >
>>>>> > >
>>>>> > > Thanks once again,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <
>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>> > > Thanks, I had no idea how to debug and read those logs; that solved
>>>>> > > this issue at least (I was sending a message from root to everyone else,
>>>>> > > but trying to receive on everyone else, including root).
>>>>> > >
>>>>> > > Until next time, many thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <
>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>> > > I did a PetscBarrier just before calling the VecCreate routine and
>>>>> > > I'm pretty sure I'm calling it from every processor; the code looks like this:
>>>>> > >
>>>>> > > From the gdb trace.
>>>>> > >
>>>>> > > Proc 0: Is in some MPI routine you call yourself, line 113
>>>>> > >
>>>>> > > Proc 1: Is in VecCreate(), line 130
>>>>> > >
>>>>> > > You need to fix your communication code.
>>>>> > >
>>>>> > > Matt
>>>>> > >
>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>> > >
>>>>> > > print*,'entering POInit from',rank
>>>>> > > !call exit()
>>>>> > >
>>>>> > > call PetscObjsInit()
>>>>> > >
>>>>> > >
>>>>> > > And output gives:
>>>>> > >
>>>>> > > entering POInit from 0
>>>>> > > entering POInit from 1
>>>>> > > entering POInit from 2
>>>>> > > entering POInit from 3
>>>>> > >
>>>>> > >
>>>>> > > Still hangs in the same way,
>>>>> > >
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <
>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>> > > Thanks for the answers !
>>>>> > >
>>>>> > > Here's the screenshot of what I got from bt in gdb (great hint on
>>>>> > > how to debug in PETSc, I didn't know that).
>>>>> > >
>>>>> > > I don't really know what to look at here,
>>>>> > >
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>> wrote:
>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>>>> > > function(s)? These functions cannot be inside if statements like
>>>>> > > if (rank == 0){
>>>>> > > VecCreateMPI(...)
>>>>> > > }
>>>>> > >
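
A minimal sketch of the collective version (untested, reusing nbdp, xp, ind and
Rhs from the code further down):

    ! every rank in PETSC_COMM_WORLD makes this call, outside any if(rank==0) block
    call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr); CHKERRQ(ierr)

    ! rank-specific work can stay inside an if block ...
    if (rank == 0) then
       call VecSetValues(xp,nbdp,ind,Rhs,INSERT_VALUES,ierr); CHKERRQ(ierr)
    endif

    ! ... but the assembly is again called by every rank
    call VecAssemblyBegin(xp,ierr); CHKERRQ(ierr)
    call VecAssemblyEnd(xp,ierr); CHKERRQ(ierr)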
>>>>> > >
>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>> > >
>>>>> > > I just tried that and it didn't make a difference, any other
>>>>> suggestions ?
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>> wrote:
>>>>> > > You need to swap the order of your function calls.
>>>>> > > Call VecSetSizes() before VecSetType()
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Dave
>>>>> > >
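
That is, something like (a sketch, adapted from the calls quoted further down):

    call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
    call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)   ! sizes first
    call VecSetType(xp,VECMPI,ierr); CHKERRQ(ierr)               ! then the type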
>>>>> > >
>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>> wrote:
>>>>> > > Hello all, happy new year,
>>>>> > >
>>>>> > > I'm working on parallelizing my code. It worked and provided some
>>>>> > > results when I just launched it on more than one processor, but it
>>>>> > > created artifacts because I didn't need one image of the whole program
>>>>> > > on each processor, with the copies conflicting with each other.
>>>>> > >
>>>>> > > Since the pressure solver is the main part I need in parallel, I'm
>>>>> > > choosing MPI to run everything on the root processor until it's time to
>>>>> > > solve for pressure. At this point I'm trying to create a distributed
>>>>> > > vector using either
>>>>> > >
>>>>> > > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>>>>> > > or
>>>>> > > call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>> > > call VecSetType(xp,VECMPI,ierr)
>>>>> > > call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > > In both cases the program hangs at this point, something that never
>>>>> > > happened with the naive approach I described before. I've made sure the
>>>>> > > global size, nbdp, is the same on every processor. What can be wrong?
>>>>> > >
>>>>> > > Thanks for your kind help,
>>>>> > >
>>>>> > > Manuel.
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> > > -- Norbert Wiener
>>>>> > >
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener