[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

Manuel Valera mvalera at mail.sdsu.edu
Sat Jan 7 16:59:22 CST 2017


I would have to think and code a MWE for this problem before sending it
since the model is much bigger than the petsc solver. Attached here is a
screenshot of the debugger as barry taught me, is that the stack trace you
need ?

the ucmsMain.f90:522 that shows is the call (from all processes) to the
routine that updates the rhs vector (from root) and scatters it (from all
processes).

This routine is itself inside a double loop that occurs in all processes
but the only call from all processes to the solver is this one, the rest of
the loop which involves correcting for velocities, pressure and
temperature, all happens in root node.

Sorry for the convoluted program design, this is the first beta version of
the model working on parallel and was the best i could come with, i suppose
it makes more sense in serial,

Thanks

On Sat, Jan 7, 2017 at 2:24 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mvalera at mail.sdsu.edu>
> wrote:
>
>> Thank you Matthew,
>>
>> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>> wrote:
>>>
>>>> Hi Devs, hope you are having a great weekend,
>>>>
>>>> I could finally parallelize my linear solver and implement it into the
>>>> rest of the code in a way that only the linear system is solved in
>>>> parallel, great news for my team, but there is a catch and is that i don't
>>>> see any speedup in the linear system, i don't know if its the MPI in the
>>>> cluster we are using, but im not sure on how to debug it,
>>>>
>>>
>>> We need to see -log_view output for any performance question.
>>>
>>>
>>>> On the other hand and because of this issue i was trying to do
>>>> -log_summary or -log_view and i noticed the program in this context hangs
>>>> when is time of producing the log, if i debug this for 2 cores, process 0
>>>> exits normally but process 1 hangs in the vectorscatterbegin() with
>>>> scatter_reverse way back in the code,
>>>>
>>>
>>> You are calling a collective routine from only 1 process.
>>>
>>>
>>   Matt
>>>
>>
>> I am pretty confident this is not the case,
>>
>
> This is still the simplest explanation. Can you send the stack trace for
> the 2 process run?
>
>
>> the callings to vecscattercreatetozero and vecscatterbegin are made in
>> all processes, the program goes thru all of the iterations on the linear
>> solver, writes output correctly and even closes all the petsc objects
>> without complaining, the freeze occurs at the very end when the log is to
>> be produced.
>>
>
> If you can send us a code to run, we can likely find the error.
>
>   Thanks,
>
>      Matt
>
>
>> Thanks,
>>
>> Manuel
>>
>>
>>
>>>
>>>
>>>> and even after destroying all associated objects and calling
>>>> petscfinalize(), so im really clueless on why is this, as it only happens
>>>> for -log_* or -ksp_view options.
>>>>
>>>> my -ksp_view shows this:
>>>>
>>>>  KSP Object: 2 MPI processes
>>>>
>>>>   type: gcr
>>>>
>>>>     GCR: restart = 30
>>>>
>>>>     GCR: restarts performed = 20
>>>>
>>>>   maximum iterations=10000, initial guess is zero
>>>>
>>>>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>>>>
>>>>   right preconditioning
>>>>
>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>
>>>> PC Object: 2 MPI processes
>>>>
>>>>   type: bjacobi
>>>>
>>>>     block Jacobi: number of blocks = 2
>>>>
>>>>     Local solve is same for all blocks, in the following KSP and PC
>>>> objects:
>>>>
>>>>   KSP Object:  (sub_)   1 MPI processes
>>>>
>>>>     type: preonly
>>>>
>>>>     maximum iterations=10000, initial guess is zero
>>>>
>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>
>>>>     left preconditioning
>>>>
>>>>     using NONE norm type for convergence test
>>>>
>>>>   PC Object:  (sub_)   1 MPI processes
>>>>
>>>>     type: ilu
>>>>
>>>>       ILU: out-of-place factorization
>>>>
>>>>       0 levels of fill
>>>>
>>>>       tolerance for zero pivot 2.22045e-14
>>>>
>>>>       matrix ordering: natural
>>>>
>>>>       factor fill ratio given 1., needed 1.
>>>>
>>>>         Factored matrix follows:
>>>>
>>>>           Mat Object:           1 MPI processes
>>>>
>>>>             type: seqaij
>>>>
>>>>             rows=100000, cols=100000
>>>>
>>>>             package used to perform factorization: petsc
>>>>
>>>>             total: nonzeros=1675180, allocated nonzeros=1675180
>>>>
>>>>             total number of mallocs used during MatSetValues calls =0
>>>>
>>>>               not using I-node routines
>>>>
>>>>     linear system matrix = precond matrix:
>>>>
>>>>     Mat Object:     1 MPI processes
>>>>
>>>>       type: seqaij
>>>>
>>>>       rows=100000, cols=100000
>>>>
>>>>       total: nonzeros=1675180, allocated nonzeros=1675180
>>>>
>>>>       total number of mallocs used during MatSetValues calls =0
>>>>
>>>>         not using I-node routines
>>>>
>>>>   linear system matrix = precond matrix:
>>>>
>>>>   Mat Object:   2 MPI processes
>>>>
>>>>     type: mpiaij
>>>>
>>>>     rows=200000, cols=200000
>>>>
>>>>     total: nonzeros=3373340, allocated nonzeros=3373340
>>>>
>>>>     total number of mallocs used during MatSetValues calls =0
>>>>
>>>>       not using I-node (on process 0) routines
>>>>
>>>>
>>>>
>>>> And i configured my PC object as:
>>>>
>>>>
>>>>    call PCSetType(mg,PCHYPRE,ierr)
>>>>
>>>>    call PCHYPRESetType(mg,'boomeramg',ierr)
>>>>
>>>>
>>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_n
>>>> odal_coarsen','1',ierr)
>>>>
>>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_v
>>>> ec_interp_variant','1',ierr)
>>>>
>>>>
>>>>
>>>> What are your thoughts ?
>>>>
>>>> Thanks,
>>>>
>>>> Manuel
>>>>
>>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>> wrote:
>>>>
>>>>> Awesome, that did it, thanks once again.
>>>>>
>>>>>
>>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>    Take the scatter out of the if () since everyone does it and get
>>>>>> rid of the VecView().
>>>>>>
>>>>>>    Does this work? If not where is it hanging?
>>>>>>
>>>>>>
>>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>> >
>>>>>> > Thanks Dave,
>>>>>> >
>>>>>> > I think is interesting it never gave an error on this, after adding
>>>>>> the vecassembly calls it still shows the same behavior, without
>>>>>> complaining, i did:
>>>>>> >
>>>>>> > if(rankl==0)then
>>>>>> >
>>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>> >      call VecAssemblyBegin(bp0,ierr) ; call
>>>>>> VecAssemblyEnd(bp0,ierr);
>>>>>> >      CHKERRQ(ierr)
>>>>>> >
>>>>>> endif
>>>>>> >
>>>>>> >
>>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >      print*,"done! "
>>>>>> >      CHKERRQ(ierr)
>>>>>> >
>>>>>> >
>>>>>> >        CHKERRQ(ierr)
>>>>>> >
>>>>>> >
>>>>>> > Thanks.
>>>>>> >
>>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> >
>>>>>> > On 6 January 2017 at 20:24, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>> > Great help Barry, i totally had overlooked that option (it is
>>>>>> explicit in the vecscatterbegin call help page but not in
>>>>>> vecscattercreatetozero, as i read later)
>>>>>> >
>>>>>> > So i used that and it works partially, it scatters te values
>>>>>> assigned in root but not the rest, if i call vecscatterbegin from outside
>>>>>> root it hangs, the code currently look as this:
>>>>>> >
>>>>>> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>>> >
>>>>>> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>>> >
>>>>>> > if(rankl==0)then
>>>>>> >
>>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>> >
>>>>>> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>> >
>>>>>> >
>>>>>> > You need to call
>>>>>> >
>>>>>> >   VecAssemblyBegin(bp0);
>>>>>> >   VecAssemblyEnd(bp0);
>>>>>> > after your last call to VecSetValues() before you can do any
>>>>>> operations with bp0.
>>>>>> >
>>>>>> > With your current code, the call to VecView should produce an error
>>>>>> if you used the error checking macro CHKERRQ(ierr) (as should
>>>>>> VecScatter{Begin,End}
>>>>>> >
>>>>>> > Thanks,
>>>>>> >   Dave
>>>>>> >
>>>>>> >
>>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >      print*,"done! "
>>>>>> >      CHKERRQ(ierr)
>>>>>> >
>>>>>> > endif
>>>>>> >
>>>>>> >    ! call VecScatterBegin(ctr,bp0,bp2,IN
>>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >    !  call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>>> >
>>>>>> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>> >
>>>>>> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>> >
>>>>>> >   call exit()
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > And the output is: (with bp the right answer)
>>>>>> >
>>>>>> > Vec Object:bp: 2 MPI processes
>>>>>> >   type: mpi
>>>>>> > Process [0]
>>>>>> > 1.
>>>>>> > 2.
>>>>>> > Process [1]
>>>>>> > 4.
>>>>>> > 3.
>>>>>> > Vec Object:bp2: 2 MPI processes  (before scatter)
>>>>>> >   type: mpi
>>>>>> > Process [0]
>>>>>> > 0.
>>>>>> > 0.
>>>>>> > Process [1]
>>>>>> > 0.
>>>>>> > 0.
>>>>>> > Vec Object:bp0: 1 MPI processes
>>>>>> >   type: seq
>>>>>> > 1.
>>>>>> > 2.
>>>>>> > 4.
>>>>>> > 3.
>>>>>> >  done!
>>>>>> > Vec Object:bp2: 2 MPI processes  (after scatter)
>>>>>> >   type: mpi
>>>>>> > Process [0]
>>>>>> > 1.
>>>>>> > 2.
>>>>>> > Process [1]
>>>>>> > 0.
>>>>>> > 0.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Thanks inmensely for your help,
>>>>>> >
>>>>>> > Manuel
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>>> wrote:
>>>>>> >
>>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>> > >
>>>>>> > > Hello Devs is me again,
>>>>>> > >
>>>>>> > > I'm trying to distribute a vector to all called processes, the
>>>>>> vector would be originally in root as a sequential vector and i would like
>>>>>> to scatter it, what would the best call to do this ?
>>>>>> > >
>>>>>> > > I already know how to gather a distributed vector to root with
>>>>>> VecScatterCreateToZero, this would be the inverse operation,
>>>>>> >
>>>>>> >    Use the same VecScatter object but with SCATTER_REVERSE, not you
>>>>>> need to reverse the two vector arguments as well.
>>>>>> >
>>>>>> >
>>>>>> > > i'm currently trying with VecScatterCreate() and as of now im
>>>>>> doing the following:
>>>>>> > >
>>>>>> > >
>>>>>> > > if(rank==0)then
>>>>>> > >
>>>>>> > >
>>>>>> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if
>>>>>> i use WORLD
>>>>>> > >
>>>>>> !freezes in SetSizes
>>>>>> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>>> > >      call VecSetType(bp0,VECSEQ,ierr)
>>>>>> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>>> > >
>>>>>> > >
>>>>>> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>> > >
>>>>>> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>>> > >
>>>>>> > >
>>>>>> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>> > >
>>>>>> > >      call VecAssemblyBegin(bp0,ierr) ; call
>>>>>> VecAssemblyEnd(bp0,ierr) !rhs
>>>>>> > >
>>>>>> > >      do i=0,nbdp-1,1
>>>>>> > >         ind(i+1) = i
>>>>>> > >      enddo
>>>>>> > >
>>>>>> > >      call ISCreateGeneral(PETSC_COMM_SEL
>>>>>> F,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>>> > >
>>>>>> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr)
>>>>>> !if i use SELF
>>>>>> > >
>>>>>>  !freezes here.
>>>>>> > >
>>>>>> > >      call VecScatterCreate(bp0,locis,bp2
>>>>>> ,PETSC_NULL_OBJECT,ctr,ierr)
>>>>>> > >
>>>>>> > > endif
>>>>>> > >
>>>>>> > > bp2 being the receptor MPI vector to scatter to
>>>>>> > >
>>>>>> > > But it freezes in VecScatterCreate when trying to use more than
>>>>>> one processor, what would be a better approach ?
>>>>>> > >
>>>>>> > >
>>>>>> > > Thanks once again,
>>>>>> > >
>>>>>> > > Manuel
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <
>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>> > > Thanks i had no idea how to debug and read those logs, that
>>>>>> solved this issue at least (i was sending a message from root to everyone
>>>>>> else, but trying to catch from everyone else including root)
>>>>>> > >
>>>>>> > > Until next time, many thanks,
>>>>>> > >
>>>>>> > > Manuel
>>>>>> > >
>>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <
>>>>>> knepley at gmail.com> wrote:
>>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <
>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>> > > I did a PetscBarrier just before calling the vicariate routine
>>>>>> and im pretty sure im calling it from every processor, code looks like this:
>>>>>> > >
>>>>>> > > From the gdb trace.
>>>>>> > >
>>>>>> > >   Proc 0: Is in some MPI routine you call yourself, line 113
>>>>>> > >
>>>>>> > >   Proc 1: Is in VecCreate(), line 130
>>>>>> > >
>>>>>> > > You need to fix your communication code.
>>>>>> > >
>>>>>> > >    Matt
>>>>>> > >
>>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>> > >
>>>>>> > > print*,'entering POInit from',rank
>>>>>> > > !call exit()
>>>>>> > >
>>>>>> > > call PetscObjsInit()
>>>>>> > >
>>>>>> > >
>>>>>> > > And output gives:
>>>>>> > >
>>>>>> > >  entering POInit from           0
>>>>>> > >  entering POInit from           1
>>>>>> > >  entering POInit from           2
>>>>>> > >  entering POInit from           3
>>>>>> > >
>>>>>> > >
>>>>>> > > Still hangs in the same way,
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > >
>>>>>> > > Manuel
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <
>>>>>> mvalera at mail.sdsu.edu> wrote:
>>>>>> > > Thanks for the answers !
>>>>>> > >
>>>>>> > > heres the screenshot of what i got from bt in gdb (great hint in
>>>>>> how to debug in petsc, didn't know that)
>>>>>> > >
>>>>>> > > I don't really know what to look at here,
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > >
>>>>>> > > Manuel
>>>>>> > >
>>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>>> wrote:
>>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>>>>> function(s). These functions cannot be inside if statements like
>>>>>> > > if (rank == 0){
>>>>>> > >   VecCreateMPI(...)
>>>>>> > > }
>>>>>> > >
>>>>>> > >
>>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>>> > >
>>>>>> > > I just tried that and it didn't make a difference, any other
>>>>>> suggestions ?
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Manuel
>>>>>> > >
>>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhem23 at gmail.com>
>>>>>> wrote:
>>>>>> > > You need to swap the order of your function calls.
>>>>>> > > Call VecSetSizes() before VecSetType()
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > >   Dave
>>>>>> > >
>>>>>> > >
>>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mvalera at mail.sdsu.edu>
>>>>>> wrote:
>>>>>> > > Hello all, happy new year,
>>>>>> > >
>>>>>> > > I'm working on parallelizing my code, it worked and provided some
>>>>>> results when i just called more than one processor, but created artifacts
>>>>>> because i didn't need one image of the whole program in each processor,
>>>>>> conflicting with each other.
>>>>>> > >
>>>>>> > > Since the pressure solver is the main part i need in parallel im
>>>>>> chosing mpi to run everything in root processor until its time to solve for
>>>>>> pressure, at this point im trying to create a distributed vector using
>>>>>> either
>>>>>> > >
>>>>>> > >      call VecCreateMPI(PETSC_COMM_WORLD,
>>>>>> PETSC_DECIDE,nbdp,xp,ierr)
>>>>>> > > or
>>>>>> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>>> > >      call VecSetType(xp,VECMPI,ierr)
>>>>>> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>>> > >
>>>>>> > >
>>>>>> > > In both cases program hangs at this point, something it never
>>>>>> happened on the naive way i described before. I've made sure the global
>>>>>> size, nbdp, is the same in every processor. What can be wrong?
>>>>>> > >
>>>>>> > > Thanks for your kind help,
>>>>>> > >
>>>>>> > > Manuel.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> > > -- Norbert Wiener
>>>>>> > >
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170107/454960c8/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2017-01-07 at 2.47.26 PM.png
Type: image/png
Size: 393404 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170107/454960c8/attachment-0001.png>


More information about the petsc-users mailing list