[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

Barry Smith bsmith at mcs.anl.gov
Sat Jan 7 17:11:21 CST 2017


   Put a break point in PetscFinalize() Do both processes get to it? Or does one get there while the other is "somewhere in the solver". My guess is that your program on process 0 has decided to end solving while the other processes think there are more solves to do hence they are waiting for the next solve to start up while process 0 has gotten to PetscFinalize().




> On Jan 7, 2017, at 4:59 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> 
> I would have to think and code a MWE for this problem before sending it since the model is much bigger than the petsc solver. Attached here is a screenshot of the debugger as barry taught me, is that the stack trace you need ?
> 
> the ucmsMain.f90:522 that shows is the call (from all processes) to the routine that updates the rhs vector (from root) and scatters it (from all processes). 
> 
> This routine is itself inside a double loop that occurs in all processes but the only call from all processes to the solver is this one, the rest of the loop which involves correcting for velocities, pressure and temperature, all happens in root node.
> 
> Sorry for the convoluted program design, this is the first beta version of the model working on parallel and was the best i could come with, i suppose it makes more sense in serial,
> 
> Thanks 
> 
> On Sat, Jan 7, 2017 at 2:24 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> Thank you Matthew, 
> 
> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> Hi Devs, hope you are having a great weekend,
> 
> I could finally parallelize my linear solver and implement it into the rest of the code in a way that only the linear system is solved in parallel, great news for my team, but there is a catch and is that i don't see any speedup in the linear system, i don't know if its the MPI in the cluster we are using, but im not sure on how to debug it,
> 
> We need to see -log_view output for any performance question.
>  
> On the other hand and because of this issue i was trying to do -log_summary or -log_view and i noticed the program in this context hangs when is time of producing the log, if i debug this for 2 cores, process 0 exits normally but process 1 hangs in the vectorscatterbegin() with scatter_reverse way back in the code,
> 
> You are calling a collective routine from only 1 process.
>  
>   Matt
> 
> I am pretty confident this is not the case,
> 
> This is still the simplest explanation. Can you send the stack trace for the 2 process run?
>  
> the callings to vecscattercreatetozero and vecscatterbegin are made in all processes, the program goes thru all of the iterations on the linear solver, writes output correctly and even closes all the petsc objects without complaining, the freeze occurs at the very end when the log is to be produced.
> 
> If you can send us a code to run, we can likely find the error.
> 
>   Thanks,
> 
>      Matt
>  
> Thanks,
> 
> Manuel
> 
>  
>  
> and even after destroying all associated objects and calling petscfinalize(), so im really clueless on why is this, as it only happens for -log_* or -ksp_view options.
> 
> my -ksp_view shows this:
> 
>  KSP Object: 2 MPI processes
>   type: gcr
>     GCR: restart = 30 
>     GCR: restarts performed = 20 
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 2 MPI processes
>   type: bjacobi
>     block Jacobi: number of blocks = 2
>     Local solve is same for all blocks, in the following KSP and PC objects:
>   KSP Object:  (sub_)   1 MPI processes
>     type: preonly
>     maximum iterations=10000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using NONE norm type for convergence test
>   PC Object:  (sub_)   1 MPI processes
>     type: ilu
>       ILU: out-of-place factorization
>       0 levels of fill
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: natural
>       factor fill ratio given 1., needed 1.
>         Factored matrix follows:
>           Mat Object:           1 MPI processes
>             type: seqaij
>             rows=100000, cols=100000
>             package used to perform factorization: petsc
>             total: nonzeros=1675180, allocated nonzeros=1675180
>             total number of mallocs used during MatSetValues calls =0
>               not using I-node routines
>     linear system matrix = precond matrix:
>     Mat Object:     1 MPI processes
>       type: seqaij
>       rows=100000, cols=100000
>       total: nonzeros=1675180, allocated nonzeros=1675180
>       total number of mallocs used during MatSetValues calls =0
>         not using I-node routines
>   linear system matrix = precond matrix:
>   Mat Object:   2 MPI processes
>     type: mpiaij
>     rows=200000, cols=200000
>     total: nonzeros=3373340, allocated nonzeros=3373340
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
> 
> 
> And i configured my PC object as:
> 
>    call PCSetType(mg,PCHYPRE,ierr)
>    call PCHYPRESetType(mg,'boomeramg',ierr)
> 
>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
> 
> 
> What are your thoughts ?
> 
> Thanks,
> 
> Manuel 
> 
> 
> 
> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> Awesome, that did it, thanks once again.
> 
> 
> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Take the scatter out of the if () since everyone does it and get rid of the VecView().
> 
>    Does this work? If not where is it hanging?
> 
> 
> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> >
> > Thanks Dave,
> >
> > I think is interesting it never gave an error on this, after adding the vecassembly calls it still shows the same behavior, without complaining, i did:
> >
> > if(rankl==0)then
> >
> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
> >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr);
> >      CHKERRQ(ierr)
> >
> endif
> >
> >
> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >      print*,"done! "
> >      CHKERRQ(ierr)
> >
> >
> >        CHKERRQ(ierr)
> >
> >
> > Thanks.
> >
> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> >
> >
> > On 6 January 2017 at 20:24, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > Great help Barry, i totally had overlooked that option (it is explicit in the vecscatterbegin call help page but not in vecscattercreatetozero, as i read later)
> >
> > So i used that and it works partially, it scatters te values assigned in root but not the rest, if i call vecscatterbegin from outside root it hangs, the code currently look as this:
> >
> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
> >
> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
> >
> > if(rankl==0)then
> >
> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
> >
> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
> >
> >
> > You need to call
> >
> >   VecAssemblyBegin(bp0);
> >   VecAssemblyEnd(bp0);
> > after your last call to VecSetValues() before you can do any operations with bp0.
> >
> > With your current code, the call to VecView should produce an error if you used the error checking macro CHKERRQ(ierr) (as should VecScatter{Begin,End}
> >
> > Thanks,
> >   Dave
> >
> >
> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >      print*,"done! "
> >      CHKERRQ(ierr)
> >
> > endif
> >
> >    ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >    !  call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
> >
> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
> >
> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
> >
> >   call exit()
> >
> >
> >
> > And the output is: (with bp the right answer)
> >
> > Vec Object:bp: 2 MPI processes
> >   type: mpi
> > Process [0]
> > 1.
> > 2.
> > Process [1]
> > 4.
> > 3.
> > Vec Object:bp2: 2 MPI processes  (before scatter)
> >   type: mpi
> > Process [0]
> > 0.
> > 0.
> > Process [1]
> > 0.
> > 0.
> > Vec Object:bp0: 1 MPI processes
> >   type: seq
> > 1.
> > 2.
> > 4.
> > 3.
> >  done!
> > Vec Object:bp2: 2 MPI processes  (after scatter)
> >   type: mpi
> > Process [0]
> > 1.
> > 2.
> > Process [1]
> > 0.
> > 0.
> >
> >
> >
> >
> > Thanks inmensely for your help,
> >
> > Manuel
> >
> >
> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > >
> > > Hello Devs is me again,
> > >
> > > I'm trying to distribute a vector to all called processes, the vector would be originally in root as a sequential vector and i would like to scatter it, what would the best call to do this ?
> > >
> > > I already know how to gather a distributed vector to root with VecScatterCreateToZero, this would be the inverse operation,
> >
> >    Use the same VecScatter object but with SCATTER_REVERSE, not you need to reverse the two vector arguments as well.
> >
> >
> > > i'm currently trying with VecScatterCreate() and as of now im doing the following:
> > >
> > >
> > > if(rank==0)then
> > >
> > >
> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if i use WORLD
> > >                                                              !freezes in SetSizes
> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
> > >      call VecSetType(bp0,VECSEQ,ierr)
> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
> > >
> > >
> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
> > >
> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
> > >
> > >
> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
> > >
> > >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr) !rhs
> > >
> > >      do i=0,nbdp-1,1
> > >         ind(i+1) = i
> > >      enddo
> > >
> > >      call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
> > >
> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) !if i use SELF
> > >                                                                   !freezes here.
> > >
> > >      call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
> > >
> > > endif
> > >
> > > bp2 being the receptor MPI vector to scatter to
> > >
> > > But it freezes in VecScatterCreate when trying to use more than one processor, what would be a better approach ?
> > >
> > >
> > > Thanks once again,
> > >
> > > Manuel
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > > Thanks i had no idea how to debug and read those logs, that solved this issue at least (i was sending a message from root to everyone else, but trying to catch from everyone else including root)
> > >
> > > Until next time, many thanks,
> > >
> > > Manuel
> > >
> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > > I did a PetscBarrier just before calling the vicariate routine and im pretty sure im calling it from every processor, code looks like this:
> > >
> > > From the gdb trace.
> > >
> > >   Proc 0: Is in some MPI routine you call yourself, line 113
> > >
> > >   Proc 1: Is in VecCreate(), line 130
> > >
> > > You need to fix your communication code.
> > >
> > >    Matt
> > >
> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
> > >
> > > print*,'entering POInit from',rank
> > > !call exit()
> > >
> > > call PetscObjsInit()
> > >
> > >
> > > And output gives:
> > >
> > >  entering POInit from           0
> > >  entering POInit from           1
> > >  entering POInit from           2
> > >  entering POInit from           3
> > >
> > >
> > > Still hangs in the same way,
> > >
> > > Thanks,
> > >
> > > Manuel
> > >
> > >
> > >
> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > > Thanks for the answers !
> > >
> > > heres the screenshot of what i got from bt in gdb (great hint in how to debug in petsc, didn't know that)
> > >
> > > I don't really know what to look at here,
> > >
> > > Thanks,
> > >
> > > Manuel
> > >
> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these function(s). These functions cannot be inside if statements like
> > > if (rank == 0){
> > >   VecCreateMPI(...)
> > > }
> > >
> > >
> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > > Thanks Dave for the quick answer, appreciate it,
> > >
> > > I just tried that and it didn't make a difference, any other suggestions ?
> > >
> > > Thanks,
> > > Manuel
> > >
> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> > > You need to swap the order of your function calls.
> > > Call VecSetSizes() before VecSetType()
> > >
> > > Thanks,
> > >   Dave
> > >
> > >
> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> > > Hello all, happy new year,
> > >
> > > I'm working on parallelizing my code, it worked and provided some results when i just called more than one processor, but created artifacts because i didn't need one image of the whole program in each processor, conflicting with each other.
> > >
> > > Since the pressure solver is the main part i need in parallel im chosing mpi to run everything in root processor until its time to solve for pressure, at this point im trying to create a distributed vector using either
> > >
> > >      call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
> > > or
> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
> > >      call VecSetType(xp,VECMPI,ierr)
> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
> > >
> > >
> > > In both cases program hangs at this point, something it never happened on the naive way i described before. I've made sure the global size, nbdp, is the same in every processor. What can be wrong?
> > >
> > > Thanks for your kind help,
> > >
> > > Manuel.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > > -- Norbert Wiener
> > >
> > >
> >
> >
> >
> >
> 
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> <Screen Shot 2017-01-07 at 2.47.26 PM.png>



More information about the petsc-users mailing list