[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

Manuel Valera mvalera at mail.sdsu.edu
Sat Jan 7 15:32:57 CST 2017


Hi Devs, hope you are having a great weekend,

I could finally parallelize my linear solver and implement it into the rest
of the code in a way that only the linear system is solved in parallel, which
is great news for my team. But there is a catch: I don't see any speedup in
the linear system. I don't know if it's the MPI on the cluster we are using,
and I'm not sure how to debug it.
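
(A common first diagnostic, sketched here with a placeholder executable name
`./myapp`, is to run the same problem at 1 and 2 ranks and compare the
KSPSolve, MatMult and VecScatter rows of the -log_view output; if MatMult
barely speeds up, the solve may simply be memory-bandwidth limited on that
node rather than an MPI problem.)

```shell
# Run the identical problem on 1 and 2 ranks and compare the
# KSPSolve / MatMult / VecScatter rows printed by -log_view.
mpirun -n 1 ./myapp -log_view
mpirun -n 2 ./myapp -log_view
```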

On the other hand, because of this issue I was trying -log_summary
or -log_view, and I noticed that in this context the program hangs when it
is time to produce the log. If I debug this with 2 cores, process 0 exits
normally but process 1 hangs in a VecScatterBegin() with SCATTER_REVERSE
from way back in the code, even after destroying all the associated objects
and calling PetscFinalize(). So I'm really clueless about why this happens,
as it only happens with the -log_* or -ksp_view options.
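
(For reference, the pattern suggested downthread, where only rank 0 fills the
vector but every rank participates in the scatter, looks roughly like this;
a sketch only, reusing the names bp0, bp2, ctr, nbdp, ind, Rhs from the code
quoted below, and assuming ctr came from VecScatterCreateToZero:)

```fortran
! Only rank 0 inserts values into the sequential vector bp0,
! and assembles it locally.
if (rankl == 0) then
   call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr); CHKERRQ(ierr)
   call VecAssemblyBegin(bp0,ierr); call VecAssemblyEnd(bp0,ierr)
   CHKERRQ(ierr)
end if

! The scatter is collective: EVERY rank must call it, outside any
! rank-guarded if block, or the other ranks hang waiting here.
call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
CHKERRQ(ierr)
```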

My -ksp_view output shows this:

 KSP Object: 2 MPI processes

  type: gcr

    GCR: restart = 30

    GCR: restarts performed = 20

  maximum iterations=10000, initial guess is zero

  tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.

  right preconditioning

  using UNPRECONDITIONED norm type for convergence test

PC Object: 2 MPI processes

  type: bjacobi

    block Jacobi: number of blocks = 2

    Local solve is same for all blocks, in the following KSP and PC objects:

  KSP Object:  (sub_)   1 MPI processes

    type: preonly

    maximum iterations=10000, initial guess is zero

    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.

    left preconditioning

    using NONE norm type for convergence test

  PC Object:  (sub_)   1 MPI processes

    type: ilu

      ILU: out-of-place factorization

      0 levels of fill

      tolerance for zero pivot 2.22045e-14

      matrix ordering: natural

      factor fill ratio given 1., needed 1.

        Factored matrix follows:

          Mat Object:           1 MPI processes

            type: seqaij

            rows=100000, cols=100000

            package used to perform factorization: petsc

            total: nonzeros=1675180, allocated nonzeros=1675180

            total number of mallocs used during MatSetValues calls =0

              not using I-node routines

    linear system matrix = precond matrix:

    Mat Object:     1 MPI processes

      type: seqaij

      rows=100000, cols=100000

      total: nonzeros=1675180, allocated nonzeros=1675180

      total number of mallocs used during MatSetValues calls =0

        not using I-node routines

  linear system matrix = precond matrix:

  Mat Object:   2 MPI processes

    type: mpiaij

    rows=200000, cols=200000

    total: nonzeros=3373340, allocated nonzeros=3373340

    total number of mallocs used during MatSetValues calls =0

      not using I-node (on process 0) routines



And I configured my PC object as:


   call PCSetType(mg,PCHYPRE,ierr)

   call PCHYPRESetType(mg,'boomeramg',ierr)


   call PetscOptionsSetValue(PETSC_NULL_OBJECT, &
        'pc_hypre_boomeramg_nodal_coarsen','1',ierr)

   call PetscOptionsSetValue(PETSC_NULL_OBJECT, &
        'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
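
(Side note: the -ksp_view above reports bjacobi/ilu rather than hypre, which
suggests these settings never actually reached the solver. One common cause
is ordering: options placed in the database with PetscOptionsSetValue() only
take effect if they are set before the corresponding KSPSetFromOptions() /
PCSetFromOptions() call. A minimal sketch of the ordering, assuming a KSP
object named ksp whose PC is mg:)

```fortran
! Put the hypre options in the database first ...
call PetscOptionsSetValue(PETSC_NULL_OBJECT, &
     'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
call PCSetType(mg,PCHYPRE,ierr)
call PCHYPRESetType(mg,'boomeramg',ierr)
! ... then let the solver read them; calling this earlier (or never)
! means the options above are silently ignored.
call KSPSetFromOptions(ksp,ierr)
```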



What are your thoughts?

Thanks,

Manuel



On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:

> Awesome, that did it, thanks once again.
>
>
> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>    Take the scatter out of the if () since everyone does it and get rid
>> of the VecView().
>>
>>    Does this work? If not where is it hanging?
>>
>>
>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> >
>> > Thanks Dave,
>> >
>> > I think it is interesting that it never gave an error on this. After adding
>> the VecAssembly calls it still shows the same behavior, without complaining.
>> I did:
>> >
>> > if(rankl==0)then
>> >
>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>> >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr);
>> >      CHKERRQ(ierr)
>> >
>> endif
>> >
>> >
>> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >      print*,"done! "
>> >      CHKERRQ(ierr)
>> >
>> >
>> >        CHKERRQ(ierr)
>> >
>> >
>> > Thanks.
>> >
>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>> >
>> >
>> > On 6 January 2017 at 20:24, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > Great help Barry, I had totally overlooked that option (it is explicit
>> in the VecScatterBegin help page but not in VecScatterCreateToZero, as
>> I read later).
>> >
>> > So I used that and it works partially: it scatters the values assigned
>> in root but not the rest, and if I call VecScatterBegin from outside root it
>> hangs. The code currently looks like this:
>> >
>> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>> >
>> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
>> >
>> > if(rankl==0)then
>> >
>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>> >
>> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> >
>> >
>> > You need to call
>> >
>> >   VecAssemblyBegin(bp0);
>> >   VecAssemblyEnd(bp0);
>> > after your last call to VecSetValues() before you can do any operations
>> with bp0.
>> >
>> > With your current code, the call to VecView should produce an error if
>> you used the error checking macro CHKERRQ(ierr) (as should
>> VecScatter{Begin,End}
>> >
>> > Thanks,
>> >   Dave
>> >
>> >
>> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >      print*,"done! "
>> >      CHKERRQ(ierr)
>> >
>> > endif
>> >
>> >    ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >    !  call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>> >
>> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> >
>> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>> >
>> >   call exit()
>> >
>> >
>> >
>> > And the output is: (with bp the right answer)
>> >
>> > Vec Object:bp: 2 MPI processes
>> >   type: mpi
>> > Process [0]
>> > 1.
>> > 2.
>> > Process [1]
>> > 4.
>> > 3.
>> > Vec Object:bp2: 2 MPI processes  (before scatter)
>> >   type: mpi
>> > Process [0]
>> > 0.
>> > 0.
>> > Process [1]
>> > 0.
>> > 0.
>> > Vec Object:bp0: 1 MPI processes
>> >   type: seq
>> > 1.
>> > 2.
>> > 4.
>> > 3.
>> >  done!
>> > Vec Object:bp2: 2 MPI processes  (after scatter)
>> >   type: mpi
>> > Process [0]
>> > 1.
>> > 2.
>> > Process [1]
>> > 0.
>> > 0.
>> >
>> >
>> >
>> >
>> > Thanks immensely for your help,
>> >
>> > Manuel
>> >
>> >
>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > >
>> > > Hello Devs is me again,
>> > >
>> > > I'm trying to distribute a vector to all the processes. The vector
>> would originally be in root as a sequential vector, and I would like to
>> scatter it; what would be the best call to do this?
>> > >
>> > > I already know how to gather a distributed vector to root with
>> VecScatterCreateToZero, this would be the inverse operation,
>> >
>> >    Use the same VecScatter object but with SCATTER_REVERSE; note that you
>> need to reverse the two vector arguments as well.
>> >
>> >
>> > > I'm currently trying with VecScatterCreate(), and as of now I'm doing
>> the following:
>> > >
>> > >
>> > > if(rank==0)then
>> > >
>> > >
>> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) ! if I use WORLD it freezes in SetSizes
>> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>> > >      call VecSetType(bp0,VECSEQ,ierr)
>> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>> > >
>> > >
>> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>> > >
>> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>> > >
>> > >
>> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> > >
>> > >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr)
>> !rhs
>> > >
>> > >      do i=0,nbdp-1,1
>> > >         ind(i+1) = i
>> > >      enddo
>> > >
>> > >      call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>> > >
>> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) ! if I use SELF it freezes here.
>> > >
>> > >      call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
>> > >
>> > > endif
>> > >
>> > > bp2 being the receiving MPI vector to scatter into.
>> > >
>> > > But it freezes in VecScatterCreate when trying to use more than one
>> processor. What would be a better approach?
>> > >
>> > >
>> > > Thanks once again,
>> > >
>> > > Manuel
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > > Thanks, I had no idea how to debug and read those logs; that solved
>> this issue at least (I was sending a message from root to everyone else,
>> but trying to receive from everyone else including root).
>> > >
>> > > Until next time, many thanks,
>> > >
>> > > Manuel
>> > >
>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > > I did a PetscBarrier just before calling the vicariate routine, and I'm
>> pretty sure I'm calling it from every processor; the code looks like this:
>> > >
>> > > From the gdb trace.
>> > >
>> > >   Proc 0: Is in some MPI routine you call yourself, line 113
>> > >
>> > >   Proc 1: Is in VecCreate(), line 130
>> > >
>> > > You need to fix your communication code.
>> > >
>> > >    Matt
>> > >
>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>> > >
>> > > print*,'entering POInit from',rank
>> > > !call exit()
>> > >
>> > > call PetscObjsInit()
>> > >
>> > >
>> > > And output gives:
>> > >
>> > >  entering POInit from           0
>> > >  entering POInit from           1
>> > >  entering POInit from           2
>> > >  entering POInit from           3
>> > >
>> > >
>> > > Still hangs in the same way,
>> > >
>> > > Thanks,
>> > >
>> > > Manuel
>> > >
>> > >
>> > >
>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > > Thanks for the answers !
>> > >
>> > > Here's the screenshot of what I got from bt in gdb (great hint on how
>> to debug in PETSc, I didn't know that).
>> > >
>> > > I don't really know what to look at here,
>> > >
>> > > Thanks,
>> > >
>> > > Manuel
>> > >
>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these function(s)?
>> These functions cannot be inside if statements like
>> > > if (rank == 0){
>> > >   VecCreateMPI(...)
>> > > }
>> > >
>> > >
>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > > Thanks Dave for the quick answer, appreciate it,
>> > >
>> > > I just tried that and it didn't make a difference. Any other
>> suggestions?
>> > >
>> > > Thanks,
>> > > Manuel
>> > >
>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>> > > You need to swap the order of your function calls.
>> > > Call VecSetSizes() before VecSetType()
>> > >
>> > > Thanks,
>> > >   Dave
>> > >
>> > >
>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mvalera at mail.sdsu.edu>
>> wrote:
>> > > Hello all, happy new year,
>> > >
>> > > I'm working on parallelizing my code, it worked and provided some
>> results when i just called more than one processor, but created artifacts
>> because i didn't need one image of the whole program in each processor,
>> conflicting with each other.
>> > >
>> > > Since the pressure solver is the main part i need in parallel im
>> chosing mpi to run everything in root processor until its time to solve for
>> pressure, at this point im trying to create a distributed vector using
>> either
>> > >
>> > >      call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>> > > or
>> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>> > >      call VecSetType(xp,VECMPI,ierr)
>> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>> > >
>> > >
>> > > In both cases the program hangs at this point, something that never
>> happened with the naive approach I described before. I've made sure the
>> global size, nbdp, is the same in every processor. What can be wrong?
>> > >
>> > > Thanks for your kind help,
>> > >
>> > > Manuel.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > > -- Norbert Wiener
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>>
>