[petsc-users] low dimensional sub-problem
Barry Smith
bsmith at mcs.anl.gov
Sat Feb 25 16:57:51 CST 2017
> On Feb 25, 2017, at 11:00 AM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Sat, Feb 25, 2017 at 10:12 AM, Gideon Simpson <gideon.simpson at gmail.com> wrote:
>
> -gideon
>
>> On Feb 25, 2017, at 10:44 AM, Jed Brown <jed at jedbrown.org> wrote:
>>
>> Gideon Simpson <gideon.simpson at gmail.com> writes:
>>
>>> I’ve been continuing to work on implementing a projection method, which, loosely, looks like the following. Vec y contains the state vector for my system, y’ = f(y), which is solved with a TS using, for now, rk4.
>>>
>>> I have added to this a TSPostStep which does root finding on the nonlinear problem
>>>
>>> f(lambda) = f(lambda; y, w) = g(y + lambda * w) - g0
>>
>> You might want to treat this as a DAE.
>
> Yea, we discussed DAE previously. For the sake of a reviewer, I’m trying to implement a classical method (explicit time step + projection), rather than DAE.
>
>>
>>> y and w are vectors that are the size of the system (numbers of mesh points), and lambda is a scalar perturbation.
>>>
>>> Right now, the snes function for solving this amounts to:
>>>
>>> PetscErrorCode projector(SNES snes, Vec lam, Vec f, void *ctx){
>>>
>>> AppCtx * user = (AppCtx *) ctx;
>>> const PetscScalar *lam_value;
>>> PetscScalar *f_value;
>>>
>>> VecGetArrayRead(lam, &lam_value);
>>> VecGetArray(f, &f_value);
>>> VecCopy(user->y, user->w);
If you do make the SNES run on PETSC_COMM_WORLD, where lam is now a parallel vector with 1 entry on process 0 and zero entries on all the others (and similarly for f), you will need to broadcast lam_value[0] to all processes before calling the VecAXPY().
>>> VecAXPY(user->w, lam_value[0], user->gradMy);
>>>
>>> f_value[0] = g(user->w) - user->g0;
>>> VecRestoreArrayRead(lam, &lam_value);
>>> VecRestoreArray(f, &f_value);
>>>
>>> return 0;
>>> }
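A minimal sketch of how that broadcast might look in the PETSC_COMM_WORLD setup described above. The name projector_world is made up here; g(), AppCtx, and the fields y, w, gradMy, g0 are taken from the snippet, so treat this as an illustration rather than the exact implementation:

  PetscErrorCode projector_world(SNES snes, Vec lam, Vec f, void *ctx)
  {
    AppCtx            *user = (AppCtx *) ctx;
    const PetscScalar *lam_value;
    PetscScalar       *f_value, lam0 = 0.0, gval;
    PetscMPIInt       rank;

    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    /* lam has its single entry on process 0; pull it out and share it */
    VecGetArrayRead(lam, &lam_value);
    if (!rank) lam0 = lam_value[0];
    VecRestoreArrayRead(lam, &lam_value);
    MPI_Bcast(&lam0, 1, MPIU_SCALAR, 0, PETSC_COMM_WORLD);

    /* w = y + lambda * gradMy, done collectively on the distributed vectors */
    VecCopy(user->y, user->w);
    VecAXPY(user->w, lam0, user->gradMy);

    /* g() is collective (it ends in an MPI_Allreduce), so call it on every rank;
       the one-entry residual is only written on process 0 */
    gval = g(user->w);
    VecGetArray(f, &f_value);
    if (!rank) f_value[0] = gval - user->g0;
    VecRestoreArray(f, &f_value);
    return 0;
  }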
>>>
>>> To get this all to work, I constructed the SNES and Vec lam with PETSC_COMM_SELF, effectively putting a copy of the nonlinear problem on each process. Basically, the nonlinear problem is low dimensional, but it parametrically depends on the high dimensional, distributed, vectors y and w.
>>>
>>> The one thing that bothers me about this is:
>>>
>>> 1. It would seem that each process is doing all of these vector operations, which is entirely redundant, as the only thing that really needs to be shared amongst the processes is the value of lambda. Is that correct?
>>
>> Only the SNES part is redundant, but it's super cheap because it's a
>> scalar problem. The Vec operations are using global (parallel,
>> distributed) vectors, so they are not redundant. Of course it is
>> critical for the SNES on every process to be configured identically and
>> deterministic so that the processes don't diverge. And g(user->w) must
>> return the same value on each process (it probably finishes with an
>> MPI_Allreduce or similar).
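For concreteness, a minimal sketch of the shape Jed is describing for g(): every rank assembles its local contribution and the function finishes with an MPI_Allreduce so all ranks return the same value. The quadratic sum is just a placeholder integrand; the real g() evidently works through a DMDA and local vectors, as mentioned further down:

  PetscScalar g(Vec w)
  {
    const PetscScalar *w_array;
    PetscScalar        local = 0.0, global = 0.0;
    PetscInt           i, n;

    VecGetLocalSize(w, &n);
    VecGetArrayRead(w, &w_array);
    for (i = 0; i < n; i++) local += w_array[i] * w_array[i];  /* placeholder integrand */
    VecRestoreArrayRead(w, &w_array);

    /* every rank contributes its local piece and gets back the same global value */
    MPI_Allreduce(&local, &global, 1, MPIU_SCALAR, MPIU_SUM, PETSC_COMM_WORLD);
    return global;
  }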
>
> So within the root finding problem, when it calls, for instance, VecCopy (which is acting on distributed vectors), it’s not doing that once on process 0, once on process 1, once on process 2, etc. Or it is, but you’re saying it’s too cheap to matter?
Each process is copying ITS part of the vector, which is exactly what you need.
>
> No, they are collective. That is why Jed said it's crucial that the same branches are taken on all processes.
>
> Two things I’m mildly concerned about. First, since user->w changes at each step of the solver (because the lambda scalar changes), if the SNESes are running asynchronously, by virtue of being PETSC_COMM_SELF constructs, couldn’t there be a collision?
>
> Yes, as I say above. I think it's much safer to run a COMM_WORLD SNES that just has nothing on most procs than the per-process COMM_SELF approach,
> since you are already synchronizing everywhere.
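A sketch of the setup Matt is suggesting, assuming the names from the earlier snippet: the SNES and the one-entry lam vector live on PETSC_COMM_WORLD, with the single entry owned by process 0 (projector_world is the hypothetical residual sketched above):

  PetscMPIInt rank;
  Vec         f;

  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  /* one global entry, locally owned only by process 0 */
  VecCreateMPI(PETSC_COMM_WORLD, rank ? 0 : 1, 1, &user->lam);
  VecDuplicate(user->lam, &f);

  SNESCreate(PETSC_COMM_WORLD, &user->snes);
  SNESSetFunction(user->snes, f, projector_world, user);
  SNESSetFromOptions(user->snes);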
>
> The operation g(user->w) does indeed conclude with an MPI_Allreduce, followed by DMDAVecRestoreArray and DMRestoreLocalVector.
>
> Lastly, in the TSPostStep, it runs as:
>
> SNESSolve(user->snes, NULL,user->lam);
> VecGetArray(user->lam, &lam_value);
Again, here you would need to broadcast the user->lam value to all processes before calling the VecAXPY(), since it is only stored on process zero.
> VecAXPY(y, lam_value[0], user->gradMy);
> VecRestoreArray(user->lam, &lam_value);
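In that COMM_WORLD setup the post-step might look roughly like this; only a sketch, with lam0 and the broadcast being the additions Barry describes:

  PetscScalar        lam0 = 0.0;
  const PetscScalar *lam_value;
  PetscMPIInt        rank;

  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  SNESSolve(user->snes, NULL, user->lam);

  /* the converged lambda lives only on process 0, so broadcast it */
  VecGetArrayRead(user->lam, &lam_value);
  if (!rank) lam0 = lam_value[0];
  VecRestoreArrayRead(user->lam, &lam_value);
  MPI_Bcast(&lam0, 1, MPIU_SCALAR, 0, PETSC_COMM_WORLD);

  VecAXPY(y, lam0, user->gradMy);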
>
> Is there any communication issue with ensuring that all the SNESes have finished on each process before proceeding to do the vector operations?
>
> No.
>
> I keep wondering if I should just write this as a global (COMM_WORLD) problem, with the data structures stored on process 0, just to avoid this kind of headache.
>
> Yes.
>
> Matt
>
>>
>>> 2. Is there an obvious way around this redundancy?
>>>
>>> -gideon
>
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener