[petsc-users] Parallel TS for ODE

Matthew Knepley knepley at gmail.com
Tue Apr 20 14:35:41 CDT 2021


On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda <brardafrancesco at gmail.com>
wrote:

> Thank you for the advice. I would just like to convert the code I already
> have, to see what happens once it is parallelized.
> Do you think it is better to put the 3 equations into a 1d Distributed
> Array with 3 dofs and run the job with multiple procs, regardless of how
> many equations I have? Is that possible?
>

If you plan in the end to use a structured grid, this is a great plan. If
not, this is not a good plan.

  Thanks,

     Matt
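
For concreteness, here is a minimal sketch (not code from this thread) of what a 1d Distributed Array with 3 dofs per grid point attached to a TS could look like. The grid size of 64 points is a placeholder, and the callbacks and solve are only indicated in comments:

#include <petscts.h>
#include <petscdmda.h>

int main(int argc,char **argv)
{
  TS             ts;
  DM             da;
  Vec            X;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* 64 grid points (placeholder), 3 dofs per point (S, I, R), stencil width 1 */
  ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,64,3,1,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(da,&X);CHKERRQ(ierr);

  ierr = TSCreate(PETSC_COMM_WORLD,&ts);CHKERRQ(ierr);
  ierr = TSSetDM(ts,da);CHKERRQ(ierr);
  /* TSSetRHSFunction()/TSSetIJacobian() would be set here and TSSolve() called;
     inside the callbacks, DMDAVecGetArrayDOF() gives access as x[i][c] for
     component c at grid point i */
  ierr = TSSetFromOptions(ts);CHKERRQ(ierr);

  ierr = VecDestroy(&X);CHKERRQ(ierr);
  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}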


> Thank you,
> Francesco
>
> On 20 Apr 2021, at 17:57, Stefano Zampini <
> stefano.zampini at gmail.com> wrote:
>
> It does not make sense to parallelize down to 1 equation per process,
> unless that single equation per process is extremely costly.
> Is this work meant to help you understand PETSc's parallelization
> strategy? If so, there are multiple examples in the source tree that you
> can look at to see how matrices and vectors are populated in parallel.
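
As an aside, the usual pattern in those examples is that each rank sets only the global entries it owns and then assembles; a minimal sketch with placeholder values, not taken from any particular example:

#include <petscvec.h>

static PetscErrorCode FillVectorInParallel(Vec F)
{
  PetscErrorCode ierr;
  PetscInt       i,start,end;
  PetscScalar    val;

  PetscFunctionBeginUser;
  /* each rank loops only over the global indices it owns */
  ierr = VecGetOwnershipRange(F,&start,&end);CHKERRQ(ierr);
  for (i = start; i < end; i++) {
    val  = (PetscScalar)i;   /* placeholder value */
    ierr = VecSetValues(F,1,&i,&val,INSERT_VALUES);CHKERRQ(ierr);
  }
  /* assembly completes any off-process insertions */
  ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}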
>
> On Tue, 20 Apr 2021 at 17:52, Francesco Brarda <
> brardafrancesco at gmail.com> wrote:
>
>> Originally the entire code was for 1 proc only. The functions were
>> built with VecGetArray(). While adapting the code for multiple procs, I
>> thought VecGetOwnershipRange was a possible way to distribute the
>> equations in the vector across multiple procs. What do you think?
>>
>> Thank you,
>> Francesco
>>
>> On 20 Apr 2021, at 16:43, Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda <
>> brardafrancesco at gmail.com> wrote:
>>
>>> I was trying to follow Barry's advice some time ago, but I guess that's
>>> not the way he meant it. How should I refer to the values contained in x?
>>> With Distributed Arrays?
>>>
>>
>> That is how you get values from x. However, I cannot understand at all
>> what you are doing with "mybase".
>>
>>    Matt
>>
>>
>>> Thanks
>>> Francesco
>>>
>>>>  Even though it will not scale and will deliver slower performance, it is
>>>> completely possible for you to solve the 3-variable problem using 3 MPI
>>>> ranks. Or 10 MPI ranks. You would just create vectors/matrices with 1
>>>> degree of freedom for the first three ranks and no degrees of freedom for
>>>> the later ranks. During your function evaluation (and Jacobian evaluation)
>>>> for TS you will need to set up the appropriate communication to get the
>>>> values you need on each rank to evaluate the parts of the function
>>>> evaluation needed by that rank. This is true for parallelizing any
>>>> computation.
>>>>
>>>>  Barry
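
As an illustration of the layout Barry describes, a sketch using standard Vec/Mat creation calls; the function name is made up and the preallocation counts assume a dense 3x3 coupling:

#include <petsc.h>

/* Sketch: global size 3, laid out as 1 row on ranks 0-2 and none on any further ranks */
static PetscErrorCode CreateTinyParallelObjects(Vec *X,Mat *J)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       nlocal;

  PetscFunctionBeginUser;
  MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
  nlocal = (rank < 3) ? 1 : 0;
  ierr = VecCreateMPI(PETSC_COMM_WORLD,nlocal,3,X);CHKERRQ(ierr);
  /* each row has 1 nonzero in its 1x1 diagonal block and 2 in the off-diagonal part */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD,nlocal,nlocal,3,3,1,NULL,2,NULL,J);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}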
>>>>
>>>>
>>>
>>>
>>> On 20 Apr 2021, at 15:40, Matthew Knepley <
>>> knepley at gmail.com> wrote:
>>>
>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda <
>>> brardafrancesco at gmail.com> wrote:
>>>
>>>> Hi!
>>>> I tried to implement the SIR model taking into account the fact that I
>>>> will only use 3 MPI ranks at the moment.
>>>> I built vectors and matrices following the examples already available.
>>>> In particular, I defined the required functions (RHSFunction,
>>>> IFunction, IJacobian) in a similar way, as follows:
>>>>
>>>
>>> I don't think this makes sense. You use "mybase" to distinguish between
>>> the 3 procs, which would indicate that each proc has only
>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it
>>> has at least 2 dofs.
>>>
>>>   Thanks,
>>>
>>>      Matt
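
One way to make every rank see all three components before using x[0], x[1], x[2] is to gather the global state onto each rank, along the lines of the communication Barry mentions; this is only a sketch of that idea (function name invented), not necessarily the recommended fix:

#include <petscvec.h>

/* Sketch: gather the whole (3-entry) state onto every rank */
static PetscErrorCode GetFullState(Vec X,VecScatter *scat,Vec *Xall)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecScatterCreateToAll(X,scat,Xall);CHKERRQ(ierr);
  ierr = VecScatterBegin(*scat,X,*Xall,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(*scat,X,*Xall,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  /* VecGetArrayRead(*Xall,&x) now gives x[0], x[1], x[2] on every rank */
  PetscFunctionReturn(0);
}

In practice the scatter and the gathered vector would be created once, kept in the application context, and destroyed at the end, rather than rebuilt at every function evaluation.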
>>>
>>>
>>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx)
>>>> {
>>>>   PetscErrorCode    ierr;
>>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>>   PetscScalar       f;//, *x_localptr;
>>>>   const PetscScalar *x;
>>>>   PetscInt          mybase;
>>>>
>>>>   PetscFunctionBeginUser;
>>>>   ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr);
>>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>>   if (mybase == 0) {
>>>>     f    = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N);
>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>   }
>>>>   if (mybase == 1) {
>>>>     f    = (PetscScalar) (appctx->p1*x[0]*x[1]/appctx->N - appctx->p2*x[1]);
>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>   }
>>>>   if (mybase == 2) {
>>>>     f    = (PetscScalar) (appctx->p2*x[1]);
>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>   }
>>>>   ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>>   ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
>>>>   ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
>>>>   PetscFunctionReturn(0);
>>>> }
>>>>
>>>>
>>>> Whilst for the Jacobian I did:
>>>>
>>>>
>>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec Xdot,PetscReal a,Mat A,Mat B,void *ctx)
>>>> {
>>>>   PetscErrorCode    ierr;
>>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>>   PetscInt          mybase, rowcol[] = {0,1,2};
>>>>   const PetscScalar *x;
>>>>
>>>>   PetscFunctionBeginUser;
>>>>   ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr);
>>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>>   if (mybase == 0) {
>>>>     const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, appctx->p1*x[0]/appctx->N, 0};
>>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>   }
>>>>   if (mybase == 1) {
>>>>     const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - appctx->p1*x[0]/appctx->N + appctx->p2, 0};
>>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>   }
>>>>   if (mybase == 2) {
>>>>     const PetscScalar J[] = {0, - appctx->p2, a};
>>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>   }
>>>>   ierr    = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>>
>>>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>   if (A != B) {
>>>>     ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>     ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>   }
>>>>   PetscFunctionReturn(0);
>>>> }
>>>>
>>>> This code does not give the correct result: the solution stays at the
>>>> initial condition, with both implicit and explicit methods. Is the
>>>> way I defined these objects wrong? How can I fix it?
>>>> I also tried to print the Jacobian with the following commands, but it
>>>> does not work (blank rows and an error message). How should I print the
>>>> Jacobian?
>>>>
>>>> ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr);
>>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>> ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
>>>>
>>>>
>>>> I would very much appreciate any kind of help or advice.
>>>> Best,
>>>> Francesco
>>>>
>>>> On 2 Apr 2021, at 04:45, Barry Smith <bsmith at petsc.dev>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>
>>>>
>>>> On Mar 31, 2021, at 2:53 AM, Francesco Brarda <
>>>> brardafrancesco at gmail.com> wrote:
>>>>
>>>> Hi everyone!
>>>>
>>>> I am trying to solve a system of 3 ODEs (a basic SIR model) with TS.
>>>> Sequentially it works pretty well, but I need to turn it into a
>>>> parallel version.
>>>> I started working with TS not very long ago; there are a few
>>>> questions I’d like to share with you, and I’d be happy to hear any
>>>> advice you may have.
>>>> First of all, do I need to use a DM object even if the model is only
>>>> time dependent? All the examples I found use that object for the
>>>> spatial variable when solving PDEs.
>>>>
>>>>
>>>> Are you considering SIR on a spatial domain? If so, you can parallelize
>>>> your model in the spatial domain using DM. Splitting the three variables in
>>>> the ODE among processors would not scale.
>>>>
>>>>
>>>>  Even though it will not scale and will deliver slower performance, it
>>>> is completely possible for you to solve the 3-variable problem using 3 MPI
>>>> ranks. Or 10 MPI ranks. You would just create vectors/matrices with 1
>>>> degree of freedom for the first three ranks and no degrees of freedom for
>>>> the later ranks. During your function evaluation (and Jacobian evaluation)
>>>> for TS you will need to set up the appropriate communication to get the
>>>> values you need on each rank to evaluate the parts of the function
>>>> evaluation needed by that rank. This is true for parallelizing any
>>>> computation.
>>>>
>>>>  Barry
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hong (Mr.)
>>>>
>>>> When I preallocate space for the Jacobian matrix, is it better to
>>>> specify the local sizes or the global sizes?
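
For what it is worth, both are expressible; a sketch (not from this thread, assuming the usual PETSc headers are included) showing the two ways of setting the sizes plus a per-row preallocation, with placeholder counts:

Mat            A;
PetscInt       nlocal = 1;   /* hypothetical local row count on this rank */
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
/* give local sizes and let PETSc determine the global size ... */
ierr = MatSetSizes(A,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
/* ... or give only the global size and let PETSc decide the layout:
   MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,3,3); */
ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
/* preallocation is per local row: nonzeros in the diagonal block and in the off-diagonal block */
ierr = MatMPIAIJSetPreallocation(A,1,NULL,2,NULL);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A,3,NULL);CHKERRQ(ierr);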
>>>>
>>>> Best,
>>>> Francesco
>>>>
>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>>
>>
>
> --
> Stefano
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/