[petsc-users] help me to solve 6-dim PDE with PETSc with advice

Oleksandr Koshkarov olk548 at mail.usask.ca
Tue Aug 1 12:50:53 CDT 2017


>   I didn't understand this. This is very wasteful. I have no good solution.
The equations for E and B are linear, and moreover I know exactly which 
time solver I need - the implicit Crank–Nicolson method, which conserves 
energy exactly in my case - and I know from the article that it works 
well. So instead of PETSc TS, I can use SNES and do the time 
discretization myself. Inside the SNES residual I will then solve a 
local linear system for E^{n+1} and B^{n+1} each time.
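
For concreteness, here is a minimal sketch of the kind of Crank–Nicolson 
residual I would hand to SNES (RHSEval and AppCtx are just placeholders 
for my operator and its context; error checking is omitted):

    #include <petscsnes.h>

    typedef struct {
      Vec       Xold;  /* state C^{n} (E^{n}, B^{n} live in local storage)     */
      Vec       Fold;  /* RHS at the old state, reused in every SNES iteration */
      PetscReal dt;
    } AppCtx;

    /* placeholder: applies the spatial/spectral operator, solving the small
       linear system for E^{n+1}, B^{n+1} internally                          */
    extern PetscErrorCode RHSEval(Vec X, Vec F, AppCtx *ctx);

    /* Crank-Nicolson residual: F(X) = X - X^n - (dt/2)*(RHS(X) + RHS(X^n))   */
    PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ptr)
    {
      AppCtx *ctx = (AppCtx*)ptr;
      RHSEval(X, F, ctx);              /* F <- RHS(X)             */
      VecAXPY(F, 1.0, ctx->Fold);      /* F <- RHS(X) + RHS(X^n)  */
      VecAYPX(F, -0.5*ctx->dt, X);     /* F <- X - (dt/2)*F       */
      VecAXPY(F, -1.0, ctx->Xold);     /* F <- F - X^n            */
      return 0;
    }

Each time step I would register this with SNESSetFunction(), call 
SNESSolve(), and then update Xold and Fold from the new solution.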

And each processor will store only one local copy of E^{n} and B^{n}.

It will probably even improve the convergence of the overall nonlinear solver.

Does it make sense?

Oleksandr Koshkarov.


On 08/01/2017 10:56 AM, Barry Smith wrote:
>> On Aug 1, 2017, at 11:43 AM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>
>>> So you are saying that E and B don't have any coupling between values at different velocities and hence don't need ghost points?
>> Yes, exactly - E and B depend only on space coordinates.
>     I didn't understand this. This is very wasteful. I have no good solution.
>>> It is just an optimization and you shouldn't worry about it until you have working code.
>> OK, thanks, that is what I will do.
>>
>> Oleksandr Koshkarov
>>
>>
>> On 08/01/2017 10:35 AM, Barry Smith wrote:
>>>> On Aug 1, 2017, at 1:20 AM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>>>
>>>> Thank you, it is very helpful.
>>>>
>>>> I think there is a small problem with this approach. By storing E and B in the DMDA I will communicate their values every time I communicate ghost points. Moreover, it seems that each processor will hold multiple copies of E and B (depending on how big a velocity range it owns). So it seems possible to reduce communication and memory usage by 3 (maybe it is not a big number) if each process holds only one copy of E and B. However, if E and B are stored in a local vector, it is not obvious to me how to construct a time solver that also evolves E and B together with C.
>>>>
>>>     So you are saying that E and B don't have any coupling between values at different velocities and hence don't need ghost points? DMDA is simple; it doesn't have support for keeping ghost points for only some of the variables, but should the communication prove to be time-consuming we could provide a custom version that does not ghost-point E and B. It is just an optimization and you shouldn't worry about it until you have working code.
>>>
>>>    Barry
>>>
>>>
>>>> p.s. Currently I am still working through the PETSc manual, and I have started implementing the approach you proposed.
>>>>
>>>> Best regards,
>>>>
>>>> Oleksandr Koshkarov.
>>>>
>>>>
>>>> On 07/30/2017 04:35 PM, Barry Smith wrote:
>>>>>> On Jul 30, 2017, at 2:04 PM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>>>>>
>>>>>> Yes, it seems I am wrong and you are right.
>>>>>>
>>>>>> I think I can then make the electric and magnetic fields local and do the parallel partitioning only in v space - it will definitely restrict the problem size, but maybe the code will still be useful and I will get PETSc experience with it :) The question here: if I parallelize only along v-space, the 3d DMDA should be enough, right? I do not really understand how I would do the partitioning here, can you please give me some hints?
>>>>>    You could do this; it is tempting due to its simplicity. Let Mv, Nv, and Pv represent the number of grid points in the three velocity directions and Mx, Nx, Px represent the number of grid points in physical space. If you have three unknowns at each grid point, represented by C, E, B, then you would define
>>>>>
>>>>>      dof = 3*Mx*Nx*Px
>>>>>
>>>>> Create your DMDA with DMDACreate3d() using Mv, Nv, Pv and dof; then each process will have a certain number of velocity "grid" points, and at each grid point there will be room for all the physical points (times 3, one for each unknown). So think of it as a three dimensional array with all the spatial-grid-point values stacked on top of each other. If you call
>>>>>
>>>>>    DMDAVecGetArray()
>>>>>
>>>>> the resulting array will be indexed as
>>>>>
>>>>> array[kv][jv][iv][ilocal]
>>>>>
>>>>> where kv is the third velocity grid index etc. and ilocal goes from 0 to 3*Mx*Nx*Px-1. You can decide whether you want to store C, E, B interlaced (meaning array[kv][jv][iv][0] is the first C, array[kv][jv][iv][1] is the first E ...) or non-interlaced (meaning array[kv][jv][iv][0:Mx*Nx*Px] are the C etc.). If you use the non-interlaced layout then applying a 3d FFT to a particular component is straightforward since its values are all stored next to each other.
>>>>>
>>>>>   This is definitely not a scalable approach, but you might be able to have Mx = Nx = Px = 128, which is not bad for a spectral method.
>>>>>
>>>>>   I recommend starting by writing a simple code that does this with 2d space and 2d velocity first; it is just so much easier to reason about a two dimensional array as you get the kinks out of the code.
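>>>>>
>>>>>   A minimal sketch of this setup (the grid sizes below are arbitrary placeholders and error checking is omitted; DMDAVecGetArrayDOF() is the variant of DMDAVecGetArray() that exposes the extra dof index):
>>>>>
>>>>>     #include <petscdmda.h>
>>>>>
>>>>>     int main(int argc, char **argv)
>>>>>     {
>>>>>       DM             da;
>>>>>       Vec            X;
>>>>>       PetscScalar ****x;                         /* x[kv][jv][iv][ilocal]           */
>>>>>       PetscInt       Mv = 16, Nv = 16, Pv = 16;  /* velocity grid (placeholders)    */
>>>>>       PetscInt       Mx = 32, Nx = 32, Px = 32;  /* physical grid (placeholders)    */
>>>>>       PetscInt       dof = 3*Mx*Nx*Px;           /* C, E, B per velocity grid point */
>>>>>       PetscInt       xs, ys, zs, xm, ym, zm, iv, jv, kv, il;
>>>>>
>>>>>       PetscInitialize(&argc, &argv, NULL, NULL);
>>>>>       DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>>>>>                    DMDA_STENCIL_STAR, Mv, Nv, Pv, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
>>>>>                    dof, 1, NULL, NULL, NULL, &da);
>>>>>       DMSetFromOptions(da);
>>>>>       DMSetUp(da);
>>>>>       DMCreateGlobalVector(da, &X);
>>>>>
>>>>>       DMDAVecGetArrayDOF(da, X, &x);
>>>>>       DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);
>>>>>       for (kv = zs; kv < zs+zm; kv++)
>>>>>         for (jv = ys; jv < ys+ym; jv++)
>>>>>           for (iv = xs; iv < xs+xm; iv++)
>>>>>             for (il = 0; il < dof; il++)
>>>>>               x[kv][jv][iv][il] = 0.0;           /* put the initial condition here  */
>>>>>       DMDAVecRestoreArrayDOF(da, X, &x);
>>>>>
>>>>>       VecDestroy(&X);
>>>>>       DMDestroy(&da);
>>>>>       PetscFinalize();
>>>>>       return 0;
>>>>>     }
>>>>>
>>>>>   With the non-interlaced ordering, x[kv][jv][iv][0 .. Mx*Nx*Px-1] would hold the C values, the next Mx*Nx*Px entries the E values, and so on.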
>>>>>
>>>>>    Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Another approach is to abandon the global Fourier transform. Let's say, for education purposes, I pick some finite difference scheme along the spatial dimensions - how would you advise me to partition this problem with PETSc then?
>>>>>>
>>>>>>
>>>>>> I know something like spectral elements or discontinuous Galerkin would be ideal, but I want to get PETSc experience first with a simpler problem and then move to discontinuous Galerkin (that will definitely be the next step). (Unfortunately, I have never worked with unstructured meshes like spectral elements or discontinuous Galerkin before, but I have done some finite difference and spectral methods.)
>>>>>>
>>>>>> On 07/30/2017 12:46 PM, Matthew Knepley wrote:
>>>>>>> On Sun, Jul 30, 2017 at 12:44 PM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>>>>>> Thanks for the references, and warning.
>>>>>>>
>>>>>>> I am trying to do exactly this
>>>>>>> http://www.sciencedirect.com/science/article/pii/S0021999115004738
>>>>>>>
>>>>>>> (I probably should have added the reference in the first place, sorry)
>>>>>>>
>>>>>>> the evolution formula is (15) in the article.
>>>>>>>
>>>>>>> So, I will do the convolution only in 3D (while the full system is 6D) because velocity space is not discretized with Fourier.
>>>>>>>
>>>>>>> So as I understand it, it will not be N^2 vs N log N. Moreover, I know that the FFT-based algorithm has a significantly larger "constant".
>>>>>>>
>>>>>>> Please correct me if I am wrong here :)
>>>>>>>
>>>>>>> I think you are wrong here. First, I do not understand why you think the asymptotics have changed. Second, the constant for
>>>>>>> FFT is about 6, which would be really hard to beat with a quadrature.
>>>>>>>
>>>>>>> Last, this discretization looks fully coupled to me. I cannot see how you would parallelize this right off the bat. This is a problem
>>>>>>> with spectral discretizations, and the main motivator for spectral elements.
>>>>>>>
>>>>>>>   Thanks,
>>>>>>>
>>>>>>>     Matt
>>>>>>> I know for sure that FFT is a must in the end, but I thought I could start with the matrix product to get more familiar with PETSc, as it will be simpler to implement (at least I think so).
>>>>>>>
>>>>>>> I hope it does make sense.
>>>>>>>
>>>>>>> Another question about convolution via matrix product: is there a better way to do it other than to construct an NxN Toeplitz-like matrix on the fly from the state vector each time step and then multiply it with the state vector?
>>>>>>>
>>>>>>> p.s. Probably I should do the FFT from the start, but I am a little intimidated by the problem, taking into account that I have no PETSc experience.
>>>>>>> Best regards,
>>>>>>> Oleksandr Koshkarov.
>>>>>>>
>>>>>>> On 07/30/2017 09:01 AM, Matthew Knepley wrote:
>>>>>>>> On Sat, Jul 29, 2017 at 7:01 PM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>>>>>>> Thank you for the response,
>>>>>>>>
>>>>>>>> Now that you said about additional communication for fft, I want to ask another question:
>>>>>>>>
>>>>>>>> What if I disregard the FFT and compute the convolution directly? I will rewrite it as a vector-matrix-vector product
>>>>>>>>
>>>>>>>> I would caution you here. The algorithm is the most important thing to get right up front. In this case, the effect of
>>>>>>>> dimension will be dominant, and anything that does not scale well with dimension will be thrown away, perhaps
>>>>>>>> quite quickly, as you want to solve bigger problems.
>>>>>>>>
>>>>>>>> Small calculation: take a 6-dim box which is 'n' on a side, so that N = n^6. An FFT costs N log N, whereas direct evaluation of
>>>>>>>> a convolution costs N^2. So if we could do n = 100 points on a side with the FFT, then, assuming the constants are about
>>>>>>>> equal, direct convolution for the same cost only gets you about x points on a side, where
>>>>>>>>
>>>>>>>>    100^6 * log(100^6) = x^12
>>>>>>>>    1e12 * (6*2) = x^12
>>>>>>>>    x ~ 10
>>>>>>>>
>>>>>>>> Past versions of PETSc had a higher dimensional DA construct (adda in prior versions), but no one used it so we removed it.
>>>>>>>>
>>>>>>>> I assume you are proposing something like
>>>>>>>>
>>>>>>>>   https://arxiv.org/abs/1306.4625
>>>>>>>>
>>>>>>>> or maybe
>>>>>>>>
>>>>>>>>   https://arxiv.org/abs/1610.00397
>>>>>>>>
>>>>>>>> The latter is interesting since I think you could use a DMDA for the velocity and something like DMPlex for the space.
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>
>>>>>>>>     Matt
>>>>>>>>
>>>>>>>> x^T A x (where T is transpose, x is my state vector, and A is a matrix which represents the convolution and will probably not be very sparse).
>>>>>>>>
>>>>>>>> Yes - I will do more work (the algorithm will be slower), but will I win once we take the parallelization into account - less communication + probably a more scalable algorithm?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Alex.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/29/2017 05:23 PM, Barry Smith wrote:
>>>>>>>>    DMDA provides a simple way to do domain decomposition of structured grid meshes in the x, y, and z directions; that is, it makes it easy to efficiently chop up vectors in all 3 dimensions, allowing very large problems. Unfortunately it only goes up to 3 dimensions, and attempts to produce versions for higher dimensions have failed.
>>>>>>>>
>>>>>>>>    If you would like to do domain decomposition across all six of your dimensions (which I think you really need to do to solve large problems) we don't have a tool that helps with doing the domain decomposition. To make matters more complicated, the 3d mpi-fftw transforms that need to be applied will likely require rejiggering the data across the nodes to get it into a form where the FFTs can be applied efficiently.
>>>>>>>>
>>>>>>>>     I don't think there is necessarily anything theoretically difficult about managing the six dimensional domain decomposition; I just don't know of any open source software out there that helps to do this. Maybe some PETSc users are aware of such software and can chime in.
>>>>>>>>
>>>>>>>>     Aside from these two issues :-) PETSc would be useful for you since you could use our time integrators and nonlinear solvers. Pulling out subvectors to process can be done with either VecScatter or VecStrideScatter, VecStrideGather, etc., depending on the layout of the unknowns, i.e. how the domain decomposition is done.
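>>>>>>>>
>>>>>>>>     For example, if the unknowns were stored interlaced with block size 3 (C, E, B at each grid point - just an assumed layout, not necessarily the one you will end up with), a sketch of pulling out one component would be:
>>>>>>>>
>>>>>>>>         #include <petscvec.h>
>>>>>>>>
>>>>>>>>         /* Gather block component 'comp' (0=C, 1=E, 2=B in this assumed
>>>>>>>>            ordering) from the interlaced state vector X into a new vector *S */
>>>>>>>>         PetscErrorCode GatherComponent(Vec X, PetscInt comp, Vec *S)
>>>>>>>>         {
>>>>>>>>           PetscInt nloc;
>>>>>>>>           VecGetLocalSize(X, &nloc);
>>>>>>>>           VecCreate(PetscObjectComm((PetscObject)X), S);
>>>>>>>>           VecSetSizes(*S, nloc/3, PETSC_DECIDE);
>>>>>>>>           VecSetFromOptions(*S);
>>>>>>>>           VecStrideGather(X, comp, *S, INSERT_VALUES);
>>>>>>>>           return 0;
>>>>>>>>         }
>>>>>>>>
>>>>>>>>     VecStrideScatter() with the same component index puts the values back; this assumes X was created with its block size set to 3.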
>>>>>>>>
>>>>>>>>      I wish I had better answers for managing the 6d domain decomposition.
>>>>>>>>
>>>>>>>>     Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 29, 2017, at 5:06 PM, Oleksandr Koshkarov <olk548 at mail.usask.ca> wrote:
>>>>>>>>
>>>>>>>> Dear All,
>>>>>>>>
>>>>>>>> I am a new PETSc user and I am still in the middle of the manual (I have finally settled on PETSc as my main numerical library), so I apologize in advance if my questions are naive.
>>>>>>>>
>>>>>>>> I am trying to solve the 6+1 dimensional Vlasov equation with spectral methods. More precisely, I will try to solve semi-discretized equations of the (approximate) form below with a pseudospectral Fourier method:
>>>>>>>>
>>>>>>>> (Equations are in latex format, the nice website to see them is https://www.codecogs.com/latex/eqneditor.php)
>>>>>>>>
>>>>>>>> \frac{dC_{n,m,p}}{dt} =\\
>>>>>>>> \partial_x \left ( a_n C_{n+1,m,p} +b_n C_{n,m,p} +c_n C_{n-1,m,p} \right ) \\
>>>>>>>> + \partial_y \left ( a_m C_{n,m+1,p} +b_m C_{n,m,p} +c_m C_{n,m-1,p}  \right ) \\
>>>>>>>> + \partial_z \left ( a_p C_{n,m,p+1} +b_p C_{n,m,p} +c_p C_{n,m,p-1}  \right ) \\
>>>>>>>> + d_n E_x C_{n-1,m,p} + d_m E_y C_{n,m-1,p} + d_p E_z C_{n,m,p-1} \\
>>>>>>>> + B_x (e_{m,p} C_{n,m-1,p-1} + f_{m,p}C_{n,m-1,p+1} + \dots) + B_y (\dots) + B_z (\dots)
>>>>>>>>
>>>>>>>>
>>>>>>>> where a,b,c,d,e,f are some constants which can depend on n,m,p,
>>>>>>>>
>>>>>>>> n,m,p = 0...N,
>>>>>>>>
>>>>>>>> C_{n,m,p} = C_{n,m,p}(x,y,z),
>>>>>>>>
>>>>>>>> E_x = E_x(x,y,z), (same for E_y,B_x,...)
>>>>>>>>
>>>>>>>> and fields are described with equation of the sort (linear pdes with source terms dependent on C):
>>>>>>>>
>>>>>>>> \frac{dE_x}{dt} = \partial_y B_z - \partial_z B_y + (AC_{1,0,0} + BC_{0,0,0}) \\
>>>>>>>> \frac{dB_x}{dt} = -\partial_y E_z + \partial_z E_y
>>>>>>>>
>>>>>>>> with A,B being some constants.
>>>>>>>>
>>>>>>>> I will need a fully implicit Crank–Nicolson method, so my plan is to use SNES or TS, where
>>>>>>>>
>>>>>>>> I will use one parallel (MPI) PETSc vector which describes the full state of the system (all of C, E, B); then I can evolve the linear part with just a matrix multiplication.
>>>>>>>>
>>>>>>>> The first question is: should I use something like DMDA? If so, it is only 3-dimensional, but I need 6-dimensional vectors. Will it be faster than matrix multiplication?
>>>>>>>>
>>>>>>>> Then, for the nonlinear part, I will need 3d mpi-fftw to compute the convolutions. The problem is, how do I extract subvectors from the full state vector? (What is the best approach?)
>>>>>>>>
>>>>>>>> If you have any other suggestions, please feel free to share them.
>>>>>>>>
>>>>>>>> Thanks and best regards,
>>>>>>>>
>>>>>>>> Oleksandr Koshkarov.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>> http://www.caam.rice.edu/~mk51/
>>>>>>>
>>>>>>> -- 
>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> http://www.caam.rice.edu/~mk51/


