[petsc-users] Questions on matrix-free GMRES implementation
feng wang
snailsoar at hotmail.com
Wed May 12 12:40:47 CDT 2021
"Is the the plot for KSP convergence with ASM preconditioning? Or just matrix-free multiply and no preconditioner? "
The plot shows the non-linear residual, which is the L2 norm of my right hand side, against number of non-linear iterations. I am using GMRES preconditioned with ASM. From your comments, If I understand it correctly, my non-linear residual history would change with number of cpus which I use because of ASM?
"One normally expects the number of iterations to increase with more cores for ASM (especially from one to two ranks) because there is less coupling in the preconditioner the more blocks (ranks) you use".
somehow I did not see a big difference in my non-linear residual when I use 1, 2, 3 cpus in this case. I can try a different case and check.
Thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: 12 May 2021 17:11
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
Is the the plot for KSP convergence with ASM preconditioning? Or just matrix-free multiply and no preconditioner?
One normally expects the number of iterations to increase with more cores for ASM (especially from one to two ranks) because there is less coupling in the preconditioner the more blocks (ranks) you use.
Barry
On May 12, 2021, at 7:06 AM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Barry,
I have implemented my matrix-free GMRES in parallel. There is a small difference in the residuals (please see the figure below) when I vary the number of cores. But they all converged in the end. I have tried to multiply my shell matrix or preconditoning matrix with a vector in parallel, I got same values as a serial run. so I believe halo exchange is ok for matrix-vector multiplications.
I am using ASM preconditioner with overlap set to 1. Is this behaviour in parallel normal for ASM pre-conditioners? If it is not, then I know I need to find a bug in my code.
Thanks,
Feng
<tmp.png>
________________________________
From: petsc-users <petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov>> on behalf of feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Sent: 28 March 2021 22:05
To: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
Hi Barry,
Thanks for your comments. I will try that.
Thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Sent: 26 March 2021 23:44
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Mar 25, 2021, at 6:39 PM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Barry,
Thanks for your comments.
I will renumber the cells in the way as you recommended. I went through the manual again and understand how to update the halo elements for my shell matrix routine "mymult(Mat m ,Vec x, Vec y)". I can use the global index of ghost cells for each rank and "Vec x" to get the ghost values for each rank via scattering. It should be similar to the example in page 40 in the manual.
One more question, I also have an assembled approximate Jacobian matrix for pre-conditioning GMRES. If I re-number the cells properly as your suggested, I don't need to worry about communication and petsc will handle it properly together with my shell-matrix?
If you assembly the approximate Jaocobian using the "new" ordering then it will reflect the same function evaluation and matrix free operators so should be ok.
Barry
Thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Sent: 25 March 2021 0:03
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Mar 24, 2021, at 5:08 AM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Barry,
Thanks for your comments. It's very helpful. For your comments, I have a bit more questions
1. for your 1st comment " Yes, in some sense. So long as each process ....".
* If I understand it correctly (hopefully) a parallel vector in petsc can hold discontinuous rows of data in a global array. If this is true, If I call "VecGetArray", it would create a copy in a continuous space if the data is not continuous, do some operations and petsc will figure out how to put updated values back to the right place in the global array?
* This would generate an overhead. If I do the renumbering to make each process hold continuous rows, this overhead can be avoided when I call "VecGetArray"?
GetArray does nothing except return the pointer to the data in the vector. It does not copy anything or reorder anything. Whatever order the numbers are in vector they are in the same order as in the array you obtain with VecGetArray.
1. for your 2nd comment " The matrix and vectors the algebraic solvers see DO NOT have......." For the callback function of my shell matrix "mymult(Mat m ,Vec x, Vec y)", I need to get "x" for the halo elements to compute the non-linear function. My code will take care of other halo exchanges, but I am not sure how to use petsc to get the halo elements "x" in the shell matrix, could you please elaborate on this? some related examples or simple pesudo code would be great.
Basically all the parallel code in PETSc does this. How you need to set up the halo communication depends on how you are managing the assignment of degrees of freedom on each process and between processes. VecScatterCreate() is the tool you will use to tell PETSc how to get the correct values from one process to their halo-ed location on the process. It like everything in PETSc uses a number in the vectors of 0 ... n_0-1 on the first process, n_0, n_0+1, ... n_1-1 on the second etc. Since you are managing the partitioning and distribution of parallel data you must renumber the vector entry numbering in your data structures to match that shown above. Just do the numbering once after you have setup your distributed data and use it for the rest of the run. You might use the object from AOCreate to do the renumbering for you.
Barry
Thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Sent: 22 March 2021 1:28
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Mar 21, 2021, at 6:22 PM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Barry,
Thanks for your help, I really appreciate it.
In the end I used a shell matrix to compute the matrix-vector product, it is clearer to me and there are more things under my control. I am now trying to do a parallel implementation, I have some questions on setting up parallel matrices and vectors for a user-defined partition, could you please provide some advice? Suppose I have already got a partition for 2 CPUs. Each cpu is assigned a list of elements and also their halo elements.
1. The global element index for each partition is not necessarily continuous, do I have to I re-order them to make them continuous?
Yes, in some sense. So long as each process can march over ITS elements computing the function and Jacobian matrix-vector product it doesn't matter how you have named/numbered entries. But conceptually the first process has the first set of vector entries and the second the second set.
1.
2. When I set up the size of the matrix and vectors for each cpu, should I take into account the halo elements?
The matrix and vectors the algebraic solvers see DO NOT have halo elements in their sizes. You will likely need a halo-ed work vector to do the matrix-free multiply from. The standard model is use VecScatterBegin/End to get the values from the non-halo-ed algebraic vector input to MatMult into a halo-ed one to do the local product.
1. In my serial version, when I initialize my RHS vector, I am not using VecSetValues, Instead I use VecGetArray/VecRestoreArray to assign the values. VecAssemblyBegin()/VecAssemblyEnd() is never used. would this still work for a parallel version?
Yes, you can use Get/Restore but the input vector x will need to be, as noted above, scattered into a haloed version to get all the entries you will need to do the local part of the product.
Thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Sent: 12 March 2021 23:40
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Mar 12, 2021, at 9:37 AM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Matt,
Thanks for your prompt response.
Below are my two versions. one is buggy and the 2nd one is working. For the first one, I add the diagonal contribution to the true RHS (variable: rhs) and then set the base point, the callback function is somehow called twice afterwards to compute Jacobian.
Do you mean "to compute the Jacobian matrix-vector product?"
Is it only in the first computation of the product (for the given base vector) that it calls it twice or every matrix-vector product?
It is possible there is a bug in our logic; run in the debugger with a break point in FormFunction_mf and each time the function is hit in the debugger type where or bt to get the stack frames from the calls. Send this. From this we can all see if it is being called excessively and why.
For the 2nd one, I just call the callback function manually to recompute everything, the callback function is then called once as expected to compute the Jacobian. For me, both versions should do the same things. but I don't know why in the first one the callback function is called twice after I set the base point. what could possibly go wrong?
The logic of how it is suppose to work is shown below.
Thanks,
Feng
//This does not work
fld->cnsv( iqs,iqe, q, aux, csv );
//add contribution of time-stepping
for(iv=0; iv<nv; iv++)
{
for(iq=0; iq<nq; iq++)
{
//use conservative variables here
rhs[iv][iq] = -rhs[iv][iq] + csv[iv][iq]*lhsa[nlhs-1][iq]/cfl;
}
}
ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
ierr = petsc_setrhs(petsc_baserhs); CHKERRQ(ierr);
ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
//This works
fld->cnsv( iqs,iqe, q, aux, csv );
ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
ierr = FormFunction_mf(this, petsc_csv, petsc_baserhs); //this is my callback function, now call it manually
ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
Since you provide petsc_baserhs MatMFFD assumes (naturally) that you will keep the correct values in it. Hence for each new base value YOU need to compute the new values in petsc_baserhs. This approach gives you a bit more control over reusing the information in petsc_baserhs.
If you would prefer that MatMFFD recomputes the base values, as needed, then you call FormFunction_mf(this, petsc_csv, NULL); and PETSc will allocate a vector and fill it up as needed by calling your FormFunction_mf() But you need to call MatAssemblyBegin/End each time you the base input vector this, petsc_csv values change. For example
MatAssemblyBegin(petsc_A_mf,...)
MatAssemblyEnd(petsc_A_mf,...)
KSPSolve()
________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 12 March 2021 15:08
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Fri, Mar 12, 2021 at 9:55 AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,
Thanks for your reply. I will try the parallel implementation.
I've got a serial matrix-free GMRES working, but I would like to know why my initial version of matrix-free implementation does not work and there is still something I don't understand. I did some debugging and find that the callback function to compute the RHS for the matrix-free matrix is called twice by Petsc when it computes the finite difference Jacobian, but it should only be called once. I don't know why, could you please give some advice?
F is called once to calculate the base point and once to get the perturbation. The base point is not recalculated, so if you do many iterates, it is amortized.
Thanks,
Matt
Thanks,
Feng
________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 12 March 2021 12:05
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
On Fri, Mar 12, 2021 at 6:02 AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Barry,
Thanks for your advice.
You are right on this. somehow there is some inconsistency when I compute the right hand side (true RHS + time-stepping contribution to the diagonal matrix) to compute the finite difference Jacobian. If I just use the call back function to recompute my RHS before I call MatMFFDSetBase, then it works like a charm. But now I end up with computing my RHS three times. 1st time is to compute the true RHS, the rest two is for computing finite difference Jacobian.
In my previous buggy version, I only compute RHS twice. If possible, could you elaborate on your comments "Also be careful about petsc_baserhs", so I may possibly understand what was going on with my buggy version.
Our FD implementation is simple. It approximates the action of the Jacobian as
J(b) v = (F(b + h v) - F(b)) / h ||v||
where h is some small parameter and b is the base vector, namely the one that you are linearizing around. In a Newton step, b is the previous solution
and v is the proposed solution update.
Besides, for a parallel implementation, my code already has its own partition method, is it possible to allow petsc read in a user-defined partition? if not what is a better way to do this?
Sure
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSetSizes.html
Thanks,
Matt
Many thanks,
Feng
________________________________
From: Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Sent: 11 March 2021 22:15
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
Feng,
The first thing to check is that for each linear solve that involves a new operator (values in the base vector) the MFFD matrix knows it is using a new operator.
The easiest way is to call MatMFFDSetBase() before each solve that involves a new operator (new values in the base vector). Also be careful about petsc_baserhs, when you change the base vector's values you also need to change the petsc_baserhs values to the function evaluation at that point.
If that is correct I would check with a trivial function evaluator to make sure the infrastructure is all set up correctly. For examples use for the matrix free a 1 4 1 operator applied matrix free.
Barry
On Mar 11, 2021, at 7:35 AM, feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,
I am new to petsc and trying to implement a matrix-free GMRES. I have assembled an approximate Jacobian matrix just for preconditioning. After reading some previous questions on this topic, my approach is:
the matrix-free matrix is created as:
ierr = MatCreateMFFD(*A_COMM_WORLD, iqe*blocksize, iqe*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, &petsc_A_mf); CHKERRQ(ierr);
ierr = MatMFFDSetFunction(petsc_A_mf, FormFunction_mf, this); CHKERRQ(ierr);
KSP linear operator is set up as:
ierr = KSPSetOperators(petsc_ksp, petsc_A_mf, petsc_A_pre); CHKERRQ(ierr); //petsc_A_pre is my assembled pre-conditioning matrix
Before calling KSPSolve, I do:
ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr); //petsc_csv is the flow states, petsc_baserhs is the pre-computed right hand side
The call back function is defined as:
PetscErrorCode cFdDomain::FormFunction_mf(void *ctx, Vec in_vec, Vec out_vec)
{
PetscErrorCode ierr;
cFdDomain *user_ctx;
cout << "FormFunction_mf called\n";
//in_vec: flow states
//out_vec: right hand side + diagonal contributions from CFL number
user_ctx = (cFdDomain*)ctx;
//get perturbed conservative variables from petsc
user_ctx->petsc_getcsv(in_vec);
//get new right side
user_ctx->petsc_fd_rhs();
//set new right hand side to the output vector
user_ctx->petsc_setrhs(out_vec);
ierr = 0;
return ierr;
}
The linear system I am solving is (J+D)x=RHS. J is the Jacobian matrix. D is a diagonal matrix and it is used to stabilise the solution at the start but reduced gradually when the solution moves on to recover Newton's method. I add D*x to the true right side when non-linear function is computed to work out finite difference Jacobian, so when finite difference is used, it actually computes (J+D)*dx.
The code runs but diverges in the end. If I don't do matrix-free and use my approximate Jacobian matrix, GMRES works. So something is wrong with my matrix-free implementation. Have I missed something in my implementation? Besides, is there a way to check if the finite difference Jacobian matrix is computed correctly in a matrix-free implementation?
Thanks for your help in advance.
Feng
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210512/68e48ac3/attachment-0001.html>
More information about the petsc-users
mailing list