[petsc-dev] tubtrans

Thu Jun 25 13:12:34 CDT 2015

> On Jun 10, 2015, at 11:04 PM, Hong <hzhang at mcs.anl.gov> wrote:
> 
> Barry:
> 
>    How can I reproduce this? What should happen is KSP gets a Nan which passes a Nan to SNES which passes it up to TS and TS should stop with an error saying it cannot time-step.
>  
> I've sent you an invitation for repository 'wash'.
> The example is wash/pipe/main.c 
> make pipe
> $ ./pipe -pc_factor_shift_type NONE -snes_monitor -ksp_monitor
> Loaded case file
>   nTimeSteps 100 * deltatim 0.1 = endTime 10
> Pipe 0. nlen: 5. hlen: 120.000000
>   0 SNES Function norm 1.495500487667e+02
>     0 KSP Residual norm            nan
> 
>  Solve with xda...
>   0 SNES Function norm 1.495500487667e+02
>     0 KSP Residual norm            nan
> 
> i.e., 'nan' does not cause crash or give any error, simply moves to next line of code ... We do our own TS.

  Why? Shouldn't one always use TS? 

> Maybe we should use
> -snes_converged_reason -ksp_converged_reason to stop.

  No, you need to check the converged reason of SNES each time.

   SNESGetConvergedReason(snes,&reason);
  if (reason < 0) {
     nonlinear solve failed so do something
  }

  Note that TS is suppose to handle all this for you, checking the converged reason of SNES, cutting the time-step if the nonlinear solve failed ....... so you don't need to write all the code yourself.

  Barry

> 
> 
>    One can run with -ksp_error_if_not_converged to get a crash as soon as the the linear solver detects a problem to reproduce some of the old behavior.
>  
> This works, thanks. Why not put it as default?
> 
>   The logic behind these changes is that previously KSP would stop the program with a error if there was a zero pivot in LU, this means the time step algorithms could not recover from a zero pivot. This was very bad, who wants a program that has been running for 8 hours to crash because at timestep 1,003,415 it got a zero pivot when a simple cut in the tilmestep size would allow the program to keep running.
> 
>  Obviously there are some issues in the new model that have to be worked out.
> 
> Running 1,003,415 time steps with 'nan', and still continue ... ?
> 
> Hong
> 
> 
> > On Jun 10, 2015, at 8:49 PM, Hong <hzhang at mcs.anl.gov> wrote:
> >
> > Barry:
> >
> > Why the code does not crash in the case of 'nan':
> >
> > Integrating characteristic equations ...
> > TS: 0.000000
> >   0 SNES Function norm 1.490033584870e+02
> >     0 KSP Residual norm            nan
> > TS: 0.100000
> >   0 SNES Function norm 1.490033584870e+02
> >     0 KSP Residual norm            nan
> > ...
> >
> > Previously, when ilu/lu encounters a zero pivot, the code crashes, which asks user to do something about his model or the algorithm;
> > Now, it gives 'nan', but continues running.
> > We cannot assume new users know '-ksp_monitor -snes_monitor', and set '-pc_factor_shift'. The frustration we encountered is that the code seems working well, until I checked the solutions at each time steps and found them never change values. Then it took us hours to find out it was caused by shift.
> >
> > At least for LU, when zero-pivot occurs, we should throw an error.
> >
> > Hong
> >
> >   Hong,
> >
> >     This is a question for the entire PETSc development team, not just me.
> >
> > The reason I made this change was I was sick of
> > runs with ILU and nearly zero pivots where the preconditioned residual normed ended up initially being like 10^20 and then decreased in a few iterations to 10^10 and so the iteration stopped while, in fact, no progress on the true residual norm at all, meanwhile the user thinks the linear solver is working fine. I figured was better to just "stop" the entire process so the user can do something smarter for preconditioning than using ILU(0). At worse the user can now add the pivot option themselves.
> >
> >    Was this a good choice, perhaps not? Should we just go back to the old default? Probably we need to at least have better help in terms of telling people to turn on that option?
> >
> >
> >   Barry
> >
> >
> >
> > > On Jun 10, 2015, at 4:47 PM, Hong <hzhang at mcs.anl.gov> wrote:
> > >
> > > Daniel,
> > >
> > > Try to use it with the flag: '-pc_type svd'. Now it works.
> > > Great, you found the problem!
> > >
> > > This is very strange
> > > OK, this must be the change made on the default pc_factor_shift used for ilu/lu.
> > > In v3.5.4, it is
> > > -pc_factor_shift_type INBLOCKS
> > >
> > > now,
> > > -pc_factor_shift_type NONE
> > >
> > > Adding the option '-pc_factor_shift_type INBLOCKS'
> > > or '-pc_factor_shift_type NONZERO'
> > > the code runs well :-)
> > >
> > > Barry:
> > > For ilu, a shift should be used to avoid zero pivot;
> > > for lu, '-pc_factor_shift_type NONE' should be the default.
> > > Why do we make such change?
> > > It seems you made such change:
> > > https://bitbucket.org/petsc/petsc/commits/422a814ef4a731b8529cf3a6428db526d183e312
> > >
> > > Hong
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 3:11 PM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > I have been able to replicate the error with the new petsc-dev.
> > >
> > > KSP does not converge. However, the Jacobian matrix is the same!!!
> > >
> > > This jacobian matrix is not very well conditioned (condition number 1.4814e+03 for a 12x12 matrix) because of the valve... but it didn't give me any problem before.
> > >
> > > In fact, you can see that I have implemented the same in matlab (matlab folder) and solved the problem with a vanilla newton-rhapson iteration (no preconditioner etc...) and I didn't have any problem at all.
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 2:37 PM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > I am upgrading to the new petsc-dev, I will let you know what I find. Maybe is an issue of the finite difference jacobian (how is generated)?
> > >
> > > I would love to go to to the conference! How can I register?
> > >
> > > On Wed, Jun 10, 2015 at 12:55 PM, Hong <hzhang at mcs.anl.gov> wrote:
> > > I found which change made difference:
> > > When I change Makefile
> > > $ diff old.Makefile new.Makefile
> > > 1,2c1,2
> > > < include ${PETSC_DIR}/conf/variables
> > > < include ${PETSC_DIR}/conf/rules
> > > ---
> > > > include ${PETSC_DIR}/lib/petsc/conf/variables
> > > > include ${PETSC_DIR}/lib/petsc/conf/rules
> > >
> > > to build tubtrans with petsc-dev, then I get
> > >
> > > Loaded case file
> > >     0 KSP Residual norm            nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > >     0 KSP Residual norm            nan
> > >
> > > Some petsc solvers have changed, but it should not behave like this.
> > > We have to use a debugger to figure out which step causes this problem :-(
> > >
> > > Hong
> > >
> > > On Wed, Jun 10, 2015 at 10:57 AM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > Sure.
> > >
> > > This one has the object files, etc..
> > >
> > > I will check the new petsc-dev this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 10:42 AM, Hong <hzhang at mcs.anl.gov> wrote:
> > > How about make a tarball of your current tubtrans and send to me
> > > (or use google drive/dropbox)?
> > >
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 10:28 AM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > Strange.
> > >
> > > I am using the master branch from the bitbucket repo (https://bitbucket.org/damalbel/tubtrans.git).
> > >
> > > The jacobian is the finite difference jacobian (-snes_fd) as I didn't write one (I did write one in the matlab code prototype, though).
> > >
> > > What I can think now is to check this singular jacobian and see what equation is causing the problem.
> > >
> > > Let me clone the new petsc release and try to replicate the problem. I have a couple of things to do this morning but I will get back to you this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 9:54 AM, Hong <hzhang at mcs.anl.gov> wrote:
> > > Daniel,
> > >
> > > It seems Jacobian is singular:
> > > ./tubtrans -snes_monitor -ksp_monitor |more
> > >
> > > Loaded case file
> > >   0 SNES Function norm 1.500947612239e+02
> > >     0 KSP Residual norm           -nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > >   0 SNES Function norm 1.490033613070e+02
> > >     0 KSP Residual norm           -nan
> > >
> > > Do you use the same version in
> > > git clone https://bitbucket.org/damalbel/tubtrans.git?
> > >
> > > Can you apply my patch to this version and test it with new petsc-3.5 (we are working on its release)?
> > > The best way to get petsc-dev is using
> > >       • git clone https://bitbucket.org/petsc/petsc
> > > We will have a PETSc conference next week
> > > http://www.mcs.anl.gov/petsc/petsc-20.pdf
> > >
> > > Would you like join us?
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 8:17 AM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > P.S: I have applied your patch, and using PETSC 3.5.2 the program works all right. I have attached the logs of running test.xml and test2.xml.
> > >
> > > Again, let me check out the new petsc release.
> > >
> > > On Wed, Jun 10, 2015 at 8:08 AM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > I don't see why it should not work. I can see two things that we can check:
> > >
> > > - Is SNES iterating? Can you check for -snes_monitor and -snes_converged_reason
> > > - In the program, I initialize the the solutions vector 'x'to ones as an initial guess (line 102), maybe somehow petsc is unable to write on it?
> > >
> > > Let me clone the new petsc-dev version and I will try to replicate the error.
> > >
> > > On Tue, Jun 9, 2015 at 4:56 PM, Hong <hzhang at mcs.anl.gov> wrote:
> > > Danniel,
> > >  Thanks for detailed instruction. I'm using petsc-dev, which will be released today or tomorrow. I'm building a release version to see if that causes the problem.
> > >
> > > However, your code should generate same solution with my patch (attached) because it only replace
> > > VecGetArray() with VecGetArrayRead(), and fix a memory leak problem.
> > >
> > > Here is what I did:
> > > 1. install petsc-dev (see http://www.mcs.anl.gov/petsc/developers/index.html)
> > > git clone https://bitbucket.org/petsc/petsc
> > >
> > > 2. build petsc-dev
> > >
> > > 3. git clone https://bitbucket.org/damalbel/tubtrans.git
> > > patch it:
> > > cd tubtrans
> > > patch -Np1 < patch_tubtrans
> > > make tubtrans
> > >
> > > 4. run tubtrans
> > > ./tubtrans
> > >
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > ...
> > >
> > > Changing to different input file:
> > > ./tubtrans -f input/test.xml |more
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > TS: 0.100000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > >
> > > Do I do something wrong?
> > >
> > > Hong
> > >
> > >
> > > On Tue, Jun 9, 2015 at 12:15 AM, Adrian Maldonado <dmaldona at hawk.iit.edu> wrote:
> > > Professor Hong,
> > >
> > > I am glad to hear from you! I am happy that you are finding the code useful. As for the solution vector being constant ones - that doesn't seem right.
> > >
> > > The master branch of the bitbucket repository with the Petsc Release Version 3.5.2 seems to work fine in my laptop.
> > >
> > > I have written a small walk through below.
> > >
> > > Please, let me know if you want me to take a look at the code or I can help you in any other way.
> > >
> > > Walk through:
> > >
> > > - The program uses XML (libxml2) to parse data for different case scenarios. The cases are stored in the directory 'input/' and you can select the case to run in the petscopt file (e.g -f input/test.xml)
> > >
> > > - The file test.xml, for example, includes a simple one-pipe test scenario from the textbook 'Fluid Transients' by Wyle. The content is shown below:
> > >
> > > <?xml version="1.0"?>
> > > <case>
> > >       <nnode>2</nnode>
> > >       <pipe id="1">
> > >               <fr>1</fr>
> > >               <to>2</to>
> > >         <fric>0.018</fric>
> > >         <diam>0.5</diam>
> > >         <a>1200</a>
> > >         <len>600</len>
> > >       </pipe>
> > >       <reservoir id="1">
> > >               <node>1</node>
> > >               <head>150</head>
> > >       </reservoir>
> > >
> > >       <valve id="1">
> > >               <node>2</node>
> > >               <lossc>0.009</lossc>
> > >       </valve>
> > > </case>
> > >
> > > In this case we have a conduct of two nodes (boundary equations). The only conduct (or pipe) goes from node 1 to node 2. The entries 'fric', 'diam', 'a' and 'len' are the parameters of the pipeline. Then, the file includes a reservoir and a valve as boundary equations.
> > >
> > > The reservoir defines a constant pressure (water head) in the node 1 whereas the valve defines a relationship between the pressure and the flow with friction losses.
> > >
> > > The program, after creating the necessary data structures and discretization (according to Courant number) will proceed to the initialization. The initialization is solving the system state for the steady-state conditions with SNES.
> > >
> > > Running the program in gdb for 'test.xml' and printing the 'x' vector will produce this:
> > >
> > > Breakpoint 1, main (argc=1, argv=0x7fffffffde68) at main.c:115
> > > 115       ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
> > > (gdb) n
> > > 120       ierr = MatDenseGetArray(SOL, &ss);CHKERRQ(ierr);
> > > (gdb) p VecView(x, 0)
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > 0.477432
> > > 143.488
> > >
> > > The ordering here is, according to the discretization x = [Q0 H0 Q1 H1 ... Qn Hn]' . Where Q is the water flow and H is the water head or pressure. You can see how the water flow is constant in steady state along the pipeline and the pressure in the first node is 150, as the reservoir sets this as boundary equation.
> > >
> > > For the transient we simulate the sudden closure of the valve (dirac delta) although we can implement different closure dynamics.
> > >
> > > I have compiled the debug mode of the program (make debug) which prints the 'x' vector and each time step. I am attaching the results in the file log.txt so you can compare it. Each time step solution is stored in the matrix SOL and finally written to a binary file 'output.out'.
> > >
> > > The first integration time-step looks like:
> > >
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > -1.11838e-12
> > > 441.046
> > >
> > > The pen-ultimate entry of the x vector is the flow at the valve, which is 0 after the valve closes. At the end of the valve an steep rise in the pressure happens, as expected. This pressure wave will oscillate along the pipe according to the method of characteristics.
> > >
> > > More complex scenarios are included in '/input' and a script with some python macros 'intPlot.py' is included to produce plots of the pressure and flow profiles within the system.
> > >
> > > I hope this helped you and please, let me know in what can I help.
> > >
> > > Yours,
> > >
> > >
> > > On Mon, Jun 8, 2015 at 10:06 PM, Hong <hzhang at mcs.anl.gov> wrote:
> > > Danniel,
> > >
> > > Hello, this is Hong, your professor in CS595 :-)
> > >
> > > We are working on a project for simulating water cycle. The math model to be started is a shallow water river, very similar to your project. I start looking at your codes. After small update with latest petsc-dev, I'm able to run it.
> > >
> > > Look at the solutions at each time step, I found that solution vector x generated in main.c are alway a constant vector, x=1.
> > > Am I doing the right thing?
> > >
> > > Hong
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> >
> >
> 
>