From mafunk at nmsu.edu Tue Aug 1 16:45:22 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 1 Aug 2006 15:45:22 -0600 Subject: profiling PETSc code Message-ID: <200608011545.26224.mafunk@nmsu.edu> Hi, i need to profile my code. Specifically the KSPSolve(...) call. Basically i am (just for testing) setting up the identity matrix and pass in the solution and RHS vectors. I solve the system 4000 times or so (400 times steps that is). Anyway, when i run on 1 processor it takes essentially no time (ca 7 secs). When i run on 4 procs it takes 96 secs. I use an external timer to profile just the call to KSPSolve which is where the times come from. So i read through chap 12. Unfortunately i cannot find ex21.c in src/ksp/ksp/examples/tutorials so i couldn't look at the code. So what i do is the following. At the end of my program, right before calling PetscFinalize() i call PetscLogPrintSummary(...). I registered stage 0 and then right before the KSPSolve() call i do : m_ierr = PetscLogStagePush(0); and right after i do: m_ierr = PetscLogStagePop(). There is also barriers right before the push call and right after the pop call. The output i get (for the single proc run) is: ... ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /home2/users/mafunk/AMR/Chombo.2.0/example/node/maskExec/testNodePoisson2d.Linux.g++.g77.MPI32.ex on a linux-gnu named .1 with 1 processor, by mafunk Tue Aug 1 14:47:43 2006 Using Petsc Release Version 2.3.1, Patch 12, Wed Apr 5 17:55:50 CDT 2006 BK revision: balay at asterix.mcs.anl.gov|ChangeSet|20060405225457|13540 Max Max/Min Avg Total Time (sec): 4.167e+01 1.00000 4.167e+01 Objects: 1.500e+01 1.00000 1.500e+01 Flops: 2.094e+08 1.00000 2.094e+08 2.094e+08 Flops/sec: 5.025e+06 1.00000 5.025e+06 5.025e+06 Memory: 4.700e+05 1.00000 4.700e+05 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 2.402e+04 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.1647e+01 99.9% 2.0937e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.402e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops/sec: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! 
# # # # This code was compiled with a debugging option, # # To get timing results run config/configure.py # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ########################################################## # # # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. # ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. --- Event Stage 0: Main Stage Index Set 3 3 35976 0 Vec 8 8 169056 0 Matrix 2 2 23304 0 Krylov Solver 1 1 17216 0 Preconditioner 1 1 168 0 ======================================================================================================================== Average time to get PetscTime(): 2.14577e-07 Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Aug 1 11:28:46 2006 Configure options: --with-debugging=1 --with-blas-lapack-dir=/usr/local --with-mpi-dir=/usr --with-log=1 --with-shared=0 .... So i want to find out why KSPSolve() takes so long in parallel. However, there i no summary for stage 0. Does someone know why this is? Am i using it in a wrong way? I compiled the libraries with -with_log=1 -with-debugging=1 thanks mat From knepley at gmail.com Tue Aug 1 16:57:22 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Aug 2006 16:57:22 -0500 Subject: profiling PETSc code In-Reply-To: <200608011545.26224.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> Message-ID: Take out your stage push/pop for the moment, and the log_summary call. Just run with -log_summary and send the output as a test. Thanks, Matt On 8/1/06, Matt Funk wrote: > Hi, > > i need to profile my code. Specifically the KSPSolve(...) call. Basically i am > (just for testing) setting up the identity matrix and pass in the solution > and RHS vectors. I solve the system 4000 times or so (400 times steps that > is). Anyway, when i run on 1 processor it takes essentially no time (ca 7 > secs). When i run on 4 procs it takes 96 secs. I use an external timer to > profile just the call to KSPSolve which is where the times come from. > > So i read through chap 12. Unfortunately i cannot find ex21.c in > src/ksp/ksp/examples/tutorials so i couldn't look at the code. > > So what i do is the following. At the end of my program, right before calling > PetscFinalize() i call PetscLogPrintSummary(...). I registered stage 0 and > then right before the KSPSolve() call i do : m_ierr = PetscLogStagePush(0); > and right after i do: m_ierr = PetscLogStagePop(). There is also barriers > right before the push call and right after the pop call. > > The output i get (for the single proc run) is: > > ... 
> > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > /home2/users/mafunk/AMR/Chombo.2.0/example/node/maskExec/testNodePoisson2d.Linux.g++.g77.MPI32.ex > on a linux-gnu named .1 with 1 processor, by mafunk Tue Aug 1 14:47:43 2006 > Using Petsc Release Version 2.3.1, Patch 12, Wed Apr 5 17:55:50 CDT 2006 > BK revision: balay at asterix.mcs.anl.gov|ChangeSet|20060405225457|13540 > > Max Max/Min Avg Total > Time (sec): 4.167e+01 1.00000 4.167e+01 > Objects: 1.500e+01 1.00000 1.500e+01 > Flops: 2.094e+08 1.00000 2.094e+08 2.094e+08 > Flops/sec: 5.025e+06 1.00000 5.025e+06 5.025e+06 > Memory: 4.700e+05 1.00000 4.700e+05 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 2.402e+04 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> > 2N flops > and VecAXPY() for complex vectors of length N --> > 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 4.1647e+01 99.9% 2.0937e+08 100.0% 0.000e+00 0.0% > 0.000e+00 0.0% 2.402e+04 100.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting > output. > Phase summary info: > Count: number of times phase was executed > Time and Flops/sec: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in > this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run config/configure.py # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was run without the PreLoadBegin() # > # macros. To get timing results we always recommend # > # preloading. otherwise timing numbers may be # > # meaningless. 
# > ########################################################## > > > Event Count Time (sec) Flops/sec > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > > --- Event Stage 0: Main Stage > > Index Set 3 3 35976 0 > Vec 8 8 169056 0 > Matrix 2 2 23304 0 > Krylov Solver 1 1 17216 0 > Preconditioner 1 1 168 0 > ======================================================================================================================== > Average time to get PetscTime(): 2.14577e-07 > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 > Configure run at: Tue Aug 1 11:28:46 2006 > Configure options: --with-debugging=1 --with-blas-lapack-dir=/usr/local > --with-mpi-dir=/usr --with-log=1 --with-shared=0 > > > .... > > > So i want to find out why KSPSolve() takes so long in parallel. However, there > i no summary for stage 0. Does someone know why this is? Am i using it in a > wrong way? I compiled the libraries with -with_log=1 -with-debugging=1 > > thanks > mat > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mafunk at nmsu.edu Tue Aug 1 17:30:04 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 1 Aug 2006 16:30:04 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> Message-ID: <200608011630.05454.mafunk@nmsu.edu> Hi, well, now i do get summary: ... # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. 
# ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecNorm 200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 0.0e+00 1.0e+02 0 36 0 0 7 0 36 0 0 31 207 VecCopy 200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 182 MatMult 100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 9 0 0 0 1 9 0 0 0 35 MatSolve 200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 18 0 0 0 2 18 0 0 0 23 MatSolveTranspos 100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 9 0 0 0 2 9 0 0 0 14 MatLUFactorNum 100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 9 0 0 0 3 9 0 0 0 6 MatILUFactorSym 1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 MatAssemblyBegin 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 KSPSetup 100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 4 0 KSPSolve 100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 0.0e+00 3.0e+02 7 91 0 0 21 7 91 0 0 93 32 PCSetUp 100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 0.0e+00 4.0e+00 4 9 0 0 0 4 9 0 0 1 6 PCApply 300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 27 0 0 0 4 27 0 0 0 18 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. --- Event Stage 0: Main Stage Index Set 3 3 35976 0 Vec 109 109 2458360 0 Matrix 2 2 23304 0 Krylov Solver 1 1 0 0 Preconditioner 1 1 168 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Compiled without FORTRAN kernels Compiled with full precision matrices (default) ... am i using the push and pop calls in an manner they are not intended to be used? plus, how can i see what's going on with respect to why it takes so much longer to solve the system in parallel than in serial without being able to specify the stages (i.e single out the KSPSolve call)? mat On Tuesday 01 August 2006 15:57, Matthew Knepley wrote: > ke out your stage push/pop for the moment, and the log_summary > call. Just run with -log_summary and send the output as a test. > > ? Thanks, > > ? ? ?Matt From knepley at gmail.com Tue Aug 1 17:36:55 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Aug 2006 17:36:55 -0500 Subject: profiling PETSc code In-Reply-To: <200608011630.05454.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608011630.05454.mafunk@nmsu.edu> Message-ID: On 8/1/06, Matt Funk wrote: > Hi, > > well, now i do get summary: > ... > # WARNING!!! # > # # > # This code was run without the PreLoadBegin() # > # macros. To get timing results we always recommend # > # preloading. 
otherwise timing numbers may be # > # meaningless. # > ########################################################## > > > Event Count Time (sec) Flops/sec > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecNorm 200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 0.0e+00 > 1.0e+02 0 36 0 0 7 0 36 0 0 31 207 > VecCopy 200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAYPX 100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 18 0 0 0 0 18 0 0 0 182 > MatMult 100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 9 0 0 0 1 9 0 0 0 35 > MatSolve 200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 18 0 0 0 2 18 0 0 0 23 > MatSolveTranspos 100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 9 0 0 0 2 9 0 0 0 14 > MatLUFactorNum 100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 9 0 0 0 3 9 0 0 0 6 > MatILUFactorSym 1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > MatAssemblyBegin 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > KSPSetup 100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.4e+01 0 0 0 0 1 0 0 0 0 4 0 > KSPSolve 100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 0.0e+00 > 3.0e+02 7 91 0 0 21 7 91 0 0 93 32 > PCSetUp 100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 0.0e+00 > 4.0e+00 4 9 0 0 0 4 9 0 0 1 6 > PCApply 300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 27 0 0 0 4 27 0 0 0 18 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > > --- Event Stage 0: Main Stage > > Index Set 3 3 35976 0 > Vec 109 109 2458360 0 > Matrix 2 2 23304 0 > Krylov Solver 1 1 0 0 > Preconditioner 1 1 168 0 > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > > ... > > am i using the push and pop calls in an manner they are not intended to be > used? Not exactly. You need to register a stage first before pushing it. http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html > plus, how can i see what's going on with respect to why it takes so much > longer to solve the system in parallel than in serial without being able to > specify the stages (i.e single out the KSPSolve call)? There are 100 calls to KSPSolve() which collectively take .1s. Your problem is most likely in matrix setup. I would bet that you have not preallocated the space correctly. Therefore, a malloc() is called every time you insert a value. You can check the number of mallocs using -info. 
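A minimal end-to-end sketch of that pattern, assuming the 2.3.x-era argument order used in this thread (stage handle first in PetscLogStageRegister(), four-argument KSPSetOperators(), destroy routines taking the object rather than its address); the identity-matrix setup, the stage name "Solve" and the problem size are illustrative, not taken from Matt's actual code. Run it with -log_summary and the solves show up under their own stage; the preallocation calls also illustrate the -info/malloc advice above:

  static char help[] = "Times repeated KSPSolve() calls inside a registered log stage.\n";

  #include "petscksp.h"

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    int            stage;                 /* PetscLogStage in later releases   */
    PetscInt       i, n = 100, Istart, Iend;
    PetscScalar    one = 1.0;
    Mat            A;
    Vec            x, b;
    KSP            ksp;

    ierr = PetscInitialize(&argc, &argv, (char *)0, help);CHKERRQ(ierr);

    /* identity test matrix, as in the original post; preallocate one nonzero
       per row so MatSetValues() never mallocs (cf. the -info advice above)   */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSeqAIJSetPreallocation(A, 1, PETSC_NULL);CHKERRQ(ierr);
    ierr = MatMPIAIJSetPreallocation(A, 1, PETSC_NULL, 0, PETSC_NULL);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
    for (i = Istart; i < Iend; i++) {
      ierr = MatSetValues(A, 1, &i, 1, &i, &one, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
    ierr = VecSetSizes(b, PETSC_DECIDE, n);CHKERRQ(ierr);
    ierr = VecSetFromOptions(b);CHKERRQ(ierr);
    ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

    /* register the stage once, then wrap only the part to be timed           */
    ierr = PetscLogStageRegister(&stage, "Solve");CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    for (i = 0; i < 100; i++) {
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    }
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    /* 2.3.x-style destroys; later releases take the address of the object    */
    ierr = KSPDestroy(ksp);CHKERRQ(ierr);
    ierr = VecDestroy(x);CHKERRQ(ierr);
    ierr = VecDestroy(b);CHKERRQ(ierr);
    ierr = MatDestroy(A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }
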
Matt > mat > > > > > > On Tuesday 01 August 2006 15:57, Matthew Knepley wrote: > > ke out your stage push/pop for the moment, and the log_summary > > call. Just run with -log_summary and send the output as a test. > > > > Thanks, > > > > Matt > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mafunk at nmsu.edu Tue Aug 1 18:49:46 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 1 Aug 2006 17:49:46 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608011630.05454.mafunk@nmsu.edu> Message-ID: <200608011749.48651.mafunk@nmsu.edu> Hi, i don't think it is the mallocs since it says things like: [0] MatAssemblyEnd_SeqAIJMatrix size: 2912 X 2912; storage space: 0 unneeded,2912 used [0] MatAssemblyEnd_SeqAIJNumber of mallocs during MatSetValues() is 0 However, i do get errors. They look like: [0]PETSC ERROR: StageLogRegister() line 95 in src/sys/plog/stageLog.c [0]PETSC ERROR: Invalid pointer! [0]PETSC ERROR: Null Pointer: Parameter # 3! [0]PETSC ERROR: PetscLogStageRegister() line 375 in src/sys/plog/plog.c which happens during the call PETSCInitialize(...); After that i get an error like: [0] PetscCommDuplicateDuplicating a communicator 91 164 max tags = 1073741823 [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 [0]PETSC ERROR: MatGetVecs() line 6283 in src/mat/interface/matrix.c [0]PETSC ERROR: Null argument, when expecting valid pointer! [0]PETSC ERROR: Null Object: Parameter # 1! [0]PETSC ERROR: KSPGetVecs() line 555 in src/ksp/ksp/interface/iterativ.c [0]PETSC ERROR: KSPDefaultGetWork() line 597 in src/ksp/ksp/interface/iterativ.c [0]PETSC ERROR: KSPSetUp_CG() line 75 in src/ksp/ksp/impls/cg/cg.c [0]PETSC ERROR: KSPSetUp() line 198 in src/ksp/ksp/interface/itfunc.c so i suppose that is a problem. I am just not sure what it means. any ideas? mat On Tuesday 01 August 2006 16:36, Matthew Knepley wrote: > On 8/1/06, Matt Funk wrote: > > Hi, > > > > well, now i do get summary: > > ... > > # WARNING!!! # > > # # > > # This code was run without the PreLoadBegin() # > > # macros. To get timing results we always recommend # > > # preloading. otherwise timing numbers may be # > > # meaningless. 
# > > ########################################################## > > > > > > Event Count Time (sec) Flops/sec > > --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------- > >----------------------------------------------- > > > > --- Event Stage 0: Main Stage > > > > VecNorm 200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 0.0e+00 > > 1.0e+02 0 36 0 0 7 0 36 0 0 31 207 > > VecCopy 200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAYPX 100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 0.0e+00 > > 0.0e+00 0 18 0 0 0 0 18 0 0 0 182 > > MatMult 100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 1 9 0 0 0 1 9 0 0 0 35 > > MatSolve 200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 2 18 0 0 0 2 18 0 0 0 23 > > MatSolveTranspos 100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 2 9 0 0 0 2 9 0 0 0 14 > > MatLUFactorNum 100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 0.0e+00 > > 0.0e+00 3 9 0 0 0 3 9 0 0 0 6 > > MatILUFactorSym 1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > > MatAssemblyBegin 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > > KSPSetup 100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 1.4e+01 0 0 0 0 1 0 0 0 0 4 0 > > KSPSolve 100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 0.0e+00 > > 3.0e+02 7 91 0 0 21 7 91 0 0 93 32 > > PCSetUp 100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 0.0e+00 > > 4.0e+00 4 9 0 0 0 4 9 0 0 1 6 > > PCApply 300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 4 27 0 0 0 4 27 0 0 0 18 > > ------------------------------------------------------------------------- > >----------------------------------------------- > > > > Memory usage is given in bytes: > > > > Object Type Creations Destructions Memory Descendants' Mem. > > > > --- Event Stage 0: Main Stage > > > > Index Set 3 3 35976 0 > > Vec 109 109 2458360 0 > > Matrix 2 2 23304 0 > > Krylov Solver 1 1 0 0 > > Preconditioner 1 1 168 0 > > ========================================================================= > >=============================================== Average time to get > > PetscTime(): 9.53674e-08 > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > > > ... > > > > am i using the push and pop calls in an manner they are not intended to > > be used? > > Not exactly. You need to register a stage first before pushing it. > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/man >ualpages/Profiling/PetscLogStageRegister.html > > > plus, how can i see what's going on with respect to why it takes so much > > longer to solve the system in parallel than in serial without being able > > to specify the stages (i.e single out the KSPSolve call)? > > There are 100 calls to KSPSolve() which collectively take .1s. Your > problem is most > likely in matrix setup. I would bet that you have not preallocated the > space correctly. > Therefore, a malloc() is called every time you insert a value. You can > check the number of mallocs using -info. 
> > Matt > > > mat > > > > On Tuesday 01 August 2006 15:57, Matthew Knepley wrote: > > > ke out your stage push/pop for the moment, and the log_summary > > > call. Just run with -log_summary and send the output as a test. > > > > > > Thanks, > > > > > > Matt From knepley at gmail.com Tue Aug 1 19:07:26 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Aug 2006 19:07:26 -0500 Subject: profiling PETSc code In-Reply-To: <200608011749.48651.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608011630.05454.mafunk@nmsu.edu> <200608011749.48651.mafunk@nmsu.edu> Message-ID: On 8/1/06, Matt Funk wrote: > Hi, > > i don't think it is the mallocs since it says things like: > [0] MatAssemblyEnd_SeqAIJMatrix size: 2912 X 2912; storage space: 0 > unneeded,2912 used > [0] MatAssemblyEnd_SeqAIJNumber of mallocs during MatSetValues() is 0 This is only on one processor. > However, i do get errors. They look like: > [0]PETSC ERROR: StageLogRegister() line 95 in src/sys/plog/stageLog.c > [0]PETSC ERROR: Invalid pointer! > [0]PETSC ERROR: Null Pointer: Parameter # 3! > [0]PETSC ERROR: PetscLogStageRegister() line 375 in src/sys/plog/plog.c You gave an invalid pointer to the call. You should have int stage; PetscLogStageRegister(&stage, "MyStage"); > which happens during the call PETSCInitialize(...); > > After that i get an error like: > [0] PetscCommDuplicateDuplicating a communicator 91 164 max tags = 1073741823 > [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0]PETSC ERROR: MatGetVecs() line 6283 in src/mat/interface/matrix.c > [0]PETSC ERROR: Null argument, when expecting valid pointer! > [0]PETSC ERROR: Null Object: Parameter # 1! > [0]PETSC ERROR: KSPGetVecs() line 555 in src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: KSPDefaultGetWork() line 597 in > src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: KSPSetUp_CG() line 75 in src/ksp/ksp/impls/cg/cg.c > [0]PETSC ERROR: KSPSetUp() line 198 in src/ksp/ksp/interface/itfunc.c > > so i suppose that is a problem. I am just not sure what it means. > any ideas? It looks like you have not called KSPSetOperators(). Matt -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mafunk at nmsu.edu Tue Aug 1 19:07:28 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 1 Aug 2006 18:07:28 -0600 Subject: profiling PETSc code In-Reply-To: <200608011749.48651.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608011749.48651.mafunk@nmsu.edu> Message-ID: <200608011807.29007.mafunk@nmsu.edu> Actually the errors occur on my calls to a PETSc functions after calling PETSCInitialize. mat On Tuesday 01 August 2006 17:49, Matt Funk wrote: > Hi, > > i don't think it is the mallocs since it says things like: > [0] MatAssemblyEnd_SeqAIJMatrix size: 2912 X 2912; storage space: 0 > unneeded,2912 used > [0] MatAssemblyEnd_SeqAIJNumber of mallocs during MatSetValues() is 0 > > However, i do get errors. They look like: > [0]PETSC ERROR: StageLogRegister() line 95 in src/sys/plog/stageLog.c > [0]PETSC ERROR: Invalid pointer! > [0]PETSC ERROR: Null Pointer: Parameter # 3! 
> [0]PETSC ERROR: PetscLogStageRegister() line 375 in src/sys/plog/plog.c > > which happens during the call PETSCInitialize(...); > > > After that i get an error like: > [0] PetscCommDuplicateDuplicating a communicator 91 164 max tags = > 1073741823 [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0] PetscCommDuplicateUsing internal PETSc communicator 91 164 > [0]PETSC ERROR: MatGetVecs() line 6283 in src/mat/interface/matrix.c > [0]PETSC ERROR: Null argument, when expecting valid pointer! > [0]PETSC ERROR: Null Object: Parameter # 1! > [0]PETSC ERROR: KSPGetVecs() line 555 in src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: KSPDefaultGetWork() line 597 in > src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: KSPSetUp_CG() line 75 in src/ksp/ksp/impls/cg/cg.c > [0]PETSC ERROR: KSPSetUp() line 198 in src/ksp/ksp/interface/itfunc.c > > so i suppose that is a problem. I am just not sure what it means. > any ideas? > > mat > > On Tuesday 01 August 2006 16:36, Matthew Knepley wrote: > > On 8/1/06, Matt Funk wrote: > > > Hi, > > > > > > well, now i do get summary: > > > ... > > > # WARNING!!! # > > > # # > > > # This code was run without the PreLoadBegin() # > > > # macros. To get timing results we always recommend # > > > # preloading. otherwise timing numbers may be # > > > # meaningless. # > > > ########################################################## > > > > > > > > > Event Count Time (sec) Flops/sec > > > --- Global --- --- Stage --- Total > > > Max Ratio Max Ratio Max Ratio Mess Avg > > > len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ----------------------------------------------------------------------- > > >-- ----------------------------------------------- > > > > > > --- Event Stage 0: Main Stage > > > > > > VecNorm 200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 > > > 0.0e+00 1.0e+02 0 36 0 0 7 0 36 0 0 31 207 > > > VecCopy 200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecSet 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAYPX 100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 182 > > > MatMult 100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 1 9 0 0 0 1 9 0 0 0 35 > > > MatSolve 200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 2 18 0 0 0 2 18 0 0 0 23 > > > MatSolveTranspos 100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 2 9 0 0 0 2 9 0 0 0 14 > > > MatLUFactorNum 100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 3 9 0 0 0 3 9 0 0 0 6 > > > MatILUFactorSym 1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > > > MatAssemblyBegin 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyEnd 100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > > MatGetOrdering 1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 1 0 > > > KSPSetup 100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 4 0 > > > KSPSolve 100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 > > > 0.0e+00 3.0e+02 7 91 0 0 21 7 91 0 0 93 32 > > > PCSetUp 100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 > > > 0.0e+00 4.0e+00 4 9 0 0 0 4 9 0 0 1 6 > > > PCApply 300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 4 27 0 0 0 4 27 0 0 0 18 > > > 
----------------------------------------------------------------------- > > >-- ----------------------------------------------- > > > > > > Memory usage is given in bytes: > > > > > > Object Type Creations Destructions Memory Descendants' > > > Mem. > > > > > > --- Event Stage 0: Main Stage > > > > > > Index Set 3 3 35976 0 > > > Vec 109 109 2458360 0 > > > Matrix 2 2 23304 0 > > > Krylov Solver 1 1 0 0 > > > Preconditioner 1 1 168 0 > > > ======================================================================= > > >== =============================================== Average time to get > > > PetscTime(): 9.53674e-08 > > > Compiled without FORTRAN kernels > > > Compiled with full precision matrices (default) > > > > > > ... > > > > > > am i using the push and pop calls in an manner they are not intended to > > > be used? > > > > Not exactly. You need to register a stage first before pushing it. > > > > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/m > >an ualpages/Profiling/PetscLogStageRegister.html > > > > > plus, how can i see what's going on with respect to why it takes so > > > much longer to solve the system in parallel than in serial without > > > being able to specify the stages (i.e single out the KSPSolve call)? > > > > There are 100 calls to KSPSolve() which collectively take .1s. Your > > problem is most > > likely in matrix setup. I would bet that you have not preallocated the > > space correctly. > > Therefore, a malloc() is called every time you insert a value. You can > > check the number of mallocs using -info. > > > > Matt > > > > > mat > > > > > > On Tuesday 01 August 2006 15:57, Matthew Knepley wrote: > > > > ke out your stage push/pop for the moment, and the log_summary > > > > call. Just run with -log_summary and send the output as a test. > > > > > > > > Thanks, > > > > > > > > Matt From knepley at gmail.com Tue Aug 1 19:28:08 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Aug 2006 19:28:08 -0500 Subject: profiling PETSc code In-Reply-To: <200608011807.29007.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608011749.48651.mafunk@nmsu.edu> <200608011807.29007.mafunk@nmsu.edu> Message-ID: On 8/1/06, Matt Funk wrote: > Actually the errors occur on my calls to a PETSc functions after calling > PETSCInitialize. Yes, it is the error I pointed out in the last message. Matt > mat -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From jiaxun_hou at yahoo.com.cn Wed Aug 2 06:42:35 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Wed, 2 Aug 2006 19:42:35 +0800 (CST) Subject: some problems in using PETSC with FFTW3 package Message-ID: <20060802114235.73141.qmail@web15802.mail.cnb.yahoo.com> Hi all, I am trying to using the package FFTW3 in PETSC. How can I change type from PetscScalar to complex or double[2]? The documentation seems a bit sketchy. Regards Mason --------------------------------- ????????-3.5G???20M??? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Aug 2 09:19:26 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 2 Aug 2006 09:19:26 -0500 (CDT) Subject: some problems in using PETSC with FFTW3 package In-Reply-To: <20060802114235.73141.qmail@web15802.mail.cnb.yahoo.com> References: <20060802114235.73141.qmail@web15802.mail.cnb.yahoo.com> Message-ID: use the configure option --with-scalar-type=complex Satish On Wed, 2 Aug 2006, jiaxun hou wrote: > Hi all, > I am trying to using the package FFTW3 in PETSC. > How can I change type from PetscScalar to complex or double[2]? > The documentation seems a bit sketchy. > > Regards > Mason > > > > --------------------------------- > ????????????????-3.5G??????20M?????? From hzhang at mcs.anl.gov Wed Aug 2 10:27:04 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 2 Aug 2006 10:27:04 -0500 (CDT) Subject: some problems in using PETSC with FFTW3 package In-Reply-To: <20060802114235.73141.qmail@web15802.mail.cnb.yahoo.com> References: <20060802114235.73141.qmail@web15802.mail.cnb.yahoo.com> Message-ID: Manson, We don't have support for FFTW3 yet(we are currently developing an interface between petsc and FFTW3). How do you use FFTW3 in PETSC? To build petsc with complex, you need configure petsc with '--with-scalar-type=complex' Hong On Wed, 2 Aug 2006, jiaxun hou wrote: > Hi all, > I am trying to using the package FFTW3 in PETSC. > How can I change type from PetscScalar to complex or double[2]? > The documentation seems a bit sketchy. > > Regards > Mason > > > > --------------------------------- > ????????????????-3.5G??????20M?????? From jiaxun_hou at yahoo.com.cn Wed Aug 2 10:44:04 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Wed, 2 Aug 2006 23:44:04 +0800 (CST) Subject: =?gb2312?q?=BB=D8=B8=B4=A3=BA=20Re:=20some=20problems=20in=20using=20PETS?= =?gb2312?q?C=20with=20FFTW3=20package?= In-Reply-To: Message-ID: <20060802154404.90585.qmail@web15806.mail.cnb.yahoo.com> Satish, Thanks for your response . I am sorry for my confusing description. In fact , I did use the configure option --with-scalar-type=complex when I configuated the system. So, I wonder if it is possible to change the type PetscScalar to some kinds like double[2] which can be handled in FFTW package? Or, is there any functions can get the real (imaginary) parts of a Petsc's Vector? Regards, Mason Satish Balay ??? use the configure option --with-scalar-type=complex Satish On Wed, 2 Aug 2006, jiaxun hou wrote: > Hi all, > I am trying to using the package FFTW3 in PETSC. > How can I change type from PetscScalar to complex or double[2]? > The documentation seems a bit sketchy. > > Regards > Mason > > > > --------------------------------- > ????????-3.5G???20M??? --------------------------------- ????????-3.5G???20M??? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jiaxun_hou at yahoo.com.cn Wed Aug 2 10:57:24 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Wed, 2 Aug 2006 23:57:24 +0800 (CST) Subject: =?gb2312?q?=BB=D8=B8=B4=A3=BA=20Re:=20some=20problems=20in=20using=20PETS?= =?gb2312?q?C=20with=20FFTW3=20package?= In-Reply-To: Message-ID: <20060802155724.448.qmail@web15801.mail.cnb.yahoo.com> Hong Zhang, Thanks for your respones. In FFTW3, complex type is set by double[2], and it is very easy to handle. But in Petsc, I don't konw exactly how the complex type be set. And when I want to do the fast fourier transform on a Petsc's complex vector by using FFTW3, I get the trouble of the translation between Petsc and FFTW3. 
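For what it is worth, the casting approach the replies below converge on can be sketched as follows, assuming PETSc was configured with --with-scalar-type=complex (so each PetscScalar occupies the same two doubles as an fftw_complex) and that only the local part of a sequential Vec is transformed; the function name is made up for illustration:

  #include "petscvec.h"
  #include <fftw3.h>

  /* In-place forward FFT of a (sequential) complex PETSc Vec via FFTW3.      */
  PetscErrorCode VecFFTWForward(Vec v)
  {
    PetscErrorCode ierr;
    PetscInt       n;
    PetscScalar    *a;
    fftw_complex   *fa;
    fftw_plan      plan;

    ierr = VecGetLocalSize(v, &n);CHKERRQ(ierr);
    ierr = VecGetArray(v, &a);CHKERRQ(ierr);
    fa   = (fftw_complex *)a;    /* same memory layout: two doubles per entry */

    /* FFTW_ESTIMATE leaves the array untouched while planning;
       FFTW_MEASURE would overwrite it, so plan before filling or use ESTIMATE */
    plan = fftw_plan_dft_1d((int)n, fa, fa, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);
    fftw_destroy_plan(plan);

    ierr = VecRestoreArray(v, &a);CHKERRQ(ierr);
    return 0;
  }

Individual entries can still be inspected on the PETSc side with PetscRealPart() and PetscImaginaryPart(), as suggested below.
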
Regards, Mason Hong Zhang ??? Manson, We don't have support for FFTW3 yet(we are currently developing an interface between petsc and FFTW3). How do you use FFTW3 in PETSC? To build petsc with complex, you need configure petsc with '--with-scalar-type=complex' Hong On Wed, 2 Aug 2006, jiaxun hou wrote: > Hi all, > I am trying to using the package FFTW3 in PETSC. > How can I change type from PetscScalar to complex or double[2]? > The documentation seems a bit sketchy. > > Regards > Mason > > > > --------------------------------- > ????????-3.5G???20M??? --------------------------------- ??????-3.5G???20M?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Aug 2 11:11:53 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 2 Aug 2006 11:11:53 -0500 (CDT) Subject: =?gb2312?q?=BB=D8=B8=B4=A3=BA=20Re:=20some=20problems=20in=20using=20PETS?= =?gb2312?q?C=20with=20FFTW3=20package?= In-Reply-To: <20060802155724.448.qmail@web15801.mail.cnb.yahoo.com> References: <20060802155724.448.qmail@web15801.mail.cnb.yahoo.com> Message-ID: You can retrieve real and imaginary part of a petsc scalar from PetscRealPart()/PetscImaginaryPart() See an example at ~petsc/src/ksp/ksp/examples/tutorials/ex11.c Hong On Wed, 2 Aug 2006, jiaxun hou wrote: > Hong Zhang, > Thanks for your respones. > > In FFTW3, complex type is set by double[2], and it is very easy to handle. > But in Petsc, I don't konw exactly how the complex type be set. And when I want to do the fast fourier transform on a Petsc's complex vector by using FFTW3, I get the trouble of the translation between Petsc and FFTW3. > > Regards, > Mason > > Hong Zhang ?????? > > Manson, > > We don't have support for FFTW3 yet(we are currently developing > an interface between petsc and FFTW3). How do you use FFTW3 in PETSC? > > To build petsc with complex, you need configure petsc with > '--with-scalar-type=complex' > > Hong > > On Wed, 2 Aug 2006, jiaxun hou wrote: > > > Hi all, > > I am trying to using the package FFTW3 in PETSC. > > How can I change type from PetscScalar to complex or double[2]? > > The documentation seems a bit sketchy. > > > > Regards > > Mason > > > > > > > > --------------------------------- > > ????????????????-3.5G??????20M?????? > > > > > --------------------------------- > ????????????-3.5G??????20M???? From bsmith at mcs.anl.gov Wed Aug 2 12:12:07 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 2 Aug 2006 12:12:07 -0500 (CDT) Subject: =?gb2312?q?=BB=D8=B8=B4=A3=BA=20Re:=20some=20problems=20in=20using=20PETS?= =?gb2312?q?C=20with=20FFTW3=20package?= In-Reply-To: <20060802154404.90585.qmail@web15806.mail.cnb.yahoo.com> References: <20060802154404.90585.qmail@web15806.mail.cnb.yahoo.com> Message-ID: Mason, A complex number (PetscScalar) is simply a double [2]. So you can either 1) use complex PETSc and caste the arrays when you pass to fftw or 2) user PETScScalar of simply double and pass those beasts to fftw. Unless YOUR code is using complex numbers then you should simply use 2 and all is easy. Barry On Wed, 2 Aug 2006, jiaxun hou wrote: > Satish, > Thanks for your response . > I am sorry for my confusing description. In fact , I did use the configure option > --with-scalar-type=complex when I configuated the system. So, I wonder if it is possible to change the type PetscScalar to some kinds like double[2] which can be handled in FFTW package? Or, is there any functions can get the real (imaginary) parts of a Petsc's Vector? 
> > Regards, > Mason > > Satish Balay ?????? > use the configure option > > --with-scalar-type=complex > > Satish > > On Wed, 2 Aug 2006, jiaxun hou wrote: > >> Hi all, >> I am trying to using the package FFTW3 in PETSC. >> How can I change type from PetscScalar to complex or double[2]? >> The documentation seems a bit sketchy. >> >> Regards >> Mason >> >> >> >> --------------------------------- >> ????????????????-3.5G??????20M?????? > > > --------------------------------- > ????????????????-3.5G??????20M?????? From mafunk at nmsu.edu Wed Aug 2 16:32:20 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Wed, 2 Aug 2006 15:32:20 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608011807.29007.mafunk@nmsu.edu> Message-ID: <200608021532.21971.mafunk@nmsu.edu> Hi Matt, thanks for all the help so far. The -info option is really very helpful. So i think i straightened the actual errors out. However, now i am back to the original question i had. That is why it takes so much longer on 4 procs than on 1 proc. I profiled the KSPSolve(...) as stage 2: For 1 proc i have: --- Event Stage 2: Stage 2 of ChomboPetscInterface VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 for 4 procs i have : --- Event Stage 2: Stage 2 of ChomboPetscInterface VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 
------------------------------------------------------------------------------------------------------------------------ Now if i understand it right, all these calls summarize all calls between the pop and push commands. That would mean that the majority of the time is spend in the MatMult and in within that the VecScatterBegin and VecScatterEnd commands (if i understand it right). My problem size is really small. So i was wondering if the problem lies in that (namely that the major time is simply spend communicating between processors, or whether there is still something wrong with how i wrote the code?) thanks mat On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > On 8/1/06, Matt Funk wrote: > > Actually the errors occur on my calls to a PETSc functions after calling > > PETSCInitialize. > > Yes, it is the error I pointed out in the last message. > > Matt > > > mat From knepley at gmail.com Wed Aug 2 16:50:41 2006 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 2 Aug 2006 16:50:41 -0500 Subject: profiling PETSc code In-Reply-To: <200608021532.21971.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608011807.29007.mafunk@nmsu.edu> <200608021532.21971.mafunk@nmsu.edu> Message-ID: On 8/2/06, Matt Funk wrote: > Hi Matt, > > thanks for all the help so far. The -info option is really very helpful. So i > think i straightened the actual errors out. However, now i am back to the > original question i had. That is why it takes so much longer on 4 procs than > on 1 proc. So you have a 1.5 load imbalance for MatMult(), which probably cascades to give the 133! load imbalance for VecDot(). You probably have either: 1) VERY bad laod imbalance 2) a screwed up network 3) bad contention on the network (loaded cluster) Can you help us narrow this down? Matt > I profiled the KSPSolve(...) 
as stage 2: > > For 1 proc i have: > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 > 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 > 1.2e+04 7100 0 0 84 97100 0 0100 45 > PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > > for 4 procs i have : > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 0.0e+00 > 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 0.0e+00 > 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 0.0e+00 > 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 0.0e+00 > 2.8e+04 84100 0 0 34 100100 0 0100 1 > PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 0.0e+00 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 0.0e+00 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 0.0e+00 > 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > ------------------------------------------------------------------------------------------------------------------------ > > Now if i understand it right, all these calls summarize all calls between the > pop and push commands. That would mean that the majority of the time is spend > in the MatMult and in within that the VecScatterBegin and VecScatterEnd > commands (if i understand it right). > > My problem size is really small. So i was wondering if the problem lies in > that (namely that the major time is simply spend communicating between > processors, or whether there is still something wrong with how i wrote the > code?) > > > thanks > mat > > > > On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > On 8/1/06, Matt Funk wrote: > > > Actually the errors occur on my calls to a PETSc functions after calling > > > PETSCInitialize. > > > > Yes, it is the error I pointed out in the last message. > > > > Matt > > > > > mat > > -- "Failure has a thousand explanations. 
Success doesn't need one" -- Sir Alec Guiness From mafunk at nmsu.edu Wed Aug 2 17:21:43 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Wed, 2 Aug 2006 16:21:43 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608021532.21971.mafunk@nmsu.edu> Message-ID: <200608021621.44171.mafunk@nmsu.edu> Hi Matt, It could be a bad load imbalance because i don't let PETSc decide. I need to fix that anyway, so i think i'll try that first and then let you know. Thanks though for the quick response and helping me to interpret those numbers ... mat On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > On 8/2/06, Matt Funk wrote: > > Hi Matt, > > > > thanks for all the help so far. The -info option is really very helpful. > > So i think i straightened the actual errors out. However, now i am back > > to the original question i had. That is why it takes so much longer on 4 > > procs than on 1 proc. > > So you have a 1.5 load imbalance for MatMult(), which probably cascades to > give the 133! load imbalance for VecDot(). You probably have either: > > 1) VERY bad laod imbalance > > 2) a screwed up network > > 3) bad contention on the network (loaded cluster) > > Can you help us narrow this down? > > > Matt > > > I profiled the KSPSolve(...) as stage 2: > > > > For 1 proc i have: > > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 > > 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > > VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 > > 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > > VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 > > 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > > MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > > MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > > KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 > > 1.2e+04 7100 0 0 84 97100 0 0100 45 > > PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > > > > > for 4 procs i have : > > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 > > 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > > VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 0.0e+00 > > 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > > VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 0.0e+00 > > 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > > VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > > VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > > MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 0.0e+00 > > 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > > MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 > > 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > > MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > > MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 0.0e+00 > > 2.8e+04 84100 0 0 34 100100 0 0100 
1 > > PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 0.0e+00 > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 0.0e+00 > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 0.0e+00 > > 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > > ------------------------------------------------------------------------- > >----------------------------------------------- > > > > Now if i understand it right, all these calls summarize all calls between > > the pop and push commands. That would mean that the majority of the time > > is spend in the MatMult and in within that the VecScatterBegin and > > VecScatterEnd commands (if i understand it right). > > > > My problem size is really small. So i was wondering if the problem lies > > in that (namely that the major time is simply spend communicating between > > processors, or whether there is still something wrong with how i wrote > > the code?) > > > > > > thanks > > mat > > > > On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > > On 8/1/06, Matt Funk wrote: > > > > Actually the errors occur on my calls to a PETSc functions after > > > > calling PETSCInitialize. > > > > > > Yes, it is the error I pointed out in the last message. > > > > > > Matt > > > > > > > mat From jiaxun_hou at yahoo.com.cn Thu Aug 3 05:13:52 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Thu, 3 Aug 2006 18:13:52 +0800 (CST) Subject: =?gb2312?q?=BB=D8=B8=B4=A3=BA=20Re:=20=BB=D8=B8=B4=A3=BA=20Re:=20some=20p?= =?gb2312?q?roblems=20in=20using=20PETSC=20with=20FFTW3=20package?= In-Reply-To: Message-ID: <20060803101352.75243.qmail@web15810.mail.cnb.yahoo.com> Barry, Thank you very much. As long as a complex number (PetscScalar) is simply a double [2], I can use the operator "reinterpret_cast" to caste them. And it seems to be working fine now. Regards, Mason Barry Smith ??? Mason, A complex number (PetscScalar) is simply a double [2]. So you can either 1) use complex PETSc and caste the arrays when you pass to fftw or 2) user PETScScalar of simply double and pass those beasts to fftw. Unless YOUR code is using complex numbers then you should simply use 2 and all is easy. Barry On Wed, 2 Aug 2006, jiaxun hou wrote: > Satish, > Thanks for your response . > I am sorry for my confusing description. In fact , I did use the configure option > --with-scalar-type=complex when I configuated the system. So, I wonder if it is possible to change the type PetscScalar to some kinds like double[2] which can be handled in FFTW package? Or, is there any functions can get the real (imaginary) parts of a Petsc's Vector? > > Regards, > Mason > > Satish Balay ??? > use the configure option > > --with-scalar-type=complex > > Satish > > On Wed, 2 Aug 2006, jiaxun hou wrote: > >> Hi all, >> I am trying to using the package FFTW3 in PETSC. >> How can I change type from PetscScalar to complex or double[2]? >> The documentation seems a bit sketchy. >> >> Regards >> Mason >> >> >> >> --------------------------------- >> ????????-3.5G???20M??? > > > --------------------------------- > ????????-3.5G???20M??? --------------------------------- Mp3???-??????? -------------- next part -------------- An HTML attachment was scrubbed... 
From jiaxun_hou at yahoo.com.cn Thu Aug 3 05:20:54 2006
From: jiaxun_hou at yahoo.com.cn (jiaxun hou)
Date: Thu, 3 Aug 2006 18:20:54 +0800 (CST)
Subject: Re: Re: some problems in using PETSC with FFTW3 package
In-Reply-To: Message-ID: <20060803102054.44952.qmail@web15807.mail.cnb.yahoo.com>

Hong, Thank you for your help. I have successfully converted PetscScalar* into fftw_complex* by using the operator "reinterpret_cast". Regards, Mason

Hong Zhang wrote: You can retrieve the real and imaginary parts of a PETSc scalar with PetscRealPart()/PetscImaginaryPart(). See an example at ~petsc/src/ksp/ksp/examples/tutorials/ex11.c Hong

On Wed, 2 Aug 2006, jiaxun hou wrote: > Hong Zhang, > Thanks for your response. > > In FFTW3, the complex type is defined as double[2], and it is very easy to handle. > But in Petsc, I don't know exactly how the complex type is defined. And when I want to do a fast Fourier transform on a Petsc complex vector using FFTW3, I run into trouble translating between Petsc and FFTW3. > > Regards, > Mason > > Hong Zhang wrote: > > Mason, > > We don't have support for FFTW3 yet (we are currently developing > an interface between petsc and FFTW3). How do you use FFTW3 in PETSC? > > To build petsc with complex, you need to configure petsc with > '--with-scalar-type=complex' > > Hong > > On Wed, 2 Aug 2006, jiaxun hou wrote: > > > Hi all, > > I am trying to use the FFTW3 package in PETSC. > > How can I change type from PetscScalar to complex or double[2]? > > The documentation seems a bit sketchy. > > > > Regards > > Mason
-------------- next part --------------
An HTML attachment was scrubbed...

From diosady at MIT.EDU Thu Aug 3 09:05:54 2006
From: diosady at MIT.EDU (Laslo Tibor Diosady)
Date: Thu, 3 Aug 2006 10:05:54 -0400 (EDT)
Subject: In place ILU(0) factorization
Message-ID:

Hi, I wanted to perform an in-place ILU factorization with fill level of 0 (i.e. ILU(0)) for a SeqBAIJ matrix. This works fine when I use a natural ordering; however, when I try to use a different matrix reordering I get the following error.

[0]PETSC ERROR: MatILUFactor_SeqBAIJ() line 1768 in src/mat/impls/baij/seq/baij.c
[0]PETSC ERROR: Invalid argument!
[0]PETSC ERROR: Row and column permutations must be identity for in-place ILU!
[0]PETSC ERROR: MatILUFactor() line 2107 in src/mat/interface/matrix.c

In other words, Petsc only supports in-place ILU(0) without reordering. The idea behind doing an in-place factorization is so that I don't use twice as much memory to store my matrix (i.e. the original matrix and the ILU-factored matrix). Is in-place ILU factorization with reordering going to be supported by Petsc anytime in the near future, or is there an easy workaround so I can get this to work? Thanks, Laslo

From hzhang at mcs.anl.gov Thu Aug 3 09:54:54 2006
From: hzhang at mcs.anl.gov (Hong Zhang)
Date: Thu, 3 Aug 2006 09:54:54 -0500 (CDT)
Subject: In place ILU(0) factorization
In-Reply-To: References: Message-ID:

Laslo, A reordering of the matrix changes its sparsity pattern, so the factored matrix cannot be stored in the original matrix.
Here is the notes from petsc MatILUFactor(): Notes: Probably really in-place only when level of fill is zero, otherwise allocates new space to store factored matrix and deletes previous memory. i.e., except ilu(0) without reordering, petsc inplace ilu() virtually computes a new factor, and deletes the previous memory. You may use petsc out-place ilu, and call MatDestroy() to delete your original matrix. > In otherwords, Petsc only supports in place ILU(0) without reordering. > > The idea behind doing an in place factorization is so that I don't use > twice as much memory to store my matrix (ie the original matrix and the > ILU factored matrix). > > > Is in place ILU factorization with reordering going to be supported by > Petsc anytime in the near future or is there an easy work around so I can > get this to work? We can add this support. As mentioned above, the factored matrix will be newly allocated with the original memory deleted. Hong From diosady at MIT.EDU Thu Aug 3 10:27:40 2006 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Thu, 3 Aug 2006 11:27:40 -0400 (EDT) Subject: In place ILU(0) factorization In-Reply-To: References: Message-ID: Hong, I know that I can use MatILUFactorSymbolic and then MatLUFactorNumeric and then destroy the original matrix if I want to. Though support for this in one step with MatILUFactor would be nice, it is not really important. The point I was trying to make is that performing and ILU(0) does not change the sparsity pattern, no matter what reordering is used, since by definition there is no fill and hence no change in sparsity pattern from the original matrix. The reordering in this case simply changes the order of operations to perform the ILU(0) but not the memory requirements. In this case reordering is performed not to reduce fill but to achieve a "better" ILU factorization. Hence it should be possible to perform and ILU(0) in place with different reorderings. This is what I was hoping to get support for. Thanks, Laslo On Thu, 3 Aug 2006, Hong Zhang wrote: > > Laslo, > > An reordering of matrix changes matrix sparse pattern, > then the factored matrix cannot be stored in the original matrix. > Here is the notes from petsc MatILUFactor(): > > Notes: > Probably really in-place only when level of fill is zero, otherwise > allocates > new space to store factored matrix and deletes previous memory. > > i.e., except ilu(0) without reordering, petsc inplace ilu() > virtually computes a new factor, and deletes the previous memory. > You may use petsc out-place ilu, and call MatDestroy() > to delete your original matrix. > >> In otherwords, Petsc only supports in place ILU(0) without reordering. >> >> The idea behind doing an in place factorization is so that I don't use >> twice as much memory to store my matrix (ie the original matrix and the >> ILU factored matrix). >> >> >> Is in place ILU factorization with reordering going to be supported by >> Petsc anytime in the near future or is there an easy work around so I can >> get this to work? > > We can add this support. As mentioned above, the factored matrix > will be newly allocated with the original memory deleted. 
> > Hong > > From hzhang at mcs.anl.gov Thu Aug 3 10:59:07 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 3 Aug 2006 10:59:07 -0500 (CDT) Subject: In place ILU(0) factorization In-Reply-To: References: Message-ID: Laslo, > The point I was trying to make is that performing and ILU(0) does not > change the sparsity pattern, no matter what reordering is used, since by > definition there is no fill and hence no change in sparsity pattern from > the original matrix. The space required remains the same, but the row-compressed matrix format for the factor will be changed with the reordering. To store the new format over the existing memory, temp space has to be allocated during implementation. Thus replacing the original memory with newly allocated space would make implementation easier. > The reordering in this case simply changes the order of operations to > perform the ILU(0) but not the memory requirements. In this case > reordering is performed not to reduce fill but to achieve a "better" ILU > factorization. Yes. > > Hence it should be possible to perform and ILU(0) in place with > different reorderings. This is what I was hoping to get support for. > We'll try to provide this support. Hong > > > On Thu, 3 Aug 2006, Hong Zhang wrote: > > > > > Laslo, > > > > An reordering of matrix changes matrix sparse pattern, > > then the factored matrix cannot be stored in the original matrix. > > Here is the notes from petsc MatILUFactor(): > > > > Notes: > > Probably really in-place only when level of fill is zero, otherwise > > allocates > > new space to store factored matrix and deletes previous memory. > > > > i.e., except ilu(0) without reordering, petsc inplace ilu() > > virtually computes a new factor, and deletes the previous memory. > > You may use petsc out-place ilu, and call MatDestroy() > > to delete your original matrix. > > > >> In otherwords, Petsc only supports in place ILU(0) without reordering. > >> > >> The idea behind doing an in place factorization is so that I don't use > >> twice as much memory to store my matrix (ie the original matrix and the > >> ILU factored matrix). > >> > >> > >> Is in place ILU factorization with reordering going to be supported by > >> Petsc anytime in the near future or is there an easy work around so I can > >> get this to work? > > > > We can add this support. As mentioned above, the factored matrix > > will be newly allocated with the original memory deleted. > > > > Hong > > > > > > From diosady at MIT.EDU Fri Aug 4 08:52:24 2006 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Fri, 4 Aug 2006 09:52:24 -0400 (EDT) Subject: In place ILU(0) factorization In-Reply-To: References: Message-ID: Hong, > The space required remains the same, but > the row-compressed matrix format > for the factor will be changed with the > reordering. > To store the new format over the > existing memory, temp space has to be allocated during > implementation. Thus replacing the original memory with > newly allocated space would make implementation easier. > I guess this depends upon the implementation of the ILU(0) factorization for the AIJ or BAIJ formats, which I didn't (nor do I ever really want to) look into. Thanks for the help, Laslo From diosady at MIT.EDU Mon Aug 7 08:48:14 2006 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Mon, 7 Aug 2006 09:48:14 -0400 (EDT) Subject: MatSolveTranspose Message-ID: Hi, I am trying to use MatSolveTranspose for a sequential BAIJ format, however whenever I call MatSolveTranspose I get a segmentation fault. 
The sequence of calls which I make is: MatILUFactorSymbolic MatLUFactorNumeric And I use the resulting matrix in calls to: MatSolve and MatSolveTranspose When I use MatSolve this works great. However, with MatSolveTranspose it appears petsc tries to allocate memory in a never ending loop until my machine runs out of memory and I get a seg fault. If I do a call to MatHasOperation with this matrix the result is Petsc_True, so in theory I think the call to MatSolveTranspose should work. Am I doing something wrong or is this a problem with MatSolveTranspose for seqBAIJ matrices? Any help would be greatly appreciated. Thanks, Laslo From hzhang at mcs.anl.gov Mon Aug 7 15:24:58 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 7 Aug 2006 15:24:58 -0500 (CDT) Subject: MatSolveTranspose In-Reply-To: References: Message-ID: Laslo, MatSolveTranspose() should work for sequential BAIJ format. The example ~petsc/src/mat/examples/tests/ex48.c tests it. Would you please run this example and see if it works. You may simplify your code and send it to us. Then I'll test it to see where is the problem. Hong On Mon, 7 Aug 2006, Laslo Tibor Diosady wrote: > Hi, > > I am trying to use MatSolveTranspose for a sequential BAIJ format, however > whenever I call MatSolveTranspose I get a segmentation fault. > > The sequence of calls which I make is: > MatILUFactorSymbolic > MatLUFactorNumeric > > And I use the resulting matrix in calls to: > MatSolve and MatSolveTranspose > > When I use MatSolve this works great. However, with MatSolveTranspose it > appears petsc tries to allocate memory in a never ending loop until my > machine runs out of memory and I get a seg fault. > > If I do a call to MatHasOperation with this matrix the result is > Petsc_True, so in theory I think the call to MatSolveTranspose should > work. > > Am I doing something wrong or is this a problem with MatSolveTranspose for > seqBAIJ matrices? > > Any help would be greatly appreciated. > > Thanks, > > Laslo > > From jiaxun_hou at yahoo.com.cn Tue Aug 8 04:48:48 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Tue, 8 Aug 2006 17:48:48 +0800 (CST) Subject: About user defined PC In-Reply-To: Message-ID: <20060808094848.2415.qmail@web15803.mail.cnb.yahoo.com> Hi, I met a problem ,when I constructed a user-defined PC. That is I need to defind the process of P^(-1)Mx in each iteration of GMRES by myself for efficient reason, not only define P^(-1)x . But the Petsc seems to separate this process into two parts: the first is y=Mx which is defined in Petsc framework, and the second is P^(-1)y which is defined by the user. So, is there any way to do it without change the code of Petsc framework? Regards, Jiaxun --------------------------------- ????????-3.5G???20M??? -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Aug 8 08:13:15 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 8 Aug 2006 08:13:15 -0500 (CDT) Subject: In place ILU(0) factorization In-Reply-To: References: Message-ID: Laslo, We figured out a way to implement ILU(0) with reordering without allocating workspace. We'll add this support later. I'll let you know when it is done. Thanks for your request that help us to make petsc better. Hong On Fri, 4 Aug 2006, Laslo Tibor Diosady wrote: > Hong, > > > > The space required remains the same, but > > the row-compressed matrix format > > for the factor will be changed with the > > reordering. 
> > To store the new format over the > > existing memory, temp space has to be allocated during > > implementation. Thus replacing the original memory with > > newly allocated space would make implementation easier. > > > > I guess this depends upon the implementation of the ILU(0) factorization > for the AIJ or BAIJ formats, which I didn't (nor do I ever really want > to) look into. > > Thanks for the help, > > Laslo > > From bsmith at mcs.anl.gov Tue Aug 8 07:18:56 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 8 Aug 2006 07:18:56 -0500 (CDT) Subject: About user defined PC In-Reply-To: <20060808094848.2415.qmail@web15803.mail.cnb.yahoo.com> References: <20060808094848.2415.qmail@web15803.mail.cnb.yahoo.com> Message-ID: Jiaxun, I am assuming you are using the PCSHELL? I have added support for this for you; * if you are using petsc-dev (http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html) you need only do an hg pull to get my additions then run "make" in src/ksp/pc/impls/shell. * if you are not using petsc-dev (or use the nightly tar ball) then I attach the three files that were changed. include/private/pcimpl.h, include/petscpc.h and src/ksp/pc/impls/shell/shell.c (again run make in src/ksp/pc/impls/shell) If you are actually writing a complete PC and not using PCSHELL http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/pc/impls/jacobi/jacobi.c.html then you just need to provide a routine PCApplyBA_XXXX() with calling sequence: PC,PCSide,Vec b,Vec x,Vec work Good luck, Barry On Tue, 8 Aug 2006, jiaxun hou wrote: > Hi, > > I met a problem ,when I constructed a user-defined PC. That is I need to defind the process of P^(-1)Mx in each iteration of GMRES by myself for efficient reason, not only define P^(-1)x . But the Petsc seems to separate this process into two parts: the first is y=Mx which is defined in Petsc framework, and the second is P^(-1)y which is defined by the user. So, is there any way to do it without change the code of Petsc framework? > > Regards, > Jiaxun > > > --------------------------------- > ????????????????-3.5G??????20M?????? -------------- next part -------------- /* Preconditioner module. 
*/ #if !defined(__PETSCPC_H) #define __PETSCPC_H #include "petscmat.h" PETSC_EXTERN_CXX_BEGIN EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCInitializePackage(const char[]); /* PCList contains the list of preconditioners currently registered These are added with the PCRegisterDynamic() macro */ extern PetscFList PCList; #define PCType const char* /*S PC - Abstract PETSc object that manages all preconditioners Level: beginner Concepts: preconditioners .seealso: PCCreate(), PCSetType(), PCType (for list of available types) S*/ typedef struct _p_PC* PC; /*E PCType - String with the name of a PETSc preconditioner method or the creation function with an optional dynamic library name, for example http://www.mcs.anl.gov/petsc/lib.a:mypccreate() Level: beginner Notes: Click on the links below to see details on a particular solver .seealso: PCSetType(), PC, PCCreate() E*/ #define PCNONE "none" #define PCJACOBI "jacobi" #define PCSOR "sor" #define PCLU "lu" #define PCSHELL "shell" #define PCBJACOBI "bjacobi" #define PCMG "mg" #define PCEISENSTAT "eisenstat" #define PCILU "ilu" #define PCICC "icc" #define PCASM "asm" #define PCKSP "ksp" #define PCCOMPOSITE "composite" #define PCREDUNDANT "redundant" #define PCSPAI "spai" #define PCNN "nn" #define PCCHOLESKY "cholesky" #define PCSAMG "samg" #define PCPBJACOBI "pbjacobi" #define PCMAT "mat" #define PCHYPRE "hypre" #define PCFIELDSPLIT "fieldsplit" #define PCTFS "tfs" #define PCML "ml" #define PCPROMETHEUS "prometheus" #define PCGALERKIN "galerkin" /* Logging support */ extern PetscCookie PETSCKSP_DLLEXPORT PC_COOKIE; /*E PCSide - If the preconditioner is to be applied to the left, right or symmetrically around the operator. Level: beginner .seealso: E*/ typedef enum { PC_LEFT,PC_RIGHT,PC_SYMMETRIC } PCSide; extern const char *PCSides[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCreate(MPI_Comm,PC*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetType(PC,PCType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetUp(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetUpOnBlocks(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApply(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplySymmetricLeft(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplySymmetricRight(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyBAorAB(PC,PCSide,Vec,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyTranspose(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCHasApplyTranspose(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyBAorABTranspose(PC,PCSide,Vec,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyRichardson(PC,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyRichardsonExists(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegisterDestroy(void); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegisterAll(const char[]); extern PetscTruth PCRegisterAllCalled; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegister(const char[],const char[],const char[],PetscErrorCode(*)(PC)); /*MC PCRegisterDynamic - Adds a method to the preconditioner package. Synopsis: PetscErrorCode PCRegisterDynamic(char *name_solver,char *path,char *name_create,PetscErrorCode (*routine_create)(PC)) Not collective Input Parameters: + name_solver - name of a new user-defined solver . path - path (either absolute or relative) the library containing this solver . 
name_create - name of routine to create method context - routine_create - routine to create method context Notes: PCRegisterDynamic() may be called multiple times to add several user-defined preconditioners. If dynamic libraries are used, then the fourth input argument (routine_create) is ignored. Sample usage: .vb PCRegisterDynamic("my_solver","/home/username/my_lib/lib/libO/solaris/mylib", "MySolverCreate",MySolverCreate); .ve Then, your solver can be chosen with the procedural interface via $ PCSetType(pc,"my_solver") or at runtime via the option $ -pc_type my_solver Level: advanced Notes: ${PETSC_ARCH}, ${PETSC_DIR}, ${PETSC_LIB_DIR}, or ${any environmental variable} occuring in pathname will be replaced with appropriate values. If your function is not being put into a shared library then use PCRegister() instead .keywords: PC, register .seealso: PCRegisterAll(), PCRegisterDestroy() M*/ #if defined(PETSC_USE_DYNAMIC_LIBRARIES) #define PCRegisterDynamic(a,b,c,d) PCRegister(a,b,c,0) #else #define PCRegisterDynamic(a,b,c,d) PCRegister(a,b,c,d) #endif EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDestroy(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetFromOptions(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetType(PC,PCType*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetFactoredMatrix(PC,Mat*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetModifySubMatrices(PC,PetscErrorCode(*)(PC,PetscInt,const IS[],const IS[],Mat[],void*),void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCModifySubMatrices(PC,PetscInt,const IS[],const IS[],Mat[],void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetOperators(PC,Mat,Mat,MatStructure); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOperators(PC,Mat*,Mat*,MatStructure*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOperatorsSet(PC,PetscTruth*,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCView(PC,PetscViewer); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetOptionsPrefix(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCAppendOptionsPrefix(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOptionsPrefix(PC,const char*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCComputeExplicitOperator(PC,Mat*); /* These are used to provide extra scaling of preconditioned operator for time-stepping schemes like in SUNDIALS */ EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScale(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleLeft(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleRight(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleSet(PC,Vec); /* ------------- options specific to particular preconditioners --------- */ EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCJacobiSetUseRowMax(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCJacobiSetUseAbs(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetSymmetric(PC,MatSORType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetOmega(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetIterations(PC,PetscInt,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCEisenstatSetOmega(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCEisenstatNoDiagonalScaling(PC); #define USE_PRECONDITIONER_MATRIX 0 #define USE_TRUE_MATRIX 1 EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetUseTrueLocal(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetTotalBlocks(PC,PetscInt,const PetscInt[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetLocalBlocks(PC,PetscInt,const PetscInt[]); EXTERN PetscErrorCode 
PETSCKSP_DLLEXPORT PCKSPSetUseTrue(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply(PC,PetscErrorCode (*)(void*,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA(PC,PetscErrorCode (*)(void*,PCSide,Vec,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose(PC,PetscErrorCode (*)(void*,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp(PC,PetscErrorCode (*)(void*)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyRichardson(PC,PetscErrorCode (*)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView(PC,PetscErrorCode (*)(void*,PetscViewer)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy(PC,PetscErrorCode (*)(void*)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetContext(PC,void**); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetContext(PC,void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetName(PC,char*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetZeroPivot(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetShiftNonzero(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetShiftPd(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetFill(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetPivoting(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorReorderForNonzeroDiagonal(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetMatOrdering(PC,MatOrderingType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetReuseOrdering(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetReuseFill(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetUseInPlace(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetAllowDiagonalFill(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetPivotInBlocks(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetLevels(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetUseDropTolerance(PC,PetscReal,PetscReal,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetLocalSubdomains(PC,PetscInt,IS[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetTotalSubdomains(PC,PetscInt,IS[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetOverlap(PC,PetscInt); /*E PCASMType - Type of additive Schwarz method to use $ PC_ASM_BASIC - symmetric version where residuals from the ghost points are used $ and computed values in ghost regions are added together. Classical $ standard additive Schwarz $ PC_ASM_RESTRICT - residuals from ghost points are used but computed values in ghost $ region are discarded. Default $ PC_ASM_INTERPOLATE - residuals from ghost points are not used, computed values in ghost $ region are added back in $ PC_ASM_NONE - ghost point residuals are not used, computed ghost values are discarded $ not very good. 
Level: beginner .seealso: PCASMSetType() E*/ typedef enum {PC_ASM_BASIC = 3,PC_ASM_RESTRICT = 1,PC_ASM_INTERPOLATE = 2,PC_ASM_NONE = 0} PCASMType; extern const char *PCASMTypes[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetType(PC,PCASMType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMCreateSubdomains2D(PetscInt,PetscInt,PetscInt,PetscInt,PetscInt,PetscInt,PetscInt *,IS **); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetUseInPlace(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMGetLocalSubdomains(PC,PetscInt*,IS*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMGetLocalSubmatrices(PC,PetscInt*,Mat*[]); /*E PCCompositeType - Determines how two or more preconditioner are composed $ PC_COMPOSITE_ADDITIVE - results from application of all preconditioners are added together $ PC_COMPOSITE_MULTIPLICATIVE - preconditioners are applied sequentially to the residual freshly $ computed after the previous preconditioner application $ PC_COMPOSITE_SPECIAL - This is very special for a matrix of the form alpha I + R + S $ where first preconditioner is built from alpha I + S and second from $ alpha I + R Level: beginner .seealso: PCCompositeSetType() E*/ typedef enum {PC_COMPOSITE_ADDITIVE,PC_COMPOSITE_MULTIPLICATIVE,PC_COMPOSITE_SPECIAL} PCCompositeType; extern const char *PCCompositeTypes[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSetUseTrue(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSetType(PC,PCCompositeType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeAddPC(PC,PCType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeGetPC(PC pc,PetscInt n,PC *); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSpecialSetAlpha(PC,PetscScalar); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantSetScatter(PC,VecScatter,VecScatter); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantGetOperators(PC,Mat*,Mat*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantGetPC(PC,PC*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetEpsilon(PC,double); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetNBSteps(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetMax(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetMaxNew(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetBlockSize(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetCacheSize(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetVerbose(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetSp(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCHYPRESetType(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiGetLocalBlocks(PC,PetscInt*,const PetscInt*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiGetTotalBlocks(PC,PetscInt*,const PetscInt*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFieldSplitSetFields(PC,PetscInt,PetscInt*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFieldSplitSetType(PC,PCCompositeType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGalerkinSetRestriction(PC,Mat); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGalerkinSetInterpolation(PC,Mat); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetCoordinates(PC,PetscInt,PetscReal*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSASetVectors(PC,PetscInt,PetscReal *); PETSC_EXTERN_CXX_END #endif /* __PETSCPC_H */ -------------- next part -------------- #ifndef _PCIMPL #define _PCIMPL #include "petscksp.h" #include "petscpc.h" typedef struct _PCOps *PCOps; struct _PCOps { PetscErrorCode (*setup)(PC); PetscErrorCode (*apply)(PC,Vec,Vec); PetscErrorCode 
(*applyrichardson)(PC,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); PetscErrorCode (*applyBA)(PC,PCSide,Vec,Vec,Vec); PetscErrorCode (*applytranspose)(PC,Vec,Vec); PetscErrorCode (*applyBAtranspose)(PC,PetscInt,Vec,Vec,Vec); PetscErrorCode (*setfromoptions)(PC); PetscErrorCode (*presolve)(PC,KSP,Vec,Vec); PetscErrorCode (*postsolve)(PC,KSP,Vec,Vec); PetscErrorCode (*getfactoredmatrix)(PC,Mat*); PetscErrorCode (*applysymmetricleft)(PC,Vec,Vec); PetscErrorCode (*applysymmetricright)(PC,Vec,Vec); PetscErrorCode (*setuponblocks)(PC); PetscErrorCode (*destroy)(PC); PetscErrorCode (*view)(PC,PetscViewer); }; /* Preconditioner context */ struct _p_PC { PETSCHEADER(struct _PCOps); PetscInt setupcalled; MatStructure flag; Mat mat,pmat; Vec diagonalscaleright,diagonalscaleleft; /* used for time integration scaling */ PetscTruth diagonalscale; PetscErrorCode (*modifysubmatrices)(PC,PetscInt,const IS[],const IS[],Mat[],void*); /* user provided routine */ void *modifysubmatricesP; /* context for user routine */ void *data; }; extern PetscEvent PC_SetUp, PC_SetUpOnBlocks, PC_Apply, PC_ApplyCoarse, PC_ApplyMultiple, PC_ApplySymmetricLeft; extern PetscEvent PC_ApplySymmetricRight, PC_ModifySubMatrices; #endif -------------- next part -------------- #define PETSCKSP_DLL /* This provides a simple shell for Fortran (and C programmers) to create their own preconditioner without writing much interface code. */ #include "private/pcimpl.h" /*I "petscpc.h" I*/ #include "private/vecimpl.h" EXTERN_C_BEGIN typedef struct { void *ctx; /* user provided contexts for preconditioner */ PetscErrorCode (*destroy)(void*); PetscErrorCode (*setup)(void*); PetscErrorCode (*apply)(void*,Vec,Vec); PetscErrorCode (*applyBA)(void*,PCSide,Vec,Vec,Vec); PetscErrorCode (*presolve)(void*,KSP,Vec,Vec); PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec); PetscErrorCode (*view)(void*,PetscViewer); PetscErrorCode (*applytranspose)(void*,Vec,Vec); PetscErrorCode (*applyrich)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); char *name; } PC_Shell; EXTERN_C_END #undef __FUNCT__ #define __FUNCT__ "PCShellGetContext" /*@ PCShellGetContext - Returns the user-provided context associated with a shell PC Not Collective Input Parameter: . pc - should have been created with PCCreateShell() Output Parameter: . ctx - the user provided context Level: advanced Notes: This routine is intended for use within various shell routines .keywords: PC, shell, get, context .seealso: PCCreateShell(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetContext(PC pc,void **ctx) { PetscErrorCode ierr; PetscTruth flg; PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); PetscValidPointer(ctx,2); ierr = PetscTypeCompare((PetscObject)pc,PCSHELL,&flg);CHKERRQ(ierr); if (!flg) *ctx = 0; else *ctx = ((PC_Shell*)(pc->data))->ctx; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetContext" /*@C PCShellSetContext - sets the context for a shell PC Collective on PC Input Parameters: + pc - the shell PC - ctx - the context Level: advanced Fortran Notes: The context can only be an integer or a PetscObject unfortunately it cannot be a Fortran array or derived type. 
.seealso: PCCreateShell(), PCShellGetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetContext(PC pc,void *ctx) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscTruth flg; PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscTypeCompare((PetscObject)pc,PCSHELL,&flg);CHKERRQ(ierr); if (flg) { shell->ctx = ctx; } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCSetUp_Shell" static PetscErrorCode PCSetUp_Shell(PC pc) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (shell->setup) { CHKMEMQ; ierr = (*shell->setup)(shell->ctx);CHKERRQ(ierr); CHKMEMQ; } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApply_Shell" static PetscErrorCode PCApply_Shell(PC pc,Vec x,Vec y) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->apply) SETERRQ(PETSC_ERR_USER,"No apply() routine provided to Shell PC"); PetscStackPush("PCSHELL user function"); CHKMEMQ; ierr = (*shell->apply)(shell->ctx,x,y);CHKERRQ(ierr); CHKMEMQ; PetscStackPop; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyBA_Shell" static PetscErrorCode PCApplyBA_Shell(PC pc,PCSide side,Vec x,Vec y,Vec w) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->applyBA) SETERRQ(PETSC_ERR_USER,"No applyBA() routine provided to Shell PC"); PetscStackPush("PCSHELL user function BA"); CHKMEMQ; ierr = (*shell->applyBA)(shell->ctx,side,x,y,w);CHKERRQ(ierr); CHKMEMQ; PetscStackPop; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCPreSolve_Shell" static PetscErrorCode PCPreSolve_Shell(PC pc,KSP ksp,Vec b,Vec x) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->presolve) SETERRQ(PETSC_ERR_USER,"No presolve() routine provided to Shell PC"); ierr = (*shell->presolve)(shell->ctx,ksp,b,x);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCPostSolve_Shell" static PetscErrorCode PCPostSolve_Shell(PC pc,KSP ksp,Vec b,Vec x) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->postsolve) SETERRQ(PETSC_ERR_USER,"No postsolve() routine provided to Shell PC"); ierr = (*shell->postsolve)(shell->ctx,ksp,b,x);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyTranspose_Shell" static PetscErrorCode PCApplyTranspose_Shell(PC pc,Vec x,Vec y) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->applytranspose) SETERRQ(PETSC_ERR_USER,"No applytranspose() routine provided to Shell PC"); ierr = (*shell->applytranspose)(shell->ctx,x,y);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyRichardson_Shell" static PetscErrorCode PCApplyRichardson_Shell(PC pc,Vec x,Vec y,Vec w,PetscReal rtol,PetscReal abstol, PetscReal dtol,PetscInt it) { PetscErrorCode ierr; PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; ierr = (*shell->applyrich)(shell->ctx,x,y,w,rtol,abstol,dtol,it);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCDestroy_Shell" static PetscErrorCode PCDestroy_Shell(PC pc) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscFunctionBegin; ierr = PetscStrfree(shell->name);CHKERRQ(ierr); if (shell->destroy) { ierr = (*shell->destroy)(shell->ctx);CHKERRQ(ierr); } ierr = PetscFree(shell);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ 
#define __FUNCT__ "PCView_Shell" static PetscErrorCode PCView_Shell(PC pc,PetscViewer viewer) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscTruth iascii; PetscFunctionBegin; ierr = PetscTypeCompare((PetscObject)viewer,PETSC_VIEWER_ASCII,&iascii);CHKERRQ(ierr); if (iascii) { if (shell->name) {ierr = PetscViewerASCIIPrintf(viewer," Shell: %s\n",shell->name);CHKERRQ(ierr);} else {ierr = PetscViewerASCIIPrintf(viewer," Shell: no name\n");CHKERRQ(ierr);} } if (shell->view) { ierr = PetscViewerASCIIPushTab(viewer);CHKERRQ(ierr); ierr = (*shell->view)(shell->ctx,viewer);CHKERRQ(ierr); ierr = PetscViewerASCIIPopTab(viewer);CHKERRQ(ierr); } PetscFunctionReturn(0); } /* ------------------------------------------------------------------------------*/ EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetDestroy_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy_Shell(PC pc, PetscErrorCode (*destroy)(void*)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->destroy = destroy; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetSetUp_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp_Shell(PC pc, PetscErrorCode (*setup)(void*)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->setup = setup; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApply_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply_Shell(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->apply = apply; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyBA_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA_Shell(PC pc,PetscErrorCode (*apply)(void*,PCSide,Vec,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->applyBA = apply; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetPreSolve_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPreSolve_Shell(PC pc,PetscErrorCode (*presolve)(void*,KSP,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->presolve = presolve; if (presolve) { pc->ops->presolve = PCPreSolve_Shell; } else { pc->ops->presolve = 0; } PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetPostSolve_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPostSolve_Shell(PC pc,PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->postsolve = postsolve; if (postsolve) { pc->ops->postsolve = PCPostSolve_Shell; } else { pc->ops->postsolve = 0; } PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetView_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView_Shell(PC pc,PetscErrorCode (*view)(void*,PetscViewer)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->view = view; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyTranspose_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose_Shell(PC pc,PetscErrorCode (*applytranspose)(void*,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->applytranspose = applytranspose; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ 
"PCShellSetName_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName_Shell(PC pc,const char name[]) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; ierr = PetscStrfree(shell->name);CHKERRQ(ierr); ierr = PetscStrallocpy(name,&shell->name);CHKERRQ(ierr); PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellGetName_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetName_Shell(PC pc,char *name[]) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; *name = shell->name; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyRichardson_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyRichardson_Shell(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; pc->ops->applyrichardson = PCApplyRichardson_Shell; shell->applyrich = apply; PetscFunctionReturn(0); } EXTERN_C_END /* -------------------------------------------------------------------------------*/ #undef __FUNCT__ #define __FUNCT__ "PCShellSetDestroy" /*@C PCShellSetDestroy - Sets routine to use to destroy the user-provided application context. Collective on PC Input Parameters: + pc - the preconditioner context . destroy - the application-provided destroy routine Calling sequence of destroy: .vb PetscErrorCode destroy (void *ptr) .ve . ptr - the application context Level: developer .keywords: PC, shell, set, destroy, user-provided .seealso: PCShellSetApply(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy(PC pc,PetscErrorCode (*destroy)(void*)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetDestroy_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,destroy);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetSetUp" /*@C PCShellSetSetUp - Sets routine to use to "setup" the preconditioner whenever the matrix operator is changed. Collective on PC Input Parameters: + pc - the preconditioner context . setup - the application-provided setup routine Calling sequence of setup: .vb PetscErrorCode setup (void *ptr) .ve . 
ptr - the application context Level: developer .keywords: PC, shell, set, setup, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetApply(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp(PC pc,PetscErrorCode (*setup)(void*)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetSetUp_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,setup);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetView" /*@C PCShellSetView - Sets routine to use as viewer of shell preconditioner Collective on PC Input Parameters: + pc - the preconditioner context - view - the application-provided view routine Calling sequence of apply: .vb PetscErrorCode view(void *ptr,PetscViewer v) .ve + ptr - the application context - v - viewer Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView(PC pc,PetscErrorCode (*view)(void*,PetscViewer)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,PetscViewer)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetView_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,view);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApply" /*@C PCShellSetApply - Sets routine to use as preconditioner. Collective on PC Input Parameters: + pc - the preconditioner context - apply - the application-provided preconditioning routine Calling sequence of apply: .vb PetscErrorCode apply (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetContext(), PCShellSetApplyBA() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApply_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,apply);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyBA" /*@C PCShellSetApplyBA - Sets routine to use as preconditioner times operator. Collective on PC Input Parameters: + pc - the preconditioner context - applyBA - the application-provided BA routine Calling sequence of apply: .vb PetscErrorCode applyBA (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . 
xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetContext(), PCShellSetApply() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA(PC pc,PetscErrorCode (*applyBA)(void*,PCSide,Vec,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,PCSide,Vec,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApplyBA_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,applyBA);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyTranspose" /*@C PCShellSetApplyTranspose - Sets routine to use as preconditioner transpose. Collective on PC Input Parameters: + pc - the preconditioner context - apply - the application-provided preconditioning transpose routine Calling sequence of apply: .vb PetscErrorCode applytranspose (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer Notes: Uses the same context variable as PCShellSetApply(). .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApply(), PCSetContext(), PCShellSetApplyBA() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose(PC pc,PetscErrorCode (*applytranspose)(void*,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApplyTranspose_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,applytranspose);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetPreSolve" /*@C PCShellSetPreSolve - Sets routine to apply to the operators/vectors before a KSPSolve() is applied. This usually does something like scale the linear system in some application specific way. Collective on PC Input Parameters: + pc - the preconditioner context - presolve - the application-provided presolve routine Calling sequence of presolve: .vb PetscErrorCode presolve (void *ptr,KSP ksp,Vec b,Vec x) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetPostSolve(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPreSolve(PC pc,PetscErrorCode (*presolve)(void*,KSP,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,KSP,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetPreSolve_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,presolve);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetPostSolve" /*@C PCShellSetPostSolve - Sets routine to apply to the operators/vectors before a KSPSolve() is applied. This usually does something like scale the linear system in some application specific way. Collective on PC Input Parameters: + pc - the preconditioner context - postsolve - the application-provided presolve routine Calling sequence of postsolve: .vb PetscErrorCode postsolve(void *ptr,KSP ksp,Vec b,Vec x) .ve + ptr - the application context . 
xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetPreSolve(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPostSolve(PC pc,PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,KSP,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetPostSolve_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,postsolve);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetName" /*@C PCShellSetName - Sets an optional name to associate with a shell preconditioner. Not Collective Input Parameters: + pc - the preconditioner context - name - character string describing shell preconditioner Level: developer .keywords: PC, shell, set, name, user-provided .seealso: PCShellGetName() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName(PC pc,const char name[]) { PetscErrorCode ierr,(*f)(PC,const char []); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetName_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,name);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellGetName" /*@C PCShellGetName - Gets an optional name that the user has set for a shell preconditioner. Not Collective Input Parameter: . pc - the preconditioner context Output Parameter: . name - character string describing shell preconditioner (you should not free this) Level: developer .keywords: PC, shell, get, name, user-provided .seealso: PCShellSetName() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetName(PC pc,char *name[]) { PetscErrorCode ierr,(*f)(PC,char *[]); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); PetscValidPointer(name,2); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellGetName_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,name);CHKERRQ(ierr); } else { SETERRQ(PETSC_ERR_ARG_WRONG,"Not shell preconditioner, cannot get name"); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyRichardson" /*@C PCShellSetApplyRichardson - Sets routine to use as preconditioner in Richardson iteration. Collective on PC Input Parameters: + pc - the preconditioner context - apply - the application-provided preconditioning routine Calling sequence of apply: .vb PetscErrorCode apply (void *ptr,Vec b,Vec x,Vec r,PetscReal rtol,PetscReal abstol,PetscReal dtol,PetscInt maxits) .ve + ptr - the application context . b - right-hand-side . x - current iterate . r - work space . rtol - relative tolerance of residual norm to stop at . abstol - absolute tolerance of residual norm to stop at . 
dtol - if residual norm increases by this factor than return - maxits - number of iterations to run Level: developer .keywords: PC, shell, set, apply, Richardson, user-provided .seealso: PCShellSetApply(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyRichardson(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApplyRichardson_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,apply);CHKERRQ(ierr); } PetscFunctionReturn(0); } /*MC PCSHELL - Creates a new preconditioner class for use with your own private data storage format. Level: advanced Concepts: providing your own preconditioner Usage: $ PetscErrorCode (*mult)(void*,Vec,Vec); $ PetscErrorCode (*setup)(void*); $ PCCreate(comm,&pc); $ PCSetType(pc,PCSHELL); $ PCShellSetApply(pc,mult); $ PCShellSetApplyBA(pc,mult); (optional) $ PCShellSetApplyTranspose(pc,mult); (optional) $ PCShellSetContext(pc,ctx) $ PCShellSetSetUp(pc,setup); (optional) .seealso: PCCreate(), PCSetType(), PCType (for list of available types), PC, MATSHELL, PCShellSetSetUp(), PCShellSetApply(), PCShellSetView(), PCShellSetApplyTranspose(), PCShellSetName(), PCShellSetApplyRichardson(), PCShellGetName(), PCShellSetContext(), PCShellGetContext(), PCShellSetApplyBA() M*/ EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCCreate_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCCreate_Shell(PC pc) { PetscErrorCode ierr; PC_Shell *shell; PetscFunctionBegin; pc->ops->destroy = PCDestroy_Shell; ierr = PetscNew(PC_Shell,&shell);CHKERRQ(ierr); ierr = PetscLogObjectMemory(pc,sizeof(PC_Shell));CHKERRQ(ierr); pc->data = (void*)shell; pc->name = 0; pc->ops->apply = PCApply_Shell; pc->ops->applyBA = PCApplyBA_Shell; pc->ops->view = PCView_Shell; pc->ops->applytranspose = PCApplyTranspose_Shell; pc->ops->applyrichardson = 0; pc->ops->setup = PCSetUp_Shell; pc->ops->presolve = 0; pc->ops->postsolve = 0; pc->ops->view = PCView_Shell; shell->apply = 0; shell->applytranspose = 0; shell->name = 0; shell->applyrich = 0; shell->presolve = 0; shell->postsolve = 0; shell->ctx = 0; shell->setup = 0; shell->view = 0; shell->destroy = 0; ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetDestroy_C","PCShellSetDestroy_Shell", PCShellSetDestroy_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetSetUp_C","PCShellSetSetUp_Shell", PCShellSetSetUp_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetApply_C","PCShellSetApply_Shell", PCShellSetApply_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetApplyBA_C","PCShellSetApplyBA_Shell", PCShellSetApplyBA_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetPreSolve_C","PCShellSetPreSolve_Shell", PCShellSetPreSolve_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetPostSolve_C","PCShellSetPostSolve_Shell", PCShellSetPostSolve_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetView_C","PCShellSetView_Shell", PCShellSetView_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetApplyTranspose_C","PCShellSetApplyTranspose_Shell", PCShellSetApplyTranspose_Shell);CHKERRQ(ierr); ierr = 
PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetName_C","PCShellSetName_Shell", PCShellSetName_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellGetName_C","PCShellGetName_Shell", PCShellGetName_Shell);CHKERRQ(ierr); ierr = PetscObjectComposeFunctionDynamic((PetscObject)pc,"PCShellSetApplyRichardson_C","PCShellSetApplyRichardson_Shell", PCShellSetApplyRichardson_Shell);CHKERRQ(ierr); PetscFunctionReturn(0); } EXTERN_C_END From diosady at MIT.EDU Tue Aug 8 08:40:49 2006 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Tue, 8 Aug 2006 09:40:49 -0400 (EDT) Subject: In place ILU(0) factorization In-Reply-To: References: Message-ID: Thanks, Laslo On Tue, 8 Aug 2006, Hong Zhang wrote: > > Laslo, > > We figured out a way to implement ILU(0) with reordering > without allocating workspace. > We'll add this support later. I'll let you know when > it is done. > > Thanks for your request that help us to make petsc > better. > > Hong > > On Fri, 4 Aug 2006, Laslo Tibor Diosady wrote: > >> Hong, >> >> >>> The space required remains the same, but >>> the row-compressed matrix format >>> for the factor will be changed with the >>> reordering. >>> To store the new format over the >>> existing memory, temp space has to be allocated during >>> implementation. Thus replacing the original memory with >>> newly allocated space would make implementation easier. >>> >> >> I guess this depends upon the implementation of the ILU(0) factorization >> for the AIJ or BAIJ formats, which I didn't (nor do I ever really want >> to) look into. >> >> Thanks for the help, >> >> Laslo >> >> > > From jiaxun_hou at yahoo.com.cn Wed Aug 9 05:42:35 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Wed, 9 Aug 2006 18:42:35 +0800 (CST) Subject: About user defined PC In-Reply-To: Message-ID: <20060809104235.32229.qmail@web15801.mail.cnb.yahoo.com> Barry, Thank you very much. Your codes are very useful ! Regards, Jiaxun Barry Smith ??? Jiaxun, I am assuming you are using the PCSHELL? I have added support for this for you; * if you are using petsc-dev (http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html) you need only do an hg pull to get my additions then run "make" in src/ksp/pc/impls/shell. * if you are not using petsc-dev (or use the nightly tar ball) then I attach the three files that were changed. include/private/pcimpl.h, include/petscpc.h and src/ksp/pc/impls/shell/shell.c (again run make in src/ksp/pc/impls/shell) If you are actually writing a complete PC and not using PCSHELL http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/src/ksp/pc/impls/jacobi/jacobi.c.html then you just need to provide a routine PCApplyBA_XXXX() with calling sequence: PC,PCSide,Vec b,Vec x,Vec work Good luck, Barry On Tue, 8 Aug 2006, jiaxun hou wrote: > Hi, > > I met a problem ,when I constructed a user-defined PC. That is I need to defind the process of P^(-1)Mx in each iteration of GMRES by myself for efficient reason, not only define P^(-1)x . But the Petsc seems to separate this process into two parts: the first is y=Mx which is defined in Petsc framework, and the second is P^(-1)y which is defined by the user. So, is there any way to do it without change the code of Petsc framework? > > Regards, > Jiaxun > > > --------------------------------- > ????????-3.5G???20M???/* Preconditioner module. 
*/ #if !defined(__PETSCPC_H) #define __PETSCPC_H #include "petscmat.h" PETSC_EXTERN_CXX_BEGIN EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCInitializePackage(const char[]); /* PCList contains the list of preconditioners currently registered These are added with the PCRegisterDynamic() macro */ extern PetscFList PCList; #define PCType const char* /*S PC - Abstract PETSc object that manages all preconditioners Level: beginner Concepts: preconditioners .seealso: PCCreate(), PCSetType(), PCType (for list of available types) S*/ typedef struct _p_PC* PC; /*E PCType - String with the name of a PETSc preconditioner method or the creation function with an optional dynamic library name, for example http://www.mcs.anl.gov/petsc/lib.a:mypccreate() Level: beginner Notes: Click on the links below to see details on a particular solver .seealso: PCSetType(), PC, PCCreate() E*/ #define PCNONE "none" #define PCJACOBI "jacobi" #define PCSOR "sor" #define PCLU "lu" #define PCSHELL "shell" #define PCBJACOBI "bjacobi" #define PCMG "mg" #define PCEISENSTAT "eisenstat" #define PCILU "ilu" #define PCICC "icc" #define PCASM "asm" #define PCKSP "ksp" #define PCCOMPOSITE "composite" #define PCREDUNDANT "redundant" #define PCSPAI "spai" #define PCNN "nn" #define PCCHOLESKY "cholesky" #define PCSAMG "samg" #define PCPBJACOBI "pbjacobi" #define PCMAT "mat" #define PCHYPRE "hypre" #define PCFIELDSPLIT "fieldsplit" #define PCTFS "tfs" #define PCML "ml" #define PCPROMETHEUS "prometheus" #define PCGALERKIN "galerkin" /* Logging support */ extern PetscCookie PETSCKSP_DLLEXPORT PC_COOKIE; /*E PCSide - If the preconditioner is to be applied to the left, right or symmetrically around the operator. Level: beginner .seealso: E*/ typedef enum { PC_LEFT,PC_RIGHT,PC_SYMMETRIC } PCSide; extern const char *PCSides[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCreate(MPI_Comm,PC*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetType(PC,PCType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetUp(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetUpOnBlocks(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApply(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplySymmetricLeft(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplySymmetricRight(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyBAorAB(PC,PCSide,Vec,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyTranspose(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCHasApplyTranspose(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyBAorABTranspose(PC,PCSide,Vec,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyRichardson(PC,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCApplyRichardsonExists(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegisterDestroy(void); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegisterAll(const char[]); extern PetscTruth PCRegisterAllCalled; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRegister(const char[],const char[],const char[],PetscErrorCode(*)(PC)); /*MC PCRegisterDynamic - Adds a method to the preconditioner package. Synopsis: PetscErrorCode PCRegisterDynamic(char *name_solver,char *path,char *name_create,PetscErrorCode (*routine_create)(PC)) Not collective Input Parameters: + name_solver - name of a new user-defined solver . path - path (either absolute or relative) the library containing this solver . 
name_create - name of routine to create method context - routine_create - routine to create method context Notes: PCRegisterDynamic() may be called multiple times to add several user-defined preconditioners. If dynamic libraries are used, then the fourth input argument (routine_create) is ignored. Sample usage: .vb PCRegisterDynamic("my_solver","/home/username/my_lib/lib/libO/solaris/mylib", "MySolverCreate",MySolverCreate); .ve Then, your solver can be chosen with the procedural interface via $ PCSetType(pc,"my_solver") or at runtime via the option $ -pc_type my_solver Level: advanced Notes: ${PETSC_ARCH}, ${PETSC_DIR}, ${PETSC_LIB_DIR}, or ${any environmental variable} occuring in pathname will be replaced with appropriate values. If your function is not being put into a shared library then use PCRegister() instead .keywords: PC, register .seealso: PCRegisterAll(), PCRegisterDestroy() M*/ #if defined(PETSC_USE_DYNAMIC_LIBRARIES) #define PCRegisterDynamic(a,b,c,d) PCRegister(a,b,c,0) #else #define PCRegisterDynamic(a,b,c,d) PCRegister(a,b,c,d) #endif EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDestroy(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetFromOptions(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetType(PC,PCType*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetFactoredMatrix(PC,Mat*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetModifySubMatrices(PC,PetscErrorCode(*)(PC,PetscInt,const IS[],const IS[],Mat[],void*),void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCModifySubMatrices(PC,PetscInt,const IS[],const IS[],Mat[],void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetOperators(PC,Mat,Mat,MatStructure); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOperators(PC,Mat*,Mat*,MatStructure*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOperatorsSet(PC,PetscTruth*,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCView(PC,PetscViewer); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetOptionsPrefix(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCAppendOptionsPrefix(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGetOptionsPrefix(PC,const char*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCComputeExplicitOperator(PC,Mat*); /* These are used to provide extra scaling of preconditioned operator for time-stepping schemes like in SUNDIALS */ EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScale(PC,PetscTruth*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleLeft(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleRight(PC,Vec,Vec); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCDiagonalScaleSet(PC,Vec); /* ------------- options specific to particular preconditioners --------- */ EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCJacobiSetUseRowMax(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCJacobiSetUseAbs(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetSymmetric(PC,MatSORType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetOmega(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSORSetIterations(PC,PetscInt,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCEisenstatSetOmega(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCEisenstatNoDiagonalScaling(PC); #define USE_PRECONDITIONER_MATRIX 0 #define USE_TRUE_MATRIX 1 EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetUseTrueLocal(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetTotalBlocks(PC,PetscInt,const PetscInt[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiSetLocalBlocks(PC,PetscInt,const PetscInt[]); EXTERN PetscErrorCode 
PETSCKSP_DLLEXPORT PCKSPSetUseTrue(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply(PC,PetscErrorCode (*)(void*,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA(PC,PetscErrorCode (*)(void*,PCSide,Vec,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose(PC,PetscErrorCode (*)(void*,Vec,Vec)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp(PC,PetscErrorCode (*)(void*)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyRichardson(PC,PetscErrorCode (*)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView(PC,PetscErrorCode (*)(void*,PetscViewer)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy(PC,PetscErrorCode (*)(void*)); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetContext(PC,void**); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetContext(PC,void*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetName(PC,char*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetZeroPivot(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetShiftNonzero(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetShiftPd(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetFill(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetPivoting(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorReorderForNonzeroDiagonal(PC,PetscReal); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetMatOrdering(PC,MatOrderingType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetReuseOrdering(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetReuseFill(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetUseInPlace(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetAllowDiagonalFill(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetPivotInBlocks(PC,PetscTruth); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetLevels(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFactorSetUseDropTolerance(PC,PetscReal,PetscReal,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetLocalSubdomains(PC,PetscInt,IS[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetTotalSubdomains(PC,PetscInt,IS[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetOverlap(PC,PetscInt); /*E PCASMType - Type of additive Schwarz method to use $ PC_ASM_BASIC - symmetric version where residuals from the ghost points are used $ and computed values in ghost regions are added together. Classical $ standard additive Schwarz $ PC_ASM_RESTRICT - residuals from ghost points are used but computed values in ghost $ region are discarded. Default $ PC_ASM_INTERPOLATE - residuals from ghost points are not used, computed values in ghost $ region are added back in $ PC_ASM_NONE - ghost point residuals are not used, computed ghost values are discarded $ not very good. 
Level: beginner .seealso: PCASMSetType() E*/ typedef enum {PC_ASM_BASIC = 3,PC_ASM_RESTRICT = 1,PC_ASM_INTERPOLATE = 2,PC_ASM_NONE = 0} PCASMType; extern const char *PCASMTypes[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetType(PC,PCASMType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMCreateSubdomains2D(PetscInt,PetscInt,PetscInt,PetscInt,PetscInt,PetscInt,PetscInt *,IS **); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMSetUseInPlace(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMGetLocalSubdomains(PC,PetscInt*,IS*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCASMGetLocalSubmatrices(PC,PetscInt*,Mat*[]); /*E PCCompositeType - Determines how two or more preconditioner are composed $ PC_COMPOSITE_ADDITIVE - results from application of all preconditioners are added together $ PC_COMPOSITE_MULTIPLICATIVE - preconditioners are applied sequentially to the residual freshly $ computed after the previous preconditioner application $ PC_COMPOSITE_SPECIAL - This is very special for a matrix of the form alpha I + R + S $ where first preconditioner is built from alpha I + S and second from $ alpha I + R Level: beginner .seealso: PCCompositeSetType() E*/ typedef enum {PC_COMPOSITE_ADDITIVE,PC_COMPOSITE_MULTIPLICATIVE,PC_COMPOSITE_SPECIAL} PCCompositeType; extern const char *PCCompositeTypes[]; EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSetUseTrue(PC); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSetType(PC,PCCompositeType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeAddPC(PC,PCType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeGetPC(PC pc,PetscInt n,PC *); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCCompositeSpecialSetAlpha(PC,PetscScalar); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantSetScatter(PC,VecScatter,VecScatter); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantGetOperators(PC,Mat*,Mat*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCRedundantGetPC(PC,PC*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetEpsilon(PC,double); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetNBSteps(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetMax(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetMaxNew(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetBlockSize(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetCacheSize(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetVerbose(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSPAISetSp(PC,PetscInt); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCHYPRESetType(PC,const char[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiGetLocalBlocks(PC,PetscInt*,const PetscInt*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCBJacobiGetTotalBlocks(PC,PetscInt*,const PetscInt*[]); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFieldSplitSetFields(PC,PetscInt,PetscInt*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCFieldSplitSetType(PC,PCCompositeType); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGalerkinSetRestriction(PC,Mat); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCGalerkinSetInterpolation(PC,Mat); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSetCoordinates(PC,PetscInt,PetscReal*); EXTERN PetscErrorCode PETSCKSP_DLLEXPORT PCSASetVectors(PC,PetscInt,PetscReal *); PETSC_EXTERN_CXX_END #endif /* __PETSCPC_H */ #ifndef _PCIMPL #define _PCIMPL #include "petscksp.h" #include "petscpc.h" typedef struct _PCOps *PCOps; struct _PCOps { PetscErrorCode (*setup)(PC); PetscErrorCode (*apply)(PC,Vec,Vec); PetscErrorCode 
(*applyrichardson)(PC,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); PetscErrorCode (*applyBA)(PC,PCSide,Vec,Vec,Vec); PetscErrorCode (*applytranspose)(PC,Vec,Vec); PetscErrorCode (*applyBAtranspose)(PC,PetscInt,Vec,Vec,Vec); PetscErrorCode (*setfromoptions)(PC); PetscErrorCode (*presolve)(PC,KSP,Vec,Vec); PetscErrorCode (*postsolve)(PC,KSP,Vec,Vec); PetscErrorCode (*getfactoredmatrix)(PC,Mat*); PetscErrorCode (*applysymmetricleft)(PC,Vec,Vec); PetscErrorCode (*applysymmetricright)(PC,Vec,Vec); PetscErrorCode (*setuponblocks)(PC); PetscErrorCode (*destroy)(PC); PetscErrorCode (*view)(PC,PetscViewer); }; /* Preconditioner context */ struct _p_PC { PETSCHEADER(struct _PCOps); PetscInt setupcalled; MatStructure flag; Mat mat,pmat; Vec diagonalscaleright,diagonalscaleleft; /* used for time integration scaling */ PetscTruth diagonalscale; PetscErrorCode (*modifysubmatrices)(PC,PetscInt,const IS[],const IS[],Mat[],void*); /* user provided routine */ void *modifysubmatricesP; /* context for user routine */ void *data; }; extern PetscEvent PC_SetUp, PC_SetUpOnBlocks, PC_Apply, PC_ApplyCoarse, PC_ApplyMultiple, PC_ApplySymmetricLeft; extern PetscEvent PC_ApplySymmetricRight, PC_ModifySubMatrices; #endif #define PETSCKSP_DLL /* This provides a simple shell for Fortran (and C programmers) to create their own preconditioner without writing much interface code. */ #include "private/pcimpl.h" /*I "petscpc.h" I*/ #include "private/vecimpl.h" EXTERN_C_BEGIN typedef struct { void *ctx; /* user provided contexts for preconditioner */ PetscErrorCode (*destroy)(void*); PetscErrorCode (*setup)(void*); PetscErrorCode (*apply)(void*,Vec,Vec); PetscErrorCode (*applyBA)(void*,PCSide,Vec,Vec,Vec); PetscErrorCode (*presolve)(void*,KSP,Vec,Vec); PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec); PetscErrorCode (*view)(void*,PetscViewer); PetscErrorCode (*applytranspose)(void*,Vec,Vec); PetscErrorCode (*applyrich)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt); char *name; } PC_Shell; EXTERN_C_END #undef __FUNCT__ #define __FUNCT__ "PCShellGetContext" /*@ PCShellGetContext - Returns the user-provided context associated with a shell PC Not Collective Input Parameter: . pc - should have been created with PCCreateShell() Output Parameter: . ctx - the user provided context Level: advanced Notes: This routine is intended for use within various shell routines .keywords: PC, shell, get, context .seealso: PCCreateShell(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetContext(PC pc,void **ctx) { PetscErrorCode ierr; PetscTruth flg; PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); PetscValidPointer(ctx,2); ierr = PetscTypeCompare((PetscObject)pc,PCSHELL,&flg);CHKERRQ(ierr); if (!flg) *ctx = 0; else *ctx = ((PC_Shell*)(pc->data))->ctx; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetContext" /*@C PCShellSetContext - sets the context for a shell PC Collective on PC Input Parameters: + pc - the shell PC - ctx - the context Level: advanced Fortran Notes: The context can only be an integer or a PetscObject unfortunately it cannot be a Fortran array or derived type. 
.seealso: PCCreateShell(), PCShellGetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetContext(PC pc,void *ctx) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscTruth flg; PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscTypeCompare((PetscObject)pc,PCSHELL,&flg);CHKERRQ(ierr); if (flg) { shell->ctx = ctx; } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCSetUp_Shell" static PetscErrorCode PCSetUp_Shell(PC pc) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (shell->setup) { CHKMEMQ; ierr = (*shell->setup)(shell->ctx);CHKERRQ(ierr); CHKMEMQ; } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApply_Shell" static PetscErrorCode PCApply_Shell(PC pc,Vec x,Vec y) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->apply) SETERRQ(PETSC_ERR_USER,"No apply() routine provided to Shell PC"); PetscStackPush("PCSHELL user function"); CHKMEMQ; ierr = (*shell->apply)(shell->ctx,x,y);CHKERRQ(ierr); CHKMEMQ; PetscStackPop; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyBA_Shell" static PetscErrorCode PCApplyBA_Shell(PC pc,PCSide side,Vec x,Vec y,Vec w) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->applyBA) SETERRQ(PETSC_ERR_USER,"No applyBA() routine provided to Shell PC"); PetscStackPush("PCSHELL user function BA"); CHKMEMQ; ierr = (*shell->applyBA)(shell->ctx,side,x,y,w);CHKERRQ(ierr); CHKMEMQ; PetscStackPop; PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCPreSolve_Shell" static PetscErrorCode PCPreSolve_Shell(PC pc,KSP ksp,Vec b,Vec x) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->presolve) SETERRQ(PETSC_ERR_USER,"No presolve() routine provided to Shell PC"); ierr = (*shell->presolve)(shell->ctx,ksp,b,x);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCPostSolve_Shell" static PetscErrorCode PCPostSolve_Shell(PC pc,KSP ksp,Vec b,Vec x) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->postsolve) SETERRQ(PETSC_ERR_USER,"No postsolve() routine provided to Shell PC"); ierr = (*shell->postsolve)(shell->ctx,ksp,b,x);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyTranspose_Shell" static PetscErrorCode PCApplyTranspose_Shell(PC pc,Vec x,Vec y) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; if (!shell->applytranspose) SETERRQ(PETSC_ERR_USER,"No applytranspose() routine provided to Shell PC"); ierr = (*shell->applytranspose)(shell->ctx,x,y);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCApplyRichardson_Shell" static PetscErrorCode PCApplyRichardson_Shell(PC pc,Vec x,Vec y,Vec w,PetscReal rtol,PetscReal abstol, PetscReal dtol,PetscInt it) { PetscErrorCode ierr; PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; ierr = (*shell->applyrich)(shell->ctx,x,y,w,rtol,abstol,dtol,it);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCDestroy_Shell" static PetscErrorCode PCDestroy_Shell(PC pc) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscFunctionBegin; ierr = PetscStrfree(shell->name);CHKERRQ(ierr); if (shell->destroy) { ierr = (*shell->destroy)(shell->ctx);CHKERRQ(ierr); } ierr = PetscFree(shell);CHKERRQ(ierr); PetscFunctionReturn(0); } #undef __FUNCT__ 
#define __FUNCT__ "PCView_Shell" static PetscErrorCode PCView_Shell(PC pc,PetscViewer viewer) { PC_Shell *shell = (PC_Shell*)pc->data; PetscErrorCode ierr; PetscTruth iascii; PetscFunctionBegin; ierr = PetscTypeCompare((PetscObject)viewer,PETSC_VIEWER_ASCII,&iascii);CHKERRQ(ierr); if (iascii) { if (shell->name) {ierr = PetscViewerASCIIPrintf(viewer," Shell: %s\n",shell->name);CHKERRQ(ierr);} else {ierr = PetscViewerASCIIPrintf(viewer," Shell: no name\n");CHKERRQ(ierr);} } if (shell->view) { ierr = PetscViewerASCIIPushTab(viewer);CHKERRQ(ierr); ierr = (*shell->view)(shell->ctx,viewer);CHKERRQ(ierr); ierr = PetscViewerASCIIPopTab(viewer);CHKERRQ(ierr); } PetscFunctionReturn(0); } /* ------------------------------------------------------------------------------*/ EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetDestroy_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy_Shell(PC pc, PetscErrorCode (*destroy)(void*)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->destroy = destroy; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetSetUp_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp_Shell(PC pc, PetscErrorCode (*setup)(void*)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->setup = setup; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApply_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply_Shell(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->apply = apply; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyBA_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA_Shell(PC pc,PetscErrorCode (*apply)(void*,PCSide,Vec,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->applyBA = apply; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetPreSolve_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPreSolve_Shell(PC pc,PetscErrorCode (*presolve)(void*,KSP,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->presolve = presolve; if (presolve) { pc->ops->presolve = PCPreSolve_Shell; } else { pc->ops->presolve = 0; } PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetPostSolve_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPostSolve_Shell(PC pc,PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->postsolve = postsolve; if (postsolve) { pc->ops->postsolve = PCPostSolve_Shell; } else { pc->ops->postsolve = 0; } PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetView_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView_Shell(PC pc,PetscErrorCode (*view)(void*,PetscViewer)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->view = view; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyTranspose_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose_Shell(PC pc,PetscErrorCode (*applytranspose)(void*,Vec,Vec)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; shell->applytranspose = applytranspose; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ 
"PCShellSetName_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName_Shell(PC pc,const char name[]) { PC_Shell *shell; PetscErrorCode ierr; PetscFunctionBegin; shell = (PC_Shell*)pc->data; ierr = PetscStrfree(shell->name);CHKERRQ(ierr); ierr = PetscStrallocpy(name,&shell->name);CHKERRQ(ierr); PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellGetName_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellGetName_Shell(PC pc,char *name[]) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; *name = shell->name; PetscFunctionReturn(0); } EXTERN_C_END EXTERN_C_BEGIN #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyRichardson_Shell" PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyRichardson_Shell(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec,Vec,PetscReal,PetscReal,PetscReal,PetscInt)) { PC_Shell *shell; PetscFunctionBegin; shell = (PC_Shell*)pc->data; pc->ops->applyrichardson = PCApplyRichardson_Shell; shell->applyrich = apply; PetscFunctionReturn(0); } EXTERN_C_END /* -------------------------------------------------------------------------------*/ #undef __FUNCT__ #define __FUNCT__ "PCShellSetDestroy" /*@C PCShellSetDestroy - Sets routine to use to destroy the user-provided application context. Collective on PC Input Parameters: + pc - the preconditioner context . destroy - the application-provided destroy routine Calling sequence of destroy: .vb PetscErrorCode destroy (void *ptr) .ve . ptr - the application context Level: developer .keywords: PC, shell, set, destroy, user-provided .seealso: PCShellSetApply(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetDestroy(PC pc,PetscErrorCode (*destroy)(void*)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetDestroy_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,destroy);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetSetUp" /*@C PCShellSetSetUp - Sets routine to use to "setup" the preconditioner whenever the matrix operator is changed. Collective on PC Input Parameters: + pc - the preconditioner context . setup - the application-provided setup routine Calling sequence of setup: .vb PetscErrorCode setup (void *ptr) .ve . 
ptr - the application context Level: developer .keywords: PC, shell, set, setup, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetApply(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetSetUp(PC pc,PetscErrorCode (*setup)(void*)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetSetUp_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,setup);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetView" /*@C PCShellSetView - Sets routine to use as viewer of shell preconditioner Collective on PC Input Parameters: + pc - the preconditioner context - view - the application-provided view routine Calling sequence of apply: .vb PetscErrorCode view(void *ptr,PetscViewer v) .ve + ptr - the application context - v - viewer Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetView(PC pc,PetscErrorCode (*view)(void*,PetscViewer)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,PetscViewer)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetView_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,view);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApply" /*@C PCShellSetApply - Sets routine to use as preconditioner. Collective on PC Input Parameters: + pc - the preconditioner context - apply - the application-provided preconditioning routine Calling sequence of apply: .vb PetscErrorCode apply (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetContext(), PCShellSetApplyBA() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApply(PC pc,PetscErrorCode (*apply)(void*,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApply_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,apply);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyBA" /*@C PCShellSetApplyBA - Sets routine to use as preconditioner times operator. Collective on PC Input Parameters: + pc - the preconditioner context - applyBA - the application-provided BA routine Calling sequence of apply: .vb PetscErrorCode applyBA (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . 
xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetContext(), PCShellSetApply() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyBA(PC pc,PetscErrorCode (*applyBA)(void*,PCSide,Vec,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,PCSide,Vec,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApplyBA_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,applyBA);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetApplyTranspose" /*@C PCShellSetApplyTranspose - Sets routine to use as preconditioner transpose. Collective on PC Input Parameters: + pc - the preconditioner context - apply - the application-provided preconditioning transpose routine Calling sequence of apply: .vb PetscErrorCode applytranspose (void *ptr,Vec xin,Vec xout) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer Notes: Uses the same context variable as PCShellSetApply(). .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApply(), PCSetContext(), PCShellSetApplyBA() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetApplyTranspose(PC pc,PetscErrorCode (*applytranspose)(void*,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetApplyTranspose_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,applytranspose);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetPreSolve" /*@C PCShellSetPreSolve - Sets routine to apply to the operators/vectors before a KSPSolve() is applied. This usually does something like scale the linear system in some application specific way. Collective on PC Input Parameters: + pc - the preconditioner context - presolve - the application-provided presolve routine Calling sequence of presolve: .vb PetscErrorCode presolve (void *ptr,KSP ksp,Vec b,Vec x) .ve + ptr - the application context . xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetPostSolve(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPreSolve(PC pc,PetscErrorCode (*presolve)(void*,KSP,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,KSP,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetPreSolve_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,presolve);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetPostSolve" /*@C PCShellSetPostSolve - Sets routine to apply to the operators/vectors before a KSPSolve() is applied. This usually does something like scale the linear system in some application specific way. Collective on PC Input Parameters: + pc - the preconditioner context - postsolve - the application-provided presolve routine Calling sequence of postsolve: .vb PetscErrorCode postsolve(void *ptr,KSP ksp,Vec b,Vec x) .ve + ptr - the application context . 
xin - input vector - xout - output vector Level: developer .keywords: PC, shell, set, apply, user-provided .seealso: PCShellSetApplyRichardson(), PCShellSetSetUp(), PCShellSetApplyTranspose(), PCShellSetPreSolve(), PCShellSetContext() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetPostSolve(PC pc,PetscErrorCode (*postsolve)(void*,KSP,Vec,Vec)) { PetscErrorCode ierr,(*f)(PC,PetscErrorCode (*)(void*,KSP,Vec,Vec)); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetPostSolve_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,postsolve);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellSetName" /*@C PCShellSetName - Sets an optional name to associate with a shell preconditioner. Not Collective Input Parameters: + pc - the preconditioner context - name - character string describing shell preconditioner Level: developer .keywords: PC, shell, set, name, user-provided .seealso: PCShellGetName() @*/ PetscErrorCode PETSCKSP_DLLEXPORT PCShellSetName(PC pc,const char name[]) { PetscErrorCode ierr,(*f)(PC,const char []); PetscFunctionBegin; PetscValidHeaderSpecific(pc,PC_COOKIE,1); ierr = PetscObjectQueryFunction((PetscObject)pc,"PCShellSetName_C",(void (**)(void))&f);CHKERRQ(ierr); if (f) { ierr = (*f)(pc,name);CHKERRQ(ierr); } PetscFunctionReturn(0); } #undef __FUNCT__ #define __FUNCT__ "PCShellGetName" /*@C PCShellGetName - Gets an optional name that the user has set for a shell === message truncated === __________________________________________________ ??????????????? http://cn.mail.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Thu Aug 10 06:11:42 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Thu, 10 Aug 2006 11:11:42 -0000 Subject: PETSc from python Message-ID: Hi all! I'm just curious: is there anybody using PETSc python bindings? -- Marek Wojciechowski From knepley at gmail.com Thu Aug 10 07:37:17 2006 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Aug 2006 07:37:17 -0500 Subject: PETSc from python In-Reply-To: References: Message-ID: Sorry I did not respond earlier. I thought I could fix your problem quickly. However, it is too hard to maintain my own bindings. There are nice Python bindings from http://lineal.developer.nicta.com.au/ which I think is what most people use. Thanks, Matt On 8/10/06, Marek Wojciechowski wrote: > Hi all! > > I'm just curious: is there anybody using PETSc python bindings? > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mwojc at p.lodz.pl Thu Aug 10 14:48:10 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Thu, 10 Aug 2006 19:48:10 -0000 Subject: PETSc from python In-Reply-To: References: Message-ID: Regarding the bindings downloadable from ftp.mcs.anl.gov/pub/petsc/PETScPython.tar.gz there are just lacking two files in comparison to version PETScPython.tar.gz.bkp (they are PetscMap.c PetscViewer.c). I removed them also from makefile and I installed these bindings. However there is now some lack of PETSc funtionality, isn't it? On Thu, 10 Aug 2006 12:37:17 -0000, Matthew Knepley wrote: > Sorry I did not respond earlier. I thought I could fix your problem > quickly. However, it is too hard to maintain my own bindings. There > are nice Python bindings from > > http://lineal.developer.nicta.com.au/ > > which I think is what most people use. 
> > Thanks, > > Matt > > On 8/10/06, Marek Wojciechowski wrote: >> Hi all! >> >> I'm just curious: is there anybody using PETSc python bindings? >> >> -- >> Marek Wojciechowski >> >> > > -- Marek Wojciechowski From knepley at gmail.com Thu Aug 10 14:18:02 2006 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Aug 2006 14:18:02 -0500 Subject: PETSc from python In-Reply-To: References: Message-ID: On 8/10/06, Marek Wojciechowski wrote: > Regarding the bindings downloadable from > ftp.mcs.anl.gov/pub/petsc/PETScPython.tar.gz > there are just lacking two files in comparison to version > PETScPython.tar.gz.bkp (they are > PetscMap.c PetscViewer.c). I removed them also from makefile and I > installed these bindings. > However there is now some lack of PETSc funtionality, isn't it? PetscMap is no longer a class in PETSc, so no problem there. And I think I reorganized so that the Viewer moved into different classes. Thanks, MAtt > > On Thu, 10 Aug 2006 12:37:17 -0000, Matthew Knepley > wrote: > > > Sorry I did not respond earlier. I thought I could fix your problem > > quickly. However, it is too hard to maintain my own bindings. There > > are nice Python bindings from > > > > http://lineal.developer.nicta.com.au/ > > > > which I think is what most people use. > > > > Thanks, > > > > Matt > > > > On 8/10/06, Marek Wojciechowski wrote: > >> Hi all! > >> > >> I'm just curious: is there anybody using PETSc python bindings? > >> > >> -- > >> Marek Wojciechowski > >> > >> > > > > > > > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From julvar at tamu.edu Thu Aug 10 15:36:19 2006 From: julvar at tamu.edu (Julian) Date: Thu, 10 Aug 2006 15:36:19 -0500 Subject: How to get the address of an element in the matrix Message-ID: <20060810203614.6F5443C802@tr-4-int.cis.tamu.edu> Hi, Is there any way to get the address/location of a particular element in the matrix, say mat[i,j] So I can use that address later on to change the value of that element rather than using MatSetValues. I am currently using MatGetValues to get the value as such... PetscScalar val; MatGetValues(mat, 1, &i, 1, &j, &val); But I would like something where I can pass a pointer and then the pointer will be pointing to mat[i,j] such as PetscScalar *val; MatGetReference(mat, i, j, val); Is there something like this already in place ? Or is it not allowed because of the way the matrix is implemented internally? Thanks, Julian. From knepley at gmail.com Thu Aug 10 15:50:34 2006 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Aug 2006 15:50:34 -0500 Subject: How to get the address of an element in the matrix In-Reply-To: <20060810203614.6F5443C802@tr-4-int.cis.tamu.edu> References: <20060810203614.6F5443C802@tr-4-int.cis.tamu.edu> Message-ID: On 8/10/06, Julian wrote: > Hi, > > Is there any way to get the address/location of a particular element in the > matrix, say mat[i,j] This would defeat the purpose of the interface. NO sparse matrix format supports this access. In fact, they all have individual access routines. We present the interface so that you never hav to rewrite your code if the matrix storage format changes. Furthermore, this works equally well in parallel. If you need mat[i,j], your algorithm is probably wrong. Matt > So I can use that address later on to change the value of that element > rather than using MatSetValues. > I am currently using MatGetValues to get the value as such... 
> PetscScalar val; > MatGetValues(mat, 1, &i, 1, &j, &val); > > But I would like something where I can pass a pointer and then the pointer > will be pointing to mat[i,j] such as > PetscScalar *val; > MatGetReference(mat, i, j, val); > > Is there something like this already in place ? Or is it not allowed because > of the way the matrix is implemented internally? > > Thanks, > Julian. > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From picard2 at llnl.gov Thu Aug 10 16:39:24 2006 From: picard2 at llnl.gov (Christophe Picard) Date: Thu, 10 Aug 2006 14:39:24 -0700 Subject: Performance issue with MatSetValues Message-ID: <200608101439.24263.picard2@llnl.gov> Hi, I have a performance issue while trying to insert values in the a matrix. I am using DMMG solver for cell-centered scheme in 3D from the petsc-snapshot to solve a Poisson equations. Inserting coefficients in the matrix for dirichlet or neumann boundary conditions, the insertion is instantaneous. But is I want to insert coefficients for periodic boundary conditions, I can notice a huge slow down in the insertion process (not in the resolution though). The smallest size I can notice the performance drop is 32*32*32. Is there any way to improve this? col.i = row.i+(mx-1); col.j = row.j; col.k = row.k; v[0] = 1; MatSetValuesStencil(*A,1,&row,1,&col,v,ADD_VALUES); Thans, Christophe From knepley at gmail.com Thu Aug 10 18:53:55 2006 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Aug 2006 18:53:55 -0500 Subject: Performance issue with MatSetValues In-Reply-To: <200608101439.24263.picard2@llnl.gov> References: <200608101439.24263.picard2@llnl.gov> Message-ID: It sounds like you are inserting values which were not preallocating. To determine for sure, we would need to know more about the code. However, if you have a periodic problem, why not use a periodic DA? Matt On 8/10/06, Christophe Picard wrote: > Hi, > > I have a performance issue while trying to insert values in the a matrix. I am > using DMMG solver for cell-centered scheme in 3D from the petsc-snapshot to > solve a Poisson equations. Inserting coefficients in the matrix for dirichlet > or neumann boundary conditions, the insertion is instantaneous. But is I want > to insert coefficients for periodic boundary conditions, I can notice a huge > slow down in the insertion process (not in the resolution though). > The smallest size I can notice the performance drop is 32*32*32. > > Is there any way to improve this? > > > col.i = row.i+(mx-1); > col.j = row.j; > col.k = row.k; > > v[0] = 1; > > MatSetValuesStencil(*A,1,&row,1,&col,v,ADD_VALUES); > > > > Thans, > > Christophe > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From picard2 at llnl.gov Thu Aug 10 19:27:51 2006 From: picard2 at llnl.gov (Christophe Picard) Date: Thu, 10 Aug 2006 17:27:51 -0700 Subject: Performance issue with MatSetValues In-Reply-To: References: <200608101439.24263.picard2@llnl.gov> Message-ID: <200608101727.51362.picard2@llnl.gov> I think the memory is indeed not preallocated. Yes my problem is periodic, but if I try to use a periodic DA, the multigrid solver complains about it (see the end of the message). I believe the source of that problem is DAGetInterpolation_3D_Q0(). The problem I am trying to solve is a 3D Poisson equation with Neuman/Robin/Periodic boundary conditions. The boundary conditions are decided at runtime. 
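(For reference on the preallocation point above: when the periodic wrap-around entries fall outside the nonzero pattern preallocated for the standard stencil, each MatSetValuesStencil() call can trigger a fresh malloc, which is what makes the insertion so slow. Below is a minimal sketch of explicit AIJ preallocation that leaves room for one extra coupling per row when the matrix is assembled by hand; the grid size, stencil width, and variable names are illustrative assumptions only, and with DMMG the matrix is normally preallocated automatically from the DA stencil, so fixing the interpolation for the periodic DA is the cleaner route.)

  Mat            A;
  PetscErrorCode ierr;
  PetscInt       mx = 32, my = 32, mz = 32;   /* illustrative grid size                        */
  PetscInt       N  = mx*my*mz;               /* global number of unknowns                     */
  PetscInt       nz = 7 + 1;                  /* 7-point stencil + 1 periodic coupling per row */

  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  /* whichever call matches the actual matrix type takes effect; the other is a no-op */
  ierr = MatSeqAIJSetPreallocation(A,nz,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,nz,PETSC_NULL,nz,PETSC_NULL);CHKERRQ(ierr);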
If I use if-else-if statements to choose the DA here is the message Thanks, Christophe [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Invalid argument! [0]PETSC ERROR: Cannot handle periodic grid in x! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 00:02:04 CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by picard1 Thu Aug 10 17:16:31 2006 [0]PETSC ERROR: Libraries linked from /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 --download-superlu=ifneeded [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: DAGetInterpolation_3D_Q0() line 498 in src/dm/da/src/dainterp.c [0]PETSC ERROR: DAGetInterpolation() line 874 in src/dm/da/src/dainterp.c [0]PETSC ERROR: DMGetInterpolation() line 117 in src/dm/da/utils/dm.c [0]PETSC ERROR: DMMGSetUp() line 215 in src/snes/utils/damg.c [0]PETSC ERROR: DMMGSetDM() line 180 in src/snes/utils/damg.c [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Null argument, when expecting valid pointer! [0]PETSC ERROR: Null Object: Parameter # 1! [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 00:02:04 CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 [0]PETSC ERROR: See docs/changes/index.html for recent updates. [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. [0]PETSC ERROR: See docs/index.html for manual pages. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by picard1 Thu Aug 10 17:16:31 2006 [0]PETSC ERROR: Libraries linked from /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 --download-superlu=ifneeded [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: PetscObjectReference() line 106 in src/sys/objects/inherit.c [0]PETSC ERROR: MGSetInterpolate() line 136 in src/ksp/pc/impls/mg/mgfunc.c [0]PETSC ERROR: DMMGSetUpLevel() line 385 in src/snes/utils/damg.c [0]PETSC ERROR: DMMGSetKSP() line 452 in src/snes/utils/damg.c On Thursday 10 August 2006 04:53 pm, Matthew Knepley wrote: > It sounds like you are inserting values which were not preallocating. To > determine for sure, we would need to know more about the code. However, > if you have a periodic problem, why not use a periodic DA? > > Matt > > On 8/10/06, Christophe Picard wrote: > > Hi, > > > > I have a performance issue while trying to insert values in the a matrix. 
> > I am using DMMG solver for cell-centered scheme in 3D from the > > petsc-snapshot to solve a Poisson equations. Inserting coefficients in > > the matrix for dirichlet or neumann boundary conditions, the insertion is > > instantaneous. But is I want to insert coefficients for periodic boundary > > conditions, I can notice a huge slow down in the insertion process (not > > in the resolution though). The smallest size I can notice the performance > > drop is 32*32*32. > > > > Is there any way to improve this? > > > > > > col.i = row.i+(mx-1); > > col.j = row.j; > > col.k = row.k; > > > > v[0] = 1; > > > > MatSetValuesStencil(*A,1,&row,1,&col,v,ADD_VALUES); > > > > > > > > Thans, > > > > Christophe From bsmith at mcs.anl.gov Thu Aug 10 20:01:41 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 10 Aug 2006 20:01:41 -0500 (CDT) Subject: Performance issue with MatSetValues In-Reply-To: <200608101727.51362.picard2@llnl.gov> References: <200608101439.24263.picard2@llnl.gov> <200608101727.51362.picard2@llnl.gov> Message-ID: This is just do to an incomplete implementation; a user kindly donated the code to use but did not add support for periodicity because he did not need it. If you look at src/dm/da/src/dainterp.c you will find the various routines for setting up the interpolations. If you look at the code for 3D_Q1 you will see how the periodic case is handled; you may be able to modify the 3D_Q0 code to also handle the periodic case. This will then resolve your difficulty. Good luck, Barry On Thu, 10 Aug 2006, Christophe Picard wrote: > > I think the memory is indeed not preallocated. > Yes my problem is periodic, but if I try to use a periodic DA, the multigrid > solver complains about it (see the end of the message). I believe the source > of that problem is DAGetInterpolation_3D_Q0(). > > The problem I am trying to solve is a 3D Poisson equation with > Neuman/Robin/Periodic boundary conditions. The boundary conditions are > decided at runtime. > > If I use if-else-if statements to choose the DA here is the message > > Thanks, > Christophe > > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Invalid argument! > [0]PETSC ERROR: Cannot handle periodic grid in x! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 00:02:04 > CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by picard1 > Thu Aug 10 17:16:31 2006 > [0]PETSC ERROR: Libraries linked from > /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug > [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 > [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev > --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 > --download-superlu=ifneeded > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: DAGetInterpolation_3D_Q0() line 498 in > src/dm/da/src/dainterp.c > [0]PETSC ERROR: DAGetInterpolation() line 874 in src/dm/da/src/dainterp.c > [0]PETSC ERROR: DMGetInterpolation() line 117 in src/dm/da/utils/dm.c > [0]PETSC ERROR: DMMGSetUp() line 215 in src/snes/utils/damg.c > [0]PETSC ERROR: DMMGSetDM() line 180 in src/snes/utils/damg.c > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Null argument, when expecting valid pointer! > [0]PETSC ERROR: Null Object: Parameter # 1! > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 00:02:04 > CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > [0]PETSC ERROR: See docs/index.html for manual pages. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by picard1 > Thu Aug 10 17:16:31 2006 > [0]PETSC ERROR: Libraries linked from > /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug > [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 > [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev > --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 > --download-superlu=ifneeded > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: PetscObjectReference() line 106 in src/sys/objects/inherit.c > [0]PETSC ERROR: MGSetInterpolate() line 136 in src/ksp/pc/impls/mg/mgfunc.c > [0]PETSC ERROR: DMMGSetUpLevel() line 385 in src/snes/utils/damg.c > [0]PETSC ERROR: DMMGSetKSP() line 452 in src/snes/utils/damg.c > > > > On Thursday 10 August 2006 04:53 pm, Matthew Knepley wrote: >> It sounds like you are inserting values which were not preallocating. To >> determine for sure, we would need to know more about the code. However, >> if you have a periodic problem, why not use a periodic DA? >> >> Matt >> >> On 8/10/06, Christophe Picard wrote: >>> Hi, >>> >>> I have a performance issue while trying to insert values in the a matrix. >>> I am using DMMG solver for cell-centered scheme in 3D from the >>> petsc-snapshot to solve a Poisson equations. Inserting coefficients in >>> the matrix for dirichlet or neumann boundary conditions, the insertion is >>> instantaneous. But is I want to insert coefficients for periodic boundary >>> conditions, I can notice a huge slow down in the insertion process (not >>> in the resolution though). The smallest size I can notice the performance >>> drop is 32*32*32. >>> >>> Is there any way to improve this? 
>>> >>> >>> col.i = row.i+(mx-1); >>> col.j = row.j; >>> col.k = row.k; >>> >>> v[0] = 1; >>> >>> MatSetValuesStencil(*A,1,&row,1,&col,v,ADD_VALUES); >>> >>> >>> >>> Thans, >>> >>> Christophe > > From picard2 at llnl.gov Thu Aug 10 20:22:57 2006 From: picard2 at llnl.gov (Christophe Picard) Date: Thu, 10 Aug 2006 18:22:57 -0700 Subject: Performance issue with MatSetValues In-Reply-To: References: <200608101439.24263.picard2@llnl.gov> <200608101727.51362.picard2@llnl.gov> Message-ID: <200608101822.57861.picard2@llnl.gov> Thank you. I knew it was kindly donated...and this is also the reason why I am using the dev version of PETSC. I will look at the periodic implementation see if I can fix my problem. Thank you for your precision. Christophe On Thursday 10 August 2006 06:01 pm, Barry Smith wrote: > This is just do to an incomplete implementation; a user kindly donated > the code to use but did not add support for periodicity because he did not > need it. If you look at src/dm/da/src/dainterp.c you will find the various > routines for setting up the interpolations. If you look at the code > for 3D_Q1 you will see how the periodic case is handled; you may be able > to modify the 3D_Q0 code to also handle the periodic case. This will then > resolve your difficulty. > > Good luck, > > Barry > > On Thu, 10 Aug 2006, Christophe Picard wrote: > > I think the memory is indeed not preallocated. > > Yes my problem is periodic, but if I try to use a periodic DA, the > > multigrid solver complains about it (see the end of the message). I > > believe the source of that problem is DAGetInterpolation_3D_Q0(). > > > > The problem I am trying to solve is a 3D Poisson equation with > > Neuman/Robin/Periodic boundary conditions. The boundary conditions are > > decided at runtime. > > > > If I use if-else-if statements to choose the DA here is the message > > > > Thanks, > > Christophe > > > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Invalid argument! > > [0]PETSC ERROR: Cannot handle periodic grid in x! > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 > > 00:02:04 CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. 
> > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by > > picard1 Thu Aug 10 17:16:31 2006 > > [0]PETSC ERROR: Libraries linked from > > /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug > > [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 > > [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev > > --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 > > --download-superlu=ifneeded > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: DAGetInterpolation_3D_Q0() line 498 in > > src/dm/da/src/dainterp.c > > [0]PETSC ERROR: DAGetInterpolation() line 874 in src/dm/da/src/dainterp.c > > [0]PETSC ERROR: DMGetInterpolation() line 117 in src/dm/da/utils/dm.c > > [0]PETSC ERROR: DMMGSetUp() line 215 in src/snes/utils/damg.c > > [0]PETSC ERROR: DMMGSetDM() line 180 in src/snes/utils/damg.c > > [0]PETSC ERROR: --------------------- Error Message > > ------------------------------------ > > [0]PETSC ERROR: Null argument, when expecting valid pointer! > > [0]PETSC ERROR: Null Object: Parameter # 1! > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Petsc Development Version 2.3.1, Patch 14, Thu Jul 6 > > 00:02:04 CDT 2006 HG revision: 97334a27165ab031dddd67964dd7a97955e75d20 > > [0]PETSC ERROR: See docs/changes/index.html for recent updates. > > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting. > > [0]PETSC ERROR: See docs/index.html for manual pages. > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: ././oceanus on a linux-gnu named tux194.llnl.gov by > > picard1 Thu Aug 10 17:16:31 2006 > > [0]PETSC ERROR: Libraries linked from > > /home/picard1/Tools/Petsc-Dev/lib/linux-gnu-c-real-debug > > [0]PETSC ERROR: Configure run at Thu Jul 6 16:05:13 2006 > > [0]PETSC ERROR: Configure options --prefix=/home/picard1/Tools/Petsc-Dev > > --with-dynamic --with-shared --with-mpi=0 --with-superlu=1 > > --download-superlu=ifneeded > > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: PetscObjectReference() line 106 in > > src/sys/objects/inherit.c [0]PETSC ERROR: MGSetInterpolate() line 136 in > > src/ksp/pc/impls/mg/mgfunc.c [0]PETSC ERROR: DMMGSetUpLevel() line 385 in > > src/snes/utils/damg.c [0]PETSC ERROR: DMMGSetKSP() line 452 in > > src/snes/utils/damg.c > > > > On Thursday 10 August 2006 04:53 pm, Matthew Knepley wrote: > >> It sounds like you are inserting values which were not preallocating. To > >> determine for sure, we would need to know more about the code. However, > >> if you have a periodic problem, why not use a periodic DA? > >> > >> Matt > >> > >> On 8/10/06, Christophe Picard wrote: > >>> Hi, > >>> > >>> I have a performance issue while trying to insert values in the a > >>> matrix. I am using DMMG solver for cell-centered scheme in 3D from the > >>> petsc-snapshot to solve a Poisson equations. Inserting coefficients in > >>> the matrix for dirichlet or neumann boundary conditions, the insertion > >>> is instantaneous. But is I want to insert coefficients for periodic > >>> boundary conditions, I can notice a huge slow down in the insertion > >>> process (not in the resolution though). The smallest size I can notice > >>> the performance drop is 32*32*32. 
> >>> > >>> Is there any way to improve this? > >>> > >>> > >>> col.i = row.i+(mx-1); > >>> col.j = row.j; > >>> col.k = row.k; > >>> > >>> v[0] = 1; > >>> > >>> MatSetValuesStencil(*A,1,&row,1,&col,v,ADD_VALUES); > >>> > >>> > >>> > >>> Thans, > >>> > >>> Christophe From marsum2006 at yahoo.com Fri Aug 11 10:07:22 2006 From: marsum2006 at yahoo.com (Margot Summer) Date: Fri, 11 Aug 2006 08:07:22 -0700 (PDT) Subject: nonzeros of a matrix Message-ID: <20060811150722.49388.qmail@web57107.mail.re3.yahoo.com> Hi, Is there a simple way to get the number of nonzeros of a matrix? Further, how to find out the information about a matrix object in KSP/PC, e.g., the number of nonzeros of the preconditioner, or the number of nonzeros of the ilu(icc) factor? Thanks, Margot __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From balay at mcs.anl.gov Fri Aug 11 10:36:56 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 11 Aug 2006 10:36:56 -0500 (CDT) Subject: nonzeros of a matrix In-Reply-To: <20060811150722.49388.qmail@web57107.mail.re3.yahoo.com> References: <20060811150722.49388.qmail@web57107.mail.re3.yahoo.com> Message-ID: You can run the code with -mat_view_info [to get the matrix info] and -ksp_view - to get the info about the solvers [which include some details about the ilu preconditioner] Satish On Fri, 11 Aug 2006, Margot Summer wrote: > Hi, > > Is there a simple way to get the number of nonzeros of > a matrix? Further, how to find out the information > about a matrix object in KSP/PC, e.g., the number of > nonzeros of the preconditioner, or the number of > nonzeros of the ilu(icc) factor? Thanks, > > Margot > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > > From marsum2006 at yahoo.com Fri Aug 11 10:55:33 2006 From: marsum2006 at yahoo.com (Margot Summer) Date: Fri, 11 Aug 2006 08:55:33 -0700 (PDT) Subject: nonzeros of a matrix In-Reply-To: Message-ID: <20060811155533.49169.qmail@web57112.mail.re3.yahoo.com> Thanks. But can we find out this info inside the code (like the way we get the number of iterations of ksp)? Also, for many subpc's, e.g. using bjacobi, -ksp_view does not print out every block. Margot Satish Balay wrote: You can run the code with -mat_view_info [to get the matrix info] and -ksp_view - to get the info about the solvers [which include some details about the ilu preconditioner] Satish On Fri, 11 Aug 2006, Margot Summer wrote: > Hi, > > Is there a simple way to get the number of nonzeros of > a matrix? Further, how to find out the information > about a matrix object in KSP/PC, e.g., the number of > nonzeros of the preconditioner, or the number of > nonzeros of the ilu(icc) factor? Thanks, > > Margot > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > > --------------------------------- Stay in the know. Pulse on the new Yahoo.com. Check it out. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Fri Aug 11 11:24:38 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 11 Aug 2006 11:24:38 -0500 (CDT) Subject: nonzeros of a matrix In-Reply-To: <20060811155533.49169.qmail@web57112.mail.re3.yahoo.com> References: <20060811155533.49169.qmail@web57112.mail.re3.yahoo.com> Message-ID: There is a MatGetInfo() - which returns MatInfo object. You might be able to do PCGetFactoredMatrix() to get the factor and call MatGetInfo() on it as well.. With bjacobi - you can call PCBJacobiGetSubKSP() to get all the solver objects for each sub-block [and extract subpcs, and corresponding factors etc..] Satish On Fri, 11 Aug 2006, Margot Summer wrote: > Thanks. But can we find out this info inside the code (like the way we get the number of iterations of ksp)? Also, for many subpc's, e.g. using bjacobi, -ksp_view does not print out every block. > > Margot > > Satish Balay wrote: You can run the code with -mat_view_info [to get the matrix info] and > -ksp_view - to get the info about the solvers [which include some > details about the ilu preconditioner] > > Satish > > On Fri, 11 Aug 2006, Margot Summer wrote: > > > Hi, > > > > Is there a simple way to get the number of nonzeros of > > a matrix? Further, how to find out the information > > about a matrix object in KSP/PC, e.g., the number of > > nonzeros of the preconditioner, or the number of > > nonzeros of the ilu(icc) factor? Thanks, > > > > Margot > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > > > > > > > > --------------------------------- > Stay in the know. Pulse on the new Yahoo.com. Check it out. From marsum2006 at yahoo.com Fri Aug 11 11:26:42 2006 From: marsum2006 at yahoo.com (Margot Summer) Date: Fri, 11 Aug 2006 09:26:42 -0700 (PDT) Subject: nonzeros of a matrix In-Reply-To: Message-ID: <20060811162642.34545.qmail@web57114.mail.re3.yahoo.com> thanks! Satish Balay wrote: There is a MatGetInfo() - which returns MatInfo object. You might be able to do PCGetFactoredMatrix() to get the factor and call MatGetInfo() on it as well.. With bjacobi - you can call PCBJacobiGetSubKSP() to get all the solver objects for each sub-block [and extract subpcs, and corresponding factors etc..] Satish On Fri, 11 Aug 2006, Margot Summer wrote: > Thanks. But can we find out this info inside the code (like the way we get the number of iterations of ksp)? Also, for many subpc's, e.g. using bjacobi, -ksp_view does not print out every block. > > Margot > > Satish Balay wrote: You can run the code with -mat_view_info [to get the matrix info] and > -ksp_view - to get the info about the solvers [which include some > details about the ilu preconditioner] > > Satish > > On Fri, 11 Aug 2006, Margot Summer wrote: > > > Hi, > > > > Is there a simple way to get the number of nonzeros of > > a matrix? Further, how to find out the information > > about a matrix object in KSP/PC, e.g., the number of > > nonzeros of the preconditioner, or the number of > > nonzeros of the ilu(icc) factor? Thanks, > > > > Margot > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > > > > > > > > --------------------------------- > Stay in the know. Pulse on the new Yahoo.com. Check it out. 
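For reference, the calls Satish names above can be combined along the lines of the sketch below. The routine name, the variables A and ksp, and the error handling are illustrative assumptions rather than code from anyone's application; the block loop assumes the bjacobi preconditioner (the parallel default) with an ilu/icc sub-PC, and it only makes sense after the solver has been set up, e.g. after the first KSPSolve().

    #include "petscksp.h"

    /* Sketch: report the global nonzero counts of the assembled matrix and,
       for a bjacobi preconditioner, the nonzeros of each block factor. */
    PetscErrorCode ReportNonzeros(Mat A, KSP ksp)
    {
      PC             pc, subpc;
      KSP            *subksp;
      Mat            factor;
      MatInfo        info;
      PetscInt       i, nlocal, first;
      PetscErrorCode ierr;

      /* nonzeros actually used and allocated, summed over all processes */
      ierr = MatGetInfo(A, MAT_GLOBAL_SUM, &info);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "matrix: nonzeros used %g, allocated %g\n",
                         info.nz_used, info.nz_allocated);CHKERRQ(ierr);

      /* walk the local blocks of bjacobi and query each factored matrix */
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
      for (i = 0; i < nlocal; i++) {
        ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
        ierr = PCGetFactoredMatrix(subpc, &factor);CHKERRQ(ierr);
        ierr = MatGetInfo(factor, MAT_LOCAL, &info);CHKERRQ(ierr);
        ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
                 "block %d: factor nonzeros %g\n", (int)(first + i), info.nz_used);CHKERRQ(ierr);
      }
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
      return 0;
    }

With a preconditioner other than bjacobi the loop would be dropped and PCGetFactoredMatrix() called on whichever PC actually holds the factor.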
--------------------------------- Get your email and more, right on the new Yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jiaxun_hou at yahoo.com.cn Sun Aug 13 22:55:33 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Mon, 14 Aug 2006 11:55:33 +0800 (CST) Subject: problem of "caused collective abort of all ranks" Message-ID: <20060814035533.10775.qmail@web15801.mail.cnb.yahoo.com> Hi, I am sorry to bother you. I met this strange trouble yesterday, and I have tried lots of methods to solve it. But fail. My code likes this: static char help[] = "Solves a tridiagonal linear system with KSP.\n\n"; #include "petscksp.h" #include "builderh.h" #undef __FUNCT__ #define __FUNCT__ "main" int main(int argc,char **args){ PetscInitialize(&argc,&args,(char *)0,help); Mat A; Vec x; PetscInt k=3,v=1,n_x=5,s_t=4,row; PetscInt i[2]; PetscReal h_x,h_t; PetscScalar temp; MainMatPar *mmp; PetscErrorCode ierr; h_x = PETSC_PI/(PetscReal)(n_x+1); h_t = PETSC_PI*2/(PetscReal)(s_t); s_t++; ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) x, "Solution");CHKERRQ(ierr); ierr = VecSetSizes(x,PETSC_DECIDE,n_x*s_t);CHKERRQ(ierr); ierr = VecSetFromOptions(x);CHKERRQ(ierr); temp = h_x; for (i[0]=0;i[0] -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: builderh.h URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: builder.c URL: From yaronkretchmer at gmail.com Tue Aug 15 11:08:06 2006 From: yaronkretchmer at gmail.com (Yaron Kretchmer) Date: Tue, 15 Aug 2006 09:08:06 -0700 Subject: programmmatic access to commandline variables Message-ID: Hi All Is there a way of determining programatically which commandline options are used/not used within petsc? Alternatively, is there a file which contains all legal commandline options? If so, what is it and what is the format? Thanks Yaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Aug 15 11:14:08 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 15 Aug 2006 11:14:08 -0500 (CDT) Subject: programmmatic access to commandline variables In-Reply-To: References: Message-ID: On Tue, 15 Aug 2006, Yaron Kretchmer wrote: > Hi All > Is there a way of determining programatically which commandline options are > used/not used within petsc? You can run the code with the additional option '-options_left' or add the following line of code - after PetscInitialize() ierr = PetscOptionsSetValue("-options_left",PETSC_NULL);CHKERRQ(ierr); > Alternatively, is there a file which contains all legal commandline options? > If so, what is it and what is the format? The options are distributed all over the code. The best way to get them is to run the appliation code with '-help' option - and it prints all the relavent options to that code run. Satish From yaronkretchmer at gmail.com Tue Aug 15 11:32:07 2006 From: yaronkretchmer at gmail.com (Yaron Kretchmer) Date: Tue, 15 Aug 2006 09:32:07 -0700 Subject: programmmatic access to commandline variables In-Reply-To: References: Message-ID: Hi Satish This would print out the options that were not used. 
What I'm looking for is a way of accessing them inside the program (as in going through an array of unused options or something similar) Thanks Yaron On 8/15/06, Satish Balay wrote: > > On Tue, 15 Aug 2006, Yaron Kretchmer wrote: > > > Hi All > > Is there a way of determining programatically which commandline options > are > > used/not used within petsc? > > You can run the code with the additional option '-options_left' > > or add the following line of code - after PetscInitialize() > > ierr = PetscOptionsSetValue("-options_left",PETSC_NULL);CHKERRQ(ierr); > > > Alternatively, is there a file which contains all legal commandline > options? > > If so, what is it and what is the format? > > The options are distributed all over the code. The best way to get > them is to run the appliation code with '-help' option - and it prints > all the relavent options to that code run. > > Satish > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mafunk at nmsu.edu Tue Aug 15 11:34:00 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 15 Aug 2006 10:34:00 -0600 Subject: profiling PETSc code In-Reply-To: <200608021621.44171.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608021621.44171.mafunk@nmsu.edu> Message-ID: <200608151034.03619.mafunk@nmsu.edu> Hi Matt, sorry for the delay since the last email, but there were some other things i needed to do. Anyway, I hope that maybe I can get some more help from you guys with respect to the loadimbalance problem i have. Here is the situtation: I run my code on 2 procs. I profile my KSPSolve call and here is what i get: ... --- Event Stage 2: Stage 2 of ChomboPetscInterface VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 0.0e+00 5 32100100 0 25 32100100 0 361 MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 3.6e+04 21100100100100 100100100100100 278 PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 ------------------------------------------------------------------------------------------------------------------------ ... 
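For reference, Max/Min ratios like the ones in the table above (1.4 on MatSolve, 3.2 on VecNorm) can be cross-checked against the actual data distribution with a few calls made after assembly. The sketch below is illustrative only; the routine name is made up and the matrix argument stands in for the application's assembled parallel matrix, it is not code from this thread.

    #include "petscmat.h"

    /* Sketch: print, per process, the owned row range and the number of
       stored nonzeros, so an imbalance in the matrix entries (as opposed
       to the row counts) shows up directly. Call after MatAssemblyEnd(). */
    PetscErrorCode CheckMatrixBalance(Mat A)
    {
      MatInfo        info;
      PetscInt       rstart, rend;
      int            rank;
      PetscErrorCode ierr;

      ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
      ierr = MatGetInfo(A, MAT_LOCAL, &info);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
               "[%d] rows %d..%d  nonzeros %g  mallocs during insertion %g\n",
               rank, (int)rstart, (int)rend, info.nz_used, info.mallocs);CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
      return 0;
    }

If the owned row counts match but the per-process nonzero counts or insertion-time mallocs differ noticeably, the imbalance is in the matrix entries rather than in the row partitioning.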
Some things to note are the following: I allocate my vector as: VecCreateMPI(PETSC_COMM_WORLD, //communicator a_totallocal_numPoints[a_thisproc], //local points on this proc a_totalglobal_numPoints, //total number of global points &m_globalRHSVector); //the vector to be created where the vector a_totallocal_numPoints is : a_totallocal_numPoints: 59904 59904 The matrix is allocated as: m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator a_totallocal_numPoints[a_thisproc], //total number of local rows (that is rows residing on this proc a_totallocal_numPoints[a_thisproc], //total number of columns corresponding to local part of parallel vector a_totalglobal_numPoints, //number of global rows a_totalglobal_numPoints, //number of global columns PETSC_NULL, a_NumberOfNZPointsInDiagonalMatrix, PETSC_NULL, a_NumberOfNZPointsInOffDiagonalMatrix, &m_globalMatrix); With the info option i checked and there is no extra mallocs at all. My problems setup is symmetric so it seems that everything is set up so that it should be essentially perfectly balanced. However, the numbers given above certainly do not reflect that. However, the in all other parts of my code (except the PETSc call), i get the expected, almost perfect loadbalance. Is there anything that i am overlooking? Any help is greatly appreciated. thanks mat On Wednesday 02 August 2006 16:21, Matt Funk wrote: > Hi Matt, > > It could be a bad load imbalance because i don't let PETSc decide. I need > to fix that anyway, so i think i'll try that first and then let you know. > Thanks though for the quick response and helping me to interpret those > numbers ... > > > mat > > On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > > On 8/2/06, Matt Funk wrote: > > > Hi Matt, > > > > > > thanks for all the help so far. The -info option is really very > > > helpful. So i think i straightened the actual errors out. However, now > > > i am back to the original question i had. That is why it takes so much > > > longer on 4 procs than on 1 proc. > > > > So you have a 1.5 load imbalance for MatMult(), which probably cascades > > to give the 133! load imbalance for VecDot(). You probably have either: > > > > 1) VERY bad laod imbalance > > > > 2) a screwed up network > > > > 3) bad contention on the network (loaded cluster) > > > > Can you help us narrow this down? > > > > > > Matt > > > > > I profiled the KSPSolve(...) 
as stage 2: > > > > > > For 1 proc i have: > > > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > > > VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > > > VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > > > 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > > > VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > > > MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > > > MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > > > KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > > > 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > > > PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > > > > > > > > for 4 procs i have : > > > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > > > VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 > > > 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > > > VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > > > 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > > > VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > > > 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > > > VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > > > VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > > > MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > > > 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > > > MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > > > MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > > > MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > > > 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 > > > PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > > > 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > > PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > > > 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > > > 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > > > ----------------------------------------------------------------------- > > >-- ----------------------------------------------- > > > > > > Now if i understand it right, all these calls summarize all calls > > > between the pop and push commands. That would mean that the majority of > > > the time is spend in the MatMult and in within that the VecScatterBegin > > > and VecScatterEnd commands (if i understand it right). > > > > > > My problem size is really small. So i was wondering if the problem lies > > > in that (namely that the major time is simply spend communicating > > > between processors, or whether there is still something wrong with how > > > i wrote the code?) 
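For reference, the "pop and push commands" mentioned above bracket a stage roughly as in the sketch below; everything executed between them is charged to that stage in the profiling summary. The stage handle, the solver objects and the barriers are illustrative assumptions, and the PetscLogStageRegister() call that creates the handle is omitted here, since its exact calling sequence should be taken from the man page of the installed PETSc version.

    #include "petscksp.h"

    /* Sketch: time one solve inside an already-registered log stage so the
       -log_summary output reports it separately. The barriers are optional;
       they keep waiting time caused elsewhere from being charged to the stage. */
    PetscErrorCode SolveInStage(PetscInt stage, KSP ksp, Vec b, Vec x)
    {
      PetscErrorCode ierr;

      ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);
      return 0;
    }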
> > > > > > > > > thanks > > > mat > > > > > > On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > > > On 8/1/06, Matt Funk wrote: > > > > > Actually the errors occur on my calls to a PETSc functions after > > > > > calling PETSCInitialize. > > > > > > > > Yes, it is the error I pointed out in the last message. > > > > > > > > Matt > > > > > > > > > mat From balay at mcs.anl.gov Tue Aug 15 11:40:06 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 15 Aug 2006 11:40:06 -0500 (CDT) Subject: programmmatic access to commandline variables In-Reply-To: References: Message-ID: You might want to check the code of PetscOptionsLeft(), PetscOptionsAllUsed() and reimplement the desired functionality. The above routines are in src/sys/objects/options.c Note - you'll have to do the check only after the options get used - this means ksp options can be checked only after kspsolve etc.. [so doing this just before PetscFinalize() would capture all options usage] Satish On Tue, 15 Aug 2006, Yaron Kretchmer wrote: > Hi Satish > This would print out the options that were not used. > What I'm looking for is a way of accessing them inside the program (as in > going through an array of unused options or something similar) > > Thanks > Yaron > > > > On 8/15/06, Satish Balay wrote: > > > > On Tue, 15 Aug 2006, Yaron Kretchmer wrote: > > > > > Hi All > > > Is there a way of determining programatically which commandline options > > are > > > used/not used within petsc? > > > > You can run the code with the additional option '-options_left' > > > > or add the following line of code - after PetscInitialize() > > > > ierr = PetscOptionsSetValue("-options_left",PETSC_NULL);CHKERRQ(ierr); > > > > > Alternatively, is there a file which contains all legal commandline > > options? > > > If so, what is it and what is the format? > > > > The options are distributed all over the code. The best way to get > > them is to run the appliation code with '-help' option - and it prints > > all the relavent options to that code run. > > > > Satish > > > > > From bsmith at mcs.anl.gov Tue Aug 15 12:52:02 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Aug 2006 12:52:02 -0500 (CDT) Subject: profiling PETSc code In-Reply-To: <200608151034.03619.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608021621.44171.mafunk@nmsu.edu> <200608151034.03619.mafunk@nmsu.edu> Message-ID: > MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 ^^^^^ balance Hmmm, I would guess that the matrix entries are not so well balanced? One process takes 1.4 times as long for the triangular solves as the other so either one matrix has many more entries or one processor is slower then the other. Barry On Tue, 15 Aug 2006, Matt Funk wrote: > Hi Matt, > > sorry for the delay since the last email, but there were some other things i > needed to do. > > Anyway, I hope that maybe I can get some more help from you guys with respect > to the loadimbalance problem i have. Here is the situtation: > I run my code on 2 procs. I profile my KSPSolve call and here is what i get: > > ... 
> > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 > 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 > 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 > 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 > 0.0e+00 5 32100100 0 25 32100100 0 361 > MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 > 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 > 3.6e+04 21100100100100 100100100100100 278 > PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 > 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > ------------------------------------------------------------------------------------------------------------------------ > > ... > > > Some things to note are the following: > I allocate my vector as: > VecCreateMPI(PETSC_COMM_WORLD, //communicator > a_totallocal_numPoints[a_thisproc], //local points on this proc > a_totalglobal_numPoints, //total number of global points > &m_globalRHSVector); //the vector to be created > > where the vector a_totallocal_numPoints is : > a_totallocal_numPoints: 59904 59904 > > The matrix is allocated as: > m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > a_totallocal_numPoints[a_thisproc], //total number of local rows (that > is rows residing on this proc > a_totallocal_numPoints[a_thisproc], //total number of columns > corresponding to local part of parallel vector > a_totalglobal_numPoints, //number of global rows > a_totalglobal_numPoints, //number of global columns > PETSC_NULL, > a_NumberOfNZPointsInDiagonalMatrix, > PETSC_NULL, > a_NumberOfNZPointsInOffDiagonalMatrix, > &m_globalMatrix); > > With the info option i checked and there is no extra mallocs at all. > My problems setup is symmetric so it seems that everything is set up so that > it should be essentially perfectly balanced. However, the numbers given above > certainly do not reflect that. > > However, the in all other parts of my code (except the PETSc call), i get the > expected, almost perfect loadbalance. > > Is there anything that i am overlooking? Any help is greatly appreciated. > > thanks > mat > > > > On Wednesday 02 August 2006 16:21, Matt Funk wrote: >> Hi Matt, >> >> It could be a bad load imbalance because i don't let PETSc decide. 
I need >> to fix that anyway, so i think i'll try that first and then let you know. >> Thanks though for the quick response and helping me to interpret those >> numbers ... >> >> >> mat >> >> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: >>> On 8/2/06, Matt Funk wrote: >>>> Hi Matt, >>>> >>>> thanks for all the help so far. The -info option is really very >>>> helpful. So i think i straightened the actual errors out. However, now >>>> i am back to the original question i had. That is why it takes so much >>>> longer on 4 procs than on 1 proc. >>> >>> So you have a 1.5 load imbalance for MatMult(), which probably cascades >>> to give the 133! load imbalance for VecDot(). You probably have either: >>> >>> 1) VERY bad laod imbalance >>> >>> 2) a screwed up network >>> >>> 3) bad contention on the network (loaded cluster) >>> >>> Can you help us narrow this down? >>> >>> >>> Matt >>> >>>> I profiled the KSPSolve(...) as stage 2: >>>> >>>> For 1 proc i have: >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface >>>> >>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 >>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 >>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 >>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 >>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 >>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 >>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 >>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 >>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 >>>> >>>> >>>> for 4 procs i have : >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface >>>> >>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 >>>> 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 >>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 >>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 >>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 >>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 >>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 >>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 >>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 >>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 >>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 >>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 >>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 >>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 >>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 >>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 >>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 >>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 >>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 
29 28 >>>> ----------------------------------------------------------------------- >>>> -- ----------------------------------------------- >>>> >>>> Now if i understand it right, all these calls summarize all calls >>>> between the pop and push commands. That would mean that the majority of >>>> the time is spend in the MatMult and in within that the VecScatterBegin >>>> and VecScatterEnd commands (if i understand it right). >>>> >>>> My problem size is really small. So i was wondering if the problem lies >>>> in that (namely that the major time is simply spend communicating >>>> between processors, or whether there is still something wrong with how >>>> i wrote the code?) >>>> >>>> >>>> thanks >>>> mat >>>> >>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: >>>>> On 8/1/06, Matt Funk wrote: >>>>>> Actually the errors occur on my calls to a PETSc functions after >>>>>> calling PETSCInitialize. >>>>> >>>>> Yes, it is the error I pointed out in the last message. >>>>> >>>>> Matt >>>>> >>>>>> mat > > From mafunk at nmsu.edu Tue Aug 15 13:39:31 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 15 Aug 2006 12:39:31 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608151034.03619.mafunk@nmsu.edu> Message-ID: <200608151239.32375.mafunk@nmsu.edu> On Tuesday 15 August 2006 11:52, Barry Smith wrote: > > MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 > > ^^^^^ > balance > > Hmmm, I would guess that the matrix entries are not so well balanced? > One process takes 1.4 times as long for the triangular solves as the other > so either one matrix has many more entries or one processor is slower then > the other. > > Barry Well it would seem that way at first, but i don't know how that could be since i allocate an exactly equal amount of points on both processor (see previous email). Further i used the -mat_view_info option. Here is what it gives me: ... Matrix Object: type=mpiaij, rows=119808, cols=119808 [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 ... ... Matrix Object: type=seqaij, rows=59904, cols=59904 total: nonzeros=407400, allocated nonzeros=407400 not using I-node routines Matrix Object: type=seqaij, rows=59904, cols=59904 total: nonzeros=407400, allocated nonzeros=407400 not using I-node routines ... So to me it look s well split up. Is there anything else that somebody can think of. The machine i am running on is all same processors. By the way, i am not sure if the '[1] PetscCommDuplicateUsing internal PETSc communicator 91 168' is something i need to worry about?? mat > > On Tue, 15 Aug 2006, Matt Funk wrote: > > Hi Matt, > > > > sorry for the delay since the last email, but there were some other > > things i needed to do. > > > > Anyway, I hope that maybe I can get some more help from you guys with > > respect to the loadimbalance problem i have. Here is the situtation: > > I run my code on 2 procs. I profile my KSPSolve call and here is what i > > get: > > > > ... 
> > > > --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 > > 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > > VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 > > 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > > VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 > > 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > > VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 > > 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > > VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > > MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 > > 0.0e+00 5 32100100 0 25 32100100 0 361 > > MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 > > 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > > MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > > MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 > > 3.6e+04 21100100100100 100100100100100 278 > > PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 > > 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > > PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 > > 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > > PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 > > 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > > ------------------------------------------------------------------------- > >----------------------------------------------- > > > > ... > > > > > > Some things to note are the following: > > I allocate my vector as: > > VecCreateMPI(PETSC_COMM_WORLD, //communicator > > a_totallocal_numPoints[a_thisproc], //local points on this proc > > a_totalglobal_numPoints, //total number of global points > > &m_globalRHSVector); //the vector to be created > > > > where the vector a_totallocal_numPoints is : > > a_totallocal_numPoints: 59904 59904 > > > > The matrix is allocated as: > > m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > > a_totallocal_numPoints[a_thisproc], //total number of local rows > > (that is rows residing on this proc > > a_totallocal_numPoints[a_thisproc], //total number of columns > > corresponding to local part of parallel vector > > a_totalglobal_numPoints, //number of global rows > > a_totalglobal_numPoints, //number of global columns > > PETSC_NULL, > > a_NumberOfNZPointsInDiagonalMatrix, > > PETSC_NULL, > > a_NumberOfNZPointsInOffDiagonalMatrix, > > &m_globalMatrix); > > > > With the info option i checked and there is no extra mallocs at all. > > My problems setup is symmetric so it seems that everything is set up so > > that it should be essentially perfectly balanced. However, the numbers > > given above certainly do not reflect that. > > > > However, the in all other parts of my code (except the PETSc call), i get > > the expected, almost perfect loadbalance. > > > > Is there anything that i am overlooking? Any help is greatly appreciated. 
> > > > thanks > > mat > > > > On Wednesday 02 August 2006 16:21, Matt Funk wrote: > >> Hi Matt, > >> > >> It could be a bad load imbalance because i don't let PETSc decide. I > >> need to fix that anyway, so i think i'll try that first and then let you > >> know. Thanks though for the quick response and helping me to interpret > >> those numbers ... > >> > >> > >> mat > >> > >> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > >>> On 8/2/06, Matt Funk wrote: > >>>> Hi Matt, > >>>> > >>>> thanks for all the help so far. The -info option is really very > >>>> helpful. So i think i straightened the actual errors out. However, now > >>>> i am back to the original question i had. That is why it takes so much > >>>> longer on 4 procs than on 1 proc. > >>> > >>> So you have a 1.5 load imbalance for MatMult(), which probably cascades > >>> to give the 133! load imbalance for VecDot(). You probably have either: > >>> > >>> 1) VERY bad laod imbalance > >>> > >>> 2) a screwed up network > >>> > >>> 3) bad contention on the network (loaded cluster) > >>> > >>> Can you help us narrow this down? > >>> > >>> > >>> Matt > >>> > >>>> I profiled the KSPSolve(...) as stage 2: > >>>> > >>>> For 1 proc i have: > >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > >>>> > >>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > >>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > >>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > >>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > >>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > >>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > >>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > >>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > >>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > >>>> > >>>> > >>>> for 4 procs i have : > >>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > >>>> > >>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 > >>>> 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > >>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > >>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > >>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > >>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > >>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > >>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > >>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > >>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > >>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > >>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > >>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > >>>> 0.0e+00 2.8e+04 
84100 0 0 34 100100 0 0100 1 > >>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > >>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > >>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > >>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > >>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > >>>> ---------------------------------------------------------------------- > >>>>- -- ----------------------------------------------- > >>>> > >>>> Now if i understand it right, all these calls summarize all calls > >>>> between the pop and push commands. That would mean that the majority > >>>> of the time is spend in the MatMult and in within that the > >>>> VecScatterBegin and VecScatterEnd commands (if i understand it right). > >>>> > >>>> My problem size is really small. So i was wondering if the problem > >>>> lies in that (namely that the major time is simply spend communicating > >>>> between processors, or whether there is still something wrong with how > >>>> i wrote the code?) > >>>> > >>>> > >>>> thanks > >>>> mat > >>>> > >>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > >>>>> On 8/1/06, Matt Funk wrote: > >>>>>> Actually the errors occur on my calls to a PETSc functions after > >>>>>> calling PETSCInitialize. > >>>>> > >>>>> Yes, it is the error I pointed out in the last message. > >>>>> > >>>>> Matt > >>>>> > >>>>>> mat From bsmith at mcs.anl.gov Tue Aug 15 14:56:34 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 15 Aug 2006 14:56:34 -0500 (CDT) Subject: profiling PETSc code In-Reply-To: <200608151239.32375.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608151034.03619.mafunk@nmsu.edu> <200608151239.32375.mafunk@nmsu.edu> Message-ID: Please send the entire -info output as an attachment to me. (Not in the email) I'll study it in more detail. Barry On Tue, 15 Aug 2006, Matt Funk wrote: > On Tuesday 15 August 2006 11:52, Barry Smith wrote: >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 >> >> ^^^^^ >> balance >> >> Hmmm, I would guess that the matrix entries are not so well balanced? >> One process takes 1.4 times as long for the triangular solves as the other >> so either one matrix has many more entries or one processor is slower then >> the other. >> >> Barry > > Well it would seem that way at first, but i don't know how that could be since > i allocate an exactly equal amount of points on both processor (see previous > email). > Further i used the -mat_view_info option. Here is what it gives me: > > ... > Matrix Object: > type=mpiaij, rows=119808, cols=119808 > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > ... > > ... > Matrix Object: > type=seqaij, rows=59904, cols=59904 > total: nonzeros=407400, allocated nonzeros=407400 > not using I-node routines > Matrix Object: > type=seqaij, rows=59904, cols=59904 > total: nonzeros=407400, allocated nonzeros=407400 > not using I-node routines > ... > > So to me it look s well split up. Is there anything else that somebody can > think of. The machine i am running on is all same processors. > > By the way, i am not sure if the '[1] PetscCommDuplicateUsing internal PETSc > communicator 91 168' is something i need to worry about?? > > mat > > > > > > > > >> >> On Tue, 15 Aug 2006, Matt Funk wrote: >>> Hi Matt, >>> >>> sorry for the delay since the last email, but there were some other >>> things i needed to do. 
>>> >>> Anyway, I hope that maybe I can get some more help from you guys with >>> respect to the loadimbalance problem i have. Here is the situtation: >>> I run my code on 2 procs. I profile my KSPSolve call and here is what i >>> get: >>> >>> ... >>> >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface >>> >>> VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 >>> 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 >>> VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 >>> 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 >>> VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 >>> VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 >>> 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 >>> VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 >>> 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 >>> VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 >>> 0.0e+00 0 0100100 0 0 0100100 0 0 >>> VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 >>> MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 >>> 0.0e+00 5 32100100 0 25 32100100 0 361 >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 >>> 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 >>> MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 >>> MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 >>> 3.6e+04 21100100100100 100100100100100 278 >>> PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 >>> 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 >>> PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 >>> 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 >>> PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 >>> 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 >>> ------------------------------------------------------------------------- >>> ----------------------------------------------- >>> >>> ... >>> >>> >>> Some things to note are the following: >>> I allocate my vector as: >>> VecCreateMPI(PETSC_COMM_WORLD, //communicator >>> a_totallocal_numPoints[a_thisproc], //local points on this proc >>> a_totalglobal_numPoints, //total number of global points >>> &m_globalRHSVector); //the vector to be created >>> >>> where the vector a_totallocal_numPoints is : >>> a_totallocal_numPoints: 59904 59904 >>> >>> The matrix is allocated as: >>> m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator >>> a_totallocal_numPoints[a_thisproc], //total number of local rows >>> (that is rows residing on this proc >>> a_totallocal_numPoints[a_thisproc], //total number of columns >>> corresponding to local part of parallel vector >>> a_totalglobal_numPoints, //number of global rows >>> a_totalglobal_numPoints, //number of global columns >>> PETSC_NULL, >>> a_NumberOfNZPointsInDiagonalMatrix, >>> PETSC_NULL, >>> a_NumberOfNZPointsInOffDiagonalMatrix, >>> &m_globalMatrix); >>> >>> With the info option i checked and there is no extra mallocs at all. >>> My problems setup is symmetric so it seems that everything is set up so >>> that it should be essentially perfectly balanced. 
However, the numbers >>> given above certainly do not reflect that. >>> >>> However, the in all other parts of my code (except the PETSc call), i get >>> the expected, almost perfect loadbalance. >>> >>> Is there anything that i am overlooking? Any help is greatly appreciated. >>> >>> thanks >>> mat >>> >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote: >>>> Hi Matt, >>>> >>>> It could be a bad load imbalance because i don't let PETSc decide. I >>>> need to fix that anyway, so i think i'll try that first and then let you >>>> know. Thanks though for the quick response and helping me to interpret >>>> those numbers ... >>>> >>>> >>>> mat >>>> >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: >>>>> On 8/2/06, Matt Funk wrote: >>>>>> Hi Matt, >>>>>> >>>>>> thanks for all the help so far. The -info option is really very >>>>>> helpful. So i think i straightened the actual errors out. However, now >>>>>> i am back to the original question i had. That is why it takes so much >>>>>> longer on 4 procs than on 1 proc. >>>>> >>>>> So you have a 1.5 load imbalance for MatMult(), which probably cascades >>>>> to give the 133! load imbalance for VecDot(). You probably have either: >>>>> >>>>> 1) VERY bad laod imbalance >>>>> >>>>> 2) a screwed up network >>>>> >>>>> 3) bad contention on the network (loaded cluster) >>>>> >>>>> Can you help us narrow this down? >>>>> >>>>> >>>>> Matt >>>>> >>>>>> I profiled the KSPSolve(...) as stage 2: >>>>>> >>>>>> For 1 proc i have: >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface >>>>>> >>>>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 >>>>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 >>>>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 >>>>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 >>>>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 >>>>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 >>>>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 >>>>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 >>>>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 >>>>>> >>>>>> >>>>>> for 4 procs i have : >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface >>>>>> >>>>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 >>>>>> 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 >>>>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 >>>>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 >>>>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 >>>>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 >>>>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 >>>>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 >>>>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 >>>>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 >>>>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 >>>>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> MatGetOrdering 
1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 >>>>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 >>>>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 >>>>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 >>>>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 >>>>>> ---------------------------------------------------------------------- >>>>>> - -- ----------------------------------------------- >>>>>> >>>>>> Now if i understand it right, all these calls summarize all calls >>>>>> between the pop and push commands. That would mean that the majority >>>>>> of the time is spend in the MatMult and in within that the >>>>>> VecScatterBegin and VecScatterEnd commands (if i understand it right). >>>>>> >>>>>> My problem size is really small. So i was wondering if the problem >>>>>> lies in that (namely that the major time is simply spend communicating >>>>>> between processors, or whether there is still something wrong with how >>>>>> i wrote the code?) >>>>>> >>>>>> >>>>>> thanks >>>>>> mat >>>>>> >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: >>>>>>> On 8/1/06, Matt Funk wrote: >>>>>>>> Actually the errors occur on my calls to a PETSc functions after >>>>>>>> calling PETSCInitialize. >>>>>>> >>>>>>> Yes, it is the error I pointed out in the last message. >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>>> mat > > From mafunk at nmsu.edu Tue Aug 15 15:35:52 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 15 Aug 2006 14:35:52 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608151239.32375.mafunk@nmsu.edu> Message-ID: <200608151435.55054.mafunk@nmsu.edu> Do you want me to use the debug version or the optimized version of PETSc? mat On Tuesday 15 August 2006 13:56, Barry Smith wrote: > Please send the entire -info output as an attachment to me. (Not > in the email) I'll study it in more detail. > > Barry > > On Tue, 15 Aug 2006, Matt Funk wrote: > > On Tuesday 15 August 2006 11:52, Barry Smith wrote: > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > >>> 0.0e+00 > >> > >> ^^^^^ > >> balance > >> > >> Hmmm, I would guess that the matrix entries are not so well balanced? > >> One process takes 1.4 times as long for the triangular solves as the > >> other so either one matrix has many more entries or one processor is > >> slower then the other. > >> > >> Barry > > > > Well it would seem that way at first, but i don't know how that could be > > since i allocate an exactly equal amount of points on both processor (see > > previous email). > > Further i used the -mat_view_info option. Here is what it gives me: > > > > ... > > Matrix Object: > > type=mpiaij, rows=119808, cols=119808 > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > ... > > > > ... > > Matrix Object: > > type=seqaij, rows=59904, cols=59904 > > total: nonzeros=407400, allocated nonzeros=407400 > > not using I-node routines > > Matrix Object: > > type=seqaij, rows=59904, cols=59904 > > total: nonzeros=407400, allocated nonzeros=407400 > > not using I-node routines > > ... > > > > So to me it look s well split up. 
Is there anything else that somebody > > can think of. The machine i am running on is all same processors. > > > > By the way, i am not sure if the '[1] PetscCommDuplicateUsing internal > > PETSc communicator 91 168' is something i need to worry about?? > > > > mat > > > >> On Tue, 15 Aug 2006, Matt Funk wrote: > >>> Hi Matt, > >>> > >>> sorry for the delay since the last email, but there were some other > >>> things i needed to do. > >>> > >>> Anyway, I hope that maybe I can get some more help from you guys with > >>> respect to the loadimbalance problem i have. Here is the situtation: > >>> I run my code on 2 procs. I profile my KSPSolve call and here is what i > >>> get: > >>> > >>> ... > >>> > >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > >>> > >>> VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 > >>> 0.0e+00 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > >>> VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 > >>> 0.0e+00 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > >>> VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > >>> VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > >>> VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 > >>> 0.0e+00 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > >>> VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 > >>> 0.0e+00 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > >>> VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 > >>> 2.1e+04 0.0e+00 0 0100100 0 0 0100100 0 0 > >>> VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > >>> MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 > >>> 2.1e+04 0.0e+00 5 32100100 0 25 32100100 0 361 > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > >>> 0.0e+00 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > >>> MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > >>> MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>> KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 > >>> 2.1e+04 3.6e+04 21100100100100 100100100100100 278 > >>> PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > >>> PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > >>> PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 > >>> 0.0e+00 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > >>> ----------------------------------------------------------------------- > >>>-- ----------------------------------------------- > >>> > >>> ... 
> >>> > >>> > >>> Some things to note are the following: > >>> I allocate my vector as: > >>> VecCreateMPI(PETSC_COMM_WORLD, //communicator > >>> a_totallocal_numPoints[a_thisproc], //local points on this proc > >>> a_totalglobal_numPoints, //total number of global points > >>> &m_globalRHSVector); //the vector to be created > >>> > >>> where the vector a_totallocal_numPoints is : > >>> a_totallocal_numPoints: 59904 59904 > >>> > >>> The matrix is allocated as: > >>> m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > >>> a_totallocal_numPoints[a_thisproc], //total number of local rows > >>> (that is rows residing on this proc > >>> a_totallocal_numPoints[a_thisproc], //total number of columns > >>> corresponding to local part of parallel vector > >>> a_totalglobal_numPoints, //number of global rows > >>> a_totalglobal_numPoints, //number of global columns > >>> PETSC_NULL, > >>> a_NumberOfNZPointsInDiagonalMatrix, > >>> PETSC_NULL, > >>> a_NumberOfNZPointsInOffDiagonalMatrix, > >>> &m_globalMatrix); > >>> > >>> With the info option i checked and there is no extra mallocs at all. > >>> My problems setup is symmetric so it seems that everything is set up so > >>> that it should be essentially perfectly balanced. However, the numbers > >>> given above certainly do not reflect that. > >>> > >>> However, the in all other parts of my code (except the PETSc call), i > >>> get the expected, almost perfect loadbalance. > >>> > >>> Is there anything that i am overlooking? Any help is greatly > >>> appreciated. > >>> > >>> thanks > >>> mat > >>> > >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote: > >>>> Hi Matt, > >>>> > >>>> It could be a bad load imbalance because i don't let PETSc decide. I > >>>> need to fix that anyway, so i think i'll try that first and then let > >>>> you know. Thanks though for the quick response and helping me to > >>>> interpret those numbers ... > >>>> > >>>> > >>>> mat > >>>> > >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > >>>>> On 8/2/06, Matt Funk wrote: > >>>>>> Hi Matt, > >>>>>> > >>>>>> thanks for all the help so far. The -info option is really very > >>>>>> helpful. So i think i straightened the actual errors out. However, > >>>>>> now i am back to the original question i had. That is why it takes > >>>>>> so much longer on 4 procs than on 1 proc. > >>>>> > >>>>> So you have a 1.5 load imbalance for MatMult(), which probably > >>>>> cascades to give the 133! load imbalance for VecDot(). You probably > >>>>> have either: > >>>>> > >>>>> 1) VERY bad laod imbalance > >>>>> > >>>>> 2) a screwed up network > >>>>> > >>>>> 3) bad contention on the network (loaded cluster) > >>>>> > >>>>> Can you help us narrow this down? > >>>>> > >>>>> > >>>>> Matt > >>>>> > >>>>>> I profiled the KSPSolve(...) 
as stage 2: > >>>>>> > >>>>>> For 1 proc i have: > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > >>>>>> > >>>>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > >>>>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > >>>>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > >>>>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > >>>>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > >>>>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > >>>>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > >>>>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > >>>>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > >>>>>> > >>>>>> > >>>>>> for 4 procs i have : > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > >>>>>> > >>>>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 > >>>>>> 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > >>>>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > >>>>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > >>>>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > >>>>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > >>>>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > >>>>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > >>>>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > >>>>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > >>>>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > >>>>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 > >>>>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > >>>>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > >>>>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > >>>>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > >>>>>> -------------------------------------------------------------------- > >>>>>>-- - -- ----------------------------------------------- > >>>>>> > >>>>>> Now if i understand it right, all these calls summarize all calls > >>>>>> between the pop and push commands. That would mean that the majority > >>>>>> of the time is spend in the MatMult and in within that the > >>>>>> VecScatterBegin and VecScatterEnd commands (if i understand it > >>>>>> right). > >>>>>> > >>>>>> My problem size is really small. 
So i was wondering if the problem > >>>>>> lies in that (namely that the major time is simply spend > >>>>>> communicating between processors, or whether there is still > >>>>>> something wrong with how i wrote the code?) > >>>>>> > >>>>>> > >>>>>> thanks > >>>>>> mat > >>>>>> > >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > >>>>>>> On 8/1/06, Matt Funk wrote: > >>>>>>>> Actually the errors occur on my calls to a PETSc functions after > >>>>>>>> calling PETSCInitialize. > >>>>>>> > >>>>>>> Yes, it is the error I pointed out in the last message. > >>>>>>> > >>>>>>> Matt > >>>>>>> > >>>>>>>> mat From knepley at gmail.com Tue Aug 15 15:44:04 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Aug 2006 15:44:04 -0500 Subject: profiling PETSc code In-Reply-To: <200608151435.55054.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608151239.32375.mafunk@nmsu.edu> <200608151435.55054.mafunk@nmsu.edu> Message-ID: I don't think it matters initially since the problem is BIG imbalances. Matt On 8/15/06, Matt Funk wrote: > Do you want me to use the debug version or the optimized version of PETSc? > > mat > > On Tuesday 15 August 2006 13:56, Barry Smith wrote: > > Please send the entire -info output as an attachment to me. (Not > > in the email) I'll study it in more detail. > > > > Barry > > > > On Tue, 15 Aug 2006, Matt Funk wrote: > > > On Tuesday 15 August 2006 11:52, Barry Smith wrote: > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > >>> 0.0e+00 > > >> > > >> ^^^^^ > > >> balance > > >> > > >> Hmmm, I would guess that the matrix entries are not so well balanced? > > >> One process takes 1.4 times as long for the triangular solves as the > > >> other so either one matrix has many more entries or one processor is > > >> slower then the other. > > >> > > >> Barry > > > > > > Well it would seem that way at first, but i don't know how that could be > > > since i allocate an exactly equal amount of points on both processor (see > > > previous email). > > > Further i used the -mat_view_info option. Here is what it gives me: > > > > > > ... > > > Matrix Object: > > > type=mpiaij, rows=119808, cols=119808 > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > ... > > > > > > ... > > > Matrix Object: > > > type=seqaij, rows=59904, cols=59904 > > > total: nonzeros=407400, allocated nonzeros=407400 > > > not using I-node routines > > > Matrix Object: > > > type=seqaij, rows=59904, cols=59904 > > > total: nonzeros=407400, allocated nonzeros=407400 > > > not using I-node routines > > > ... > > > > > > So to me it look s well split up. Is there anything else that somebody > > > can think of. The machine i am running on is all same processors. > > > > > > By the way, i am not sure if the '[1] PetscCommDuplicateUsing internal > > > PETSc communicator 91 168' is something i need to worry about?? > > > > > > mat > > > > > >> On Tue, 15 Aug 2006, Matt Funk wrote: > > >>> Hi Matt, > > >>> > > >>> sorry for the delay since the last email, but there were some other > > >>> things i needed to do. > > >>> > > >>> Anyway, I hope that maybe I can get some more help from you guys with > > >>> respect to the loadimbalance problem i have. Here is the situtation: > > >>> I run my code on 2 procs. I profile my KSPSolve call and here is what i > > >>> get: > > >>> > > >>> ... 
> > >>> > > >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > >>> > > >>> VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 > > >>> 0.0e+00 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > > >>> VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 > > >>> 0.0e+00 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > > >>> VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > >>> VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > >>> VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 > > >>> 0.0e+00 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > > >>> VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 > > >>> 0.0e+00 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > > >>> VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 > > >>> 2.1e+04 0.0e+00 0 0100100 0 0 0100100 0 0 > > >>> VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > > >>> MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 > > >>> 2.1e+04 0.0e+00 5 32100100 0 25 32100100 0 361 > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > > >>> MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > > >>> MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>> MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>> KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>> KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 > > >>> 2.1e+04 3.6e+04 21100100100100 100100100100100 278 > > >>> PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > > >>> PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > > >>> PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > > >>> ----------------------------------------------------------------------- > > >>>-- ----------------------------------------------- > > >>> > > >>> ... > > >>> > > >>> > > >>> Some things to note are the following: > > >>> I allocate my vector as: > > >>> VecCreateMPI(PETSC_COMM_WORLD, //communicator > > >>> a_totallocal_numPoints[a_thisproc], //local points on this proc > > >>> a_totalglobal_numPoints, //total number of global points > > >>> &m_globalRHSVector); //the vector to be created > > >>> > > >>> where the vector a_totallocal_numPoints is : > > >>> a_totallocal_numPoints: 59904 59904 > > >>> > > >>> The matrix is allocated as: > > >>> m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > > >>> a_totallocal_numPoints[a_thisproc], //total number of local rows > > >>> (that is rows residing on this proc > > >>> a_totallocal_numPoints[a_thisproc], //total number of columns > > >>> corresponding to local part of parallel vector > > >>> a_totalglobal_numPoints, //number of global rows > > >>> a_totalglobal_numPoints, //number of global columns > > >>> PETSC_NULL, > > >>> a_NumberOfNZPointsInDiagonalMatrix, > > >>> PETSC_NULL, > > >>> a_NumberOfNZPointsInOffDiagonalMatrix, > > >>> &m_globalMatrix); > > >>> > > >>> With the info option i checked and there is no extra mallocs at all. > > >>> My problems setup is symmetric so it seems that everything is set up so > > >>> that it should be essentially perfectly balanced. 
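(A quick sanity check for that, illustrative code only: print each process's actual ownership range. The object names m_globalMatrix and m_globalRHSVector are taken from the calls above; passing PETSC_DECIDE as the local size instead would let PETSc pick the split.)

    PetscInt rStart, rEnd, vStart, vEnd;
    int      rank;

    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
    ierr = MatGetOwnershipRange(m_globalMatrix, &rStart, &rEnd);CHKERRQ(ierr);
    ierr = VecGetOwnershipRange(m_globalRHSVector, &vStart, &vEnd);CHKERRQ(ierr);
    printf("[%d] owns matrix rows %d to %d (%d rows), vector entries %d to %d\n",
           rank, (int)rStart, (int)(rEnd - 1), (int)(rEnd - rStart),
           (int)vStart, (int)(vEnd - 1));

Each process prints its own range, so the numbers can be compared directly against a_totallocal_numPoints.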
However, the numbers > > >>> given above certainly do not reflect that. > > >>> > > >>> However, the in all other parts of my code (except the PETSc call), i > > >>> get the expected, almost perfect loadbalance. > > >>> > > >>> Is there anything that i am overlooking? Any help is greatly > > >>> appreciated. > > >>> > > >>> thanks > > >>> mat > > >>> > > >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote: > > >>>> Hi Matt, > > >>>> > > >>>> It could be a bad load imbalance because i don't let PETSc decide. I > > >>>> need to fix that anyway, so i think i'll try that first and then let > > >>>> you know. Thanks though for the quick response and helping me to > > >>>> interpret those numbers ... > > >>>> > > >>>> > > >>>> mat > > >>>> > > >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > > >>>>> On 8/2/06, Matt Funk wrote: > > >>>>>> Hi Matt, > > >>>>>> > > >>>>>> thanks for all the help so far. The -info option is really very > > >>>>>> helpful. So i think i straightened the actual errors out. However, > > >>>>>> now i am back to the original question i had. That is why it takes > > >>>>>> so much longer on 4 procs than on 1 proc. > > >>>>> > > >>>>> So you have a 1.5 load imbalance for MatMult(), which probably > > >>>>> cascades to give the 133! load imbalance for VecDot(). You probably > > >>>>> have either: > > >>>>> > > >>>>> 1) VERY bad laod imbalance > > >>>>> > > >>>>> 2) a screwed up network > > >>>>> > > >>>>> 3) bad contention on the network (loaded cluster) > > >>>>> > > >>>>> Can you help us narrow this down? > > >>>>> > > >>>>> > > >>>>> Matt > > >>>>> > > >>>>>> I profiled the KSPSolve(...) as stage 2: > > >>>>>> > > >>>>>> For 1 proc i have: > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > >>>>>> > > >>>>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > > >>>>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > > >>>>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > > >>>>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > > >>>>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > > >>>>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > > >>>>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > > >>>>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > > >>>>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > >>>>>> > > >>>>>> > > >>>>>> for 4 procs i have : > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > >>>>>> > > >>>>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 0.0e+00 > > >>>>>> 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > > >>>>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > > >>>>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > > >>>>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>>>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > > >>>>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > > >>>>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > > >>>>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > > 
>>>>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > > >>>>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > > >>>>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>>>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>>>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>>>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > > >>>>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 > > >>>>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > >>>>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > >>>>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > > >>>>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > > >>>>>> -------------------------------------------------------------------- > > >>>>>>-- - -- ----------------------------------------------- > > >>>>>> > > >>>>>> Now if i understand it right, all these calls summarize all calls > > >>>>>> between the pop and push commands. That would mean that the majority > > >>>>>> of the time is spend in the MatMult and in within that the > > >>>>>> VecScatterBegin and VecScatterEnd commands (if i understand it > > >>>>>> right). > > >>>>>> > > >>>>>> My problem size is really small. So i was wondering if the problem > > >>>>>> lies in that (namely that the major time is simply spend > > >>>>>> communicating between processors, or whether there is still > > >>>>>> something wrong with how i wrote the code?) > > >>>>>> > > >>>>>> > > >>>>>> thanks > > >>>>>> mat > > >>>>>> > > >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > >>>>>>> On 8/1/06, Matt Funk wrote: > > >>>>>>>> Actually the errors occur on my calls to a PETSc functions after > > >>>>>>>> calling PETSCInitialize. > > >>>>>>> > > >>>>>>> Yes, it is the error I pointed out in the last message. > > >>>>>>> > > >>>>>>> Matt > > >>>>>>> > > >>>>>>>> mat > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mafunk at nmsu.edu Tue Aug 15 16:51:52 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 15 Aug 2006 15:51:52 -0600 Subject: profiling PETSc code In-Reply-To: References: <200608011545.26224.mafunk@nmsu.edu> <200608151435.55054.mafunk@nmsu.edu> Message-ID: <200608151551.53794.mafunk@nmsu.edu> Is there a limit to how big an attachment can be? The file is 1.3Mb big. I tried to send it twice and none of the emails went through. I also send it directly to Barry and Matthews email. I hope that got though? mat On Tuesday 15 August 2006 14:44, Matthew Knepley wrote: > I don't think it matters initially since the problem is BIG imbalances. > > Matt > > On 8/15/06, Matt Funk wrote: > > Do you want me to use the debug version or the optimized version of > > PETSc? > > > > mat > > > > On Tuesday 15 August 2006 13:56, Barry Smith wrote: > > > Please send the entire -info output as an attachment to me. (Not > > > in the email) I'll study it in more detail. 
> > > > > > Barry > > > > > > On Tue, 15 Aug 2006, Matt Funk wrote: > > > > On Tuesday 15 August 2006 11:52, Barry Smith wrote: > > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > > >>> 0.0e+00 > > > >> > > > >> ^^^^^ > > > >> balance > > > >> > > > >> Hmmm, I would guess that the matrix entries are not so well > > > >> balanced? One process takes 1.4 times as long for the triangular > > > >> solves as the other so either one matrix has many more entries or > > > >> one processor is slower then the other. > > > >> > > > >> Barry > > > > > > > > Well it would seem that way at first, but i don't know how that could > > > > be since i allocate an exactly equal amount of points on both > > > > processor (see previous email). > > > > Further i used the -mat_view_info option. Here is what it gives me: > > > > > > > > ... > > > > Matrix Object: > > > > type=mpiaij, rows=119808, cols=119808 > > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > > ... > > > > > > > > ... > > > > Matrix Object: > > > > type=seqaij, rows=59904, cols=59904 > > > > total: nonzeros=407400, allocated nonzeros=407400 > > > > not using I-node routines > > > > Matrix Object: > > > > type=seqaij, rows=59904, cols=59904 > > > > total: nonzeros=407400, allocated nonzeros=407400 > > > > not using I-node routines > > > > ... > > > > > > > > So to me it look s well split up. Is there anything else that > > > > somebody can think of. The machine i am running on is all same > > > > processors. > > > > > > > > By the way, i am not sure if the '[1] PetscCommDuplicateUsing > > > > internal PETSc communicator 91 168' is something i need to worry > > > > about?? > > > > > > > > mat > > > > > > > >> On Tue, 15 Aug 2006, Matt Funk wrote: > > > >>> Hi Matt, > > > >>> > > > >>> sorry for the delay since the last email, but there were some other > > > >>> things i needed to do. > > > >>> > > > >>> Anyway, I hope that maybe I can get some more help from you guys > > > >>> with respect to the loadimbalance problem i have. Here is the > > > >>> situtation: I run my code on 2 procs. I profile my KSPSolve call > > > >>> and here is what i get: > > > >>> > > > >>> ... 
> > > >>> > > > >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > >>> > > > >>> VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 > > > >>> 0.0e+00 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > > > >>> VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 > > > >>> 0.0e+00 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > > > >>> VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > > >>> VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > > >>> VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 > > > >>> 0.0e+00 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > > > >>> VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 > > > >>> 0.0e+00 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > > > >>> VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 > > > >>> 2.1e+04 0.0e+00 0 0100100 0 0 0100100 0 0 > > > >>> VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > > > >>> MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 > > > >>> 2.1e+04 0.0e+00 5 32100100 0 25 32100100 0 361 > > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > > > >>> MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > > > >>> MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>> MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>> KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>> KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 > > > >>> 2.1e+04 3.6e+04 21100100100100 100100100100100 278 > > > >>> PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 > > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > > > >>> PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 > > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > > > >>> PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 > > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > > > >>> ------------------------------------------------------------------- > > > >>>---- -- ----------------------------------------------- > > > >>> > > > >>> ... 
> > > >>> > > > >>> > > > >>> Some things to note are the following: > > > >>> I allocate my vector as: > > > >>> VecCreateMPI(PETSC_COMM_WORLD, //communicator > > > >>> a_totallocal_numPoints[a_thisproc], //local points on this > > > >>> proc a_totalglobal_numPoints, //total number of global points > > > >>> &m_globalRHSVector); //the vector to be created > > > >>> > > > >>> where the vector a_totallocal_numPoints is : > > > >>> a_totallocal_numPoints: 59904 59904 > > > >>> > > > >>> The matrix is allocated as: > > > >>> m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > > > >>> a_totallocal_numPoints[a_thisproc], //total > > > >>> number of local rows (that is rows residing on this proc > > > >>> a_totallocal_numPoints[a_thisproc], //total > > > >>> number of columns corresponding to local part of parallel vector > > > >>> a_totalglobal_numPoints, //number of global > > > >>> rows a_totalglobal_numPoints, //number of global columns > > > >>> PETSC_NULL, > > > >>> a_NumberOfNZPointsInDiagonalMatrix, > > > >>> PETSC_NULL, > > > >>> a_NumberOfNZPointsInOffDiagonalMatrix, > > > >>> &m_globalMatrix); > > > >>> > > > >>> With the info option i checked and there is no extra mallocs at > > > >>> all. My problems setup is symmetric so it seems that everything is > > > >>> set up so that it should be essentially perfectly balanced. > > > >>> However, the numbers given above certainly do not reflect that. > > > >>> > > > >>> However, the in all other parts of my code (except the PETSc call), > > > >>> i get the expected, almost perfect loadbalance. > > > >>> > > > >>> Is there anything that i am overlooking? Any help is greatly > > > >>> appreciated. > > > >>> > > > >>> thanks > > > >>> mat > > > >>> > > > >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote: > > > >>>> Hi Matt, > > > >>>> > > > >>>> It could be a bad load imbalance because i don't let PETSc decide. > > > >>>> I need to fix that anyway, so i think i'll try that first and then > > > >>>> let you know. Thanks though for the quick response and helping me > > > >>>> to interpret those numbers ... > > > >>>> > > > >>>> > > > >>>> mat > > > >>>> > > > >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > > > >>>>> On 8/2/06, Matt Funk wrote: > > > >>>>>> Hi Matt, > > > >>>>>> > > > >>>>>> thanks for all the help so far. The -info option is really very > > > >>>>>> helpful. So i think i straightened the actual errors out. > > > >>>>>> However, now i am back to the original question i had. That is > > > >>>>>> why it takes so much longer on 4 procs than on 1 proc. > > > >>>>> > > > >>>>> So you have a 1.5 load imbalance for MatMult(), which probably > > > >>>>> cascades to give the 133! load imbalance for VecDot(). You > > > >>>>> probably have either: > > > >>>>> > > > >>>>> 1) VERY bad laod imbalance > > > >>>>> > > > >>>>> 2) a screwed up network > > > >>>>> > > > >>>>> 3) bad contention on the network (loaded cluster) > > > >>>>> > > > >>>>> Can you help us narrow this down? > > > >>>>> > > > >>>>> > > > >>>>> Matt > > > >>>>> > > > >>>>>> I profiled the KSPSolve(...) 
as stage 2: > > > >>>>>> > > > >>>>>> For 1 proc i have: > > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > >>>>>> > > > >>>>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > > > >>>>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > > > >>>>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > > > >>>>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > > > >>>>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > > > >>>>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > > > >>>>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > > > >>>>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > > > >>>>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > > >>>>>> > > > >>>>>> > > > >>>>>> for 4 procs i have : > > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > >>>>>> > > > >>>>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 > > > >>>>>> 0.0e+00 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > > > >>>>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > > > >>>>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > > > >>>>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>>>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > > > >>>>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > > > >>>>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > > > >>>>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > > > >>>>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > > > >>>>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > > > >>>>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>>>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>>>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>>>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > > > >>>>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 > > > >>>>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > > >>>>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > >>>>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > > > >>>>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > > > >>>>>> ---------------------------------------------------------------- > > > >>>>>>---- -- - -- ----------------------------------------------- > > > >>>>>> > > > >>>>>> Now if i understand it right, all these calls summarize all > > > >>>>>> calls between the pop and push commands. That would mean that > > > >>>>>> the majority of the time is spend in the MatMult and in within > > > >>>>>> that the VecScatterBegin and VecScatterEnd commands (if i > > > >>>>>> understand it right). 
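(For reference, that is how the stage works: everything logged between the push and the pop is charged to that stage. A minimal sketch of the setup, where solveStage, ksp, b, x and nsteps are placeholder names for whatever the interface code really uses, and where the PetscLogStageRegister() argument order has changed between PETSc releases, so check the man page of your version:)

    PetscLogStage solveStage;   /* a plain int in older PETSc releases */
    int           step;

    ierr = PetscLogStageRegister("KSPSolve stage", &solveStage);CHKERRQ(ierr);

    for (step = 0; step < nsteps; step++) {   /* nsteps: number of solves, 4000 in the runs above */
      /* optional barrier so imbalance from the rest of the code is not
         charged to the solve stage */
      ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);
      ierr = PetscLogStagePush(solveStage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);
    }
    /* every event that occurs between the push and the pop is reported
       under this stage in the -log_summary output */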
> > > >>>>>> > > > >>>>>> My problem size is really small. So i was wondering if the > > > >>>>>> problem lies in that (namely that the major time is simply spend > > > >>>>>> communicating between processors, or whether there is still > > > >>>>>> something wrong with how i wrote the code?) > > > >>>>>> > > > >>>>>> > > > >>>>>> thanks > > > >>>>>> mat > > > >>>>>> > > > >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > > >>>>>>> On 8/1/06, Matt Funk wrote: > > > >>>>>>>> Actually the errors occur on my calls to a PETSc functions > > > >>>>>>>> after calling PETSCInitialize. > > > >>>>>>> > > > >>>>>>> Yes, it is the error I pointed out in the last message. > > > >>>>>>> > > > >>>>>>> Matt > > > >>>>>>> > > > >>>>>>>> mat From knepley at gmail.com Tue Aug 15 16:57:27 2006 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Aug 2006 16:57:27 -0500 Subject: profiling PETSc code In-Reply-To: <200608151551.53794.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608151435.55054.mafunk@nmsu.edu> <200608151551.53794.mafunk@nmsu.edu> Message-ID: Yes, I got it. You are correct, the matrix partitions are exactly the same size. I guess you have a bad network, since not only are the ILU times unbalanced, but vector operations as well. Matt On 8/15/06, Matt Funk wrote: > Is there a limit to how big an attachment can be? > The file is 1.3Mb big. I tried to send it twice and none of the emails went > through. I also send it directly to Barry and Matthews email. I hope that got > though? > > mat > > > On Tuesday 15 August 2006 14:44, Matthew Knepley wrote: > > I don't think it matters initially since the problem is BIG imbalances. > > > > Matt > > > > On 8/15/06, Matt Funk wrote: > > > Do you want me to use the debug version or the optimized version of > > > PETSc? > > > > > > mat > > > > > > On Tuesday 15 August 2006 13:56, Barry Smith wrote: > > > > Please send the entire -info output as an attachment to me. (Not > > > > in the email) I'll study it in more detail. > > > > > > > > Barry > > > > > > > > On Tue, 15 Aug 2006, Matt Funk wrote: > > > > > On Tuesday 15 August 2006 11:52, Barry Smith wrote: > > > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > > > >>> 0.0e+00 > > > > >> > > > > >> ^^^^^ > > > > >> balance > > > > >> > > > > >> Hmmm, I would guess that the matrix entries are not so well > > > > >> balanced? One process takes 1.4 times as long for the triangular > > > > >> solves as the other so either one matrix has many more entries or > > > > >> one processor is slower then the other. > > > > >> > > > > >> Barry > > > > > > > > > > Well it would seem that way at first, but i don't know how that could > > > > > be since i allocate an exactly equal amount of points on both > > > > > processor (see previous email). > > > > > Further i used the -mat_view_info option. Here is what it gives me: > > > > > > > > > > ... > > > > > Matrix Object: > > > > > type=mpiaij, rows=119808, cols=119808 > > > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168 > > > > > ... > > > > > > > > > > ... > > > > > Matrix Object: > > > > > type=seqaij, rows=59904, cols=59904 > > > > > total: nonzeros=407400, allocated nonzeros=407400 > > > > > not using I-node routines > > > > > Matrix Object: > > > > > type=seqaij, rows=59904, cols=59904 > > > > > total: nonzeros=407400, allocated nonzeros=407400 > > > > > not using I-node routines > > > > > ... 
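(One way to separate a network or cluster problem from a genuine load imbalance is to time bare MPI collectives outside PETSc. The following is an illustrative standalone test program, not part of the code discussed here; a small MPI_Allreduce is roughly the communication behind VecDot/VecNorm.)

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      double local = 1.0, global, t0, t1;
      int    i, rank, n = 10000;   /* arbitrary repeat count */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < n; i++) {
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      }
      t1 = MPI_Wtime();

      if (rank == 0) {
        printf("%d allreduces took %g s (%g s each)\n", n, t1 - t0, (t1 - t0) / n);
      }
      MPI_Finalize();
      return 0;
    }

Run it with the same mpirun setup and node placement as the PETSc job; if the per-call time is large or erratic, the problem is the network or a loaded cluster rather than the matrix distribution.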
> > > > > > > > > > So to me it look s well split up. Is there anything else that > > > > > somebody can think of. The machine i am running on is all same > > > > > processors. > > > > > > > > > > By the way, i am not sure if the '[1] PetscCommDuplicateUsing > > > > > internal PETSc communicator 91 168' is something i need to worry > > > > > about?? > > > > > > > > > > mat > > > > > > > > > >> On Tue, 15 Aug 2006, Matt Funk wrote: > > > > >>> Hi Matt, > > > > >>> > > > > >>> sorry for the delay since the last email, but there were some other > > > > >>> things i needed to do. > > > > >>> > > > > >>> Anyway, I hope that maybe I can get some more help from you guys > > > > >>> with respect to the loadimbalance problem i have. Here is the > > > > >>> situtation: I run my code on 2 procs. I profile my KSPSolve call > > > > >>> and here is what i get: > > > > >>> > > > > >>> ... > > > > >>> > > > > >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > >>> > > > > >>> VecDot 20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 > > > > >>> 0.0e+00 2.0e+04 2 8 0 0 56 7 8 0 0 56 245 > > > > >>> VecNorm 16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 > > > > >>> 0.0e+00 1.6e+04 3 7 0 0 44 13 7 0 0 44 95 > > > > >>> VecCopy 4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > > > >>> VecSet 16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > > > > >>> VecAXPY 16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 1 7 0 0 0 3 7 0 0 0 465 > > > > >>> VecAYPX 8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 0 3 0 0 0 2 3 0 0 0 527 > > > > >>> VecScatterBegin 12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 > > > > >>> 2.1e+04 0.0e+00 0 0100100 0 0 0100100 0 0 > > > > >>> VecScatterEnd 12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0 > > > > >>> MatMult 12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 > > > > >>> 2.1e+04 0.0e+00 5 32100100 0 25 32100100 0 361 > > > > >>> MatSolve 16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 47 43 0 0 0 214 > > > > >>> MatLUFactorNum 1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 110 > > > > >>> MatILUFactorSym 1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>> MatGetOrdering 1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>> KSPSetup 1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>> KSPSolve 4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 > > > > >>> 2.1e+04 3.6e+04 21100100100100 100100100100100 278 > > > > >>> PCSetUp 1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 > > > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 75 > > > > >>> PCSetUpOnBlocks 4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 > > > > >>> 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 61 > > > > >>> PCApply 16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 > > > > >>> 0.0e+00 0.0e+00 10 43 0 0 0 49 43 0 0 0 207 > > > > >>> ------------------------------------------------------------------- > > > > >>>---- -- ----------------------------------------------- > > > > >>> > > > > >>> ... 
> > > > >>> > > > > >>> > > > > >>> Some things to note are the following: > > > > >>> I allocate my vector as: > > > > >>> VecCreateMPI(PETSC_COMM_WORLD, //communicator > > > > >>> a_totallocal_numPoints[a_thisproc], //local points on this > > > > >>> proc a_totalglobal_numPoints, //total number of global points > > > > >>> &m_globalRHSVector); //the vector to be created > > > > >>> > > > > >>> where the vector a_totallocal_numPoints is : > > > > >>> a_totallocal_numPoints: 59904 59904 > > > > >>> > > > > >>> The matrix is allocated as: > > > > >>> m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, //communicator > > > > >>> a_totallocal_numPoints[a_thisproc], //total > > > > >>> number of local rows (that is rows residing on this proc > > > > >>> a_totallocal_numPoints[a_thisproc], //total > > > > >>> number of columns corresponding to local part of parallel vector > > > > >>> a_totalglobal_numPoints, //number of global > > > > >>> rows a_totalglobal_numPoints, //number of global columns > > > > >>> PETSC_NULL, > > > > >>> a_NumberOfNZPointsInDiagonalMatrix, > > > > >>> PETSC_NULL, > > > > >>> a_NumberOfNZPointsInOffDiagonalMatrix, > > > > >>> &m_globalMatrix); > > > > >>> > > > > >>> With the info option i checked and there is no extra mallocs at > > > > >>> all. My problems setup is symmetric so it seems that everything is > > > > >>> set up so that it should be essentially perfectly balanced. > > > > >>> However, the numbers given above certainly do not reflect that. > > > > >>> > > > > >>> However, the in all other parts of my code (except the PETSc call), > > > > >>> i get the expected, almost perfect loadbalance. > > > > >>> > > > > >>> Is there anything that i am overlooking? Any help is greatly > > > > >>> appreciated. > > > > >>> > > > > >>> thanks > > > > >>> mat > > > > >>> > > > > >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote: > > > > >>>> Hi Matt, > > > > >>>> > > > > >>>> It could be a bad load imbalance because i don't let PETSc decide. > > > > >>>> I need to fix that anyway, so i think i'll try that first and then > > > > >>>> let you know. Thanks though for the quick response and helping me > > > > >>>> to interpret those numbers ... > > > > >>>> > > > > >>>> > > > > >>>> mat > > > > >>>> > > > > >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote: > > > > >>>>> On 8/2/06, Matt Funk wrote: > > > > >>>>>> Hi Matt, > > > > >>>>>> > > > > >>>>>> thanks for all the help so far. The -info option is really very > > > > >>>>>> helpful. So i think i straightened the actual errors out. > > > > >>>>>> However, now i am back to the original question i had. That is > > > > >>>>>> why it takes so much longer on 4 procs than on 1 proc. > > > > >>>>> > > > > >>>>> So you have a 1.5 load imbalance for MatMult(), which probably > > > > >>>>> cascades to give the 133! load imbalance for VecDot(). You > > > > >>>>> probably have either: > > > > >>>>> > > > > >>>>> 1) VERY bad laod imbalance > > > > >>>>> > > > > >>>>> 2) a screwed up network > > > > >>>>> > > > > >>>>> 3) bad contention on the network (loaded cluster) > > > > >>>>> > > > > >>>>> Can you help us narrow this down? > > > > >>>>> > > > > >>>>> > > > > >>>>> Matt > > > > >>>>> > > > > >>>>>> I profiled the KSPSolve(...) 
as stage 2: > > > > >>>>>> > > > > >>>>>> For 1 proc i have: > > > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > >>>>>> > > > > >>>>>> VecDot 4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 2 18 0 0 0 474 > > > > >>>>>> VecNorm 8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 4.0e+03 1 36 0 0 28 7 36 0 0 33 214 > > > > >>>>>> VecAYPX 4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 5 18 0 0 0 173 > > > > >>>>>> MatMult 4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 1 9 0 0 0 12 9 0 0 0 32 > > > > >>>>>> MatSolve 8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 36 18 0 0 0 22 > > > > >>>>>> KSPSolve 4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 1.2e+04 7100 0 0 84 97100 0 0100 45 > > > > >>>>>> PCApply 8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 3 18 0 0 0 38 18 0 0 0 21 > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> for 4 procs i have : > > > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface > > > > >>>>>> > > > > >>>>>> VecDot 4000 1.0 3.5884e+01133.7 2.17e+07133.7 > > > > >>>>>> 0.0e+00 0.0e+00 4.0e+03 8 18 0 0 5 9 18 0 0 14 1 > > > > >>>>>> VecNorm 8000 1.0 3.4986e-01 1.3 4.43e+07 1.3 0.0e+00 > > > > >>>>>> 0.0e+00 8.0e+03 0 36 0 0 10 0 36 0 0 29 133 > > > > >>>>>> VecSet 8000 1.0 3.5024e-02 1.4 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>>>>> VecAYPX 4000 1.0 5.6790e-02 1.3 1.28e+08 1.3 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 410 > > > > >>>>>> VecScatterBegin 4000 1.0 6.0042e+01 1.4 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 38 0 0 0 0 45 0 0 0 0 0 > > > > >>>>>> VecScatterEnd 4000 1.0 5.9364e+01 1.4 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 37 0 0 0 0 44 0 0 0 0 0 > > > > >>>>>> MatMult 4000 1.0 1.1959e+02 1.4 3.46e+04 1.4 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 75 9 0 0 0 89 9 0 0 0 0 > > > > >>>>>> MatSolve 8000 1.0 2.8150e-01 1.0 2.16e+07 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 18 0 0 0 0 18 0 0 0 83 > > > > >>>>>> MatLUFactorNum 1 1.0 1.3685e-04 1.1 5.64e+06 1.1 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21 > > > > >>>>>> MatILUFactorSym 1 1.0 2.3389e-04 1.2 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>>>>> MatGetOrdering 1 1.0 9.6083e-05 1.2 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>>>>> KSPSetup 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00 > > > > >>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>>>>> KSPSolve 4000 1.0 1.2200e+02 1.0 2.63e+05 1.0 0.0e+00 > > > > >>>>>> 0.0e+00 2.8e+04 84100 0 0 34 100100 0 0100 1 > > > > >>>>>> PCSetUp 1 1.0 5.0187e-04 1.2 1.68e+06 1.2 0.0e+00 > > > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > > > >>>>>> PCSetUpOnBlocks 4000 1.0 1.2104e-02 2.2 1.34e+05 2.2 0.0e+00 > > > > >>>>>> 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > >>>>>> PCApply 8000 1.0 8.4254e-01 1.2 8.27e+06 1.2 0.0e+00 > > > > >>>>>> 0.0e+00 8.0e+03 1 18 0 0 10 1 18 0 0 29 28 > > > > >>>>>> ---------------------------------------------------------------- > > > > >>>>>>---- -- - -- ----------------------------------------------- > > > > >>>>>> > > > > >>>>>> Now if i understand it right, all these calls summarize all > > > > >>>>>> calls between the pop and push commands. 
That would mean that > > > > >>>>>> the majority of the time is spend in the MatMult and in within > > > > >>>>>> that the VecScatterBegin and VecScatterEnd commands (if i > > > > >>>>>> understand it right). > > > > >>>>>> > > > > >>>>>> My problem size is really small. So i was wondering if the > > > > >>>>>> problem lies in that (namely that the major time is simply spend > > > > >>>>>> communicating between processors, or whether there is still > > > > >>>>>> something wrong with how i wrote the code?) > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> thanks > > > > >>>>>> mat > > > > >>>>>> > > > > >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote: > > > > >>>>>>> On 8/1/06, Matt Funk wrote: > > > > >>>>>>>> Actually the errors occur on my calls to a PETSc functions > > > > >>>>>>>> after calling PETSCInitialize. > > > > >>>>>>> > > > > >>>>>>> Yes, it is the error I pointed out in the last message. > > > > >>>>>>> > > > > >>>>>>> Matt > > > > >>>>>>> > > > > >>>>>>>> mat > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From mwojc at p.lodz.pl Tue Aug 15 20:10:47 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Wed, 16 Aug 2006 01:10:47 -0000 Subject: PETSc from python In-Reply-To: References: Message-ID: >> > There are nice Python bindings from >> > >> > http://lineal.developer.nicta.com.au/ I inspected a bit lineal but it also seems to be a work in progress... There is no official release and the project is currently hibernated (as i was told at their mailing list). The approach presented there is interesting, but far from the python high level programming spirit (for now). Instead, C constructs are used almost directly in python interpreter. This has one advantage that the PETSc examples can be easy translated to python. For now I found that it is quite good idea to use PETSc python bindings alongside with the parallelized python interpreter "bwpython" (http://www.cimec.org.ar/python/python.html) and python mpi bindings "mpi4py" (http://www.cimec.org.ar/python/mpi4py.html). I'm able now to run interactive sessions with PETSc and I have also access to all MPI functions which are not accessible from your PETSc bindings (AFAIK). Greetings -- Marek Wojciechowski From knepley at gmail.com Wed Aug 16 06:07:42 2006 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Aug 2006 06:07:42 -0500 Subject: PETSc from python In-Reply-To: References: Message-ID: On 8/15/06, Marek Wojciechowski wrote: > >> > There are nice Python bindings from > >> > > >> > http://lineal.developer.nicta.com.au/ > > I inspected a bit lineal but it also seems to be a work in progress... > There is no official release and the project is currently hibernated (as i > was told at their mailing list). The approach presented there is > interesting, but far from the python high level programming spirit (for > now). Instead, C constructs are used almost directly in python > interpreter. This has one advantage that the PETSc examples can be easy > translated to python. > > For now I found that it is quite good idea to use PETSc python bindings > alongside with the parallelized python interpreter "bwpython" > (http://www.cimec.org.ar/python/python.html) and python mpi bindings > "mpi4py" (http://www.cimec.org.ar/python/mpi4py.html). I'm able now to run > interactive sessions with PETSc and I have also access to all MPI > functions which are not accessible from your PETSc bindings (AFAIK). This is very interesting. I also know Lisandro who wrote bwpython. 
I just had a user have the opposite experience with these packages, but I guess that is why we have multiple packages. I am very happy you got this going. If we can help out with anything else, just mail (and it won't take so long next time). Thanks, Matt > Greetings > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From balay at mcs.anl.gov Wed Aug 16 09:08:50 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 09:08:50 -0500 (CDT) Subject: profiling PETSc code In-Reply-To: <200608151551.53794.mafunk@nmsu.edu> References: <200608011545.26224.mafunk@nmsu.edu> <200608151435.55054.mafunk@nmsu.edu> <200608151551.53794.mafunk@nmsu.edu> Message-ID: Yes - we limit the e-mail sizes on the mailing list - as we don't want to flood all list participents with multi-megabyte emails. Issues that require such interaction should be done at petsc-mait at mcs.anl.gov not petsc-users at mcs.anl.gov. Satish On Tue, 15 Aug 2006, Matt Funk wrote: > Is there a limit to how big an attachment can be? > The file is 1.3Mb big. I tried to send it twice and none of the emails went > through. I also send it directly to Barry and Matthews email. I hope that got > though? > > > On Tuesday 15 August 2006 13:56, Barry Smith wrote: > > > > Please send the entire -info output as an attachment to me. (Not > > > > in the email) I'll study it in more detail. From geenen at gmail.com Wed Aug 16 09:54:38 2006 From: geenen at gmail.com (Thomas Geenen) Date: Wed, 16 Aug 2006 16:54:38 +0200 Subject: leave my rows alone Message-ID: <200608161654.38378.geenen@gmail.com> dear petsc users, is there a way to prevent Petsc during the assembly phase from redistributing matrix rows over cpu's ?? i like the way the rows are assigned to the cpu's during the setvalues phase. apparently petsc assigns the first nrows to cpu0 the second nrows to cpu1 etc. I could of course renumber my matrix but I would rather convince petsc that it should keep the distribution of the matrix rows. tia Thomas From knepley at gmail.com Wed Aug 16 11:21:10 2006 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 16 Aug 2006 11:21:10 -0500 Subject: leave my rows alone In-Reply-To: <200608161654.38378.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> Message-ID: On 8/16/06, Thomas Geenen wrote: > dear petsc users, > > is there a way to prevent Petsc during the assembly phase from redistributing > matrix rows over cpu's ?? i like the way the rows are assigned to the cpu's > during the setvalues phase. Actually, the layout of a matrix is fully determined after MatSetSizes(), or equivalently MatCreate***(). We do not redistribute at assembly. setValues() will take values for any row, and send it to the correct process. The matrix layouts we support all have contiguous row on each proc. You can set the sizes on creation. Does this answer your question? Thanks, Matt > apparently petsc assigns the first nrows to cpu0 the second nrows to cpu1 etc. > I could of course renumber my matrix but I would rather convince petsc that it > should keep the distribution of the matrix rows. > > tia > Thomas > > -- "Failure has a thousand explanations. 
Success doesn't need one" -- Sir Alec Guiness From balay at mcs.anl.gov Wed Aug 16 11:25:15 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 11:25:15 -0500 (CDT) Subject: leave my rows alone In-Reply-To: <200608161654.38378.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> Message-ID: On Wed, 16 Aug 2006, Thomas Geenen wrote: > dear petsc users, > > is there a way to prevent Petsc during the assembly phase from redistributing > matrix rows over cpu's ?? The row distribution is done at matrix creation time - and you can set the row distribution with MatSetSizes() [or MatCreateMPIAIJ() etc..] by using the correct distribution value - instead of PETSC_DECIDE > i like the way the rows are assigned to the cpu's during the > setvalues phase. I don't understand this statement. The row assignment doesn't change > apparently petsc assigns the first nrows to cpu0 the second nrows to cpu1 etc. yes. this fact can't be changed. > I could of course renumber my matrix but I would rather convince > petsc that it should keep the distribution of the matrix rows. If you have some other global numbering scheme which is inconsitant with the matrix row numbering scheme - then you can use 'AO' object and associated routines to convert between mappings. Note: 'row distribution' is different from 'row numbering'. Satish From geenen at gmail.com Wed Aug 16 11:37:13 2006 From: geenen at gmail.com (Thomas Geenen) Date: Wed, 16 Aug 2006 18:37:13 +0200 Subject: leave my rows alone In-Reply-To: References: <200608161654.38378.geenen@gmail.com> Message-ID: <200608161837.13717.geenen@gmail.com> On Wednesday 16 August 2006 18:21, Matthew Knepley wrote: > On 8/16/06, Thomas Geenen wrote: > > dear petsc users, > > > > is there a way to prevent Petsc during the assembly phase from > > redistributing matrix rows over cpu's ?? i like the way the rows are > > assigned to the cpu's during the setvalues phase. > > Actually, the layout of a matrix is fully determined after MatSetSizes(), > or equivalently MatCreate***(). We do not redistribute at assembly. > > setValues() will take values for any row, and send it to the correct- > process. The send it to the correct process sounds a lot like redistributing but that's probably a matter of semantics > matrix layouts we support all have contiguous row on each proc. You can set > the sizes on creation. pity > > Does this answer your question? yep thanks > > Thanks, > > Matt > > > apparently petsc assigns the first nrows to cpu0 the second nrows to cpu1 > > etc. I could of course renumber my matrix but I would rather convince > > petsc that it should keep the distribution of the matrix rows. > > > > tia > > Thomas From balay at mcs.anl.gov Wed Aug 16 11:41:28 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 11:41:28 -0500 (CDT) Subject: leave my rows alone In-Reply-To: <200608161837.13717.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> <200608161837.13717.geenen@gmail.com> Message-ID: On Wed, 16 Aug 2006, Thomas Geenen wrote: > On Wednesday 16 August 2006 18:21, Matthew Knepley wrote: > > On 8/16/06, Thomas Geenen wrote: > > > dear petsc users, > > > > > > is there a way to prevent Petsc during the assembly phase from > > > redistributing matrix rows over cpu's ?? i like the way the rows are > > > assigned to the cpu's during the setvalues phase. > > > > Actually, the layout of a matrix is fully determined after MatSetSizes(), > > or equivalently MatCreate***(). We do not redistribute at assembly. 
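(In code, that point looks roughly like this; an illustrative fragment with made-up sizes, not the code from this thread:)

    Mat         A;
    PetscInt    nlocal = 10;          /* rows this process is to own, example value   */
    PetscInt    row = 23, col = 23;   /* global indices; row 23 may be owned elsewhere */
    PetscScalar val = 1.0;

    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
    ierr = MatMPIAIJSetPreallocation(A, 5, PETSC_NULL, 2, PETSC_NULL);CHKERRQ(ierr);

    /* ownership is now fixed: this process owns a contiguous block of rows.
       MatSetValues() may still be called for any global row; values for rows
       owned by another process are stashed and shipped to the owner here: */
    ierr = MatSetValues(A, 1, &row, 1, &col, &val, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The block of rows owned by each process is fixed by the sizes given here and never moves afterwards.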
> > > > setValues() will take values for any row, and send it to the correct- > > process. The > send it to the correct process sounds a lot like redistributing but > that's probably a matter of semantics No its not redistribution. When you create the matrix - the ownership of a given row is determined. [it doesn't change] If row 10 belongs to proc 2 [determined with MatSetSizes()] , but you invoke MatSetValues(row=10) on proc 5, clearly this value has to be communicated to proc2. This happens in MatAssembly***(). Satish From mafunk at nmsu.edu Wed Aug 16 11:53:42 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Wed, 16 Aug 2006 10:53:42 -0600 Subject: leave my rows alone In-Reply-To: <200608161837.13717.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> <200608161837.13717.geenen@gmail.com> Message-ID: <200608161053.44439.mafunk@nmsu.edu> Hi Thomas. I am not sure if the following is what you are looking for, but i don't have PETSc 'redistribute' anything. That is, i tell PETSc exactly how the matrix should be distributed across the procs via the following: m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, ???????????????????????? ? a_totallocal_numPoints[a_thisproc], ???????????????????????? ? a_totallocal_numPoints[a_thisproc], ???????????????????????? ? a_totalglobal_numPoints, ???????????????????????? ? a_totalglobal_numPoints, ???????????????????????? ? PETSC_NULL, ???????????????????????? ? a_NumberOfNZPointsInDiagonalMatrix, ???????????????????????? ? PETSC_NULL, ???????????????????????? ? a_NumberOfNZPointsInOffDiagonalMatrix, ???????????????????????? ? &m_globalMatrix); The argument descriptions are found at 'http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html' So anyway, PETSc does not touch this matrix in the sense of redistributing anything. It is just as i want it to be. Hope this helps ... mat On Wednesday 16 August 2006 10:37, Thomas Geenen wrote: > On Wednesday 16 August 2006 18:21, Matthew Knepley wrote: > > On 8/16/06, Thomas Geenen wrote: > > > dear petsc users, > > > > > > is there a way to prevent Petsc during the assembly phase from > > > redistributing matrix rows over cpu's ?? i like the way the rows are > > > assigned to the cpu's during the setvalues phase. > > > > Actually, the layout of a matrix is fully determined after > > MatSetSizes(), or equivalently MatCreate***(). We do not redistribute at > > assembly. > > > > setValues() will take values for any row, and send it to the correct- > > process. The > > send it to the correct process sounds a lot like redistributing but that's > probably a matter of semantics > > > matrix layouts we support all have contiguous row on each proc. You can > > set the sizes on creation. > > pity > > > Does this answer your question? > > yep > thanks > > > Thanks, > > > > Matt > > > > > apparently petsc assigns the first nrows to cpu0 the second nrows to > > > cpu1 etc. I could of course renumber my matrix but I would rather > > > convince petsc that it should keep the distribution of the matrix rows. 
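(Rather than renumbering the matrix by hand, the 'AO' object mentioned above can carry that mapping; a rough sketch with example indices, where each process passes the application numbers of the rows it is to own:)

    AO       ao;
    PetscInt napp     = 4;                /* rows owned by this process, toy size           */
    PetscInt myapp[4] = {1, 9, 23, 46};   /* their numbers in the application ordering      */
    PetscInt r;

    /* PETSC_NULL for the second index array means the PETSc ordering is taken
       to be the natural contiguous one, 0,1,2,... across the processes */
    ierr = AOCreateBasic(PETSC_COMM_WORLD, napp, myapp, PETSC_NULL, &ao);CHKERRQ(ierr);

    /* before MatSetValues()/VecSetValues(), translate application indices in place */
    r = 23;
    ierr = AOApplicationToPetsc(ao, 1, &r);CHKERRQ(ierr);
    /* r now holds the PETSc row index corresponding to application row 23 */

AOPetscToApplication() converts in the other direction, and the AO only has to be built once.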
> > > > > > tia > > > Thomas From balay at mcs.anl.gov Wed Aug 16 12:26:49 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 12:26:49 -0500 (CDT) Subject: leave my rows alone In-Reply-To: <200608161053.44439.mafunk@nmsu.edu> References: <200608161654.38378.geenen@gmail.com> <200608161837.13717.geenen@gmail.com> <200608161053.44439.mafunk@nmsu.edu> Message-ID: Perhaps the issue is not using MatGetRowOnership() [but some other scheme] to get the row indices that are used in MatSetValues() Satish On Wed, 16 Aug 2006, Matt Funk wrote: > Hi Thomas. > > I am not sure if the following is what you are looking for, but i don't have > PETSc 'redistribute' anything. That is, i tell PETSc exactly how the matrix > should be distributed across the procs via the following: > > m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, > ???????????????????????? ? a_totallocal_numPoints[a_thisproc], > ???????????????????????? ? a_totallocal_numPoints[a_thisproc], > ???????????????????????? ? a_totalglobal_numPoints, > ???????????????????????? ? a_totalglobal_numPoints, > ???????????????????????? ? PETSC_NULL, > ???????????????????????? ? a_NumberOfNZPointsInDiagonalMatrix, > ???????????????????????? ? PETSC_NULL, > ???????????????????????? ? a_NumberOfNZPointsInOffDiagonalMatrix, > ???????????????????????? ? &m_globalMatrix); > > The argument descriptions are found at > 'http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html' > > So anyway, PETSc does not touch this matrix in the sense of redistributing > anything. It is just as i want it to be. Hope this helps ... > > mat > > > On Wednesday 16 August 2006 10:37, Thomas Geenen wrote: > > On Wednesday 16 August 2006 18:21, Matthew Knepley wrote: > > > On 8/16/06, Thomas Geenen wrote: > > > > dear petsc users, > > > > > > > > is there a way to prevent Petsc during the assembly phase from > > > > redistributing matrix rows over cpu's ?? i like the way the rows are > > > > assigned to the cpu's during the setvalues phase. > > > > > > Actually, the layout of a matrix is fully determined after > > > MatSetSizes(), or equivalently MatCreate***(). We do not redistribute at > > > assembly. > > > > > > setValues() will take values for any row, and send it to the correct- > > > process. The > > > > send it to the correct process sounds a lot like redistributing but that's > > probably a matter of semantics > > > > > matrix layouts we support all have contiguous row on each proc. You can > > > set the sizes on creation. > > > > pity > > > > > Does this answer your question? > > > > yep > > thanks > > > > > Thanks, > > > > > > Matt > > > > > > > apparently petsc assigns the first nrows to cpu0 the second nrows to > > > > cpu1 etc. I could of course renumber my matrix but I would rather > > > > convince petsc that it should keep the distribution of the matrix rows. 
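(The routine meant above is presumably MatGetOwnershipRange(); used that way, each process generates exactly the row indices it owns, so nothing has to be shipped at assembly time. A rough fragment reusing the m_globalMatrix and m_ierr names from the message above, filling just the diagonal as a stand-in:)

    PetscInt    rstart, rend, row, col;
    PetscScalar one = 1.0;

    m_ierr = MatGetOwnershipRange(m_globalMatrix, &rstart, &rend);CHKERRQ(m_ierr);
    for (row = rstart; row < rend; row++) {
      col = row;   /* example: just the diagonal entry */
      m_ierr = MatSetValues(m_globalMatrix, 1, &row, 1, &col, &one, INSERT_VALUES);CHKERRQ(m_ierr);
    }
    m_ierr = MatAssemblyBegin(m_globalMatrix, MAT_FINAL_ASSEMBLY);CHKERRQ(m_ierr);
    m_ierr = MatAssemblyEnd(m_globalMatrix, MAT_FINAL_ASSEMBLY);CHKERRQ(m_ierr);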
> > > > > > > > tia > > > > Thomas From geenen at gmail.com Wed Aug 16 14:58:44 2006 From: geenen at gmail.com (Thomas Geenen) Date: Wed, 16 Aug 2006 21:58:44 +0200 Subject: leave my rows alone In-Reply-To: References: <200608161654.38378.geenen@gmail.com> <200608161837.13717.geenen@gmail.com> Message-ID: <200608162158.44210.geenen@gmail.com> On Wednesday 16 August 2006 18:41, Satish Balay wrote: > On Wed, 16 Aug 2006, Thomas Geenen wrote: > > On Wednesday 16 August 2006 18:21, Matthew Knepley wrote: > > > On 8/16/06, Thomas Geenen wrote: > > > > dear petsc users, > > > > > > > > is there a way to prevent Petsc during the assembly phase from > > > > redistributing matrix rows over cpu's ?? i like the way the rows are > > > > assigned to the cpu's during the setvalues phase. > > > > > > Actually, the layout of a matrix is fully determined after > > > MatSetSizes(), or equivalently MatCreate***(). We do not redistribute > > > at assembly. > > > > > > setValues() will take values for any row, and send it to the correct- > > > process. The > > > > send it to the correct process sounds a lot like redistributing but > > that's probably a matter of semantics > > No, it's not redistribution. When you create the matrix - the ownership > of a given row is determined. [it doesn't change] > > If row 10 belongs to proc 2 [determined with MatSetSizes()], but you > invoke MatSetValues(row=10) on proc 5, clearly this value has to be > communicated to proc 2. This happens in MatAssembly***(). in matcreate i tell petsc that cpu0 owns 10 rows. in matsetvalue i tell him which rows (maybe 1,9,23,46 etc), but petsc automatically assumes that cpu0 owns the first 10 rows. matsetsizes also just tells petsc the number of rows: 0-9 for cpu0. so if i want to keep row 23 on cpu0 i have to do some sort of renumbering making row 23 < 10 > > Satish From balay at mcs.anl.gov Wed Aug 16 15:14:53 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 15:14:53 -0500 (CDT) Subject: leave my rows alone In-Reply-To: <200608162158.44210.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> <200608161837.13717.geenen@gmail.com> <200608162158.44210.geenen@gmail.com> Message-ID: On Wed, 16 Aug 2006, Thomas Geenen wrote: > in matcreate i tell petsc that cpu0 owns 10 rows yes > in matsetvalue i tell him which rows (maybe 1,9,23,46 etc) but petsc > automatically assumes that cpu0 owns the first 10 rows matsetsizes > also just tells petsc the number of rows. 0-9 for cpu0 MatSetValues() provides no such functionality [of specifying rows owned by the local proc]. You are misinterpreting arguments to MatSetValues(). In this new model [which PETSc doesn't provide] - what happens if you call MatSetValues(row=0) on both proc 0 & proc 1? Does this row get owned by both processors? And from what numbering scheme do you get these numbers 1,9,23,46 etc?
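To make the ownership rules above concrete, here is a minimal C sketch (PETSc 2.3.x-era calling conventions; the local size of 10 rows and the preallocation counts are made-up numbers, and error checking is omitted). The row layout is fixed when the matrix is created; any process may then call MatSetValues() with any global row index, and entries for rows owned by another process are shipped to their owner during assembly.

#include "petscmat.h"

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    mlocal = 10, Mglobal, Nglobal, rstart, rend, row, col;
  PetscScalar one = 1.0;

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

  /* Each process owns 'mlocal' contiguous rows; this fixes the layout once and for all. */
  MatCreateMPIAIJ(PETSC_COMM_WORLD, mlocal, mlocal, PETSC_DECIDE, PETSC_DECIDE,
                  3, PETSC_NULL, 2, PETSC_NULL, &A);
  MatGetOwnershipRange(A, &rstart, &rend);  /* rows [rstart, rend) live on this process */
  MatGetSize(A, &Mglobal, &Nglobal);

  /* Any process may set any global row; here every process touches the first
     row of the *next* process, so the value is not stored locally ... */
  row = rend % Mglobal;
  col = row;
  MatSetValues(A, 1, &row, 1, &col, &one, ADD_VALUES);

  /* ... it is communicated to the owning process here. */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatDestroy(A);
  PetscFinalize();
  return 0;
}

There is no redistribution anywhere in this sequence; if the application's own numbering does not match these contiguous per-process ranges, the AO object mentioned later in the thread can translate between the two numbering schemes.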
Satish > > so if i want to keep row 23 on cpu0 i have to do some sort of renumbering > making row23 <10 From geenen at gmail.com Wed Aug 16 15:24:31 2006 From: geenen at gmail.com (Thomas Geenen) Date: Wed, 16 Aug 2006 22:24:31 +0200 Subject: leave my rows alone In-Reply-To: References: <200608161654.38378.geenen@gmail.com> <200608162158.44210.geenen@gmail.com> Message-ID: <200608162224.31369.geenen@gmail.com> On Wednesday 16 August 2006 22:14, Satish Balay wrote: > On Wed, 16 Aug 2006, Thomas Geenen wrote: > > in matcreate i tell petsc that cpu0 owns 10 rows > > yes > > > in matsetvalue i tell him which rows (maybe 1,9,23,46 etc) but petsc > > automatically assumes that cpu0 owns the first 10 rows matsetsizes > > also just tells petsc the number of rows. 0-9 for cpu0 > > MatSetValues() provides no such functionality [of specifying rows > owned by local proc] . You are misinterpreting arguments to > MatSetValues(). i know but i hoped there would be some way of doing this i like to be in charge of these kind of things :) > > In this new model [which PETSc doesn't provide] - what hapens if you > call MatSetValues(row=0) on both proc 0 & proc1? Does this row get > owned by both procesors? why would you do that??? maybe if you have a very slow network? > > And from what numbering scheme do you get these numbers 1,9,23,46 etc? my nice little FEM app that i build an interface to petsc for :) > > Satish > > > so if i want to keep row 23 on cpu0 i have to do some sort of renumbering > > making row23 <10 thanks for all the help Thomas From balay at mcs.anl.gov Wed Aug 16 15:31:03 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 15:31:03 -0500 (CDT) Subject: leave my rows alone In-Reply-To: <200608162224.31369.geenen@gmail.com> References: <200608161654.38378.geenen@gmail.com> <200608162158.44210.geenen@gmail.com> <200608162224.31369.geenen@gmail.com> Message-ID: On Wed, 16 Aug 2006, Thomas Geenen wrote: > > And from what numbering scheme do you get these numbers 1,9,23,46 etc? > my nice little FEM app that i build an interface to petsc for :) I presume this numbering is same irrespective of number of procs or the way the grid is partitioned across processors? In that case it would constitute a global numbering scheme - and you can use PETSc 'AO' object to map between numbering schemes. Satish From joel.schaerer at creatis.insa-lyon.fr Wed Aug 16 11:46:49 2006 From: joel.schaerer at creatis.insa-lyon.fr (=?ISO-8859-1?Q?Jo=EBl?= Schaerer) Date: Wed, 16 Aug 2006 12:46:49 -0400 Subject: Configuration problems on windows Message-ID: <1155746809.2364.7.camel@netnyuotp006545ots.unassigned.msnyuhealth.org> Hello all, I am trying to configure petsc for visual studio on a windows machine. Here is the configure line I typed on cygwin: ./config/configure.py --with-cc='win32fe cl' --download-c-blas-lapack=1 --with-mpi=0 --with-x=0 --PETSC_DIR=$(pwd) --with-fortran=0 This has already worked before on other windows machines, but this time I get the following error: [snip] C:\pkg\cygwin\bin\python2.3.exe: *** unable to remap C:\pkg\cygwin\bin \cygssl-0. 
9.7.dll to same address as parent(0x1220000) != 0x1230000 37 [unknown (0x978)] python 2432 sync_with_child: child 896(0x900) died bef ore initialization with status code 0x1 422 [unknown (0x978)] python 2432 sync_with_child: *** child state child loa ding dlls Exception in thread Shell Command: Traceback (most recent call last): File "/tmp/python.2708/usr/lib/python2.3/threading.py", line 436, in __bootstr ap self.run() File "/tmp/python.2708/usr/lib/python2.3/threading.py", line 416, in run self.__target(*self.__args, **self.__kwargs) File "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri pt.py", line 190, in run (output, error, status) = Script.runShellCommand(command, log) File "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri pt.py", line 124, in runShellCommand (input, output, error, pipe) = Script.openPipe(command) File "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri pt.py", line 105, in openPipe pipe = popen2.Popen3(command, 1) File "/tmp/python.2708/usr/lib/python2.3/popen2.py", line 39, in __init__ self.pid = os.fork() OSError: [Errno 11] Resource temporarily unavailable The complete configure.log file is available at the following adress: http://fex.insa-lyon.fr/get?k=7xspvbpkylWdNgpkjPG Any ideas? Thanks a lot for your help, Joel Schaerer From balay at mcs.anl.gov Wed Aug 16 18:19:59 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 16 Aug 2006 18:19:59 -0500 (CDT) Subject: Configuration problems on windows In-Reply-To: <1155746809.2364.7.camel@netnyuotp006545ots.unassigned.msnyuhealth.org> References: <1155746809.2364.7.camel@netnyuotp006545ots.unassigned.msnyuhealth.org> Message-ID: > C:\pkg\cygwin\bin\python2.3.exe: *** unable to remap C:\pkg\cygwin\bin\cygssl-0.9.7.dll There are probably 2 ways to recover from this cygwin error. 1. reinstall cygwin from scratch 2 - kill all cygwin processes [by rebooting] - run 'ash' shell from cygwin bin dir [this should be done either from 'start -> run' or from 'cmd' - but not 'cygwin bash shell' - run 'rebaseall' from the above 'ash' shell. rebaseall is the easy thing to do - but it has its own issues [according to cygwin folks] - but you can try and see if it works for you. Satish On Wed, 16 Aug 2006, Jo?l Schaerer wrote: > Hello all, > > I am trying to configure petsc for visual studio on a windows machine. > Here is the configure line I typed on cygwin: > > ./config/configure.py --with-cc='win32fe cl' --download-c-blas-lapack=1 > --with-mpi=0 --with-x=0 --PETSC_DIR=$(pwd) --with-fortran=0 > > This has already worked before on other windows machines, but this time > I get the following error: > > [snip] > C:\pkg\cygwin\bin\python2.3.exe: *** unable to remap C:\pkg\cygwin\bin > \cygssl-0. 
> 9.7.dll to same address as parent(0x1220000) != 0x1230000 > 37 [unknown (0x978)] python 2432 sync_with_child: child 896(0x900) > died bef > ore initialization with status code 0x1 > 422 [unknown (0x978)] python 2432 sync_with_child: *** child state > child loa > ding dlls > Exception in thread Shell Command: > Traceback (most recent call last): > File "/tmp/python.2708/usr/lib/python2.3/threading.py", line 436, in > __bootstr > ap > self.run() > File "/tmp/python.2708/usr/lib/python2.3/threading.py", line 416, in > run > self.__target(*self.__args, **self.__kwargs) > File > "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri > pt.py", line 190, in run > (output, error, status) = Script.runShellCommand(command, log) > File > "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri > pt.py", line 124, in runShellCommand > (input, output, error, pipe) = Script.openPipe(command) > File > "/cygdrive/d/chenting/tools/petsc/petsc-2.3.1-p16/python/BuildSystem/scri > pt.py", line 105, in openPipe > pipe = popen2.Popen3(command, 1) > File "/tmp/python.2708/usr/lib/python2.3/popen2.py", line 39, in > __init__ > self.pid = os.fork() > OSError: [Errno 11] Resource temporarily unavailable > > The complete configure.log file is available at the following adress: > http://fex.insa-lyon.fr/get?k=7xspvbpkylWdNgpkjPG > > Any ideas? > > Thanks a lot for your help, > > Joel Schaerer > > From mwojc at p.lodz.pl Thu Aug 17 14:42:48 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Thu, 17 Aug 2006 19:42:48 -0000 Subject: PETSc from python In-Reply-To: References: Message-ID: On Wed, 16 Aug 2006 11:07:42 -0000, Matthew Knepley wrote: > This is very interesting. I also know Lisandro who wrote bwpython. I > just had a user have the opposite experience with these packages, but I > guess that > is why we have multiple packages. I am very happy you got this going. Yes. I got this going. And I'm quite satisfied. 
To prove it, that's an examplary session: mwojc at evo ~ $ mpirun -np 2 `which bwpython` `which ipython` -nobanner In [1]: from petscinit import * # This imports for me PETSc extensions, initializes and creates stdviewer In [2]: from mpi4py import MPI In [3]: In [3]: v=Vec() In [4]: size = (MPI.rank + 1) * 2 In [5]: totsize = MPI.COMM_WORLD.Allreduce(size) In [6]: print 'Process [%d]: size=%d' %(MPI.rank, size) Process [0]: size=2 Process [1]: size=4 In [7]: print 'Process [%d]: totsize=%d' %(MPI.rank, totsize) Process [0]: totsize=6 Process [1]: totsize=6 In [8]: v.setSizes(size, PETSC_DECIDE) In [9]: v.setFromOptions() In [10]: if MPI.rank == 0: v.setValues(xrange(totsize), [1]*totsize, INSERT_VALUES) ....: In [11]: v.assemblyBegin() In [12]: v.assemblyEnd() In [13]: v.view(stdviewer) Process [0] 0: 1 1: 1 Process [1] 2: 1 3: 1 4: 1 5: 1 In [14]: In [14]: A=Mat() In [15]: A.setSizes(size, size, PETSC_DECIDE, PETSC_DECIDE) In [16]: A.setFromOptions() In [17]: Range = A.getOwnershipRange() In [18]: print Range (0, 2) (2, 6) In [19]: rows=xrange(Range[0], Range[1]) In [20]: cols=xrange(totsize) In [21]: import random In [22]: values=[random.uniform(-1,1) for i in xrange(size*totsize)] In [23]: A.setValues(rows, cols, values, INSERT_VALUES) In [24]: A.assemblyBegin(MAT_FINAL_ASSEMBLY) In [25]: A.assemblyEnd(MAT_FINAL_ASSEMBLY) In [26]: A.view(stdviewer) row 0: (0, 0.630031) (1, 0.673476) (2, -0.734869) (3, 0.105727) (4, 0.538428) (5, 0.12576) row 1: (0, -0.857206) (1, -0.0761736) (2, -0.143492) (3, -0.938166) (4, 0.41378) (5, -0.210328) row 2: (0, 0.50173) (1, -0.214067) (2, 0.59921) (3, 0.848044) (4, -0.819785) (5, -0.436404) row 3: (0, 0.30529) (1, 0.968145) (2, 0.377928) (3, -0.656585) (4, 0.882831) (5, 0.850657) row 4: (0, -0.304465) (1, 0.496273) (2, -0.277161) (3, -0.81206) (4, 0.63498) (5, 0.58123) row 5: (0, 0.538759) (1, -0.654964) (2, -0.256906) (3, -0.335948) (4, 0.748973) (5, 0.813876) In [27]: In [27]: b = v.duplicate() #right hand vector In [28]: A.mult(v, b) In [29]: x=b.duplicate() #unknowns In [30]: In [30]: solver = KSP() In [31]: solver.setOperators(A,A,DIFFERENT_NONZERO_PATTERN) In [32]: solver.setFromOptions() In [33]: solver.solve(b,x) In [34]: x.view(stdviewer) # IS SOLUTION CORRECT? (SHOULD BE ONES?) Process [0] 0: 1 1: 1 Process [1] 2: 1 3: 1 4: 1 5: 1 In [35]: #AND SO ON... Isn't it nice? But I have also a question. 
Suppose my matrix A is "dense" and I would like to get local arrays: In [36]: A.setType("dense") In [37]: A.setValues(rows, cols, values, INSERT_VALUES) In [38]: A.assemblyBegin(MAT_FINAL_ASSEMBLY) In [39]: A.assemblyEnd(MAT_FINAL_ASSEMBLY) In [40]: A.view(stdviewer) 6.3003e-01 6.7348e-01 -7.3487e-01 1.0573e-01 5.3843e-01 1.2576e-01 -8.5721e-01 -7.6174e-02 -1.4349e-01 -9.3817e-01 4.1378e-01 -2.1033e-01 5.0173e-01 -2.1407e-01 5.9921e-01 8.4804e-01 -8.1978e-01 -4.3640e-01 3.0529e-01 9.6815e-01 3.7793e-01 -6.5659e-01 8.8283e-01 8.5066e-01 -3.0446e-01 4.9627e-01 -2.7716e-01 -8.1206e-01 6.3498e-01 5.8123e-01 5.3876e-01 -6.5496e-01 -2.5691e-01 -3.3595e-01 7.4897e-01 8.1388e-01 In [41]: print A.getArray() array([0.63003134676816508, -0.85720600630413402, 0.67347596051594882, -0.076173622638120664], 'd') array([0.5017303230541359, 0.30529049321568058, -0.3044646591900968, 0.53875941791034943, -0.21406665507357259, 0.96814544907801547, 0.49627269833290644, -0.65496405056370244, 0.59920997972823065, 0.3779276741558788, -0.2771608364332232, -0.25690564212998068, 0.8480435858237616, -0.65658527037718994, -0.81206017169215183, -0.33594803008233565], 'd') Why the obtained arrays does not represent what is stored on each process. I thought there should be 2 first whole rows in the first array and the rest in the second or I'm missing something. I also observed that PETSc objects are not picklable. Is there any good reason for that? Greetings -- Marek Wojciechowski From mafunk at nmsu.edu Thu Aug 17 15:25:28 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Thu, 17 Aug 2006 14:25:28 -0600 Subject: PETSc communicator In-Reply-To: References: Message-ID: <200608171425.30756.mafunk@nmsu.edu> Hi, i was wondering what the message: 'PetscCommDuplicate Using internal PETSc communicator 92 170' means exactly. I still have issues with PETSc when running 1 vs 2 procs w.r.t. the loadbalance. However, when run on 2 vs 4 the balance seems to be almost perfect. Then the option of a screwed up network was suggested to me, but since the 4vs 2 proc case is ok, it seems not necessarily to be the case. Maybe somebody can tell me what it means? thanks mat From jiaxun_hou at yahoo.com.cn Thu Aug 17 22:52:00 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Fri, 18 Aug 2006 11:52:00 +0800 (CST) Subject: function for eigendecomposition Message-ID: <20060818035200.72126.qmail@web15801.mail.cnb.yahoo.com> Hello, Can you tell me which function can compute the eigenvectors for me? I have only found the function "KSPComputeEigenvalues" in the document, but it is not suitable for me. I want to get both the eigenvalues and eigenvectors. And I am looking for a efficient function to do the eigendecomposition for a symmetric matrix. Any help will be appreciated. Regards, Jiaxun --------------------------------- Mp3???-??????? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 17 23:19:25 2006 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 17 Aug 2006 23:19:25 -0500 Subject: PETSc communicator In-Reply-To: <200608171425.30756.mafunk@nmsu.edu> References: <200608171425.30756.mafunk@nmsu.edu> Message-ID: We stash some data in the communicator. This is what is happening here. Matt On 8/17/06, Matt Funk wrote: > Hi, > > i was wondering what the message: > 'PetscCommDuplicate Using internal PETSc communicator 92 170' > means exactly. I still have issues with PETSc when running 1 vs 2 procs w.r.t. > the loadbalance. 
> However, when run on 2 vs 4 the balance seems to be almost perfect. > Then the option of a screwed up network was suggested to me, but since the 4vs > 2 proc case is ok, it seems not necessarily to be the case. > > Maybe somebody can tell me what it means? > > thanks > mat > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness From yaronkretchmer at gmail.com Thu Aug 17 23:25:49 2006 From: yaronkretchmer at gmail.com (Yaron Kretchmer) Date: Thu, 17 Aug 2006 21:25:49 -0700 Subject: function for eigendecomposition In-Reply-To: <20060818035200.72126.qmail@web15801.mail.cnb.yahoo.com> References: <20060818035200.72126.qmail@web15801.mail.cnb.yahoo.com> Message-ID: maybe this will help http://www.grycap.upv.es/slepc/ On 8/17/06, jiaxun hou wrote: > > Hello, > > Can you tell me which function can compute the eigenvectors for me? I have > only found the function "KSPComputeEigenvalues" in the document, but it is > not suitable for me. I want to get both the eigenvalues and eigenvectors. > And I am looking for a efficient function to do the eigendecomposition for a > symmetric matrix. Any help will be appreciated. > > Regards, > Jiaxun > > ------------------------------ > Mp3???-??????? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Mon Aug 21 06:11:21 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 11:11:21 -0000 Subject: Memory preallocation in python Message-ID: Hi, i have a small problem in python. In case of matrix assembling -info says me something like: [0] MatSetUpPreallocationWarning not preallocating matrix storage [0] MatAssemblyEnd_SeqAIJMatrix size: 6006 X 6006; storage space: 12084 unneeded,108036 used [0] MatAssemblyEnd_SeqAIJNumber of mallocs during MatSetValues() is 6006 [0] MatAssemblyEnd_SeqAIJMaximum nonzeros in any row is 18 [0] Mat_CheckInodeFound 2002 nodes of 6006. Limit used: 5. Using Inode routines This obviously means poor behavior because of dynamical memory allocation. Unfortunately, I have no idea how to preallocate memory for Mat objects in python. Any suggestions? Greetings -- Marek Wojciechowski From knepley at gmail.com Mon Aug 21 04:16:08 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 04:16:08 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: It should be a wrapper for Mat***Preallocation(). Do these exist? Matt On 8/21/06, Marek Wojciechowski wrote: > > Hi, > > i have a small problem in python. In case of matrix assembling -info says > me something like: > > [0] MatSetUpPreallocationWarning not preallocating matrix storage > [0] MatAssemblyEnd_SeqAIJMatrix size: 6006 X 6006; storage space: 12084 > unneeded,108036 used > [0] MatAssemblyEnd_SeqAIJNumber of mallocs during MatSetValues() is 6006 > [0] MatAssemblyEnd_SeqAIJMaximum nonzeros in any row is 18 > [0] Mat_CheckInodeFound 2002 nodes of 6006. Limit used: 5. Using Inode > routines > > This obviously means poor behavior because of dynamical memory allocation. > Unfortunately, I have no idea how to preallocate memory for Mat objects in > python. > Any suggestions? > > Greetings > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwojc at p.lodz.pl Mon Aug 21 07:22:40 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 12:22:40 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: On Mon, 21 Aug 2006 09:16:08 -0000, Matthew Knepley wrote: > It should be a wrapper for Mat***Preallocation(). Do these exist? > > Matt > I didn't find a method Mat.Preallocation() nor anything similar for Mat... Maybe I should search in another place? -- Marek Wojciechowski From jiaxun_hou at yahoo.com.cn Mon Aug 21 04:43:13 2006 From: jiaxun_hou at yahoo.com.cn (jiaxun hou) Date: Mon, 21 Aug 2006 17:43:13 +0800 (CST) Subject: Reply: Re: function for eigendecomposition In-Reply-To: Message-ID: <20060821094313.44251.qmail@web15805.mail.cnb.yahoo.com> Thank you very much. It is really what I want. Regards, Jiaxun Yaron Kretchmer wrote: maybe this will help http://www.grycap.upv.es/slepc/ On 8/17/06, jiaxun hou wrote: Hello, Can you tell me which function can compute the eigenvectors for me? I have only found the function "KSPComputeEigenvalues" in the document, but it is not suitable for me. I want to get both the eigenvalues and eigenvectors. And I am looking for a efficient function to do the eigendecomposition for a symmetric matrix. Any help will be appreciated. Regards, Jiaxun -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 21 04:43:42 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 04:43:42 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: Sorry, 1) It would have to be for a subclass like MatSeqAIJ 2) It is setPreallocation() Thanks, Matt On 8/21/06, Marek Wojciechowski wrote: > > > On Mon, 21 Aug 2006 09:16:08 -0000, Matthew Knepley > wrote: > > > It should be a wrapper for Mat***Preallocation(). Do these exist? > > > > Matt > > > > I didn't find a method Mat.Preallocation() nor anything similar for Mat... > Maybe I should search in another place? > > > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Mon Aug 21 07:58:08 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 12:58:08 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: > 1) It would have to be for a subclass like MatSeqAIJ > > 2) It is setPreallocation() > > Thanks, > > Matt > Honestly, I can find neither subclass MatSeqAIJ nor the method setPreallocation(). I create matrix as follows: import PETSc.Mat K = PETSc.Mat.Mat() ## as far as i know this is the only way to create matrix in python K.setSizes(size, size, Size, Size) K.setFromOptions() K.setType("seqaij") ## i choose matrix type here, I don't know another way... Where is the moment to preallocate?
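Whatever the Python wrappers do or do not expose, the underlying PETSc calls are MatSeqAIJSetPreallocation() and MatMPIAIJSetPreallocation(), issued after the matrix type is set and before any values are inserted. A rough C sketch of the same sequence as the Python code above (the size 6006 and the 18 nonzeros per row are taken from the -info output earlier in the thread and are otherwise just placeholders; error checking omitted):

#include "petscmat.h"

int main(int argc, char **argv)
{
  Mat      K;
  PetscInt n = 6006;          /* matrix size reported by -info above */
  PetscInt nz_per_row = 18;   /* maximum nonzeros in any row, also from -info */

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

  MatCreate(PETSC_COMM_SELF, &K);
  MatSetSizes(K, n, n, n, n);
  MatSetType(K, MATSEQAIJ);

  /* The moment to preallocate: after the type is set, before MatSetValues().
     This is what removes the mallocs reported by -info during assembly. */
  MatSeqAIJSetPreallocation(K, nz_per_row, PETSC_NULL);
  /* For a parallel MPIAIJ matrix the analogous call is
     MatMPIAIJSetPreallocation(K, d_nz, PETSC_NULL, o_nz, PETSC_NULL); */

  /* ... MatSetValues(), MatAssemblyBegin(), MatAssemblyEnd() as before ... */

  MatDestroy(K);
  PetscFinalize();
  return 0;
}

In the Python session above, the equivalent call would belong right after setType("seqaij"), if the wrappers expose one.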
-- Marek Wojciechowski From knepley at gmail.com Mon Aug 21 08:29:19 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 08:29:19 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: On 8/21/06, Marek Wojciechowski wrote: > > > > 1) It would have to be for a subclass like MatSeqAIJ > > > > 2) It is setPreallocation() > > > > Thanks, > > > > Matt > > > > Honestly, I can find neither subclass MatSeqAIJ nor the method > setPreallocation(). I create matrix as follows: > > import PETSc.Mat > K = PETSc.Mat.Mat() ## as far as i know this is the only way to create > matrix in python > K.setSizes(size, size, Size, Size) > K.setFromOptions() > K.setType("seqaij") ## i choose matrix type here, I don't know another > way... > > Where is the moment to preallocate? It may be that they never wrapped the preallocation methods. It seems strange, but possible. I guess I would mail Lisandro and see. Matt -- > Marek Wojciechowski > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Mon Aug 21 11:33:19 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 16:33:19 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: > It may be that they never wrapped the preallocation methods. It seems > strange, but > possible. I guess I would mail Lisandro and see. > Just to clarify: I'm using the wrappers downloaded from ftp://ftp.mcs.anl.gov/pub/petsc/PETScPython.tar.gz. Are these something to do with petsc4py package by Lisandro? -- Marek Wojciechowski From knepley at gmail.com Mon Aug 21 08:56:40 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 08:56:40 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: On 8/21/06, Marek Wojciechowski wrote: > > > > It may be that they never wrapped the preallocation methods. It seems > > strange, but > > possible. I guess I would mail Lisandro and see. > > > > Just to clarify: I'm using the wrappers downloaded from > ftp://ftp.mcs.anl.gov/pub/petsc/PETScPython.tar.gz. > Are these something to do with petsc4py package by Lisandro? I got confused. There are several different wrappers. Those are ones that I produced, but found too hard to support. I am know telling people who want more functionality to try either the petsc4py or the LINEAL wrappers since they have the time and money to do a better job I think. Matt -- > Marek Wojciechowski > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Mon Aug 21 12:11:38 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 17:11:38 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: On Mon, 21 Aug 2006 13:56:40 -0000, Matthew Knepley wrote: > > I got confused. There are several different wrappers. Those are ones that > I produced, but found too hard to support. I am know telling people who > want more functionality to try either the petsc4py or the LINEAL wrappers > since they have the time and money to do a better job I think. > Well, does it mean that your wrappers are not developed any more? One more question then: In case of petsc4py, I tried to compile it but with no success because of the lack of include file petschead.h in the petsc distribution (2.3.1-p16). 
I guess it was removed for some reason. Maybe you could tell me where the definitions from this file are now. -- Marek Wojciechowski From knepley at gmail.com Mon Aug 21 09:39:50 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 09:39:50 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: You just need petsc.h now. The structs are defined in include/private/petscimpl.h Matt On 8/21/06, Marek Wojciechowski wrote: > > On Mon, 21 Aug 2006 13:56:40 -0000, Matthew Knepley > wrote: > > > > I got confused. There are several different wrappers. Those are ones > that > > I produced, but found too hard to support. I am know telling people who > > want more functionality to try either the petsc4py or the LINEAL > wrappers > > since they have the time and money to do a better job I think. > > > > Well, does it mean that your wrappers are not developed any more? > > One more question then: > In case of petsc4py, I tried to compile it but with no success because of > the lack > of include file petschead.h in the petsc distribution (2.3.1-p16). I > guess, it was removed for > some reason. Maybe you could tell me where are now the definitions from > this file. > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwojc at p.lodz.pl Mon Aug 21 13:10:30 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Mon, 21 Aug 2006 18:10:30 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: On Mon, 21 Aug 2006 14:39:50 -0000, Matthew Knepley wrote: This does not help; there are still undeclared variables like: CARRAY_FLAGS PETSC_COOKIE UPDATEIFCOPY PETSC_FILE_RDONLY PETSC_FILE_WRONLY PETSC_FILE_CREATE KSP_CONVERGED_QCG_NEG_CURVE KSP_CONVERGED_QCG_CONSTRAINED I'm afraid the petsc4py wrappers are broken for newer versions of PETSc... > You just need petsc.h now. The structs are defined in > include/private/petscimpl.h >> >> One more question then: >> In case of petsc4py, I tried to compile it but with no success because >> of >> the lack >> of include file petschead.h in the petsc distribution (2.3.1-p16). I >> guess, it was removed for >> some reason. Maybe you could tell me where are now the definitions from >> this file. -- Marek Wojciechowski From xiwang at dragon.rutgers.edu Mon Aug 21 14:29:49 2006 From: xiwang at dragon.rutgers.edu (Xiaoxu Wang) Date: Mon, 21 Aug 2006 15:29:49 -0400 Subject: configuration question Message-ID: <44EA09AD.70503@dragon.rutgers.edu> Sorry, the question I want to ask is: I got the following error when configuring Petsc, 'Configure' object has no attribute 'diff' File "./config/configure.py", line 166, in petsc_configure when it runs to (out,err,status) = Configure.executeShellCommand(getattr(self, 'diff')+' -w diff1 diff2'), line 56 in programs.py. Hi, When I configure Petsc under Windows and Cygwin, it always crashes when it is checking for diff. The path is correct. It could be the difference between '\' and '\\' in python causing the problem. How to solve this problem? Or can I skip checking for 'diff'? Thank you for suggestions.
Xiaoxu From balay at mcs.anl.gov Mon Aug 21 14:34:36 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 21 Aug 2006 14:34:36 -0500 (CDT) Subject: configuration question In-Reply-To: <44EA09AD.70503@dragon.rutgers.edu> References: <44EA09AD.70503@dragon.rutgers.edu> Message-ID: Can you verify if you have /usr/bin/diff.exe installed with your cygwin installation? Can you verify if you are using python from cygwin? If you encounter problems - send the corresponding configure.log to petsc-maint at mcs.anl.gov [note: configure.log is too big to be posted on a mailing-list - so use the above e-mail] Satish On Mon, 21 Aug 2006, Xiaoxu Wang wrote: > sorry, the question I want to ask is > > I got the following error when configuring Petsc, > 'Configure' object has no attribute 'diff' File "./config/configure.py", line > 166, in petsc_configure > when it runs to (out,err,status) = Configure.executeShellCommand(getattr(self, > 'diff')+' -w diff1 diff2'), line 56 in programs.py. > > > > Hi, > When I comfiguring Petsc under Windows and Cygwin, it always crashes when > it is checking for diff. The path is correct. It could be the difference > between '\' and '\\' in python causing the problem. How to solve this problem? > Or can I skip checking for 'diff'? Thank you for suggestions. > > Xiaoxu > > From xiwang at dragon.rutgers.edu Mon Aug 21 14:00:11 2006 From: xiwang at dragon.rutgers.edu (Xiaoxu Wang) Date: Mon, 21 Aug 2006 15:00:11 -0400 Subject: configuration question Message-ID: <44EA02BB.5000101@dragon.rutgers.edu> Hi, When I comfiguring Petsc under Windows and Cygwin, it always crashes when it is checking for diff. The path is correct. It could be the difference between '\' and '\\' in python causing the problem. How to solve this problem? Or can I skip checking for 'diff'? Thank you for suggestions. Xiaoxu From alabute at stanford.edu Mon Aug 21 16:42:35 2006 From: alabute at stanford.edu (Alex) Date: Mon, 21 Aug 2006 14:42:35 -0700 Subject: How do I add two matrices together? Message-ID: <1156196556.5327.12.camel@localhost.localdomain> I can't seem to find a function that will allow me to add or subtract matrices. I would like to do this with matrices that have already been assembled. thanks, -- Alexander Ten Eyck Laboratory for Virtual Experiments in Mechanics Stanford University From knepley at gmail.com Mon Aug 21 16:46:52 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2006 16:46:52 -0500 Subject: How do I add two matrices together? In-Reply-To: <1156196556.5327.12.camel@localhost.localdomain> References: <1156196556.5327.12.camel@localhost.localdomain> Message-ID: MatAXPY Matt On 8/21/06, Alex wrote: > > I can't seem to find a function that will allow me to add or subtract > matrices. I would like to do this with matrices that have already been > assembled. > > thanks, > -- > Alexander Ten Eyck > Laboratory for Virtual Experiments in Mechanics > Stanford University > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Aug 22 14:34:18 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 22 Aug 2006 14:34:18 -0500 (CDT) Subject: PETSc communicator In-Reply-To: <200608171425.30756.mafunk@nmsu.edu> References: <200608171425.30756.mafunk@nmsu.edu> Message-ID: Mat, This will not effect load balance or anything like that. 
When you pass a communicator like MPI_COMM_WORLD to PETSc we don't actually use that communicator (because you might be using it and there may be tag collisions etc). So instead we store our own communicator inside the MPI_COMM_WORLD as an attribute, this message is just telling us we are accessing the inner communicator. Barry On Thu, 17 Aug 2006, Matt Funk wrote: > Hi, > > i was wondering what the message: > 'PetscCommDuplicate Using internal PETSc communicator 92 170' > means exactly. I still have issues with PETSc when running 1 vs 2 procs w.r.t. > the loadbalance. > However, when run on 2 vs 4 the balance seems to be almost perfect. > Then the option of a screwed up network was suggested to me, but since the 4vs > 2 proc case is ok, it seems not necessarily to be the case. > > Maybe somebody can tell me what it means? > > thanks > mat > > From mafunk at nmsu.edu Tue Aug 22 15:46:04 2006 From: mafunk at nmsu.edu (Matt Funk) Date: Tue, 22 Aug 2006 14:46:04 -0600 Subject: PETSc communicator In-Reply-To: References: <200608171425.30756.mafunk@nmsu.edu> Message-ID: <200608221446.07726.mafunk@nmsu.edu> Hi Barry, thanks for the clarification. I am running my code on a different (much slower) machine right now, and from the initial results it seems so far that Matt Knepley's suspicion of having a bad network could be correct. But i need to do a couple more runs. thanks mat On Tuesday 22 August 2006 13:34, Barry Smith wrote: > Mat, > > This will not effect load balance or anything like that. > > When you pass a communicator like MPI_COMM_WORLD to PETSc > we don't actually use that communicator (because you might > be using it and there may be tag collisions etc). So instead > we store our own communicator inside the MPI_COMM_WORLD as an > attribute, this message is just telling us we are accessing > the inner communicator. > > Barry > > On Thu, 17 Aug 2006, Matt Funk wrote: > > Hi, > > > > i was wondering what the message: > > 'PetscCommDuplicate Using internal PETSc communicator 92 170' > > means exactly. I still have issues with PETSc when running 1 vs 2 procs > > w.r.t. the loadbalance. > > However, when run on 2 vs 4 the balance seems to be almost perfect. > > Then the option of a screwed up network was suggested to me, but since > > the 4vs 2 proc case is ok, it seems not necessarily to be the case. > > > > Maybe somebody can tell me what it means? > > > > thanks > > mat From mwojc at p.lodz.pl Wed Aug 23 14:32:52 2006 From: mwojc at p.lodz.pl (Marek Wojciechowski) Date: Wed, 23 Aug 2006 19:32:52 -0000 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: Good news, Lisandro Dalcin just released new version of his petsc4py which compiles nicely with petsc-2.3.1-p16. On Mon, 21 Aug 2006 18:10:30 -0000, Marek Wojciechowski wrote: > On Mon, 21 Aug 2006 14:39:50 -0000, Matthew Knepley > wrote: > > This not helps, there are still undeclared variables like: > CARRAY_FLAGS > PETSC_COOKIE > UPDATEIFCOPY > PETSC_FILE_RDONLY > PETSC_FILE_WRONLY > PETSC_FILE_CREATE > KSP_CONVERGED_QCG_NEG_CURVE > KSP_CONVERGED_QCG_CONSTRAINED > I'm affraid petsc4py wrappers are broken for newer versions of PETSc... > > >> You just need petsc.h now. The structs are defined in >> include/private/petscimpl.h > >>> >>> One more question then: >>> In case of petsc4py, I tried to compile it but with no success because >>> of >>> the lack >>> of include file petschead.h in the petsc distribution (2.3.1-p16). I >>> guess, it was removed for >>> some reason. 
Maybe you could tell me where are now the definitions from >>> this file. > > > > -- Marek Wojciechowski From lee433 at purdue.edu Wed Aug 23 12:18:36 2006 From: lee433 at purdue.edu (Changyeol Lee) Date: Wed, 23 Aug 2006 13:18:36 -0400 Subject: Problem in mpdboot for multi-processing Message-ID: <1156353516.44ec8dec0dd34@webmail.purdue.edu> Hi, everyone! I assembled a 4-node cluster consisting of 4 Intel processors. I used Fedora Core 4, PETSc 2.3.1 and MPICH2-1.0.3. There is no problem in installation of PETSc and MPICH2. I also made authorized keys of SSH for connection between nodes without asking password. I confirmed that SSH show no asking of password between nodes. Also, mpdboot for itself is possible like below $ mpdboot -n 1 -f mpd.hosts $ However, mpdboot for multi-processors is not possible by showing the error below. Hostname of node1 is node1.cluster.net and hostname of node2 is node2.cluster.net. $ mpdboot -n 2 -f mpd.hosts mpdboot_node1.cluster.net (handle_mpd_output 368): failed to connect to mpd on node2.cluster.net $ I believe that the things such as /etc/hosts do not have problems. Although it seems a problem of MPICH2, I would like to hear something if you have similar experience like me. Let me know if you have any idea. Thank you so much! Changyeol ------------------------------------------------------ Have a gneiss day! Changyeol Lee Graduate student (Geophysics) Earth and Atmospheric Sciences Purdue University cell: 765)418-8498 phone: 765)495-1294 From balay at mcs.anl.gov Wed Aug 23 12:40:11 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 23 Aug 2006 12:40:11 -0500 (CDT) Subject: Problem in mpdboot for multi-processing In-Reply-To: <1156353516.44ec8dec0dd34@webmail.purdue.edu> References: <1156353516.44ec8dec0dd34@webmail.purdue.edu> Message-ID: This query is best sent to mpich2-maint at mcs.anl.gov Satish On Wed, 23 Aug 2006, Changyeol Lee wrote: > Hi, everyone! > > I assembled a 4-node cluster consisting of 4 Intel processors. I used Fedora > Core 4, PETSc 2.3.1 and MPICH2-1.0.3. > > There is no problem in installation of PETSc and MPICH2. I also made authorized > keys of SSH for connection between nodes without asking password. I confirmed > that SSH show no asking of password between nodes. > > Also, mpdboot for itself is possible like below > > $ mpdboot -n 1 -f mpd.hosts > $ > > However, mpdboot for multi-processors is not possible by showing the error below. > Hostname of node1 is node1.cluster.net and hostname of node2 is node2.cluster.net. > > $ mpdboot -n 2 -f mpd.hosts > mpdboot_node1.cluster.net (handle_mpd_output 368): failed to connect to mpd on > node2.cluster.net > $ > > I believe that the things such as /etc/hosts do not have problems. > > Although it seems a problem of MPICH2, I would like to hear something if you > have similar experience like me. Let me know if you have any idea. > > Thank you so much! > > Changyeol > > ------------------------------------------------------ > Have a gneiss day! > > Changyeol Lee > Graduate student (Geophysics) > Earth and Atmospheric Sciences > Purdue University > cell: 765)418-8498 > phone: 765)495-1294 > > > > From knepley at gmail.com Wed Aug 23 16:35:21 2006 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Aug 2006 16:35:21 -0500 Subject: Memory preallocation in python In-Reply-To: References: Message-ID: Excellent. I am putting this up on our website. 
Thanks for your patience, Matt On 8/23/06, Marek Wojciechowski wrote: > > Good news, Lisandro Dalcin just released new version of his petsc4py which > compiles nicely with petsc-2.3.1-p16. > > On Mon, 21 Aug 2006 18:10:30 -0000, Marek Wojciechowski > wrote: > > > On Mon, 21 Aug 2006 14:39:50 -0000, Matthew Knepley > > wrote: > > > > This not helps, there are still undeclared variables like: > > CARRAY_FLAGS > > PETSC_COOKIE > > UPDATEIFCOPY > > PETSC_FILE_RDONLY > > PETSC_FILE_WRONLY > > PETSC_FILE_CREATE > > KSP_CONVERGED_QCG_NEG_CURVE > > KSP_CONVERGED_QCG_CONSTRAINED > > I'm affraid petsc4py wrappers are broken for newer versions of PETSc... > > > > > >> You just need petsc.h now. The structs are defined in > >> include/private/petscimpl.h > > > >>> > >>> One more question then: > >>> In case of petsc4py, I tried to compile it but with no success because > >>> of > >>> the lack > >>> of include file petschead.h in the petsc distribution (2.3.1-p16). I > >>> guess, it was removed for > >>> some reason. Maybe you could tell me where are now the definitions > from > >>> this file. > > > > > > > > > > > > -- > Marek Wojciechowski > > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From julvar at tamu.edu Thu Aug 24 10:34:47 2006 From: julvar at tamu.edu (Julian) Date: Thu, 24 Aug 2006 10:34:47 -0500 Subject: Intel Dual core machines Message-ID: <006301c6c792$d4cb1730$24b75ba5@aero.ad.tamu.edu> Hello, So far, I have been using PETSc on a single processor windows machine. Now, I am planning on using it on a Intel Dual core machine. Before I start running the installation scripts, I wanted to confirm if I can use both the processors on this new machine just like how you would use multiple processors on a supercomputer. If yes, is there anything special that I need to do when installing PETSc? I'm guessing I would have to install some MPI software... Which one do you recommend for windows machines (I saw more than one windows MPI software on the PETSc website) ? Thanks, Julian. From randy at geosystem.us Thu Aug 24 10:41:42 2006 From: randy at geosystem.us (Randall Mackie) Date: Thu, 24 Aug 2006 08:41:42 -0700 Subject: Intel Dual core machines In-Reply-To: <006301c6c792$d4cb1730$24b75ba5@aero.ad.tamu.edu> References: <006301c6c792$d4cb1730$24b75ba5@aero.ad.tamu.edu> Message-ID: <44EDC8B6.10004@geosystem.us> Hi Julian, No problem running on dual cpu machines. You just need to have MPI set up correctly. On our cluster, we use ROCKS, which does everything more or less automagically. http://www.rocksclusters.org/wordpress/ Randy Julian wrote: > Hello, > > So far, I have been using PETSc on a single processor windows machine. Now, > I am planning on using it on a Intel Dual core machine. Before I start > running the installation scripts, I wanted to confirm if I can use both the > processors on this new machine just like how you would use multiple > processors on a supercomputer. > If yes, is there anything special that I need to do when installing PETSc? > I'm guessing I would have to install some MPI software... Which one do you > recommend for windows machines (I saw more than one windows MPI software on > the PETSc website) ? > > Thanks, > Julian. > -- Randall Mackie GSY-USA, Inc. PMB# 643 2261 Market St., San Francisco, CA 94114-1600 Tel (415) 469-8649 Fax (415) 469-5044 California Registered Geophysicist License No. 
GP 1034 From balay at mcs.anl.gov Thu Aug 24 10:54:14 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 24 Aug 2006 10:54:14 -0500 (CDT) Subject: Intel Dual core machines In-Reply-To: <006301c6c792$d4cb1730$24b75ba5@aero.ad.tamu.edu> References: <006301c6c792$d4cb1730$24b75ba5@aero.ad.tamu.edu> Message-ID: If you plan to use Windows, I recommend mpich1, as this is what PETSc is usually tested with [as far as installation is concerned]. http://www-unix.mcs.anl.gov/mpi/mpich1/mpich-nt/ Configure will automatically look for it - and use it. The scalability depends upon the OS, MPI impl and memory bandwidth numbers for this hardware. Don't know enough about the OS & MPI part - but the memory bandwidth part is easy to check based on the hardware you have. [The new core duo chips appear to have high memory bandwidth numbers - so I think it should scale well] But you should be concerned about this only for performance measurements - but not during development. [You can install MPI on a single cpu machine and use PETSc on it - for development] Satish On Thu, 24 Aug 2006, Julian wrote: > Hello, > > So far, I have been using PETSc on a single processor windows machine. Now, > I am planning on using it on a Intel Dual core machine. Before I start > running the installation scripts, I wanted to confirm if I can use both the > processors on this new machine just like how you would use multiple > processors on a supercomputer. > If yes, is there anything special that I need to do when installing PETSc? > I'm guessing I would have to install some MPI software... Which one do you > recommend for windows machines (I saw more than one windows MPI software on > the PETSc website) ? > > Thanks, > Julian. > > From randy at geosystem.us Thu Aug 24 10:16:47 2006 From: randy at geosystem.us (Randall Mackie) Date: Thu, 24 Aug 2006 08:16:47 -0700 Subject: OT - cluster rental Message-ID: <44EDC2DF.2050004@geosystem.us> I have been looking for a commercial cluster rental, with only very little success. The only one I've found so far is Tsunamic Technologies: I've looked at several of the University super-computer centers, but they don't appear to be easy to rent cluster time from if you're not an academic. We have our own smallish (70 node) cluster, but we have some jobs where larger (256 node) clusters would be nice. If anyone knows of any commercial clusters that can be rented, I'd appreciate the information. Thanks, Randy Mackie -- Randall Mackie GSY-USA, Inc. PMB# 643 2261 Market St., San Francisco, CA 94114-1600 Tel (415) 469-8649 Fax (415) 469-5044 California Registered Geophysicist License No. GP 1034 From julvar at tamu.edu Thu Aug 24 11:10:47 2006 From: julvar at tamu.edu (Julian) Date: Thu, 24 Aug 2006 11:10:47 -0500 Subject: Direct Linear Solvers Message-ID: <006401c6c797$dc392700$24b75ba5@aero.ad.tamu.edu> Hello, I looked at the linear solvers summary page and I could find three direct solvers that do not use any external packages. But I could find an example for only one of them (LU). The link to the cholesky solver does not have any example file. I am new to PETSc and I don't fully understand how the different solvers are invoked. Do I just change PCSetType(pc,PCLU); to PCSetType(pc,PCCHOLESKY); ? And the "XXt and Xyt" solver does not have any link. How do I use that solver? Also, do the direct solvers do their own internal renumbering to reduce the matrix bandwidth? Or do we have to take care of that outside of PETSc? Thanks, Julian.
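For the Cholesky part of the question, the change really is just the preconditioner type. A minimal C sketch (assuming a symmetric matrix A and vectors b and x that are already assembled; the function name is made up and error checking is omitted), equivalent to running with -ksp_type preonly -pc_type cholesky:

#include "petscksp.h"

/* Direct solve with a Cholesky factorization instead of LU.
   A is assumed symmetric; b and x are assumed already created. */
PetscErrorCode SolveDirectCholesky(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);

  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCCHOLESKY);   /* the only change from the LU example: PCLU becomes PCCHOLESKY */
  KSPSetType(ksp, KSPPREONLY); /* apply the factorization once, no Krylov iterations */

  KSPSetFromOptions(ksp);      /* -ksp_type, -pc_type etc. can still override this at run time */
  KSPSolve(ksp, b, x);

  KSPDestroy(ksp);
  return 0;
}

As pointed out further down in the thread, the built-in LU and Cholesky factorizations are sequential; a parallel direct solve needs one of the external packages (MUMPS, Spooles, SuperLU_Dist).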
From julvar at tamu.edu Thu Aug 24 11:22:50 2006 From: julvar at tamu.edu (Julian) Date: Thu, 24 Aug 2006 11:22:50 -0500 Subject: Intel Dual core machines In-Reply-To: <44EDC8B6.10004@geosystem.us> Message-ID: <006501c6c799$8b673b80$24b75ba5@aero.ad.tamu.edu> Thanks for the reply. Do I really need to use ROCKS if I'm just gonna use a single dual core machine ? Is that considered a cluster ? Also, what MPI software does ROCKS install ? I saw some mention of openMPI 1.1 Julian. > -----Original Message----- > From: owner-petsc-users at mcs.anl.gov > [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Randall Mackie > Sent: Thursday, August 24, 2006 10:42 AM > To: petsc-users at mcs.anl.gov > Subject: Re: Intel Dual core machines > > Hi Julian, > > No problem running on dual cpu machines. You just need to > have MPI set up correctly. > > On our cluster, we use ROCKS, which does everything more or > less automagically. > > http://www.rocksclusters.org/wordpress/ > > Randy > > > Julian wrote: > > Hello, > > > > So far, I have been using PETSc on a single processor > windows machine. > > Now, I am planning on using it on a Intel Dual core > machine. Before I > > start running the installation scripts, I wanted to confirm > if I can > > use both the processors on this new machine just like how you would > > use multiple processors on a supercomputer. > > If yes, is there anything special that I need to do when > installing PETSc? > > I'm guessing I would have to install some MPI software... > Which one do > > you recommend for windows machines (I saw more than one windows MPI > > software on the PETSc website) ? > > > > Thanks, > > Julian. > > > > -- > Randall Mackie > GSY-USA, Inc. > PMB# 643 > 2261 Market St., > San Francisco, CA 94114-1600 > Tel (415) 469-8649 > Fax (415) 469-5044 > > California Registered Geophysicist > License No. GP 1034 > > From julvar at tamu.edu Thu Aug 24 11:28:24 2006 From: julvar at tamu.edu (Julian) Date: Thu, 24 Aug 2006 11:28:24 -0500 Subject: Intel Dual core machines In-Reply-To: Message-ID: <006601c6c79a$525d3910$24b75ba5@aero.ad.tamu.edu> thanks, I will try this out. I have been using petsc on a single cpu machine... And I want to see how much faster it is on a dual core. > -----Original Message----- > From: owner-petsc-users at mcs.anl.gov > [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Satish Balay > Sent: Thursday, August 24, 2006 10:54 AM > To: petsc-users at mcs.anl.gov > Subject: Re: Intel Dual core machines > > If you plan to use windows recommend mpich1 as this is what > PETSc is usually tested with [as far as installation is concerned]. > > http://www-unix.mcs.anl.gov/mpi/mpich1/mpich-nt/ > > Configure will automatically look for it - and use it. > > The scalability depends upon the OS, MPI impl and > MemoryBandwidh numbers for this hardware. Don't know enough > about the OS & MPI part - but the MemoryBandwidh part is easy > to check based on the hardware you have. [The new core duo > chips appear to have high memory bandwidth numbers - so I > think it should scale well] > > But you should be concerned about this only for performance measurents > - but not during development. [You can install MPI on a > single cpu machine and use PETSc on it - for development] > > Satish > > On Thu, 24 Aug 2006, Julian wrote: > > > Hello, > > > > So far, I have been using PETSc on a single processor > windows machine. > > Now, I am planning on using it on a Intel Dual core > machine. 
Before I > > start running the installation scripts, I wanted to confirm > if I can > > use both the processors on this new machine just like how you would > > use multiple processors on a supercomputer. > > If yes, is there anything special that I need to do when > installing PETSc? > > I'm guessing I would have to install some MPI software... > Which one do > > you recommend for windows machines (I saw more than one windows MPI > > software on the PETSc website) ? > > > > Thanks, > > Julian. > > > > > From randy at geosystem.us Thu Aug 24 11:52:21 2006 From: randy at geosystem.us (Randall Mackie) Date: Thu, 24 Aug 2006 09:52:21 -0700 Subject: Intel Dual core machines In-Reply-To: <006501c6c799$8b673b80$24b75ba5@aero.ad.tamu.edu> References: <006501c6c799$8b673b80$24b75ba5@aero.ad.tamu.edu> Message-ID: <44EDD945.6070705@geosystem.us> Julian, No, you do not need Rocks (and in fact it is a Linux-based system). You mentioned the need to install MPI (which you do need for PETSc), and I was just pointing out that there are clustering software solutions that make a lot of getting the clusters set up much simpler than doing it all yourself. Rocks installs various flavors of MPICH, or MPI, or LAM.... Randy Julian wrote: > Thanks for the reply. > > Do I really need to use ROCKS if I'm just gonna use a single dual core > machine ? Is that considered a cluster ? > Also, what MPI software does ROCKS install ? I saw some mention of openMPI > 1.1 > > Julian. > > >> -----Original Message----- >> From: owner-petsc-users at mcs.anl.gov >> [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Randall Mackie >> Sent: Thursday, August 24, 2006 10:42 AM >> To: petsc-users at mcs.anl.gov >> Subject: Re: Intel Dual core machines >> >> Hi Julian, >> >> No problem running on dual cpu machines. You just need to >> have MPI set up correctly. >> >> On our cluster, we use ROCKS, which does everything more or >> less automagically. >> >> http://www.rocksclusters.org/wordpress/ >> >> Randy >> >> >> Julian wrote: >>> Hello, >>> >>> So far, I have been using PETSc on a single processor >> windows machine. >>> Now, I am planning on using it on a Intel Dual core >> machine. Before I >>> start running the installation scripts, I wanted to confirm >> if I can >>> use both the processors on this new machine just like how you would >>> use multiple processors on a supercomputer. >>> If yes, is there anything special that I need to do when >> installing PETSc? >>> I'm guessing I would have to install some MPI software... >> Which one do >>> you recommend for windows machines (I saw more than one windows MPI >>> software on the PETSc website) ? >>> >>> Thanks, >>> Julian. >>> >> -- >> Randall Mackie >> GSY-USA, Inc. >> PMB# 643 >> 2261 Market St., >> San Francisco, CA 94114-1600 >> Tel (415) 469-8649 >> Fax (415) 469-5044 >> >> California Registered Geophysicist >> License No. GP 1034 >> >> > -- Randall Mackie GSY-USA, Inc. PMB# 643 2261 Market St., San Francisco, CA 94114-1600 Tel (415) 469-8649 Fax (415) 469-5044 California Registered Geophysicist License No. GP 1034 From julvar at tamu.edu Thu Aug 24 15:40:06 2006 From: julvar at tamu.edu (Julian) Date: Thu, 24 Aug 2006 15:40:06 -0500 Subject: Intel Dual core machines In-Reply-To: Message-ID: <007c01c6c7bd$7bb147c0$24b75ba5@aero.ad.tamu.edu> Hi, The link to mpich1 says this: MPICH.NT is no longer being developed. Please use MPICH2. MPICH.NT and MPICH2 can co-exist on the same machine so it is not necessary to uninstall MPICH to install MPICH2. 
But applications must be re-compiled with the MPICH2 header files and libraries. So, is it ok if I use mpich2? Thanks, Julian. > -----Original Message----- > From: owner-petsc-users at mcs.anl.gov > [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Satish Balay > Sent: Thursday, August 24, 2006 10:54 AM > To: petsc-users at mcs.anl.gov > Subject: Re: Intel Dual core machines > > If you plan to use windows recommend mpich1 as this is what > PETSc is usually tested with [as far as installation is concerned]. > > http://www-unix.mcs.anl.gov/mpi/mpich1/mpich-nt/ > > Configure will automatically look for it - and use it. > > The scalability depends upon the OS, MPI impl and > MemoryBandwidh numbers for this hardware. Don't know enough > about the OS & MPI part - but the MemoryBandwidh part is easy > to check based on the hardware you have. [The new core duo > chips appear to have high memory bandwidth numbers - so I > think it should scale well] > > But you should be concerned about this only for performance measurents > - but not during development. [You can install MPI on a > single cpu machine and use PETSc on it - for development] > > Satish > > On Thu, 24 Aug 2006, Julian wrote: > > > Hello, > > > > So far, I have been using PETSc on a single processor > windows machine. > > Now, I am planning on using it on a Intel Dual core > machine. Before I > > start running the installation scripts, I wanted to confirm > if I can > > use both the processors on this new machine just like how you would > > use multiple processors on a supercomputer. > > If yes, is there anything special that I need to do when > installing PETSc? > > I'm guessing I would have to install some MPI software... > Which one do > > you recommend for windows machines (I saw more than one windows MPI > > software on the PETSc website) ? > > > > Thanks, > > Julian. > > > > > From balay at mcs.anl.gov Thu Aug 24 15:57:33 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 24 Aug 2006 15:57:33 -0500 (CDT) Subject: Intel Dual core machines In-Reply-To: <007c01c6c7bd$7bb147c0$24b75ba5@aero.ad.tamu.edu> References: <007c01c6c7bd$7bb147c0$24b75ba5@aero.ad.tamu.edu> Message-ID: mpich2 is also not being developed anymore [on windows]. Either should work. [but I'm more familer with mpich1 - if you encounter issues] Satish On Thu, 24 Aug 2006, Julian wrote: > Hi, > > The link to mpich1 says this: > > MPICH.NT is no longer being developed. Please use MPICH2. MPICH.NT and > MPICH2 can co-exist on the same machine so it is not necessary to uninstall > MPICH to install MPICH2. But applications must be re-compiled with the > MPICH2 header files and libraries. > > So, is it ok if I use mpich2? > > Thanks, > Julian. > > > -----Original Message----- > > From: owner-petsc-users at mcs.anl.gov > > [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Satish Balay > > Sent: Thursday, August 24, 2006 10:54 AM > > To: petsc-users at mcs.anl.gov > > Subject: Re: Intel Dual core machines > > > > If you plan to use windows recommend mpich1 as this is what > > PETSc is usually tested with [as far as installation is concerned]. > > > > http://www-unix.mcs.anl.gov/mpi/mpich1/mpich-nt/ > > > > Configure will automatically look for it - and use it. > > > > The scalability depends upon the OS, MPI impl and > > MemoryBandwidh numbers for this hardware. Don't know enough > > about the OS & MPI part - but the MemoryBandwidh part is easy > > to check based on the hardware you have. 
[The new core duo > > chips appear to have high memory bandwidth numbers - so I > > think it should scale well] > > > > But you should be concerned about this only for performance measurents > > - but not during development. [You can install MPI on a > > single cpu machine and use PETSc on it - for development] > > > > Satish > > > > On Thu, 24 Aug 2006, Julian wrote: > > > > > Hello, > > > > > > So far, I have been using PETSc on a single processor > > windows machine. > > > Now, I am planning on using it on a Intel Dual core > > machine. Before I > > > start running the installation scripts, I wanted to confirm > > if I can > > > use both the processors on this new machine just like how you would > > > use multiple processors on a supercomputer. > > > If yes, is there anything special that I need to do when > > installing PETSc? > > > I'm guessing I would have to install some MPI software... > > Which one do > > > you recommend for windows machines (I saw more than one windows MPI > > > software on the PETSc website) ? > > > > > > Thanks, > > > Julian. > > > > > > > > > > From balay at mcs.anl.gov Thu Aug 24 16:59:37 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 24 Aug 2006 16:59:37 -0500 (CDT) Subject: Direct Linear Solvers In-Reply-To: <006401c6c797$dc392700$24b75ba5@aero.ad.tamu.edu> References: <006401c6c797$dc392700$24b75ba5@aero.ad.tamu.edu> Message-ID: If you need parallel direct solvers - then you can try MUMPS or Spooles or SuperLU_Dist external packages with PETSc. I don't think they work on windows - so you might have to use linux [with a f90 compiler for MUMPS] The default LU in PETSc is sequential. Satish On Thu, 24 Aug 2006, Julian wrote: > Hello, > > I looked at the linear solvers summary page and I could find three direct > solvers that do not use any external packages. But I could find an example > for only one of them (LU). The link to the cholesky solver does not have any > example file. I am new to PETSc and I don't fully understand how the > different solvers are invoked. Do I just change PCSetType(pc,PCLU); to > PCSetType(pc,PCCHOLESKY); ? > > And the "XXt and Xyt" solver does not have any link. How do I use that > solver? > > Also, do the direct solvers do its own internal renumbering to reduce the > matrix bandwidth? Or do we have to take care of that outside of PETSc? > > Thanks, > Julian. > > > > From hzhang at mcs.anl.gov Sun Aug 27 22:18:34 2006 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Sun, 27 Aug 2006 22:18:34 -0500 (CDT) Subject: Direct Linear Solvers In-Reply-To: <006401c6c797$dc392700$24b75ba5@aero.ad.tamu.edu> References: <006401c6c797$dc392700$24b75ba5@aero.ad.tamu.edu> Message-ID: Julian, > I looked at the linear solvers summary page and I could find three direct > solvers that do not use any external packages. But I could find an example > for only one of them (LU). The link to the cholesky solver does not have any > example file. I am new to PETSc and I don't fully understand how the > different solvers are invoked. Do I just change PCSetType(pc,PCLU); to > PCSetType(pc,PCCHOLESKY); ? Yes. You can run petsc KSP example with runtime option, e.g. ~petsc//src/ksp/ksp/examples/tutorials/ex5.c: ./ex5 -ksp_type preonly -pc_type cholesky or examples on low level call of MatCholeskyFactor() /src/mat/examples/tests/ex74.c (not recommended for user) > > And the "XXt and Xyt" solver does not have any link. How do I use that > solver? These are the interface to the Tufo-Fischer parallel direct solver. I don't knwo much about them. 
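For the PCSetType(pc,PCLU) -> PCSetType(pc,PCCHOLESKY) part of the question, the
change in code really is that one line. Below is a minimal self-contained sketch of a
direct Cholesky solve on a toy SPD system; it is only an illustration (the file name
chol.c, the diagonal test matrix, and the choice of SBAIJ storage are made up here,
and the argument lists follow the 2.3.x-era API used elsewhere in this thread, which
differs from later PETSc releases):

    /* chol.c - solve a small SPD system with a Cholesky direct solve:
       KSPPREONLY (no Krylov iterations) + PCCHOLESKY.  Sketch only. */
    static char help[] = "Small SPD solve with PCCHOLESKY.\n";
    #include "petscksp.h"

    int main(int argc,char **args)
    {
      Mat            A;
      Vec            b,x;
      KSP            ksp;
      PC             pc;
      PetscInt       i,n = 10;
      PetscScalar    v = 2.0;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&args,(char*)0,help);CHKERRQ(ierr);

      /* simple SPD (diagonal) test matrix in symmetric (SBAIJ) storage */
      ierr = MatCreateSeqSBAIJ(PETSC_COMM_SELF,1,n,n,1,PETSC_NULL,&A);CHKERRQ(ierr);
      ierr = VecCreateSeq(PETSC_COMM_SELF,n,&b);CHKERRQ(ierr);
      for (i=0; i<n; i++) {
        ierr = MatSetValues(A,1,&i,1,&i,&v,INSERT_VALUES);CHKERRQ(ierr);
        ierr = VecSetValues(b,1,&i,&v,INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
      ierr = VecDuplicate(b,&x);CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_SELF,&ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);   /* factor + solve only */
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCCHOLESKY);CHKERRQ(ierr);     /* the only change from the LU version */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* -ksp_type/-pc_type can still override */
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

      ierr = KSPDestroy(ksp);CHKERRQ(ierr);
      ierr = VecDestroy(x);CHKERRQ(ierr);
      ierr = VecDestroy(b);CHKERRQ(ierr);
      ierr = MatDestroy(A);CHKERRQ(ierr);
      ierr = PetscFinalize();CHKERRQ(ierr);
      return 0;
    }

Because KSPSetFromOptions() is called, the same executable can also be flipped between
solvers at run time, e.g. -ksp_type preonly -pc_type cholesky versus -pc_type lu, which
is the runtime-option route described above.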
Someone from petsc team may tell you more about "XXt and Xyt". > > Also, do the direct solvers do its own internal renumbering to reduce the > matrix bandwidth? Or do we have to take care of that outside of PETSc? > Do you mean matrix reordering? We do support a set of orderings. Run a petsc mat or ksp example with the opiton -help |grep -i ordering, then you'll see the orderings provided. Hong From henke at math.tu-clausthal.de Mon Aug 28 06:42:50 2006 From: henke at math.tu-clausthal.de (Christian Henke) Date: Mon, 28 Aug 2006 13:42:50 +0200 Subject: Using of MatGetSubMatrices Message-ID: <200608281342.50630.henke@math.tu-clausthal.de> Hi all, I am trying with petsc2.2.0 to get access to my sparse blockdiagonalmatrix matrix which contains for example 4 blocks m1 ... m4. If I use 2 Processors then m1 and m2 are owned by p1 and m3, m4 owned by p2. Now I want to read m1, m2 from p2 and m3, m4 from p1. First I have used MatGetValues, but then I took the message: Only local values currently supported. My next try was the function MatGetSubMatrices: IS isrow, iscol; Mat *M; ISCreateStride(PETSC_COMM_SELF,m,(int)i*m,1,&isrow); ISCreateStride(PETSC_COMM_SELF,m,(int)j*m,1,&iscol); const int ierr = MatGetSubMatrices(matrix,1,&isrow,&iscol,MAT_INITIAL_MATRIX,&M); ... ISDestroy(isrow); ISDestroy(iscol); where i,j the blockindices and m is the blocksize. It works well for a few functioncalls, but then it stops with the log_trace-message: [0] 0.055568 Event begin: MatGetSubMatrice. What is my error? Is MatGetSubMatrices the wrong function for my problem or are the above lines wrong? Regards Christian From mappol at gmail.com Mon Aug 28 10:44:23 2006 From: mappol at gmail.com (Patrick Lechner) Date: Mon, 28 Aug 2006 16:44:23 +0100 Subject: DPETSC_USE_FORTRAN_KERNELS-warning Message-ID: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> Dear all, I currently have the following problem and would be very grateful for any useful advice: I have written a Fortran code that uses PETSc for the solution of various linear systems with complex entries (both in the stiffness matrix and in the load vector). When I use the PETSc-Log to check the times for my runs, I get the following warning: ########################################################## # # # WARNING!!! # # # # The code for various complex numbers numerical # # kernels uses C++, which generally is not well # # optimized. For performance that is about 4-5 times # # faster, specify the flag -DPETSC_USE_FORTRAN_KERNELS # # in base_variables and recompile the PETSc libraries. # # # ########################################################## My problem now is, that I can't find "base_variables" in my latest PETSc-version (2.3.1-p15)... Do I just add the flag to my cpp-flags in bmake/$PETSC_ARCH/petscconf? Or should I do this modification somewhere else? Thanks a lot for any help with this! Best wishes, Patrick ================================= Patrick Lechner Numerical Analysist / Numerical Modeller Flat 1 159 Hardgate Aberdeen, AB11 6XQ Phone: 07815 927333 E-mail: patrick at lechner.com Homepage: http://www.patrick.lechner.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Mon Aug 28 10:55:46 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Aug 2006 10:55:46 -0500 (CDT) Subject: DPETSC_USE_FORTRAN_KERNELS-warning In-Reply-To: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> References: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> Message-ID: This message is outofdate. I'll fix it in petsc-dev. The way to enable this feature is to rerun configure with the additional option --with-fortran-kernels=generic You can use additonal option PETSC_ARCH with configure so that a new set of configuraton [with new set of libraries are created] - this way the old set is also useable. You can then check if the above option improves performance or not [and then stick with the higher perfoming version] For eg: if your current PETSC_ARCH is linux-gnu - you can do: ./bmake/linux-gnu/configure --with-fortran-kernels=generic PETSC_ARCH=linux-gnu-ftn-kernels make PETSC_ARCH=linux-gnu-ftn-kernels all make PETSC_ARCH=linux-gnu-ftn-kernels test Satish On Mon, 28 Aug 2006, Patrick Lechner wrote: > Dear all, > > I currently have the following problem and would be very grateful for any > useful advice: > > I have written a Fortran code that uses PETSc for the solution of various > linear systems with complex entries (both in the stiffness matrix and in the > load vector). When I use the PETSc-Log to check the times for my runs, I get > the following warning: > > ########################################################## > > # > # > # WARNING!!! > # > # > # > # The code for various complex numbers numerical > # > # kernels uses C++, which generally is not well > # > # optimized. For performance that is about 4-5 times > # > # faster, specify the flag -DPETSC_USE_FORTRAN_KERNELS # > # in base_variables and recompile the PETSc libraries. > # > > # > # > ########################################################## > > > My problem now is, that I can't find "base_variables" in my latest > PETSc-version (2.3.1-p15)... > Do I just add the flag to my cpp-flags in bmake/$PETSC_ARCH/petscconf? Or > should I do this modification somewhere else? > > Thanks a lot for any help with this! > Best wishes, > Patrick > > > > > ================================= > > Patrick Lechner > Numerical Analysist / Numerical Modeller > Flat 1 > 159 Hardgate > Aberdeen, AB11 6XQ > > Phone: 07815 927333 > E-mail: patrick at lechner.com > Homepage: http://www.patrick.lechner.com > From knepley at gmail.com Mon Aug 28 10:56:51 2006 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 28 Aug 2006 10:56:51 -0500 Subject: DPETSC_USE_FORTRAN_KERNELS-warning In-Reply-To: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> References: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> Message-ID: We apologize for the out-of-date documentation. There is now a configure option --with-fortran-kernels=generic which you can see with --help. Reconfiguring with this option will turn on the Fortran kernels. Matt On 8/28/06, Patrick Lechner wrote: > > Dear all, > > I currently have the following problem and would be very grateful for any > useful advice: > > I have written a Fortran code that uses PETSc for the solution of various > linear systems with complex entries (both in the stiffness matrix and in the > load vector). When I use the PETSc-Log to check the times for my runs, I get > the following warning: > > ########################################################## > > # > # > # WARNING!!! 
> # > # > # > # The code for various complex numbers numerical > # > # kernels uses C++, which generally is not well > # > # optimized. For performance that is about 4-5 times > # > # faster, specify the flag -DPETSC_USE_FORTRAN_KERNELS # > # in base_variables and recompile the PETSc libraries. > # > > # > # > ########################################################## > > > My problem now is, that I can't find "base_variables" in my latest > PETSc-version ( 2.3.1-p15)... > Do I just add the flag to my cpp-flags in bmake/$PETSC_ARCH/petscconf? Or > should I do this modification somewhere else? > > Thanks a lot for any help with this! > Best wishes, > Patrick > > > > > ================================= > > Patrick Lechner > Numerical Analysist / Numerical Modeller > Flat 1 > 159 Hardgate > Aberdeen, AB11 6XQ > > Phone: 07815 927333 > E-mail: patrick at lechner.com > Homepage: http://www.patrick.lechner.com > -- "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec Guiness -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 28 10:59:31 2006 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Aug 2006 10:59:31 -0500 (CDT) Subject: DPETSC_USE_FORTRAN_KERNELS-warning In-Reply-To: References: <215294220608280844r41fb9d3ai3006b322a057b9fd@mail.gmail.com> Message-ID: On Mon, 28 Aug 2006, Satish Balay wrote: > This message is outofdate. I'll fix it in petsc-dev. looks like this is already cleanedup in petsc-dev. Satish From bsmith at mcs.anl.gov Mon Aug 28 11:41:39 2006 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 28 Aug 2006 11:41:39 -0500 (CDT) Subject: Using of MatGetSubMatrices In-Reply-To: <200608281342.50630.henke@math.tu-clausthal.de> References: <200608281342.50630.henke@math.tu-clausthal.de> Message-ID: All processes that share the matrix must call MatGetSubMatrices() the same number of times. If a process doesn't need a matrix it should pass in zero length IS's. If you are always calling it with all processes then you can run with -start_in_debugger and when it is hanging hit control C in the debugger and type where to see where/why it is hanging. Barry On Mon, 28 Aug 2006, Christian Henke wrote: > Hi all, > > I am trying with petsc2.2.0 to get access to my sparse blockdiagonalmatrix > matrix which contains for example 4 blocks m1 ... m4. If I use 2 Processors > then m1 and m2 are owned by p1 and m3, m4 owned by p2. Now I want to read m1, > m2 from p2 and m3, m4 from p1. First I have used MatGetValues, but then I > took the message: Only local values currently supported. > My next try was the function MatGetSubMatrices: > > IS isrow, iscol; > Mat *M; > > ISCreateStride(PETSC_COMM_SELF,m,(int)i*m,1,&isrow); > ISCreateStride(PETSC_COMM_SELF,m,(int)j*m,1,&iscol); > > const int ierr > = MatGetSubMatrices(matrix,1,&isrow,&iscol,MAT_INITIAL_MATRIX,&M); > > ... > > ISDestroy(isrow); > ISDestroy(iscol); > > where i,j the blockindices and m is the blocksize. It works well for a few > functioncalls, but then it stops with the log_trace-message: > > [0] 0.055568 Event begin: MatGetSubMatrice. > > What is my error? Is MatGetSubMatrices the wrong function for my problem or > are the above lines wrong? 
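In code, Barry's prescription - every process makes the call, with zero-length index
sets on the processes that want nothing - can be packaged like this. It is only a
sketch: the GetBlock name and the want flag are invented for illustration, the other
names follow Christian's fragment, and the calling sequences are the 2.x-era ones used
in that fragment.

    #include "petscmat.h"

    /* Fetch the m x m block with block indices (i,j) from a distributed matrix.
       Collective: every process sharing `matrix` must call this the same number
       of times; a process that does not want the block passes want = 0 and gets
       back an empty local matrix. */
    int GetBlock(Mat matrix,int m,int i,int j,int want,Mat **M)
    {
      IS  isrow,iscol;
      int ierr;
      int nloc = want ? m : 0;    /* zero-length IS on non-participating processes */

      ierr = ISCreateStride(PETSC_COMM_SELF,nloc,i*m,1,&isrow);CHKERRQ(ierr);
      ierr = ISCreateStride(PETSC_COMM_SELF,nloc,j*m,1,&iscol);CHKERRQ(ierr);

      ierr = MatGetSubMatrices(matrix,1,&isrow,&iscol,MAT_INITIAL_MATRIX,M);CHKERRQ(ierr);

      ierr = ISDestroy(isrow);CHKERRQ(ierr);
      ierr = ISDestroy(iscol);CHKERRQ(ierr);
      return 0;  /* caller reads (*M)[0], then frees with MatDestroyMatrices(1,M) */
    }

So if in some step p2 wants block m1 but p1 wants nothing, both processes still call
GetBlock() once - p2 with want = 1 and p1 with want = 0; one process skipping the call
is the usual cause of the hang seen in the -log_trace output above.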
> > Regards Christian
> >

From Stephen.R.Ball at awe.co.uk Tue Aug 29 06:36:50 2006
From: Stephen.R.Ball at awe.co.uk (Stephen R Ball)
Date: Tue, 29 Aug 2006 12:36:50 +0100
Subject: Spooles Cholesky problem
Message-ID: <68TCe6009159@awe.co.uk>

Hi

When using the Spooles Cholesky solver I am getting the error:

0:[0]PETSC ERROR: MatSetValues_SeqSBAIJ() line 792 in src/mat/impls/sbaij/seq/sbaij.c
0:[0]PETSC ERROR: !
0:[0]PETSC ERROR: Lower triangular value cannot be set for sbaij format. Ignoring these values, run with -mat_ignore_lower_triangular or call MatSetOption(mat,MAT_IGNORE_LOWER_TRIANGULAR)!
0:[0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c

When I attempt to call MatSetOption(mat,MAT_IGNORE_LOWER_TRIANGULAR,ierr)
via the Fortran interface, I get the compilation error that entity
mat_ignore_lower_triangular has undefined type. Can you tell me if this
option is supported via the Fortran interface? Note that using runtime
option -mat_ignore_lower_triangular works, but I would prefer to set this
option in my code using the MatSetOption() routine.

Regards

Stephen

-- 
_______________________________________________________________________________
The information in this email and in any attachment(s) is commercial in
confidence. If you are not the named addressee(s) or if you receive this
email in error then any distribution, copying or use of this communication
or the information in it is strictly prohibited. Please notify us
immediately by email at admin.internet(at)awe.co.uk, and then delete this
message from your computer. While attachments are virus checked, AWE plc
does not accept any liability in respect of any virus which is not detected.
AWE Plc
Registered in England and Wales
Registration No 02763902
AWE, Aldermaston, Reading, RG7 4PR
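On setting the option from code rather than the command line: in C the call is the
two-argument MatSetOption() named in the error text, and an alternative is to insert
the runtime flag into the options database programmatically. The sketch below uses the
2.3.x-era C calling sequences (the Fortran bindings add a trailing ierr argument);
whether MAT_IGNORE_LOWER_TRIANGULAR is actually defined in the Fortran include files at
this patch level is exactly the open question above, so treat this as an assumption
rather than a confirmed answer.

    #include "petscmat.h"

    /* Two alternatives shown together for brevity - in practice use one. */
    PetscErrorCode IgnoreLowerTriangular(Mat A)
    {
      PetscErrorCode ierr;

      /* (a) set the option on the matrix itself, as the error message suggests */
      ierr = MatSetOption(A,MAT_IGNORE_LOWER_TRIANGULAR);CHKERRQ(ierr);

      /* (b) or push the command-line flag into the options database; do this early
             in the program, before the SBAIJ matrix processes its options, and it is
             equivalent to running with -mat_ignore_lower_triangular */
      ierr = PetscOptionsSetValue("-mat_ignore_lower_triangular",PETSC_NULL);CHKERRQ(ierr);
      return 0;
    }

Route (b) also has a Fortran binding (PetscOptionsSetValue with character arguments
plus ierr), which may be the simpler workaround until the Fortran definition of the
constant is confirmed.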