From u.tabak at tudelft.nl Sun Nov 1 05:01:12 2009 From: u.tabak at tudelft.nl (Umut Tabak) Date: Sun, 1 Nov 2009 12:01:12 +0100 Subject: set a column of a matrix Message-ID: <20091101110112.GA11973@dutw689> Dear all, I would like to set a column of a matrix, I read through the manual pages a bit... Since the matrices are row oriented in PETSc and there is a function MatSetValuesRow, I guess transposing my original matrix and then using this function is the best option I could see for the moment. Could you comment on this? BR, Umut -- Quote: I love the name of honor, more than I fear death. Author: Julius Caesar 101-44 BC, Roman Emperor From knepley at gmail.com Sun Nov 1 07:19:36 2009 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 1 Nov 2009 08:19:36 -0500 Subject: set a column of a matrix In-Reply-To: <20091101110112.GA11973@dutw689> References: <20091101110112.GA11973@dutw689> Message-ID: This is not an efficient parallel operation. You should probably rework your algorithm so that it is not necessary. Matt On Sun, Nov 1, 2009 at 6:01 AM, Umut Tabak wrote: > Dear all, > > I would like to set a column of a matrix, I read through the manual > pages a bit... Since the matrices are row oriented in PETSc and > there is a function MatSetValuesRow, I guess transposing my > original matrix and then using this function is the best option I > could see for the moment. Could you comment on this? > > BR, > Umut > -- > Quote: I love the name of honor, more than I fear death. > Author: Julius Caesar 101-44 BC, Roman Emperor > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 1 07:44:58 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2009 07:44:58 -0600 Subject: set a column of a matrix In-Reply-To: <20091101110112.GA11973@dutw689> References: <20091101110112.GA11973@dutw689> Message-ID: <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> MatSetValues() so long as you have good matrix preallocation this will work fine. Doing a transpose is very expensive. Barry On Nov 1, 2009, at 5:01 AM, Umut Tabak wrote: > Dear all, > > I would like to set a column of a matrix, I read through the > manual > pages a bit... Since the matrices are row oriented in PETSc and > there is a function MatSetValuesRow, I guess transposing my > original matrix and then using this function is the best option I > could see for the moment. Could you comment on this? > > BR, > Umut > -- > Quote: I love the name of honor, more than I fear death. > Author: Julius Caesar 101-44 BC, Roman Emperor From u.tabak at tudelft.nl Sun Nov 1 10:57:23 2009 From: u.tabak at tudelft.nl (Umut Tabak) Date: Sun, 1 Nov 2009 17:57:23 +0100 Subject: set a column of a matrix In-Reply-To: <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> Message-ID: <20091101165723.GA24933@dutw689> On Sun, Nov 01, 2009 at 07:44:58AM -0600, Barry Smith wrote: > > MatSetValues() so long as you have good matrix preallocation this > will work fine. > > Doing a transpose is very expensive. > Dear Barry and Matt, Thanks for the replies, I am not doing anything in parallel. I should use MatSetValues with appropriate column and row indices. 
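A minimal sketch of that approach (Fortran; A, jcol, and colvals are illustrative names, error checking omitted, and A is assumed to be preallocated so that every locally owned row has room for an entry in column jcol):

      PetscInt i, rstart, rend
      call MatGetOwnershipRange(A, rstart, rend, ierr)
      do i = rstart, rend-1
         ! one entry per locally owned row, all in global column jcol
         call MatSetValues(A, 1, i, 1, jcol, colvals(i-rstart+1), INSERT_VALUES, ierr)
      enddo
      call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
      call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)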
Actually, what I would like to do is set up a set of vectors (a rectangular block) and assign it to a block in a matrix. Is there a vector-set object that I could put these vectors into directly, and then use to set some part of a matrix? Thanks for the advice in advance. BR, Umut -- Quote: Coming together is a beginning, staying together is progress, and working together is success. Author: Henry Ford 1863-1947, American Industrialist, Founder of Ford Motor Company From balay at mcs.anl.gov Sun Nov 1 11:14:03 2009 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 1 Nov 2009 11:14:03 -0600 (CST) Subject: set a column of a matrix In-Reply-To: <20091101165723.GA24933@dutw689> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> Message-ID: On Sun, 1 Nov 2009, Umut Tabak wrote: > On Sun, Nov 01, 2009 at 07:44:58AM -0600, Barry Smith wrote: > > > > MatSetValues() so long as you have good matrix preallocation this > > will work fine. > > > > Doing a transpose is very expensive. > > > Dear Barry and Matt, > > Thanks for the replies, I am not doing anything in parallel. I > should use MatSetValues with appropriate column and row indices. > > Actually, what I would like to do is set up a set of vectors (a > rectangular block) and assign it to a block in a matrix. Is > there a vector-set object that I could put these vectors into > directly, and then use to set some part of a matrix? You can set a block of values at a time with MatSetValues(). However you should first get preallocation correct - and then time the MatSetValues() code - before attempting additional optimization. Once the preallocation is perfect - the primary savings with the setting block of values is the reduction in the number of calls of MatSetValues(). The other optimization you can do with setting block of values - is to have the col indices [of the block of values set] be sorted. This saves a bit with searches [during insertion]. Satish From bsmith at mcs.anl.gov Sun Nov 1 11:16:26 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2009 11:16:26 -0600 Subject: set a column of a matrix In-Reply-To: <20091101165723.GA24933@dutw689> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> Message-ID: Is your matrix dense? If it is sparse then it doesn't make sense to take values from a vector (which is always dense) to a sparse matrix. Barry On Nov 1, 2009, at 10:57 AM, Umut Tabak wrote: > On Sun, Nov 01, 2009 at 07:44:58AM -0600, Barry Smith wrote: >> >> MatSetValues() so long as you have good matrix preallocation this >> will work fine. >> >> Doing a transpose is very expensive. >> > Dear Barry and Matt, > > Thanks for the replies, I am not doing anything in parallel. I > should use MatSetValues with appropriate column and row indices. > > Actually, what I would like to do is set up a set of vectors (a > rectangular block) and assign it to a block in a matrix. Is > there a vector-set object that I could put these vectors into > directly, and then use to set some part of a matrix? > > Thanks for the advice in advance. > > BR, > Umut > -- > Quote: Coming together is a beginning, staying together is progress, and working together is success. 
> Author: Henry Ford 1863-1947, American Industrialist, Founder of > Ford Motor Company From u.tabak at tudelft.nl Sun Nov 1 11:27:04 2009 From: u.tabak at tudelft.nl (Umut Tabak) Date: Sun, 01 Nov 2009 18:27:04 +0100 Subject: set a column of a matrix In-Reply-To: References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> Message-ID: <4AEDC4E8.5030901@tudelft.nl> Barry Smith wrote: > > Is your matrix dense? If it is sparse then it doesn't make sense > to take values from a vector (which is always dense) to a sparse matrix. > > Right, the matrix is dense. Filled with eigenvectors, which are also dense... Thx, Umut From bsmith at mcs.anl.gov Sun Nov 1 11:31:20 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 1 Nov 2009 11:31:20 -0600 Subject: set a column of a matrix In-Reply-To: <4AEDC4E8.5030901@tudelft.nl> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> Message-ID: <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> Then I would just use MatGetArray() and stick the values directly into the matrix. Barry On Nov 1, 2009, at 11:27 AM, Umut Tabak wrote: > Barry Smith wrote: >> >> Is your matrix dense? If it is sparse then it doesn't make sense >> to take values from a vector (which is always dense) to a sparse >> matrix. >> >> > Right, the matrix is dense. Filled with eigenvectors, which are also > dense... > Thx, > Umut From jarunan at ascomp.ch Tue Nov 3 04:26:11 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Tue, 03 Nov 2009 11:26:11 +0100 Subject: -malloc_log In-Reply-To: <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> Message-ID: <20091103112611.z3l1nypca6o848k4@webmail.ascomp.ch> Hello, When I use option -malloc_log, it prints the information at the end of running as below. The question is: What are the 2 numbers in front of the function? e.g. 2 3216 ClassPerfLogCreate() What is 2 and what is 3216? 
Thank you Jarunan 0: [0] Maximum memory PetscMalloc()ed 2713976 maximum size of entire process 30131728 0: [0] Memory usage sorted by function 0: [0] 2 3216 ClassPerfLogCreate() 0: [0] 2 1616 ClassRegLogCreate() 0: [0] 2 6416 EventPerfLogCreate() 0: [0] 1 12800 EventPerfLogEnsureSize() 0: [0] 2 1616 EventRegLogCreate() 0: [0] 1 3200 EventRegLogRegister() 0: [0] 40 5760 ISCreateBlock() 0: [0] 160 701184 ISCreateGeneral() 0: [0] 96 12096 ISCreateStride() 0: [0] 24 342400 ISGetIndices_Stride() 0: [0] 8 171200 ISInvertPermutation_General() 0: [0] 48 13312 KSPCreate() 0: [0] 8 1280 KSPCreate_GMRES() 0: [0] 16 256 KSPDefaultConvergedCreate() 0: [0] 8 2048 KSPGMRESClassicalGramSchmidtOrthogonalization() From jed at 59A2.org Tue Nov 3 04:55:58 2009 From: jed at 59A2.org (Jed Brown) Date: Tue, 03 Nov 2009 11:55:58 +0100 Subject: -malloc_log In-Reply-To: <20091103112611.z3l1nypca6o848k4@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091103112611.z3l1nypca6o848k4@webmail.ascomp.ch> Message-ID: <4AF00C3E.5040902@59A2.org> jarunan at ascomp.ch wrote: > > Hello, > > When I use option -malloc_log, it prints the information at the end of > running as below. The question is: What are the 2 numbers in front of > the function? e.g. > 2 3216 ClassPerfLogCreate() > > What is 2 and what is 3216? 2 : the total number of allocations from this function 3216 : total number of bytes allocated from this function Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From jarunan at ascomp.ch Thu Nov 5 03:32:17 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Thu, 05 Nov 2009 10:32:17 +0100 Subject: Reuse matrix and vector In-Reply-To: <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> Message-ID: <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> Hello, I would like to reuse matrix and vector to save computing time. As the result in last iterations are not similar to the one from my old solver, so I am not sure if I program with PETSc the right way, especially, resetting values to matrix with MatMPIAIJSetPreallocationCSR(). Please take a look, here is how I program it: At the beginning of the program I create the vector and matrix. call VecCreateMPI(PETSC_COMM_WORLD,istorf_no_ovcell,PETSC_DETERMINE,rhs,ierr) call VecDuplicate(rhs,sol,ierr) call MatCreate(PETSC_COMM_WORLD,Ap,ierr) call MatSetSizes(Ap,istorf_no_ovcell,istorf_no_ovcell,PETSC_DETERMINE,PETSC_DETERMINE,ierr) call MatSetType(Ap,MATMPIAIJ,ierr) Then, in each loop I reset values in the vector and the matrix. 
do niter = 1,maxiter call VecSetValues(rhs,w,gindex_issu(1:w),f_issu(1:w),INSERT_VALUES,ierr) call VecAssemblyBegin(rhs,ierr) call VecAssemblyEnd(rhs,ierr) call MatMPIAIJSetPreallocationCSR(Ap,rowind,columnind,A,ierr) call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) call solve_system call update_right_hand_side enddo call MatDestroy(Ap,ierr) call VecDestroy(sol,ierr) call VecDestroy(rhs,ierr) Regards, Jarunan -- Jarunan Panyasantisuk Development Engineer ASCOMP GmbH, Technoparkstr. 1 CH-8005 Zurich, Switzerland Phone : +41 44 445 4072 Fax : +41 44 445 4075 E-mail: jarunan at ascomp.ch www.ascomp.ch From thomas.witkowski at tu-dresden.de Thu Nov 5 03:43:56 2009 From: thomas.witkowski at tu-dresden.de (Thomas Witkowski) Date: Thu, 05 Nov 2009 10:43:56 +0100 Subject: Solving a singular matrix with BoomerAMG Message-ID: <4AF29E5C.6030108@tu-dresden.de> I want to solve a system with a singular matrix (just the laplace discretized with the fem using pure Neumann boundary conditions) with BoomerAMG. Is there any way to do it directly in BoomerAMG/PETSc, or must I change the matrix to make it nonsingular? Thomas From jed at 59A2.org Thu Nov 5 03:56:27 2009 From: jed at 59A2.org (Jed Brown) Date: Thu, 05 Nov 2009 10:56:27 +0100 Subject: Reuse matrix and vector In-Reply-To: <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> Message-ID: <4AF2A14B.8070409@59A2.org> jarunan at ascomp.ch wrote: > > Hello, > > I would like to reuse matrix and vector to save computing time. As the > result in last iterations are not similar to the one from my old solver, > so I am not sure if I program with PETSc the right way, especially, > resetting values to matrix with MatMPIAIJSetPreallocationCSR(). I suspect you are using a stale preconditioner, but MatMPIAIJSetPreallocationCSR should not be called every iteration. > Please take a look, here is how I program it: > > At the beginning of the program I create the vector and matrix. > > call > VecCreateMPI(PETSC_COMM_WORLD,istorf_no_ovcell,PETSC_DETERMINE,rhs,ierr) > call VecDuplicate(rhs,sol,ierr) > > > call MatCreate(PETSC_COMM_WORLD,Ap,ierr) > call > MatSetSizes(Ap,istorf_no_ovcell,istorf_no_ovcell,PETSC_DETERMINE,PETSC_DETERMINE,ierr) > > call MatSetType(Ap,MATMPIAIJ,ierr) > > Then, in each loop I reset values in the vector and the matrix. > > do niter = 1,maxiter > > call > VecSetValues(rhs,w,gindex_issu(1:w),f_issu(1:w),INSERT_VALUES,ierr) > > call VecAssemblyBegin(rhs,ierr) > call VecAssemblyEnd(rhs,ierr) > > call MatMPIAIJSetPreallocationCSR(Ap,rowind,columnind,A,ierr) It's better to call this once at the beginning and update with MatSetValues (insert one row at a time, it's very little code). MatMPIAIJSetPreallocationCSR reallocates memory because it doesn't check to see if the nonzero pattern has changed (because that's not what it's for). > call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) > > call solve_system Where are you calling KSPSetOperators? You have to call this every time the matrix changes. Jed
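A minimal sketch of the loop restructured along the lines Jed describes (Fortran; nrows_local, grow, ncols, cols, and vals are illustrative names for the locally owned rows and their stored entries):

      ! preallocate once, before the iteration loop
      call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, nrows_local, nrows_local, &
           PETSC_DETERMINE, PETSC_DETERMINE, rowind, columnind, A, Ap, ierr)
      do niter = 1, maxiter
         do i = 1, nrows_local
            ! insert one full row at a time; grow(i) is the global row number
            call MatSetValues(Ap, 1, grow(i), ncols(i), cols(1,i), vals(1,i), &
                 INSERT_VALUES, ierr)
         enddo
         call MatAssemblyBegin(Ap, MAT_FINAL_ASSEMBLY, ierr)
         call MatAssemblyEnd(Ap, MAT_FINAL_ASSEMBLY, ierr)
         ! the nonzero pattern is unchanged, only the values are new
         call KSPSetOperators(ksp, Ap, Ap, SAME_NONZERO_PATTERN, ierr)
         call KSPSolve(ksp, rhs, sol, ierr)
      enddo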
From jed at 59A2.org Thu Nov 5 04:05:26 2009 From: jed at 59A2.org (Jed Brown) Date: Thu, 05 Nov 2009 11:05:26 +0100 Subject: Solving a singular matrix with BoomerAMG In-Reply-To: <4AF29E5C.6030108@tu-dresden.de> References: <4AF29E5C.6030108@tu-dresden.de> Message-ID: <4AF2A366.30304@59A2.org> Thomas Witkowski wrote: > I want to solve a system with a singular matrix (just the laplace > discretized with the fem using pure Neumann boundary conditions) with > BoomerAMG. Is there any way to do it directly in BoomerAMG/PETSc, or > must I change the matrix to make it nonsingular? The user's manual has a section on solving singular systems, you just need to tell KSP about the null space (either with KSPSetNullSpace or -ksp_constant_null_space). Small known null spaces are rarely an issue with iterative methods. With multigrid, there is some risk that the coarse operator is also singular; this can cause trouble since it is usually solved with a direct solver, but should not happen in your case. Jed From jed at 59A2.org Thu Nov 5 04:58:53 2009 From: jed at 59A2.org (Jed Brown) Date: Thu, 05 Nov 2009 11:58:53 +0100 Subject: Reuse matrix and vector In-Reply-To: <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> Message-ID: <4AF2AFED.2090706@59A2.org> jarunan at ascomp.ch wrote: > >> I suspect you are using a stale preconditioner, but >> MatMPIAIJSetPreallocationCSR should not be called every iteration. > > What do you mean 'stale preconditioner'? I use Additive Schwarz. That the preconditioner was not being updated when you change the matrix. If you reset everything inside the loop, then that isn't the problem. >> It's better to call this once at the beginning and update with >> MatSetValues (insert one row at a time, it's very little code). >> MatMPIAIJSetPreallocationCSR reallocates memory because it doesn't check >> to see if the nonzero pattern has changed (because that's not what it's >> for). >> > > Yes, thank you for the advice. I will modify this. But MatSetValues() is > not efficient with a big problem. It takes much time. No, either you are inserting values that have not been preallocated (check with -info | grep mallocs) or you are inserting single values. You should insert a full row every time you call MatSetValues. >> Where are you calling KSPSetOperators? You have to call this every time >> the matrix changes. > > I did shortcut of the code, actually...just to put it here. After > creating and setting Matrix and vector. In each loop I create the KSP: > > call KSPCreate(PETSC_COMM_WORLD,ksp,ierr) > call KSPSetOperators(ksp,Ap,Ap,DIFFERENT_NONZERO_PATTERN,ierr) > call KSPGetPC(ksp,pc,ierr) > call PCSetType(pc,pct,ierr) > > if (pct == PCASM) then > call PCASMSetTotalSubdomains(pc,glob_nblocks_psc,PETSC_NULL_OBJECT,ierr) > call PCASMSetLocalSubdomains(pc,nblocks_psc,PETSC_NULL_OBJECT,ierr) Only call one of these. 
> call PetscOptionsSetValue('-pc_asm_overlap','1',ierr) > call PetscOptionsSetValue('-sub_pc_type','lu',ierr) > call PetscOptionsSetValue('-sub_pc_factor_zeropivot','0.0',ierr) > endif > > call KSPSetTolerances(ksp,resin,1.e-20, & > PETSC_DEFAULT_DOUBLE_PRECISION,nswp_psc,ierr) > > call KSPSetType(ksp,kspt,ierr) > call KSPSetFromOptions(ksp,ierr) Of the above, only KSPSetOperators() should be called inside the loop, everything else is setup that should happen before your loop. > call KSPSolve(ksp,rhs,sol,ierr) > > call KSPDestroy(ksp,ierr) How was your code, and the convergence different before? Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From aja2111 at columbia.edu Thu Nov 5 03:10:51 2009 From: aja2111 at columbia.edu (Aron Ahmadia) Date: Thu, 5 Nov 2009 12:10:51 +0300 Subject: petsc function wrapper In-Reply-To: <1ca0b8b40911041640o5b6d61d5t429734e25c241ed4@mail.gmail.com> References: <1ca0b8b40911041640o5b6d61d5t429734e25c241ed4@mail.gmail.com> Message-ID: <37604ab40911050110k21373c8ci65ecb501b1787b0a@mail.gmail.com> Hi Braxton, I don't think there's an explicit manual page in PETSc for doing it. You would need to do: VecGetArray VecGetOwnershipRange (iterate over range on data from array) VecRestoreArray I cc the PETSc user's list in case anyone else has a brighter idea. Cheers, A On Thu, Nov 5, 2009 at 3:40 AM, Braxton Osting wrote: > aron, > > last year you showed abby and I a slick way to apply a function to a > petsc vec pointwise. i can't seem to find that manual page now. do you > remember it? > > thanks, > b > From knepley at gmail.com Thu Nov 5 10:43:57 2009 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 5 Nov 2009 10:43:57 -0600 Subject: petsc function wrapper In-Reply-To: <37604ab40911050110k21373c8ci65ecb501b1787b0a@mail.gmail.com> References: <1ca0b8b40911041640o5b6d61d5t429734e25c241ed4@mail.gmail.com> <37604ab40911050110k21373c8ci65ecb501b1787b0a@mail.gmail.com> Message-ID: You can look at VecSqrt() to see us do it. Matt On Thu, Nov 5, 2009 at 3:10 AM, Aron Ahmadia wrote: > Hi Braxton, > > I don't think there's an explicit manual page in PETSc for doing it. > You would need to do: > > VecGetArray > VecGetOwnershipRange > > (iterate over range on data from array) > > VecRestoreArray > > I cc the PETSc user's list in case anyone else has a brighter idea. > > Cheers, > A > > On Thu, Nov 5, 2009 at 3:40 AM, Braxton Osting wrote: > > aron, > > > > last year you showed abby and I a slick way to apply a function to a > > petsc vec pointwise. i can't seem to find that manual page now. do you > > remember it? > > > > thanks, > > b > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From w.drenth at gmail.com Fri Nov 6 12:00:35 2009 From: w.drenth at gmail.com (Wienand Drenth) Date: Fri, 6 Nov 2009 19:00:35 +0100 Subject: use of VecPlaceArray in parallel with fortran Message-ID: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> Hello all, In my research code I solve a linear system of equations, and (of course) I use PetSc routines for that. However, in the code we have our own data arrays for the right handside vector B, and solution vector X. 
Only just prior to the call to KSPSolve, we use the routine VecPlaceArray to synchronize the Fortran array B and X with their PetSc counterparts (M_B and M_X, for example, respectively). I was wondering if this would work in parallel as well? I have adapted one of the tutorial examples (ex2f from the ksp tutorials) to utilize the VecPlaceArray mechanism. I encountered no problems, except when I want to run the program in parallel. When I do that, and print my own vector X afterwards, different processors show different parts of the solution. For example, for a vector of length 10, and with two processors, processor one will have values for the first five elements (remainder is zero), and processor two will have values for the last five elements in the array. >From the same ksp tutorials, I have tried ex13 as well, the c program. Here I do not get partial outputs for different processors. I wonder whether one cannot use VecPlaceArray in a parralel setting in Fortran, except by doing extra bookkeeping? I hope someone can enlighten me, and indicate where I missed something in my programming or otherwise. Thanks in advance, Wienand Drenth -- Wienand Drenth PhD Eindhoven, the Netherlands From bsmith at mcs.anl.gov Fri Nov 6 12:48:00 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 6 Nov 2009 12:48:00 -0600 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> Message-ID: VecPlaceArray() gives to the vector its local (on process) part of the array, not the whole array (and requires no communication). If you want the entire array of the vector on one or all processes you can use VecScatterCreateToAll() or VecScatterCreateToZero() and then use the VecScatter created to move the values to where you want them. Barry On Nov 6, 2009, at 12:00 PM, Wienand Drenth wrote: > Hello all, > > In my research code I solve a linear system of equations, and (of > course) I use PetSc routines for that. However, in the code we have > our own data arrays for the right handside vector B, and solution > vector X. Only just prior to the call to KSPSolve, we use the routine > VecPlaceArray to synchronize the Fortran array B and X with their > PetSc counterparts (M_B and M_X, for example, respectively). > > I was wondering if this would work in parallel as well? I have adapted > one of the tutorial examples (ex2f from the ksp tutorials) to utilize > the VecPlaceArray mechanism. I encountered no problems, except when I > want to run the program in parallel. > > When I do that, and print my own vector X afterwards, different > processors show different parts of the solution. For example, for a > vector of length 10, and with two processors, processor one will have > values for the first five elements (remainder is zero), and processor > two will have values for the last five elements in the array. > >> From the same ksp tutorials, I have tried ex13 as well, the c >> program. > Here I do not get partial outputs for different processors. > > I wonder whether one cannot use VecPlaceArray in a parralel setting in > Fortran, except by doing extra bookkeeping? I hope someone can > enlighten me, and indicate where I missed something in my programming > or otherwise. 
> > Thanks in advance, > > Wienand Drenth > > > > -- > Wienand Drenth PhD > Eindhoven, the Netherlands From vyan2000 at gmail.com Sat Nov 7 13:57:28 2009 From: vyan2000 at gmail.com (Ryan Yan) Date: Sat, 7 Nov 2009 14:57:28 -0500 Subject: Can I use MatSetBlockSize() for MPIAIJ Message-ID: Hi All, I have a question as follows: In order to use MatSetValuesBlocked() for a MPIAIJ matrix. I need to call MatSetBlockSize() when I create the matrix. so I did the following. Here the blocksize = 5; Mat *A; .... MatCreate(MPI_COMM_WORLD,A); MatSetSizes(*A,m*blocksize,n*blocksize,M*blocksize,N*blocksize); MatSetType(*A,MATMPIAIJ); MatSetBlockSize(*A,blocksize); ierr=MatMPIAIJSetPreallocation(*A,0,ourlens_ptws,0,offlens_ptws); CHKERRQ(ierr); ierr = MatAssemblyBegin(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatAssemblyEnd(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); PetscPrintf(PETSC_COMM_WORLD,"the bs BEFORE is %d\n", bs); MatGetBlockSize(*A,&bs); PetscPrintf(PETSC_COMM_WORLD,"the bs is %d\n", bs); PetscPrintf(PETSC_COMM_WORLD,"the blocksize is %d\n", blocksize); ... The output I get is: the bs BEFORE is 0 the bs is 1 the blocksize is 5 It seems like the Mat A does not absorb the information blocksize=5 at all. How should I make the function-call sequence correct, if I want to set a blocksize for the MPIAIJ. Thanks for any suggestions in advance, Yan -------------- next part -------------- An HTML attachment was scrubbed... URL: From vyan2000 at gmail.com Sat Nov 7 14:01:46 2009 From: vyan2000 at gmail.com (Ryan Yan) Date: Sat, 7 Nov 2009 15:01:46 -0500 Subject: Can I use MatSetBlockSize() for MPIAIJ In-Reply-To: References: Message-ID: Sorry a typo: MatCreate(MPI_COMM_WORLD,*A); On Sat, Nov 7, 2009 at 2:57 PM, Ryan Yan wrote: > Hi All, > I have a question as follows: > > In order to use MatSetValuesBlocked() for a MPIAIJ matrix. I need to call > MatSetBlockSize() when I create the matrix. > > so I did the following. Here the blocksize = 5; > > > Mat *A; > .... > MatCreate(MPI_COMM_WORLD,A); > MatSetSizes(*A,m*blocksize,n*blocksize,M*blocksize,N*blocksize); > MatSetType(*A,MATMPIAIJ); > MatSetBlockSize(*A,blocksize); > ierr=MatMPIAIJSetPreallocation(*A,0,ourlens_ptws,0,offlens_ptws); > CHKERRQ(ierr); > > ierr = MatAssemblyBegin(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(*A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > PetscPrintf(PETSC_COMM_WORLD,"the bs BEFORE is %d\n", bs); > MatGetBlockSize(*A,&bs); > > PetscPrintf(PETSC_COMM_WORLD,"the bs is %d\n", bs); > PetscPrintf(PETSC_COMM_WORLD,"the blocksize is %d\n", blocksize); > ... > > The output I get is: > > the bs BEFORE is 0 > the bs is 1 > the blocksize is 5 > > > It seems like the Mat A does not absorb the information blocksize=5 at all. > How should I make the function-call sequence correct, if I want to set a > blocksize for the MPIAIJ. > > Thanks for any suggestions in advance, > > Yan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sat Nov 7 14:12:40 2009 From: jed at 59A2.org (Jed Brown) Date: Sat, 07 Nov 2009 21:12:40 +0100 Subject: Can I use MatSetBlockSize() for MPIAIJ In-Reply-To: References: Message-ID: <4AF5D4B8.1030901@59A2.org> Ryan Yan wrote: > Hi All, > I have a question as follows: > > In order to use MatSetValuesBlocked() for a MPIAIJ matrix. I need to > call MatSetBlockSize() when I create the matrix. Call MatSetBlockSize *after* preallocation. Jed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From vyan2000 at gmail.com Sat Nov 7 14:17:01 2009 From: vyan2000 at gmail.com (Ryan Yan) Date: Sat, 7 Nov 2009 15:17:01 -0500 Subject: Can I use MatSetBlockSize() for MPIAIJ In-Reply-To: <4AF5D4B8.1030901@59A2.org> References: <4AF5D4B8.1030901@59A2.org> Message-ID: Hi Jed, Thanks, So the following is very confusing: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html Notes: The m and n count the NUMBER of blocks in the row direction and column direction, NOT the total number of rows/columns; for example, if the block size is 2 and you are passing in values for rows 2,3,4,5 then m would be 2 (not 4). The values in idxm would be 1 2; that is the first index for each block divided by the block size. Note that you must call MatSetBlockSize() when constructing this matrix (and before preallocating it)... On Sat, Nov 7, 2009 at 3:12 PM, Jed Brown wrote: > Ryan Yan wrote: > > Hi All, > > I have a question as follows: > > > > In order to use MatSetValuesBlocked() for a MPIAIJ matrix. I need to > > call MatSetBlockSize() when I create the matrix. > > Call MatSetBlockSize *after* preallocation. > > Jed > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vyan2000 at gmail.com Sat Nov 7 14:19:30 2009 From: vyan2000 at gmail.com (Ryan Yan) Date: Sat, 7 Nov 2009 15:19:30 -0500 Subject: Can I use MatSetBlockSize() for MPIAIJ In-Reply-To: References: <4AF5D4B8.1030901@59A2.org> Message-ID: It works. Thanks again. Yan On Sat, Nov 7, 2009 at 3:17 PM, Ryan Yan wrote: > Hi Jed, > Thanks, > So the following is very confusing: > > http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBlocked.html > > Notes: The m and n count the NUMBER of blocks in the row direction and > column direction, NOT the total number of rows/columns; for example, if the > block size is 2 and you are passing in values for rows 2,3,4,5 then m would be 2 (not > 4). The values in idxm would be 1 2; that is the first index for each block > divided by the block size. > > > Note that you must call MatSetBlockSize() > when constructing this matrix (and before preallocating it)... > > > > > On Sat, Nov 7, 2009 at 3:12 PM, Jed Brown wrote: > >> Ryan Yan wrote: >> > Hi All, >> > I have a question as follows: >> > >> > In order to use MatSetValuesBlocked() for a MPIAIJ matrix. I need to >> > call MatSetBlockSize() when I create the matrix. >> >> Call MatSetBlockSize *after* preallocation. >> >> Jed >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Sat Nov 7 14:39:41 2009 From: jed at 59A2.org (Jed Brown) Date: Sat, 07 Nov 2009 21:39:41 +0100 Subject: Can I use MatSetBlockSize() for MPIAIJ In-Reply-To: References: <4AF5D4B8.1030901@59A2.org> Message-ID: <4AF5DB0D.9030506@59A2.org> Ryan Yan wrote: > Note that you must call MatSetBlockSize() > when constructing this matrix (and before preallocating it)... Indeed, thanks for pointing it out. I have fixed the documentation in petsc-dev and also made MatSetBlockSize() work for BAIJ (it just checks that the block size agrees with the way the matrix was allocated). Jed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From jarunan at ascomp.ch Mon Nov 9 06:03:57 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Mon, 09 Nov 2009 13:03:57 +0100 Subject: Create vectors In-Reply-To: References: Message-ID: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> Hello, Is there an equivalent way to allocating array pointer for creating vectors (or vector pointers)? Jarunan From knepley at gmail.com Mon Nov 9 06:06:30 2009 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 9 Nov 2009 06:06:30 -0600 Subject: Create vectors In-Reply-To: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> References: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> Message-ID: Is this what you want? http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Vec/VecCreateMPIWithArray.html Matt On Mon, Nov 9, 2009 at 6:03 AM, wrote: > > Hello, > > Is there an equivalent way to allocating array pointer for creating vectors > (or vector pointers)? > > > Jarunan > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Mon Nov 9 06:07:53 2009 From: jed at 59A2.org (Jed Brown) Date: Mon, 09 Nov 2009 13:07:53 +0100 Subject: Create vectors In-Reply-To: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> References: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> Message-ID: <4AF80619.8080601@59A2.org> jarunan at ascomp.ch wrote: > > Hello, > > Is there an equivalent way to allocating array pointer for creating > vectors (or vector pointers)? What do you want to do? Maybe you're looking for one of the VecCreateXXWithArray() variants. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From jarunan at ascomp.ch Mon Nov 9 07:29:07 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Mon, 09 Nov 2009 14:29:07 +0100 Subject: Create vectors In-Reply-To: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> References: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> Message-ID: <20091109142907.vcdgmk3sao0k0800@webmail.ascomp.ch> I am solving multi-level grid (similar to multi grid but not the same). In each iteration, each level is solved separately but solutions are mapped to eachother. Each level has different size of matrix and vector. And each test case has different numbers of grid level. I have a difficulty to create vectors and matrices for each level, preparing for the computation, as I do not want to create and destroy them in every iteration. I am thinking of something similar to array pointer (the code is in fortran) e.g., Type(real_array), Dimension(:), allocatable:: pointername Allocate(pointername(level_numbers)) do i = 1, level_numbers allocate(pointername(i)%p(size)) enddo Is it possible to create pointer to vectors? Thank you Jarunan Quoting jarunan at ascomp.ch: > > Hello, > > Is there an equivalent way to allocating array pointer for creating > vectors (or vector pointers)? > > > Jarunan -- Jarunan Panyasantisuk Development Engineer ASCOMP GmbH, Technoparkstr. 
1 CH-8005 Zurich, Switzerland Phone : +41 44 445 4072 Fax : +41 44 445 4075 E-mail: jarunan at ascomp.ch www.ascomp.ch From knepley at gmail.com Mon Nov 9 07:32:36 2009 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 9 Nov 2009 07:32:36 -0600 Subject: Create vectors In-Reply-To: <20091109142907.vcdgmk3sao0k0800@webmail.ascomp.ch> References: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> <20091109142907.vcdgmk3sao0k0800@webmail.ascomp.ch> Message-ID: Yes, Vec is just a regular type. Matt On Mon, Nov 9, 2009 at 7:29 AM, wrote: > > I am solving multi-level grid (similar to multi grid but not the same). In > each iteration, each level is solved separately but solutions are mapped to > eachother. Each level has different size of matrix and vector. And each test > case has different numbers of grid level. > > I have a difficulty to create vectors and matrices for each level, > preparing for the computation, as I do not want to create and destroy them > in every iteration. I am thinking of something similar to array pointer (the > code is in fortran) e.g., > > Type(real_array), Dimension(:), allocatable:: pointername > Allocate(pointername(level_numbers)) > > do i = 1, level_numbers > allocate(pointername(i)%p(size)) > enddo > > > Is it possible to create pointer to vectors? > > > Thank you > Jarunan > > > Quoting jarunan at ascomp.ch: > > >> Hello, >> >> Is there an equivalent way to allocating array pointer for creating >> vectors (or vector pointers)? >> >> >> Jarunan >> > > > > -- > Jarunan Panyasantisuk > Development Engineer > ASCOMP GmbH, Technoparkstr. 1 > CH-8005 Zurich, Switzerland > Phone : +41 44 445 4072 > Fax : +41 44 445 4075 > E-mail: jarunan at ascomp.ch > www.ascomp.ch > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From w.drenth at gmail.com Mon Nov 9 10:30:16 2009 From: w.drenth at gmail.com (Wienand Drenth) Date: Mon, 9 Nov 2009 17:30:16 +0100 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> Message-ID: <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> Hello Barry, Thank you for that. Just another question. As I wrote in my first email, in the current code, we utilize a local non-PetSc arrays and using VecPlaceArray we "give" this array to PetSc vectors to do the KSPSolve. Afterwards, we can just continue with our local non-PetSc arrays. If I understand you correctly, and for my knowledge, this approach will not be possible in a parallel setting? When I do, with for example two processors, and with local array being blocal = 1, 2, .... , 10 then for the zeroth processor I have also values 1, 2, ... , 10 and not just half (i.e., 1,2,3,4,5,0,0,0,0,0). for the first processor I have only part of the values, but they start with the first entry of my array, and not half-way: 0,0,0,0,0, 1,2,3,4,5 instead of 0,0,0,0,0, 6,7,8,9,10 Regards, Wienand On Fri, Nov 6, 2009 at 7:48 PM, Barry Smith wrote: > > VecPlaceArray() gives to the vector its local (on process) part of the > array, not the whole array (and requires no communication). If you want the > entire array of the vector on one or all processes you can use > VecScatterCreateToAll() or VecScatterCreateToZero() and then use the > VecScatter created to move the values to where you want them. 
> > Barry > > On Nov 6, 2009, at 12:00 PM, Wienand Drenth wrote: > >> Hello all, >> >> In my research code I solve a linear system of equations, and (of >> course) I use PetSc routines for that. However, in the code we have >> our own data arrays for the right handside vector B, and solution >> vector X. Only just prior to the call to KSPSolve, we use the routine >> VecPlaceArray to synchronize the Fortran array B and X with their >> PetSc counterparts (M_B and M_X, for example, respectively). >> >> I was wondering if this would work in parallel as well? I have adapted >> one of the tutorial examples (ex2f from the ksp tutorials) to utilize >> the VecPlaceArray mechanism. I encountered no problems, except when I >> want to run the program in parallel. >> >> When I do that, and print my own vector X afterwards, different >> processors show different parts of the solution. For example, for a >> vector of length 10, and with two processors, processor one will have >> values for the first five elements (remainder is zero), and processor >> two will have values for the last five elements in the array. >> >>> From the same ksp tutorials, I have tried ex13 as well, the c program. >> >> Here I do not get partial outputs for different processors. >> >> I wonder whether one cannot use VecPlaceArray in a parralel setting in >> Fortran, except by doing extra bookkeeping? I hope someone can >> enlighten me, and indicate where I missed something in my programming >> or otherwise. >> >> Thanks in advance, >> >> Wienand Drenth >> >> >> >> -- >> Wienand Drenth PhD >> Eindhoven, the Netherlands > > -- Wienand Drenth PhD Eindhoven, the Netherlands From jed at 59A2.org Mon Nov 9 10:45:38 2009 From: jed at 59A2.org (Jed Brown) Date: Mon, 09 Nov 2009 17:45:38 +0100 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> Message-ID: <4AF84732.3060604@59A2.org> Wienand Drenth wrote: > Hello Barry, > > Thank you for that. > > Just another question. As I wrote in my first email, in the current > code, we utilize a local non-PetSc arrays and using VecPlaceArray we > "give" this array to PetSc vectors to do the KSPSolve. Afterwards, we > can just continue with our local non-PetSc arrays. If I understand you > correctly, and for my knowledge, this approach will not be possible in > a parallel setting? > > When I do, with for example two processors, and with local array being > blocal = 1, 2, .... , 10 > then for the zeroth processor I have also values 1, 2, ... , 10 and > not just half (i.e., 1,2,3,4,5,0,0,0,0,0). > for the first processor I have only part of the values, but they start > with the first entry of my array, and not half-way: > 0,0,0,0,0, 1,2,3,4,5 instead of 0,0,0,0,0, 6,7,8,9,10 If it is this simple, you could still use VecPlaceArray, but you would be responsible for updating ghost values (of your arrays, KSPSolve will only put the solution in the contiguous owned segment). In 2D or 3D, the owned segment that you want to "give" to the KSP is likely to not be contiguous. BUT, you should just make a copy, it will not be a significant amount of memory or time. Look at VecScatterCreateToAll(), this can be used to update the copy that the rest of your code works with. Jed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From bsmith at mcs.anl.gov Mon Nov 9 14:54:59 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 9 Nov 2009 14:54:59 -0600 Subject: Create vectors In-Reply-To: <20091109142907.vcdgmk3sao0k0800@webmail.ascomp.ch> References: <20091109130357.j12u8p5y8kwosks0@webmail.ascomp.ch> <20091109142907.vcdgmk3sao0k0800@webmail.ascomp.ch> Message-ID: <0B12D2AA-1B77-4FD3-92A4-9E237114B6FE@mcs.anl.gov> Yes, for example integer localn(level_numbers) Vec myvecs(level_numbers) do i=1,level_numbers call VecCreateMPI(PETSC_COMM_WORLD,localn(i),PETSC_DETERMINE,myvecs(i),ierr) enddo Of course, myvecs() can also be made allocatable and you can set at run time the number of levels. Barry On Nov 9, 2009, at 7:29 AM, jarunan at ascomp.ch wrote: > > I am solving multi-level grid (similar to multi grid but not the > same). In each iteration, each level is solved separately but > solutions are mapped to eachother. Each level has different size of > matrix and vector. And each test case has different numbers of grid > level. > > I have a difficulty to create vectors and matrices for each level, > preparing for the computation, as I do not want to create and > destroy them in every iteration. I am thinking of something similar > to array pointer (the code is in fortran) e.g., > > Type(real_array), Dimension(:), allocatable:: pointername > Allocate(pointername(level_numbers)) > > do i = 1, level_numbers > allocate(pointername(i)%p(size)) > enddo > > > Is it possible to create pointer to vectors? > > > Thank you > Jarunan > > > Quoting jarunan at ascomp.ch: > >> >> Hello, >> >> Is there an equivalent way to allocating array pointer for creating >> vectors (or vector pointers)? >> >> >> Jarunan > > > > -- > Jarunan Panyasantisuk > Development Engineer > ASCOMP GmbH, Technoparkstr. 1 > CH-8005 Zurich, Switzerland > Phone : +41 44 445 4072 > Fax : +41 44 445 4075 > E-mail: jarunan at ascomp.ch > www.ascomp.ch From bsmith at mcs.anl.gov Mon Nov 9 15:00:53 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 9 Nov 2009 15:00:53 -0600 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> Message-ID: <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> On Nov 9, 2009, at 10:30 AM, Wienand Drenth wrote: > Hello Barry, > > Thank you for that. > > Just another question. As I wrote in my first email, in the current > code, we utilize a local non-PetSc arrays and using VecPlaceArray we > "give" this array to PetSc vectors to do the KSPSolve. Afterwards, we 
If you have parts stored on each process and you want ghost points filled in on each process then you need to set up a scatter with VecScatterCreate(). Barry > can just continue with our local non-PetSc arrays. If I understand you > correctly, and for my knowledge, this approach will not be possible in > a parallel setting? > > When I do, with for example two processors, and with local array being > blocal = 1, 2, .... , 10 > then for the zeroth processor I have also values 1, 2, ... , 10 and > not just half (i.e., 1,2,3,4,5,0,0,0,0,0). > for the first processor I have only part of the values, but they start > with the first entry of my array, and not half-way: > 0,0,0,0,0, 1,2,3,4,5 instead of 0,0,0,0,0, 6,7,8,9,10 > > > Regards, > > Wienand > > On Fri, Nov 6, 2009 at 7:48 PM, Barry Smith > wrote: >> >> VecPlaceArray() gives to the vector its local (on process) part of >> the >> array, not the whole array (and requires no communication). If you >> want the >> entire array of the vector on one or all processes you can use >> VecScatterCreateToAll() or VecScatterCreateToZero() and then use the >> VecScatter created to move the values to where you want them. >> >> Barry >> >> On Nov 6, 2009, at 12:00 PM, Wienand Drenth wrote: >> >>> Hello all, >>> >>> In my research code I solve a linear system of equations, and (of >>> course) I use PetSc routines for that. However, in the code we have >>> our own data arrays for the right handside vector B, and solution >>> vector X. Only just prior to the call to KSPSolve, we use the >>> routine >>> VecPlaceArray to synchronize the Fortran array B and X with their >>> PetSc counterparts (M_B and M_X, for example, respectively). >>> >>> I was wondering if this would work in parallel as well? I have >>> adapted >>> one of the tutorial examples (ex2f from the ksp tutorials) to >>> utilize >>> the VecPlaceArray mechanism. I encountered no problems, except >>> when I >>> want to run the program in parallel. >>> >>> When I do that, and print my own vector X afterwards, different >>> processors show different parts of the solution. For example, for a >>> vector of length 10, and with two processors, processor one will >>> have >>> values for the first five elements (remainder is zero), and >>> processor >>> two will have values for the last five elements in the array. >>> >>>> From the same ksp tutorials, I have tried ex13 as well, the c >>>> program. >>> >>> Here I do not get partial outputs for different processors. >>> >>> I wonder whether one cannot use VecPlaceArray in a parralel >>> setting in >>> Fortran, except by doing extra bookkeeping? I hope someone can >>> enlighten me, and indicate where I missed something in my >>> programming >>> or otherwise. 
>>> >>> Thanks in advance, >>> >>> Wienand Drenth >>> >>> >>> >>> -- >>> Wienand Drenth PhD >>> Eindhoven, the Netherlands >> >> > > > > -- > Wienand Drenth PhD > Eindhoven, the Netherlands From jarunan at ascomp.ch Tue Nov 10 02:28:56 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Tue, 10 Nov 2009 09:28:56 +0100 Subject: Reuse matrix and vector In-Reply-To: <4AF2AFED.2090706@59A2.org> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> Message-ID: <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> >> >> Yes, thank you for the advice. I will modify this. But MatSetValues() is >> not efficient with a big problem. It takes much time. > > No, either you are inserting values that have not been preallocated > (check with -info | grep mallocs) or you are inserting single values. > You should insert a full row every time you call MatSetValues. > Hi, I have tried as you suggested: First allocate with MatCreateMPIAIJWithArrays() then use MatSetValues() to reset the matrix. MatSetValues() has great performance but MatCreateMPIAIJWithArrays() needs a really long time to allocate the matrix in the first iteration with more than 1 processor (With one processor it is very fast). Total number of cells is 744872, divided into 40 blocks. In one processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with 4 processors. Usually, this routine has no problem with small test case. It works the same for one or more than one processors. in the first iteration. Mat Ap call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, istorf_no_ovcell, & istorf_no_ovcell, PETSC_DETERMINE, PETSC_DETERMINE, rowind, columnind, & A, Ap, ierr) call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) After the first iteration, looping over the rows, the whole row is set at a time. call MatSetValues(Ap,1,row_impl,7,col_impl,a_impl,INSERT_VALUES,ierr) call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) Does the communication of MatCreateMPIAIJWithArrays() in parallel computation cost a lot? What could be the cause that MatCreateMPIAIJWithArrays() so slow in the first iteration? Best regards, Jarunan From w.drenth at gmail.com Tue Nov 10 03:21:48 2009 From: w.drenth at gmail.com (Wienand Drenth) Date: Tue, 10 Nov 2009 10:21:48 +0100 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> Message-ID: <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> > We are having some difficulty understanding your question and what > exactly you want to do? Hello Barry, Apologies I am a bit unclear. I am relatively new to PetSc and the proper terminology, so thank you for your time and help. Currently we have two Fortran arrays B and X, being the right-hand side and solution vectors. There are no special considerations to have these arrays on just one processor. 
So, if I am not mistaken, when I run the program on multiple processors, each processor will have the entire Fortran array and not just part of it. In order to solve the system iteratively, we make calls VecPlaceArray(M_X, X, ierr) to place the Fortran array into the PetSc Vector M_X. Then we call KSPSolve. After the solve, we don't care for the PetSc vectors anymore, but continue with the Fortran arrays (X and B) in our further calculations. Right know, when running in the above setting it will not function correctly when run on multiple processors. Henceforth my question on how to tackle this and adapt the code to run it in parallel. Would the following procedure lead to a correct and working solution: Suppose I have a Fortran array X, and I create on processor zero a sequential PetSc vector MS_X and place the array X into MS_X using VecPlaceArray. With VecScatterCreaterToZero, and SCATTER_REVERSE as scatter mode I can spread it onto the global (parallel) vector M_X. After my calculations, I can do the same to scatter the parallel solution onto my sequential vector MS_X (now with SCATTER_FORWARD), and continue afterwards with X. Regards, Wienand -- Wienand Drenth PhD Eindhoven, the Netherlands From jed at 59A2.org Tue Nov 10 04:51:05 2009 From: jed at 59A2.org (Jed Brown) Date: Tue, 10 Nov 2009 11:51:05 +0100 Subject: Reuse matrix and vector In-Reply-To: <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> Message-ID: <4AF94599.7030309@59A2.org> jarunan at ascomp.ch wrote: > Total number of cells is 744872, divided into 40 blocks. In one > processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with > 4 processors. Usually, this routine has no problem with small test case. > It works the same for one or more than one processors. This sounds like incorrect preallocation. Is your PETSc built with debugging? Debug does some extra integrity checks that don't add significantly to the time (although other Debug checks do), but it would be useful to know that they pass. In particular, it checks that your rows are sorted. If they are not sorted then PETSc's preallocation would be wrong. (I actually don't think this requirement enables significantly faster implementation, so I'm tempted to change it to work correctly with unsorted rows.) You can also run with -info |grep malloc, there should be no mallocs in MatSetValues(). > in the first iteration. > Mat Ap > > call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, istorf_no_ovcell, & > istorf_no_ovcell, PETSC_DETERMINE, PETSC_DETERMINE, rowind, columnind, & > A, Ap, ierr) > > call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) This assembly is superfluous (but harmless). > Does the communication of MatCreateMPIAIJWithArrays() in parallel > computation cost a lot? What could be the cause that > MatCreateMPIAIJWithArrays() so slow in the first iteration? There is no significant communication, it has to be preallocation. Jed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From jed at 59A2.org Tue Nov 10 05:40:39 2009 From: jed at 59A2.org (Jed Brown) Date: Tue, 10 Nov 2009 12:40:39 +0100 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> Message-ID: <4AF95137.7030808@59A2.org> Wienand Drenth wrote: > Would the following procedure lead to a correct and working solution: > > Suppose I have a Fortran array X, and I create on processor zero a > sequential PetSc vector MS_X and place the array X into MS_X using > VecPlaceArray. With VecScatterCreaterToZero, and SCATTER_REVERSE as > scatter mode I can spread it onto the global (parallel) vector M_X. > > After my calculations, I can do the same to scatter the parallel > solution onto my sequential vector MS_X (now with SCATTER_FORWARD), > and continue afterwards with X. With this last part, you are responsible for broadcasting X before your code can continue. VecScatterCreateToAll() would get PETSc to do it for you, *but* these may be too restrictive for what you want. It will only work if the local portions are contiguous (it is an issue of natural versus "PETSc" ordering, see Figure 9 of the user's manual). Presumably your code uses the natural ordering, but solvers will perform better if they can use the PETSc ordering. Therefore you will probably have to make your own scatter. Assembling the matrix is more tricky because it will be a major bottleneck if process 0 has to do all of it (unless you solve many problems with the same matrix) and it is expensive to assemble it on the wrong process (i.e. assemble in the natural ordering and let PETSc send the entries to the correct process). I don't know how how your code is organized, but I highly recommend using a decomposition like is done by DA (and preferably also use the DA, even if it means you have to do more copies -- cheap compared to the shenanigans we are talking about here). This should involve *less* modification to your existing serial code, and will offer much better scalability. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From knepley at gmail.com Tue Nov 10 05:50:22 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2009 05:50:22 -0600 Subject: Reuse matrix and vector In-Reply-To: <4AF94599.7030309@59A2.org> References: <20091101110112.GA11973@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> Message-ID: On Tue, Nov 10, 2009 at 4:51 AM, Jed Brown wrote: > jarunan at ascomp.ch wrote: > > Total number of cells is 744872, divided into 40 blocks. In one > > processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with > > 4 processors. Usually, this routine has no problem with small test case. > > It works the same for one or more than one processors. 
Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From knepley at gmail.com Tue Nov 10 05:50:22 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2009 05:50:22 -0600 Subject: Reuse matrix and vector In-Reply-To: <4AF94599.7030309@59A2.org> References: <20091101110112.GA11973@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> Message-ID: On Tue, Nov 10, 2009 at 4:51 AM, Jed Brown wrote: > jarunan at ascomp.ch wrote: > > Total number of cells is 744872, divided into 40 blocks. In one > > processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with > > 4 processors. Usually, this routine has no problem with small test case. > > It works the same for one or more than one processors. > > This sounds like incorrect preallocation. Is your PETSc built with > debugging? Debug does some extra integrity checks that don't add > significantly to the time (although other Debug checks do), but it would > be useful to know that they pass. In particular, it checks that your > rows are sorted. If they are not sorted then PETSc's preallocation > would be wrong. (I actually don't think this requirement enables > significantly faster implementation, so I'm tempted to change it to work > correctly with unsorted rows.) > I do not think it's preallocation per se, since 1 proc is fast. I think that your partition of rows fed to the MatCreate() call does not match what you provide to MatSetValues() and thus you do a lot of communication in MatAssemblyEnd(). There are 2 ways to debug this: 1) -log_summary to see where the time is spent 2) MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR)
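That option makes any insertion outside the preallocated pattern an immediate error at the offending MatSetValues() call instead of a silent malloc; as a one-line sketch (against the petsc-3.0 calling sequence, if I remember it right, and using the matrix Ap from your code):

  ierr = MatSetOption(Ap, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);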
Matt You can also run with -info |grep malloc, there should be no mallocs in > MatSetValues(). > > > in the first iteration. > > Mat Ap > > > > call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, istorf_no_ovcell, > & > > istorf_no_ovcell, PETSC_DETERMINE, PETSC_DETERMINE, rowind, > columnind, & > > A, Ap, ierr) > > > > call MatAssemblyBegin(Ap,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyEnd(Ap,MAT_FINAL_ASSEMBLY,ierr) > > This assembly is superfluous (but harmless). > > > Does the communication of MatCreateMPIAIJWithArrays() in parallel > > computation cost a lot? What could be the cause that > > MatCreateMPIAIJWithArrays() is so slow in the first iteration? > > There is no significant communication, it has to be preallocation. > > Jed > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Tue Nov 10 05:58:05 2009 From: jed at 59A2.org (Jed Brown) Date: Tue, 10 Nov 2009 12:58:05 +0100 Subject: Reuse matrix and vector In-Reply-To: References: <20091101110112.GA11973@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> Message-ID: <4AF9554D.4060002@59A2.org> Matthew Knepley wrote: > On Tue, Nov 10, 2009 at 4:51 AM, Jed Brown > wrote: > > jarunan at ascomp.ch wrote: > > Total number of cells is 744872, divided into 40 blocks. In one > > processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec > with > > 4 processors. Usually, this routine has no problem with small test > case. > > It works the same for one or more than one processors. > > This sounds like incorrect preallocation. Is your PETSc built with > debugging? Debug does some extra integrity checks that don't add > significantly to the time (although other Debug checks do), but it would > be useful to know that they pass. In particular, it checks that your > rows are sorted. If they are not sorted then PETSc's preallocation > would be wrong. (I actually don't think this requirement enables > significantly faster implementation, so I'm tempted to change it to work > correctly with unsorted rows.) > > > I do not think it's preallocation per se, since 1 proc is fast. I think > that your partition of rows fed to the MatCreate() call does not match > what you provide to MatSetValues() and thus you do a lot of > communication in MatAssemblyEnd(). There are 2 ways to debug this: Matt, he says MatSetValues() is fast, but MatCreateMPIAIJWithArrays() is slow. Look how preallocation is done (mpiaij.c:3263). This would do the correct thing in serial, but under-allocate the diagonal part when the rows are not sorted. It looks like a silly "optimization" to me. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From jarunan at ascomp.ch Tue Nov 10 06:02:12 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Tue, 10 Nov 2009 13:02:12 +0100 Subject: Reuse matrix and vector In-Reply-To: <4AF94599.7030309@59A2.org> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> Message-ID: <20091110130212.9ivjxj3a2oogckkw@webmail.ascomp.ch> Quoting Jed Brown : > jarunan at ascomp.ch wrote: >> Total number of cells is 744872, divided into 40 blocks. In one >> processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with >> 4 processors. Usually, this routine has no problem with small test case. >> It works the same for one or more than one processors. > > This sounds like incorrect preallocation. Is your PETSc built with > debugging? Debug does some extra integrity checks that don't add > significantly to the time (although other Debug checks do), but it would > be useful to know that they pass. In particular, it checks that your > rows are sorted. If they are not sorted then PETSc's preallocation > would be wrong. (I actually don't think this requirement enables > significantly faster implementation, so I'm tempted to change it to work > correctly with unsorted rows.) The code is compiled with the optimized version of PETSc. The row indices and column indices for each row are sorted. Well, they are not sorted for diagonal or off-diagonal part. > > You can also run with -info |grep malloc, there should be no mallocs in > MatSetValues(). > Here are the output. The first serie of MatAssemblyEnd_SeqAIJ() should be in MatCreateMPIAIJWithArrays(), which mallocs in MatSetValues() are not zero. Would MatCreateMPIAIJWithSplitArrays() be better in preallocation? [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 4379 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 13859 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 20286 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 4592 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 15042 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 11715 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatIncreaseOverlap_MPIAIJ_Receive(): Allocated 0 bytes, required 3 bytes, no of mallocs = 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during 
MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 From knepley at gmail.com Tue Nov 10 06:08:11 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2009 06:08:11 -0600 Subject: Reuse matrix and vector In-Reply-To: <20091110130212.9ivjxj3a2oogckkw@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> <20091110130212.9ivjxj3a2oogckkw@webmail.ascomp.ch> Message-ID: On Tue, Nov 10, 2009 at 6:02 AM, wrote: > Quoting Jed Brown : > > jarunan at ascomp.ch wrote: >> >>> Total number of cells is 744872, divided into 40 blocks. In one >>> processor, MatCreateMPIAIJWithArrays() takes 0.097 sec but 280 sec with >>> 4 processors. Usually, this routine has no problem with small test case. >>> It works the same for one or more than one processors. >>> >> >> This sounds like incorrect preallocation. Is your PETSc built with >> debugging? Debug does some extra integrity checks that don't add >> significantly to the time (although other Debug checks do), but it would >> be useful to know that they pass. In particular, it checks that your >> rows are sorted. If they are not sorted then PETSc's preallocation >> would be wrong. (I actually don't think this requirement enables >> significantly faster implementation, so I'm tempted to change it to work >> correctly with unsorted rows.) >> > > The code is compiled with the optimized version of PETSc. The row indices > and column indices for each row are sorted. Well, they are not sorted for > diagonal or off-diagonal part. > Actually, what Jed says is the likely culprit. Please check that the column indices in each row are sorted. It is clear that the preallocation for multiple procs does not match what you feed to MatSetValues(). Jed: I agree. I would have just written the loop to check for membership in the diagonal block (as I do elsewhere). Maybe we should change petsc-dev? Matt >> You can also run with -info |grep malloc, there should be no mallocs in >> MatSetValues(). >> >> > Here are the output. > The first serie of MatAssemblyEnd_SeqAIJ() should be in > MatCreateMPIAIJWithArrays(), which mallocs in MatSetValues() are not zero. > Would MatCreateMPIAIJWithSplitArrays() be better in preallocation? > > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. 
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 4379 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 13859 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 20286 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 4592 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 15042 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 11715 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatIncreaseOverlap_MPIAIJ_Receive(): Allocated 0 bytes, required 3 > bytes, no of mallocs = 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs 
during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at 59A2.org Tue Nov 10 06:08:45 2009 From: jed at 59A2.org (Jed Brown) Date: Tue, 10 Nov 2009 13:08:45 +0100 Subject: Reuse matrix and vector In-Reply-To: <20091110130212.9ivjxj3a2oogckkw@webmail.ascomp.ch> References: <20091101110112.GA11973@dutw689> <6265635E-33E3-44FC-9496-D9E0E86BEB3F@mcs.anl.gov> <20091101165723.GA24933@dutw689> <4AEDC4E8.5030901@tudelft.nl> <0BC6D60B-9608-4E80-8174-E2B29BDF0609@mcs.anl.gov> <20091105103217.8vknqnkehwc0ww4g@webmail.ascomp.ch> <4AF2A14B.8070409@59A2.org> <20091105111152.a6vy4mhjo8wo0kcg@webmail.ascomp.ch> <4AF2AFED.2090706@59A2.org> <20091110092856.rsg3y2fdq8wscgwg@webmail.ascomp.ch> <4AF94599.7030309@59A2.org> <20091110130212.9ivjxj3a2oogckkw@webmail.ascomp.ch> Message-ID: <4AF957CD.5050204@59A2.org> jarunan at ascomp.ch wrote: > The code is compiled with the optimized version of PETSc. The row > indices and column indices for each row are sorted. Well, they are not > sorted for diagonal or off-diagonal part. I recommend using a debug version for all testing and only the optimized for production/scalability. You should use MatCreateMPIAIJWithSplitArrays() if you have these available separately. But it sounds like you have one big array where each row has the diagonal part followed by the off-diagonal part? This format can't be used directly by PETSc, just preallocate with MatMPIAIJSetPreallocation() and use MatSetValues(). Jed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From w.drenth at gmail.com Tue Nov 10 06:32:27 2009 From: w.drenth at gmail.com (Wienand Drenth) Date: Tue, 10 Nov 2009 13:32:27 +0100 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> Message-ID: <4a718f330911100432ufa2523ew8d08761cd544b9b9@mail.gmail.com> As a follow-up of my previous email, I have tried the following: bvec is a Fortran array bseq2 is a sequential PETSc vector created only on processor 0 b is the parallel PETSc vector I use in KSPSolve I fill bvec with bvec(i) = i, and b gets initialized to all ones. then I do if (rank.eq.0) then call VecPlaceArray(bseq2, bvec, ierr) do my_i=1,m*n II=my_i-1 call VecGetValues(bseq2,1,II,v,ierr) write(*,*) "bseq2 rhs: ", II, ": ", v, " for rank ", rank enddo endif this prints a nice 1, 2, etc. Next I want my parallel vector b to be filled. So I use VecScatterCreateToZero. call VecScatterCreateToZero(b,vscat,bseq2,ierr) call VecScatterBegin(vscat, bseq2, b, INSERT_VALUES, SCATTER_REVERSE, ierr) call VecScatterEnd(vscat, bseq2, b, INSERT_VALUES, SCATTER_REVERSE, ierr) call VecScatterDestroy(vscat, ierr) However, now b is filled with zeros. (If I would change the order in the VecScatterBegin and VecScatterEnd (first b, then bseq2), possibly in combination with SCATTER_FORWARD, then b will still have its original ones.) I cannot immediately see what I did wrong here, so I hope somebody could give a further hint. Wienand From knepley at gmail.com Tue Nov 10 06:36:53 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Nov 2009 06:36:53 -0600 Subject: use of VecPlaceArray in parallel with fortran In-Reply-To: <4a718f330911100432ufa2523ew8d08761cd544b9b9@mail.gmail.com> References: <4a718f330911061000y53dcf231x83a4e41cf7b25097@mail.gmail.com> <4a718f330911090830p3dc946e9qba2b3530dc26458e@mail.gmail.com> <04702B19-55F3-4470-95F8-DE22B4716650@mcs.anl.gov> <4a718f330911100121w2346a3e8u7beb5b095ea39f6f@mail.gmail.com> <4a718f330911100432ufa2523ew8d08761cd544b9b9@mail.gmail.com> Message-ID: On Tue, Nov 10, 2009 at 6:32 AM, Wienand Drenth wrote: > As a follow-up of my previous email, I have tried the following: > > bvec is a Fortran array > bseq2 is a sequential PETSc vector created only on processor 0 > b is the parallel PETSc vector I use in KSPSolve > > I fill bvec with bvec(i) = i, and b gets initialized to all ones. > > then I do > if (rank.eq.0) then > call VecPlaceArray(bseq2, bvec, ierr) > do my_i=1,m*n > II=my_i-1 > call VecGetValues(bseq2,1,II,v,ierr) > write(*,*) "bseq2 rhs: ", II, ": ", v, " for rank ", rank > enddo > endif > > this prints a nice 1, 2, etc. > > Next I want my parallel vector b to be filled. So I use > VecScatterCreateToZero. > > call VecScatterCreateToZero(b,vscat,bseq2,ierr) > This call creates bseq2, so it will wipe out the values. Put them in afterwards.
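The corrected sequence, as an untested C sketch (your Fortran calls mirror it one-for-one; bvec must be an array of PetscScalar, and rank comes from MPI_Comm_rank as in your code):

  /* Let the scatter create bseq2 first, then hand it your array. */
  ierr = VecScatterCreateToZero(b,&vscat,&bseq2);CHKERRQ(ierr);
  if (!rank) {
    ierr = VecPlaceArray(bseq2,bvec);CHKERRQ(ierr); /* values now survive */
  }
  ierr = VecScatterBegin(vscat,bseq2,b,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(vscat,bseq2,b,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterDestroy(vscat);CHKERRQ(ierr);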
Matt > call VecScatterBegin(vscat, bseq2, b, INSERT_VALUES, SCATTER_REVERSE, ierr) > call VecScatterEnd(vscat, bseq2, b, INSERT_VALUES, SCATTER_REVERSE, ierr) > call VecScatterDestroy(vscat, ierr) > > However, now b is filled with zeros. (If I would change the order in > the VecScatterBegin and VecScatterEnd (first b, then bseq2), possibly > in combination with SCATTER_FORWARD, then b will still have its > original ones.) > > I cannot immediately see what I did wrong here, so I hope somebody > could give a further hint. > > Wienand > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From yfeng1 at tigers.lsu.edu Tue Nov 10 21:53:52 2009 From: yfeng1 at tigers.lsu.edu (Yin Feng) Date: Tue, 10 Nov 2009 21:53:52 -0600 Subject: A problem on compiling a petsc code. Message-ID: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> When I compiled a PETSc code, I encountered the following error; the compiler is mpicc: /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/crt1.o(.text+0x21): In function `_start': : undefined reference to `main' collect2: ld returned 1 exit status Thank you in advance! -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Nov 11 00:49:28 2009 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 11 Nov 2009 00:49:28 -0600 (CST) Subject: A problem on compiling a petsc code.
In-Reply-To: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> References: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> Message-ID: Please send the complete compile command that gave this error. Also do PETSc examples compile/run correctly? What do you get for: cd src/ksp/ksp/examples/tutorials make ex2 make ex2f Satish On Tue, 10 Nov 2009, Yin Feng wrote: > When I compiled a PETSc code, I encountered the following error; > the compiler is mpicc: > > /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/crt1.o(.text+0x21): > In function `_start': > : undefined reference to `main' > collect2: ld returned 1 exit status > > Thank you in advance! > From yfeng1 at tigers.lsu.edu Wed Nov 11 12:24:10 2009 From: yfeng1 at tigers.lsu.edu (Yin Feng) Date: Wed, 11 Nov 2009 12:24:10 -0600 Subject: A problem on compiling a petsc code. In-Reply-To: References: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> Message-ID: <1e8c69dc0911111024t6397ff3ai79830c069aad621f@mail.gmail.com> include ${PETSC_DIR}/conf/base v: a.o b.o -${CLINKER} -g3 -O0 -o v a.o b.o ${PETSC_SNES_LIB} and I compiled ex2 successfully. Thank you! On Wed, Nov 11, 2009 at 12:49 AM, Satish Balay wrote: > Please send the complete compile command that gave this error. > > Also do PETSc examples compile/run correctly? What do you get for: > > cd src/ksp/ksp/examples/tutorials > make ex2 > make ex2f > > Satish > > > On Tue, 10 Nov 2009, Yin Feng wrote: > > > When I compiled a PETSc code, I encountered the following error; > > the compiler is mpicc: > > > > > /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/crt1.o(.text+0x21): > > In function `_start': > > : undefined reference to `main' > > collect2: ld returned 1 exit status > > > > Thank you in advance! > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Nov 11 12:35:50 2009 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 11 Nov 2009 12:35:50 -0600 (CST) Subject: A problem on compiling a petsc code. In-Reply-To: <1e8c69dc0911111024t6397ff3ai79830c069aad621f@mail.gmail.com> References: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> <1e8c69dc0911111024t6397ff3ai79830c069aad621f@mail.gmail.com> Message-ID: I need the *'complete make output'* from the PETSc example - and your code. This output has the compile commands used [and associated errors]. Satish On Wed, 11 Nov 2009, Yin Feng wrote: > include ${PETSC_DIR}/conf/base > v: a.o b.o > -${CLINKER} -g3 -O0 -o v a.o b.o ${PETSC_SNES_LIB} > > and I compiled ex2 successfully. > > Thank you! > > > On Wed, Nov 11, 2009 at 12:49 AM, Satish Balay wrote: > > > Please send the complete compile command that gave this error. > > > > Also do PETSc examples compile/run correctly? What do you get for: > > > > cd src/ksp/ksp/examples/tutorials > > make ex2 > > make ex2f > > > > Satish > > > > > > On Tue, 10 Nov 2009, Yin Feng wrote: > > > > > When I compiled a PETSc code, I encountered the following error; > > > the compiler is mpicc: > > > > > > > > /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/crt1.o(.text+0x21): > > > In function `_start': > > > : undefined reference to `main' > > > collect2: ld returned 1 exit status > > > > > > Thank you in advance! > > > > > > > > From jarunan at ascomp.ch Thu Nov 12 05:21:43 2009 From: jarunan at ascomp.ch (jarunan at ascomp.ch) Date: Thu, 12 Nov 2009 12:21:43 +0100 Subject: PetscScalar, PetscReal In-Reply-To: References: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> <1e8c69dc0911111024t6397ff3ai79830c069aad621f@mail.gmail.com> Message-ID: <20091112122143.s3eiipe0co4o08o8@webmail.ascomp.ch> Hello, I would like to ask about PetscScalar. - What is PetscScalar by default? (A double-precision real number?) - Does PetscReal represent a double-precision real number? - Does PetscScalar need more memory than PetscReal? By the way, in the PETSc commands I cannot use an integer or a normal real array from Fortran. I have to convert every variable I pass to a PETSc command to PetscInt, PetscReal or PetscScalar. Is it meant to be so? In the last version, I used to be able to use common integers or reals in PETSc code. Best regards, Jarunan From jed at 59A2.org Thu Nov 12 05:33:45 2009 From: jed at 59A2.org (Jed Brown) Date: Thu, 12 Nov 2009 12:33:45 +0100 Subject: PetscScalar, PetscReal In-Reply-To: <20091112122143.s3eiipe0co4o08o8@webmail.ascomp.ch> References: <1e8c69dc0911101953p5f495222j1771b431432c5a51@mail.gmail.com> <1e8c69dc0911111024t6397ff3ai79830c069aad621f@mail.gmail.com> <20091112122143.s3eiipe0co4o08o8@webmail.ascomp.ch> Message-ID: <4AFBF299.8090107@59A2.org> jarunan at ascomp.ch wrote: > > Hello, > > I would like to ask about PetscScalar. > - What is PetscScalar by default? (A double-precision real number?) > - Does PetscReal represent a double-precision real number? > - Does PetscScalar need more memory than PetscReal? By default they are both double precision reals and thus use the same amount of memory. When built with complex, PetscScalar is complex, hence takes twice as much space. You should use Scalar for all "state" variables; Real is for parameters or time that only make sense as reals.
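To make the distinction concrete, a small C illustration (assuming an ordinary real, double-precision build; in a complex build dot becomes complex while norm and dt stay real):

  PetscScalar    dot;      /* state-like quantity: may become complex */
  PetscReal      norm, dt; /* intrinsically real: norms, tolerances, time */
  PetscErrorCode ierr;

  ierr = VecDot(x,y,&dot);CHKERRQ(ierr);       /* result is a PetscScalar */
  ierr = VecNorm(x,NORM_2,&norm);CHKERRQ(ierr); /* a norm is always a PetscReal */
  dt   = 0.1;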
> By the way, in the PETSc commands I cannot use an integer or a normal real > array from Fortran. I have to convert every variable I pass to a PETSc > command to PetscInt, PetscReal or PetscScalar. Is it meant to be so? In > the last version, I used to be able to use common integers or reals in > PETSc code. Can you be more specific about what no longer works? Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From dominik at itis.ethz.ch Sat Nov 14 14:42:01 2009 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Sat, 14 Nov 2009 21:42:01 +0100 Subject: write_line error Message-ID: <4AFF1619.6060006@itis.ethz.ch> My application exits with the last displayed lines: [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 key=P1-businesscard : system msg for write_line failure : Bad file descriptor [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 key=P1-businesscard : system msg for write_line failure : Bad file descriptor leaving me with no clue whatsoever what goes on here. Any directions are highly appreciated. Dominik From jed at 59A2.org Sat Nov 14 14:50:27 2009 From: jed at 59A2.org (Jed Brown) Date: Sat, 14 Nov 2009 21:50:27 +0100 Subject: write_line error In-Reply-To: <4AFF1619.6060006@itis.ethz.ch> References: <4AFF1619.6060006@itis.ethz.ch> Message-ID: <4AFF1813.8030504@59A2.org> Dominik Szczerba wrote: > My application exits with the last displayed lines: > > [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 > key=P1-businesscard > : > system msg for write_line failure : Bad file descriptor > [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 > key=P1-businesscard > : > system msg for write_line failure : Bad file descriptor > > > > leaving me with no clue whatsoever what goes on here. Any directions are > highly appreciated. Maybe this is related? http://trac.mcs.anl.gov/projects/mpich2/ticket/907 Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From dominik at itis.ethz.ch Sat Nov 14 14:57:01 2009 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Sat, 14 Nov 2009 21:57:01 +0100 Subject: write_line error In-Reply-To: <4AFF1813.8030504@59A2.org> References: <4AFF1619.6060006@itis.ethz.ch> <4AFF1813.8030504@59A2.org> Message-ID: <4AFF199D.5030509@itis.ethz.ch> Yes I saw that post in google before but I was not able to conclude much for myself... Interesting is it happens after all the usual (success) messages, e.g.
------------------------------------------ Using C linker: /home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -g3 Using Fortran linker: /home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/bin/mpif90 -Wall -Wno-unused-variable -g Using libraries: -Wl,-rpath,/home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/lib -L/home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/lib -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -Wl,-rpath,/home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/lib -L/home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/lib -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -lrt -L/home/domel/pack/petsc-3.0.0-p9/linux-gnu-c-debug/lib -L/usr/lib/gcc/i486-linux-gnu/4.3.3 -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortranbegin -lgfortran -lm -L/usr/lib/gcc/i486-linux-gnu -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl ------------------------------------------ Jed Brown wrote: > Dominik Szczerba wrote: >> My application exits with the last displayed lines: >> >> [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 >> key=P1-businesscard >> : >> system msg for write_line failure : Bad file descriptor >> [cli_0]: write_line error; fd=8 buf=:cmd=get kvsname=kvs_7282_0 >> key=P1-businesscard >> : >> system msg for write_line failure : Bad file descriptor >> >> >> >> leaving me with no clue whatsoever what goes on here. Any directions are >> highly appreciated. > > Maybe this is related? > > http://trac.mcs.anl.gov/projects/mpich2/ticket/907 > > Jed > From jed at 59A2.org Sat Nov 14 15:00:42 2009 From: jed at 59A2.org (Jed Brown) Date: Sat, 14 Nov 2009 22:00:42 +0100 Subject: write_line error In-Reply-To: <4AFF199D.5030509@itis.ethz.ch> References: <4AFF1619.6060006@itis.ethz.ch> <4AFF1813.8030504@59A2.org> <4AFF199D.5030509@itis.ethz.ch> Message-ID: <4AFF1A7A.6020405@59A2.org> Dominik Szczerba wrote: > Yes I saw that post in google before but I was not able to conclude much > for myself... > > Interesting is it happens after all the usual (success) messages, e.g. That is because it is produced in MPI_Finalize() which PETSc only calls after it is done with everything else (including the output you cite). You can initialize MPI yourself or set a breakpoint on MPI_Finalize to confirm, but none of this output is produced by PETSc. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: From dominik at itis.ethz.ch Sat Nov 14 15:14:45 2009 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Sat, 14 Nov 2009 22:14:45 +0100 Subject: write_line error In-Reply-To: <4AFF1A7A.6020405@59A2.org> References: <4AFF1619.6060006@itis.ethz.ch> <4AFF1813.8030504@59A2.org> <4AFF199D.5030509@itis.ethz.ch> <4AFF1A7A.6020405@59A2.org> Message-ID: <4AFF1DC5.9020303@itis.ethz.ch> OK that was very useful: due to some hasty changes I was calling MPI_Finalize without MPI_Init (calling only PetscInitialize and PetscFinalize). Thanks a lot! Jed Brown wrote: > Dominik Szczerba wrote: >> Yes I saw that post in google before but I was not able to conclude much >> for myself... >> >> Interesting is it happens after all the usual (success) messages, e.g. > > That is because it is produced in MPI_Finalize() which PETSc only calls > after it is done with everything else (including the output you cite). 
> You can initialize MPI yourself or set a breakpoint on MPI_Finalize to > confirm, but none of this output is produced by PETSc. > > Jed > From dominik at itis.ethz.ch Sat Nov 14 15:32:29 2009 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Sat, 14 Nov 2009 22:32:29 +0100 Subject: malloc(): memory corruption: Message-ID: <4AFF21ED.5080106@itis.ethz.ch> Now for something more serious: I get a crash like this one: Starting KSPSolve (1/2) 0 KSP Residual norm 2.964538623545e-06 *** glibc detected *** /home/domel/build/solve-debug/ns3t10mpi: malloc(): memory corruption: 0x09258008 *** ======= Backtrace: ========= /lib/tls/i686/cmov/libc.so.6[0x5f9ff1] /lib/tls/i686/cmov/libc.so.6[0x5fcbb3] /lib/tls/i686/cmov/libc.so.6(__libc_calloc+0xa9)[0x5fe009] /home/domel/build/solve-debug/ns3t10mpi(hypre_CAlloc+0x2c)[0x8b4ea28] /home/domel/build/solve-debug/ns3t10mpi(hypre_BoomerAMGCoarsenRuge+0xb5)[0x8af2c7b] (and so on) gdb invoked as: mpiexec -np 2 ..... -on_error_attach_debugger -display :0.0 does not display any backtrace after the crash. Any hints how to debug are highly appreciated. Dominik From knepley at gmail.com Sat Nov 14 15:45:00 2009 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 14 Nov 2009 15:45:00 -0600 Subject: malloc(): memory corruption: In-Reply-To: <4AFF21ED.5080106@itis.ethz.ch> References: <4AFF21ED.5080106@itis.ethz.ch> Message-ID: Try valgrind. Matt On Sat, Nov 14, 2009 at 3:32 PM, Dominik Szczerba wrote: > Now for something more serious: I get a crash like this one: > > Starting KSPSolve (1/2) > 0 KSP Residual norm 2.964538623545e-06 > *** glibc detected *** /home/domel/build/solve-debug/ns3t10mpi: malloc(): > memory corruption: 0x09258008 *** > ======= Backtrace: ========= > /lib/tls/i686/cmov/libc.so.6[0x5f9ff1] > /lib/tls/i686/cmov/libc.so.6[0x5fcbb3] > /lib/tls/i686/cmov/libc.so.6(__libc_calloc+0xa9)[0x5fe009] > /home/domel/build/solve-debug/ns3t10mpi(hypre_CAlloc+0x2c)[0x8b4ea28] > > /home/domel/build/solve-debug/ns3t10mpi(hypre_BoomerAMGCoarsenRuge+0xb5)[0x8af2c7b] > (and so on) > > gdb invoked as: > > mpiexec -np 2 ..... -on_error_attach_debugger -display :0.0 > > does not display any backtrace after the crash. > > Any hints how to debug are highly appreciated. > > Dominik > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominik at itis.ethz.ch Sat Nov 14 15:51:32 2009 From: dominik at itis.ethz.ch (Dominik Szczerba) Date: Sat, 14 Nov 2009 22:51:32 +0100 Subject: malloc(): memory corruption: In-Reply-To: References: <4AFF21ED.5080106@itis.ethz.ch> Message-ID: <4AFF2664.20202@itis.ethz.ch> run onlu in single, he says things like below - but does not crash. Also, the program run with -np 1 does not crash. No clear idea though about valgrind's output, please advise if this tells you anything... 
Call from NS3T10::createSolverContexts() referenced therein is: ierr = KSPCreate(petsc_comm,&kspSchurVelocity);CHKERRQ(ierr); ==2605== Conditional jump or move depends on uninitialised value(s) ==2605== at 0x8AE720F: hypre_BoomerAMGSetPlotFileName (par_amg.c:2115) ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) ==2605== by 0x80E67BB: NS3T10::createSolverContexts() (NS3T10mpi.cxx:1980) ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) ==2605== ==2605== Conditional jump or move depends on uninitialised value(s) ==2605== at 0x8AE7244: hypre_BoomerAMGSetPlotFileName (par_amg.c:2120) ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) ==2605== by 0x80E67BB: NS3T10::createSolverContexts() (NS3T10mpi.cxx:1980) ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) ==2605== ==2605== Conditional jump or move depends on uninitialised value(s) ==2605== at 0x4025C16: strcpy (mc_replace_strmem.c:303) ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName (par_amg.c:2123) ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) ==2605== by 0x80E67BB: NS3T10::createSolverContexts() (NS3T10mpi.cxx:1980) ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) ==2605== ==2605== Conditional jump or move depends on uninitialised value(s) ==2605== at 0x4025C35: strcpy (mc_replace_strmem.c:303) ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName (par_amg.c:2123) ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) ==2605== by 0x80E67BB: NS3T10::createSolverContexts() (NS3T10mpi.cxx:1980) ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) ==2605== Solver contexts created in 2.520000 s Starting KSPSolve (0/1) 0 KSP Residual norm 8.368803253774e-06 ==2605== Invalid read of size 8 ==2605== at 0x8B23B5A: hypre_BoomerAMGCreateS (par_strength.c:223) ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) (NS3T10mpi.cxx:3741) ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) ==2605== Address 0xafae5d0 is 0 bytes after a block of size 93,488 alloc'd ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) 
==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize (par_csr_matrix.c:200) ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR (IJMatrix_parcsr.c:272) ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== ==2605== Invalid write of size 4 ==2605== at 0x8B23E0C: hypre_BoomerAMGCreateS (par_strength.c:301) ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) (NS3T10mpi.cxx:3741) ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) ==2605== Address 0xb12a050 is 0 bytes after a block of size 46,744 alloc'd ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) ==2605== by 0x8B23980: hypre_BoomerAMGCreateS (par_strength.c:163) ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) (NS3T10mpi.cxx:3741) ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== ... 
==2605== Invalid read of size 8 ==2605== at 0x8B1ACE8: hypre_BoomerAMGRelax (par_relax.c:182) ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF (par_relax_interface.c:110) ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) (NS3T10mpi.cxx:3741) ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== Address 0xafae5d0 is 0 bytes after a block of size 93,488 alloc'd ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize (par_csr_matrix.c:200) ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR (IJMatrix_parcsr.c:272) ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== ... 0 KSP Residual norm 8.368803253774e-06 ==2605== Invalid read of size 8 ==2605== at 0x8B1ADC0: hypre_BoomerAMGRelax (par_relax.c:196) ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF (par_relax_interface.c:110) ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) (NS3T10mpi.cxx:3741) ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) ==2605== by 0x862074E: PCApply (precon.c:357) ==2605== Address 0xcded820 is 0 bytes after a block of size 93,488 alloc'd ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize (par_csr_matrix.c:200) ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR (IJMatrix_parcsr.c:272) ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) ==2605== by 0x86256A9: PCSetUp (precon.c:794) ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) ==2605== Matthew Knepley wrote: > Try valgrind. 
> > Matt > > On Sat, Nov 14, 2009 at 3:32 PM, Dominik Szczerba > wrote: > > Now for something more serious: I get a crash like this one: > > Starting KSPSolve (1/2) > 0 KSP Residual norm 2.964538623545e-06 > *** glibc detected *** /home/domel/build/solve-debug/ns3t10mpi: > malloc(): memory corruption: 0x09258008 *** > ======= Backtrace: ========= > /lib/tls/i686/cmov/libc.so.6[0x5f9ff1] > /lib/tls/i686/cmov/libc.so.6[0x5fcbb3] > /lib/tls/i686/cmov/libc.so.6(__libc_calloc+0xa9)[0x5fe009] > /home/domel/build/solve-debug/ns3t10mpi(hypre_CAlloc+0x2c)[0x8b4ea28] > /home/domel/build/solve-debug/ns3t10mpi(hypre_BoomerAMGCoarsenRuge+0xb5)[0x8af2c7b] > (and so on) > > gdb invoked as: > > mpiexec -np 2 ..... -on_error_attach_debugger -display :0.0 > > does not display any backtrace after the crash. > > Any hints how to debug are highly appreciated. > > Dominik > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From knepley at gmail.com Sat Nov 14 16:21:39 2009 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 14 Nov 2009 16:21:39 -0600 Subject: malloc(): memory corruption: In-Reply-To: <4AFF2664.20202@itis.ethz.ch> References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> Message-ID: This is already bad. You had an Invalid Read and Invalid Write in your Hypre. Did you build it yourself? If so, let us build it. If not, please try your matrix on KSP ex10 and see if you get a crash on 2 procs. Thanks, Matt On Sat, Nov 14, 2009 at 3:51 PM, Dominik Szczerba wrote: > run onlu in single, he says things like below - but does not crash. Also, > the program run with -np 1 does not crash. No clear idea though about > valgrind's output, please advise if this tells you anything... 
> > Call from NS3T10::createSolverContexts() referenced therein is: > > ierr = KSPCreate(petsc_comm,&kspSchurVelocity);CHKERRQ(ierr); > > > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x8AE720F: hypre_BoomerAMGSetPlotFileName (par_amg.c:2115) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x8AE7244: hypre_BoomerAMGSetPlotFileName (par_amg.c:2120) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x4025C16: strcpy (mc_replace_strmem.c:303) > ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName (par_amg.c:2123) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x4025C35: strcpy (mc_replace_strmem.c:303) > ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName (par_amg.c:2123) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > Solver contexts created in 2.520000 s > Starting KSPSolve (0/1) > 0 KSP Residual norm 8.368803253774e-06 > ==2605== Invalid read of size 8 > ==2605== at 0x8B23B5A: hypre_BoomerAMGCreateS (par_strength.c:223) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) > ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) > ==2605== Address 0xafae5d0 is 0 bytes after a block 
of size 93,488 alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > ==2605== Invalid write of size 4 > ==2605== at 0x8B23E0C: hypre_BoomerAMGCreateS (par_strength.c:301) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) > ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) > ==2605== Address 0xb12a050 is 0 bytes after a block of size 46,744 alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B23980: hypre_BoomerAMGCreateS (par_strength.c:163) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== > ... 
> ==2605== Invalid read of size 8 > ==2605== at 0x8B1ACE8: hypre_BoomerAMGRelax (par_relax.c:182) > ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF > (par_relax_interface.c:110) > ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) > ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) > ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) > ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) > ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== Address 0xafae5d0 is 0 bytes after a block of size 93,488 alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > ... > 0 KSP Residual norm 8.368803253774e-06 > ==2605== Invalid read of size 8 > ==2605== at 0x8B1ADC0: hypre_BoomerAMGRelax (par_relax.c:196) > ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF > (par_relax_interface.c:110) > ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) > ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) > ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) > ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) > ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== Address 0xcded820 is 0 bytes after a block of size 93,488 alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > > > > > Matthew Knepley wrote: > >> Try valgrind. 
>>
>> Matt
>>
>> On Sat, Nov 14, 2009 at 3:32 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:
>>
>> Now for something more serious: I get a crash like this one:
>>
>> Starting KSPSolve (1/2)
>> 0 KSP Residual norm 2.964538623545e-06
>> *** glibc detected *** /home/domel/build/solve-debug/ns3t10mpi:
>> malloc(): memory corruption: 0x09258008 ***
>> ======= Backtrace: =========
>> /lib/tls/i686/cmov/libc.so.6[0x5f9ff1]
>> /lib/tls/i686/cmov/libc.so.6[0x5fcbb3]
>> /lib/tls/i686/cmov/libc.so.6(__libc_calloc+0xa9)[0x5fe009]
>> /home/domel/build/solve-debug/ns3t10mpi(hypre_CAlloc+0x2c)[0x8b4ea28]
>> /home/domel/build/solve-debug/ns3t10mpi(hypre_BoomerAMGCoarsenRuge+0xb5)[0x8af2c7b]
>> (and so on)
>>
>> gdb invoked as:
>>
>> mpiexec -np 2 ..... -on_error_attach_debugger -display :0.0
>>
>> does not display any backtrace after the crash.
>>
>> Any hints how to debug are highly appreciated.
>>
>> Dominik
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dominik at itis.ethz.ch  Sat Nov 14 16:41:28 2009
From: dominik at itis.ethz.ch (Dominik Szczerba)
Date: Sat, 14 Nov 2009 23:41:28 +0100
Subject: malloc(): memory corruption:
In-Reply-To:
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch>
Message-ID: <4AFF3218.3070303@itis.ethz.ch>

No, I am using Hypre built automatically along with petsc...
I will try ex10, thanks...

Matthew Knepley wrote:
> This is already bad. You had an Invalid Read and Invalid Write in your
> Hypre. Did you build it yourself? If so, let us build it. If not, please
> try your matrix on KSP ex10 and see if you get a crash on 2 procs.
>
> Thanks,
>
> Matt
>
> On Sat, Nov 14, 2009 at 3:51 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:
>
> run only in single, he says things like below - but does not crash.
> Also, the program run with -np 1 does not crash. No clear idea
> though about valgrind's output, please advise if this tells you
> anything...
> > Call from NS3T10::createSolverContexts() referenced therein is: > > ierr = KSPCreate(petsc_comm,&kspSchurVelocity);CHKERRQ(ierr); > > > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x8AE720F: hypre_BoomerAMGSetPlotFileName > (par_amg.c:2115) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x8AE7244: hypre_BoomerAMGSetPlotFileName > (par_amg.c:2120) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x4025C16: strcpy (mc_replace_strmem.c:303) > ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName > (par_amg.c:2123) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > ==2605== Conditional jump or move depends on uninitialised value(s) > ==2605== at 0x4025C35: strcpy (mc_replace_strmem.c:303) > ==2605== by 0x8AE727A: hypre_BoomerAMGSetPlotFileName > (par_amg.c:2123) > ==2605== by 0x8AE7ED9: hypre_BoomerAMGCreate (par_amg.c:276) > ==2605== by 0x8AE4A71: HYPRE_BoomerAMGCreate (HYPRE_parcsr_amg.c:31) > ==2605== by 0x8562019: PCHYPRESetType_HYPRE (hypre.c:850) > ==2605== by 0x8563068: PCHYPRESetType (hypre.c:964) > ==2605== by 0x80E67BB: NS3T10::createSolverContexts() > (NS3T10mpi.cxx:1980) > ==2605== by 0x80EA63B: NS3T10::solve() (NS3T10mpi.cxx:2306) > ==2605== by 0x8104860: main (ns3t10mpi_main.cxx:1516) > ==2605== > Solver contexts created in 2.520000 s > Starting KSPSolve (0/1) > 0 KSP Residual norm 8.368803253774e-06 > ==2605== Invalid read of size 8 > ==2605== at 0x8B23B5A: hypre_BoomerAMGCreateS (par_strength.c:223) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) > ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) > ==2605== Address 0xafae5d0 is 0 bytes after a 
block of size 93,488 > alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize > (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > ==2605== Invalid write of size 4 > ==2605== at 0x8B23E0C: hypre_BoomerAMGCreateS (par_strength.c:301) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x863AC4C: KSPInitialResidual (itres.c:64) > ==2605== by 0x85EB09A: KSPSolve_GMRES (gmres.c:241) > ==2605== Address 0xb12a050 is 0 bytes after a block of size 46,744 > alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B23980: hypre_BoomerAMGCreateS (par_strength.c:163) > ==2605== by 0x8AE966F: hypre_BoomerAMGSetup (par_amg_setup.c:630) > ==2605== by 0x8AE4A4D: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:58) > ==2605== by 0x855A5D9: PCSetUp_HYPRE (hypre.c:134) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== > ... 
> ==2605== Invalid read of size 8 > ==2605== at 0x8B1ACE8: hypre_BoomerAMGRelax (par_relax.c:182) > ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF > (par_relax_interface.c:110) > ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) > ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) > ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) > ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) > ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== Address 0xafae5d0 is 0 bytes after a block of size 93,488 > alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize > (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > ... > 0 KSP Residual norm 8.368803253774e-06 > ==2605== Invalid read of size 8 > ==2605== at 0x8B1ADC0: hypre_BoomerAMGRelax (par_relax.c:196) > ==2605== by 0x8B1DFBF: hypre_BoomerAMGRelaxIF > (par_relax_interface.c:110) > ==2605== by 0x8AFC310: hypre_BoomerAMGCycle (par_cycle.c:386) > ==2605== by 0x8AEE09E: hypre_BoomerAMGSolve (par_amg_solve.c:252) > ==2605== by 0x8AE4A25: HYPRE_BoomerAMGSolve (HYPRE_parcsr_amg.c:76) > ==2605== by 0x855AAA4: PCApply_HYPRE (hypre.c:172) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== by 0x8606095: KSPSolve_PREONLY (preonly.c:29) > ==2605== by 0x85A85D3: KSPSolve (itfunc.c:385) > ==2605== by 0x80F5B16: applyPrecSchur(void*, _p_Vec*, _p_Vec*) > (NS3T10mpi.cxx:3741) > ==2605== by 0x851C47E: PCApply_Shell (shellpc.c:129) > ==2605== by 0x862074E: PCApply (precon.c:357) > ==2605== Address 0xcded820 is 0 bytes after a block of size 93,488 > alloc'd > ==2605== at 0x4023F5B: calloc (vg_replace_malloc.c:418) > ==2605== by 0x8B4E9C7: hypre_CAlloc (hypre_memory.c:121) > ==2605== by 0x8B4CA67: hypre_CSRMatrixInitialize (csr_matrix.c:91) > ==2605== by 0x8B32EC8: hypre_ParCSRMatrixInitialize > (par_csr_matrix.c:200) > ==2605== by 0x8AE0C44: hypre_IJMatrixInitializeParCSR > (IJMatrix_parcsr.c:272) > ==2605== by 0x8ADBE09: HYPRE_IJMatrixInitialize > (HYPRE_IJMatrix.c:302) > ==2605== by 0x891AD3A: MatHYPRE_IJMatrixFastCopy_SeqAIJ (mhyp.c:174) > ==2605== by 0x891A2E1: MatHYPRE_IJMatrixCopy (mhyp.c:131) > ==2605== by 0x855A445: PCSetUp_HYPRE (hypre.c:130) > ==2605== by 0x86256A9: PCSetUp (precon.c:794) > ==2605== by 0x85A6E62: KSPSetUp (itfunc.c:237) > ==2605== by 0x85A7EAB: KSPSolve (itfunc.c:353) > ==2605== > > > > > Matthew Knepley wrote: > > Try valgrind. 
> > Matt
> > ...

From bsmith at mcs.anl.gov  Sat Nov 14 21:22:28 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 14 Nov 2009 21:22:28 -0600
Subject: malloc(): memory corruption:
In-Reply-To: <4AFF3218.3070303@itis.ethz.ch>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch>
Message-ID:

   If you run under valgrind without the hypre preconditioner but with, say,
bjacobi instead, do you get any valgrind errors?

   The problem you are having could be due to (1) some memory corruption in
your code that is messing up hypre or (2) some bug in hypre that we don't
see with our simple test codes.

   Barry

On Nov 14, 2009, at 4:41 PM, Dominik Szczerba wrote:

> No, I am using Hypre built automatically along with petsc...
> I will try ex10, thanks...
>
> Matthew Knepley wrote:
>> This is already bad. You had an Invalid Read and Invalid Write in
>> your Hypre. Did you build it yourself? If so, let us build it. If not,
>> please try your matrix on KSP ex10 and see if you get a crash on 2 procs.
>> Thanks,
>> Matt
>> On Sat, Nov 14, 2009 at 3:51 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:
>> run only in single, he says things like below - but does not
>> crash.
>> Also, the program run with -np 1 does not crash. No clear idea
>> though about valgrind's output, please advise if this tells you
>> anything...
>> Call from NS3T10::createSolverContexts() referenced therein is:
>>
>> ierr = KSPCreate(petsc_comm,&kspSchurVelocity);CHKERRQ(ierr);
>>
>> [... valgrind output identical to that quoted earlier in this thread ...]
>>
>> Matthew Knepley wrote:
>> Try valgrind.
>> ...
From dominik at itis.ethz.ch  Sun Nov 15 02:24:49 2009
From: dominik at itis.ethz.ch (Dominik Szczerba)
Date: Sun, 15 Nov 2009 09:24:49 +0100
Subject: malloc(): memory corruption:
In-Reply-To:
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch>
Message-ID: <4AFFBAD1.4050107@itis.ethz.ch>

Yes, I have found an error in my matrix... Thank you all for the useful
hints! Still, I wonder if there are more efficient ways to set up bug traps
to get the backtrace pointing at the real problem and not at innocent
parts...

With regards,
Dominik

Barry Smith wrote:
>  If you run under valgrind without the hypre preconditioner but with,
> say, bjacobi instead, do you get any valgrind errors?
>
>  The problem you are having could be due to (1) some memory corruption
> in your code that is messing up hypre or (2) some bug in hypre that we
> don't see with our simple test codes.
>
>  Barry
>
> On Nov 14, 2009, at 4:41 PM, Dominik Szczerba wrote:
>
>> No, I am using Hypre built automatically along with petsc...
>> I will try ex10, thanks...
>> ...
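Dominik's closing question has a generic answer worth sketching here (the
executable name and options below are placeholders for the real ones): run
every MPI rank under valgrind,

    mpiexec -n 2 valgrind --tool=memcheck --leak-check=yes --num-callers=20 \
        ./ns3t10mpi -malloc_debug <usual options>

and, in a PETSc debugging build, place a CHKMEMQ; statement after each
suspect assembly call so the PETSc heap is validated at that point rather
than at the eventual crash. Note that -malloc_debug guards only PETSc's own
allocations; overruns inside hypre, like the ones above, still need valgrind.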
From Chun.SUN at 3ds.com  Mon Nov 16 10:08:29 2009
From: Chun.SUN at 3ds.com (SUN Chun)
Date: Mon, 16 Nov 2009 11:08:29 -0500
Subject: MatMult with nonconforming error
In-Reply-To: <4AFFBAD1.4050107@itis.ethz.ch>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch>
Message-ID: <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds>

Hi,

I was trying to do MatMult with a non-square matrix and a vector. They have
different local dimensions.

For this particular case, my Mat is 12x48 and my Vec is 48x1. When I run in
parallel with 2 cores, I have the Mat partitioned by row in 12+0, and I have
the Vec partitioned by row in 18+30. When I perform MatMult, I get:

[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Nonconforming object sizes!
[0]PETSC ERROR: Incompatible partition of A (24) and xx (18)!
[0]PETSC ERROR: ------------------------------------------------------------------------

Is it required to partition the Vec and Mat such that my Vec's row partition
agrees with my Mat's column partition? If so, is there any way to get around
this?

Thanks,
Chun

From bsmith at mcs.anl.gov  Mon Nov 16 10:35:19 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 16 Nov 2009 10:35:19 -0600
Subject: MatMult with nonconforming error
In-Reply-To: <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds>
Message-ID:

   With y = A x the row partition of A MUST match the row partition of y
and the column partition of A MUST match the row partition of x.

   There is no avoiding this,

   Barry

On Nov 16, 2009, at 10:08 AM, SUN Chun wrote:

> Hi,
>
> I was trying to do MatMult with a non-square matrix and a vector. They
> have different local dimensions.
> ...
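A sketch of the mechanical way to satisfy this, assuming a Mat A that is
already assembled (the routine is MatGetVecs() in PETSc releases of this
period; later releases rename it MatCreateVecs()):

    Vec x, y;
    /* x inherits the column layout of A, y inherits the row layout */
    ierr = MatGetVecs(A,&x,&y);CHKERRQ(ierr);
    /* ... fill x with VecSetValues()/VecAssemblyBegin()/End() ... */
    ierr = MatMult(A,x,y);CHKERRQ(ierr);

Because both vectors take their parallel layouts from A, the partitions
conform by construction and the "Nonconforming object sizes" error cannot
occur.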
From dalcinl at gmail.com  Mon Nov 16 11:00:28 2009
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 16 Nov 2009 15:00:28 -0200
Subject: MatMult with nonconforming error
In-Reply-To:
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds>
Message-ID:

On Mon, Nov 16, 2009 at 2:35 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>  With y = A x the row partition of A MUST match the row partition of y
> and the column partition of A MUST match the row partition of x.
>
>  There is no avoiding this,
>

But it can be worked around using a VecScatter, right?

> ...

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From bsmith at mcs.anl.gov  Mon Nov 16 12:00:31 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 16 Nov 2009 12:00:31 -0600
Subject: MatMult with nonconforming error
In-Reply-To:
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds>
Message-ID: <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov>

On Nov 16, 2009, at 11:00 AM, Lisandro Dalcin wrote:

> But it can be worked around using a VecScatter, right?

   Well since PETSc is just a library of routines one can do anything they
want with it. So yes, one could include a VecScatter to get things into the
right shape, but it would be a bit cumbersome.

   Barry

> ...
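For completeness, a sketch of the workaround Barry calls cumbersome. Here
x_user is a hypothetical vector whose layout does not conform to A's
columns, y is the result vector from the previous sketch, and the argument
order follows petsc-3.0:

    Vec        xconf;
    VecScatter scatter;
    IS         is;
    PetscInt   rstart,rend;

    ierr = MatGetVecs(A,&xconf,PETSC_NULL);CHKERRQ(ierr);  /* column layout of A */
    ierr = VecGetOwnershipRange(xconf,&rstart,&rend);CHKERRQ(ierr);
    /* identity map: global entry i of x_user -> global entry i of xconf */
    ierr = ISCreateStride(PETSC_COMM_WORLD,rend-rstart,rstart,1,&is);CHKERRQ(ierr);
    ierr = VecScatterCreate(x_user,is,xconf,is,&scatter);CHKERRQ(ierr);
    ierr = VecScatterBegin(scatter,x_user,xconf,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterEnd(scatter,x_user,xconf,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = MatMult(A,xconf,y);CHKERRQ(ierr);

The extra vector, index set, and communication phase are exactly the
cumbersome parts Barry mentions.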
From Chun.SUN at 3ds.com  Tue Nov 17 11:00:48 2009
From: Chun.SUN at 3ds.com (SUN Chun)
Date: Tue, 17 Nov 2009 12:00:48 -0500
Subject: matlab viewer variable name
In-Reply-To: <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov>
Message-ID: <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds>

Hi,

When I MatView or VecView with the matlab format, I am automatically
assigned a variable name like "Mat_0" or "Vec_1". Which function can I use
to change this value?

Sorry, I had a feeling I had seen this question asked before. I spent 30 min
searching and didn't find it...

Thanks,
Chun

From jed at 59A2.org  Tue Nov 17 11:06:50 2009
From: jed at 59A2.org (Jed Brown)
Date: Tue, 17 Nov 2009 18:06:50 +0100
Subject: matlab viewer variable name
In-Reply-To: <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds>
Message-ID: <4B02D82A.1080207@59A2.org>

SUN Chun wrote:
> Hi,
>
> When I MatView or VecView with the matlab format, I am automatically
> assigned a variable name like "Mat_0" or "Vec_1". Which function can I
> use to change this value?

PetscObjectSetName((PetscObject)yourvec,"Name_of_your_vec");

Jed

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
URL:
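A sketch of the whole pattern (identifier and file names here are
illustrative; PetscViewerDestroy() takes the viewer by value in PETSc
releases of this period):

    Vec         x;
    PetscViewer viewer;
    /* ... create and fill x ... */
    ierr = PetscObjectSetName((PetscObject)x,"pressure");CHKERRQ(ierr);
    ierr = PetscViewerASCIIOpen(PETSC_COMM_WORLD,"pressure.m",&viewer);CHKERRQ(ierr);
    ierr = PetscViewerSetFormat(viewer,PETSC_VIEWER_ASCII_MATLAB);CHKERRQ(ierr);
    ierr = VecView(x,viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);

Running pressure.m in MATLAB then defines a variable named pressure instead
of Vec_0. The same call works for a Mat.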
From bsmith at mcs.anl.gov  Tue Nov 17 11:06:57 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 17 Nov 2009 11:06:57 -0600
Subject: matlab viewer variable name
In-Reply-To: <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds>
Message-ID: <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov>

   PetscObjectSetName((PetscObject)x, "myname");

On Nov 17, 2009, at 11:00 AM, SUN Chun wrote:

> Hi,
>
> When I MatView or VecView with the matlab format, I am automatically
> assigned a variable name like "Mat_0" or "Vec_1". Which function can I
> use to change this value?
> ...

From jarunan at ascomp.ch  Wed Nov 18 02:27:30 2009
From: jarunan at ascomp.ch (jarunan at ascomp.ch)
Date: Wed, 18 Nov 2009 09:27:30 +0100
Subject: scaling in 4-core machine
In-Reply-To: <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov>
Message-ID: <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch>

Hello,

I have read the topic about the performance of a machine with 2 dual-core
chips, where it is written that with -np 2 it should scale the best. I would
like to ask about a 4-core machine.

I ran the test on a quad-core machine with mpiexec -n 1, 2 and 4 to see the
parallel scaling. The CPU times of the test are:

Solver/Precond/Sub_Precond

gmres/bjacobi/ilu

-n 1, 1917.5730 sec,
-n 2, 1699.9490 sec, efficiency = 56.40%
-n 4, 1661.6810 sec, efficiency = 28.86%

bicgstab/asm/ilu

-n 1, 1800.8380 sec,
-n 2, 1415.0170 sec, efficiency = 63.63%
-n 4, 1119.3480 sec, efficiency = 40.22%

Why is the scaling so low, especially with option -n 4? Would it be expected
to be better running with 4 real CPUs instead of one quad-core chip?

Regards,
Jarunan

--
Jarunan Panyasantisuk
Development Engineer
ASCOMP GmbH, Technoparkstr. 1
CH-8005 Zurich, Switzerland
Phone : +41 44 445 4072
Fax   : +41 44 445 4075
E-mail: jarunan at ascomp.ch
www.ascomp.ch

From zonexo at gmail.com  Wed Nov 18 02:45:12 2009
From: zonexo at gmail.com (Wee-Beng Tay)
Date: Wed, 18 Nov 2009 16:45:12 +0800
Subject: Using PETSc with Silverfrost FTN95 for windows
Message-ID: <804ab5d40911180045u1ee90bc2o3a1c69b40fe7080e@mail.gmail.com>

Hi,

Has anyone managed to use PETSc successfully with Silverfrost FTN95 for
Windows? I didn't manage to google any info regarding this. Silverfrost
FTN95 seems like a good free alternative Fortran compiler on Windows.

Are these possible:

1. Using a pre-compiled PETSc library with Silverfrost FTN95 - supposing
that the library is precompiled using CVF, Intel, or Visual Studio.

2. Compiling the PETSc library using Silverfrost FTN95 and another C
compiler for Windows.

Thanks a lot!

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jed at 59A2.org  Wed Nov 18 04:13:05 2009
From: jed at 59A2.org (Jed Brown)
Date: Wed, 18 Nov 2009 11:13:05 +0100
Subject: scaling in 4-core machine
In-Reply-To: <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov> <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch>
Message-ID: <4B03C8B1.2080305@59A2.org>

jarunan at ascomp.ch wrote:
>
> Hello,
>
> I have read the topic about the performance of a machine with 2 dual-core
> chips, where it is written that with -np 2 it should scale the best. I
> would like to ask about a 4-core machine.
>
> I ran the test on a quad-core machine with mpiexec -n 1, 2 and 4 to see
> the parallel scaling. The CPU times of the test are:
>
> Solver/Precond/Sub_Precond
>
> gmres/bjacobi/ilu
>
> -n 1, 1917.5730 sec,
> -n 2, 1699.9490 sec, efficiency = 56.40%
> -n 4, 1661.6810 sec, efficiency = 28.86%
>
> bicgstab/asm/ilu
>
> -n 1, 1800.8380 sec,
> -n 2, 1415.0170 sec, efficiency = 63.63%
> -n 4, 1119.3480 sec, efficiency = 40.22%

These numbers are worthless without at least knowing iteration counts.

> Why is the scaling so low, especially with option -n 4? Would it be
> expected to be better running with 4 real CPUs instead of one quad-core
> chip?

4 sockets using a single core each (4x1) will generally do better than
2x2 or 1x4, but 4x4 costs about the same as 4x1 these days. This is a
very common question; the answer is that a single floating point unit is
about 10 times faster than memory for the sort of operations that we do
when solving PDEs. You don't get another memory bus every time you add a
core, so the ratio becomes worse. More cores are not a complete loss
because at least you get an extra L1 cache for each core, but sparse
matrix and vector kernels are atrocious at reusing cache (there's not
much to reuse because most values are only needed to perform one
operation).

Getting better multicore performance requires changing the algorithms to
better reuse L1 cache. This means moving away from assembled matrices
where possible and of course finding good preconditioners. High-order
and fast multipole methods are good for this. But it's very much an
open problem, and unless you want to do research in the field, you have
to live with poor multicore performance.

When buying hardware, remember that you are buying memory bandwidth (and
a low-latency network) instead of floating point units.

Jed

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
URL:
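Jed's caveat about iteration counts can be checked without code changes;
these are standard runtime options in PETSc of this period:

    mpiexec -n 4 ./app -ksp_monitor -ksp_converged_reason -log_summary

Block Jacobi and ASM with ILU both weaken as subdomains are added, so the
iteration count typically grows with -n; time per iteration, read off the
-log_summary output, is a fairer basis for a scaling comparison than total
solve time.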
From balay at mcs.anl.gov  Wed Nov 18 09:09:18 2009
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 18 Nov 2009 09:09:18 -0600 (CST)
Subject: Using PETSc with Silverfrost FTN95 for windows
In-Reply-To: <804ab5d40911180045u1ee90bc2o3a1c69b40fe7080e@mail.gmail.com>
References: <804ab5d40911180045u1ee90bc2o3a1c69b40fe7080e@mail.gmail.com>
Message-ID:

Nope - this compiler won't work with PETSc. Someone would need to fix
win32fe to work with this compiler.

Satish

On Wed, 18 Nov 2009, Wee-Beng Tay wrote:

> Hi,
>
> Has anyone managed to use PETSc successfully with Silverfrost FTN95 for
> Windows? I didn't manage to google any info regarding this. Silverfrost
> FTN95 seems like a good free alternative Fortran compiler on Windows.
> ...

From balay at mcs.anl.gov  Wed Nov 18 09:14:19 2009
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 18 Nov 2009 09:14:19 -0600 (CST)
Subject: scaling in 4-core machine
In-Reply-To: <4B03C8B1.2080305@59A2.org>
References: <4AFF21ED.5080106@itis.ethz.ch> <4AFF2664.20202@itis.ethz.ch> <4AFF3218.3070303@itis.ethz.ch> <4AFFBAD1.4050107@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov> <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch> <4B03C8B1.2080305@59A2.org>
Message-ID:

Just want to add one more point to this.

Most multicore machines do not provide scalable hardware. [yeah - the
FPU cores are scalable - but the memory subsystem is not]. So one
should not expect scalable performance out of them. You should take
the 'max' performance you can get out of them - and then look for
scalability with multiple nodes.

Satish

On Wed, 18 Nov 2009, Jed Brown wrote:

> jarunan at ascomp.ch wrote:
> ...
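Satish's point about the memory subsystem is easy to demonstrate outside
PETSc. A rough sketch of a STREAM-style triad (not the real benchmark;
the array size is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc,char **argv)
    {
      const int n = 5000000;                 /* 3 arrays x 40 MB per process */
      double *a = malloc(n*sizeof(double));
      double *b = malloc(n*sizeof(double));
      double *c = malloc(n*sizeof(double));
      double t0,t1;
      int i;
      MPI_Init(&argc,&argv);
      for (i=0; i<n; i++) { b[i] = 1.0; c[i] = 2.0; }
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i=0; i<n; i++) a[i] = b[i] + 3.0*c[i];   /* 2 flops, 24 bytes moved */
      t1 = MPI_Wtime();
      printf("per-process triad bandwidth: %.0f MB/s\n",24.0*n/(t1-t0)/1.0e6);
      free(a); free(b); free(c);
      MPI_Finalize();
      return 0;
    }

Run it with -n 1, 2, and 4 on the quad-core box: the per-process bandwidth
drops as processes are added because they share one memory bus, which is
the same ceiling the KSP solves hit.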
From aron.ahmadia at kaust.edu.sa  Wed Nov 18 09:26:48 2009
From: aron.ahmadia at kaust.edu.sa (Aron Ahmadia)
Date: Wed, 18 Nov 2009 18:26:48 +0300
Subject: scaling in 4-core machine
In-Reply-To:
References: <4AFF21ED.5080106@itis.ethz.ch> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E8@CORP-CLT-EXB01.ds> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov> <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch> <4B03C8B1.2080305@59A2.org>
Message-ID: <74e91d510911180726w56df1797p7984f6f9f95d7035@mail.gmail.com>

Does anybody have good references in the literature analyzing the memory
access patterns for sparse solvers and how they scale? I remember seeing
Barry's talk about multigrid memory access patterns, but I'm not sure if
I've ever seen a good paper reference.

Cheers,
Aron

On Wed, Nov 18, 2009 at 6:14 PM, Satish Balay <balay at mcs.anl.gov> wrote:

> Just want to add one more point to this.
>
> Most multicore machines do not provide scalable hardware. [yeah - the
> FPU cores are scalable - but the memory subsystem is not]. So one
> should not expect scalable performance out of them.
> ...
But it's very much an > > open problem and unless you want to do research in the field, you have > > to live with poor multicore performance. > > > > When buying hardware, remember that you are buying memory bandwidth (and > > a low-latency network) instead of floating point units. > > > > Jed > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amjad11 at gmail.com Wed Nov 18 11:47:07 2009 From: amjad11 at gmail.com (amjad ali) Date: Wed, 18 Nov 2009 12:47:07 -0500 Subject: scaling in 4-core machine In-Reply-To: <74e91d510911180726w56df1797p7984f6f9f95d7035@mail.gmail.com> References: <4AFF21ED.5080106@itis.ethz.ch> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov> <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch> <4B03C8B1.2080305@59A2.org> <74e91d510911180726w56df1797p7984f6f9f95d7035@mail.gmail.com> Message-ID: <428810f20911180947i723b4a24vcabf72b12adc266e@mail.gmail.com> Hi, Aron, Can you please give link of Barry's talk about multigrid memory access patterns (u just mentioned)? thanks On Wed, Nov 18, 2009 at 10:26 AM, Aron Ahmadia wrote: > Does anybody have good references in the literature analyzing the memory > access patterns for sparse solvers and how they scale? I remember seeing > Barry's talk about multigrid memory access patterns, but I'm not sure if > I've ever seen a good paper reference. > > Cheers, > Aron > > > On Wed, Nov 18, 2009 at 6:14 PM, Satish Balay wrote: > >> Just want to add one more point to this. >> >> Most multicore machines do not provide scalable hardware. [yeah - the >> FPUs cores are scalable - but the memory subsystem is not]. So one >> should not expect scalable performance out of them. You should take >> the 'max' performance you can get out out them - and then look for >> scalability with multiple nodes. >> >> Satish >> >> On Wed, 18 Nov 2009, Jed Brown wrote: >> >> > jarunan at ascomp.ch wrote: >> > > >> > > Hello, >> > > >> > > I have read the topic about performance of a machine with 2 dual-core >> > > chips, and it is written that with -np 2 it should scale the best. I >> > > would like to ask about 4-core machine. >> > > >> > > I run the test on a quad core machine with mpiexec -n 1, 2 and 4 to >> see >> > > the parallel scaling. The cpu times of the test are: >> > > >> > > Solver/Precond/Sub_Precond >> > > >> > > gmres/bjacobi/ilu >> > > >> > > -n 1, 1917.5730 sec, >> > > -n 2, 1699.9490 sec, efficiency = 56.40% >> > > -n 4, 1661.6810 sec, efficiency = 28.86% >> > > >> > > bicgstab/asm/ilu >> > > >> > > -n 1, 1800.8380 sec, >> > > -n 2, 1415.0170 sec, efficiency = 63.63% >> > > -n 4, 1119.3480 sec, efficiency = 40.22% >> > >> > These numbers are worthless without at least knowing iteration counts. >> > >> > > Why is the scaling so low, especially with option -n 4? >> > > Would it be expected to be better running with real 4 CPU's instead of >> a >> > > quad core ship? >> > >> > 4 sockets using a single core each (4x1) will generally do better than >> > 2x2 or 1x4, but 4x4 costs about the same as 4x1 these days. This is a >> > very common question, the answer is that a single floating point unit is >> > about 10 times faster than memory for the sort of operations that we do >> > when solving PDE. You don't get another memory bus every time you add a >> > core so the ratio becomes worse. 
More cores are not a complete loss >> > because at least you get an extra L1 cache for each core, but sparse >> > matrix and vector kernels are atrocious at reusing cache (there's not >> > much to reuse because most values are only needed to perform one >> > operation). >> > >> > Getting better multicore performance requires changing the algorithms to >> > better reuse L1 cache. This means moving away from assembled matrices >> > where possible and of course finding good preconditioners. High-order >> > and fast multipole methods are good for this. But it's very much an >> > open problem and unless you want to do research in the field, you have >> > to live with poor multicore performance. >> > >> > When buying hardware, remember that you are buying memory bandwidth (and >> > a low-latency network) instead of floating point units. >> > >> > Jed >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 18 14:18:27 2009 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Nov 2009 14:18:27 -0600 Subject: scaling in 4-core machine In-Reply-To: <74e91d510911180726w56df1797p7984f6f9f95d7035@mail.gmail.com> References: <4AFF21ED.5080106@itis.ethz.ch> <16E85BAA-87E4-4D27-96EF-24B77EBE0859@mcs.anl.gov> <2545DC7A42DF804AAAB2ADA5043D57DA28E4E9@CORP-CLT-EXB01.ds> <8E0BED3A-55F7-4F6E-BAAA-4EDADB1C9ACA@mcs.anl.gov> <20091118092730.zxzn2ru7nogk8g8c@webmail.ascomp.ch> <4B03C8B1.2080305@59A2.org> <74e91d510911180726w56df1797p7984f6f9f95d7035@mail.gmail.com> Message-ID: There is also the paper by Barry, Bill, David, and Dinesh about SpMV. Its very good. That is what I base my slides on. You can see the punchline in the tutorial slides. Matt On Wed, Nov 18, 2009 at 9:26 AM, Aron Ahmadia wrote: > Does anybody have good references in the literature analyzing the memory > access patterns for sparse solvers and how they scale? I remember seeing > Barry's talk about multigrid memory access patterns, but I'm not sure if > I've ever seen a good paper reference. > > Cheers, > Aron > > > On Wed, Nov 18, 2009 at 6:14 PM, Satish Balay wrote: > >> Just want to add one more point to this. >> >> Most multicore machines do not provide scalable hardware. [yeah - the >> FPUs cores are scalable - but the memory subsystem is not]. So one >> should not expect scalable performance out of them. You should take >> the 'max' performance you can get out out them - and then look for >> scalability with multiple nodes. >> >> Satish >> >> On Wed, 18 Nov 2009, Jed Brown wrote: >> >> > jarunan at ascomp.ch wrote: >> > > >> > > Hello, >> > > >> > > I have read the topic about performance of a machine with 2 dual-core >> > > chips, and it is written that with -np 2 it should scale the best. I >> > > would like to ask about 4-core machine. >> > > >> > > I run the test on a quad core machine with mpiexec -n 1, 2 and 4 to >> see >> > > the parallel scaling. The cpu times of the test are: >> > > >> > > Solver/Precond/Sub_Precond >> > > >> > > gmres/bjacobi/ilu >> > > >> > > -n 1, 1917.5730 sec, >> > > -n 2, 1699.9490 sec, efficiency = 56.40% >> > > -n 4, 1661.6810 sec, efficiency = 28.86% >> > > >> > > bicgstab/asm/ilu >> > > >> > > -n 1, 1800.8380 sec, >> > > -n 2, 1415.0170 sec, efficiency = 63.63% >> > > -n 4, 1119.3480 sec, efficiency = 40.22% >> > >> > These numbers are worthless without at least knowing iteration counts. >> > >> > > Why is the scaling so low, especially with option -n 4? 
>> > > Would it be expected to be better running with real 4 CPU's instead of >> a >> > > quad core ship? >> > >> > 4 sockets using a single core each (4x1) will generally do better than >> > 2x2 or 1x4, but 4x4 costs about the same as 4x1 these days. This is a >> > very common question, the answer is that a single floating point unit is >> > about 10 times faster than memory for the sort of operations that we do >> > when solving PDE. You don't get another memory bus every time you add a >> > core so the ratio becomes worse. More cores are not a complete loss >> > because at least you get an extra L1 cache for each core, but sparse >> > matrix and vector kernels are atrocious at reusing cache (there's not >> > much to reuse because most values are only needed to perform one >> > operation). >> > >> > Getting better multicore performance requires changing the algorithms to >> > better reuse L1 cache. This means moving away from assembled matrices >> > where possible and of course finding good preconditioners. High-order >> > and fast multipole methods are good for this. But it's very much an >> > open problem and unless you want to do research in the field, you have >> > to live with poor multicore performance. >> > >> > When buying hardware, remember that you are buying memory bandwidth (and >> > a low-latency network) instead of floating point units. >> > >> > Jed >> > >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hxie at umn.edu Wed Nov 18 14:19:59 2009 From: hxie at umn.edu (hxie at umn.edu) Date: 18 Nov 2009 14:19:59 -0600 Subject: petsc-users Digest, Vol 11, Issue 17 In-Reply-To: References: Message-ID: Hi, I want to output the true residual norm after calling KSPSolve in a fortran program. I know there is a command option '-ksp_final_residual', but I cannot do this using the PBS job submission. What is the related fortran subroutine for this? Thanks for your help. Bests, Hui From knepley at gmail.com Wed Nov 18 14:21:06 2009 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Nov 2009 14:21:06 -0600 Subject: petsc-users Digest, Vol 11, Issue 17 In-Reply-To: References: Message-ID: http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/KSP/KSPGetResidualNorm.html Matt On Wed, Nov 18, 2009 at 2:19 PM, wrote: > Hi, > > I want to output the true residual norm after calling KSPSolve in a fortran > program. I know there is a command option '-ksp_final_residual', but I > cannot do this using the PBS job submission. What is the related fortran > subroutine for this? Thanks for your help. > > Bests, > Hui > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
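From Fortran the call sequence would look something like the following
sketch (variable names are illustrative; note the caveat in Barry's reply
below about which residual this returns):

      PetscReal      rnorm
      PetscErrorCode ierr

      call KSPSolve(ksp,b,x,ierr)
      call KSPGetResidualNorm(ksp,rnorm,ierr)
      write(*,*) 'final residual norm = ',rnorm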
From jed at 59A2.org  Wed Nov 18 14:27:19 2009
From: jed at 59A2.org (Jed Brown)
Date: Wed, 18 Nov 2009 21:27:19 +0100
Subject: scaling in 4-core machine
In-Reply-To: 
Message-ID: <4B0458A7.90004@59A2.org>

Matthew Knepley wrote:
> There is also the paper by Barry, Bill, David, and Dinesh about SpMV.
> It's very good. That is what I base my slides on.
> You can see the punchline in the tutorial slides.

I like this one

  http://portal.acm.org/ft_gateway.cfm?id=370405&type=pdf

Recent work on multicore/auto-tuning

  http://crd.lbl.gov/~oliker/

Jed

From bsmith at mcs.anl.gov  Wed Nov 18 14:53:52 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Wed, 18 Nov 2009 14:53:52 -0600
Subject: petsc-users Digest, Vol 11, Issue 17
Message-ID: 

   This only gives the final residual norm as computed by the Krylov
method, so depending on your options it may be a preconditioned residual
norm. If you want the true residual norm then you should compute it via
the formula ||b - A*x|| in your Fortran code after the KSPSolve().

   Barry

On Nov 18, 2009, at 2:21 PM, Matthew Knepley wrote:

> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/KSP/KSPGetResidualNorm.html
>
> Matt
> [...]
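A minimal Fortran sketch of the ||b - A*x|| computation Barry describes
(assuming ksp, A, x, and b are the solver, operator, solution, and
right-hand side; names are illustrative):

      Vec            r
      PetscScalar    mone
      PetscReal      truenorm
      PetscErrorCode ierr

      mone = -1.0
      call VecDuplicate(b,r,ierr)
      call MatMult(A,x,r,ierr)          ! r = A*x
      call VecAYPX(r,mone,b,ierr)       ! r = b - A*x
      call VecNorm(r,NORM_2,truenorm,ierr)
      write(*,*) 'true residual norm = ',truenorm
      call VecDestroy(r,ierr)

Unlike KSPGetResidualNorm(), this is independent of whether the Krylov
method tracks the preconditioned residual.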
From jarunan at ascomp.ch  Thu Nov 19 02:13:55 2009
From: jarunan at ascomp.ch (jarunan at ascomp.ch)
Date: Thu, 19 Nov 2009 09:13:55 +0100
Subject: scaling in 4-core machine
In-Reply-To: <4B03C8B1.2080305@59A2.org>
Message-ID: <20091119091355.3pnblyhjf4ss0wko@webmail.ascomp.ch>

>> I run the test on a quad core machine with mpiexec -n 1, 2 and 4 to see
>> the parallel scaling. The cpu times of the test are:
>>
>> Solver/Precond/Sub_Precond
>>
>> gmres/bjacobi/ilu
>>
>> -n 1, 1917.5730 sec,
>> -n 2, 1699.9490 sec, efficiency = 56.40%
>> -n 4, 1661.6810 sec, efficiency = 28.86%
>>
>> bicgstab/asm/ilu
>>
>> -n 1, 1800.8380 sec,
>> -n 2, 1415.0170 sec, efficiency = 63.63%
>> -n 4, 1119.3480 sec, efficiency = 40.22%
>
> These numbers are worthless without at least knowing iteration counts.

I cannot show the iteration counts, as I run 50 time steps with a maximum
of 10 iterations per time step and 200 sweeps (the maxit argument of
KSPSetTolerances()).

I also tested our in-house solver on the same machine; it is much slower
than the PETSc solver but scales better.

-n 1, 10022.8360 sec,
-n 2, 5684.1490 sec, efficiency = 88%
-n 4, 4067.0480 sec, efficiency = 61.61%

efficiency = 100*Speedup/nproc
Speedup = (cpu time of -n 1)/(cpu time of nproc)

If anyone has done parallel scaling tests, it would be very kind of you
to share the results.

Regards,
Jarunan

From hxie at umn.edu  Thu Nov 19 13:14:57 2009
From: hxie at umn.edu (hxie at umn.edu)
Date: 19 Nov 2009 13:14:57 -0600
Subject: Change orthogonalization option in fortran?
Message-ID: 

Hi,

I want to change the orthogonalization method for the default ksp solver
in Fortran. I added the following in my code:
--------------
call KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization,pterr)
call KSPGMRESSetCGSRefinementType(ksp,KSP_GMRES_CGS_REFINEMENT_IFNEEDED,pterr)
--------------

And I got the following error messages when compiling.
--------------
error #6404: This name does not have a type, and must have an explicit type.   [KSPGMRESCLASSICALGRAMSCHMIDTORTHOGONALIZATI]
call KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization,pterr)

error #6404: This name does not have a type, and must have an explicit type.   [KSP_GMRES_CGS_REFINEMENT_IFNEEDED]
call KSPGMRESSetCGSRefinementType(ksp,KSP_GMRES_CGS_REFINEMENT_IFNEEDED,pterr)
--------------

If I comment out these two lines, the code compiles fine. Any idea what
is wrong? Thanks.

Bests,
Hui

From jarunan at ascomp.ch  Fri Nov 20 03:26:02 2009
From: jarunan at ascomp.ch (jarunan at ascomp.ch)
Date: Fri, 20 Nov 2009 10:26:02 +0100
Subject: scaling in 4-core machine: unassembled structured
In-Reply-To: <4B03C8B1.2080305@59A2.org>
Message-ID: <20091120102602.nwotyi9tc8o4kwks@webmail.ascomp.ch>

>
> Getting better multicore performance requires changing the algorithms to
> better reuse L1 cache.  This means moving away from assembled matrices
> where possible and of course finding good preconditioners.

I do not know how to move away from an assembled matrix. Since I have to
reset the values of the matrix in each iteration, I am obliged to call
MatAssemblyBegin() and MatAssemblyEnd(). Is there another option to create
the matrix and set its values?

> High-order and fast multipole methods are good for this.

For example, please?

Jarunan

-- 
Jarunan Panyasantisuk
Development Engineer
ASCOMP GmbH, Technoparkstr. 1
CH-8005 Zurich, Switzerland
Phone : +41 44 445 4072
Fax   : +41 44 445 4075
E-mail: jarunan at ascomp.ch
www.ascomp.ch
From jed at 59A2.org  Fri Nov 20 04:24:29 2009
From: jed at 59A2.org (Jed Brown)
Date: Fri, 20 Nov 2009 11:24:29 +0100
Subject: scaling in 4-core machine: unassembled structured
In-Reply-To: <20091120102602.nwotyi9tc8o4kwks@webmail.ascomp.ch>
Message-ID: <4B066E5D.7050709@59A2.org>

jarunan at ascomp.ch wrote:
>> Getting better multicore performance requires changing the algorithms to
>> better reuse L1 cache.  This means moving away from assembled matrices
>> where possible and of course finding good preconditioners.
>
> I do not know how to move away from an assembled matrix. [...]

A matrix is just a linear operation.  What I mean by not assembling is
that you no longer define that operation in terms of matrix entries.  A
DFT is a famous example of a linear operation that should not be
represented in terms of matrix entries; instead it should be implemented
by FFT.  How to do this is highly dependent on discretization and
physics; good preconditioners almost always require assembled matrices
somewhere, but it's often possible to assemble something cheaper than
the real Jacobian.

>> High-order and fast multipole methods are good for this.
>
> For example, please?

Spectral element methods implement certain operations by exploiting a
tensor product structure which turns O(p^6) memory, O(p^6) flops into
O(p^3) memory, O(p^4) flops (with larger constants).  Matt has been
doing some work with FMM.  The key is to choose algorithms that do more
work on the CPU for each value loaded from memory.  I have some slides
on the subject,

  http://59A2.org/files/Labs09-Dohp.pdf

you could also take a look at slides from this mini-course (we did
high-order methods on the last day)

  http://59A2.org/newton-krylov

I can send you more technical references if you would like.  Finally,
if you are in Zürich, we can talk about it sometime (I'm at ETH).

Jed
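One concrete way to "not assemble" in PETSc is a shell matrix, where you
supply only the action of the operator. A minimal C sketch (the sizes,
context, and the assembled preconditioning matrix Pmat are placeholders,
and petsc-3.0 calling sequences are assumed):

  #include "petscksp.h"

  /* apply y = A*x without storing the entries of A */
  PetscErrorCode MyMatMult(Mat A,Vec x,Vec y)
  {
    void           *ctx;
    PetscErrorCode ierr;

    ierr = MatShellGetContext(A,&ctx);CHKERRQ(ierr);
    /* ... apply the discrete operator to x, writing the result into y ... */
    return 0;
  }

  Mat A;
  ierr = MatCreateShell(PETSC_COMM_WORLD,m,n,M,N,userctx,&A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT,(void(*)(void))MyMatMult);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,Pmat,SAME_NONZERO_PATTERN);CHKERRQ(ierr);

As Jed notes, the preconditioner still usually wants some assembled
matrix somewhere; that is the role of the (possibly cheaper) Pmat here.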
From likask at civil.gla.ac.uk  Fri Nov 20 11:22:11 2009
From: likask at civil.gla.ac.uk (Lukasz Kaczmarczyk)
Date: Fri, 20 Nov 2009 17:22:11 +0000
Subject: suprelu
Message-ID: 

Hello,
I am trying to configure petsc (p9) with SuperLU and I get this error
message:

*********************************************************************************
         UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
---------------------------------------------------------------------------------------
Error unzipping _d_SuperLU_DIST.tar.gz: Could not execute 'cd /Users/likask/MyBuild/src/petsc-3.0.0-p9/externalpackages; gunzip _d_SuperLU_DIST.tar.gz':

gzip: _d_SuperLU_DIST.tar.gz: not in gzip format
*********************************************************************************

Regards,
Lukasz

From balay at mcs.anl.gov  Fri Nov 20 11:31:30 2009
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 20 Nov 2009 11:31:30 -0600 (CST)
Subject: suprelu
Message-ID: 

Works fine for me. Perhaps you can retry after 'rm -rf externalpackages'.

If the problem persists - send configure.log to petsc-maint at mcs.anl.gov

Satish

On Fri, 20 Nov 2009, Lukasz Kaczmarczyk wrote:

> Hello
> I am trying to configure petsc (p9) with SuperLU and I get this error
> message [...]

From bsmith at mcs.anl.gov  Fri Nov 20 11:33:26 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 20 Nov 2009 11:33:26 -0600
Subject: suprelu
Message-ID: 

   Please send configure.log to petsc-maint at mcs.anl.gov

   It looks like you have a corrupt tar.gz file; I recommend deleting it
and getting it again.

   Barry

On Nov 20, 2009, at 11:22 AM, Lukasz Kaczmarczyk wrote:

> Hello
> I am trying to configure petsc (p9) with SuperLU and I get this error
> message [...]

From bsmith at mcs.anl.gov  Fri Nov 20 13:07:35 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 20 Nov 2009 13:07:35 -0600
Subject: Change orthogonalization option in fortran?
Message-ID: <765E16C6-1159-4FA1-9626-B284ED6FD87B@mcs.anl.gov>

   Sorry, we don't have the Fortran interfaces for these operations.
You can use

   call PetscOptionsSetValue("-ksp_gmres_classicalgramschmidt",PETSC_NULL_CHARACTER,ierr)
   call PetscOptionsSetValue("-ksp_gmres_cgs_refinement_type","REFINE_IFNEEDED",ierr)

before creating the KSP object.

   Barry

On Nov 19, 2009, at 1:14 PM, hxie at umn.edu wrote:

> Hi,
>
> I want to change the orthogonalization method for the default ksp
> solver in Fortran. [...]
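In a Fortran code these calls would sit before the KSP is created and
configured, since KSPSetFromOptions() is what reads the options database;
a minimal sketch (error checking omitted):

      KSP            ksp
      PetscErrorCode ierr

      call PetscOptionsSetValue('-ksp_gmres_classicalgramschmidt',PETSC_NULL_CHARACTER,ierr)
      call PetscOptionsSetValue('-ksp_gmres_cgs_refinement_type','REFINE_IFNEEDED',ierr)
      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
      call KSPSetFromOptions(ksp,ierr)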
From likask at civil.gla.ac.uk  Fri Nov 20 13:43:29 2009
From: likask at civil.gla.ac.uk (Lukasz Kaczmarczyk)
Date: Fri, 20 Nov 2009 19:43:29 +0000
Subject: suprelu
Message-ID: 

Thanks for the help. Strangely enough, it is working now.

Regards,
Lukasz

On 20 Nov 2009, at 17:33, Barry Smith wrote:

> Please send configure.log to petsc-maint at mcs.anl.gov
>
> It looks like you have a corrupt tar.gz file [...]

Lukasz Kaczmarczyk
Lecturer
Department of Civil Engineering, University of Glasgow, GLASGOW, G12 8LT
Tel: +44 141 3305348
email: likask at civil.gla.ac.uk
web: http://www.civil.gla.ac.uk/~kaczmarczyk/
web: http://code.google.com/p/yaffems/

From ajs at craft-tech.com  Fri Nov 20 15:25:52 2009
From: ajs at craft-tech.com (Srinivasan Arunajatesan)
Date: Fri, 20 Nov 2009 16:25:52 -0500
Subject: Petsc-Fun3d
Message-ID: <4B070960.9040801@craft-tech.com>

Is it possible to get hold of the source code for the Petsc-fun3d code?

-- 
Srinivasan Arunajatesan, PhD.
Senior Research Scientist
Combustion Research and Flow Technology, Inc.
6210 Keller's Church Road,
Pipersville, PA 18947.

email : ajs at craft-tech.com
Tel.  : 215 766 1520
Fax.  : 215 766 1524

From bsmith at mcs.anl.gov  Fri Nov 20 15:31:44 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 20 Nov 2009 15:31:44 -0600
Subject: Petsc-Fun3d
In-Reply-To: <4B070960.9040801@craft-tech.com>
Message-ID: <21CED7DE-6BBA-46C4-89BF-A2D73D3C145D@mcs.anl.gov>

   It is in src/contrib/fun3d. The difficulty is that we cannot give you
the grids for the large problems that we have used; those were created by
NASA and it doesn't want them handed out.
You are, of course, free to use your own grids; you just need to figure
out the structure of the grid files, which you can do by looking at the
source code.

   Barry

On Nov 20, 2009, at 3:25 PM, Srinivasan Arunajatesan wrote:

> Is it possible to get hold of the source code for the Petsc-fun3d code?
> [...]

From denist at al.com.au  Sun Nov 22 20:29:05 2009
From: denist at al.com.au (Denis Teplyashin)
Date: Mon, 23 Nov 2009 13:29:05 +1100
Subject: DA memory consumption
Message-ID: <4B09F371.3000600@al.com.au>

Hi guys,

I'm a bit confused by distributed array memory consumption. I did a
simple test like this one:

  ierr = DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_BOX,
                    1000, 1000, 1000, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                    1, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da);

and then checked memory with PetscMemoryGetCurrentUsage and
PetscMemoryGetMaximumUsage. Running this test using mpi on one core gives
me this result: current usage 3818Mb and maximum usage 7633Mb. And this
is the result after creating just a DA, without actual vectors. Running
the same test on two cores gives me an even more interesting result:
rank 0 - 9552/11463Mb and rank 1 - 5735/5732Mb.

Is this what I should expect in general, or am I doing something wrong?
Is there a simple formula which could show how much memory I would need
to allocate an array with a given resolution?

Thanks in advance,
Denis

From knepley at gmail.com  Sun Nov 22 20:41:24 2009
From: knepley at gmail.com (Matthew Knepley)
Date: Sun, 22 Nov 2009 20:41:24 -0600
Subject: DA memory consumption
In-Reply-To: <4B09F371.3000600@al.com.au>
Message-ID: 

It is not simple, but it is scalable, meaning in the limit of large N the
memory will be constant on each processor. When it is created, the
VecScatter objects mapping global to local vecs are created.

   Matt

On Sun, Nov 22, 2009 at 8:29 PM, Denis Teplyashin wrote:

> I'm a bit confused by distributed array memory consumption. [...]

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
From denist at al.com.au  Sun Nov 22 21:34:32 2009
From: denist at al.com.au (Denis Teplyashin)
Date: Mon, 23 Nov 2009 14:34:32 +1100
Subject: DA memory consumption
Message-ID: <4B0A02C8.8050709@al.com.au>

[An HTML attachment was scrubbed; Denis's text is quoted in the reply below.]

From bsmith at mcs.anl.gov  Sun Nov 22 21:47:53 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sun, 22 Nov 2009 21:47:53 -0600
Subject: DA memory consumption
In-Reply-To: <4B0A02C8.8050709@al.com.au>
Message-ID: 

   Sometimes computing can be an experimental science. Run the same size
DA on 1, 2, 4, 8, 16, 32 processes, gather the information about memory
usage, and make a little table. Here is what you should find. The amount
of memory depends on the local size of the array, which for your example
below is 1000*1000*1000 on one process. Thus you will see that as you
increase the number of processes, the space needed per process for the DA
decreases. It increases from 1 process to 2 because it needs all the
ghost point data and the VecScatter that are not needed on 1.

   Note also that on one process a SINGLE vector for this size mesh is 8
gigabytes, so the DA is really not much of a pig, since it is less than
one vector.

   Barry

On Nov 22, 2009, at 9:34 PM, Denis Teplyashin wrote:

> So this sort of memory consumption is expected? Is it possible to
> reduce it somehow? I'm not sure about the underlying petsc objects, but
> it looks like these additional objects require more memory than the
> actual vector itself.
>
> Cheers,
> Denis
>
> Matthew Knepley wrote:
>> It is not simple, but it is scalable [...]
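A self-contained sketch of that experiment (petsc-3.0-era DA calls
assumed; run with mpiexec -n 1, 2, 4, ... and tabulate the per-rank
numbers):

  #include "petscda.h"

  int main(int argc,char **argv)
  {
    DA             da;
    PetscLogDouble cur,max;
    PetscMPIInt    rank;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,0,0);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
    ierr = DACreate3d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                      1000,1000,1000,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,PETSC_NULL,PETSC_NULL,PETSC_NULL,&da);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&cur);CHKERRQ(ierr);
    ierr = PetscMemoryGetMaximumUsage(&max);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
             "[%d] current %g Mb, max %g Mb\n",rank,cur/1.e6,max/1.e6);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
    ierr = DADestroy(da);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }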
From craig-tanis at utc.edu  Mon Nov 23 15:12:02 2009
From: craig-tanis at utc.edu (Craig Tanis)
Date: Mon, 23 Nov 2009 16:12:02 -0500
Subject: pre-decomposed domains
Message-ID: 

I have an existing MPI code that builds a linear system corresponding to
an unstructured mesh. I'm hoping that I can change my code to work with
PETSc, but I'm not sure the domain decomposition scheme is compatible.

The big problem seems to be that my domains are not guaranteed to have
contiguous global node ids. How can I specify explicitly which processor
owns which node/vector element (for the purposes of ghost-node
synchronization)?

Thanks for your help,
Craig Tanis

From jed at 59A2.org  Mon Nov 23 15:25:29 2009
From: jed at 59A2.org (Jed Brown)
Date: Mon, 23 Nov 2009 22:25:29 +0100
Subject: pre-decomposed domains
Message-ID: <878wdxj5me.fsf@59A2.org>

On Mon, 23 Nov 2009 16:12:02 -0500, Craig Tanis wrote:
> The big problem seems to be that my domains are not guaranteed to have
> contiguous global node ids. How can I specify explicitly which
> processor owns which node/vector element (for the purposes of
> ghost-node synchronization)?

PETSc matrices require that each process has contiguous rows, so your
numbering scheme will have to be changed for matrix insertion.  But the
code that handles your physics does not need to be changed, you just
encode the different numbering in a scatter.

Jed

From bsmith at mcs.anl.gov  Mon Nov 23 15:35:44 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 23 Nov 2009 15:35:44 -0600
Subject: pre-decomposed domains
In-Reply-To: <878wdxj5me.fsf@59A2.org>
Message-ID: 

   Take a look at the manual page for AO. This provides a mechanism for
renumbering the nodes (and references to the nodes) into what PETSc
needs. Then you just assemble the matrix and vectors using the new PETSc
numbering. Or you can do the renumbering yourself.

   Note that renumbering doesn't mean moving any data between processes;
you use the data layout you already have, you just change the "names" of
the nodes.

   Barry

On Nov 23, 2009, at 3:25 PM, Jed Brown wrote:

> PETSc matrices require that each process has contiguous rows, so your
> numbering scheme will have to be changed for matrix insertion. [...]
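A minimal C sketch of the AO approach (the application IDs are made up
for illustration; passing PETSC_NULL for the PETSc indices asks
AOCreateBasic to pair the given IDs with the natural contiguous ordering,
an assumption to verify against the man page):

  AO             ao;
  PetscInt       app[2],rows[2];
  PetscErrorCode ierr;

  /* the global application IDs of the two nodes this process owns */
  app[0] = 7; app[1] = 3;
  ierr = AOCreateBasic(PETSC_COMM_WORLD,2,app,PETSC_NULL,&ao);CHKERRQ(ierr);

  /* translate application indices in place before MatSetValues() etc. */
  rows[0] = 7; rows[1] = 3;
  ierr = AOApplicationToPetsc(ao,2,rows);CHKERRQ(ierr);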
From achatter at cse.psu.edu  Tue Nov 24 09:12:36 2009
From: achatter at cse.psu.edu (Anirban Chatterjee)
Date: Tue, 24 Nov 2009 10:12:36 -0500
Subject: cannot convert '_p_Vec* const' to '_p_VecScatter*' for argument '1' to 'PetscErrorCode
Message-ID: <4B0BF7E4.8030208@cse.psu.edu>

Hi,

I am trying to use Steve Wright's OOQP with PETSc support and get this
error:

  cannot convert '_p_Vec* const' to '_p_VecScatter*' for argument '1' to 'PetscErrorCode'

Can anyone tell me why I am getting this error? I am getting it in the
VecScatterBegin function, where the first argument type is Vec.

Thanks,
Anirban

From balay at mcs.anl.gov  Tue Nov 24 09:21:23 2009
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 24 Nov 2009 09:21:23 -0600 (CST)
Subject: cannot convert '_p_Vec* const' to '_p_VecScatter*' for argument '1' to 'PetscErrorCode
In-Reply-To: <4B0BF7E4.8030208@cse.psu.edu>
Message-ID: 

I guess OOQP is not updated to use the latest petsc version - where the
first argument is VecScatter.

Is this the only error you get? If so - it's easy to just fix the code to
use petsc-3:

  VecScatterBegin(Vec,Vec,InsertMode,ScatterMode,VecScatter);

changed to:

  VecScatterBegin(VecScatter,Vec,Vec,InsertMode,ScatterMode);

Satish

On Tue, 24 Nov 2009, Anirban Chatterjee wrote:

> I am trying to use Steve Wright's OOQP with PETSc support and get this
> error [...]

From achatter at cse.psu.edu  Tue Nov 24 09:43:43 2009
From: achatter at cse.psu.edu (Anirban Chatterjee)
Date: Tue, 24 Nov 2009 10:43:43 -0500
Subject: cannot convert '_p_Vec* const' to '_p_VecScatter*' for argument '1' to 'PetscErrorCode
Message-ID: <4B0BFF2F.8020005@cse.psu.edu>

Hi Satish,

Yes, that fixes it. I can install OOQP with petsc-2.3.0 without a problem.
But after fixing the VecScatter problem, if I try petsc-3.0 I end up with
a linker error in src/QpBound/QpBoundPetsc.o: "undefined reference to
PetscIterativeSolver::PetscIterativeSolver". I cannot understand this. I
have to check it.

--Anirban

Satish Balay wrote:
> I guess OOQP is not updated to use the latest petsc version - where
> the first argument is VecScatter. [...]
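For reference, a call site would change along these lines (a sketch with
illustrative variable names; VecScatterEnd changed the same way):

  /* petsc-2.3.x style */
  ierr = VecScatterBegin(xglobal,xlocal,INSERT_VALUES,SCATTER_FORWARD,scatter);
  ierr = VecScatterEnd(xglobal,xlocal,INSERT_VALUES,SCATTER_FORWARD,scatter);

  /* petsc-3.0 style: the VecScatter moves to the first argument */
  ierr = VecScatterBegin(scatter,xglobal,xlocal,INSERT_VALUES,SCATTER_FORWARD);
  ierr = VecScatterEnd(scatter,xglobal,xlocal,INSERT_VALUES,SCATTER_FORWARD);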
From fpacull at fluorem.com  Tue Nov 24 10:46:01 2009
From: fpacull at fluorem.com (francois pacull)
Date: Tue, 24 Nov 2009 17:46:01 +0100
Subject: PCFieldSplit and Schur
Message-ID: <4B0C0DC9.3060602@fluorem.com>

Dear PETSc team,

I am having a little bit of trouble with the Schur option of
PCFieldSplit: I would like to apply different treatments to the first
five and the last two of the seven equations at each grid node of a CFD
linear system...

In the present case, the KSP is "gmres" (or "fgmres" if the PC is
changing), the PC is "asm" (the subdomain IS is defined with
PCASMSetLocalSubdomains in order to always include all the fields
associated with the grid points of the overlap), the SUBKSP is "preonly"
and the SUBPC is "fieldsplit".

So far the FieldSplit options PC_COMPOSITE_ADDITIVE and
PC_COMPOSITE_MULTIPLICATIVE work well with fieldsplit_0 and fieldsplit_1
set to KSPPREONLY and PCILU in the following way:

  PCFieldSplitGetSubKSP(SubPC,number_of_split,&FieldSplitKSP);
  ... [for the fields 0,1,2,3,4 (fieldsplit_0)]
  KSPSetType(FieldSplitKSP[0],KSPPREONLY);
  KSPGetPC(FieldSplitKSP[0],&FieldSplitPC0);
  PCSetType(FieldSplitPC0,PCILU);
  ... [for the fields 5 and 6 (fieldsplit_1)]
  KSPSetType(FieldSplitKSP[1],KSPPREONLY);
  KSPGetPC(FieldSplitKSP[1],&FieldSplitPC1);
  PCSetType(FieldSplitPC1,PCILU);
  ...

They also work well when I set FieldSplitKSP[0] and/or FieldSplitKSP[1]
to KSPGMRES. However, I always get a segmentation violation when I try
to use the FieldSplit option PC_COMPOSITE_SCHUR... I guess that I am
missing something: are there more options to set in this Schur case?

So far I did not turn on the Schur complement preconditioner option:
PCFieldSplitSchurPrecondition(subpc,PETSC_FALSE);

Is there a book or article describing the methods implemented in this
FieldSplit option?

The version is petsc-3.0.0-p7...

Thank you for your help,
Regards,
francois pacull.

From knepley at gmail.com  Tue Nov 24 11:10:51 2009
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 24 Nov 2009 11:10:51 -0600
Subject: PCFieldSplit and Schur
In-Reply-To: <4B0C0DC9.3060602@fluorem.com>
Message-ID: 

Can you use the debugger to get a stack trace?

   Matt

On Tue, Nov 24, 2009 at 10:46 AM, francois pacull wrote:

> I am having a little bit of trouble with the Schur option of
> PCFieldSplit [...]

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

From jed at 59A2.org  Tue Nov 24 11:35:31 2009
From: jed at 59A2.org (Jed Brown)
Date: Tue, 24 Nov 2009 18:35:31 +0100
Subject: PCFieldSplit and Schur
In-Reply-To: <4B0C0DC9.3060602@fluorem.com>
Message-ID: <87y6lvygf0.fsf@59A2.org>

On Tue, 24 Nov 2009 17:46:01 +0100, francois pacull wrote:

> PCFieldSplitGetSubKSP(SubPC,number_of_split,&FieldSplitKSP);

Note that Schur currently only works when number_of_split=2.

I think this is the culprit:

> PCSetType(FieldSplitPC1,PCILU);
                          ^^^^^^
Here you are trying to precondition the Schur complement, but ...
> So far I did not turn on the Schur complement preconditioner option:
> PCFieldSplitSchurPrecondition(subpc,PETSC_FALSE);

There should be a check for this so that you get a better error.  When
not PCFieldSplitSchurPrecondition, we set PCNONE for the Schur
complement.

> Is there a book or article describing the methods implemented in this
> FieldSplit option?

Not really, it implements "physics-based" relaxation
(additive/multiplicative) and factorization (Schur); the former is a
fairly traditional idea, for the latter you generally have to look at
references for your application area.  We might be able to refer you to
specific references if you explain the physics you are working with.

There are two classes of problems for which the factorization option is
to be preferred (because usually nothing else works):

* indefinite systems such as incompressible flow or LNK optimization

* stiff wave problems like shallow water, low-Mach gas dynamics, MHD

A while ago, I started extending PCFieldSplit to do hybrid factorization
and relaxation with no restriction on the number of splits, and with the
appropriate hooks to get preconditioners in all the places you might
want them.  Sadly, I have not yet gotten it cleaned up enough to push to
PETSc-dev, but it is still on my to do list.

Jed

From fpacull at fluorem.com  Tue Nov 24 11:38:26 2009
From: fpacull at fluorem.com (francois pacull)
Date: Tue, 24 Nov 2009 18:38:26 +0100
Subject: PCFieldSplit and Schur
Message-ID: <4B0C1A12.40108@fluorem.com>

Yes, here is what I am getting.
francois.

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[1]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function is given.
[1]PETSC ERROR: INSTEAD the line number of the start of the function is given.
[0]PETSC ERROR: [0] PCFieldSplitGetSubKSP_FieldSplit_Schur line 751 src/ksp/pc/impls/fieldsplit/fieldsplit.c
[1]PETSC ERROR: [1] PCFieldSplitGetSubKSP_FieldSplit_Schur line 751 src/ksp/pc/impls/fieldsplit/fieldsplit.c
[0]PETSC ERROR: [0] PCFieldSplitGetSubKSP line 951 src/ksp/pc/impls/fieldsplit/fieldsplit.c
[1]PETSC ERROR: [1] PCFieldSplitGetSubKSP line 951 src/ksp/pc/impls/fieldsplit/fieldsplit.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[1]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
[1]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
...

Matthew Knepley wrote:
> Can you use the debugger to get a stack trace?
>
> Matt
>
> On Tue, Nov 24, 2009 at 10:46 AM, francois pacull wrote:
>
>> I am having a little bit of trouble with the Schur option of
>> PCFieldSplit [...]
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
From fpacull at fluorem.com  Tue Nov 24 12:24:48 2009
From: fpacull at fluorem.com (francois pacull)
Date: Tue, 24 Nov 2009 19:24:48 +0100
Subject: PCFieldSplit and Schur
In-Reply-To: <87y6lvygf0.fsf@59A2.org>
Message-ID: <4B0C24F0.2070608@fluorem.com>

Thank you for your email Jed, this is very helpful.

Just in case you would know some specific references: the physics we are
working with are very common, turbulent low-Mach gas dynamics. Since part
of the system's stiffness is created by the turbulent variables, I am
trying to couple a first system with 5 fields (density, momentum, energy)
and a second one with 2 fields (turbulence), within the preconditioner.
The motivation is to reduce the factorization memory requirement,
compared to when the fields are not split.

Regards,
francois pacull.

Jed Brown wrote:
> Note that Schur currently only works when number_of_split=2. [...]

From jed at 59A2.org  Tue Nov 24 13:30:38 2009
From: jed at 59A2.org (Jed Brown)
Date: Tue, 24 Nov 2009 20:30:38 +0100
Subject: PCFieldSplit and Schur
In-Reply-To: <4B0C24F0.2070608@fluorem.com>
Message-ID: <87vdgzyb35.fsf@59A2.org>

On Tue, 24 Nov 2009 19:24:48 +0100, francois pacull wrote:

> the physics we are working with are very common, turbulent low-Mach gas
> dynamics. Since part of the system's stiffness is created by the
> turbulent variables,

So this depends somewhat on the turbulence model, but it is normally
some sort of advection-diffusion system.
> I am trying to couple a first system with 5 fields (density, momentum,
> energy) and a second one with 2 fields (turbulence), within the
> preconditioner.

Ah, if you split the turbulence variables then you still have to deal
with acoustics in the bs=5 split.

As an experimental heuristic (one I'm trying to formulate) for how to
split a stiff wave system (at least with a conservative formulation),
try this.  Identify the characteristic of your largest eigenvector.  If
this involves a single subsystem, say it is [0, 0, 0, 1, -1], then you
should be able to do one multiplicative split ([a,b,c], [d,e]) as long
as you can find a good preconditioner for the [d,e] part.  In this last
part, the fastest wave has inter-field coupling, so you could split this
with factorization.  Note that you could also have done just one split
([a,b,c,d],[e]), and this would have little impact on the Schur
complement in the [e] block.  If instead the characteristic looked like
[0.3, 0.4, 0.8, 0.4, -1] then all the fields participate in the fastest
wave, and the natural (and I hypothesize, mandatory) split would be
([a,b,c,d],[e]).

So in light of this framework, let's consider your system.  Since the
velocity is much slower than acoustics, you should be able to split off
the turbulence model using relaxation (the advection is only as fast as
the velocity; if the turbulence model is as stiff as acoustics then the
diffusive part must be the culprit, but that is self-coupling).

Here is one recent paper on JFNK applied to low-Mach compressible flow

@article{park2008jacobian,
  title={{Jacobian-free Newton Krylov discontinuous Galerkin method and physics-based preconditioning for nuclear reactor simulations}},
  author={Park, H.K. and Nourgaliev, R.R. and Martineau, R.C. and Knoll, D.A.},
  year={2008},
  publisher={INL/CON-08-14243, Idaho National Laboratory (INL)},
  url={http://www.osti.gov/bridge/servlets/purl/940059-ITD6J2/940059.pdf}
}

> The motivation is to reduce the factorization memory requirement,
> compared to when the fields are not split.

Incomplete or full factorization?  Does ASM actually work?  If so, then
I'm skeptical that PCFieldSplit-Schur will gain you anything; you should
be fine to split off the turbulence variables (multigrid might even work
fine for them), and use a cheaper preconditioner on the rest of the
system.

Finally, I would be interested to know how you fare.  In particular,
whether you find that the heuristic arguments above actually correlate
with efficient algorithms.

Jed
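For the 5+2 grouping itself, defining the splits would look something
like this C sketch (petsc-3.0 signatures assumed; the variable names
follow the earlier message, and the grouping into two splits is the only
essential part):

  PetscInt flow[] = {0,1,2,3,4};   /* density, momentum, energy */
  PetscInt turb[] = {5,6};         /* turbulence variables */

  ierr = PCSetType(SubPC,PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetBlockSize(SubPC,7);CHKERRQ(ierr);
  ierr = PCFieldSplitSetFields(SubPC,5,flow);CHKERRQ(ierr);
  ierr = PCFieldSplitSetFields(SubPC,2,turb);CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(SubPC,PC_COMPOSITE_MULTIPLICATIVE);CHKERRQ(ierr);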
From irfan.khan at gatech.edu  Tue Nov 24 18:59:23 2009
From: irfan.khan at gatech.edu (irfan.khan at gatech.edu)
Date: Tue, 24 Nov 2009 19:59:23 -0500 (EST)
Subject: Multiple communicators
Message-ID: <2011669276.1459301259110763060.JavaMail.root@mail8.gatech.edu>

Hello,

Does the procedure for providing command line options and obtaining
profiling data through -log_summary change if there are multiple MPI
communicators in a petsc code?

If so, can somebody point me to information or references on how to
provide command line options when there are multiple communicators?

Thank you
Irfan

-- 
PhD Candidate
G.W. Woodruff School of Mechanical Engineering
Georgia Institute of Technology
Atlanta, GA (30332)

From knepley at gmail.com  Tue Nov 24 19:03:00 2009
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 24 Nov 2009 19:03:00 -0600
Subject: Multiple communicators
In-Reply-To: <2011669276.1459301259110763060.JavaMail.root@mail8.gatech.edu>
Message-ID: 

There is always WORLD, so it should be fine unless I misunderstand your
question.

   Matt

On Tue, Nov 24, 2009 at 6:59 PM, wrote:

> Does the procedure for providing command line options and obtaining
> profiling data through -log_summary change if there are multiple MPI
> communicators in a petsc code? [...]

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

From irfan.khan at gatech.edu  Tue Nov 24 19:13:36 2009
From: irfan.khan at gatech.edu (irfan.khan at gatech.edu)
Date: Tue, 24 Nov 2009 20:13:36 -0500 (EST)
Subject: Multiple communicators
In-Reply-To: <84104561.1459831259111212973.JavaMail.root@mail8.gatech.edu>
Message-ID: <58356696.1460341259111616362.JavaMail.root@mail8.gatech.edu>

I use separate communicators for the fluid and solid ranks in a
fluid-structure interaction code. I use PETSc tools (KSP) to solve for
the solid phase (FEA), which is carried out under a different
communicator (FEA_Comm).

Unless I am doing something wrong, I have found that the command line
options -ksp_type and -mat_partitioning_type don't work if there are 2
communicators, but for a single communicator they work. Please see the
attached files containing the output of -log_summary for 2 different
codes (scroll down to the end), one with a single communicator and the
other with two communicators. Both outputs have been generated with the
same PETSc compilation.

Further, -log_summary is not able to detect all the events that have
been registered if they are not in the WORLD communicator.

Thank you
Irfan

[Attachments scrubbed: 1communicator.out, 2communicators.out]

From denist at al.com.au  Tue Nov 24 19:21:34 2009
From: denist at al.com.au (Denis Teplyashin)
Date: Wed, 25 Nov 2009 12:21:34 +1100
Subject: DA memory consumption
Message-ID: <4B0C869E.3000907@al.com.au>

[An HTML attachment was scrubbed.]
Woodruff School of Mechanical Engineering Georgia Institute of Technology Atlanta, GA (30332) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1communicator.out Type: application/octet-stream Size: 13888 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2communicators.out Type: application/octet-stream Size: 20399 bytes Desc: not available URL: From denist at al.com.au Tue Nov 24 19:21:34 2009 From: denist at al.com.au (Denis Teplyashin) Date: Wed, 25 Nov 2009 12:21:34 +1100 Subject: DA memory consumption In-Reply-To: References: <4B09F371.3000600@al.com.au> <4B0A02C8.8050709@al.com.au> Message-ID: <4B0C869E.3000907@al.com.au> An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 24 19:28:54 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Nov 2009 19:28:54 -0600 Subject: Multiple communicators In-Reply-To: <58356696.1460341259111616362.JavaMail.root@mail8.gatech.edu> References: <84104561.1459831259111212973.JavaMail.root@mail8.gatech.edu> <58356696.1460341259111616362.JavaMail.root@mail8.gatech.edu> Message-ID: On Tue, Nov 24, 2009 at 7:13 PM, wrote: > I use seperete communicators for fluid and solid ranks in a fluid-structure > interaction code. I use PETSc tools (KSP) to solve for the solid phase (FEA) > which is carried out under a different communicator (FEA_Comm). > > Unless I am doing something wrong, I have found that the command line > options: -ksp_type; -mat_partitioning_type, don't work if there are 2 > communicators. But for single communicator they work. Please see the > attached files containing output of -log_summary for 2 different codes > (scroll down to the end). One with single communicator and the other with > two communicators. Both the output have been generated with the same PETSc > compilation. > > Futher -log_summary results are not able to detect all the events that have > been registered if they are not in the WORLD communicator. > I am not sure what you are doing, but something is wrong in this code. Using different communicators does not have much to do with options. Any object (like a KSP) can be created using a subcomm of WORLD. Are you setting PETSC_COMM_WORLD to something different? That is not necessary here. Matt > Thank you > Irfan > > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Tuesday, November 24, 2009 8:03:00 PM GMT -05:00 US/Canada Eastern > Subject: Re: Multiple communicators > > On Tue, Nov 24, 2009 at 6:59 PM, wrote: > >> Hello >> >> Does the procedure for providing command line options and to obtain >> profiling data through -log_summary change if there are multiple MPI >> communicators in a petsc code? >> > > There is always WORLD, so it should be fine unless I misunderstand your > question. > > Matt > > >> If so, can somebody point me to information, references on how to provide >> command line options if there are multiple communicators? >> >> Thank you >> Irfan >> >> -- >> PhD Candidate >> G.W. Woodruff School of Mechanical Engineering >> Georgia Institute of Technology >> Atlanta, GA (30332) >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- > PhD Candidate > G.W. 
Woodruff School of Mechanical Engineering > Georgia Institute of Technology > Atlanta, GA (30332) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From irfan.khan at gatech.edu Tue Nov 24 19:56:12 2009 From: irfan.khan at gatech.edu (irfan.khan at gatech.edu) Date: Tue, 24 Nov 2009 20:56:12 -0500 (EST) Subject: Multiple communicators In-Reply-To: <1774530298.1463411259114157468.JavaMail.root@mail8.gatech.edu> Message-ID: <252444921.1463431259114172374.JavaMail.root@mail8.gatech.edu> I don't explicitly use the communicator PETSC_COMM_WORLD. Instead I used MPI_COMM_WORLD, which according to the manual should be the same. However, I create two new communicator from MPI_COMM_WORLD using MPI_Comm_split(). LBM_Comm and FEA_Comm. Only the ranks in FEA_Comm make use of KSP solver and PARMETIS. When creating objects for mesh partitioning (MatPartitioningCreate()) and solver (KSPCreate()), I use the new communicator FEA_Comm. Would this have any effect on the command line options? Regards Irfan I am not sure what you are doing, but something is wrong in this code. Using different communicators does not have much to do with options. Any object (like a KSP) can be created using a subcomm of WORLD. Are you setting PETSC_COMM_WORLD to something different? That is not necessary here. Matt Thank you Irfan ----- Original Message ----- From: "Matthew Knepley" < knepley at gmail.com > To: "PETSc users list" < petsc-users at mcs.anl.gov > Sent: Tuesday, November 24, 2009 8:03:00 PM GMT -05:00 US/Canada Eastern Subject: Re: Multiple communicators On Tue, Nov 24, 2009 at 6:59 PM, < irfan.khan at gatech.edu > wrote: Hello Does the procedure for providing command line options and to obtain profiling data through -log_summary change if there are multiple MPI communicators in a petsc code? There is always WORLD, so it should be fine unless I misunderstand your question. Matt If so, can somebody point me to information, references on how to provide command line options if there are multiple communicators? Thank you Irfan -- PhD Candidate G.W. Woodruff School of Mechanical Engineering Georgia Institute of Technology Atlanta, GA (30332) -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- PhD Candidate G.W. Woodruff School of Mechanical Engineering Georgia Institute of Technology Atlanta, GA (30332) -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- PhD Candidate G.W. Woodruff School of Mechanical Engineering Georgia Institute of Technology Atlanta, GA (30332) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 24 20:51:43 2009 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Nov 2009 20:51:43 -0600 Subject: Multiple communicators In-Reply-To: <252444921.1463431259114172374.JavaMail.root@mail8.gatech.edu> References: <1774530298.1463411259114157468.JavaMail.root@mail8.gatech.edu> <252444921.1463431259114172374.JavaMail.root@mail8.gatech.edu> Message-ID: On Tue, Nov 24, 2009 at 7:56 PM, wrote: > I don't explicitly use the communicator PETSC_COMM_WORLD. 
Instead I used > MPI_COMM_WORLD, which according to the manual should be the same. > > However, I create two new communicator from MPI_COMM_WORLD using > MPI_Comm_split(). LBM_Comm and FEA_Comm. Only the ranks in FEA_Comm make use > of KSP solver and PARMETIS. When creating objects for mesh partitioning > (MatPartitioningCreate()) and solver (KSPCreate()), I use the new > communicator FEA_Comm. Would this have any effect on the command line > options? > No. Matt > Regards > Irfan > > > > I am not sure what you are doing, but something is wrong in this code. > Using different communicators does not > have much to do with options. Any object (like a KSP) can be created using > a subcomm of WORLD. Are you > setting PETSC_COMM_WORLD to something different? That is not necessary > here. > > Matt > > > > >> Thank you >> Irfan >> >> ----- Original Message ----- >> From: "Matthew Knepley" >> To: "PETSc users list" >> Sent: Tuesday, November 24, 2009 8:03:00 PM GMT -05:00 US/Canada Eastern >> Subject: Re: Multiple communicators >> >> On Tue, Nov 24, 2009 at 6:59 PM, wrote: >> >>> Hello >>> >>> Does the procedure for providing command line options and to obtain >>> profiling data through -log_summary change if there are multiple MPI >>> communicators in a petsc code? >>> >> >> There is always WORLD, so it should be fine unless I misunderstand your >> question. >> >> Matt >> >> >>> If so, can somebody point me to information, references on how to provide >>> command line options if there are multiple communicators? >>> >>> Thank you >>> Irfan >>> >>> -- >>> PhD Candidate >>> G.W. Woodruff School of Mechanical Engineering >>> Georgia Institute of Technology >>> Atlanta, GA (30332) >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> >> -- >> PhD Candidate >> G.W. Woodruff School of Mechanical Engineering >> Georgia Institute of Technology >> Atlanta, GA (30332) >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- > PhD Candidate > G.W. Woodruff School of Mechanical Engineering > Georgia Institute of Technology > Atlanta, GA (30332) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 24 22:50:02 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 24 Nov 2009 22:50:02 -0600 Subject: Multiple communicators In-Reply-To: <58356696.1460341259111616362.JavaMail.root@mail8.gatech.edu> References: <58356696.1460341259111616362.JavaMail.root@mail8.gatech.edu> Message-ID: On Nov 24, 2009, at 7:13 PM, irfan.khan at gatech.edu wrote: > I use seperete communicators for fluid and solid ranks in a fluid- > structure interaction code. I use PETSc tools (KSP) to solve for the > solid phase (FEA) which is carried out under a different > communicator (FEA_Comm). > > Unless I am doing something wrong, I have found that the command > line options: -ksp_type; -mat_partitioning_type, don't work if there > are 2 communicators. Are you sure that they don't work? Or is it just printing the message? WARNING! There are options you set that were not used! WARNING! 
could be spelling mistake, etc! Option left: name:-ksp_type value: cg Option left: name:-mat_partitioning_type value: parmetis Any option that is not accessed on PROCESS ZERO will be listed here as unused, even if it may be used on some process. I suppose we could/should fix this by adding some complicated communication that determines just the options that are never used. I never liked this warning, only added it after pressure from people who couldn't type. > But for single communicator they work. Please see the attached files > containing output of -log_summary for 2 different codes (scroll down > to the end). One with single communicator and the other with two > communicators. Both the output have been generated with the same > PETSc compilation. > > Futher -log_summary results are not able to detect all the events > that have been registered if they are not in the WORLD communicator. Is your concern here that it does not print the name of the event, it leaves the name blank? If so this is likely the type of problem as the print of options; we don't have a mechanism to get them to process 0 to print. In summary, yes these are some weaknesses of PETSc. If you switch the two communicators you create and use the one that has process 0 of MPI_COMM_WORLD for the PETSc stuff then it will look nicer and hide these two problems. Barry > > > Thank you > Irfan > ----- Original Message ----- > From: "Matthew Knepley" > To: "PETSc users list" > Sent: Tuesday, November 24, 2009 8:03:00 PM GMT -05:00 US/Canada > Eastern > Subject: Re: Multiple communicators > > On Tue, Nov 24, 2009 at 6:59 PM, wrote: > Hello > > Does the procedure for providing command line options and to obtain > profiling data through -log_summary change if there are multiple MPI > communicators in a petsc code? > > There is always WORLD, so it should be fine unless I misunderstand > your question. > > Matt > > If so, can somebody point me to information, references on how to > provide command line options if there are multiple communicators? > > Thank you > Irfan > > -- > PhD Candidate > G.W. Woodruff School of Mechanical Engineering > Georgia Institute of Technology > Atlanta, GA (30332) > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > > -- > PhD Candidate > G.W. Woodruff School of Mechanical Engineering > Georgia Institute of Technology > Atlanta, GA (30332) > <1communicator.out><2communicators.out> From fpacull at fluorem.com Wed Nov 25 10:15:04 2009 From: fpacull at fluorem.com (francois pacull) Date: Wed, 25 Nov 2009 17:15:04 +0100 Subject: PCFieldSplit and Schur In-Reply-To: <87vdgzyb35.fsf@59A2.org> References: <4B0C0DC9.3060602@fluorem.com> <87y6lvygf0.fsf@59A2.org> <4B0C24F0.2070608@fluorem.com> <87vdgzyb35.fsf@59A2.org> Message-ID: <4B0D5808.2080400@fluorem.com> Thanks again for your message. It is true that in the case of a single split of the seven fields between the 5 aerodynamic variables and the two turbulent variables (classical k-omega in our case), the blocksize=5 system still has some kind of stiffness due to the large difference of speed between the fluid motion and the acoustic waves. However, we use some block diagonal scaling with a local preconditioning blocksize=5 matrix (Weiss and Smith preconditioner). We do observe a better convergence rate when using this scaling in order to improve the local condition number. 
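Concretely, the two-way split I have in mind would be driven by options along the following lines. This is only a sketch based on my reading of the PCFieldSplit manual page; the exact option names and the choice of sub-preconditioners are assumptions I still have to test:

  -pc_type fieldsplit
  -pc_fieldsplit_block_size 7
  -pc_fieldsplit_0_fields 0,1,2,3,4    # the 5 aerodynamic variables
  -pc_fieldsplit_1_fields 5,6          # the 2 turbulent variables
  -pc_fieldsplit_type multiplicative
  -fieldsplit_0_pc_type asm            # ASM with ILU on the aerodynamic block
  -fieldsplit_1_pc_type ilu            # cheaper treatment of the bs=2 block

Whether this multiplicative composition or a Schur complement formulation is the right choice is exactly what I hope to find out.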
In general, when we remove the rows and columns corresponding to the two turbulent variables of a system, we get a rather nice behavior of the solver. But when we keep them, a high level of incomplete factorization within the PCASM (which works well with full factorization) is required in order to avoid a complete stagnation of the GMRES. This is why i am trying to split off this bs=2 matrix and find the best way to deal with it, as well as its coupling with the aerodynamic variables matrix. I will definitely give you the feedback of your arguments and the results of this fieldsplit when i get more work done regarding this issue. francois pacull. Jed Brown a ?crit : > On Tue, 24 Nov 2009 19:24:48 +0100, francois pacull wrote: > >> Thank you for your email Jed, this is very helpfull. >> Just in case you would know some specific references: >> the physics we are working with are very common: turbulent low-Mach gas >> dynamics; since part of the systems stiffness is created by the >> turbulent variables, >> > > So this depends somewhat on the turbulence model, but it is normally > some sort of advection-diffusion system. > > >> i am trying to couple a first system with 5 fields (density, momentum, >> energy) and a second one with 2 fields (turbulence), within the >> preconditioner. >> > > Ah, if you split the turbulence variables then you still have to deal > with acoustics in the bs=5 split. > > As an experimental heuristic (one I'm trying to formulate) for how to > split a stiff wave system (at least with a conservative formulation) try > this. Identify the characteristic of your largest eigenvector. If this > involves a single subsystem, say it is > > [0, 0, 0, 1, -1] > > then you should be able to do one multiplicative split ([a,b,c], [d,e]) > as long as you can find a good preconditioner for the [d,e] part. In > this last part, the fastest wave has inter-field coupling, so you could > split this with factorization. Note that you could also have done just > one split ([a,b,c,d],[e]), and this would have little impact on the > Schur complement in the [e] block. > > If instead the characteristic looked like > > [0.3, 0.4, 0.8, 0.4, -1] > > then all the fields participate in the fastest wave, and the natural > (and I hypothesize, mandatory) split would be ([a,b,c,d],[e]). > > > So in light of this framework, let's consider your system. Since the > velocity is much slower than acoustics, you should be able to split off > the turbulence model using relaxation (the advection is only as fast as > the velocity, if the turbulence model is as stiff as acoustics then the > diffusive part must be the culprit, but that is self-coupling). > > Here is one recent paper on JFNK applied to low-mach compressible flow > > @article{park2008jacobian, > title={{Jacobian-free Newton Krylov discontinuous Galerkin method and > physics-based preconditioning for nuclear reactor simulations}}, > author={Park, H.K. and Nourgaliev, R.R. and Martineau, R.C. and Knoll, D.A.}, > year={2008}, > publisher={INL/CON-08-14243, Idaho National Laboratory (INL)}, > url={http://www.osti.gov/bridge/servlets/purl/940059-ITD6J2/940059.pdf} > } > > > >> The motivation is the diminution of the factorization memory >> requirement, compare to when the fields are not split. >> > > Incomplete or full factorization? Does ASM actually work? 
If so, then > I'm skeptical that PCFieldSplit-Schur will gain you anything, you should > be fine to split off the turbelence variables (multigrid might even work > fine for them), and use a cheaper preconditioner on the rest of the > system. > > > Finally, I would be interested to know how you fare. In particular, > whether you find that the heuristic arguments above actually correlate > with efficient algorithms. > > Jed > > From nicolas.aunai at gmail.com Sat Nov 28 03:18:04 2009 From: nicolas.aunai at gmail.com (nicolas aunai) Date: Sat, 28 Nov 2009 10:18:04 +0100 Subject: VecDestroy and memory leak Message-ID: Hi, I have a memory leak in my code, and I think I may have located it (if it is the only one). Unfortunatly I can't find out what to do to fix it. I think I have identified the procedure where the leak is because the current memory usage at the end is not the same at it is at the begining of the procedure, as it should be. This procedure creates some vectors, that are supposed to be destroyed before the end is reached. I've checked that all VecDestroy() functions are called. However, looking at the memory current usage after ALL instructions of the procedure, I've noticed that one of the VecDestroy() I call is not changing the memory usage, meaning that it is not deallocating the vector created before. The corresponding vector is created with the routine : DACreateNaturalVector(), which is successful since a/ I can use without problem the vector created, b/ the level of memory does increase right after it creation. I've looked at the function VecDestroy() definition, it seems to check a PetscObject member called 'refct' before actually free the memory, I've printed this 'refct' for my natural vector and its value is '2', while it is '1' for a vector correctly freed. What should I do ? thx Nico From jed at 59A2.org Sat Nov 28 07:23:31 2009 From: jed at 59A2.org (Jed Brown) Date: Sat, 28 Nov 2009 14:23:31 +0100 Subject: VecDestroy and memory leak In-Reply-To: References: Message-ID: <87ocmmeqb0.fsf@59A2.org> On Sat, 28 Nov 2009 10:18:04 +0100, nicolas aunai wrote: > I've looked at the function VecDestroy() definition, it seems to check > a PetscObject member called 'refct' before actually free the memory, > I've printed this 'refct' for my natural vector and its value is '2', > while it is '1' for a vector correctly freed. The DA holds a reference to this vector so it can give it away the next time you call DACreateNaturalVector(). It's not a leak because the DA will destroy it's reference in DADestroy(). You can run with -malloc_dump to confirm that all PETSc objects have been destroyed. Jed -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From nicolas.aunai at gmail.com Sat Nov 28 08:44:19 2009 From: nicolas.aunai at gmail.com (nicolas aunai) Date: Sat, 28 Nov 2009 15:44:19 +0100 Subject: VecDestroy and memory leak In-Reply-To: <87ocmmeqb0.fsf@59A2.org> References: <87ocmmeqb0.fsf@59A2.org> Message-ID: Hi, ah ok I thought the natural vector was destroyed with VecDestroy each time it is called. I could indeed check what you said in the memory log I'm writing. 
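If I have understood this caching correctly, then a reduced test like the following (only a sketch, with the DA calls written the way I use them in my code) should show no leftover allocation once the DA itself is destroyed:

  Vec n1, n2;
  DACreateNaturalVector(da, &n1);
  VecDestroy(n1);                  /* only drops my reference, the DA keeps its own */
  DACreateNaturalVector(da, &n2);  /* hands back the cached vector, no new allocation */
  VecDestroy(n2);
  DADestroy(da);                   /* the natural vector is really freed here */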
Here is what -malloc_dump is writing at the end of the execution:

[0]Total space allocated 26648 bytes
[ 0]8 bytes PetscStrallocpy() line 82 in src/sys/utils/str.c
      [0] PetscObjectChangeTypeName() line 107 in src/sys/objects/pname.c
      [0] VecCreate_Seq_Private() line 714 in src/vec/vec/impls/seq/bvec2.c
      [0] VecCreate_Seq() line 804 in src/vec/vec/impls/seq/bvec2.c
      [0] VecSetType() line 39 in src/vec/vec/interface/vecreg.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]8 bytes PetscMapSetUp() line 140 in src/vec/vec/impls/mpi/pmap.c
      [0] VecCreate_Seq_Private() line 714 in src/vec/vec/impls/seq/bvec2.c
      [0] VecCreate_Seq() line 804 in src/vec/vec/impls/seq/bvec2.c
      [0] VecSetType() line 39 in src/vec/vec/interface/vecreg.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]24 bytes VecCreate_Seq_Private() line 715 in src/vec/vec/impls/seq/bvec2.c
      [0] VecCreate_Seq() line 804 in src/vec/vec/impls/seq/bvec2.c
      [0] VecSetType() line 39 in src/vec/vec/interface/vecreg.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]25344 bytes VecCreate_Seq() line 809 in src/vec/vec/impls/seq/bvec2.c
      [0] VecSetType() line 39 in src/vec/vec/interface/vecreg.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]40 bytes VecCreate() line 42 in src/vec/vec/interface/veccreate.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]496 bytes VecCreate() line 39 in src/vec/vec/interface/veccreate.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]64 bytes VecCreate() line 39 in src/vec/vec/interface/veccreate.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]656 bytes VecCreate() line 39 in src/vec/vec/interface/veccreate.c
      [0] VecCreateSeq() line 37 in src/vec/vec/impls/seq/vseqcr.c
[ 0]8 bytes PetscCommDuplicate() line 221 in src/sys/objects/tagm.c
      [0] PetscHeaderCreate_Private() line 26 in src/sys/objects/inherit.c
      [0] VecCreate() line 32 in src/vec/vec/interface/veccreate.c
      [0] VecCreateSeqWithArray() line 771 in src/vec/vec/impls/seq/bvec2.c
      [0] DACreate2d() line 338 in src/dm/da/src/da2.c

All this disappears when I comment out the call to the function I suspect to be responsible for my leak. However, I still can't see where I don't free what I've created.
here is the function:

void getseqsol(void)
{
   DA             da;
   Vec            b[3], nsol, bx, by, bz, ssol;
   PetscInt       mx, my, i, j, ij, dof, ija;
   VecScatter     ctx;
   PetscScalar    *bxa, *bya, *bza;
   Vec            sol;
   struct UserCtx *uc = (struct UserCtx *) (*(solv.dmmg))->user;

   sol = DMMGGetx(solv.dmmg);
   PetscObjectQuery((PetscObject) sol, "DA", (PetscObject *) &da);
   DAGetInfo(da, PETSC_IGNORE, &mx, &my, PETSC_IGNORE, PETSC_IGNORE,
             PETSC_IGNORE, PETSC_IGNORE, &dof, PETSC_IGNORE,
             PETSC_IGNORE, PETSC_IGNORE);

   DACreateNaturalVector(da, &nsol);
   DAGlobalToNaturalBegin(da, sol, INSERT_VALUES, nsol);
   DAGlobalToNaturalEnd(da, sol, INSERT_VALUES, nsol);

   VecCreateSeq(PETSC_COMM_SELF, mx*my*dof, &ssol);
   VecScatterCreateToAll(nsol, &ctx, &ssol);
   VecScatterBegin(ctx, nsol, ssol, INSERT_VALUES, SCATTER_FORWARD);
   VecScatterEnd(ctx, nsol, ssol, INSERT_VALUES, SCATTER_FORWARD);
   VecDestroy(nsol);

   VecCreateSeq(PETSC_COMM_SELF, mx*my, &bx);
   VecCreateSeq(PETSC_COMM_SELF, mx*my, &by);
   VecCreateSeq(PETSC_COMM_SELF, mx*my, &bz);
   b[0] = bx; b[1] = by; b[2] = bz;

   VecSetBlockSize(ssol, 3);
   VecStrideGatherAll(ssol, b, INSERT_VALUES);
   VecGetArray(bx, &bxa);
   VecGetArray(by, &bya);
   VecGetArray(bz, &bza);

   if (uc->ipc == 0) {
      for (i = 0; i < uc->nx; i++) {
         for (j = 0; j < uc->ny+1; j++) {
            ij  = i + j*(uc->nx+1);
            ija = i + j*uc->nx;
            uc->s1[ij].c[0] = bxa[ija];
            uc->s1[ij].c[1] = bya[ija];
            uc->s1[ij].c[2] = bza[ija];
         }
      }
      for (j = 0; j < uc->ny+1; j++) {
         ij  = uc->nx + j*(uc->nx+1);
         ija = 0 + j*(uc->nx+1);
         uc->s1[ij].c[0] = uc->s1[ija].c[0];
         uc->s1[ij].c[1] = uc->s1[ija].c[1];
         uc->s1[ij].c[2] = uc->s1[ija].c[2];
      }
   } else if (uc->ipc == 1) {
      for (i = 0; i < uc->nx; i++) {
         for (j = 0; j < uc->ny+1; j++) {
            ij  = i + j*(uc->nx+1);
            ija = i + j*uc->nx;
            uc->s1[ij].b[0] = bxa[ija];
            uc->s1[ij].b[1] = bya[ija];
            uc->s1[ij].b[2] = bza[ija];
         }
      }
      for (j = 0; j < uc->ny+1; j++) {
         ij  = uc->nx + j*(uc->nx+1);
         ija = 0 + j*(uc->nx+1);
         uc->s1[ij].b[0] = uc->s1[ija].b[0];
         uc->s1[ij].b[1] = uc->s1[ija].b[1];
         uc->s1[ij].b[2] = uc->s1[ija].b[2];
      }
   }

   VecRestoreArray(bx, &bxa);
   VecRestoreArray(by, &bya);
   VecRestoreArray(bz, &bza);
   VecDestroy(bx);
   VecDestroy(by);
   VecDestroy(bz);
   VecScatterDestroy(ctx);
   VecDestroy(ssol);
}

The malloc_dump seems to refer to sequential vectors... well I think I'm freeing everything I've created, unless I misunderstood something?

Nico

2009/11/28 Jed Brown :
> On Sat, 28 Nov 2009 10:18:04 +0100, nicolas aunai wrote:
>> I've looked at the function VecDestroy() definition, it seems to check
>> a PetscObject member called 'refct' before actually free the memory,
>> I've printed this 'refct' for my natural vector and its value is '2',
>> while it is '1' for a vector correctly freed.
>
> The DA holds a reference to this vector so it can give it away the next
> time you call DACreateNaturalVector(). It's not a leak because the DA
> will destroy it's reference in DADestroy(). You can run with
> -malloc_dump to confirm that all PETSc objects have been destroyed.
>
> Jed

From jed at 59A2.org Sat Nov 28 09:27:34 2009
From: jed at 59A2.org (Jed Brown)
Date: Sat, 28 Nov 2009 16:53:23 +0100
Subject: VecDestroy and memory leak
In-Reply-To: 
References: <87ocmmeqb0.fsf@59A2.org>
Message-ID: <87aay6vfdl.fsf@59A2.org>

On Sat, 28 Nov 2009 15:44:19 +0100, nicolas aunai wrote:
> VecCreateSeq(PETSC_COMM_SELF, mx*my*dof, &ssol);
> VecScatterCreateToAll(nsol, &ctx, &ssol);
                                   ^^^^^

This is a new vector, the one you create on the line above is lost.
From the man page:

| Do NOT create a vector and then pass it in as the final argument vout! vout is created by this routine
| automatically (unless you pass PETSC_NULL in for that argument if you do not need it).

Jed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: 

From nicolas.aunai at gmail.com Sat Nov 28 09:53:23 2009
From: nicolas.aunai at gmail.com (nicolas aunai)
Date: Sat, 28 Nov 2009 16:53:23 +0100
Subject: VecDestroy and memory leak
In-Reply-To: <87aay6vfdl.fsf@59A2.org>
References: <87ocmmeqb0.fsf@59A2.org> <87aay6vfdl.fsf@59A2.org>
Message-ID: 

ah yes, this is it! thanks a lot.

Nico

2009/11/28 Jed Brown :
> On Sat, 28 Nov 2009 15:44:19 +0100, nicolas aunai wrote:
>> VecCreateSeq(PETSC_COMM_SELF, mx*my*dof, &ssol);
>> VecScatterCreateToAll(nsol, &ctx, &ssol);
>                                    ^^^^^
>
> This is a new vector, the one you create on the line above is lost.
> From the man page:
>
> |     Do NOT create a vector and then pass it in as the final argument vout! vout is created by this routine
> |   automatically (unless you pass PETSC_NULL in for that argument if you do not need it).
>
>
> Jed

From sekikawa at msi.co.jp Sun Nov 29 19:12:11 2009
From: sekikawa at msi.co.jp (Takuya Sekikawa)
Date: Mon, 30 Nov 2009 10:12:11 +0900
Subject: Can SLEPc handle 1mx1m matrix?
Message-ID: <20091130095750.87DB.SEKIKAWA@msi.co.jp>

Dear SLEPc/PETSc team,

In the past I have made an eigenvalue solving program with SLEPc.
At that time the matrix size was at most 10000x10000.
(that program is running with no problem)

Now I need to extend it to handle a 1,000,000x1,000,000 matrix.
(100 times larger than before, in row and column each)
Also, over 99% of the matrix values are zero (sparse matrix),
and I don't need all the eigenvalues/vectors; I only need
several of large magnitude.

I think I could handle it with an indirect method like KRYLOVSCHUR or
LANCZOS, plus using MPI, but I'm not sure if I can.

Can SLEPc handle such a large matrix?

Thanks,
Takuya
---------------------------------------------------------------
    Takuya Sekikawa
    Mathematical Systems, Inc
    sekikawa at msi.co.jp
---------------------------------------------------------------

From bsmith at mcs.anl.gov Sun Nov 29 19:36:19 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sun, 29 Nov 2009 19:36:19 -0600
Subject: Can SLEPc handle 1mx1m matrix?
In-Reply-To: <20091130095750.87DB.SEKIKAWA@msi.co.jp>
References: <20091130095750.87DB.SEKIKAWA@msi.co.jp>
Message-ID: <5F8D8D0B-F0B9-4CF2-9452-6875513E137A@mcs.anl.gov>

Yes,

On Nov 29, 2009, at 7:12 PM, Takuya Sekikawa wrote:

> Dear SLEPc/PETSc team,
>
> In the past I have made an eigenvalue solving program with SLEPc.
> At that time the matrix size was at most 10000x10000.
> (that program is running with no problem)
>
> Now I need to extend it to handle a 1,000,000x1,000,000 matrix.
> (100 times larger than before, in row and column each)
> Also, over 99% of the matrix values are zero (sparse matrix),
> and I don't need all the eigenvalues/vectors; I only need
> several of large magnitude.
>
> I think I could handle it with an indirect method like KRYLOVSCHUR or
> LANCZOS, plus using MPI, but I'm not sure if I can.
>
> Can SLEPc handle such a large matrix?
>
> Thanks,
> Takuya
> ---------------------------------------------------------------
>     Takuya Sekikawa
>     Mathematical Systems, Inc
>     sekikawa at msi.co.jp
> ---------------------------------------------------------------

From foolishzhu at yahoo.com.cn Sun Nov 29 19:41:51 2009
From: foolishzhu at yahoo.com.cn (ming zhu)
Date: Mon, 30 Nov 2009 09:41:51 +0800 (CST)
Subject: How to fill a matrix with a vector parallelly
Message-ID: <432981.94747.qm@web15804.mail.cnb.yahoo.com>

HI
I am trying to solve an eigenvalue problem, A = U * LAMBDA * U'. While SLEPc provides a solver to get an eigenvector and the related eigenvalue, I have to fill the eigenvectors into a matrix to form the eigen matrix. However, there seems to be no function to do this directly. I was trying to use VecGetValues and filled the matrix one by one. However, it cannot do that in parallel.

Is there anyone who can help me?

Thank you

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at mcs.anl.gov Sun Nov 29 19:53:13 2009
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sun, 29 Nov 2009 19:53:13 -0600
Subject: How to fill a matrix with a vector parallelly
In-Reply-To: <432981.94747.qm@web15804.mail.cnb.yahoo.com>
References: <432981.94747.qm@web15804.mail.cnb.yahoo.com>
Message-ID: 

Since eigenvectors will (almost always) be dense vectors you will want to use an MPIDENSE matrix to store them. So create an MPIDENSE matrix with the same row layout as the eigenvectors, then on each process use VecGetArray() to access that process's part of the vector and MatGetArray() to access that part of the matrix, and copy the values over from the vector array to the matrix array.

Barry

On Nov 29, 2009, at 7:41 PM, ming zhu wrote:

> HI
> I am trying to solve an eigenvalue problem, A = U * LAMBDA * U'. While SLEPc provides a solver to get an eigenvector and the related
> eigenvalue, I have to fill the eigenvectors into a matrix to form the eigen matrix.
However, there seems no such function to do this directly. I was trying to use VecGetValues and filled the matrix one by one. Howevevr, it can not do that in parallel. > >? ? Is there anyone can help me ? > > Thank you > > > ????????????????? ___________________________________________________________ ????????????????? http://card.mail.cn.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 29 21:12:59 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 29 Nov 2009 21:12:59 -0600 Subject: How to fill a matrix with a vector parallelly In-Reply-To: <522271.28044.qm@web15805.mail.cnb.yahoo.com> References: <522271.28044.qm@web15805.mail.cnb.yahoo.com> Message-ID: On Nov 29, 2009, at 9:09 PM, ming zhu wrote: > Thank you for your quick reply. > Two questions: > one, how do I know which part belongs to the present processor? Call VecGetOwnershipRange() to know what rows of the vector belong to the process. Then create the MPIDense matrix with the same number of local rows. > second, is there any examples for this kind of question? > > Thank you > > --- 09?11?30????, Barry Smith ? > ?? > > ???: Barry Smith > ??: Re: How to fill a matrix with a vector parallelly > ???: "PETSc users list" > ??: 2009?11?30?,??,??9:53 > > > Since evenvectors will (almost always) be dense vectors you will > want to use a MPIDENSE matrix to store them. So create a MPIDENSE > matrix with the same row layout as the eigenvector then on each > process use VecGetArray() to access that processes part of the > vector and MatGetArray() to access that part of the matrix and copy > the values over from the vector array to the matrix array. > > Barry > > On Nov 29, 2009, at 7:41 PM, ming zhu wrote: > > > HI > > I am trying to solve an eigenvalue problem. A = U * LAMBDA*U'. > While SLEPC proivdes a solver to get an eigenvector and the related > eigenvalue, I have to fill the eigenvector to form the eigen matrix. > However, there seems no such function to do this directly. I was > trying to use VecGetValues and filled the matrix one by one. > Howevevr, it can not do that in parallel. > > > > Is there anyone can help me ? > > > > Thank you > > > > > > ????????????????? > > > ????????????????? From foolishzhu at yahoo.com.cn Sun Nov 29 21:21:12 2009 From: foolishzhu at yahoo.com.cn (ming zhu) Date: Mon, 30 Nov 2009 11:21:12 +0800 (CST) Subject: How to fill a matrix with a vector parallelly In-Reply-To: Message-ID: <196195.59784.qm@web15806.mail.cnb.yahoo.com> OK.may I ask , I think Petsc matrix is row -based. That means,?MatGetArray(Mat mat,PetscScalar *v[])v[0] is the pointer to the first row. v[0][0] is the value of mat[0][0].Am I right?And do you mean that, each time I got a vector, I create a sub matrix to store it.Later on, the merging will be a problem. --- 09?11?30????, Barry Smith ??? ???: Barry Smith ??: Re: How to fill a matrix with a vector parallelly ???: "PETSc users list" ??: 2009?11?30?,??,??11:12 On Nov 29, 2009, at 9:09 PM, ming zhu wrote: > Thank you for your quick reply. > Two questions: > one,? ???how do I know which part belongs to the present processor? ? Call VecGetOwnershipRange() to know what rows of the vector belong to the process. Then create the MPIDense matrix with the same number of local rows. > second, is there any examples for this kind of question? > > Thank you > > --- 09?11?30????, Barry Smith ??? 
> > ???: Barry Smith > ??: Re: How to fill a matrix with a vector parallelly > ???: "PETSc users list" > ??: 2009?11?30?,??,??9:53 > > >???Since evenvectors will (almost always) be dense vectors you will want to use a MPIDENSE matrix to store them. So create a MPIDENSE matrix with the same row layout as the eigenvector then on each process use VecGetArray() to access that processes part of the vector and MatGetArray() to access that part of the matrix and copy the values over from the vector array to the matrix array. > >? ? Barry > > On Nov 29, 2009, at 7:41 PM, ming zhu wrote: > > > HI > > I am trying to solve an eigenvalue problem. A = U * LAMBDA*U'.? While SLEPC proivdes a solver to get an eigenvector and the related eigenvalue, I have to fill the eigenvector to form the eigen matrix. However, there seems no such function to do this directly. I was trying to use VecGetValues and filled the matrix one by one. Howevevr, it can not do that in parallel. > > > >? ? Is there anyone can help me ? > > > > Thank you > > > > > > ????????????????? > > > ????????????????? ___________________________________________________________ ????????????????? http://card.mail.cn.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 29 21:33:32 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 29 Nov 2009 21:33:32 -0600 Subject: How to fill a matrix with a vector parallelly In-Reply-To: <196195.59784.qm@web15806.mail.cnb.yahoo.com> References: <196195.59784.qm@web15806.mail.cnb.yahoo.com> Message-ID: <0A98534E-5001-431D-9B82-2429DDB91BF4@mcs.anl.gov> On Nov 29, 2009, at 9:21 PM, ming zhu wrote: > OK. > may I ask , I think Petsc matrix is row -based. That means, > MatGetArray(Mat mat,PetscScalar *v[]) > v[0] is the pointer to the first row. v[0][0] is the value of mat[0] > [0]. > Am I right? No. It is column based and there is just one array for the who thing so v[0:m-1] is the first column v[m:2m-1] is the next column etc > And do you mean that, each time I got a vector, I create a sub > matrix to store it. No. Just create the full matrix and copy over each vector as you get it. Barry > Later on, the merging will be a problem. > > --- 09?11?30????, Barry Smith ? > ?? > > ???: Barry Smith > ??: Re: How to fill a matrix with a vector parallelly > ???: "PETSc users list" > ??: 2009?11?30?,??,??11:12 > > > On Nov 29, 2009, at 9:09 PM, ming zhu wrote: > > > Thank you for your quick reply. > > Two questions: > > one, how do I know which part belongs to the present processor? > > Call VecGetOwnershipRange() to know what rows of the vector belong > to the process. Then create the MPIDense matrix with the same number > of local rows. > > > second, is there any examples for this kind of question? > > > > Thank you > > > > --- 09?11?30????, Barry Smith ? > ?? > > > > ???: Barry Smith > > ??: Re: How to fill a matrix with a vector parallelly > > ???: "PETSc users list" > > ??: 2009?11?30?,??,??9:53 > > > > > > Since evenvectors will (almost always) be dense vectors you will > want to use a MPIDENSE matrix to store them. So create a MPIDENSE > matrix with the same row layout as the eigenvector then on each > process use VecGetArray() to access that processes part of the > vector and MatGetArray() to access that part of the matrix and copy > the values over from the vector array to the matrix array. > > > > Barry > > > > On Nov 29, 2009, at 7:41 PM, ming zhu wrote: > > > > > HI > > > I am trying to solve an eigenvalue problem. 
A = U * LAMBDA*U'. > While SLEPC proivdes a solver to get an eigenvector and the related > eigenvalue, I have to fill the eigenvector to form the eigen matrix. > However, there seems no such function to do this directly. I was > trying to use VecGetValues and filled the matrix one by one. > Howevevr, it can not do that in parallel. > > > > > > Is there anyone can help me ? > > > > > > Thank you > > > > > > > > > ????????????????? > > > > > > ????????????????? > > > ????????????????? From foolishzhu at yahoo.com.cn Sun Nov 29 21:49:52 2009 From: foolishzhu at yahoo.com.cn (ming zhu) Date: Mon, 30 Nov 2009 11:49:52 +0800 (CST) Subject: How to fill a matrix with a vector parallelly In-Reply-To: <0A98534E-5001-431D-9B82-2429DDB91BF4@mcs.anl.gov> Message-ID: <234597.60826.qm@web15802.mail.cnb.yahoo.com> Thank you?I know that you are trying to let me know to match the vector element with the matrix.But it seems, the total number of rows (suppose m) has to satisfy??? m = N * r,where N is the number of process and r is the local range.Am I right? --- 09?11?30????, Barry Smith ??? ???: Barry Smith ??: Re: How to fill a matrix with a vector parallelly ???: "PETSc users list" ??: 2009?11?30?,??,??11:33 On Nov 29, 2009, at 9:21 PM, ming zhu wrote: > OK. > may I ask , I think Petsc matrix is row -based. That means, > MatGetArray(Mat mat,PetscScalar *v[]) > v[0] is the pointer to the first row. v[0][0] is the value of mat[0][0]. > Am I right? ???No. It is column based and there is just one array for the who thing so v[0:m-1] is the first column v[m:2m-1] is the next column etc > And do you mean that, each time I got a vector, I create a sub matrix to store it. No. Just create the full matrix and copy over each vector as you get it. ? Barry > Later on, the merging will be a problem. > > --- 09?11?30????, Barry Smith ??? > > ???: Barry Smith > ??: Re: How to fill a matrix with a vector parallelly > ???: "PETSc users list" > ??: 2009?11?30?,??,??11:12 > > > On Nov 29, 2009, at 9:09 PM, ming zhu wrote: > > > Thank you for your quick reply. > > Two questions: > > one,? ???how do I know which part belongs to the present processor? > >???Call VecGetOwnershipRange() to know what rows of the vector belong to the process. Then create the MPIDense matrix with the same number of local rows. > > > second, is there any examples for this kind of question? > > > > Thank you > > > > --- 09?11?30????, Barry Smith ??? > > > > ???: Barry Smith > > ??: Re: How to fill a matrix with a vector parallelly > > ???: "PETSc users list" > > ??: 2009?11?30?,??,??9:53 > > > > > >???Since evenvectors will (almost always) be dense vectors you will want to use a MPIDENSE matrix to store them. So create a MPIDENSE matrix with the same row layout as the eigenvector then on each process use VecGetArray() to access that processes part of the vector and MatGetArray() to access that part of the matrix and copy the values over from the vector array to the matrix array. > > > >? ? Barry > > > > On Nov 29, 2009, at 7:41 PM, ming zhu wrote: > > > > > HI > > > I am trying to solve an eigenvalue problem. A = U * LAMBDA*U'.? While SLEPC proivdes a solver to get an eigenvector and the related eigenvalue, I have to fill the eigenvector to form the eigen matrix. However, there seems no such function to do this directly. I was trying to use VecGetValues and filled the matrix one by one. Howevevr, it can not do that in parallel. > > > > > >? ? Is there anyone can help me ? > > > > > > Thank you > > > > > > > > > ????????????????? 
> > > > > > ????????????????? > > > ????????????????? ___________________________________________________________ ????????????????? http://card.mail.cn.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From foolishzhu at yahoo.com.cn Sun Nov 29 23:36:18 2009 From: foolishzhu at yahoo.com.cn (ming zhu) Date: Mon, 30 Nov 2009 13:36:18 +0800 (CST) Subject: What is the format of binary file ? Message-ID: <751647.13038.qm@web15805.mail.cnb.yahoo.com> HII am trying to convert a sparse matrix from matlab data file to PETSC binary file. While PETSC offer an example ?(\src\mat\examples\tests\ex72.c) to convert Matmarket file to binary. It is too slow for me. My matrix is 1.7 M * 1.7 M with 62 M?non zeros. So, is there any faster way to convert it? Or is it possible to write to it one line by one line ?Thank you ___________________________________________________________ ????????????????? http://card.mail.cn.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 29 23:42:58 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 29 Nov 2009 23:42:58 -0600 Subject: What is the format of binary file ? In-Reply-To: <751647.13038.qm@web15805.mail.cnb.yahoo.com> References: <751647.13038.qm@web15805.mail.cnb.yahoo.com> Message-ID: <9F21A19B-7C54-44C3-8CC2-CB8046CC5614@mcs.anl.gov> Look in bin/matlab/ for Matlab scripts that save sparse matrices directly and quickly to PETSc binary format. On Nov 29, 2009, at 11:36 PM, ming zhu wrote: > HI > I am trying to convert a sparse matrix from matlab data file to > PETSC binary file. While PETSC offer an example (\src\mat\examples > \tests\ex72.c) to convert Matmarket file to binary. It is too slow > for me. My matrix is 1.7 M * 1.7 M with 62 M non zeros. So, is there > any faster way to convert it? Or is it possible to write to it one > line by one line ? > Thank you > > > ????????????????? From bsmith at mcs.anl.gov Sun Nov 29 23:44:00 2009 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 29 Nov 2009 23:44:00 -0600 Subject: How to fill a matrix with a vector parallelly In-Reply-To: <234597.60826.qm@web15802.mail.cnb.yahoo.com> References: <234597.60826.qm@web15802.mail.cnb.yahoo.com> Message-ID: <870B06B1-4DDB-4E09-95B9-C2ECC9A4F01F@mcs.anl.gov> On Nov 29, 2009, at 9:49 PM, ming zhu wrote: > Thank you > I know that you are trying to let me know to match the vector > element with the matrix. > But it seems, the total number of rows (suppose m) has to satisfy > m = N * r,where N is the number of process and r is the local > range. > Am I right? Yes, if all the local ranges are the same. In general different processors could have a different number of rows. Barry > > > > --- 09?11?30????, Barry Smith ? > ?? > > ???: Barry Smith > ??: Re: How to fill a matrix with a vector parallelly > ???: "PETSc users list" > ??: 2009?11?30?,??,??11:33 > > > On Nov 29, 2009, at 9:21 PM, ming zhu wrote: > > > OK. > > may I ask , I think Petsc matrix is row -based. That means, > > MatGetArray(Mat mat,PetscScalar *v[]) > > v[0] is the pointer to the first row. v[0][0] is the value of > mat[0][0]. > > Am I right? > > No. It is column based and there is just one array for the who > thing so v[0:m-1] is the first column v[m:2m-1] is the next column etc > > And do you mean that, each time I got a vector, I create a sub > matrix to store it. > > No. Just create the full matrix and copy over each vector as you get > it.. 
> > Barry > > > Later on, the merging will be a problem. > > > > --- 09?11?30????, Barry Smith ? > ?? > > > > ???: Barry Smith > > ??: Re: How to fill a matrix with a vector parallelly > > ???: "PETSc users list" > > ??: 2009?11?30?,??,??11:12 > > > > > > On Nov 29, 2009, at 9:09 PM, ming zhu wrote: > > > > > Thank you for your quick reply. > > > Two questions: > > > one, how do I know which part belongs to the present > processor? > > > > Call VecGetOwnershipRange() to know what rows of the vector > belong to the process. Then create the MPIDense matrix with the same > number of local rows. > > > > > second, is there any examples for this kind of question? > > > > > > Thank you > > > > > > --- 09?11?30????, Barry Smith ? > ?? > > > > > > ???: Barry Smith > > > ??: Re: How to fill a matrix with a vector parallelly > > > ???: "PETSc users list" > > > ??: 2009?11?30?,??,??9:53 > > > > > > > > > Since evenvectors will (almost always) be dense vectors you > will want to use a MPIDENSE matrix to store them. So create a > MPIDENSE matrix with the same row layout as the eigenvector then on > each process use VecGetArray() to access that processes part of the > vector and MatGetArray() to access that part of the matrix and copy > the values over from the vector array to the matrix array. > > > > > > Barry > > > > > > On Nov 29, 2009, at 7:41 PM, ming zhu wrote: > > > > > > > HI > > > > I am trying to solve an eigenvalue problem.. A = U * > LAMBDA*U'. While SLEPC proivdes a solver to get an eigenvector and > the related eigenvalue, I have to fill the eigenvector to form the > eigen matrix. However, there seems no such function to do this > directly. I was trying to use VecGetValues and filled the matrix one > by one. Howevevr, it can not do that in parallel. > > > > > > > > Is there anyone can help me ? > > > > > > > > Thank you > > > > > > > > > > > > ????????????????? > > > > > > > > > ????????????????? > > > > > > ????????????????? > > > ????????????????? From foolishzhu at yahoo.com.cn Mon Nov 30 00:09:06 2009 From: foolishzhu at yahoo.com.cn (ming zhu) Date: Mon, 30 Nov 2009 14:09:06 +0800 (CST) Subject: What is the format of binary file ? In-Reply-To: <9F21A19B-7C54-44C3-8CC2-CB8046CC5614@mcs.anl.gov> Message-ID: <157406.81598.qm@web15808.mail.cnb.yahoo.com> That is great. It is very quick. Thank you --- 09?11?30????, Barry Smith ??? ???: Barry Smith ??: Re: What is the format of binary file ? ???: "PETSc users list" ??: 2009?11?30?,??,??1:42 ? Look in bin/matlab/ for Matlab scripts that save sparse matrices directly and quickly to PETSc binary format. On Nov 29, 2009, at 11:36 PM, ming zhu wrote: > HI > I am trying to convert a sparse matrix from matlab data file to PETSC binary file. While PETSC offer an example? (\src\mat\examples\tests\ex72.c) to convert Matmarket file to binary. It is too slow for me. My matrix is 1.7 M * 1.7 M with 62 M non zeros. So, is there any faster way to convert it? Or is it possible to write to it one line by one line ? > Thank you > > > ????????????????? ___________________________________________________________ ????????????????? http://card.mail.cn.yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From foolishzhu at yahoo.com.cn Mon Nov 30 23:21:54 2009 From: foolishzhu at yahoo.com.cn (ming zhu) Date: Tue, 1 Dec 2009 13:21:54 +0800 (CST) Subject: How to copy global data to local ? Message-ID: <650026.6302.qm@web15802.mail.cnb.yahoo.com> HiI have a huge matrix U for all processors (PETSC_COMM_WORLD). 
I only want to copy two rows (i,j) to each local processor; i and j are different for each processor and not related to the rank. I was trying to use a vector filter (like 00000010000). However, if the filter vector lives on PETSC_COMM_WORLD, it will be changed by the different processors. So, how can I make this work? Thank you.

My original code is:

Vec filter;
VecCreate(PETSC_COMM_WORLD,&filter);
VecSetSizes(filter,PETSC_DECIDE,m);
VecSetFromOptions(filter);
VecSet(filter,0);
VecSetValue(filter,i,1,INSERT_VALUES);
...
MatMult(U,filter,ui);

If I create filter with PETSC_COMM_SELF, it is impossible to multiply U and filter.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
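One possible approach, sketched here for illustration only (it assumes U is a parallel AIJ matrix with m columns, and that i and j are the two desired global row indices on this process), is MatGetSubMatrices(), which is collective on U's communicator but lets each process request its own rows:

IS       rows, cols;
Mat      *Usub;                /* Usub[0] will be a sequential 2 x m matrix */
PetscInt idx[2];

idx[0] = i; idx[1] = j;        /* different on every process; keep them sorted */
ISCreateGeneral(PETSC_COMM_SELF, 2, idx, &rows);
ISCreateStride(PETSC_COMM_SELF, m, 0, 1, &cols);   /* all columns */
MatGetSubMatrices(U, 1, &rows, &cols, MAT_INITIAL_MATRIX, &Usub);
/* rows i and j of U are now local to this process in Usub[0] */
ISDestroy(rows);
ISDestroy(cols);

Every process must make the MatGetSubMatrices() call, since it is collective, even though each process asks for different rows.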